LiveDocs corpora: accessing reading context in translation memories.

Get more from customer TMs and your own past work

Jun 11, 2024


Many years ago, memoQ enabled *.tmx files to be imported as “translation documents”, which made the translation and editing grid (now regex-capable) a highly effective tool for revising translation memories. These TMX files could also be transferred to LiveDocs after cleanup or imported directly as reference documents. It’s the last case, TMX as a bilingual reference document, that is the subject of the present discussion. People think of TMX as a translation memory thing (well, the extension is an abbreviation of Translation Memory eXchange), but it can be so much more.

Take a look at some of the TMX files stored in the example corpus you can download at the bottom of this article to see how these appear in memoQ (you’ll have to import the export file to your own LiveDocs corpus). The examples are from the collection of EU legislation available as bitexts from the Directorate-General for Translation (DGT). Too many people try to work with these data as massive memories of several million translation units, which tend to slow working translation environments to a snail’s pace. But inside all those ZIP files you can download from the DGT site are the individual legislative documents as TMX files bearing the CELEX number (unique identifier) of the document. In a translation memory, this really doesn’t matter, as all the individual translation units are devoid of context, and in those megaton monster TMs, a concordance search is often so overwhelmed with hits that one is inclined to say fuck it, I’ll just grab what I see in one of the first results here.
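If you want to see which individual documents a downloaded ZIP archive contains before importing anything, a few lines of Python will do. This is only a minimal sketch: the function name is made up for illustration, and it assumes that each TMX member's file name (without the extension) is the document identifier, as described above.

```python
import zipfile
from pathlib import PurePosixPath


def list_tmx_documents(zip_path):
    """Map each TMX member's file stem to its path inside the archive.

    Assumption: the file stem is the document identifier (for the DGT
    data, the CELEX number). Non-TMX members are ignored.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return {
            PurePosixPath(name).stem: name
            for name in archive.namelist()
            if name.lower().endswith(".tmx")
        }
```

The returned dictionary can then be used to extract only the documents you actually need for a LiveDocs corpus, rather than everything in the archive.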

But in a memoQ LiveDocs corpus, the individual documents are maintained separately, and if they are opened (by right-clicking a hit in the Translation results, or by accessing the document via the corpus on the LiveDocs page of a project, to open a separate, readable reference tab), all the content can be read in both languages of the pair, arranged side-by-side in columns. The DGT TMX data contains no context markers, so if you import it straight into a translation memory, you might get 100% matches, but never context matches. You get that using the data in a LiveDocs corpus.

In the same way, if you have another TMX export from a translation memory where the segments usually reflect the order of the text in the document, importing that TMX to a LiveDocs corpus can allow you to read that text in the order of the translation units in the file. This is guaranteed with master translation memories where content is written only after the translation review phases are completed, unlike working translation memories where the order of the content might show that the translator skipped around a lot.
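To check outside of memoQ whether a TMX export really does follow the order of the original text, you can read its translation units in file order. The sketch below is not how memoQ does it; it is just a plain reading of the TMX structure (`tu`, `tuv` and `seg` elements) using Python's standard library. The function name is made up for illustration, language codes are matched by prefix, and the language attribute is looked up as both `xml:lang` and the older `lang`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang, tgt_lang):
    """Return (source, target) text pairs in the order the <tu> elements appear.

    `source` is a file path or a binary file object. Inline tags inside
    <seg> are dropped and only their surrounding text is kept.
    """
    root = ET.parse(source).getroot()
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # Newer TMX versions use xml:lang, older ones a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang] = "".join(seg.itertext()).strip()
        # Prefix match, so "en" also finds "en-GB" or "EN-US".
        src = next((t for l, t in texts.items() if l.startswith(src_lang.lower())), None)
        tgt = next((t for l, t in texts.items() if l.startswith(tgt_lang.lower())), None)
        if src is not None and tgt is not None:
            pairs.append((src, tgt))
    return pairs
```

Printing each pair on one line, for example with `print(f"{src}\t{tgt}")`, gives a rough two-column view in which it quickly becomes obvious whether the segments read as continuous text.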

When I receive translation memories from a client, instead of importing these to a TM, I usually put them in LiveDocs, which has enabled me to read matches or concordance hits in context, understand the reference more clearly, and produce a better translation.

My original blog article on this subject (with a few screenshots) can be read here.

memoQuickies Substack
