LiveDocs: tips for moving bilingual content to a TM

Language variant restrictions and how to get around them

Jul 16, 2024

In these two previous articles, I talked about various processes that involved moving bilingual content from a LiveDocs corpus to a translation memory:

Making a project TM from bilingual LiveDocs content (Kevin Lossner, June 12, 2024)

LiveDocs corpus curation and master memories (Kevin Lossner, June 13, 2024)

But there were a few possibly important details I left out of those articles.

Bilingual content in a LiveDocs corpus can have many origins. It might come from

alignments of monolingual source and target language documents from prior translations,
translations sent from a memoQ Translations files list on Project home to a LiveDocs corpus attached to that project,
imported bilingual content in *.tmx files,
XLIFF files of some kind, or
some other imported bilingual file format.

All of this bilingual corpus content acts much like a translation memory: it can offer match results in the translation and editing grid or be used in pretranslation, for example.

But in some circumstances (described in the linked articles above) you may want to use that bilingual content in a translation memory. For me, this is particularly the case when I have huge corpora and want to run a leverage analysis or a pretranslation. Both can be very, very slow with LiveDocs, because its indexing system has not been overhauled for speed (yet), unlike the various TM engines. In those cases, I’ll keep my corpora for looking up the document context of a Translation results or Concordance hit (via the right-click context menu), but I won’t use them for the Statistics run or the pretranslation.

But if you want to transfer bilingual content from a corpus to a translation memory for these reasons, or perhaps to share it with a partner who uses a different CAT tool that can only accept *.tmx files for use in matching and pretranslation, it’s important to know where the transfer can occur!

As of memoQ version 11.0.17 (the newest build available today), no direct export of TMX is possible for bilingual corpus content. (This is on the wish list at the memoQ Ideas Portal, PUBL-I-535, so go there and upvote it!) So to get TMX, you must first send the content to a translation memory and then export to TMX from there.

This transfer to a TM is currently possible only in an open project; you can’t do it from the Resource Console, for example. The languages of the receiving TMs must also match the project’s language settings exactly, sublanguages included. No slack there.

If the languages match, you can actually send the bilingual content to more than one TM (which all must be attached to the open project), as shown in the screenshot above. And this works with both “classic” memoQ TMs and the new TM+ format.

Here are the TMs involved in the example:

The content was successfully transferred to all four TMs, even the reverse language pair reference TMs! Two of the TMs used in this test were in the old format, two were TM+.

But if the sublanguages don’t match the main project, you cannot send the content to a TM attached to the project.

*No attached TMs match the sublanguages of the selected corpus document, though the main languages and direction do match*

If I wanted to send the bilingual content of the selected document in the screenshot above to a translation memory, I would have to do that in a different project with matching sublanguages.
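The matching rule described above can be modeled in a few lines of Python. This is only a sketch of the behavior I observed, not memoQ code or a memoQ API: the `LangPair` type, the function names and the language codes are all made up for illustration, and accepting reverse-direction TMs is an assumption based on the four-TM test above.

```python
from typing import NamedTuple


class LangPair(NamedTuple):
    source: str  # hypothetical example code, e.g. "en-US"
    target: str  # hypothetical example code, e.g. "de-DE"


def same_pair(a: LangPair, b: LangPair) -> bool:
    """Exact comparison, sublanguage included; only letter case is ignored."""
    return (a.source.lower(), a.target.lower()) == (b.source.lower(), b.target.lower())


def can_receive(doc_pair: LangPair, tm_pair: LangPair) -> bool:
    """A TM qualifies if its pair matches exactly, in either direction.

    Assumption: reverse-direction TMs qualify, as in the test above.
    """
    reversed_tm = LangPair(tm_pair.target, tm_pair.source)
    return same_pair(doc_pair, tm_pair) or same_pair(doc_pair, reversed_tm)


doc = LangPair("en-US", "de-DE")
attached_tms = {
    "main": LangPair("en-US", "de-DE"),
    "reverse": LangPair("de-DE", "en-US"),
    "british": LangPair("en-GB", "de-DE"),  # same main languages, other sublanguage
}
eligible = [name for name, pair in attached_tms.items() if can_receive(doc, pair)]
print(eligible)
```

In this model the "british" TM is rejected even though its main languages and direction match, which is the situation in the screenshot.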

If you want to consolidate content for different sublanguages or even different language directions, that is possible in a single translation memory, from which a single *.tmx file can also be exported for sharing.

First, transfer all the corpus content you want to compatible TMs.
Then export the content of each TM to a *.tmx file.
Finally, import all of the TMX files to the TM for consolidation.

Then if you want a consolidated *.tmx file to share, you can export it from the “consolidated TM”. You might also think about using that consolidated TM to make a project TM in the Statistics dialog, but for now you’ll have to avoid a TM+ repository for that: as of memoQ version 11.0.17, a bug that still needs to be fixed prevents it.

Is that a lot of steps? You betcha! If you want to do this in a simpler way, take that to the memoQ Ideas Portal and support the proposal for that, maybe adding your own thoughts in the idea’s comments. And just for good measure, send a gripe mail to support@memoQ.com!

As a test, I made four TMX exports from four TMs with different sublanguages and directions:

I imported these to a TM with Canadian English as the source language and Austrian German as the target language:

Those little warning icons appear because the sublanguages of the TMX files don’t match those of the TM in any of the four cases.

The import worked fine, and from there I was able to export a consolidated TMX with EN-CA as the source language and DE-AT as the target language. The whole operation also worked with a translation memory in TM+ format.
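To make the effect of that consolidation concrete, here is a rough Python sketch of the idea using only the standard library. It is not how memoQ does it internally. The sample TMX strings and their sublanguage codes (en-US, de-DE, de-CH, en-GB) are hypothetical stand-ins for the test exports, the function names are mine, and the TMX header is cut down to a single attribute, so the output is an illustration rather than a file you should try to import anywhere.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def main_lang(code):
    """'en-CA' -> 'en'; the sublanguage part is dropped."""
    return code.split("-")[0].lower()


def collect_pairs(tmx_texts, source, target):
    """Gather (source_text, target_text) pairs whose main languages fit the
    consolidated pair, whatever the sublanguage or original direction."""
    pairs = []
    for text in tmx_texts:
        for tu in ET.fromstring(text).iter("tu"):
            segs = {}
            for tuv in tu.findall("tuv"):
                seg = tuv.find("seg")
                if seg is not None:
                    # itertext() keeps the text only; inline tags are dropped
                    segs[main_lang(tuv.get(XML_LANG, ""))] = "".join(seg.itertext())
            if main_lang(source) in segs and main_lang(target) in segs:
                pairs.append((segs[main_lang(source)], segs[main_lang(target)]))
    return pairs


def build_tmx(pairs, source, target):
    """Write the pairs as one TMX-like document labelled with the new pair."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", srclang=source)  # reduced header, illustration only
    body = ET.SubElement(root, "body")
    for src_text, tgt_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source, src_text), (target, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical exports: different sublanguages, and one in the reverse direction
en_us_de_de = ('<tmx version="1.4"><header srclang="en-US"/><body><tu>'
               '<tuv xml:lang="en-US"><seg>Hello</seg></tuv>'
               '<tuv xml:lang="de-DE"><seg>Hallo</seg></tuv></tu></body></tmx>')
de_ch_en_gb = ('<tmx version="1.4"><header srclang="de-CH"/><body><tu>'
               '<tuv xml:lang="de-CH"><seg>Danke</seg></tuv>'
               '<tuv xml:lang="en-GB"><seg>Thanks</seg></tuv></tu></body></tmx>')

pairs = collect_pairs([en_us_de_de, de_ch_en_gb], "en-CA", "de-AT")
consolidated = build_tmx(pairs, "en-CA", "de-AT")
```

The sketch does no duplicate handling and ignores all metadata, both of which a real TM import takes care of.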

memoQuickies Substack
