I typically manage time representations in translation projects with a lot of fine-tuned regular expressions, in the form of:
filtering regexes,
find & replace regexes,
auto-translation rules for the working grid and QA profile use, and
custom regexes on the Regex tab of QA profiles (though rarely, because I find the auto-translation rules more useful in most cases).
None of these approaches requires you to have expertise in regular expressions if your memoQ work environment is well organized. In fact, I’m generally opposed to the advice that translators and reviewers learn regex syntax.
Expressions for filtering and for find & replace, as well as many of the elements which can be applied in the other cases, can be stored in the memoQ Regex Assistant library (in version 9.9 or later) once they have been acquired by whatever means.
Once stored in the library, these expressions can be selected in a way not unlike a normal menu command in the memoQ application and used in an appropriate filtering field or in a find & replace dialog.
Auto-translation rules are unique to memoQ; no other translation workspace tool currently offers anything like them as a standard feature. They are essential to my translation and review work, offering insertable “hits” in the Translation results list and visual warnings when translated text fails to follow the “rules”. And they are part of my automated routine of QA checks.
A few weeks ago, we looked at auto-translation rules for dates, with many downloadable example rulesets that take (mostly) long dates in English and convert them to a great variety of target languages.
Today I want to explore with you some of the considerations for similar rules involving time expressions. Actually constructing these time rules will be the subject of another post. Before you start building rules or having them built, it is usually a good idea to see how time expressions actually occur in your translation texts.
Whenever I write articles like this, I have to confront the challenge of examples. What language pair or pairs should I present? In what direction? It’s always easier for us to follow an example and understand its relevance if it involves the languages we typically work with, and the farther the example strays from those, the harder it becomes to focus and extract something meaningful for one’s own work. And time expressions may involve some issues that can be a little more complicated than those relevant to the date expressions discussed previously. I’ll present examples for English, but similar considerations are necessary in most cases for other languages in source or target texts.
When I began creating rules for converting or checking time expressions from one language to another, they were mostly examples I used for teaching summer CAT tool classes at a local university. I didn’t see a lot of need for these in my own translation work, because I didn’t typically work on texts with a lot of schedules, business hours, opening and closing times and the like, as someone who translates a lot of tourist or event information might.
However, as a consultant I was exposed to these very kinds of texts, and especially to the frequent problems created by machine translation processes, which often translate time expressions inconsistently or even wrongly (converting 13 Uhr, the German equivalent of “1 p.m.”, to “7 p.m.” in one bizarre case I’ll never forget). Similar problems arise in Projects from Hell, where a large team of wordworkers labors to grind out linguistic sausage at a pace that hardly allows for adequate checking by the sleep-deprived, excessively stressed project participants.
Such cases screamed the need for regex-based tools to take the burden off human teams and allow them to focus their energies on language quality rather than on important but tedious minutiae like date and time formats.
When dealing with time expressions, one of the first points to consider is which clock applies to the source language and which to the target language. In most cases (with military communications being an exception), English texts use a 12-hour clock. Unless otherwise clear from context, time expressed in English typically includes a marker for ante meridiem or post meridiem, Latin for before or after midday. These markers are written in a variety of ways (a.m./p.m., A.M./P.M., am/pm, AM/PM), with variations that may involve spaces, and possibly alongside other collocative expressions like “in the morning”, “in the afternoon”, etc.
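To make the clock question concrete, here is a minimal Python sketch of what a 12-hour to 24-hour conversion has to handle. It is only an illustration of the considerations above, not a memoQ auto-translation rule: the pattern, the function name `to_24_hour` and the `HH:MM` output format are all assumptions made for this example. The target format a real project needs depends on the language and style guide.

```python
import re

# Assumed pattern for illustration: an hour from 1 to 12, optional minutes
# after ":" or ".", an optional space, then an a.m./p.m. marker in any of
# the spellings listed above (a.m., A.M., am, AM, "a. m.", ...).
TIME_12H = re.compile(
    r"\b(?P<hour>1[0-2]|0?[1-9])(?:[:.](?P<minute>[0-5]\d))?\s?(?P<marker>[ap])\.?\s?m\b",
    re.IGNORECASE,
)


def to_24_hour(text):
    """Return the first 12-hour time found in text as 'HH:MM', or None."""
    match = TIME_12H.search(text)
    if match is None:
        return None
    # 12 a.m. is hour 0 and 12 p.m. is hour 12, so reduce 12 to 0 first.
    hour = int(match.group("hour")) % 12
    if match.group("marker").lower() == "p":
        hour += 12
    minute = match.group("minute") or "00"
    return f"{hour:02d}:{minute}"


print(to_24_hour("Lunch is at 1 p.m. today"))   # 13:00
print(to_24_hour("The shuttle leaves at 12 AM"))  # 00:00
print(to_24_hour("9.30am"))                       # 09:30
```

The two "12" cases are the ones most likely to go wrong in any conversion, whether it is done by a person, a rule or a machine translation engine, so they are worth including in any test data.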
If you are about to embark on developing rules for converting or checking time expressions, it’s a good idea to understand how these appear in the texts you’ll work with. A ruleset covering all possible cases might involve a lot more work than is necessary, and one which omits frequently occurring forms is likely to get you into trouble if you apply it with confidence to the wrong text.
How does one check such things? Looking at a representative corpus of texts from the same text creators is a good place to start. If a translation memory exists with a lot of past work, exporting a TMX and importing it into a project as a “translation document” for a little filtering and study may prove especially helpful.
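If you are comfortable with a little scripting, a similar survey can be run on the exported TMX outside memoQ. The Python sketch below is one possible approach, not the memoQ workflow described above. It counts how a.m./p.m. markers are spelled in the English segments and whether a space precedes them. The sample TMX, the pattern and the function name `marker_spellings` are hypothetical, and the pattern deliberately covers only the fully dotted and undotted spellings.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed survey pattern: hour, optional minutes, optional whitespace
# (captured), then either a dotted marker (a.m., P.M., "a. m.") or an
# undotted one (am, PM).
MARKER = re.compile(
    r"\b\d{1,2}(?:[:.]\d{2})?(\s?)([ap]\.\s?m\.|[ap]\s?m\b)", re.IGNORECASE
)


def marker_spellings(tmx_text, lang_prefix="en"):
    """Count (marker spelling, spacing) pairs in segments of one language."""
    counts = Counter()
    root = ET.fromstring(tmx_text)
    for tuv in root.iter("tuv"):
        if not tuv.get(XML_LANG, "").lower().startswith(lang_prefix):
            continue
        seg = tuv.find("seg")
        if seg is None:
            continue
        # itertext() also collects text around inline tags inside <seg>.
        text = "".join(seg.itertext())
        for match in MARKER.finditer(text):
            spacing = "space" if match.group(1) else "no space"
            counts[(match.group(2), spacing)] += 1
    return counts


# Hypothetical two-unit TMX, invented for illustration only.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Open from 9 a.m. to 5 p.m.</seg></tuv>
    <tuv xml:lang="de-DE"><seg>Von 9 bis 17 Uhr offen</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>The tour starts at 10:30AM.</seg></tuv>
    <tuv xml:lang="de-DE"><seg>Die Tour beginnt um 10.30 Uhr.</seg></tuv></tu>
</body></tmx>"""

for (marker, spacing), count in marker_spellings(SAMPLE_TMX).most_common():
    print(f"{marker!r} ({spacing}): {count}")
```

A tally like this shows quickly which forms a ruleset really has to cover and which ones never occur in the client's material.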
An expression like (one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|\d+.?\d*)\s?(o.clock)?\s?in the (morning|afternoon|evening)
can also be used to check for “unusual” time expressions and see that they are rendered correctly in the other language.
When I screened the same data repository for a.m./p.m. time expressions using \d{1,2}(.\d{2})?\s(?i)[ap]\.?\s?m\.?\b
I saw other time-related expressions I hadn’t considered.
Little studies like this are often quite helpful before you sit down to build an auto-translation ruleset for time expressions. And, if you are a memoQ user of version 9.9 or later, it’s helpful to build a little Regex Assistant library of screening expressions like the examples shown above (or better ones) to use for quick checks using the source and target text filters in the working grid.
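For readers who want to try the two screening expressions above on a plain list of segments before building a library, here is a small Python sketch. The sample segments and the helper name `screen` are invented for illustration. One adaptation is needed: Python 3.11 and later rejects an inline `(?i)` flag in the middle of a pattern, so it is moved to the front of the second expression. memoQ's .NET-based regex engine should accept it where it stands in the original. Because the rest of that pattern consists only of digits and punctuation, moving the flag does not change what is matched.

```python
import re

# The two screening expressions from the text, otherwise unchanged.
WORDY_TIME = re.compile(
    r"(one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|\d+.?\d*)"
    r"\s?(o.clock)?\s?in the (morning|afternoon|evening)"
)
AM_PM_TIME = re.compile(r"(?i)\d{1,2}(.\d{2})?\s[ap]\.?\s?m\.?\b")


def screen(pattern, segments):
    """Return (segment number, matched text) for every hit, numbered from 1."""
    hits = []
    for number, segment in enumerate(segments, start=1):
        for match in pattern.finditer(segment):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical sample segments, invented for illustration only.
segments = [
    "Breakfast is served from seven o'clock in the morning.",
    "The museum closes at 5:30 PM on weekdays.",
    "Check-in starts at 2 p.m. and the bar opens at 7.30 in the evening.",
    "No time expression here.",
]

for number, text in screen(WORDY_TIME, segments) + screen(AM_PM_TIME, segments):
    print(number, repr(text))
```

Note that the second expression stops at the "m" when the marker's final period is followed by a space, because `\b` cannot match between a period and a space. That is harmless for filtering, but it matters if the same expression is reused for find & replace.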
Note: when working with large quantities of data (hundreds of thousands to millions of TUs), you might encounter memory issues with large regular expressions for filtering. These show up as a filter that returns “no results”. If this happens, try simplifying your expressions or using a smaller dataset.
Style guides with which you work may require other special formatting, such as non-breaking spaces between the numerical part of a time expression and other markers such as “AM/PM” in English or “Uhr” in German. These can be checked with filtering expressions, or in auto-translation rulesets and QA profiles, and problems can be fixed easily with regex find & replace expressions or auto-translation rules. Some examples of this are shown in the Regex Assistant video on YouTube linked near the top of this article.
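The non-breaking space requirement can be sketched in Python as a find step and a replace step. This is only an illustration of the idea under my own assumptions. The pattern and the names `find_plain_spaces` and `fix_plain_spaces` are hypothetical, and a real memoQ find & replace expression would use that tool's replacement syntax instead.

```python
import re

NBSP = "\u00a0"  # non-breaking space

# Assumed pattern: a time (hour with optional minutes) followed by an
# ordinary space, where the next thing is an a.m./p.m. marker or "Uhr".
# The lookahead checks the marker without consuming it.
PLAIN_SPACE = re.compile(r"(\d{1,2}(?:[:.]\d{2})?) (?=[AaPp]\.?\s?[Mm]\b|Uhr\b)")


def find_plain_spaces(text):
    """List the times that are followed by an ordinary space before the marker."""
    return [match.group(1) for match in PLAIN_SPACE.finditer(text)]


def fix_plain_spaces(text):
    """Replace the ordinary space before the marker with a non-breaking space."""
    return PLAIN_SPACE.sub(lambda match: match.group(1) + NBSP, text)


sample = "Open 9:30 a.m. to 5 p.m.; in Germany, 13 Uhr."
print(find_plain_spaces(sample))
print(repr(fix_plain_spaces(sample)))
```

Running the fix a second time changes nothing, because the pattern only matches the ordinary space character and not the non-breaking one.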
Try using some of the examples presented here in your own texts. If the expressions seem useful, save them in your own Regex Assistant library as shown in the video.
In some follow-up posts I’ll present some specific solutions for filtering, QA and auto-translation rules in languages with which I have some confidence. If you have specific needs related to your working languages, I welcome these questions in comments or private messages, and I’ll do my best to answer them in a way you can use productively in your own work.
A note regarding other CAT tools
With the exception of auto-translation rules, nearly all the things I discuss regarding filtering regexes, find & replace and QA checks can be applied to other translation workspace tools, such as Phrase, Trados Studio, Wordfast, etc. And if you have a good collection of useful expressions in memoQ, it is very easy to export your library in a form that lets you find and use them in other environments. But that’s a subject for another time!