Due to the release of new data, the translation memories made available by the European Commission's Directorate-General for Translation, via the Joint Research Centre, have tripled in size.
I have tweeted this, but as I’m sure many of you will be interested, I’m posting it on the blog too.
The translation memories are parallel texts of the entire body of European legislation, comprising all the treaties, regulations and directives adopted by the European Union (EU), in 22 languages: Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish.
You can download them freely from this page – beware it will take quite a while 🙂
The instructions for extracting the memories are on the same page. Whereas the first version included documents published up to 2006, the current files go up to 2010. Just to make things really clear (!): the data goes up to 2010, the files are called DGT-TM-2011, and they were released in April 2012!!
Thanks hugely for this tip, will try a download when I have a spare afternoon! But might these TMs be so vast as to be unmanageable? Would be interested in others’ experiences. S
No Sue, they’re fine in my experience when used with Trados, for example. It does depend on computer oomph though, as so many things do these days. But my 5-year-old Mac copes!
Actually, Juliette, the data set has increased by more than a factor of three in some languages. I think the last published set had some 300,000 TUs for Dutch/English, and now there are nearly two million. I wonder what I’ll see for German.
Thanks for this information Kevin. I guess the tripling they mention has been calculated overall.
Update: Figures for the changes in the TMs can be found on page 3 of this article: http://langtech.jrc.it/Documents/2012_LREC_DGT-TM_Final.pdf