From WikiLeaks to MT-leaks?

On Sunday I had a “Duh” moment when I saw an article entitled “Free machine translation can leak data”, published in TCWorld, a magazine for international information management. I think I can safely say that this doesn’t take any legal translator by surprise (!), nor indeed any other reader of this blog.

The article is authored by Don DePalma at Common Sense Advisory (CSA), a market research firm that examines translation, localization, interpreting, globalization, and internationalization.

It discusses new EU regulations that will fine companies for data breaches, and warns of the dangers of widespread inadvertent leaks of sensitive data through the use of Google Translate, Microsoft Bing Translator (and other similar systems), open Wi-Fi, and email.

A recent survey of localization managers carried out by CSA is cited, giving some figures on the extensive use of free machine translation by company employees and suppliers:

“they might translate e-mails, text messages, project proposals, legal contracts, merger and acquisition documents, and other sensitive content”. [my emphasis]

The article talks about two ways in which corporate data is leaked: firstly when in transit, via web-based services, the cloud, unencrypted connections, or open Wi-Fi; and secondly by machine translation (MT) sites sharing the data after the user has submitted content.

Possible initial solutions are offered: opting out according to the terms of the MT service (with questionable effectiveness); subscribing to proprietary ‘closed’ MT systems; and secure web transmission.

However, it underlines the fact that translation suppliers may not comply with such “safe MT” usage.

Depressingly, the article states “Competition and price pressure being what they are, there is nothing you can do to prevent linguists from using free MT as an efficiency tool. […] market forces require suppliers to use everything they can to be competitive.”

In conclusion, some suggestions are to: “lock down content workflows” by obliging those involved in translation to work within a secure hosted environment; anonymize outgoing MT requests using special software (e.g. Lingosec, CipherCloud); or redirect MT requests to controlled software.
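For readers curious what anonymizing an outgoing MT request might involve, here is a minimal sketch in Python. It is purely illustrative and makes no claim about how Lingosec, CipherCloud, or any other product actually works: the patterns, the placeholder format, and the function names `anonymize` and `restore` are all my own assumptions. The idea is to swap sensitive strings for neutral placeholders before the text leaves your machine, and to put the originals back once the translation returns.

```python
import re

# Hypothetical patterns for illustration only; a real tool would need to
# cover many more categories (names, addresses, case numbers, and so on).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "NUMBER": re.compile(r"\d[\d ,.-]{5,}\d"),  # long digit runs, e.g. account numbers
}


def anonymize(text):
    """Replace sensitive spans with placeholders.

    Returns the masked text and a mapping from placeholder to original value.
    """
    mapping = {}
    counter = 0
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            nonlocal counter
            counter += 1
            token = f"[{label}_{counter}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into (translated) text."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


source = "Contact jane.doe@example.com about account 1234 5678 9012."
masked, mapping = anonymize(source)
# Only `masked` would be sent to the MT service; `mapping` stays local.
restored = restore(masked, mapping)
```

Note that this only works if the MT engine leaves the placeholders untouched, which is not guaranteed, and that simple patterns like these will miss a great deal of what counts as sensitive in a legal document.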

Whilst I laud the article for bringing this issue to the attention of those who are unaware, perhaps we could ensure that the “blame” for the leaks is equally shared… The focus seems to me to be on breaches by linguists. I wonder what percentage of MT users are linguists vs non-linguists? (!)

Read the full article here.

Credit: Many thanks to Catherine Christaki over at LinguaGreca for the heads-up.