Collaboration or exploitation?

Something rather controversial for you today. The video presentation below discusses book digitization, use of crowd resources, and translation by non-professionals concurrently with language learning.

I suspect that the lawyers reading this will have plenty to say about the legal issues involved, not to mention the translators.

The presenter, Luis von Ahn, is an associate professor of Computer Science at Carnegie Mellon University, and is at the forefront of the crowdsourcing craze. He “builds systems that combine humans and computers to solve large-scale problems that neither can solve alone”.

As I see it, the debatable points raised include: the use of people, without their being made aware of it, to assist in the optical character recognition (OCR) of books that may subsequently be sold; translation by the crowd (and, what's more, by non-native speakers) rather than by professionals; and the effectiveness of learning a language through translation.

On the other hand, we can, of course, take a positive view – the provision of free open access to books and language learning to all, in particular the poor, and the use of time instead of money as remuneration for services.

Although von Ahn claims to be aiming to “leverage the crowd for human good”, it is perhaps a cause for concern that he mentions “monetizing” translations. This is clearly another example of the paradox that is the Internet, with its conflicting potentials.

Do write in with a comment below and let me know your thoughts!

14 thoughts on “Collaboration or exploitation?”

  1. I watched that presentation a few days ago, and his remark on monetizing also caught my attention. Then I posed the following scenario: people who do anything other than translate may not care, but what will the next step be? What is stopping them from crowdsourcing other tasks? Or do you think, even for a second, that you can't crowdsource the planning of a bridge, the defense of a case or the diagnosis of a patient?

    • I got no replies where I posed that question, but in general it is just as Martin Niemöller described. As harsh as it may sound, the situation reminds me of something I once heard: “You can never count on people to care about the problems of others. They will, however, always deeply invest in their own.”

  2. To me, von Ahn's presentation was very amusing, and his ideas seem brilliant at first, but his calculations are highly simplified and he does not take numerous things into consideration. For instance: whether people agree to contribute to the digitalisation for free or would claim some kind of remuneration, and how that would be organized; and the fact that the source texts in Wikipedia may change at any time, so the free translations would no longer correspond to them, which would make Wikipedia even more unreliable than it already is. Also, working for free would have to be agreed to by the language learners/translators, or at least they would have to be informed about it. And who would be responsible for possible mistakes? Indeed, many questions arise about crowdsourcing in this way. At the very least, the texts would have to be proofread by a professional who is able to check consistency and grammar…

    • Wikipedia stands alone in every language in which it is available. I mean, an article in one language is not seen as a translation of the same article in another language. Yes, the source text may change, and even the translation may change (since these translated articles would still be modifiable, as that's the nature of Wikipedia), so what? There will always be someone to translate the newer version.

      As regards informing the translators that they are working for free and getting them to agree to contribute to the digitalisation, again for free: these contributors (hamsters of translation) are already aware of that. The premise of Duolingo is “learning while you work”, so you don't pay (money) to learn, but offer your work instead. Gosh, it all sounds so swell.

      And what if the translations contain errors? You already said Wikipedia is not reliable.

      • As for translating Wikipedia articles, I meant that it would be impossible to keep track of dozens or hundreds of updates in different languages and modify the translations accordingly. The translations were, by the way, the presenter's idea, not mine…
        OK, the fee part seems to be taken care of, then.
        “Translation errors” referred to the presenter's whole idea of translating the entire web, not just Wikipedia. Also, I think Wikipedia is of course meant to be a fairly reliable source, so translations done by foreign language learners would increase the probability of errors. That would be a pity.

  3. I apologise if I sounded a little “abrasive” since that was not my intention.

    I agree that it would be impossible to keep track, which is why I made the point about Wikipedia being standalone. The initial translation would make the information accessible to others, and from there it would function like any other article and be modified as such. Will the English Wikipedia always be more up to date? Perhaps, but that doesn't change the fact that much more information would be available. Moreover, every so often they could easily re-translate it using a system of exact and fuzzy matches and so forth.

    I don't really have an answer regarding editing and proofreading. I do remember von Ahn talking about more experienced translators handling more complex pieces of text. I guess the translations will go through several people before they are actually published.

    What scares me the most about all this is that it seems perfectly doable…

  4. By the way, I wrote “whether people agree to contribute to the digitalisation for free”, so this did not refer to translation by language learners but to the digitalisation done by anybody solving CAPTCHAs.

  5. Just to chip in to your lively and interesting discussion – have you seen this tool called Manypedia? It’s a lovely little application which allows you to “compare” different language versions of Wikipedia on a given theme. Here I have compared English & Finnish on crowdsourcing (!)|en|Crowd-sourcing|fi

    (You need to hover over the top of the page to see the language options and text entry box appear.)

    I think it might be useful if you wanted to pick out a few key terms in both languages, although of course, as Marcelo rightly pointed out, it does not constitute a valid parallel corpus.

  6. “In theory, I get what the creator of reCAPTCHA, Luis von Ahn, is trying to do. He claims that he was motivated to come up with this system so that millions and millions of human brain cycles were not wasted on fruitless quizzes (estimates are 100 million CAPTCHAs every day), but essentially what he did was come up with a way to exploit and profit from the masses. He claims that he has research that shows that reCAPTCHA does not take any longer than CAPTCHA, but anyone who has ever done a reCAPTCHA vs a CAPTCHA knows better. I literally just went to the reCAPTCHA website to get some screen shots for this article and this reCAPTCHA test popped up:”

