CLARIN – or the Common Language Resources and Technology Infrastructure – is a digital infrastructure that provides access to a broad range of language data and tools to support research in the humanities, the social sciences and beyond. It offers multimodal digital language data (text, audio, video) together with advanced tools for exploring, analysing or combining these datasets.
CLARIN includes legal corpora, which contain legislation, legal acts, transcriptions of court decisions, and other kinds of material related to national or supranational law. There is also a resource family of parliamentary corpora.
Many languages can be found in the infrastructure: Bosnian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Modern Greek (1453-), Norwegian, Polish, Portuguese, Russian, Serbian, Slovenian, Spanish, Swedish, Turkish, Ukrainian… (I probably forgot some!).
This short video presents CLARIN really well.
Most of the corpora are richly annotated, both linguistically and in other ways (for example, with speaker roles such as judge, defendant or prosecutor in the case of courtroom proceedings).
Such corpora are an important resource for anyone who practises or researches law or political science, to name but two fields. They can be used to investigate issues such as legal phraseology and terminology, variation in legal discourse, legal translation, register and genre perspectives on legal discourse, legal discourse in forensic contexts, and evaluative language in judicial settings (see for example Goźdź-Roszkowski 2021).