This post, part of my mini-series ‘What exactly is…’, gives an overview of Corpus Linguistics and will hopefully pique your interest to find out more.
First of all, a definition: a corpus is a collection of texts, often used to study language. These days, corpora are generally held electronically, which makes access much faster and analysis more powerful.
I love this resource, made available by Leipzig University’s Department of Computer Science, because the results are immediate. At present, 158 languages or sub-languages are included. The texts making up the databases are general, not specific to law.
When you enter a word, you are presented with significant co-occurrences, as well as left and right neighbours of the word, with their frequencies, and two graphical presentations – a kind of spider’s web showing related words that can be clicked on and explored.
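For the curious, here is a minimal Python sketch of what "left and right neighbours with their frequencies" means in practice. It is only an illustration: the four sentences are a made-up toy corpus, the function names `tokenize` and `neighbours` are my own, and the counts are raw frequencies rather than the statistical significance scores a real corpus tool would presumably use.

```python
import re
from collections import Counter

# Toy corpus: invented sentences for illustration, not real corpus data.
CORPUS = [
    "The court dismissed the appeal.",
    "The court allowed the appeal.",
    "The appeal court upheld the ruling.",
    "The supreme court dismissed the claim.",
]


def tokenize(sentence):
    """Lower-case a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def neighbours(word, sentences):
    """Count left neighbours, right neighbours and same-sentence
    co-occurrences of `word` (raw counts, no significance test)."""
    left, right, cooc = Counter(), Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if word not in tokens:
            continue
        # Each other word counts once per sentence it shares with `word`.
        cooc.update(t for t in set(tokens) if t != word)
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            if i > 0:
                left[tokens[i - 1]] += 1   # word immediately before
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1  # word immediately after
    return left, right, cooc


left, right, cooc = neighbours("court", CORPUS)
print("left: ", left.most_common(3))
print("right:", right.most_common(3))
print("co-occurring:", cooc.most_common(3))
```

On a real corpus of millions of sentences, the same idea is what lets you see at a glance which words typically come before and after the word you searched for.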
Try it out and let me know what you think!
A few weeks ago, I told you about a dictionary on steroids (see here). Today’s post is about a multilingual semantic map and thesaurus on steroids! It’s called the Sketch Engine. At present it can be used in 42 languages.
The Sketch Engine is an awesome tool. It is extremely useful for anyone who works with words and needs ideas.
Here are just a few examples of how it can be used:
– To create brand names (see ‘The Most Powerful Naming Tool I’ve Ever Used’)
– To help translators looking for collocations (the words that ‘sound right’ together)
– To give inspiration to lawyers when wording their pleadings
– To help academics when writing papers or theses
– To help journalists and authors get around ‘writer’s block’
– To help non-native speakers of a language check which words are used together and how