The next lecture in the Nijmegen e-Humanities Lectures presented by TLA will be
given on the 5th of June by Antal van den Bosch. He will speak on Big Language Data.

Who: Antal van den Bosch, Centre for Language Studies, Radboud University Nijmegen
What: Big Language Data
Where: MPI for Psycholinguistics, room 1.63 (main lecture hall)
When: Tuesday 5th of June, 14:30


Digitized written language can be scooped up at will from the internet and exploited
for science. Even without any explicit linguistic annotation the language data
itself can be used directly for practical purposes such as spelling correction, text
completion and, if parallel text in two languages can be found, machine
translation. Zipf’s law ensures that when you have more data, results will be better
(log-linearly). In fact many of the best natural language processing systems are
based on data only, plus the power of sophisticated stochastic methods. I’ll argue
that there is a less sophisticated class of methods based on analogical reasoning
that produces the same impressive results. I’ll discuss the linguistic
interest of this idea using century-old concepts such as Hermann Paul’s
Analogiebildung and De Saussure’s quatrième proportionnelle.

