by Herman Stehouwer
Is there a need to limit certain aspects of statistical language models?
Is it necessary to pre-limit the size of the n-gram?
Is it useful to add linguistic annotation in alternative sequence selection tasks?
According to a new study by Herman Stehouwer, the size of the n-gram can be completely flexible, depending on the situation. The study also finds that adding certain linguistic annotations, specifically part-of-speech annotations and dependency parses, did not help the model make its decisions.
The study compares the ability of a language model to select the correct alternative from sets of alternatives in hundreds of experiments. These experiments were performed for three different alternative sequence selection tasks, for four different annotations (and also for no annotation), and for four different ways to combine the annotation with the text. The results of the study form the basis of the thesis “Statistical Language Models for Alternative Sequence Selection”. This thesis will be defended on the 7th of December at 18:00 in the Aula of Tilburg University.
To coincide with the defense, a colloquium on language modeling is being organized, with invited talks by Colin de la Higuera, Louis ten Bosch, and Antal van den Bosch. For more information on the colloquium, you can send an e-mail to herman.stehouwer [at] mpi.nl or look at its website.