Chris Biemann is a professor at the University of Darmstadt.
Structure Discovery in Natural Language – Unsupervised, Language-Independent Methods
In this seminar, I will talk about the Structure Discovery Paradigm, a
framework for finding regularities in text of an arbitrary language and
making them explicit in the data for use in further
processing. After recapitulating work on language separation, unsupervised
part-of-speech tagging, and word sense induction, I will introduce the
concept of two-dimensional text, which can be used for semantic matching
in text similarity and word sense disambiguation.
These unsupervised, knowledge-free methods are especially valuable in
situations where no NLP components exist for the target language or
domain, but enough unlabeled data is available to induce the regularities.