In this project, the TLA group collaborates with the Fraunhofer institutes IAIS and HHI and with language and gesture researchers at the University of Cologne and the German Sport University Cologne. The work focuses on developing tools for analyzing speech and gesture data in the context of research on multi-verb phrases.

The project is coordinated by the University of Cologne. The Cologne group is responsible for providing the project partners with field recordings and the accompanying transcriptions and annotations collected within several DoBeS language-documentation projects. On the basis of these multimodal corpora, the team also defines exemplary search patterns and intended results, which guide the technical partners in developing audio (and video) search algorithms. The team then tests and evaluates the quality, effectiveness, and usability of the implemented search algorithms.

The research topic, which provides the concrete research context for the tool development, pertains to the notion of ‘event’ and is especially concerned with the boundaries of events and their linguistic manifestations. Here we are looking for correlations between morpho-syntactic construction units (such as serial verb constructions) and prosodic and gestural units from a cross-linguistic perspective.

The linguistics department at the University of Cologne has a strong tradition in research on linguistic typology and language universals as well as in the documentation of lesser-known languages. Areas of specialization of the team members include Austronesian and Papuan languages, the analysis of prosody and intonation, and the analysis of complex verbal constructions, such as serial-verb constructions.

NEUROGES-Elan has been applied successfully to several data sets by researchers at the Department of Neurology, Psychosomatic Medicine and Psychiatry at the German Sport University Cologne in order to investigate cognitive, emotional, and interactional correlates of hand movement behaviour. Analysis according to NEUROGES-Elan relies on kinesic features of the hand movements, and its categories are based on underlying neuropsychological processes. Consequently, it can provide information on co-speech gestures without linguistic or cultural bias, which also makes it a good basis for automated hand movement detection and segmentation.
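The text above does not say how automated hand movement detection would be implemented. As a minimal sketch, assuming a tracker already delivers one (x, y) hand position in pixels per video frame, movement units can be separated from rest periods by thresholding hand speed. The function name `segment_movements`, the speed threshold, and the minimum run length are hypothetical illustration values, not part of NEUROGES-Elan or of the project's tools.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_movements(
    positions: List[Point],
    fps: float,
    speed_threshold: float,
    min_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs (end exclusive) of movement units.

    Hypothetical sketch: a frame counts as 'moving' if the hand speed from the
    previous frame, in pixels per second, exceeds speed_threshold. Runs of
    moving frames shorter than min_frames are discarded as tracking jitter.
    """
    moving = [False] * len(positions)
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) * fps  # pixels per second
        moving[i] = speed > speed_threshold

    segments: List[Tuple[int, int]] = []
    start = None
    # The trailing False acts as a sentinel that closes a run ending at the last frame.
    for i, is_moving in enumerate(moving + [False]):
        if is_moving and start is None:
            start = i
        elif not is_moving and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    return segments
```

For a hand that rests for five frames, moves 10 pixels per frame for five frames, and rests again, `segment_movements(positions, fps=25.0, speed_threshold=100.0)` yields one unit covering frames 5 to 9. Multiplying frame indices by `1000 / fps` gives millisecond times, the unit ELAN uses for annotation boundaries.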

The Language Archive team of the Max Planck Institute for Psycholinguistics will bring its expertise in audio and video archiving and in developing tools for researchers in the humanities. The ELAN annotation tool will serve as the platform for executing any algorithms developed in this collaboration, as well as the user interface for viewing and using their results. Drawing on the experience gained in the AVATecH project, we will also work on the usability of the developed tools, that is, on providing the right level of functionality and all the necessary technical back end, such as network infrastructure and computing power.

Fraunhofer IAIS brings its expertise in audio and video analysis. Based on experience gained in AVATecH, we will focus on tools that help the linguists in the project answer their research questions much more efficiently. The query-by-example technology will be adapted to handle linguistic examples given as recordings of syllable sequences and intonation sequences, so that large corpora of speech data become searchable by linguistically meaningful queries.

Language-independent alignment aims at aligning the words of a given text with an audio recording of the same text; that is, this technology will help to determine which word was spoken at which time. The main difficulty is that for many languages there is too little data to train a statistical model, such as the one used by a speech recognizer. Therefore, we are developing methods that are language independent: they do not use any knowledge about a specific language but instead try to match the structure of the text to that of the audio.

Speech alignment example

The annotation approaches based on video analysis aim at speeding up the analysis of relations between speech and human motion, i.e. gestures. Because manual annotation is so time-consuming, many large databases have not been analyzed properly, and plenty of material has not been processed at all; automatic analysis makes it feasible to process such databases. So far, automatic video analysis methods have not been applied in psycholinguistics to a significant extent. The Fraunhofer Heinrich Hertz Institute (HHI) is responsible for further developing such techniques and for providing the necessary interfaces and output to the humanities researchers. Based on technologies developed in AVATecH, the video analysis will be extended further towards specific research questions raised by our partners in humanities research. The figure below shows current hand and head tracking results. The red rectangles visualize the gesture space according to the definition by McNeill (1992).

Video analysis example
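McNeill's (1992) scheme divides the space in front of a speaker into concentric zones: center-center, center, periphery, and extreme periphery. As a rough sketch of how tracked hand positions could be mapped onto such zones, the code below nests rectangles around a torso bounding box. The `Box` type, the function `gesture_space_region`, and the scale factors in `REGIONS` are hypothetical illustration values, not McNeill's exact geometry or HHI's definition, and the directional subdivision of the periphery (upper, lower, left, right) is omitted.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in image coordinates (pixels)."""
    cx: float
    cy: float
    half_w: float
    half_h: float

    def contains(self, x: float, y: float) -> bool:
        return abs(x - self.cx) <= self.half_w and abs(y - self.cy) <= self.half_h

    def scaled(self, factor: float) -> "Box":
        # Same center, width and height multiplied by factor.
        return Box(self.cx, self.cy, self.half_w * factor, self.half_h * factor)


# Hypothetical nesting factors relative to the torso box, innermost first.
REGIONS = [("center-center", 0.5), ("center", 1.0), ("periphery", 1.6)]


def gesture_space_region(torso: Box, x: float, y: float) -> str:
    """Return the innermost zone containing the hand position (x, y)."""
    for name, factor in REGIONS:
        if torso.scaled(factor).contains(x, y):
            return name
    return "extreme periphery"
```

Applied frame by frame to the output of a hand tracker, such a mapping would turn raw coordinates into zone labels that could be stored as annotations on an ELAN tier.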

The AUVIS infrastructure is compatible with technologies developed in the AVATecH project. You can download configuration files for the ELAN annotation software to connect to our server, or request a dongle to run the software locally. AUVIS uses an updated version of the AVATecH protocol; developers of AUVIS-compatible software should use the updated AVATecH Component Interface Specification Manual as their technical reference.

For users, the HHI video recognizer documentation will be of interest. Note that for using the pre-configured webservice, it is not necessary to install the recognizer itself on your computer, so users with good internet connectivity can ignore that section.

The project is funded by the German Federal Ministry of Education and Research and will run for three years (July 2012 – June 2015).

