by Przemek Lenkiewicz

The AVATecH project is a joint initiative of the Max Planck Gesellschaft and the Fraunhofer Gesellschaft. It aims to develop solutions for the automated annotation of media recorded by linguistic researchers, a capability that is in high demand, so expectations for the project are high.

The project has recently passed two important milestones. The first was the AVATecH Expert Workshop, held in November. For two days the project participants interacted with each other and with potential users of their solutions, presenting the current status of the development and integration of their work and gathering feedback and suggestions from linguists. Experts from several other fields (audio/video processing, gesture and sign language research, field research) also attended to see the state of the work and to get an idea of what may soon be available for their purposes; they contributed numerous valuable comments.

After the status of the work had been presented and suggestions gathered, the project participants continued working on their solutions and reached the second milestone: delivering the first automated annotation functionality for the ELAN tool and making it available to Max Planck researchers. This functionality initially covers the following:

  • The audio part provides functionality needed in a large share of annotation work: detecting how many people speak in an audio recording and creating the appropriate number of tiers; detecting who speaks when and creating annotations at the corresponding parts of the recording; and aligning the recording with a transcription from a text file.
  • The video part provides the following functionality: detecting shots and subshots in the recording; creating representative keyframes for the detected shots and subshots; estimating the color ranges that represent human skin in the recording; and tracking the position of the speaker’s hands and head. Further functionality will be built on top of this last recognizer: the positions of the hands and head, combined with timing information, will serve to estimate the speed of hand movements, their relation to each other and to the speaker’s body, and so on.

The MPI team is currently working on integrating these features with ELAN and providing manuals for researchers on how to use them.