History of the TLA

Since the foundation of the MPI for Psycholinguistics in the 1970s, its Technical Group (TG) has been concerned with providing a state-of-the-art infrastructure for research. Earlier than many other institutions in the humanities, the MPI-PL started to engage in computer-based research, supported by the TG. For instance, reaction times in psychological experiments were measured, which required high precision. Today, advanced eye-tracking devices, virtual reality labs and even devices producing neurological images are part of the institute’s infrastructure.

As early as the beginning of the 1990s, the TG under its head Peter Wittenburg realized the need to provide lasting access to research data, for later verification or even for re-use in other research projects. For instance, the many hours of videotaped interactions in first and second language acquisition settings are a valuable data source to which researchers can apply multiple methodologies suited to answering diverse scientific questions. In addition, members of the Language and Cognition group regularly returned from field trips to remote areas of the world with valuable recordings documenting unique and often hitherto unknown cultural and linguistic features.

Given the perishable nature of analogue and digital carriers, it became clear that only a centralized digital data repository would make it possible to apply the measures needed to guarantee the long-term availability of the research data. One requirement for long-term preservation is good metadata, which is one reason why the TG participated in the ISLE project, which started in 2000.

These activities of the TG gained considerable momentum in 1999 with the first call for projects aiming at documenting endangered languages, the DOBES programme of the German Volkswagenstiftung. With its background in building digital data repositories and its experience with valuable data from field research, the TG was in a perfect position to serve as the central archive of the DOBES initiative, which would over the years accumulate terabytes of priceless data on more than 60 languages worldwide. In fact, the TG has been much more than that: it has been the technical centre of the DOBES programme, and it started to push standardization and to develop the software needed for field linguistics, as well as metadata-related tools and web-based services that allow users to build and manage their data collections.

The result of these ongoing activities is the “Language Archiving Technology” (LAT) suite of tools and web applications and services. Among other tools, it comprises ELAN, a multimedia annotation tool that is now widely used not only in field research but also in gesture and sign language studies and beyond. The development and maintenance of LAT was financed by a growing number of external research projects, DOBES being one important source among others. As more and better tools and infrastructure components were developed, the group grew, and with its increased expertise the TG was in turn able to attract more challenging projects and to extend its activities to an international scale, a self-reinforcing process. The archive also began to attract datasets originating from other contexts, such as the Corpus of Spoken Dutch and the complete recordings made by the German ethnologist Eibl-Eibesfeldt.

However, with the imminent end of the DOBES programme (for more than a decade a very reliable and stable source of funding), it became clear that the archive, the tool development and the expertise achieved by the TG could not rely entirely on short-term project funding (although this will continue to play an important role). A more sustainable solution was needed to guarantee the continuation of this data archive and the related work, both of which have become important for a large community in the emerging “digital humanities”, far beyond the institute itself. As for the pure bit-stream preservation of research data, in 2005 the Max-Planck-Gesellschaft had given a long-term guarantee for replicas stored at the computer centers: over 50 years, much more than most comparable institutions have. Still, long-term availability requires much more, at the very least the continued administration of the archive and the maintenance and further development of the core tools needed for its functioning.

Fortunately, three major funding bodies (BBAW, KNAW and MPG) recognized the importance of the language archive and related activities. In an exemplary international cooperation, they agreed to give it a more secure basis by providing funds to employ about seven core members. Thus, in 2011, the Technical Group split into a core group (which continues to provide the basic IT and research infrastructure and to support the specific research activities at the MPI) and a new unit called “The Language Archive” (TLA).

A flyer was created in 2011, which can be viewed here: MPI-DOBES-4-Seiter-2011-10-04.pdf (3.8 MB)