by Peter Wittenburg & Wolfgang Klein

The digital era has fundamentally changed several characteristics of data management. Highly persistent carriers such as clay tablets, and even some papyrus rolls, have survived for thousands of years and still contain the information their creators wanted to convey. With the analogue electromagnetic storage media introduced during the last century, it became obvious that the lifetime of carriers is very limited and that every copy entails a loss of quality. As a consequence, according to a UNESCO survey, about 80% of the material on cultures and languages in the ethno-linguistic domain is highly endangered. It was good practice to store master tapes in air-conditioned Faraday cages; for most recordings, however, this was impossible. Apart from the cost, the implicit “don’t touch” policy also created a logistical problem, since the old players were no longer available after a few years.

The digital era in turn changed the challenges again: (a) copying is comparatively easy and, if done carefully, does not degrade quality, and (b) the stored material must, as a matter of principle, be touched regularly in order to migrate carriers, formats and encodings and so maintain interpretability. Digital holdings are inherently dynamic and need a two-tier framework for life-cycle management: (1) data centres that take care of bit-stream preservation and (2) community centres that know about format and encoding principles. The worldwide debate about the loss of our scientific and cultural memory gives an impression of the urgency of the life-cycle management problem.

Against this background, the Max Planck Society and the MPI for Psycholinguistics have established a new unit named “The Language Archive (TLA)” to take care of the long-term preservation of the treasure held in its large digital archive, which has been created in a wide range of initiatives and sub-disciplines. Prominent examples are the language study resources created by MPI researchers, the archive of endangered languages created by the DOBES programme, and the digital human-ethological archive of Eibl-Eibesfeldt. A plan has been submitted for such a unit to persist for 25 years, offering the necessary services of a digital archive such as deposit, access, searching, visualization and preservation and, beyond these, safeguarding a number of critical characteristics such as integrity, authenticity, usability, discoverability and interoperability. TLA will carry out this task in collaboration with the two large computer centres of the Max Planck Society, which will focus on bit-stream preservation and, in future, also on giving access to the material following the agreed principles.

To fulfil its mission, TLA will have three groups of experts: archiving experts who know about the metadata, formats, standards and encoding principles used in our domain and who can deploy curation strategies; software experts who can maintain the existing code base and develop new functionality; and system experts who will interact with the storage system managers of the MPI and the computer centres to take care of bit-stream preservation and proper security. We also see digital archiving, with its many facets, as a networking task: we will participate in relevant collaborations in order to apply state-of-the-art methods. One such network is the worldwide network of regional centres for language material, which will be supported in the future.

Thanks to the proven Language Archiving Technology (LAT) software suite, which has been developed over the last decades, the archive can be open to any serious language and cultural material that is relevant to researchers. Based on open legal and ethical rules, material can be deposited and accessed via the web using a variety of tools. The archive will continue to participate in national and European projects in order to maintain the existing software, provide new advanced functionality, and establish professional research infrastructures that improve data life-cycle management and access to language and cultural material.

TLA will formally start operation on 1 September 2010, led by Wolfgang Klein and Peter Wittenburg.