by Dieter van Uytvanck

On Friday, May 27, about 25 people gathered at the Max Planck Institute in Nijmegen to attend a workshop on the practical use of the Component Metadata Infrastructure (CMDI) for the description of language resources. CMDI is the metadata part of CLARIN, a European initiative to create a Common Language Resources and Technology Infrastructure.

After a short introduction to metadata in general and a historical sketch, the concepts behind CMDI were introduced. The core ideas behind the new metadata format are modularity, reusability, and the use of data categories. A special session was dedicated to the use of ISOcat, the reference implementation of a data category registry. The idea behind this is to have a dependable definition of what is meant by a data category such as Part of Speech. This way, it doesn’t matter what you call it or how you spell it in your particular metadata schema: the connection to similar schemata is always clear.
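
The data category idea can be illustrated with a small Python sketch. It is not taken from the workshop material and does not use the ISOcat API. The identifiers, schema names, and field names below are all hypothetical, chosen only to show how two differently named fields can be recognised as the same concept when both point to one shared definition.

```python
# Hypothetical data category identifiers (placeholders, not real ISOcat entries).
PART_OF_SPEECH = "datcat:EXAMPLE-part-of-speech"
LANGUAGE_NAME = "datcat:EXAMPLE-language-name"

# Two hypothetical metadata schemata that name their fields differently
# but link each field to a shared data category.
SCHEMA_A = {"PartOfSpeech": PART_OF_SPEECH, "Language": LANGUAGE_NAME}
SCHEMA_B = {"pos-tag": PART_OF_SPEECH, "lang": LANGUAGE_NAME}


def same_concept(schema_x, field_x, schema_y, field_y):
    """Return True if both fields are linked to the same data category."""
    category_x = schema_x.get(field_x)
    category_y = schema_y.get(field_y)
    return category_x is not None and category_x == category_y


def equivalent_fields(schema_x, schema_y):
    """Pair up fields from two schemata that share a data category."""
    return sorted(
        (field_x, field_y)
        for field_x, category_x in schema_x.items()
        for field_y, category_y in schema_y.items()
        if category_x == category_y
    )
```

In this sketch, `same_concept(SCHEMA_A, "PartOfSpeech", SCHEMA_B, "pos-tag")` holds even though the two field names differ, because the comparison is made on the shared identifier rather than on the spelling.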

After these more general introductions, the specific CMDI software was presented.

First, the Component Registry was shown. It is a web application that can be used for inspecting, searching, creating and editing CMDI metadata components. Afterwards, it was illustrated how to create CMDI metadata files using a version of Arbil that has been modified to interact directly with the Component Registry. Both Arbil and the Component Registry are developed by the Max Planck Institute for Psycholinguistics and were presented by their respective developers. Although both applications are still under development, it was clear that they can already be used to produce CMDI metadata.

All slides of the presentations can be downloaded from the CLARIN NL website.

More information about CMDI, including links to the software so you can try it out yourself, can be found on the main CLARIN site.