Since May 2009, MPI has been an official partner in CLARIN, a large-scale pan-European project. The initiative aims to make existing language resources and technologies available to the European Humanities and Social Sciences communities as a whole. For more details, see the project's web portal: http://www.clarin.eu/

CLARIN comprises a variety of subprojects, including the Virtual Language Observatory (VLO) and Federated Content Search (FCS). The Language Archive at MPI participates in both as a member of CLARIN-Germany (see http://de.clarin.eu/).

Federated Content Search is a service that allows a scholar to search text resources held at several research institutions simultaneously. More precisely, every participating institution, called an “endpoint”, makes a number of corpora with text data (e.g. text annotations) available for FCS. A simple example is a request to find occurrences of a certain word in the available FCS resources. Other, more complex requests are defined by the SRU/CQL standard (see http://www.loc.gov/standards/sru/specs/cql.html).
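To make the simple word search concrete, here is a minimal Python sketch of how a client might phrase it as an SRU searchRetrieve request carrying a one-term CQL query. The endpoint URL and the helper names `cql_term` and `search_retrieve_url` are hypothetical; the parameters (`operation`, `version`, `query`, `maximumRecords`) follow SRU 1.2, and real FCS endpoints may expect additional ones.

```python
from urllib.parse import urlencode


def cql_term(word: str) -> str:
    """Quote a search word as a CQL string, escaping backslashes and double quotes."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_retrieve_url(endpoint: str, word: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL for a single-word CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_term(word),
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
print(search_retrieve_url("https://fcs.example.org/sru", "Haus"))
# https://fcs.example.org/sru?operation=searchRetrieve&version=1.2&query=%22Haus%22&maximumRecords=10
```

Quoting every word, even a plain one, keeps the query valid when a term contains spaces or characters that CQL would otherwise treat as syntax.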

The central Aggregator, located at the University of Tübingen, dispatches each request to all endpoints. Every endpoint runs its own software to handle requests. The services offered by different institutions may differ slightly; however, all endpoints must handle requests that follow (a subset of) the SRU/CQL standard. Moreover, the Institut für Deutsche Sprache has developed a dedicated Java library that provides common interfaces and methods for all endpoints, and endpoints are encouraged to base their FCS software on it.
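The dispatch step can be pictured as a simple fan-out: the same query goes to every endpoint concurrently, and each endpoint's SRU 1.2 response is read for its hit count. This is a hypothetical Python sketch, not the code of the Tübingen Aggregator; the endpoints are stub functions returning canned XML so that it runs without a network, and `aggregate`, `hit_count`, `fake_endpoint` and `broken_endpoint` are illustrative names.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Namespace of SRU 1.2 response elements.
SRU_NS = {"sru": "http://www.loc.gov/zing/srw/"}

# Hypothetical fetcher type: takes a CQL query, returns the endpoint's raw XML.
Fetcher = Callable[[str], str]


def hit_count(response_xml: str) -> int:
    """Read <sru:numberOfRecords> from an SRU 1.2 searchRetrieveResponse."""
    root = ET.fromstring(response_xml)
    node = root.find("sru:numberOfRecords", SRU_NS)
    return int(node.text) if node is not None and node.text else 0


def aggregate(query: str, endpoints: Dict[str, Fetcher]) -> Dict[str, Optional[int]]:
    """Send one query to all endpoints in parallel; None marks a failed endpoint."""
    results: Dict[str, Optional[int]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {name: pool.submit(fetch, query) for name, fetch in endpoints.items()}
        for name, future in futures.items():
            try:
                results[name] = hit_count(future.result())
            except Exception:
                # One unreachable or misbehaving endpoint must not spoil the whole search.
                results[name] = None
    return results


def fake_endpoint(n: int) -> Fetcher:
    """Stand-in for a real endpoint that always reports n hits."""
    def fetch(query: str) -> str:
        return (
            '<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">'
            "<sru:version>1.2</sru:version>"
            f"<sru:numberOfRecords>{n}</sru:numberOfRecords>"
            "</sru:searchRetrieveResponse>"
        )
    return fetch


def broken_endpoint(query: str) -> str:
    """Stand-in for an endpoint that is down."""
    raise ConnectionError("endpoint unreachable")


print(aggregate('"Haus"', {"A": fake_endpoint(3), "B": fake_endpoint(5), "C": broken_endpoint}))
# {'A': 3, 'B': 5, 'C': None}
```

Reporting a failed endpoint as `None` instead of raising reflects the federated setting: since each institution runs its own software, the aggregated answer should degrade gracefully when one of them is unavailable.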

The Language Archive’s “CQLSearch” Java software uses this library. For the core functionality, namely search, we deploy our own library for searching through annotations. This library is already in use in the TROVA annotation search interface (http://corpus1.mpi.nl/ds/trova/search.jsp) and in ELAN's annotation search (http://www.mpi.nl/corpus/html/elan/ch07s05.html). In fact, it appears to offer more search functionality than FCS is likely to require.

Federated Content Search is still under development. Complex requests, such as those defined by logical formulae with AND and OR connectives, have yet to be implemented. Both the search library and the interface library are updated regularly, and so is the CQLSearch software that connects them.
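For reference, boolean requests of this kind look as follows in CQL syntax. The Python sketch below builds them as a small expression tree; the classes `Term` and `Bool` and the helpers `all_of` and `any_of` are hypothetical and not part of the IDS library or CQLSearch. Because CQL gives `and` and `or` equal precedence and evaluates them left to right, nested sub-queries are wrapped in parentheses.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A single search word, rendered as a quoted CQL string."""
    word: str

    def to_cql(self) -> str:
        escaped = self.word.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


@dataclass(frozen=True)
class Bool:
    """A boolean combination ("and" / "or") of sub-queries."""
    op: str
    operands: Tuple[Union[Term, "Bool"], ...]

    def to_cql(self) -> str:
        if self.op not in ("and", "or"):
            raise ValueError(f"unsupported boolean operator: {self.op}")
        if not self.operands:
            raise ValueError("a boolean query needs at least one operand")
        # CQL has no precedence between and/or, so nested groups get parentheses.
        parts = [
            f"({q.to_cql()})" if isinstance(q, Bool) else q.to_cql()
            for q in self.operands
        ]
        return f" {self.op} ".join(parts)


def all_of(*queries: Union[Term, Bool]) -> Bool:
    return Bool("and", queries)


def any_of(*queries: Union[Term, Bool]) -> Bool:
    return Bool("or", queries)


query = all_of(Term("Haus"), any_of(Term("Garten"), Term("Hof")))
print(query.to_cql())  # "Haus" and ("Garten" or "Hof")
```

Without the parentheses, `"Haus" and "Garten" or "Hof"` would be read left to right as `("Haus" and "Garten") or "Hof"`, which is a different query.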

Ultimately, the aggregated FCS is meant to serve as “an appetizer for scholars”. It does not provide all the advanced functionality of a specialized tool. Instead, it should give the scholar a general picture of where (s)he can find terms of interest. This allows the scholar to zoom in during the next steps using the specialized tools of the FCS participants.
