Toolbox --> ELAN import, data structure

I’m about to start a large fieldwork project, and I’m trying to plan my workflow. I want to use Toolbox for transcription and interlinearization, and then import the results into ELAN.

I’ve done a few tests, and I’ve run into two problems:

  • As I understand the default Toolbox data structure for texts, each new text is an \id, and each sentence/utterance in the text is a \ref. This doesn’t work well for importing into ELAN. What happens at the moment is that each \id produces one annotation, and all of the \refs for that \id are concatenated together on a dependent tier. I want to be able to import just the text for a particular video file, with each \ref corresponding to a separate annotation in ELAN. What would be a better way to structure my Toolbox project for importing into ELAN? And what changes will I need to make to the import process?

  • I need to include participant information in my transcription. I tried including this in Toolbox, using \ELANParticipant markers, but it didn’t behave as expected: the participant names were concatenated, and each tier was given the @BobHarry suffix. I want Bob and Harry to have their own tiers (with associated tiers, like gloss and free translation). Again, I think this is a problem with data structure.

One possibility I have considered is to make each new sentence an \id. This is undesirable because I want to be able to view and edit the whole text at once, and not have to flick through records while reading and editing.

Many thanks for any suggestions! And apologies if I am missing something basic; I am new to both pieces of software.

Gus

Concerning your first problem:
It would be best if you could make the \ref line the Record Marker in Toolbox. I don’t know whether you have to get rid of the \id marker completely or whether there are other ways to achieve that (I’m not a Toolbox expert). Alternatively, you could leave the Toolbox structure as it is and import it with a so-called Marker file. The import into ELAN then lets you specify the structure you want to have in ELAN, e.g. \ref as the record marker. You can store the setup in a .mkr file for successive imports. This is less straightforward than importing with a .typ file; please look at the Setting Field Markers paragraph of this section of the manual.
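
To make the difference between the two record markers concrete, here is a minimal Python sketch. It is not ELAN’s import code, only an illustration of how the same Toolbox-style lines group into records depending on which marker starts a record. The function name `split_records`, the sample text, and the \tx and \ft markers are assumptions made up for this example.

```python
def split_records(text, record_marker):
    """Group Toolbox-style lines into records that start at record_marker.

    Returns (header, records). Fields that occur before the first record
    marker go into header. Each field is a (marker, value) tuple.
    """
    header, records = [], []
    current = header
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        marker, _, value = line.partition(" ")
        if not marker.startswith("\\"):
            # Continuation line: append it to the previous field, if any.
            if current:
                prev_marker, prev_value = current[-1]
                current[-1] = (prev_marker, (prev_value + " " + line).strip())
            continue
        if marker == "\\" + record_marker:
            current = []  # a new record begins here
            records.append(current)
        current.append((marker[1:], value.strip()))
    return header, records


# Hypothetical sample text: one \id with two \ref units.
SAMPLE = r"""\id Story1
\ref Story1.001
\tx first sentence
\ft First free translation
\ref Story1.002
\tx second sentence
\ft Second free translation
"""

header_by_id, records_by_id = split_records(SAMPLE, "id")
header_by_ref, records_by_ref = split_records(SAMPLE, "ref")
print(len(records_by_id))   # one record holding the whole text
print(len(records_by_ref))  # one record per \ref
```

With \id as the record marker, the whole text is a single record; with \ref, each sentence is its own record and the \id line is left over as a header field.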

The second problem I cannot explain, assuming you have only one \ELANParticipant marker per record.
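
If you want to check that assumption on your own file, something like the following Python sketch would list the \ELANParticipant values found in each record. It is only a diagnostic under stated assumptions: the function name `participants_per_record` and the sample text are hypothetical, and it says nothing about how ELAN itself handles the markers.

```python
def participants_per_record(text, record_marker,
                            participant_marker="ELANParticipant"):
    """Return a list of (record value, [participant values]) pairs.

    A record starts at every line that begins with the given record marker.
    """
    result = []
    for raw_line in text.splitlines():
        marker, _, value = raw_line.strip().partition(" ")
        if marker == "\\" + record_marker:
            result.append((value.strip(), []))
        elif marker == "\\" + participant_marker and result:
            result[-1][1].append(value.strip())
    return result


# Hypothetical sample: two speakers inside one \id text.
SAMPLE = r"""\id Story1
\ref Story1.001
\ELANParticipant Bob
\tx first sentence
\ref Story1.002
\ELANParticipant Harry
\tx second sentence
"""

by_id = participants_per_record(SAMPLE, "id")
by_ref = participants_per_record(SAMPLE, "ref")
for name, participants in by_id + by_ref:
    if len(participants) > 1:
        print(name, "has more than one participant marker:", participants)
```

In this made-up sample, counting by \id puts both Bob and Harry in the same record, while counting by \ref gives one participant per record. Whether that is what happened in your import is a guess that would need checking against your actual file.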

The best approach might be to set up the structure in Toolbox as completely as possible but without ELAN-specific markers, then create one or two records in Toolbox, import the file into ELAN, perhaps change ref@unknown to ref@Bob, export the file to Toolbox format (making sure that \ref is the record marker), and open it in Toolbox. That should add the special ELAN markers to the Toolbox Database Type. Then try with a few more records, with participant names, and see whether an import-export-import round trip works well.

-Han

Thanks for your help! Unfortunately I’m even more lost than I was before. When I try to create a .mkr file, I get ‘unknown error’ or NullPointerExceptions. Is there any other documentation for using .mkr files to import, other than what is contained in the link below?

http://www.mpi.nl/corpus/html/elan/ch01s04s02.html

I’m afraid that’s all the documentation there is. It more or less assumes that the user knows the structure of her/his Toolbox project and already knows how that structure should translate to ELAN tiers (including what (stereo)type each tier should have). The other paragraphs in the manual concerning Toolbox import and export might be useful for this, but it will still be difficult for anyone who is not very familiar with Toolbox and/or ELAN.
I could try to advise further on the basis of details from the NullPointerExceptions etc., but it will probably be more productive if you send me your Toolbox file and the project’s database type file (so a .txt or .tbt and a .typ file), and maybe also your .mkr file, so that I can advise on how to get an import/export round trip working (han.sloetjes AT mpi.nl).