
creating multiple .eaf clips with custom begin and end times

March 11, 2019 at 20:12 #12799

elsb

Hello,

I’m trying to create multiple clips with custom begin and end times. I have a whole bunch of .eaf files, each of which contains a particular word of interest. These are all long videos (e.g. an hour each), and what I’d like to end up with is a set of short clips containing only the (approximately) 30 seconds surrounding the word of interest (i.e. a span starting 15 seconds prior to the token and ending 15 s after). I’ve approached this so far in two ways but have encountered problems with both.
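The window arithmetic described above (15 s before the token, 15 s after) can be sketched as follows. This is a minimal sketch, assuming all times are integers in milliseconds, which is the unit used in the pympi call later in this thread. `clip_window` and its parameters are hypothetical names, and clamping at 0 and at the media duration is an assumption about how tokens near the start or end of a recording should be handled.

```python
def clip_window(token_begin_ms, token_end_ms, pad_ms=15_000, media_duration_ms=None):
    """Return (begin, end) of a clip padded by pad_ms on each side of a token.

    Hypothetical helper: all times are integer milliseconds.
    """
    if token_end_ms < token_begin_ms:
        raise ValueError("token ends before it begins")
    begin = max(0, token_begin_ms - pad_ms)  # never start before the media does
    end = token_end_ms + pad_ms
    if media_duration_ms is not None:
        end = min(end, media_duration_ms)  # never run past the end of the media
    return begin, end


if __name__ == "__main__":
    # A token from 1:00:42.760 to 1:00:43.260, padded by 15 s on each side.
    print(clip_window(3_642_760, 3_643_260))  # (3627760, 3658260)
```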

1) “export multiple files as” + “create multiple media clips”

Using the instructions at https://www.mpi.nl/corpus/html/elan/ch01s09s02.html, I created a .csv file listing all the annotations in all the videos. This seems to work fine. I then opened “create multiple media clips” and loaded the .csv file. A process report is generated, but the clipped files are nowhere to be seen! That is the first problem. However, I’m not sure the “export multiple files as” + “create multiple media clips” method can actually solve my problem, because I ultimately need clips longer than a single annotation. (My plan was to edit the .csv file to reflect my desired durations and begin and end times, and then re-run “create multiple media clips”, but I’m not sure this is even possible.)

2) pympi package using “extract” function:

I’ve tried to create my own script in Python (to be specific, I had programmer friends write it) using the pympi package http://dopefishh.github.io/pympi/Elan.html. This script reads a .csv file containing the file paths and the desired begin and end times of the clips. However, there seem to be compatibility issues, as I use the most up-to-date version of ELAN and the package is for ELAN 2. I haven’t been able to find another Python package with similar functionality.
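For reference, the input side of a script like this might look as follows. This is a sketch under stated assumptions: the column names `path`, `begin_ms` and `end_ms` are hypothetical (the thread does not show the actual .csv layout), and the per-file extraction step is deliberately left out, since that is the part that failed with pympi.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class ClipRequest(NamedTuple):
    path: str
    begin_ms: int
    end_ms: int


def read_clip_requests(stream: TextIO) -> Iterator[ClipRequest]:
    """Yield one ClipRequest per row of a CSV with hypothetical columns path, begin_ms, end_ms."""
    reader = csv.DictReader(stream)
    for row in reader:
        begin, end = int(row["begin_ms"]), int(row["end_ms"])
        if end <= begin:
            # reader.line_num is the line of the input just read, useful for error messages
            raise ValueError(f"line {reader.line_num}: end_ms must be greater than begin_ms")
        yield ClipRequest(row["path"], begin, end)


if __name__ == "__main__":
    sample = io.StringIO("path,begin_ms,end_ms\nsession01.eaf,3627760,3658260\n")
    for request in read_clip_requests(sample):
        print(request)
```

Validating each row before any file is touched means a typo in the spreadsheet fails fast instead of halfway through a long batch.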

If you have any advice on how to solve the problem, using either of these possible solutions or in any other way, I would appreciate it very much! I can clip the files by hand but I think it will take 40 hours…

Thank you!

Emily

March 11, 2019 at 21:10 #12800

elsb

UPDATE:
Re: (1) above. The clipped files were not saved because I was loading a .csv file rather than a .txt file; using a .txt file solved the problem. I then edited the .txt file to reflect my desired times and was able to use “create multiple media clips” to create a set of clips of the desired length. However, what I *really* need is a set of .eaf files (I need the existing annotations), not a set of media files. So, progress, but unfortunately I’m still stuck!
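When widening the times in an exported .txt file, it helps to convert between hh:mm:ss.mmm notation and milliseconds. A minimal sketch, assuming the file was exported with times in hh:mm:ss.mmm format (depending on the export options, the times may instead be in seconds or milliseconds, in which case the conversion is simpler); `hms_to_ms` and `ms_to_hms` are hypothetical names.

```python
import re

# hh:mm:ss.mmm, e.g. 01:00:42.760
_HMS = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{3})$")


def hms_to_ms(stamp: str) -> int:
    """Convert 'hh:mm:ss.mmm' to integer milliseconds."""
    match = _HMS.match(stamp.strip())
    if match is None:
        raise ValueError(f"not an hh:mm:ss.mmm timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def ms_to_hms(ms: int) -> str:
    """Convert integer milliseconds back to 'hh:mm:ss.mmm'."""
    if ms < 0:
        raise ValueError("negative time")
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    begin = hms_to_ms("01:00:42.760")
    print(begin)                      # 3642760
    print(ms_to_hms(begin - 15_000))  # 01:00:27.760 (15 s earlier)
```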
Thanks,
Emily

March 12, 2019 at 10:52 #12801

Han

Hello Emily, good that you managed to solve problem (1).

As you have noticed, there is no function named something like “multiple file save selection as eaf with media clips using tab-delimited text input”. And maybe I should add “with tier selection” or similar, because you might want to specify which tiers to include? Or do you always want to export all annotations on all tiers within the selected 30 seconds of the media? As far as the functionality ELAN provides goes, I’m afraid you won’t be able to avoid a fair amount of manual work.
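To make the requested behaviour concrete (all tiers, or only selected tiers, within a time window), here is a minimal sketch using only Python's standard library. It is not an ELAN function: `annotations_in_window` is a hypothetical helper that reads the EAF XML directly and lists the time-aligned annotations overlapping the window. It does not write a new .eaf, and for brevity it skips reference annotations and time slots that have no TIME_VALUE.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional, Tuple


def annotations_in_window(
    eaf_xml: str,
    start_ms: int,
    end_ms: int,
    tiers: Optional[Iterable[str]] = None,
) -> List[Tuple[str, int, int, str]]:
    """Return (tier, begin_ms, end_ms, value) for aligned annotations overlapping the window.

    If `tiers` is None, all tiers are included.
    """
    root = ET.fromstring(eaf_xml)
    # Map TIME_SLOT_ID -> milliseconds; unaligned slots carry no TIME_VALUE.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    wanted = None if tiers is None else set(tiers)
    found = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        if wanted is not None and tier_id not in wanted:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            begin = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if begin is None or end is None:
                continue  # boundary not time-aligned; skipped in this sketch
            if begin <= end_ms and end >= start_ms:  # overlaps the window
                value = ann.findtext("ANNOTATION_VALUE", default="")
                found.append((tier_id, begin, end, value))
    return found


SAMPLE = """<ANNOTATION_DOCUMENT VERSION="3.0" FORMAT="3.0">
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="50000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="51000"/>
  </TIME_ORDER>
  <TIER TIER_ID="words">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>hello</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
      <ANNOTATION_VALUE>bye</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    print(annotations_in_window(SAMPLE, 0, 30_000))  # [('words', 1000, 2000, 'hello')]
```

ET.fromstring also accepts the raw bytes of a file, e.g. `open(path, "rb").read()`. Writing the result out as a new .eaf aligned to a media clip would presumably also require shifting every time value by the window start, which this sketch does not attempt.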

(2) I’m not that familiar with the pympi code, but I would be surprised if changes in the EAF format were the reason the script fails. It’s not impossible, but modifications to the EAF format have been fairly minimal in the past few years. Does pympi produce any meaningful messages about why the export doesn’t work?
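One way to see which version of the EAF format a file declares, which is presumably what a parser compares against when it reports an "unknown version", is to read the VERSION attribute on the root ANNOTATION_DOCUMENT element. A minimal sketch using the standard library; `eaf_version` is a hypothetical helper, and which versions pympi accepts is not stated in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def eaf_version(eaf_xml: str) -> Optional[str]:
    """Return the VERSION attribute of the root ANNOTATION_DOCUMENT element, if present."""
    root = ET.fromstring(eaf_xml)
    if root.tag != "ANNOTATION_DOCUMENT":
        raise ValueError(f"not an EAF document (root element is {root.tag!r})")
    return root.get("VERSION")


if __name__ == "__main__":
    print(eaf_version('<ANNOTATION_DOCUMENT VERSION="3.0" FORMAT="3.0"/>'))  # 3.0
```

For a file on disk, `ET.parse(path).getroot().get("VERSION")` gives the same value without reading the text first.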

-Han

March 19, 2019 at 19:10 #12809

elsb

Hi Han,

Thank you for your reply!

Regarding (1), for the moment I’m just trying to save all tiers within a selection, though it would also be useful to have the option to select just a few!

Regarding (2), I’ve been in touch with Michael, who wrote the code. His reply is below.

When calling extract(start,end) as follows:

import pympi
in_eaf = pympi.Elan.Eaf(in_file)  # in_file: path to the .eaf file
out_eaf = in_eaf.extract(3642760, 3942760)  # begin and end in milliseconds

pympi first posts a warning:

Parsing unknown version of ELAN spec… This could result in errors…
Parsing unknown version of ELAN spec… This could result in errors…

and then crashes with a ValueError (expected 3, got 4). With some probing, I found that the error occurs when pympi calls .get_annotation_data_for_tier(t) in the extract() method. In extract(), there’s a for loop initialized as

for ab, ae, value in eaf_out.get_annotation_data_for_tier(t):

I’ve added some print statements to demonstrate the issue. pympi is only designed to handle tuples of three items (from the get_annotation_data_for_tier call) and crashes when there are four. Examples from our .eaf file look like this (printed directly by the print statements in the extract() method):

With four values:
(474476, 479025, 'in very high voice', 'Mommy Mommy Mommy Mommy Mommy .')
(862635, 864966, '* * *', 'xxx yyy yyy .')

With three values:
(None, 199, "hi , it's xxx .")
(3078791, 3078991, "yes , Elmo can't wait to xxx .")
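The crash can be reproduced without pympi or any .eaf file: unpacking a 4-item tuple into three loop variables raises a ValueError. The tuples below are the examples printed above, written as plain Python strings; `unpack_three` is a hypothetical stand-in for the loop in extract(), not pympi code.

```python
# Example tuples from the print statements above: two from reference tiers
# (4 values) and two from time-aligned tiers (3 values).
ROWS = [
    (474476, 479025, "in very high voice", "Mommy Mommy Mommy Mommy Mommy ."),
    (862635, 864966, "* * *", "xxx yyy yyy ."),
    (None, 199, "hi , it's xxx ."),
    (3078791, 3078991, "yes , Elmo can't wait to xxx ."),
]


def unpack_three(rows):
    """Mimic the loop in extract(): unpack every row into exactly three names."""
    unpacked = []
    for ab, ae, value in rows:  # raises ValueError as soon as a row has 4 items
        unpacked.append((ab, ae, value))
    return unpacked


if __name__ == "__main__":
    try:
        unpack_three(ROWS)
    except ValueError as err:
        print("ValueError:", err)
```

Running `unpack_three(ROWS[2:])`, which contains only the 3-value rows, completes without error, which supports the idea that the 4-value rows are what trigger the crash.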

Any thoughts on why .eaf files in the older version only produced tuples of three values, but the current .eaf files sometimes produce three and sometimes four?

——

Many thanks for any thoughts on this,
-Emily

March 26, 2019 at 13:50 #12812

Han

Hi Emily, sorry for the delay.

I’m not quite sure this has anything to do with .eaf file versions, but I may have misunderstood the situation.
Looking at the code (as it is on GitHub), I understand that get_annotation_data_for_tier() returns 3 values for time-alignable tiers and 4 values for reference tiers (via get_ref_annotation_data_for_tier()).
Since you can print out both the 3-value and the 4-value tuples, it’s not clear to me where the script crashes. (I should add that I don’t know off the top of my head whether Python requires the number of variables in the for loop to equal the number of values in the tuples.)
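On the open question: Python does require the number of names in a for-loop target to match the length of each tuple, unless one of the names is starred. Below is a minimal sketch of a loop that accepts both the 3-value tuples (time-aligned tiers) and the 4-value tuples (reference tiers) by collecting any extra item with a starred name. It is a suggested workaround for the loop quoted earlier, not a change taken from pympi; `overlapping` is a hypothetical name, and it filters tuples to a window rather than removing annotations the way extract() does.

```python
from typing import Iterable, List, Tuple


def overlapping(rows: Iterable[tuple], start: int, end: int) -> List[Tuple[int, int, str]]:
    """Keep (begin, end, value) for rows that overlap [start, end].

    Rows with 3 or more items unpack without error: a 4th item (the referred
    annotation's value on reference tiers) is collected into `_` and ignored.
    """
    kept = []
    for ab, ae, value, *_ in rows:
        if ab is None or ae is None:
            continue  # unaligned boundary (e.g. a begin time of None); skipped here
        if ab <= end and ae >= start:
            kept.append((ab, ae, value))
    return kept


if __name__ == "__main__":
    rows = [
        (474476, 479025, "in very high voice", "Mommy Mommy Mommy Mommy Mommy ."),
        (None, 199, "hi , it's xxx ."),
        (3078791, 3078991, "yes , Elmo can't wait to xxx ."),
    ]
    print(overlapping(rows, 0, 500_000))  # [(474476, 479025, 'in very high voice')]
```

Inside extract() itself, the equivalent change would be writing the loop target as `ab, ae, value, *_`, so the same code handles alignable and reference tiers.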
