
JSON to CSV – issue of word time


This topic contains 6 replies, has 2 voices, and was last updated by  ximena 4 weeks, 1 day ago.

Author Posts
October 23, 2017 at 18:34 #11718

ximena

Hello, I have a JSON file from an automatic speech-to-text transcription made with a service called Speechmatics, which will help me go through my transcription in ELAN. I converted the JSON file to CSV and am importing the CSV into ELAN. I selected the columns for the words, the duration of each word, and the begin time of each word. ELAN gives every word a duration of 1000 milliseconds, even though I haven’t ticked the box that sets the default duration to 1000 milliseconds. I have tried many times and I don’t know what else to try. I would appreciate any support. Thanks! Ximena

October 24, 2017 at 13:09 #11719

Han

Difficult to say. Could it be that the time format of the duration column is different from that of the begin time column and is not recognized? After the import, are there relevant messages in the log (View->View Log…)?

Otherwise (I don’t know how well that would work out) maybe you can paste a few lines of the CSV here in a reply?
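One quick way to test the time-format idea outside ELAN is to check whether every value in the duration and begin time columns reads as a plain decimal number. The sketch below is in Python and only flags values that Python's `float()` rejects (stray quotation marks, empty cells, and so on); it does not reproduce ELAN's own parsing, and the function name `unparsable_times` is made up for illustration.

```python
def unparsable_times(values):
    """Return the values that float() cannot read as a number."""
    bad = []
    for value in values:
        try:
            float(value)
        except ValueError:
            # Anything that is not a bare number ends up here,
            # e.g. a value still wrapped in quotation marks.
            bad.append(value)
    return bad
```

Calling it on the cells of one column returns the offending cells; an empty list means every value is a bare number.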

October 24, 2017 at 16:48 #11720

ximena

Hi Han, many thanks for your reply!

Yes, I am pasting a few lines of the CSV below, with all the columns it contains. I am using only three columns: “words__duration” (which I set as Duration), “words__name” (which I set as Annotation), and “words__time” (which I set as Begin Time):

“job__lang”,”job__user_id”,”job__name”,”job__duration”,”job__created_at”,”job__id”,”speakers__duration”,”speakers__confidence”,”speakers__name”,”speakers__time”,”words__duration”,”words__confidence”,”words__name”,”words__time”,”format”
“es”,”29036″,”FragmentTest.wav”,”65″,”Thu Oct 5 10:23:42 2017″,”4856513″,”63.77″,”null”,”F1″,”2.04″,”0.39″,”1.000″,”Y”,”2.04″,”1.0″
“”,””,””,””,””,””,””,””,””,””,”0.60″,”1.000″,”entonces”,”2.43″,””
“”,””,””,””,””,””,””,””,””,””,”0.26″,”1.000″,”claro”,”3.03″,””
“”,””,””,””,””,””,””,””,””,””,”0.28″,”0.720″,”cuando”,”3.40″,””
“”,””,””,””,””,””,””,””,””,””,”0.24″,”0.560″,”el”,”3.76″,””
“”,””,””,””,””,””,””,””,””,””,”0.49″,”0.530″,”me”,”4.13″,””
“”,””,””,””,””,””,””,””,””,””,”0.58″,”0.980″,”comentó”,”4.68″,””
“”,””,””,””,””,””,””,””,””,””,”0.27″,”1.000″,”la”,”5.43″,””
“”,””,””,””,””,””,””,””,””,””,”0.63″,”0.990″,”cuestión”,”5.70″,””

October 25, 2017 at 13:55 #11721

Han

I’m not sure if it is a result of the copying and pasting (maybe it wasn’t a good idea) but when I copy your lines and paste in a text editor, I get 3 different, non-standard quotation marks ‘”’ etc. Even Excel doesn’t recognize them and doesn’t remove the quotation marks when opening the file (or when pasting in Excel).
But apart from that, the ELAN importer expects tab-delimited text (comma and space are also supported) and does not remove quotation marks. So neither the duration nor the begin time values are parsed properly.
So, it would be best if the JSON converter removes all quotation marks, or if you do that in a second step, before importing into ELAN.
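A minimal sketch of such a second step in Python, assuming the file is comma-separated with every field wrapped in quotation marks, as in the lines pasted above. It first turns typographic quotation marks into plain ones, lets the standard `csv` module strip them, and writes only the chosen columns as tab-delimited text. The function name `clean_csv` is hypothetical; the column names come from the pasted header.

```python
import csv
import io

# Typographic marks that can replace the plain " after copying and pasting.
TYPOGRAPHIC_QUOTES = ("\u201c", "\u201d", "\u2033")


def clean_csv(text, columns):
    """Return tab-delimited text without quotation marks, keeping only `columns`."""
    for mark in TYPOGRAPHIC_QUOTES:
        text = text.replace(mark, '"')
    # DictReader removes the surrounding quotation marks while parsing.
    reader = csv.DictReader(io.StringIO(text))
    lines = ["\t".join(columns)]
    for row in reader:
        # Missing cells become empty strings; fields are assumed not to contain tabs.
        lines.append("\t".join(row.get(name) or "" for name in columns))
    return "\n".join(lines) + "\n"
```

Called with the file contents and `["words__time", "words__duration", "words__name"]`, it returns text whose columns can then be mapped to Begin Time, Duration and Annotation in the import dialog.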

October 25, 2017 at 15:17 #11722

ximena

Hi Han, yes, I can see it; it is probably the result of copying and pasting. Yes, I converted the JSON file into CSV so it could be imported into ELAN. ELAN understands the columns very well. Here is the link to two screenshots for you to see:
Screenshots

October 26, 2017 at 00:42 #11727

Han

Yes, the columns are recognized all right (because of the delimiter, the comma or tab), but the double quotation marks are not removed by ELAN, as both screenshots show. After import, each word/each annotation is (still) surrounded by ” marks. Begin time and duration won’t be parsed correctly this way. The ” characters must be removed (e.g. via Find + Replace).

October 26, 2017 at 09:22 #11728

ximena

Many thanks for looking at that, Han! You were right: I removed the ” marks, imported again, and it worked! Thank you very, very much! All my best wishes, Ximena

