Any sentence parsing tools?

Hi all,

I’m currently using ELAN to code a spoken corpus that is still being built. I hope to add POS as a tier but have been struggling to find a good parser. I don’t know how to use Python, so an online or desktop (Mac) parser would be great.

Another thing I’m unsure about is whether ELAN will allow the POS tags produced by the parser to be imported as a tier.

A huge “thank you” to all of you who are reading this post or helping me.

Best wishes,
Wenbo

Importing POS tags into ELAN will probably only be useful if the generated POS file somehow contains the time stamps of the input annotations; otherwise it will be hard to link the POS annotations to the correct word or morph annotations. If the POS file has those time stamps, it probably has to be converted to a .csv file before it can be imported into ELAN.
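For anyone who does want to script that conversion, here is a minimal sketch in Python of what such a time-stamped .csv file could look like. Everything in it is an assumption made for illustration: the sample words and times are made up, `toy_tagger` is a placeholder lookup table standing in for a real POS tagger, and the column headers are hypothetical names, since the columns would still have to be mapped by hand in ELAN's import dialog.

```python
import csv
import io

# Hypothetical input: word annotations exported from ELAN as
# (begin_ms, end_ms, word) tuples. The values are made up for illustration.
words = [
    (0, 350, "the"),
    (350, 800, "dog"),
    (800, 1300, "barks"),
]

def toy_tagger(tokens):
    """Placeholder for a real POS tagger; this is only a tiny lookup table."""
    lookup = {"the": "DT", "dog": "NN", "barks": "VBZ"}
    return [lookup.get(token.lower(), "UNK") for token in tokens]

def write_pos_csv(annotations, tagger, out, tier_name="POS"):
    """Write one row per word: tier name, begin time, end time, POS tag."""
    tags = tagger([word for _, _, word in annotations])
    writer = csv.writer(out)
    # Header names are an assumption, not something ELAN requires.
    writer.writerow(["Tier", "Begin Time", "End Time", "Annotation"])
    for (begin, end, _), tag in zip(annotations, tags):
        # Reusing the word's own time stamps keeps each tag aligned with its word.
        writer.writerow([tier_name, begin, end, tag])

buffer = io.StringIO()
write_pos_csv(words, toy_tagger, buffer)
print(buffer.getvalue())
```

The point of the sketch is only that each POS tag carries the begin and end time of the word it belongs to. Swapping in a real tagger should only require replacing `toy_tagger` with a function that takes a list of tokens and returns a list of tags.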

What language is this about? Maybe a parser for that language is available as a WebLicht service. Then you could trigger the parser directly from within ELAN.

-Han

Hi Han,

Thanks for the reply! The language in question is English. I will try to use a POS generator first. Do you have any recommendations?

Thanks!

Wenbo

I can’t recommend any POS tagger, especially since I don’t know whether any of them retains time codes.
If I were to recommend anything at all, I would advise first trying a POS tagger from within ELAN: go to Options->Web Services->WebLicht, choose to upload a tier, specify sentence or word/token, and select one of the POS taggers from the list (e.g. the Berkeley Parser or the OpenNLP POS Tagger). That would be the least effort, if it works.

Hello Han! We met some years ago in Bolzano when you came for a workshop on ELAN, software which I (and my colleagues) use a great deal and appreciate more and more.
I have a question regarding Web Services: for some reason the WebLicht service has not worked for the past year. I have ELAN 6.0 and I am trying to use it for Italian, but in any case the program stops well before any selection is possible. I hope it can work again, because it would be the easiest way to do POS tagging and lemmatizing from within ELAN!
Thanks for helping!
Silvia

Hi Silvia! That must have been five years ago! It was a nice workshop.

I’ve tested the WebLicht service connection every now and then and it always seemed to work, but now I can reproduce your problem. Looking into the ELAN log, I notice there are so-called OutOfMemory errors, but no matter how much I extend the available memory for ELAN, the error remains.
I see a message that there are 575 services now, but I don’t think that number is the cause of the error; something else must be going wrong.
I’ll have to look into this and try to debug the process. I’ll try to get back to this soon.

Best,
Han

Hi Silvia,

I had a look at the code and it appears there is a programming error in the code that builds the user interface for the list of available services (the program goes into an endless loop).
This will be fixed in the next release, but if you want to use it now (or soon), I can see whether I can make a fix for the main library available. I forgot which operating system you are working on.

Oh, thank you very much!!! I use Windows. We can wait until the next release, but the sooner the better!
Silvia

Well, in case you want to try, I uploaded a modified ELAN library, elan-6.0.jar.
If you have ELAN 6.0 installed somewhere, you’ll find a folder “app” in the install folder and, in that “app” folder, a file “elan-6.0.jar”. If you move or copy this existing file to another folder and then replace it with the new one, you can check whether WebLicht works again. (I hope the new .jar file still fits with the rest of the installed libraries; otherwise it’ll have to wait until the next release.)
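Doing this by hand in the file explorer is perfectly fine. For those who prefer to script it, the backup-then-replace step could be sketched in Python roughly as follows. The function name `replace_jar` and all folder paths are hypothetical, and the demo at the bottom runs on throwaway folders with dummy files rather than on a real ELAN installation.

```python
import shutil
import tempfile
from pathlib import Path

def replace_jar(app_dir, new_jar, backup_dir, jar_name="elan-6.0.jar"):
    """Back up the installed jar, then copy the modified one into its place."""
    app_dir, new_jar, backup_dir = Path(app_dir), Path(new_jar), Path(backup_dir)
    installed = app_dir / jar_name
    if not installed.is_file():
        raise FileNotFoundError(f"No {jar_name} found in {app_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / jar_name
    shutil.copy2(installed, backup)   # keep the original so the change can be undone
    shutil.copy2(new_jar, installed)  # overwrite it with the modified library
    return backup

# Demo on temporary folders with dummy files, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    app = root / "app"
    app.mkdir()
    (app / "elan-6.0.jar").write_text("original")
    patched = root / "download" / "elan-6.0.jar"
    patched.parent.mkdir()
    patched.write_text("patched")
    backup = replace_jar(app, patched, root / "backup")
    demo_result = ((app / "elan-6.0.jar").read_text(), backup.read_text())

print(demo_result)
```

To undo the change, copy the backed-up file over the one in the “app” folder again. Depending on where ELAN is installed, writing to the install folder may require administrator rights.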

Thanks very much! I’ll give it a try and let you know!
Silvia