Text encoding problem when importing a Toolbox lexicon (ELAN 5.4, Mac)

Hi all,

I was importing a Toolbox lexicon (.txt, UTF-8), but the IPA characters weren’t converted correctly in the .xml file produced by ELAN. Converting the source lexicon to UTF-16 before importing didn’t work either. Has anyone experienced this issue? Any workarounds?

Thanks!

Hi,
Thanks for reporting this; I can reproduce the problem. In the released version, the import does not set an encoding at all, so instead of using UTF-8 it falls back to the system’s default encoding. The source code of the import function already allows the encoding to be set or selected by the user, but the current import window doesn’t have a drop-down menu for specifying it yet.
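The thread doesn’t show ELAN’s import code, so the following is only a minimal Java sketch of the difference described above. The class `LexiconEncodingDemo` and its `decode`/`readLexicon` helpers are hypothetical, not ELAN’s API: decoding UTF-8 bytes with the wrong charset garbles IPA characters, while naming UTF-8 explicitly keeps them intact.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LexiconEncodingDemo {

    // Decode raw file bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Hypothetical reader: uses the caller's charset, or UTF-8 when none is given,
    // instead of silently falling back to the platform default charset.
    static List<String> readLexicon(Path file, Charset cs) throws IOException {
        Charset effective = (cs != null) ? cs : StandardCharsets.UTF_8;
        return Files.readAllLines(file, effective);
    }

    public static void main(String[] args) throws IOException {
        // A Toolbox-style line with IPA; the escapes keep this source file ASCII-only.
        String line = "\\lx \u0283\u0259\u02C8k\u0251";
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each two-byte IPA character turns into two unrelated characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
        // Right charset: the text round-trips unchanged.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(line));

        // Same check through a file, with no charset passed (UTF-8 is assumed).
        Path tmp = Files.createTempFile("lexicon", ".txt");
        Files.write(tmp, utf8);
        System.out.println(readLexicon(tmp, null).get(0).equals(line));
        Files.delete(tmp);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

Passing the charset explicitly also removes the dependence on the Java runtime: since JDK 18 (JEP 400) the default charset is UTF-8, whereas older JDKs derived it from the platform and locale, so code that relies on the default can behave differently on different installations.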

I’ve uploaded a “jar” library with a quick fix that sets UTF-8 as the default. Here is the link to this lexiconcomponent-1.5.jar library. After downloading it, replace the original file of the same name with this one. The original is located in a subfolder of the ELAN app folder: choose Show Package Contents from the context menu of the .app folder and then navigate to Contents/Java.
I hope this works.

-Han

Works perfectly, thank you!