Calculating rate of intraword switching

christinecozien · 18 January 2018 12:51

Hello, I’m working with code-switched, multilingual data and I would like to calculate the percentage of intraword switching in my data.

I have a tier in which the transcript is divided into individual words, called ‘mixed by word’. Three further child tiers annotate the parts of each individual word by language. These are ‘morpheme break English’, ‘Morpheme Break Afrikaans’, ‘Morpheme Break Dialect’.

How, for example, can I ask ELAN to count every annotation on the parent tier ‘mixed by word’ that also contains annotations on both of the child tiers ‘morpheme break English’ and ‘Morpheme break Afrikaans’? (I realise I’ll have to run 4 queries to exhaust all the possible intraword switching combinations, but once I have the formula I can alter it as needed.)

From the manual I understand that the expression \W stands for ‘any’. See my attempt at the query below. ELAN brings up a result, but not the correct one: it shows me intrasentential switches, not intraword switches (and I know intraword switches exist in the transcript).

FIND
An annotation on tier “mixed by word FA” that matches regular expression \W
WITH CONSTRAINT
An annotation on tier “mb Eng FA” that matches regular expression \W in distance of -X to +X Mixed by Word FA annotations
WITH CONSTRAINT
An annotation on tier “mb Afr FA” that matches regular expression \W in distance of -X to +X Mixed by Word FA annotations

I wonder if my stereotypes are affecting the result? ‘morpheme break’ tiers have the stereotype symbolic subdivision, because I wanted to be able to have gaps between annotations (because of intraword switches). The parent tier ‘mixed by word’ has the stereotype time subdivision, and is also the child of the primary parent tier which is a mixed transcription, stereotype Default.

I hope I’ve explained this reasonably well while keeping the post short. I tried to insert a screenshot, but unfortunately that doesn’t seem to be possible.

Thanks,
Christine

hasloe · 19 January 2018 23:49

Hello Christine,
That sounds like an error in the manual. Can you point out where in the manual that explanation of \W is given? Then we can fix it.
In Appendix A: Regular Expression the description of \W is “A non-word character” where a word character is defined as “A word character: [a-zA-Z_0-9]”.
If you want to find annotations with any content, you could use the regular expression “.+” (without the quotation marks), which matches any character one or more times.
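
To make the difference concrete, here is a small sketch in Python rather than in ELAN itself. It assumes that Python's `re` module with the `re.ASCII` flag treats \W the same way as the definition quoted above (anything outside [a-zA-Z_0-9]). The helper names and sample strings are made up for illustration, and the sketch does not say anything about whether ELAN requires the whole annotation or only part of it to match.

```python
import re

# \W: one character that is NOT in [a-zA-Z_0-9] (re.ASCII keeps it to that class).
NON_WORD_CHAR = re.compile(r"\W", re.ASCII)
# .+ : any character, one or more times, i.e. "the annotation is not empty".
ANY_CONTENT = re.compile(r".+")


def has_non_word_char(text: str) -> bool:
    """True if the text contains at least one non-word character."""
    return NON_WORD_CHAR.search(text) is not None


def has_any_content(text: str) -> bool:
    """True if the text contains at least one character."""
    return ANY_CONTENT.search(text) is not None


if __name__ == "__main__":
    # Hypothetical annotation values, not taken from any real transcript.
    for value in ["huis", "ge-", "two words", ""]:
        print(repr(value), has_non_word_char(value), has_any_content(value))
```

An annotation made up only of letters, such as a single word, contains no non-word character, so \W cannot match it, while “.+” matches every non-empty annotation.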

I would guess that the structure of your tiers and their types does not prevent the search from finding the correct results for your query (but I don’t have a file with a similar structure at hand at the moment).
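
As a cross-check outside ELAN, the percentage could also be computed directly from the .eaf file, which is XML. The following is a rough sketch, not an ELAN feature. It assumes that annotations on the word tier carry an ANNOTATION_ID and that annotations on the symbolic subdivision tiers are stored as REF_ANNOTATION elements whose ANNOTATION_REF points to the parent word. It also assumes that empty child annotations should not count. The function name and the sample data (including the words in it) are hypothetical; only the tier names come from the query above.

```python
import xml.etree.ElementTree as ET


def intraword_switch_rate(eaf_xml, word_tier, lang_tiers):
    """Return (switched, total, percent) for words that have non-empty
    child annotations on every tier listed in lang_tiers."""
    root = ET.fromstring(eaf_xml)
    tiers = {t.get("TIER_ID"): t for t in root.iter("TIER")}

    # IDs of all annotations on the word tier.
    word_ids = [a.get("ANNOTATION_ID")
                for a in tiers[word_tier].iter()
                if a.tag in ("ALIGNABLE_ANNOTATION", "REF_ANNOTATION")]

    # Per language tier: IDs of parent words with a non-empty child annotation.
    parents_per_tier = []
    for name in lang_tiers:
        parents = set()
        for ref in tiers[name].iter("REF_ANNOTATION"):
            value = ref.findtext("ANNOTATION_VALUE") or ""
            if value.strip():  # assumption: empty annotations are ignored
                parents.add(ref.get("ANNOTATION_REF"))
        parents_per_tier.append(parents)

    switched = sum(1 for w in word_ids if all(w in p for p in parents_per_tier))
    total = len(word_ids)
    percent = 100.0 * switched / total if total else 0.0
    return switched, total, percent


# Made-up miniature document: word a1 has morphemes in both languages, a2 in one.
SAMPLE = """
<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="mixed by word FA">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>gecheck</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
      <ANNOTATION_VALUE>huis</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="mb Eng FA" PARENT_REF="mixed by word FA">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
      <ANNOTATION_VALUE>check</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="mb Afr FA" PARENT_REF="mixed by word FA">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a1">
      <ANNOTATION_VALUE>ge-</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a5" ANNOTATION_REF="a2">
      <ANNOTATION_VALUE>huis</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""

if __name__ == "__main__":
    print(intraword_switch_rate(SAMPLE, "mixed by word FA", ["mb Eng FA", "mb Afr FA"]))
```

For a real file, the XML string would be read from the .eaf file, and the list of language tiers could be changed to cover the other tier combinations.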

Nevertheless: did you have a look at the structured multiple file search too?
The Multiple Layer Search tab offers similar options with (maybe) a more intuitive user interface. And you can search in multiple files at a time.

If you want to show a screenshot, you could upload it to some other server and paste the URL into your post.

-Han