Bruce is the principal field linguist of Iwaidja Inyman, with ten years' experience documenting and studying Iwaidja.

Sourcing The Crowd In Language Documentation

This talk will focus on introducing the concept of ‘crowdsourcing’ or ‘citizen science’ to the language documentation context. People in the remote Indigenous communities of Arnhem Land in the far north of Australia have enthusiastically adopted the mobile phone as their documentation tool of choice, using it to make video and audio recordings of significant and everyday events, including music and dance performances, ceremony, hunting trips, and much else besides.

In response to this, the community-based language team on Croker Island in Northwestern Arnhem Land is developing easy-to-operate smartphone apps which allow people with little or no experience to record, annotate and upload language data and metadata in the form of audio, video, images, and text. The process involves no preparation and allows users of the apps to take advantage of the spontaneous opportunities for data collection which frequently arise, but which are often missed in the context of traditional fieldwork tools and methods.

Many people who could make valuable contributions to the documentation of a language remain untapped resources in typical documentation scenarios, which require the presence of a linguist with recording equipment working at pre-arranged times, often with only a tiny percentage of potential ‘language consultants’ involved. A key aim of our project is to facilitate the involvement of large numbers of native speakers of all ages in the documentation process without the need for difficult-to-attain levels of literacy and computer literacy.

The system we have developed may be broken up into three stages:

  • data collection and upload using smartphone apps
  • data curation (moderation, editing, archiving, export-import, publishing) by a community-based team using a web interface
  • publication of curated data to apps as updates
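
As a rough illustration only, the three stages above could be modelled as follows. This is a minimal sketch, not the project's actual software: every name in it (Contribution, Status, upload, curate, publish) and the moderation rule are hypothetical, and real media files are stood in for by plain strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UPLOADED = "uploaded"    # stage 1: arrived from a phone app
    APPROVED = "approved"    # stage 2: accepted by the community team
    PUBLISHED = "published"  # stage 3: sent back to the apps


@dataclass
class Contribution:
    contributor: str
    media_type: str          # "audio", "video", "image" or "text"
    payload: str             # placeholder for the media file or text
    metadata: dict = field(default_factory=dict)
    status: Status = Status.UPLOADED


def upload(queue, contribution):
    """Stage 1: a phone app adds a contribution to the moderation queue."""
    queue.append(contribution)


def curate(queue, approve):
    """Stage 2: the team keeps the contributions that pass moderation."""
    approved = [c for c in queue if approve(c)]
    for c in approved:
        c.status = Status.APPROVED
    return approved


def publish(approved):
    """Stage 3: bundle approved items as an update for the apps."""
    for c in approved:
        c.status = Status.PUBLISHED
    return {"update": [c.payload for c in approved]}


queue = []
upload(queue, Contribution("speaker_a", "audio", "word_01.wav", {"gloss": "example"}))
upload(queue, Contribution("speaker_b", "text", "", {}))
# Hypothetical moderation rule: reject empty contributions.
update = publish(curate(queue, lambda c: bool(c.payload)))
print(update)
```

In this sketch the empty contribution stays in the queue with its uploaded status, while the approved recording moves through curation and into the update sent to the apps.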

The talk will include a demonstration of some of the software developed by the Minjilang Endangered Languages Publication project (publishing as Iwaidja Inyman) for the collection and publication of data in Iwaidja, a highly endangered language of Northwestern Arnhem Land, Northern Territory, Australia.

Crowdsourcing tools in the following areas are currently in use or in development:

  • dictionary building
  • phrase books
  • sign language (including video capture)
  • spoken transcription and translation of archival texts
  • interpreting tools
  • upload of on-board video, audio, and images, with spoken metadata