The Penn Treebank Project has produced semantic and syntactic annotations of naturally occurring text for the Wall Street Journal, Brown, ATIS and Switchboard corpora. The annotations were published by [#LDC LDC]. Two query languages are available for the Treebank: tgrep (at LDC-Online) and CorpusSearch. The principal advantage of tgrep is its speed; that of CorpusSearch is its ability to pipeline queries together. Douglas Rohde has developed a more powerful version of tgrep called tgrep2. Chris Brew has recently developed TreeStyle, an extensible visualisation tool to aid treebank exploration. See also the NEGRA Corpus. Treebanks for other languages are in development, including: German, Turkish, Polish, Czech, Portuguese, Bulgarian, Chinese, …

http://www.cis.upenn.edu/%7Etreebank/home.html
