Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks.

NLTK includes the following software modules (~120k lines of Python code):

Corpus readers: interfaces to many corpora
Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
Chunkers: regexp, n-gram, named-entity
Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, …
Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
Clusterers: expectation maximization, agglomerative, k-means
Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: unification, chatbots, many utilities
NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
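As a rough illustration of how a few of these modules fit together, the sketch below tokenizes a sentence with the regexp tokenizer, stems the tokens with the Porter stemmer, and computes one of the distance metrics. It assumes NLTK is installed (for example with `pip install nltk`); the sample sentence and variable names are made up for this example, and only components that need no extra data downloads are used.

```python
from nltk.metrics import edit_distance
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Illustrative input; any plain-text sentence would do.
text = "The parsers were parsing tagged sentences"

# Regexp tokenizer: keep runs of word characters, drop everything else.
tokenizer = RegexpTokenizer(r"\w+")
tokens = tokenizer.tokenize(text)

# Porter stemmer: reduce each token to its stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Distance metric: Levenshtein edit distance between two strings.
distance = edit_distance("tagger", "tagged")

for token, stem in zip(tokens, stems):
    print(f"{token} -> {stem}")
print("edit distance:", distance)
```

Other modules in the list, such as the Punkt sentence segmenter, the corpus readers and the WordNet interface, rely on data packages that have to be fetched separately with `nltk.download()`.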

TLA-team: Useful toolkit for performing many NLP tasks. Easy to use and easy to incorporate into other systems.

http://www.nltk.org
