API documentation

Discontinuous Data-Oriented Parsing (disco-dop).

Main components:

  • A parser for Probabilistic Linear Context-Free Rewriting Systems (LCFRS), as well as Probabilistic Context-Free Grammars (PCFG).
  • Facilities to extract and parse with tree fragments using data-oriented parsing (DOP) grammars.
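
What separates an LCFRS from a PCFG is that a constituent may cover several non-adjacent stretches of the sentence. A common way to represent such a yield is a bitvector with one bit per word position. The sketch below only illustrates that idea; the function names `fanout` and `combine` are hypothetical and are not taken from the disco-dop API.

```python
def fanout(span: int) -> int:
    """Count the contiguous blocks of set bits in a bitvector.

    A block starts at every set bit whose lower neighbour is unset.
    """
    return bin(span & ~(span << 1)).count("1")


def combine(left: int, right: int) -> int:
    """Union of two yields; they must not share a word position."""
    if left & right:
        raise ValueError("yields overlap")
    return left | right


# "Wake your friend up": the verb and its particle occupy positions 0 and 3,
# the noun phrase occupies positions 1 and 2 (bit i stands for word i).
verb_particle = 0b1001
noun_phrase = 0b0110
sentence = combine(verb_particle, noun_phrase)
```

Here `verb_particle` consists of two blocks while `noun_phrase` and the combined yield are contiguous. A PCFG corresponds to the special case where every constituent has exactly one block.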

Python modules

cli Command-line interfaces to modules.
demos Examples of various formalisms encoded in LCFRS grammars.
eval Evaluation of (discontinuous) parse trees.
fragments Extract recurring tree fragments from constituency treebanks.
functiontags Function tags classifier.
gen Generate random sentences with an LCFRS.
grammar Assorted functions to read off grammars from treebanks.
heads Functions related to finding the linguistic head of a constituent.
lexicon Add rules to handle unknown words and smooth lexical probabilities.
parser Parser object that performs coarse-to-fine parsing and postprocessing.
punctuation Punctuation-related functions.
runexp
tree Various Tree objects for representing syntax or morphological trees.
treebank Read and write treebanks.
treebanktransforms Treebank transformations.
treedist Tree edit distance implementations.
treesearch
treetransforms Treebank-independent tree transformations.
util Miscellaneous code to avoid cyclic imports.

Cython modules

_fragments Fragment extraction with tree kernels.
bit Functions for working with bitvectors.
coarsetofine Project selected items from a chart to corresponding items in next grammar.
containers Data types for chart items, edges, &c.
disambiguation Disambiguate parse forests with various methods for parse selection.
estimates Computation of outside estimates for best-first or A* parsing.
kbest Extract the k-best derivations from a probabilistic parse forest.
pcfg CKY parser for Probabilistic Context-Free Grammar (PCFG).
plcfrs Parser for string-rewriting Linear Context-Free Rewriting Systems.
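
For readers unfamiliar with CKY parsing, which the pcfg entry above refers to, the following is a minimal textbook sketch of Viterbi CKY for a grammar in Chomsky normal form. It is not the implementation in the pcfg module; the function `cky`, the dictionary layout of `lexical` and `binary`, and the toy grammar are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log


def cky(words, lexical, binary, start="S"):
    """Return the log-probability of the best parse, or None if there is none."""
    n = len(words)
    # chart[i, j] maps a nonterminal to its best log-probability over words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for label, prob in lexical.get(word, []):
            chart[i, i + 1][label] = log(prob)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i, j]
            for k in range(i + 1, j):  # split point between the two children
                for (left, right), parents in binary.items():
                    if left in chart[i, k] and right in chart[k, j]:
                        for parent, prob in parents:
                            score = log(prob) + chart[i, k][left] + chart[k, j][right]
                            if score > cell.get(parent, float("-inf")):
                                cell[parent] = score
    return chart[0, n].get(start)


# Hypothetical toy grammar: word -> [(label, probability)]
lexical = {"she": [("NP", 0.5)], "fish": [("NP", 0.5)], "eats": [("V", 1.0)]}
# (left child, right child) -> [(parent, probability)]
binary = {("NP", "VP"): [("S", 1.0)], ("V", "NP"): [("VP", 1.0)]}

best = cky(["she", "eats", "fish"], lexical, binary)
```

The sketch keeps only the best score per cell; a real parser would also store backpointers so that derivations can be read off the chart afterwards.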

Indices and tables