API documentation

Discontinuous Data-Oriented Parsing (disco-dop).

Main components:

  • A parser for Probabilistic Linear Context-Free Rewriting Systems (LCFRS) and Probabilistic Context-Free Grammars (PCFG).
  • Facilities to extract and parse with tree fragments using data-oriented parsing (DOP) grammars.
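To illustrate the PCFG side of the parser, here is a minimal sketch of Viterbi CKY parsing for a PCFG in Chomsky normal form. It is not disco-dop's implementation or API (the pcfg and plcfrs modules are written in Cython and also handle discontinuity). The toy grammar, the function name viterbi_cky, and the nested-tuple tree encoding are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
# Binary rules: (lhs, left, right) -> probability.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (lhs, word) -> probability.
LEXICAL = {
    ("NP", "Mary"): 0.5,
    ("NP", "John"): 0.5,
    ("V", "loves"): 1.0,
}


def viterbi_cky(words, start="S"):
    """Return (log probability, tree) of the most probable parse, or None."""
    n = len(words)
    # chart[i, j] maps a label to (best log prob, backpointer) for words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for (lhs, w), prob in LEXICAL.items():
            if w == word:
                chart[i, i + 1][lhs] = (math.log(prob), word)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, left, right), prob in BINARY.items():
                    if left in chart[i, k] and right in chart[k, j]:
                        score = (math.log(prob) + chart[i, k][left][0]
                                 + chart[k, j][right][0])
                        # keep only the best-scoring analysis per label and span
                        if score > chart[i, j].get(lhs, (-math.inf, None))[0]:
                            chart[i, j][lhs] = (score, (k, left, right))
    if start not in chart[0, n]:
        return None

    def build(label, i, j):
        back = chart[i, j][label][1]
        if isinstance(back, str):  # lexical item: backpointer is the word
            return (label, back)
        k, left, right = back
        return (label, build(left, i, k), build(right, k, j))

    return chart[0, n][start][0], build(start, 0, n)
```

For example, viterbi_cky("Mary loves John".split()) returns the single parse ("S", ("NP", "Mary"), ("VP", ("V", "loves"), ("NP", "John"))) with log probability log(0.25).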

Python modules

cli                 Command-line interfaces to modules.
demos               Examples of various formalisms encoded in LCFRS grammars.
eval                Evaluation of (discontinuous) parse trees.
fragments           Extract recurring tree fragments from constituency treebanks.
functiontags        Function tag classifier.
gen                 Generate random sentences with an LCFRS.
grammar             Assorted functions to read off grammars from treebanks.
heads               Functions related to finding the linguistic head of a constituent.
lexicon             Add rules to handle unknown words and smooth lexical probabilities.
parser              Parser object that performs coarse-to-fine parsing and postprocessing.
punctuation         Punctuation-related functions.
runexp              Run an experiment given a parameter file.
tree                Various Tree objects for representing syntax or morphological trees.
treebank            Read and write treebanks.
treebanktransforms  Treebank transformations.
treedist            Tree edit distance implementations.
treesearch          Objects for searching through collections of trees.
treetransforms      Treebank-independent tree transformations.
util                Miscellaneous code to avoid cyclic imports.
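The eval module scores parse trees against a gold standard. Below is a minimal sketch of labeled bracket scoring that also works for discontinuous constituents, assuming each constituent is identified by its label and the set of word positions it covers (so a gap is simply a missing index). The helpers brackets and labeled_prf and the nested-tuple tree format are hypothetical; the real module's interface and options are not modeled.

```python
from collections import Counter


def brackets(tree):
    """Collect (label, frozenset of word indices) for every constituent.

    A tree is a nested tuple (label, child, ...) whose leaves are integer
    word positions, so a discontinuous constituent has a gapped index set.
    """
    result = Counter()

    def visit(node):
        if isinstance(node, int):  # leaf: a single word position
            return frozenset([node])
        label, *children = node
        span = frozenset().union(*(visit(child) for child in children))
        result[label, span] += 1
        return span

    visit(tree)
    return result


def labeled_prf(gold, candidate):
    """Labeled bracket precision, recall and F1 between two trees."""
    gold_b, cand_b = brackets(gold), brackets(candidate)
    # multiset intersection counts each matching bracket at most min(count) times
    matched = sum((gold_b & cand_b).values())
    precision = matched / sum(cand_b.values())
    recall = matched / sum(gold_b.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


gold = ("S", ("VP", 0, 2), 1)       # discontinuous VP over words 0 and 2
candidate = ("S", ("VP", 0, 1), 2)  # contiguous VP over words 0 and 1
print(labeled_prf(gold, candidate))
```

Only the S bracket matches here, so precision, recall, and F1 are all 0.5.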

Cython modules

_fragments      Fragment extraction with tree kernels.
bit             Functions for working with bitvectors.
coarsetofine    Select suitably probable items from a chart and produce a whitelist.
containers      Data types for chart items, edges, &c.
disambiguation  Disambiguate parse forests with various methods for parse selection.
estimates       Computation of outside estimates for best-first or A* parsing.
kbest           Extract the k-best derivations from a probabilistic parse forest.
pcfg            CKY parser for Probabilistic Context-Free Grammar (PCFG).
plcfrs          Parser for string-rewriting Linear Context-Free Rewriting Systems.
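The bit module works with bitvectors. In chart parsing for LCFRS, a possibly discontinuous span can be stored as an integer whose bit i is set when word i is covered; the number of contiguous runs of set bits is then the span's fanout. The pure-Python helpers fanout, indices, and combine below are hypothetical illustrations of that encoding, not the functions in the Cython bit module.

```python
from typing import List, Optional


def fanout(vec: int) -> int:
    """Number of contiguous components in the span encoded by vec."""
    # a run starts at every set bit whose lower neighbour is unset
    return bin(vec & ~(vec << 1)).count("1")


def indices(vec: int) -> List[int]:
    """Word positions covered by the span encoded by vec."""
    return [i for i in range(vec.bit_length()) if vec >> i & 1]


def combine(left: int, right: int) -> Optional[int]:
    """Union of two child spans, or None if they share a word."""
    if left & right:
        return None
    return left | right


span = combine(0b0011, 0b1000)  # words {0, 1} plus word {3}
print(indices(span), fanout(span))
```

Disjointness is only a necessary condition; a full LCFRS parser also checks that the components combine in the order the rule specifies.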
