discodop.pcfg

CKY parser for Probabilistic Context-Free Grammar (PCFG).

Functions

bitpar_nbest(nbest, SparseCFGChart chart) Put bitpar’s list of n-best derivations into the chart.
bitpar_yap_forest(forest, SparseCFGChart chart) Read bitpar YAP parse forest (-y option) into a Chart object.
minmaxmatrices(nonterminals, lensent) Create matrices to track minima and maxima for binary splits.
parse(sent, Grammar grammar[, tags, start]) A CKY parser modeled after Bodenstab’s ‘fast grammar loop’.
parse_bitpar(grammar, rulesfile, ...[, tags]) Parse a sentence with bitpar, given filenames of rules and lexicon.
renumber(deriv) Replace terminals of CF-derivation (string) with indices.
test()

Classes

CFGChart(Grammar grammar, list sent[, ...]) A Chart for context-free grammars (CFG).
DenseCFGChart(Grammar grammar, list sent[, ...]) A CFG chart in which edges and probabilities are stored in a dense array; i.e., the array is contiguous and all valid combinations of indices 0 <= start <= mid <= end and label can be addressed.
SparseCFGChart(Grammar grammar, list sent[, ...]) A CFG chart which uses a dictionary for each cell so that grammars with a large number of non-terminal labels can be handled.
class discodop.pcfg.CFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)

A Chart for context-free grammars (CFG).

An item is a Python integer made up of start, end, lhs indices.
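
As an illustration of how three indices can share a single integer, here is a minimal sketch. The layout, the helper names pack and unpack, and the lensent and numlabels arguments are assumptions made for this example; the encoding actually used by the chart is not specified here, and the chart's own indices(item) method is presumably the way to decode real items.

```python
def pack(start, end, lhs, lensent, numlabels):
    """Combine (start, end, lhs) into one integer (hypothetical layout)."""
    # Positions range over 0..lensent, so there are lensent + 1 values.
    cell = start * (lensent + 1) + end
    return cell * numlabels + lhs


def unpack(item, lensent, numlabels):
    """Invert pack(): recover (start, end, lhs) from the integer."""
    cell, lhs = divmod(item, numlabels)
    start, end = divmod(cell, lensent + 1)
    return start, end, lhs


item = pack(1, 3, 7, lensent=5, numlabels=10)
print(item, unpack(item, lensent=5, numlabels=10))
```

Using a plain integer as the item keeps chart lookups cheap, since integers hash and compare quickly.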

getitems(self)
indices(self, item)
itemstr(self, item)
root(self)
class discodop.pcfg.DenseCFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)

A CFG chart in which edges and probabilities are stored in a dense array; i.e., the array is contiguous and all valid combinations of indices 0 <= start <= mid <= end and label can be addressed. Whether it is feasible to use this chart depends on the grammar constant, specifically the number of non-terminal labels.
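
To see why the number of labels matters, the following back-of-the-envelope sketch counts the entries of a dense table. The function names and the assumption of one 8-byte double per (span, label) entry are illustrative and not taken from the implementation.

```python
def dense_entries(lensent, numlabels):
    """Number of (span, label) entries for spans 0 <= start < end <= lensent."""
    numspans = lensent * (lensent + 1) // 2
    return numspans * numlabels


def dense_megabytes(lensent, numlabels, bytes_per_entry=8):
    """Rough size of one probability table (assumed 8 bytes per entry)."""
    return dense_entries(lensent, numlabels) * bytes_per_entry / 1e6


for labels in (1000, 100000):
    print(labels, dense_entries(40, labels), dense_megabytes(40, labels))
```

Under these assumptions a 40-token sentence needs 820,000 entries with 1,000 labels and 82 million with 100,000 labels, which is the kind of growth that makes a per-cell dictionary more attractive for large label sets.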

getitems(self)
hasitem(self, item) → bool

Test if item is in chart.

setprob(self, item, double prob)

Set probability for item (unconditionally).

class discodop.pcfg.SparseCFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)

A CFG chart which uses a dictionary for each cell so that grammars with a large number of non-terminal labels can be handled.

hasitem(self, item) → bool

Test if item is in chart.

setprob(self, item, prob)

Set probability for item (unconditionally).

discodop.pcfg.parse(sent, Grammar grammar, tags=None, start=None, list whitelist=None, bool symbolic=False, double beam_beta=0.0, int beam_delta=50)

A CKY parser modeled after Bodenstab’s ‘fast grammar loop’.

Parameters:
  • sent – A sequence of tokens that will be parsed.
  • grammar – A Grammar object.
  • tags – Optionally, a sequence of POS tags to use instead of attempting to apply all possible POS tags.
  • start – integer corresponding to the start symbol that complete derivations should be headed by; e.g., grammar.toid['ROOT']. If not given, the default specified by grammar is used.
  • whitelist – a list of items that may enter the chart. The whitelist is a list of cells, each a set of labels: whitelist = [{label1, label2, ...}, ...]. The cells are indexed as compact spans; each label is an integer for a non-terminal label. The presence of a label means the span with that label will not be pruned.
  • symbolic – If True, parse sentence without regard for probabilities. All Viterbi probabilities will be set to 1.0.
  • beam_beta – keep track of the best score in each cell and only allow items which are within a multiple of beam_beta of the best score. Should be a negative log probability. Pass 0.0 to disable.
  • beam_delta – the maximum span length to which beam search is applied.
Returns: a Chart object.
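
The effect of beam_beta and beam_delta can be sketched in isolation. The code below is not the parser's implementation: prune_cell is a hypothetical helper that filters a finished cell, whereas a real parser would apply the check while filling the chart. It assumes scores are negative log probabilities (lower is better), so a multiplicative beam on probabilities becomes an additive margin.

```python
import math


def prune_cell(cell, beam_beta, spanlen, beam_delta=50):
    """Keep labels whose score is within beam_beta of the best score.

    cell maps label -> negative log probability (assumed representation).
    The beam is skipped when beam_beta is 0.0 or the span is longer
    than beam_delta.
    """
    if not cell or beam_beta == 0.0 or spanlen > beam_delta:
        return dict(cell)
    best = min(cell.values())
    return {label: score for label, score in cell.items()
            if score <= best + beam_beta}


cell = {'NP': -math.log(0.5), 'VP': -math.log(0.001)}
# A beam of -log(0.01) keeps items at least 1% as probable as the best.
print(prune_cell(cell, -math.log(0.01), spanlen=2))
print(prune_cell(cell, 0.0, spanlen=2))
```

Here VP is dropped in the first call because 0.001 is less than 1% of 0.5, while the second call disables the beam and keeps both labels.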

discodop.pcfg.renumber(deriv)

Replace terminals of CF-derivation (string) with indices.
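
A minimal sketch of the idea, assuming the derivation is a bracketed string such as (S (NP John) (VP sleeps)) in which every terminal is directly followed by a closing bracket. The name renumber_sketch and the regular expression are assumptions for illustration and may differ from the library's own code.

```python
import re
from itertools import count

# A terminal is a run of non-space, non-bracket characters that is
# preceded by a space and followed by a closing bracket.
TERMINAL = re.compile(r' ([^ ()]+)\)')


def renumber_sketch(deriv):
    """Replace each terminal with its left-to-right index."""
    counter = count()
    return TERMINAL.sub(lambda _match: ' %d)' % next(counter), deriv)


print(renumber_sketch('(S (NP John) (VP sleeps))'))
```

Replacing words by indices makes the derivation independent of the surface tokens, so they can be looked up in the original sentence later.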

discodop.pcfg.minmaxmatrices(nonterminals, lensent)

Create matrices to track minima and maxima for binary splits.
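
The general technique behind such matrices can be sketched as follows: for every label, remember the smallest and largest boundary seen so far, and use these to narrow the range of split points tried for a binary rule. The table layout, the sentinel values and all function names below are assumptions made for this example, written with plain lists; they do not describe the return value of minmaxmatrices itself.

```python
def make_minmax(numlabels, lensent):
    """Return four tables, each indexed as table[label][position]."""
    big, small = lensent + 1, -1  # sentinels meaning "nothing seen yet"
    width = lensent + 1
    minend = [[big] * width for _ in range(numlabels)]
    maxend = [[small] * width for _ in range(numlabels)]
    minstart = [[big] * width for _ in range(numlabels)]
    maxstart = [[small] * width for _ in range(numlabels)]
    return minend, maxend, minstart, maxstart


def record(tables, label, start, end):
    """Update the tables after adding an item (label, start, end)."""
    minend, maxend, minstart, maxstart = tables
    minend[label][start] = min(minend[label][start], end)
    maxend[label][start] = max(maxend[label][start], end)
    minstart[label][end] = min(minstart[label][end], start)
    maxstart[label][end] = max(maxstart[label][end], start)


def splitrange(tables, left, right, start, end):
    """Candidate split points for a rule with children left and right."""
    minend, maxend, minstart, maxstart = tables
    lo = max(minend[left][start], minstart[right][end], start + 1)
    hi = min(maxend[left][start], maxstart[right][end], end - 1)
    return range(lo, hi + 1)  # empty if no split can succeed


tables = make_minmax(numlabels=2, lensent=4)
record(tables, 0, 0, 1)
record(tables, 0, 0, 2)
record(tables, 1, 2, 4)
print(list(splitrange(tables, 0, 1, 0, 4)))
```

The range is a necessary condition only: a split point inside it may still fail, but no split point outside it can succeed, so the inner loop of the parser does less work.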

discodop.pcfg.parse_bitpar(grammar, rulesfile, lexiconfile, sent, n, startlabel, startid, tags=None)

Parse a sentence with bitpar, given filenames of rules and lexicon.

Parameters:
  • n – the number of derivations to return (max 1000); if n == 0, return parse forest instead of n-best list (requires binarized grammar).
Returns: a dictionary of derivations with their probabilities.

discodop.pcfg.bitpar_yap_forest(forest, SparseCFGChart chart)

Read bitpar YAP parse forest (-y option) into a Chart object.

The forest has lines of the form:

    label start end prob [edge1] % prob [edge2] % .. %%

where an edge is either a quoted “word”, or a rule number and one or two line numbers in the parse forest referring to children. Assumes a binarized grammar, and that the chart’s Grammar object has the same order of grammar rules as the grammar that was presented to bitpar.
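
To make the line format concrete, here is a hedged sketch of a reader for a single forest line. The function parse_forest_line and the sample lines are hypothetical: the samples are written by hand from the description above and are not real bitpar output. The sketch also assumes that words contain neither spaces nor percent signs.

```python
def parse_forest_line(line):
    """Split one forest line into (label, start, end, edges).

    Each edge is (prob, 'word', word) or (prob, 'rule', [ruleno, child, ...]).
    """
    label, start, end, rest = line.split(None, 3)
    edges = []
    for chunk in rest.split('%'):
        parts = chunk.split()
        if not parts:
            continue  # empty pieces come from the closing '%%'
        prob = float(parts[0])
        if parts[1].startswith('"'):
            edges.append((prob, 'word', parts[1].strip('"')))
        else:
            edges.append((prob, 'rule', [int(a) for a in parts[1:]]))
    return label, int(start), int(end), edges


print(parse_forest_line('NP 0 2 0.5 12 3 4 % 0.25 7 5 %%'))
print(parse_forest_line('NN 1 2 0.1 "dog" %%'))
```

In the first sample, the edge 12 3 4 stands for rule number 12 with children on forest lines 3 and 4, and 7 5 for a unary rule with a single child.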

discodop.pcfg.bitpar_nbest(nbest, SparseCFGChart chart)

Put bitpar’s list of n-best derivations into the chart. Parse forest is not converted.