discodop.plcfrs

Parser for string-rewriting Linear Context-Free Rewriting Systems.

Expects binarized, epsilon-free, monotone LCFRS grammars.

Functions

`parse`(sent, Grammar grammar[, tags, start, …])	Parse sentence and produce a chart.
`test`()
`testsent`(sent, grammar)	Parse sentence with grammar and print 10 best derivations.

Classes

`FatLCFRSChart`(Grammar grammar, list sent[, …])	LCFRS chart that supports longer sentences.
`LCFRSChart`(Grammar grammar, list sent[, …])	A chart for LCFRS grammars.
`SmallLCFRSChart`(Grammar grammar, list sent)	For sentences that fit into a single machine word.

class discodop.plcfrs.LCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True, itemsestimate=None)

A chart for LCFRS grammars. An item is a ChartItem object.

class discodop.plcfrs.SmallLCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True, itemsestimate=None)

For sentences that fit into a single machine word.
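To illustrate what "fits into a single machine word" can mean for a chart item, the following is a minimal sketch of a bit-vector encoding of covered token positions. It is not discodop's implementation: the 64-bit word size and the helper names `span_to_bits` and `bits_to_components` are assumptions made for this example.

```python
WORD_BITS = 64  # assumption: one machine word holds 64 bits


def span_to_bits(positions):
    """Encode a collection of token positions as an integer bit vector."""
    vec = 0
    for pos in positions:
        if not 0 <= pos < WORD_BITS:
            raise ValueError('position %d does not fit in one word' % pos)
        vec |= 1 << pos
    return vec


def bits_to_components(vec):
    """Decode a bit vector into contiguous (start, end) spans; end is exclusive."""
    components = []
    pos = 0
    while vec:
        if vec & 1:
            start = pos
            # Consume the run of set bits that forms one contiguous component.
            while vec & 1:
                vec >>= 1
                pos += 1
            components.append((start, pos))
        else:
            vec >>= 1
            pos += 1
    return components


# A discontinuous constituent covering tokens 0, 1 and 4 (two components).
vec = span_to_bits([0, 1, 4])
# Another item covering the gap; the two may combine only if they are disjoint.
other = span_to_bits([2, 3])
disjoint = (vec & other) == 0
```

Under this encoding, a sentence longer than `WORD_BITS` tokens cannot be represented in one integer word, which is the situation a chart for longer sentences has to handle with a wider representation.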

bestsubtree(self, start, end)

indices(self, ItemNo itemidx)

itemid(self, unicode label, indices, Whitelist whitelist=None)

itemid1(self, Label labelid, indices, Whitelist whitelist=None)

itemstr(self, ItemNo itemidx)

root(self)

class discodop.plcfrs.FatLCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True, itemsestimate=None)

LCFRS chart that supports longer sentences.

bestsubtree(self, start, end)

indices(self, ItemNo itemidx)

itemid(self, unicode label, indices, Whitelist whitelist=None)

itemid1(self, Label labelid, indices, Whitelist whitelist=None)

itemstr(self, ItemNo itemidx)

root(self)

discodop.plcfrs.parse(sent, Grammar grammar, tags=None, bool exhaustive=True, start=None, Whitelist whitelist=None, bool splitprune=False, bool markorigin=False, estimates=None, Prob beam_beta=0.0, int beam_delta=50, itemsestimate=None, postagging=None)

Parse sentence and produce a chart.

Parameters:

sent – A sequence of tokens that will be parsed.
grammar – A Grammar object.
tags – Optionally, a sequence of POS tags to use instead of attempting to apply all possible POS tags.
exhaustive – if True, do not stop at the Viterbi parse; return a full chart.
start – integer corresponding to the start symbol that complete derivations should be headed by; e.g., grammar.toid['ROOT']. If not given, the default specified by grammar is used.
whitelist – a whitelist of allowed ChartItems. Anything else is not added to the agenda.
splitprune – the coarse stage used a split-PCFG in which discontinuous nodes appear as multiple CFG nodes. Every discontinuous node then requires multiple lookups in the whitelist to determine whether it is allowed on the agenda.
markorigin – in combination with splitprune, coarse labels include an integer to distinguish components; e.g., the CFG nodes NP*0 and NP*1 map to the discontinuous node NP_2.
estimates – use context-summary estimates (heuristics, figures of merit) to order the agenda. Should be a tuple with the kind of estimates (‘SX’ or ‘SXlrgaps’) and the estimates themselves in a 4-dimensional numpy matrix. If the estimates are not consistent, the optimal parse is no longer guaranteed to be found. Experimental.
beam_beta – keep track of the best score in each cell and only allow items which are within a multiple of beam_beta of the best score. Should be a negative log probability. Pass 0.0 to disable.
beam_delta – the maximum span length to which beam search is applied.
itemsestimate – the number of chart items to pre-allocate.
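The interaction of splitprune and markorigin described above can be sketched as follows. This is an illustration only, not discodop's code: the helper names `coarse_labels` and `allowed` are hypothetical, and the whitelist is modelled as a plain set of `(label, start, end)` tuples, which is an assumption and not the structure of the actual Whitelist object.

```python
def coarse_labels(label):
    """Map a discontinuous label such as 'NP_2' to ['NP*0', 'NP*1'].

    Labels without a fan-out suffix are returned unchanged.
    """
    base, sep, fanout = label.rpartition('_')
    if sep and fanout.isdigit() and int(fanout) > 1:
        return ['%s*%d' % (base, n) for n in range(int(fanout))]
    return [label]


def allowed(label, components, whitelist):
    """Check an item against the whitelist, one lookup per component.

    components is a list of contiguous (start, end) spans, end exclusive.
    whitelist is a hypothetical set of (coarse label, start, end) tuples.
    """
    coarse = coarse_labels(label)
    if len(coarse) != len(components):
        return False
    return all((lab, start, end) in whitelist
               for lab, (start, end) in zip(coarse, components))


# Hypothetical items that survived the coarse stage.
whitelist = {('NP*0', 0, 2), ('NP*1', 4, 5), ('VP', 2, 4)}
np_ok = allowed('NP_2', [(0, 2), (4, 5)], whitelist)      # both components found
np_pruned = allowed('NP_2', [(0, 2), (3, 5)], whitelist)  # second component missing
```

A discontinuous item with fan-out n costs n lookups in this sketch, which matches the remark that every discontinuous node results in multiple lookups.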

Returns:

a tuple (chart, msg): a Chart object and a status message.
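As a rough illustration of how the beam_beta and beam_delta parameters described above might interact, here is a minimal sketch of the pruning test. It assumes that scores are negative log probabilities (lower is better), so that "within a multiple" in probability space becomes an additive margin; the function name `within_beam` is hypothetical and this is not the parser's actual code.

```python
from math import log


def within_beam(score, cell_best, beam_beta, span_length, beam_delta=50):
    """Return True if an item survives beam pruning.

    score and cell_best are negative log probabilities (lower is better).
    """
    if beam_beta == 0.0 or span_length > beam_delta:
        # Beam disabled, or the span is too long for the beam to apply.
        return True
    return score <= cell_best + beam_beta


# Keep items whose probability is at least 1e-4 times the best in the cell.
beam_beta = -log(1e-4)
best = -log(0.5)

kept = within_beam(-log(0.01), best, beam_beta, span_length=3)
pruned = within_beam(-log(1e-6), best, beam_beta, span_length=3)
long_span = within_beam(-log(1e-6), best, beam_beta, span_length=60)
```

In this sketch, an item spanning more than beam_delta tokens is never pruned, and passing 0.0 for beam_beta switches the test off entirely.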