discodop.plcfrs
Parser for string-rewriting Linear Context-Free Rewriting Systems.
Expects binarized, epsilon-free, monotone LCFRS grammars.
Functions
do(sent, grammar) | Parse sentence with grammar and print 10 best derivations.
getparent(i) | Python version of Cython-only _parent() function.
merge(*iterables[, key]) | Generator that performs an n-way merge of sorted iterables.
parse(sent, Grammar grammar[, tags, start, ...]) | Parse sentence and produce a chart.
test() |
Classes
Agenda([iterable]) | Priority Queue implemented with array-based n-ary heap.
DoubleAgenda([iterable]) | Priority Queue where priorities are C doubles.
DoubleEntry |
Entry |
FatLCFRSChart(Grammar grammar, list sent[, ...]) | LCFRS chart that supports longer sentences.
LCFRSChart(Grammar grammar, list sent[, ...]) | A chart for LCFRS grammars.
SmallLCFRSChart(Grammar grammar, list sent) | For sentences that fit into a single machine word.
class discodop.plcfrs.Agenda(iterable=None)
Priority Queue implemented with array-based n-ary heap.
Implements decrease-key and remove operations by marking entries as invalid. Provides a dictionary-like interface.
Can be initialized with an iterable; equivalent values are preserved in insertion order, and on duplicate keys the best priority is retained.
- clear(self): Remove all items from the agenda.
- items(self): Returns the (key, value) pairs in the agenda.
- keys(self): Returns the keys in the agenda.
- peekitem(self): Get the current best (key, value) pair, while keeping it on the agenda.
- pop(self, key): Returns the value for agenda[key] and removes it from the agenda.
- popitem(self): Returns the best scoring (key, value) pair and removes it from the agenda.
- update(self, *a, **kw): Change the scores of items given a sequence of (key, value) pairs.
- values(self): Returns the values in the agenda.
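The invalidation strategy described for Agenda can be illustrated with a small pure-Python sketch. This is not discodop's implementation: the class name LazyAgenda is hypothetical, it uses the binary heap from the standard heapq module rather than an n-ary heap, and it assumes that a lower priority is better (as with negative log probabilities).

```python
import heapq
import itertools


class LazyAgenda:
    """Min-priority queue with a dict-like interface (illustrative only)."""

    def __init__(self, iterable=None):
        self._heap = []     # entries: [priority, insertion_count, key, valid]
        self._mapping = {}  # key -> its current live heap entry
        self._counter = itertools.count()  # tie-breaker: insertion order
        for key, priority in iterable or ():
            # On duplicate keys, keep only the best (lowest) priority.
            if key not in self._mapping or priority < self._mapping[key][0]:
                self[key] = priority

    def __setitem__(self, key, priority):
        old = self._mapping.get(key)
        if old is not None:
            old[3] = False  # mark as invalid instead of searching the heap
        entry = [priority, next(self._counter), key, True]
        self._mapping[key] = entry
        heapq.heappush(self._heap, entry)

    def __getitem__(self, key):
        return self._mapping[key][0]

    def __len__(self):
        return len(self._mapping)

    def pop(self, key):
        """Remove key and return its priority; the heap entry is only marked."""
        entry = self._mapping.pop(key)
        entry[3] = False
        return entry[0]

    def _discard_invalid(self):
        # Invalid entries are dropped lazily, once they reach the top.
        while self._heap and not self._heap[0][3]:
            heapq.heappop(self._heap)

    def peekitem(self):
        self._discard_invalid()
        if not self._heap:
            raise IndexError('peek at empty agenda')
        priority, _, key, _ = self._heap[0]
        return key, priority

    def popitem(self):
        self._discard_invalid()
        if not self._heap:
            raise IndexError('pop from empty agenda')
        priority, _, key, _ = heapq.heappop(self._heap)
        del self._mapping[key]
        return key, priority
```

Marking entries instead of deleting them keeps updates and removals cheap, at the cost of stale entries that stay in the heap until they surface at the top.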
class discodop.plcfrs.DoubleAgenda(iterable=None)
Priority Queue where priorities are C doubles.
Implements decrease-key and remove operations by marking entries as invalid. Provides a dictionary-like interface.
Can be initialized with an iterable of DoubleEntry objects; equivalent values are preserved in insertion order, and on duplicate keys the best priority is retained.
This version is specialized for use as an agenda with C doubles as priorities (values); keys are hashable Python objects.
- peekitem(self): Get the current best (key, value) pair, while keeping it on the agenda.
- pop(self, key): Returns the value for agenda[key] and removes it from the agenda.
- popitem(self)
class discodop.plcfrs.LCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)
A chart for LCFRS grammars. An item is a ChartItem object.
- getitems(self)
- hasitem(self, ChartItem item)
- itemstr(self, item)
class discodop.plcfrs.SmallLCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)
For sentences that fit into a single machine word.
- indices(self, SmallChartItem item)
- root(self)
class discodop.plcfrs.FatLCFRSChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)
LCFRS chart that supports longer sentences.
- indices(self, FatChartItem item)
- root(self)
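The distinction between SmallLCFRSChart and FatLCFRSChart suggests that the token positions covered by an item are stored as a bit vector. The sketch below shows the general idea under that assumption; the helper names, the 64-bit word size, and the exact length threshold are hypothetical and not taken from discodop.

```python
WORD_BITS = 64  # assumed machine word size


def spans_to_mask(spans):
    """Encode half-open (start, end) spans as an int with one bit per token."""
    mask = 0
    for start, end in spans:
        mask |= ((1 << (end - start)) - 1) << start
    return mask


def fits_in_word(sentence_length, word_bits=WORD_BITS):
    """True if every token position can get its own bit in a single word."""
    return sentence_length <= word_bits


def can_combine(left_mask, right_mask):
    """Two items can only be combined if they cover disjoint positions."""
    return (left_mask & right_mask) == 0


# A discontinuous constituent covering tokens 0-1 and token 4:
mask = spans_to_mask([(0, 2), (4, 5)])
print(bin(mask))  # 0b10011
```

With this encoding, overlap checks and unions of spans are single bitwise operations, which is presumably why short sentences get a dedicated chart type.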
discodop.plcfrs.getparent(i)
Python version of Cython-only _parent() function.
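For reference, this is the usual index arithmetic of an array-based n-ary heap, which is presumably what a parent function of this kind computes. The arity of 4 and both function names are assumptions for illustration; the arity used by discodop is not stated here.

```python
ARITY = 4  # hypothetical; the arity used by discodop is not stated here


def parent(i, arity=ARITY):
    """Index of the parent of node i (i > 0) in an array-based n-ary heap."""
    return (i - 1) // arity


def children(i, size, arity=ARITY):
    """Indices of the children of node i that exist in a heap of `size`."""
    first = arity * i + 1
    return range(first, min(first + arity, size))
```

The two functions are inverses in the sense that every node i > 0 appears among the children of parent(i).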
discodop.plcfrs.merge(*iterables, key=None)
Generator that performs an n-way merge of sorted iterables.
>>> list(merge([0, 1, 2], [0, 1, 2, 3]))
[0, 0, 1, 1, 2, 2, 3]
NB: while a sort key may be specified, the individual iterables must already be sorted with this key.
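The requirement on presorted inputs can be demonstrated with heapq.merge from the Python standard library, which has the same documented contract. That discodop.plcfrs.merge behaves identically in every detail is an assumption; the example only shows why the inputs must be sorted by the key.

```python
from heapq import merge as heap_merge

words = [['fig', 'apple', 'banana'], ['kiwi', 'cherry']]

# Each input is already sorted by the key (word length),
# so the merged output is sorted by length as well.
good = list(heap_merge(*words, key=len))

# Here the inputs are sorted alphabetically, not by length;
# the merge does not re-sort them, so the output is not ordered by length.
alphabetical = [sorted(ws) for ws in words]
bad = list(heap_merge(*alphabetical, key=len))

print([len(w) for w in good])
print([len(w) for w in bad])
```

A merge only ever compares the current heads of its inputs, so it cannot repair an input that is out of order with respect to the key.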
discodop.plcfrs.parse(sent, Grammar grammar, tags=None, bool exhaustive=True, start=None, list whitelist=None, bool splitprune=False, bool markorigin=False, estimates=None, bool symbolic=False, double beam_beta=0.0, int beam_delta=50)
Parse sentence and produce a chart.
Parameters:
- sent – A sequence of tokens that will be parsed.
- grammar – A Grammar object.
- tags – Optionally, a sequence of POS tags to use instead of attempting to apply all possible POS tags.
- exhaustive – Don’t stop at the Viterbi parse; return a full chart.
- start – Integer corresponding to the start symbol that complete derivations should be headed by; e.g., grammar.toid['ROOT']. If not given, the default specified by grammar is used.
- whitelist – A whitelist of allowed ChartItems. Anything else is not added to the agenda.
- splitprune – The coarse stage used a split-PCFG in which discontinuous nodes appear as multiple CFG nodes. Every discontinuous node will result in multiple lookups into whitelist to see whether it should be allowed on the agenda.
- markorigin – In combination with splitprune, coarse labels include an integer to distinguish components; e.g., CFG nodes NP*0 and NP*1 map to the discontinuous node NP_2.
- estimates – Use context-summary estimates (heuristics, figures of merit) to order the agenda. Should be a tuple with the kind of estimates (‘SX’ or ‘SXlrgaps’) and the estimates themselves in a 4-dimensional NumPy array. If the estimates are not consistent, it is no longer guaranteed that the optimal parse will be found. Experimental.
- symbolic – If True, only compute the parse forest and disregard probabilities. The agenda is then an O(1) queue instead of an O(log n) priority queue.
- beam_beta – Keep track of the best score in each cell and only allow items which are within a multiple of beam_beta of the best score. Should be a negative log probability. Pass 0.0 to disable.
- beam_delta – The maximum span length to which beam search is applied.
Returns: a Chart object.
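The interaction of beam_beta and beam_delta can be read as the following pruning test. This is a sketch of one plausible interpretation, not the parser's actual code: the function within_beam and its arguments are hypothetical, scores are assumed to be negative log probabilities (lower is better), and a probability ratio is assumed to become an additive margin in log space.

```python
from math import log


def within_beam(score, best_in_cell, span_length,
                beam_beta=0.0, beam_delta=50):
    """Decide whether an item survives beam pruning (hypothetical helper).

    score and best_in_cell are negative log probabilities.
    """
    if beam_beta == 0.0 or span_length > beam_delta:
        return True  # beam disabled, or span too long for the beam to apply
    # A factor between probabilities is an additive margin in log space.
    return score <= best_in_cell + beam_beta


# Allow items that are at most 10,000 times less probable than the best one.
beta = -log(1e-4)
print(within_beam(5.0, 2.0, span_length=3, beam_beta=beta))
print(within_beam(12.0, 2.0, span_length=3, beam_beta=beta))
```

Under this reading, a larger beam_beta keeps more items, and spans longer than beam_delta are never pruned by the beam.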