discodop.pcfg
CKY parser for Probabilistic Context-Free Grammar (PCFG).
Functions
parse(sent, Grammar grammar[, tags, start, …]) | PCFG parsing using CKY.
test() |
testsent(sent, grammar[, expected]) | Parse sentence with grammar and print 10 best derivations.
Classes
CFGChart(Grammar grammar, list sent[, …]) | A Chart for context-free grammars (CFG).
DenseCFGChart(Grammar grammar, list sent[, …]) | A CFG chart with fixed, pre-allocated arrays.
SparseCFGChart(Grammar grammar, list sent[, …]) | A CFG chart which uses a hash table suitable for large grammars.
-
class discodop.pcfg.CFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)
A Chart for context-free grammars (CFG).
An item is a triple (start, end, label).
-
class discodop.pcfg.DenseCFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True)
A CFG chart with fixed, pre-allocated arrays.
All possible chart items are stored in dense, pre-allocated arrays; i.e., the array is contiguous and all valid combinations of indices 0 <= start <= mid <= end and label can be addressed. Whether it is feasible to use this chart depends on the grammar constant, specifically the number of non-terminal labels (and to a lesser extent the sentence length).
-
bestsubtree(self, start, end)
-
indices(self, ItemNo itemidx)
-
itemid(self, unicode label, indices, Whitelist whitelist=None)
-
itemid1(self, Label labelid, indices, Whitelist whitelist=None)
-
itemstr(self, ItemNo itemidx)
-
numitems(self)
-
root(self)
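To make the idea of a dense, fully addressable chart concrete, the following is a minimal sketch of one way to number the items (start, end, label) in a contiguous array. It is not discodop's actual layout: the function names `cell_index`, `item_index` and `dense_size` are hypothetical, and the sketch assumes spans with 0 <= start < end <= sentlen and integer labels in range(numlabels).

```python
def cell_index(start, end, sentlen):
    """Map a span with 0 <= start < end <= sentlen to a compact cell number.

    Spans are ordered by start, then by end, so that the sentlen - start
    spans beginning at `start` occupy consecutive cells.
    (Hypothetical numbering, for illustration only.)
    """
    # Number of cells used by all spans that begin before `start`.
    offset = start * sentlen - start * (start - 1) // 2
    return offset + (end - start - 1)


def item_index(start, end, label, sentlen, numlabels):
    """Index of the item (start, end, label) in one flat array."""
    return cell_index(start, end, sentlen) * numlabels + label


def dense_size(sentlen, numlabels):
    """Number of array slots a dense chart of this shape would need."""
    return sentlen * (sentlen + 1) // 2 * numlabels


# Every span of a 3-word sentence receives its own cell number.
spans = [(s, e) for s in range(3) for e in range(s + 1, 4)]
cells = [cell_index(s, e, 3) for s, e in spans]
```

Under this numbering the array size grows linearly in the number of labels and quadratically in the sentence length, which is in line with the feasibility remark above: a grammar with many non-terminal labels makes the dense array large even for short sentences.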
-
-
class discodop.pcfg.SparseCFGChart(Grammar grammar, list sent, start=None, logprob=True, viterbi=True, itemsestimate=None)
A CFG chart which uses a hash table suitable for large grammars.
-
bestsubtree(self, start, end)
-
indices(self, ItemNo itemidx)
-
itemid(self, unicode label, indices, Whitelist whitelist=None)
-
itemid1(self, Label labelid, indices, Whitelist whitelist=None)
-
itemstr(self, ItemNo itemidx)
-
root(self)
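As a rough illustration of the hash-table approach, the sketch below stores only the items that are actually derived, keyed by (start, end, label). The class `SparseChartSketch` and its `update` method are hypothetical and do not reflect discodop's internals; a Python dict also cannot be pre-sized, so nothing here corresponds to `itemsestimate`.

```python
class SparseChartSketch:
    """Toy sparse chart: items are numbered in the order they are added."""

    def __init__(self):
        self.itemids = {}  # (start, end, label) -> item number
        self.items = []    # item number -> (start, end, label)
        self.costs = []    # item number -> best negative log probability

    def update(self, start, end, label, cost):
        """Add an item, or lower its cost if a better derivation is found."""
        key = (start, end, label)
        itemid = self.itemids.get(key)
        if itemid is None:
            itemid = self.itemids[key] = len(self.items)
            self.items.append(key)
            self.costs.append(cost)
        elif cost < self.costs[itemid]:
            self.costs[itemid] = cost
        return itemid


chart = SparseChartSketch()
first = chart.update(0, 2, 7, 1.5)   # new item
second = chart.update(0, 2, 7, 0.5)  # same item, better cost
```

Memory use here is proportional to the number of items that enter the chart rather than to the number of labels in the grammar, which is the trade-off that makes this kind of chart attractive for large grammars.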
-
-
discodop.pcfg.parse(sent, Grammar grammar, tags=None, start=None, whitelist=None, Prob beam_beta=0.0, int beam_delta=50, itemsestimate=None, postagging=None)
PCFG parsing using CKY.
Parameters:
- sent – A sequence of tokens that will be parsed.
- grammar – A Grammar object.
- tags – Optionally, a sequence of POS tags to use instead of attempting to apply all possible POS tags.
- start – integer corresponding to the start symbol that complete derivations should be headed by; e.g., grammar.toid['ROOT']. If not given, the default specified by grammar is used.
- whitelist – a list of items that may enter the chart. The whitelist is a list of cells consisting of sets of labels: whitelist = [{label1, label2, ...}, ...]. The cells are indexed as compact spans; label is an integer for a non-terminal label. The presence of a label means the span with that label will not be pruned.
- beam_beta – keep track of the best score in each cell and only allow items which are within a multiple of beam_beta of the best score. Should be a negative log probability. Pass 0.0 to disable.
- beam_delta – the maximum span length to which beam search is applied.
- itemsestimate – the number of chart items to pre-allocate.
Returns: a Chart object.
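The interaction of the whitelist and beam parameters can be sketched with a small, self-contained Viterbi CKY parser in pure Python. This is an illustration under stated assumptions, not the implementation behind parse(): the function `cky_viterbi` and the toy grammar are hypothetical, the whitelist is simplified to a dict keyed by (start, end) instead of a list indexed by compact spans, and the beam is applied after a cell has been filled. Scores are negative log probabilities, so "within a multiple of the best score" becomes an additive threshold.

```python
import math
from collections import defaultdict


def cky_viterbi(sent, lexical, binary, beam_beta=0.0, beam_delta=50,
                whitelist=None):
    """Return {(start, end): {label: cost}} with cost = -log(probability).

    lexical: {word: [(label, prob), ...]}
    binary: [(parent, left, right, prob), ...]
    whitelist: optional {(start, end): {label, ...}} (simplified layout)
    """
    n = len(sent)
    chart = defaultdict(dict)
    for i, word in enumerate(sent):
        for label, prob in lexical.get(word, ()):
            chart[i, i + 1][label] = -math.log(prob)
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            end = start + length
            cell = chart[start, end]
            allowed = (None if whitelist is None
                       else whitelist.get((start, end), set()))
            for mid in range(start + 1, end):
                left, right = chart[start, mid], chart[mid, end]
                for parent, lhs, rhs, prob in binary:
                    if lhs not in left or rhs not in right:
                        continue
                    # Labels missing from the whitelist cell are pruned.
                    if allowed is not None and parent not in allowed:
                        continue
                    cost = left[lhs] + right[rhs] - math.log(prob)
                    if cost < cell.get(parent, math.inf):
                        cell[parent] = cost
            # Beam: only for spans up to beam_delta words, and only if enabled.
            if beam_beta and length <= beam_delta and cell:
                best = min(cell.values())
                for label in [a for a, c in cell.items()
                              if c > best + beam_beta]:
                    del cell[label]
    return dict(chart)


lexical = {'the': [('DT', 1.0)], 'dog': [('NN', 0.5)], 'barks': [('VP', 0.5)]}
binary = [('NP', 'DT', 'NN', 0.9), ('X', 'DT', 'NN', 0.001),
          ('S', 'NP', 'VP', 1.0)]
sent = 'the dog barks'.split()

full = cky_viterbi(sent, lexical, binary)
pruned = cky_viterbi(sent, lexical, binary, beam_beta=-math.log(0.01))
filtered = cky_viterbi(sent, lexical, binary,
                       whitelist={(0, 2): {'NP'}, (0, 3): {'S'}})
```

In this toy grammar the unlikely label X survives over the span (0, 2) when no pruning is requested, is dropped by the beam because it is more than a factor 0.01 worse than NP, and is likewise excluded when the whitelist cell for that span lists only NP.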