discodop.coarsetofine

Select suitably probable items from a chart and produce a whitelist.

Functions

doctftest(coarse, fine, sent, tree, k, split) – Test coarse-to-fine methods on a sentence.
getinside(Chart chart) – Compute inside probabilities for a chart given its parse forest.
getmatchingitems(Chart chart, …)
getoutside(Chart chart) – Compute outside probabilities for a chart given its parse forest.
posteriorthreshold(Chart chart, double threshold) – Prune labeled spans from chart below given posterior threshold.
prunechart(Chart coarsechart, Grammar fine, …) – Produce a white list of selected chart items.
test()
discodop.coarsetofine.prunechart(Chart coarsechart, Grammar fine, k, bool splitprune, bool markorigin, bool finecfg, set require=None, set block=None)

Produce a white list of selected chart items.

An item is selected if it occurs in the k-best derivations of the chart or, when 0 < k < 1, if its posterior probability is > k. Labels X in coarse.toid are projected to the labels in the mapping of the fine grammar, e.g., to X and X@n-m for a DOP reduction.

Parameters:
  • coarsechart – a Chart object produced by the PCFG or PLCFRS parser.
  • fine – the grammar to map labels to after pruning. Must have a mapping to the coarse grammar established by fine.getmapping().
  • k – when k >= 1: the number of k-best derivations to consider; when k == 0: the chart is not pruned but filtered to contain only items that contribute to a complete derivation; when 0 < k < 1: inside-outside probabilities are computed and items with a posterior probability < k are pruned.
  • splitprune – the coarse stage used a split-PCFG in which discontinuous nodes appear as multiple CFG nodes. Every discontinuous node will result in multiple lookups into the whitelist to see whether it should be allowed on the agenda.
  • markorigin – in combination with splitprune, coarse labels include an integer to distinguish components; e.g., CFG nodes NP*0 and NP*1 map to the discontinuous node NP_2.
  • require – optionally, a list of tuples (label, indices); only k-best derivations containing these labeled spans will be selected. For example, ('NP', [0, 1, 2]); expects k > 1.
  • block – optionally, a list of tuples (label, indices); these labeled spans will be pruned.
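The three regimes of k can be summarized in a small sketch. The helper name pruning_mode and its return strings are hypothetical and are not part of discodop; the sketch only restates the parameter description above.

```python
def pruning_mode(k):
    """Classify k into the three regimes described for prunechart.

    Hypothetical helper for illustration; not part of discodop.
    """
    if k < 0:
        raise ValueError('k must be non-negative')
    if k == 0:
        # No pruning: keep every item that is part of a complete derivation.
        return 'filter'
    if k < 1:
        # Posterior pruning: drop items with posterior probability below k.
        return 'posterior'
    # k-best pruning: keep items occurring in the k-best derivations.
    return 'kbest'


for value in (0, 1e-5, 50):
    print(value, pruning_mode(value))
```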
Returns:

(whitelist, msg)

For LCFRS, the white list is indexed as follows:
  • whitelisted: item in whitelist[label], where item is a SmallChartItem or FatChartItem depending on the sentence length.
  • blocked: item not in whitelist[label]
For a CFG, indexing is as follows:
  • whitelisted: label in whitelist[span], where span is an integer encoding both begin and end; it differs from a cell in that it does not include the number of nonterminals.
  • blocked: label not in whitelist[span]
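The two lookup conventions can be illustrated with plain Python containers. Everything in this sketch is a stand-in chosen for illustration: string labels, frozensets of word positions in place of SmallChartItem or FatChartItem, dictionaries in place of the real whitelist, and a made-up encode_span function. The actual integer encoding of spans is not specified here, so the one below is only an assumption.

```python
def lcfrs_allowed(whitelist, label, item):
    """LCFRS convention: whitelist is keyed by label and holds items."""
    return item in whitelist.get(label, ())


def cfg_allowed(whitelist, span, label):
    """CFG convention: whitelist is keyed by span and holds labels."""
    return label in whitelist.get(span, ())


def encode_span(begin, end, sentlen):
    """Pack begin and end into one integer (hypothetical encoding)."""
    return begin * (sentlen + 1) + end


# LCFRS: a discontinuous NP covering words 0 and 2 is whitelisted.
lcfrs_whitelist = {'NP': {frozenset({0, 2})}}
print(lcfrs_allowed(lcfrs_whitelist, 'NP', frozenset({0, 2})))
print(lcfrs_allowed(lcfrs_whitelist, 'NP', frozenset({0, 1})))

# CFG: labels NP and S are whitelisted for the span from 0 to 2.
cfg_whitelist = {encode_span(0, 2, sentlen=3): {'NP', 'S'}}
print(cfg_allowed(cfg_whitelist, encode_span(0, 2, sentlen=3), 'NP'))
print(cfg_allowed(cfg_whitelist, encode_span(1, 3, sentlen=3), 'NP'))
```

Anything absent from the whitelist counts as blocked, which is why both helpers fall back to an empty container for unknown keys.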
discodop.coarsetofine.posteriorthreshold(Chart chart, double threshold)

Prune labeled spans from chart below given posterior threshold.

Returns: dictionary of remaining items.
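As a rough illustration of posterior pruning, the posterior of an item is commonly computed as its inside probability times its outside probability, divided by the probability of the whole sentence. The sketch below assumes this definition and uses plain dictionaries with made-up numbers; the function threshold_items and the (label, begin, end) item tuples are hypothetical and do not reflect the Chart interface.

```python
def threshold_items(inside, outside, sentprob, threshold):
    """Return a dict of items whose posterior is at least threshold.

    posterior(item) = inside[item] * outside[item] / sentprob
    Hypothetical helper for illustration; not part of discodop.
    """
    remaining = {}
    for item, insideprob in inside.items():
        posterior = insideprob * outside[item] / sentprob
        if posterior >= threshold:
            remaining[item] = posterior
    return remaining


# Toy values; items are (label, begin, end) tuples.
inside = {('S', 0, 2): 0.1, ('NP', 0, 1): 0.5, ('X', 0, 1): 0.05}
outside = {('S', 0, 2): 1.0, ('NP', 0, 1): 0.18, ('X', 0, 1): 0.02}
kept = threshold_items(inside, outside, sentprob=0.1, threshold=0.1)
print(sorted(kept))
```

With these numbers the unlikely item ('X', 0, 1) has a posterior of about 0.01 and is dropped, while the other two items remain.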
discodop.coarsetofine.getinside(Chart chart)

Compute inside probabilities for a chart given its parse forest.
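For readers unfamiliar with the term, the inside probability of an item is the sum, over all edges that derive it, of the rule probability times the inside probabilities of the edge's children. The following is a minimal sketch on a toy forest, assuming an acyclic forest stored as a dictionary from items to lists of (probability, children) pairs. This representation and the function name are assumptions for illustration, not the Chart data structure used by getinside.

```python
from math import prod  # requires Python 3.8 or later


def inside_probabilities(forest):
    """Compute inside probabilities for every item of an acyclic forest."""
    inside = {}

    def visit(item):
        if item not in inside:
            # Sum over alternative edges; multiply over each edge's children.
            # An edge without children (a lexical edge) contributes only
            # its own probability, because prod() of nothing is 1.
            inside[item] = sum(
                prob * prod(visit(child) for child in children)
                for prob, children in forest[item])
        return inside[item]

    for item in forest:
        visit(item)
    return inside


# Toy forest: VP is ambiguous (a lexical edge and a unary edge via V).
forest = {
    ('S', 0, 2): [(1.0, (('NP', 0, 1), ('VP', 1, 2)))],
    ('NP', 0, 1): [(0.5, ())],
    ('VP', 1, 2): [(0.25, ()), (0.5, (('V', 1, 2),))],
    ('V', 1, 2): [(0.5, ())],
}
inside = inside_probabilities(forest)
print(inside[('S', 0, 2)])
```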

discodop.coarsetofine.getoutside(Chart chart)

Compute outside probabilities for a chart given its parse forest.
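Outside probabilities are computed top-down: the root has outside probability 1, and each edge passes to a child the parent's outside probability times the rule probability times the inside probabilities of the child's siblings. The sketch below assumes an acyclic forest stored as a dictionary from items to lists of (probability, children) pairs. This representation and the function name are assumptions for illustration, not the Chart data structure used by getoutside.

```python
from math import prod  # requires Python 3.8 or later


def inside_outside(forest, root):
    """Return (inside, outside) dicts for items reachable from root."""
    inside = {}
    order = []  # post-order: children are appended before their parents

    def visit(item):
        if item in inside:
            return inside[item]
        total = 0.0
        for prob, children in forest[item]:
            total += prob * prod(visit(child) for child in children)
        inside[item] = total
        order.append(item)
        return total

    visit(root)
    outside = dict.fromkeys(inside, 0.0)
    outside[root] = 1.0
    # Reversed post-order visits every parent before any of its children,
    # so outside[parent] is complete when it is propagated downwards.
    for parent in reversed(order):
        for prob, children in forest[parent]:
            for n, child in enumerate(children):
                siblings = children[:n] + children[n + 1:]
                outside[child] += (outside[parent] * prob
                        * prod(inside[sib] for sib in siblings))
    return inside, outside


forest = {
    ('S', 0, 2): [(1.0, (('NP', 0, 1), ('VP', 1, 2)))],
    ('NP', 0, 1): [(0.5, ())],
    ('VP', 1, 2): [(0.25, ()), (0.5, (('V', 1, 2),))],
    ('V', 1, 2): [(0.5, ())],
}
inside, outside = inside_outside(forest, ('S', 0, 2))
print(outside[('V', 1, 2)])
```

In this toy forest, the posterior of ('V', 1, 2) is inside times outside divided by the inside probability of the root, which reflects that only one of the two equally likely VP analyses uses V.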