discodop.kbest

Extract the k-best derivations from a probabilistic parse forest.

Implementation of Huang & Chiang (2005): Better k-best parsing. http://www.cis.upenn.edu/~lhuang3/huang-iwpt-correct.pdf

Functions

getderiv(v, RankedEdge ej, Chart chart, ...) Convert a RankedEdge to a string with a tree in bracket notation.
lazykbest(Chart chart, int k, ...) Wrapper function to run lazykthbest.
test() Demonstration of k-best algorithm.
discodop.kbest.getderiv(v, RankedEdge ej, Chart chart, unicode debin)

Convert a RankedEdge to a string with a tree in bracket notation.

A RankedEdge consists of an edge and a rank tuple: (e, j) notation (‘derivation with backpointers’). For example, given an edge based on the rule S => NP VP and the tuple (2, 1), this identifies a derivation headed by S and having the 2nd best NP and the 1st best VP as children.

Parameters:debin – perform on-the-fly debinarization, identify intermediate nodes using the substring debin.
discodop.kbest.lazykbest(Chart chart, int k, unicode debin=None, bool derivs=True)

Wrapper function to run lazykthbest.

Produces the ranked chart, as well as derivations as strings (when derivs is True). chart is a monotone hypergraph; should be acyclic unless probabilities resolve the cycles (maybe nonzero weights for unary productions are sufficient?).

Parameters:
  • k – the number of derivations to enumerate.
  • debin – debinarize derivations.