discodop.parser

Parser object that performs coarse-to-fine and postprocessing.

Additionally provides a simple command line interface similar to bitpar.

Functions

doparsing(parser, infile, out, printprob, …) Parse sentences from file and write results to file, log to stdout.
estimateitems(sent, prune, mode, dop) Estimate number of chart items needed for a given sentence.
initworker(parser, printprob, usetags, …) Load parser for a worker process.
main() Handle command line arguments.
mpworker(args) Parse a single sentence (multiprocessing wrapper).
probstr(prob) Render probability / number of subtrees as string.
readgrammars(resultdir, stages[, …]) Read the grammars from a previous experiment.
readinputbitparstyle(infile) Yields lists of tokens, where '\n\n' identifies a sentence break.
readparam(filename) Parse a parameter file.
worker(args) Parse a single sentence.

Classes

DictObj(*args, **kwds) Trivial class to wrap a dictionary for reasons of syntactic sugar.
Parser(prm[, funcclassifier, loadtrees]) A coarse-to-fine parser based on a given set of parameters.
class discodop.parser.DictObj(*args, **kwds)[source]

Trivial class to wrap a dictionary for reasons of syntactic sugar.

update(*args, **kwds)[source]

Update/add more attributes.
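
To illustrate the kind of syntactic sugar meant here, the following is a minimal sketch of a dictionary wrapper with attribute access. The class name AttrDict and the parameter names in the usage lines are hypothetical; this is not necessarily how DictObj itself is implemented.

```python
class AttrDict:
    """Hypothetical stand-in for DictObj: keys become attributes."""

    def __init__(self, *args, **kwds):
        # Accepts the same arguments as dict(): a mapping and/or keywords.
        self.__dict__.update(*args, **kwds)

    def update(self, *args, **kwds):
        """Update/add more attributes."""
        self.__dict__.update(*args, **kwds)

    def __repr__(self):
        items = sorted(self.__dict__.items())
        return '%s(%s)' % (type(self).__name__,
                ', '.join('%s=%r' % item for item in items))


# Illustrative parameter names only.
prm = AttrDict(numproc=1, stages=[])
prm.update(numproc=4, verbosity=2)
```

After the update, values are read as prm.numproc rather than prm['numproc'], which is the convenience the class description refers to.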

class discodop.parser.Parser(prm, funcclassifier=None, loadtrees=False)[source]

A coarse-to-fine parser based on a given set of parameters.

Parameters:
  • prm – A DictObj with parameters as returned by parser.readparam().
  • funcclassifier – optionally, a function tag classifier trained by functiontags.trainfunctionclassifier().

parse(sent, tags=None, root=None, goldtree=None, require=(), block=())[source]

Parse a sentence and perform postprocessing.

Yields a dictionary from parse trees to probabilities for each stage.

Parameters:
  • sent – a sequence of tokens.
  • tags – optionally, a list of POS tags as strings to be given to the parser instead of trying all possible tags.
  • root – optionally, specify a non-default root label.
  • goldtree – if given, will be used to evaluate pruned parse forests.
  • require – optionally, a list of tuples (label, indices); only parse trees containing these labeled spans will be returned. For example, ('NP', [0, 1, 2]).
  • block – optionally, a list of tuples (label, indices); these labeled spans will be pruned.
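
The shape of the require and block arguments can be illustrated with a small plain-Python check. This is only a sketch of the constraint as described above, applied to a set of labeled spans that is assumed to have been read off a candidate tree; the helper name satisfies is hypothetical, and the parser itself may enforce these constraints differently (for example during chart pruning).

```python
def satisfies(spans, require=(), block=()):
    """Check labeled-span constraints against a candidate tree.

    spans: set of (label, tuple_of_indices) pairs found in the tree.
    require, block: lists of (label, indices) tuples, in the same
    format as the parameters described above.
    """
    required = {(label, tuple(indices)) for label, indices in require}
    blocked = {(label, tuple(indices)) for label, indices in block}
    # All required spans must be present; no blocked span may be present.
    return required <= spans and not (blocked & spans)


# Labeled spans of a made-up four-word tree.
tree_spans = {('NP', (0, 1, 2)), ('VP', (3,)), ('S', (0, 1, 2, 3))}
keeps_np = satisfies(tree_spans, require=[('NP', [0, 1, 2])])
avoids_vp = satisfies(tree_spans, block=[('VP', [3])])
```

Here keeps_np is True because the tree contains the required NP span, and avoids_vp is False because the tree contains a blocked VP span.
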
postprocess(treestr, sent, stage)[source]

Take parse tree and apply postprocessing.

noparse(stage, sent, tags, lastsuccessfulparse, n)[source]

Return parse from previous stage or a dummy parse.

augmentgrammar(newtrees, newsents)[source]

Extract grammar rules from trees and merge with current grammar.

discodop.parser.readgrammars(resultdir, stages, postagging=None, transformations=None, top='ROOT', cache=False)[source]

Read the grammars from a previous experiment.

Expects a directory resultdir which contains the relevant grammars and the parameter file params.prm, as produced by runexp.

discodop.parser.probstr(prob)[source]

Render probability / number of subtrees as string.

discodop.parser.readparam(filename)[source]

Parse a parameter file.

Parameters: filename – The file should contain a list of comma-separated attribute=value pairs and will be read using eval('dict(%s)' % open(file).read()).
Returns: A DictObj.
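
The eval-based reading described above can be sketched as follows. The helper name read_param_text and the attribute names in the example are hypothetical, and the sketch returns a plain dict rather than a DictObj. Because eval executes arbitrary Python, a parameter file should only come from a trusted source.

```python
def read_param_text(text):
    """Evaluate comma-separated attribute=value pairs into a dict."""
    # Same expression as in the description, applied to a string
    # instead of the contents of a file.
    return eval('dict(%s)' % text)


# Illustrative contents of a parameter file; the names are made up.
example = "alpha=1,\nbeta='x',\nitems=[dict(name='first')],\n"
params = read_param_text(example)
```

Newlines and a trailing comma are harmless here, since the text ends up inside the parentheses of a single dict(...) call.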
discodop.parser.readinputbitparstyle(infile)[source]

Yields lists of tokens, where '\n\n' identifies a sentence break.

Lazy version of infile.read().split('\n\n').
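
A lazy reader of this kind might look like the sketch below, assuming the bitpar-style layout of one token per line with an empty line between sentences. The function name read_blank_line_separated is hypothetical, and details such as the handling of consecutive empty lines may differ from the actual implementation.

```python
import io


def read_blank_line_separated(infile):
    """Yield one list of tokens per sentence, reading line by line."""
    sent = []
    for line in infile:
        token = line.strip()
        if token:
            sent.append(token)
        elif sent:
            # An empty line closes the current sentence.
            yield sent
            sent = []
    if sent:
        # Final sentence without a trailing empty line.
        yield sent


sentences = list(read_blank_line_separated(
        io.StringIO('The\ncat\n\nsleeps\n')))
```

Only one sentence is held in memory at a time, which is the advantage over reading the whole file and splitting it.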

discodop.parser.initworker(parser, printprob, usetags, numparses, fmt, morphology)[source]

Load parser for a worker process.

discodop.parser.doparsing(parser, infile, out, printprob, oneline, usetags, numparses, numproc, fmt, morphology, sentid)[source]

Parse sentences from file and write results to file, log to stdout.