discodop.runexp
Run an experiment given a parameter file.
Does grammar extraction, parsing, and evaluation.
Functions
dobinarization(trees, sents, binarization, …) | Apply binarization to treebank.
doparsing(**kwds) | Parse a set of sentences using worker processes.
getgrammars(trees, sents, stages, …) | Read off the requested grammars.
getposmodel(postagging, train_tagged_sents) | Apply unknown word model to sentences before extracting grammar.
initworker(params) | Set global parameter object.
loadtraincorpus(corpusfmt, traincorpus, …) | Load the training corpus.
mpworker(args) | Multiprocessing wrapper of worker.
oldeval(results, goldbrackets) | Simple evaluation.
parsetepacoc([stages, trainmaxwords, …]) | Parse the tepacoc test set.
readtepacoc() | Read the tepacoc test set.
startexp(prm[, resultdir, rerun]) | Execute an experiment.
worker(args) | Parse a sentence using global Parser object, and evaluate incrementally.
writeresults(results, params) | Write parsing results to files in same format as the original corpus.

discodop.runexp.loadtraincorpus(corpusfmt, traincorpus, binarization, punct, functions, morphology, removeempty, ensureroot, transformations, relationalrealizational, resultdir)
Load the training corpus.

discodop.runexp.getposmodel(postagging, train_tagged_sents)
Apply unknown word model to sentences before extracting grammar.

discodop.runexp.dobinarization(trees, sents, binarization, relationalrealizational, logmsg=True)
Apply binarization to treebank.
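
As an illustration of what binarizing a treebank tree means, here is a minimal sketch of right-factored binarization on a toy tree representation. It is not discodop's implementation: the `(label, children)` tuples, the helper names `binarize_right`, `_factor` and `_childlabel`, and the `PARENT|<...>` label scheme are assumptions made for this example, and markovization (`h`, `v`), head rules and discontinuous constituents are not modelled.

```python
def _childlabel(child):
    """Return the label of a subtree, or the word itself for a leaf."""
    return child if isinstance(child, str) else child[0]


def _factor(origlabel, label, children):
    """Split off the first child until at most two children remain."""
    if len(children) <= 2:
        return (label, children)
    first, rest = children[0], children[1:]
    # Hypothetical label scheme: original parent plus the remaining siblings.
    newlabel = '%s|<%s>' % (
        origlabel, ','.join(_childlabel(child) for child in rest))
    return (label, [first, _factor(origlabel, newlabel, rest)])


def binarize_right(tree):
    """Right-factored binarization of a (label, children) tree.

    Leaves are plain strings; every node ends up with at most two children.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [binarize_right(child) for child in children]
    return _factor(label, label, children)


tree = ('S', [('A', ['a']), ('B', ['b']), ('C', ['c']), ('D', ['d'])])
binarized = binarize_right(tree)
print(binarized)
```

The four-child `S` node becomes a chain of binary nodes; the artificial labels record which siblings are still to come, so the original tree can be recovered by removing every node whose label contains `|<`.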

discodop.runexp.getgrammars(trees, sents, stages, testmaxwords, resultdir, numproc, lexmodel, top)
Read off the requested grammars.

discodop.runexp.worker(args)
Parse a sentence using global Parser object, and evaluate incrementally.
Returns: a string with diagnostic information, as well as a list of DictObj instances with the results for each stage.
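
The general pattern of a pool initializer that stores a shared object in a global, plus a worker that returns a diagnostic string together with per-stage results, can be sketched as follows. This is an illustration only, not the module's code: `init_sketch`, `worker_sketch`, `parse_all_sketch`, the `PARAMS` dictionary and the result records are hypothetical, and `ThreadPool` is used instead of worker processes so that the example runs unchanged on any platform.

```python
from multiprocessing.pool import ThreadPool

PARAMS = None  # shared parameter object, set by init_sketch()


def init_sketch(params):
    """Store the shared parameter object in a module-level global."""
    global PARAMS
    PARAMS = params


def worker_sketch(args):
    """Pretend to parse one sentence.

    Returns a diagnostic string and a list with one record per stage.
    """
    sentno, sent = args
    results = [
        {'stage': name, 'length': len(sent)}  # placeholder result record
        for name in PARAMS['stages']]
    msg = '%d. %s (%d stages)' % (sentno, ' '.join(sent), len(results))
    return msg, results


def parse_all_sketch(sents, params, numproc=2):
    """Map worker_sketch() over numbered sentences with an initialized pool."""
    with ThreadPool(
            numproc, initializer=init_sketch, initargs=(params,)) as pool:
        return pool.map(worker_sketch, enumerate(sents, 1))


output = parse_all_sketch(
    [['a', 'b'], ['c']], {'stages': ['pcfg', 'plcfrs']})
for msg, results in output:
    print(msg)
```

Passing the parameter object through the initializer means it is handed to each worker once, rather than being sent along with every sentence.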

discodop.runexp.writeresults(results, params)
Write parsing results to files in the same format as the original corpus (or in export format if no writer is implemented for that format).

discodop.runexp.parsetepacoc(
    stages=({'mode': 'pcfg', 'split': True, 'markorigin': True},
            {'mode': 'plcfrs', 'prune': True, 'k': 10000},
            {'mode': 'plcfrs', 'prune': True, 'k': 5000, 'dop': 'doubledop', 'estimator': 'rfe', 'objective': 'mpp'}),
    trainmaxwords=999, trainnumsents=25005, testmaxwords=999,
    binarization=DictObj(method='default', h=1, v=1, factor='right', tailmarker='', headrules='negra.headrules', leftmostunary=True, rightmostunary=True, markhead=False, fanout_marks_before_bin=False),
    transformations=None, usetagger='stanford', resultdir='tepacoc', numproc=1)
Parse the tepacoc test set.
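
The default `stages` and `binarization` values in the signature above are ordinary Python data, so they can be inspected without running a parser. The sketch below copies them verbatim and prints a one-line summary per stage. `SimpleNamespace` is only a stand-in for `DictObj`, the `describe` helper is hypothetical, and the code does not interpret what each key means to the parser.

```python
from types import SimpleNamespace

# Default stages, copied from the parsetepacoc signature.
stages = (
    {'mode': 'pcfg', 'split': True, 'markorigin': True},
    {'mode': 'plcfrs', 'prune': True, 'k': 10000},
    {'mode': 'plcfrs', 'prune': True, 'k': 5000, 'dop': 'doubledop',
     'estimator': 'rfe', 'objective': 'mpp'},
)

# Stand-in for DictObj (assumption: attribute-style access is all we need).
binarization = SimpleNamespace(
    method='default', h=1, v=1, factor='right', tailmarker='',
    headrules='negra.headrules', leftmostunary=True, rightmostunary=True,
    markhead=False, fanout_marks_before_bin=False)


def describe(stage):
    """Summarize one stage dictionary using only the keys it contains."""
    parts = [stage['mode']]
    if stage.get('dop'):
        parts.append('dop=%s' % stage['dop'])
    if stage.get('prune'):
        parts.append('k=%d' % stage['k'])
    return ' '.join(parts)


summary = [describe(stage) for stage in stages]
for number, line in enumerate(summary, 1):
    print('stage %d: %s' % (number, line))
print('binarization: factor=%s h=%d v=%d' % (
    binarization.factor, binarization.h, binarization.v))
```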