runexp
Run an experiment given a parameter file. Does grammar extraction, parsing, and evaluation.
Usage: discodop runexp <parameter file> [--rerun]
If a parameter file is given, an experiment is run.
For a parameter file sample.prm, a new directory is created that is named after
the parameter file without its extension, i.e., sample/ in this case. This
directory must not exist yet, to avoid accidentally overwriting previous
results. The grammar rules and lexicon are written to this directory in a text
format, together with the parsing results and the gold standard parse trees in
the same format.
To repeat an experiment with an existing grammar, pass the option --rerun.
In that case, the directory named after the parameter file without its
extension must already exist in the current directory; its results will be
overwritten.
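The naming and existence rules above can be sketched in Python. `result_dir` is a hypothetical helper, not part of discodop's API. It assumes the results directory lives in the current working directory, as the --rerun description suggests. It only reproduces the documented checks and does not create anything.

```python
import os


def result_dir(prmfile, rerun=False):
    """Return the results directory name for a parameter file.

    Hypothetical helper mirroring the documented rules; it does not create
    the directory. Assumes results live in the current working directory.
    """
    # 'configs/sample.prm' -> 'sample' (base name without extension)
    name = os.path.splitext(os.path.basename(prmfile))[0]
    if rerun:
        # --rerun reuses (and overwrites) an existing results directory.
        if not os.path.isdir(name):
            raise FileNotFoundError(
                f'--rerun requires an existing directory {name}/')
    elif os.path.exists(name):
        # A fresh run refuses to touch earlier results.
        raise FileExistsError(
            f'{name}/ already exists; use --rerun or remove it')
    return name
```

For instance, `result_dir('sample.prm')` returns `'sample'` when no sample/ directory exists yet, and raises FileExistsError once it does.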
Parameter file and example invocation
See the reference documentation on parameter files. A minimal parameter file:
stages=[
    dict(
        name='pcfg',  # an identifier, used as filename when writing results
        mode='pcfg',  # use the PCFG CKY parser
    ),
],
evalparam='proper.prm',  # EVALB-style parameter file
# train / test sets
corpusfmt='bracket',  # choices: export, bracket, discbracket, alpino, tiger
traincorpus=dict(
    path='ptb-02-21.mrg',
    maxwords=100,  # max number of words for sentences in train corpus
),
testcorpus=dict(
    path='ptb-24.mrg',
    maxwords=100,  # max number of words for sentences in test corpus
),
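The minimal file above is written in Python keyword-argument syntax. The following is a sketch that assumes this syntax is all a parameter file contains. Under that assumption, its text can be evaluated as the arguments of a dict(...) call, and a few fields can be sanity-checked before a long run. `load_params` and `check_params` are hypothetical helpers. This is not necessarily how discodop itself reads parameter files, and eval should only be applied to files you trust.

```python
# Hypothetical helpers; the accepted formats are only those named in the
# corpusfmt comment of the minimal example and may not be exhaustive.
CORPUS_FORMATS = {'export', 'bracket', 'discbracket', 'alpino', 'tiger'}


def load_params(text):
    """Evaluate parameter-file text as keyword arguments to dict()."""
    # The closing parenthesis goes on its own line so that a trailing
    # comment in the file cannot swallow it.
    source = 'dict(\n' + text + '\n)'
    # Expose only dict(); no other builtins are available to the file.
    return eval(source, {'__builtins__': {}, 'dict': dict}, {})


def check_params(params):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    stages = params.get('stages', [])
    if not stages:
        problems.append('no stages defined')
    for n, stage in enumerate(stages):
        for key in ('name', 'mode'):
            if key not in stage:
                problems.append(f'stage {n} lacks {key!r}')
    fmt = params.get('corpusfmt')
    if fmt not in CORPUS_FORMATS:
        problems.append(f'unknown corpusfmt: {fmt!r}')
    for corpus in ('traincorpus', 'testcorpus'):
        if 'path' not in params.get(corpus, {}):
            problems.append(f'{corpus} has no path')
    return problems
```

For example, `check_params(load_params(open('sample.prm').read()))` returns an empty list when the checked fields look plausible. Because the format whitelist is taken from the comment above, the check may be stricter than discodop itself.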
See sample.prm in the code repository for a more extensive example. The file
proper.prm, also found there, is a version of the COLLINS.prm file typically
used with EVALB, adapted for discontinuous parsing. Make sure that all
referenced files are in the current directory or specified with a path, and
run:
$ discodop runexp sample.prm
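A small Python wrapper can check for the referenced files and then invoke the documented command line. The helper names are hypothetical, and the list of referenced files must be supplied by hand. Only the `discodop runexp <parameter file> [--rerun]` invocation comes from the usage line at the top of this page.

```python
import os
import subprocess


def missing_files(paths):
    """Return the given paths that do not exist (relative to the cwd)."""
    return [path for path in paths if not os.path.exists(path)]


def runexp_command(prmfile, rerun=False):
    """Build the documented command: discodop runexp <prm> [--rerun]."""
    cmd = ['discodop', 'runexp', prmfile]
    if rerun:
        cmd.append('--rerun')
    return cmd


def run_experiment(prmfile, referenced, rerun=False):
    """Check that files exist, then run discodop; raises on failure."""
    missing = missing_files([prmfile, *referenced])
    if missing:
        raise FileNotFoundError('missing: ' + ', '.join(missing))
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(runexp_command(prmfile, rerun), check=True)
```

For the minimal example this would be called as `run_experiment('sample.prm', ['proper.prm', 'ptb-02-21.mrg', 'ptb-24.mrg'])`.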
Parsing statistics
After running discodop runexp, a number of additional files are produced
with parsing statistics:
| File | Description |
|---|---|
| output.log | A log file with all messages displayed during parsing. This file contains ANSI codes for colors, so view it with … |
| pcdist.txt | Shows the distribution of parsing complexity (cf. Gildea, NAACL 2010, for the definition) among the grammar rules. |
| stats.tsv | |
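Because output.log contains ANSI color codes, a plain-text copy can be handy for searching or sharing. This sketch assumes the codes are standard CSI escape sequences (ESC, then '[', numeric parameters, and a final letter) and that the log is UTF-8 encoded. `plain_log` and its default output filename are illustrative and not produced by discodop.

```python
import re

# Matches ANSI CSI escape sequences such as color codes ("\x1b[1;31m").
ANSI_ESCAPE = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')


def strip_ansi(text):
    """Remove ANSI color/control sequences from a string."""
    return ANSI_ESCAPE.sub('', text)


def plain_log(inpath='output.log', outpath='output.plain.log'):
    """Write a copy of the log without ANSI codes (filenames are examples)."""
    with open(inpath, encoding='utf8', errors='replace') as inp:
        text = inp.read()
    with open(outpath, 'w', encoding='utf8') as out:
        out.write(strip_ansi(text))
```

For example, `strip_ansi('\x1b[1;31mERROR\x1b[0m done')` returns `'ERROR done'`.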