Parser parameters
A parser is defined by a sequence of stages, and a set of global options:
    stages=[
        stage1,
        stage2,
    ],
    corpusfmt='...',
    traincorpus=dict(...),
    testcorpus=dict(...),
    binarization=dict(...),
    key1=val1,
    key2=val2,
The parameters consist of a Python expression surrounded by an implicit
'dict(' and ')'. Note that each key=value is separated by a comma.
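To make the implicit wrapping concrete, the following is a minimal sketch of how such parameter text could be turned into a dictionary. The helper `read_params`, the stage name `'coarse'`, and the example values are hypothetical and only serve as illustration; this is not necessarily how the parser itself loads its parameters.

```python
def read_params(text):
    """Evaluate parameter text as if it were wrapped in dict( ... ).

    Hypothetical helper for illustration. Only the name `dict` is made
    available to the expression; this is a sketch, not a security sandbox.
    """
    return eval('dict(%s)' % text, {'__builtins__': {}, 'dict': dict})


# Illustrative parameter text; the values are placeholders.
example = """
stages=[
    dict(name='coarse'),
],
corpusfmt='...',
numproc=1,
"""

params = read_params(example)
print(sorted(params))
```

Because the text becomes the argument list of a `dict(...)` call, a missing comma between two `key=value` pairs is a syntax error, while a trailing comma after the last pair is harmless.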
Corpora
| corpusfmt: | The corpus format; choices: |
|---|---|
| traincorpus: | a dictionary with the following keys: |
| testcorpus: | a dictionary with the following keys: |
Binarization
| binarization: | a dictionary with the following keys: |
|---|---|
Stages
Through the use of stages it is possible to run multiple parsers on the same test set, or to exploit coarse-to-fine pruning.
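As a rough sketch of the coarse-to-fine case, the snippet below builds two stages in which the second prunes with the first, and checks that every `prune` value names an earlier stage. The stage names and the helper `check_prune_order` are hypothetical and only serve as illustration; the `name` and `prune` keys are described further down.

```python
def check_prune_order(stages):
    """Raise ValueError if a stage prunes with a stage not defined before it.

    Hypothetical helper for illustration; not part of the parser.
    """
    seen = set()
    for stage in stages:
        target = stage.get('prune')
        if target is not None and target not in seen:
            raise ValueError('stage %r prunes with unknown or later stage %r'
                             % (stage['name'], target))
        seen.add(stage['name'])


# Illustrative stage names; other keys are omitted for brevity.
stages = [
    dict(name='coarse'),
    dict(name='fine', prune='coarse'),
]
check_prune_order(stages)  # passes: 'coarse' precedes 'fine'
```

Reversing the two stages would make the check fail, since `'fine'` would then refer to a stage that has not been run yet.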
A stage has the form:
    dict(
        key1=val1,
        key2=val2,
        ...
    )
Where the keys and values are:
| name: | identifier, used for filenames |
|---|---|
| mode: | The type of parser to use |
| prune: | specify the name of a previous stage to enable coarse-to-fine pruning. |
| split: | split discontinuous nodes |
| splitprune: | treat |
| markorigin: | mark origin of split nodes: |
| k: | pruning parameter: |
| kbest: | extract m-best derivations from chart |
| sample: | sample m derivations from chart |
| m: | number of derivations to sample / enumerate. |
| binarized: | when using |
| dop: | enable DOP mode: |
| estimator: | DOP estimator. Choices: |
| objective: | Objective function to choose DOP parse tree. Choices: |
| sldop_n: | When using sl-dop or sl-dop-simple, number of most likely parse trees to consider. |
| maxdepth: | with |
| maxfrontier: | with |
| collapse: | apply a multilevel coarse-to-fine preset. values are of the form |
| packedgraph: | use packed graph encoding for DOP reduction |
| iterate: | for Double-DOP, whether to add fragments of fragments |
| complement: | for Double-DOP, whether to include fragments which form the complement of the maximal recurring fragments extracted |
| neverblockre: | do not prune nodes whose label matches this regex |
| estimates: | compute, store & use context-summary (outside) estimates |
| beam_beta: | beam pruning factor, between 0 and 1; 1 to disable. If enabled, new constituents must have a larger probability than the probability of the best constituent in a cell multiplied by this factor; i.e., a smaller value implies less pruning. Suggested value: |
| beam_delta: | if beam pruning is enabled, only apply it to spans up to this length. |
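The interaction of `beam_beta` and `beam_delta` can be sketched as a single test on a candidate constituent. This is a minimal illustration of the rule as described above, assuming plain (non-logarithmic) probabilities; the function `passes_beam` and the numbers used are hypothetical and are not taken from the parser.

```python
def passes_beam(prob, best_in_cell, span_length, beam_beta, beam_delta):
    """Return True if a new constituent survives beam pruning.

    Hypothetical helper for illustration; assumes plain probabilities.
    prob          probability of the new constituent
    best_in_cell  probability of the best constituent in the same cell
    span_length   number of words covered by the constituent
    """
    if beam_beta == 1:
        return True  # a factor of 1 disables beam pruning
    if span_length > beam_delta:
        return True  # pruning only applies to spans up to beam_delta
    return prob > best_in_cell * beam_beta


# Arbitrary numbers: with a factor of 0.01 the threshold is 0.5 * 0.01 = 0.005.
print(passes_beam(0.01, 0.5, span_length=5, beam_beta=0.01, beam_delta=40))
print(passes_beam(0.001, 0.5, span_length=5, beam_beta=0.01, beam_delta=40))
```

Lowering `beam_beta` lowers the threshold and so lets more constituents through, which matches the remark that a smaller value implies less pruning.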
Other options
| evalparam: | EVALB-style parameter file to use for reporting F-scores |
|---|---|
| postagging: | To disable POS tagging and use the gold POS tags from the test set, set this to |
| punct: | one of ... |
| functions: | one of ... |
| morphology: | one of ... |
| lemmas: | one of ... |
| removeempty: | |
| ensureroot: | Ensure every tree has a root node with this label |
| transformations: | Apply specific treebank transforms; available presets: |
| relationalrealizational: | apply RR-transform; see |
| verbosity: | control the amount of output to console; a logfile |
| numproc: | default 1; increase to use multiple CPUs; |