|
--inputfmt=<export|bracket|discbracket|alpino|alpinocompact|tiger|ftb> |
| | Input treebank format [default: export]. |
|
--outputfmt=<export|bracket|discbracket|conll|mst|tokens|wordpos> |
| | Output treebank format [default: export].
Selecting the formats conll or mst invokes a dependency
conversion and requires the use of heuristic head rules
(--headrules), to ensure that all constituents have a child
marked as head; labels are based on function tags (if any). |
|
--fmt=x |
Shortcut to specify both input and output format. |
|
--inputenc, --outputenc, --enc=<utf-8|iso-8859-1|…> |
| | Treebank encoding [default: utf-8]. |
|
--slice=<n:m> |
select a range of sentences from input starting with n,
up to but not including m; as in Python, n or m can be left
out or negative, and the first index is 0. |
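
The range follows Python slice conventions, so its effect can be sketched in a few lines of Python. The helper parse_slice and the sample sentence list below are hypothetical and serve only as an illustration; they are not part of the tool itself.

```python
def parse_slice(spec):
    """Turn an 'n:m' string into a Python slice; an empty part means 'no bound'."""
    start, _, stop = spec.partition(":")
    return slice(int(start) if start else None,
                 int(stop) if stop else None)


# Stand-in for the sentences of a treebank; the first index is 0.
sentences = ["s0", "s1", "s2", "s3", "s4"]

middle = sentences[parse_slice("1:3")]    # starts at 1, stops before 3
first_two = sentences[parse_slice(":2")]  # n left out
last_two = sentences[parse_slice("-2:")]  # negative n, m left out
```

Under these assumptions, "1:3" selects the second and third sentences, and "-2:" selects the last two.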
|
--renumber |
Replace sentence IDs with numbers starting from 1,
padded with 8 spaces. |
|
--sentid |
With ‘tokens’ or ‘wordpos’ output format, prefix lines with identifiers of the form ID|. |
|
--maxlen=n |
only select sentences with up to n tokens. |
|
--punct=x |
| ‘remove’: | remove any punctuation. |
| ‘move’: | re-attach punctuation to nearest constituent
to minimize discontinuity. |
| ‘restore’: | attach punctuation under root node. |
|
|
--functions=x |
| ‘leave’ (default): | leave syntactic labels as is. |
| ‘remove’: | strip away hyphen-separated function labels. |
| ‘add’: | concatenate syntactic categories with functions. |
| ‘replace’: | replace syntactic labels with grammatical functions. |
|
|
--morphology=x |
| ‘no’ (default): | use POS tags as preterminals |
| ‘add’: | concatenate morphological information to POS tags,
e.g., DET/sg.def |
| ‘replace’: | use morphological information as preterminal label |
| ‘between’: | insert node with morphological information between
POS tag and word, e.g., (DET (sg.def the)) |
|
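
The four --morphology modes can be illustrated with a small function that builds the bracketed preterminal for a single token. This is a minimal sketch, not the tool's implementation: the function name preterminal and its arguments are hypothetical, and the output shapes for ‘add’ and ‘between’ are taken from the examples given above.

```python
def preterminal(pos, morph, word, mode="no"):
    """Return a bracketed preterminal for one token under a given mode."""
    if mode == "no":
        # Use the POS tag as preterminal label.
        return "(%s %s)" % (pos, word)
    if mode == "add":
        # Concatenate morphological information to the POS tag.
        return "(%s/%s %s)" % (pos, morph, word)
    if mode == "replace":
        # Use the morphological information as preterminal label.
        return "(%s %s)" % (morph, word)
    if mode == "between":
        # Insert a node with morphology between POS tag and word.
        return "(%s (%s %s))" % (pos, morph, word)
    raise ValueError("unknown mode: %r" % mode)


results = {mode: preterminal("DET", "sg.def", "the", mode)
           for mode in ("no", "add", "replace", "between")}
```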
|
--lemmas=x |
| ‘no’ (default): | do not use lemmas. |
| ‘add’: | concatenate lemmas to terminals, e.g., word/lemma |
| ‘replace’: | use lemma instead of terminals |
| ‘between’: | insert node with lemma between POS tag and word,
e.g., (NN (man men)) |
|
|
--ensureroot=x |
add root node labeled x to trees if not already present. |
|
--removeempty |
remove empty / -NONE- terminals. |
|
--factor=<left|right> |
| | specify left- or right-factored binarization [default: right]. |
|
-h n |
horizontal markovization [default: infinite (all siblings)]. |
|
-v n |
vertical markovization [default: 1 (immediate parent only)]. |
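
To make the effect of horizontal markovization concrete, the following sketch performs a right-factored binarization of a single production and limits how many sibling labels each auxiliary label remembers. It is an illustration under stated assumptions: the function binarize_right is hypothetical, the auxiliary label format "A|<B,C>" is one common convention and not necessarily what the tool emits, and vertical markovization and head rules are left out.

```python
def binarize_right(label, children, h=None):
    """Right-factored binarization of one production.

    label: parent label; children: list of child labels;
    h: number of upcoming sibling labels kept in each auxiliary label
       (None means all remaining siblings, i.e. no markovization).
    Returns a list of (parent, children) rules.
    """
    rules = []
    parent = label
    for i, child in enumerate(children):
        remaining = children[i + 1:]
        if len(remaining) <= 1:
            # At most two children left: emit the final rule as is.
            rules.append((parent, tuple(children[i:])))
            break
        context = remaining if h is None else remaining[:h]
        aux = "%s|<%s>" % (label, ",".join(context))
        rules.append((parent, (child, aux)))
        parent = aux
    return rules


full = binarize_right("VP", ["V", "NP", "PP", "ADV"])
short = binarize_right("VP", ["V", "NP", "PP", "ADV"], h=1)
```

With h=1 the auxiliary labels only record the next sibling, so productions with different tails share auxiliary nodes; this is the generalization that a small h is meant to buy.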
|
--headrules=x |
turn on head finding and head-outward binarization;
reads head rules from file x (e.g., “negra.headrules”). |
|
--markhead |
include label of the head child in all auxiliary labels
of binarization. |
|
--direction |
mark direction when using head-outward binarization. |
|
--labelfun=x |
x is a Python lambda function that takes a node and returns
a label to be used for markovization purposes. For example,
to get labels without state splits, pass this function:
'lambda n: n.label.split("^")[0]' |
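
The example function can be tried out on its own. The Node class below is a hypothetical stand-in with only a label attribute, assumed here for illustration; the tool's real node objects are not shown in this section.

```python
class Node:
    """Minimal stand-in for a tree node; only the label attribute is modelled."""

    def __init__(self, label):
        self.label = label


# The function from the option description: drop everything after the first "^".
labelfun = lambda n: n.label.split("^")[0]

labels = [labelfun(Node(x)) for x in ["NP^S", "VP", "PP^VP^S"]]
```

Labels without a "^" are returned unchanged, since split then yields a single element.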
|
--leftunary |
make the initial production of binarized constituents a unary production. |
|
--rightunary |
make the final production of binarized constituents a unary production. |
|
--tailmarker |
mark rightmost child (the head if headrules are applied), to
avoid cyclic rules when --leftunary and --rightunary
are used. |