discodop.gen
Generate random sentences with an LCFRS.
Reads grammar from a text file.
Functions
arraytoyf(args, lengths) | Inverse of yfarray().
chooserule(rules, discount, prodcounts) | Given a list of weighted rules, choose one following the distribution.
compose(rule, left, right, verbose) | Use rule to compose two non-terminals into a new non-terminal.
gen(grammar[, start, discount, prodcounts, …]) | Generate a random sentence in top-down fashion.
main() | Load a grammar from a text file and generate 20 sentences.
parsefrac(a) | Parse a string of a fraction into a float (‘1/2’ => 0.5).
read_bitpar_grammar(rules, lexicon) | Read a bitpar grammar given two file objects.
read_lcfrs_grammar(rules, lexicon) | Read a grammar produced by grammar.writegrammar from two file objects.
splitgrammar(rules) | Split a grammar into various lookup tables.
test() | Demonstration on an example grammar.
yfarray(yf) | Convert yield function represented as 2D sequence to an array object.
Classes
Grammar(numrules, unary, lbinary, rbinary, …) | Create new instance of Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)
LexicalRule(lhs, rhs1, rhs2, word, prob, no) | Create new instance of LexicalRule(lhs, rhs1, rhs2, word, prob, no)
Rule(lhs, rhs1, rhs2, args, lengths, prob, no) | Create new instance of Rule(lhs, rhs1, rhs2, args, lengths, prob, no)
class discodop.gen.Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)
Create new instance of Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)
- bylhs: Alias for field number 4
- fanout: Alias for field number 9
- lbinary: Alias for field number 2
- lexicalbylhs: Alias for field number 6
- lexicalbyword: Alias for field number 5
- numrules: Alias for field number 0
- rbinary: Alias for field number 3
- toid: Alias for field number 7
- tolabel: Alias for field number 8
- unary: Alias for field number 1
class discodop.gen.Rule(lhs, rhs1, rhs2, args, lengths, prob, no)
Create new instance of Rule(lhs, rhs1, rhs2, args, lengths, prob, no)
- args: Alias for field number 3
- lengths: Alias for field number 4
- lhs: Alias for field number 0
- no: Alias for field number 6
- prob: Alias for field number 5
- rhs1: Alias for field number 1
- rhs2: Alias for field number 2
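The "Alias for field number N" entries match the docstrings that Python's collections.namedtuple generates, which suggests a Rule can be read both by field name and by position. A minimal stand-in sketch, assuming a namedtuple with the field order documented above; the values are made up for illustration and are not taken from a real grammar:

```python
from collections import namedtuple

# Stand-in with the documented field order; NOT imported from discodop.
Rule = namedtuple('Rule', 'lhs rhs1 rhs2 args lengths prob no')

# Hypothetical values, chosen only to demonstrate field access.
rule = Rule(lhs=1, rhs1=2, rhs2=3, args=0b10, lengths=0b10, prob=0.5, no=0)

print(rule.prob)  # access by field name
print(rule[5])    # the same field, by position ("field number 5")
# namedtuples are immutable; _replace returns a modified copy.
print(rule._replace(prob=0.25))
```

Positional access is what the field numbers refer to; access by name is usually the more readable choice.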
class discodop.gen.LexicalRule(lhs, rhs1, rhs2, word, prob, no)
Create new instance of LexicalRule(lhs, rhs1, rhs2, word, prob, no)
- lhs: Alias for field number 0
- no: Alias for field number 5
- prob: Alias for field number 4
- rhs1: Alias for field number 1
- rhs2: Alias for field number 2
- word: Alias for field number 3
discodop.gen.gen(grammar, start=1, discount=0.75, prodcounts=None, verbose=False)
Generate a random sentence in top-down fashion.
Parameters: discount – a factor between 0 and 1.0; 1.0 means no discount, and lower values discount rules that have already been used increasingly strongly. Cf. http://eli.thegreenplace.net/2010/01/28/generating-random-sentences-from-a-context-free-grammar/
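The effect of the discount can be illustrated on a plain context-free grammar, following the approach of the linked article. This is a sketch under stated assumptions, not this module's implementation: the toy grammar, the name generate and the dict-based grammar format are all hypothetical, and LCFRS yield functions are left out entirely.

```python
import random
from collections import defaultdict

# Hypothetical toy CFG: nonterminal -> list of (right-hand side, weight).
# Symbols that are not keys of the dict are treated as terminals.
TOY_GRAMMAR = {
    'S': [(('NP', 'VP'), 1.0)],
    'NP': [(('the', 'N'), 0.7), (('NP', 'PP'), 0.3)],
    'VP': [(('V', 'NP'), 0.6), (('VP', 'PP'), 0.4)],
    'PP': [(('with', 'NP'), 1.0)],
    'N': [(('cat',), 0.5), (('dog',), 0.5)],
    'V': [(('sees',), 1.0)],
}


def generate(grammar, symbol, discount=0.75, counts=None, rng=random):
    """Expand `symbol` top-down and return a list of terminal words."""
    if counts is None:
        counts = defaultdict(int)
    if symbol not in grammar:  # terminal symbol
        return [symbol]
    rules = grammar[symbol]
    # Every use of a rule on the current derivation path multiplies its
    # weight by `discount`, so recursive rules become less likely the
    # deeper the derivation gets. With discount=1.0 nothing changes.
    weights = [weight * discount ** counts[symbol, rhs]
               for rhs, weight in rules]
    rhs, _ = rng.choices(rules, weights=weights)[0]
    counts[symbol, rhs] += 1
    words = []
    for child in rhs:
        words.extend(generate(grammar, child, discount, counts, rng))
    counts[symbol, rhs] -= 1  # backtrack: count only uses on this path
    return words


print(' '.join(generate(TOY_GRAMMAR, 'S', rng=random.Random(0))))
```

Lowering discount toward 0 makes long, deeply recursive sentences rarer, because a rule that is already in use on the current path is penalized again each time it is reused.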
discodop.gen.chooserule(rules, discount, prodcounts)
Given a list of weighted rules, choose one following the distribution.
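Choosing an item "following the distribution" is commonly done by drawing a point in the total weight and walking the cumulative sum. The sketch below shows only that sampling step; it ignores discount and prodcounts, and the name choose_weighted and its rand parameter are hypothetical, not part of this module.

```python
import random


def choose_weighted(items, weights, rand=random.random):
    """Pick one item with probability proportional to its weight."""
    total = sum(weights)
    threshold = rand() * total  # a point in [0, total)
    running = 0.0
    for item, weight in zip(items, weights):
        running += weight
        if threshold < running:
            return item
    return items[-1]  # guard against floating-point rounding


# 'b' has three times the weight of 'a', so it is picked about 75% of the time.
picks = [choose_weighted(['a', 'b'], [1.0, 3.0]) for _ in range(1000)]
print(picks.count('b') / len(picks))
```

Passing the random source as a parameter keeps the function easy to test with fixed values.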
discodop.gen.compose(rule, left, right, verbose)
Use rule to compose two non-terminals into a new non-terminal.
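In an LCFRS, composing two non-terminals means interleaving their (possibly discontinuous) yields as directed by the rule's yield function. The sketch below assumes a yield function given as a 2D sequence of 0s and 1s, where 0 takes the next component of the left child and 1 the next component of the right child. The function name and the list-of-lists representation of yields are hypothetical and may differ from what compose() uses internally.

```python
def compose_yields(yf, left, right):
    """Interleave the components of `left` and `right` as directed by `yf`.

    yf: sequence of sequences of 0/1, one inner sequence per component
        of the result. 0 takes the next unused component of `left`,
        1 takes the next unused component of `right`.
    left, right: lists of components; each component is a list of words.
    """
    sources = (iter(left), iter(right))
    result = []
    for component in yf:
        words = []
        for bit in component:
            words.extend(next(sources[bit]))
        result.append(words)
    return result


# A discontinuous left child with two components wraps around the right child.
left = [['what'], ['eat']]
right = [['should', 'I']]
print(compose_yields(((0, 1, 0),), left, right))
```

Because each child's components are consumed strictly in order, this sketch covers only monotone rules.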
discodop.gen.parsefrac(a)
Parse a string of a fraction into a float (‘1/2’ => 0.5).
Substitute for creating Fraction objects (which is slow).
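A minimal sketch of the idea, written independently of the module's source: split on the slash and divide as floats. The name parse_fraction is hypothetical, and accepting a plain number without a slash is an assumption of this sketch.

```python
from fractions import Fraction


def parse_fraction(text):
    """Return the float value of 'a/b', or of a plain number like '0.25'."""
    numerator, slash, denominator = text.partition('/')
    if not slash:  # no '/' present: treat the text as a plain number
        return float(text)
    return float(numerator) / float(denominator)


print(parse_fraction('1/2'))
# Same value as going through a Fraction object, without constructing one.
print(float(Fraction('1/2')))
```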
discodop.gen.read_lcfrs_grammar(rules, lexicon)
Read a grammar produced by grammar.writegrammar from two file objects.
discodop.gen.read_bitpar_grammar(rules, lexicon)
Read a bitpar grammar given two file objects.
Must be a binarized grammar. Integer frequencies will be converted to exact relative frequencies; otherwise weights are kept as-is.
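Converting integer frequencies to relative frequencies amounts to dividing each rule's count by the total count of all rules with the same left-hand side. The sketch below works on an in-memory dict rather than on bitpar files. The function name and the counts are hypothetical, and Fraction is used here only to keep the example's arithmetic exact; the module's own representation may differ.

```python
from collections import defaultdict
from fractions import Fraction


def relative_frequencies(counted_rules):
    """Map (lhs, rhs) -> count / total count of all rules with that lhs."""
    totals = defaultdict(int)
    for (lhs, _rhs), count in counted_rules.items():
        totals[lhs] += count
    return {(lhs, rhs): Fraction(count, totals[lhs])
            for (lhs, rhs), count in counted_rules.items()}


# Hypothetical counts, not taken from a real bitpar grammar.
counts = {
    ('NP', ('DT', 'NN')): 3,
    ('NP', ('NP', 'PP')): 1,
    ('VP', ('VB', 'NP')): 2,
}
probs = relative_frequencies(counts)
for rule, prob in probs.items():
    print(rule, prob)
```

After normalization, the weights of all rules sharing a left-hand side sum to exactly 1.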
discodop.gen.splitgrammar(rules)
Split a grammar into various lookup tables.
Also maps nonterminal labels to numeric identifiers, and turns probabilities into negative log-probabilities. Can only represent binary, monotone LCFRS rules.
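The two transformations described here can be sketched as follows, assuming rules arrive as ((lhs, rhs1, rhs2), probability) pairs with string labels. The name build_tables, the input format, and the order in which identifiers are assigned are assumptions of this sketch; the real function returns a Grammar with more tables than shown here.

```python
from collections import defaultdict
from math import log


def build_tables(rules):
    """Return (toid, tolabel, bylhs) for binary rules with string labels."""
    toid, tolabel = {}, {}

    def label_id(label):
        # Assign the next free integer the first time a label is seen.
        if label not in toid:
            toid[label] = len(toid)
            tolabel[toid[label]] = label
        return toid[label]

    bylhs = defaultdict(list)
    for (lhs, rhs1, rhs2), prob in rules:
        # Store -log(p): multiplying probabilities becomes adding costs.
        entry = (label_id(lhs), label_id(rhs1), label_id(rhs2), -log(prob))
        bylhs[entry[0]].append(entry)
    return toid, tolabel, dict(bylhs)


# Hypothetical two-rule grammar.
toid, tolabel, bylhs = build_tables([
    (('S', 'NP', 'VP'), 1.0),
    (('NP', 'NP', 'PP'), 0.5),
])
print(toid)
print(bylhs[toid['NP']])
```

Working with negative log-probabilities avoids floating-point underflow when many small probabilities are combined, and lower values then correspond to more probable rules.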