discodop.gen

Generate random sentences with an LCFRS.

Reads grammar from a text file.

Functions

arraytoyf(args, lengths) Inverse of yfarray().
chooserule(rules, discount, prodcounts) Given a list of weighted rules, choose one following the distribution.
compose(rule, left, right, verbose) Use rule to compose two non-terminals into a new non-terminal.
gen(grammar[, start, discount, prodcounts, …]) Generate a random sentence in top-down fashion.
main() Load a grammar from a text file and generate 20 sentences.
parsefrac(a) Parse a string of a fraction into a float ('1/2' => 0.5).
read_bitpar_grammar(rules, lexicon) Read a bitpar grammar given two file objects.
read_lcfrs_grammar(rules, lexicon) Read a grammar produced by grammar.writegrammar from two file objects.
splitgrammar(rules) Split a grammar into various lookup tables.
test() Demonstration on an example grammar.
yfarray(yf) Convert yield function represented as 2D sequence to an array object.

Classes

Grammar(numrules, unary, lbinary, rbinary, …) Create new instance of Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)
LexicalRule(lhs, rhs1, rhs2, word, prob, no) Create new instance of LexicalRule(lhs, rhs1, rhs2, word, prob, no)
Rule(lhs, rhs1, rhs2, args, lengths, prob, no) Create new instance of Rule(lhs, rhs1, rhs2, args, lengths, prob, no)
class discodop.gen.Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)

Create new instance of Grammar(numrules, unary, lbinary, rbinary, bylhs, lexicalbyword, lexicalbylhs, toid, tolabel, fanout)

bylhs

Alias for field number 4

fanout

Alias for field number 9

lbinary

Alias for field number 2

lexicalbylhs

Alias for field number 6

lexicalbyword

Alias for field number 5

numrules

Alias for field number 0

rbinary

Alias for field number 3

toid

Alias for field number 7

tolabel

Alias for field number 8

unary

Alias for field number 1

class discodop.gen.Rule(lhs, rhs1, rhs2, args, lengths, prob, no)

Create new instance of Rule(lhs, rhs1, rhs2, args, lengths, prob, no)

args

Alias for field number 3

lengths

Alias for field number 4

lhs

Alias for field number 0

no

Alias for field number 6

prob

Alias for field number 5

rhs1

Alias for field number 1

rhs2

Alias for field number 2
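The "Alias for field number N" entries are the docstrings that collections.namedtuple generates for its fields, which suggests that these classes are named tuples. The sketch below uses a stand-in Rule with the documented field order rather than importing discodop, and all field values are made up for illustration.

```python
from collections import namedtuple

# Stand-in that mirrors the documented signature; not imported from discodop.
Rule = namedtuple('Rule', 'lhs rhs1 rhs2 args lengths prob no')

# Hypothetical values, chosen only to show field access.
rule = Rule(lhs=1, rhs1=2, rhs2=3, args=0b10, lengths=0b10, prob=0.5, no=0)

# Each field is reachable by name or by its field number.
by_name = rule.prob
by_index = rule[5]

# Named tuples are immutable; _replace returns a modified copy.
updated = rule._replace(prob=0.25)
```

Access by name and access by index return the same object, so code that unpacks a rule positionally keeps working alongside code that uses attribute names.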

class discodop.gen.LexicalRule(lhs, rhs1, rhs2, word, prob, no)

Create new instance of LexicalRule(lhs, rhs1, rhs2, word, prob, no)

lhs

Alias for field number 0

no

Alias for field number 5

prob

Alias for field number 4

rhs1

Alias for field number 1

rhs2

Alias for field number 2

word

Alias for field number 3

discodop.gen.gen(grammar, start=1, discount=0.75, prodcounts=None, verbose=False)

Generate a random sentence in top-down fashion.

Parameters: discount – a factor between 0 and 1.0; 1.0 means no discount, and lower values apply an increasingly large discount to rules that have already been used.

Cf. http://eli.thegreenplace.net/2010/01/28/generating-random-sentences-from-a-context-free-grammar/
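One way to read the discount parameter, following the approach in the article cited above, is that a rule's weight is multiplied by the discount once for every earlier use of that rule. This is a minimal sketch under that assumption; discounted_weight is a hypothetical helper and may not match what gen() does internally.

```python
def discounted_weight(prob, times_used, discount=0.75):
    """Scale a rule's weight down geometrically with each earlier use.

    Assumption: weight = prob * discount ** times_used.
    """
    return prob * discount ** times_used


# A rule with probability 0.5, before and after being used twice.
fresh = discounted_weight(0.5, times_used=0)
reused = discounted_weight(0.5, times_used=2)

# With discount=1.0 the weight never changes, i.e. no discount.
undiscounted = discounted_weight(0.5, times_used=10, discount=1.0)
```

Under this reading, recursive rules become progressively less likely within one derivation, which keeps generated sentences from growing without bound.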

discodop.gen.chooserule(rules, discount, prodcounts)

Given a list of weighted rules, choose one following the distribution.
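Choosing an item "following the distribution" can be done by drawing a point in the total weight mass and walking the cumulative sum until it is passed. The sketch below illustrates that general technique; weighted_choice and its rng parameter are hypothetical names, and the actual chooserule() may be implemented differently.

```python
import random


def weighted_choice(items, weights, rng=random):
    """Pick one item with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError('weights must sum to a positive number')
    threshold = rng.random() * total
    cumulative = 0.0
    for item, weight in zip(items, weights):
        cumulative += weight
        if threshold < cumulative:
            return item
    return items[-1]  # only reached through floating-point rounding


# Hypothetical rules; a zero weight means the rule can never be drawn.
rules = ['NP -> Det N', 'NP -> N', 'NP -> NP PP']
picked = weighted_choice(rules, [0.0, 1.0, 0.0], rng=random.Random(0))
```

Passing a seeded random.Random instance makes a run reproducible, which is convenient when comparing generated sentences across grammar versions.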

discodop.gen.compose(rule, left, right, verbose)

Use rule to compose two non-terminals into a new non-terminal.

discodop.gen.parsefrac(a)

Parse a string of a fraction into a float ('1/2' => 0.5).

Substitute for creating Fraction objects (which is slow).
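The conversion can be sketched by splitting on the slash and dividing. parse_fraction_sketch is a hypothetical stand-in, not the library function, and its handling of strings without a slash is an assumption.

```python
def parse_fraction_sketch(text):
    """Convert 'numerator/denominator' to a float without building a Fraction."""
    numerator, slash, denominator = text.partition('/')
    if not slash:
        # Assumption: a string without '/' is treated as a plain number.
        return float(text)
    return float(numerator) / float(denominator)


half = parse_fraction_sketch('1/2')
three_quarters = parse_fraction_sketch('3/4')
```

Plain float division skips the normalization that fractions.Fraction performs on construction, at the cost of exactness.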

discodop.gen.read_lcfrs_grammar(rules, lexicon)

Read a grammar produced by grammar.writegrammar from two file objects.

discodop.gen.read_bitpar_grammar(rules, lexicon)

Read a bitpar grammar given two file objects.

Must be a binarized grammar. Integer frequencies will be converted to exact relative frequencies; otherwise weights are kept as-is.
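Converting integer frequencies to exact relative frequencies can be illustrated with fractions.Fraction. The sketch assumes that counts are normalized over all rules sharing a left-hand side, as is usual for probabilistic grammars; the (lhs, rhs, count) triple layout and the function name are made up for this example and are not the bitpar file format.

```python
from collections import defaultdict
from fractions import Fraction


def to_relative_frequencies(counted_rules):
    """Turn (lhs, rhs, count) triples into (lhs, rhs, Fraction) triples.

    Assumption: each count is divided by the total count of its left-hand side.
    """
    totals = defaultdict(int)
    for lhs, _rhs, count in counted_rules:
        totals[lhs] += count
    return [(lhs, rhs, Fraction(count, totals[lhs]))
            for lhs, rhs, count in counted_rules]


# Hypothetical counts for three rules.
counted = [('S', ('NP', 'VP'), 3), ('S', ('VP',), 1), ('NP', ('N',), 2)]
weighted = to_relative_frequencies(counted)
```

Because Fraction arithmetic is exact, the weights for each left-hand side sum to exactly 1 rather than to a value that is off by rounding error.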

discodop.gen.splitgrammar(rules)

Split a grammar into various lookup tables.

Also maps nonterminal labels to numeric identifiers, and turns probabilities into negative log-probabilities. Can only represent binary, monotone LCFRS rules.
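The two conversions mentioned here can be sketched independently of the lookup tables. Both helpers are hypothetical, and numbering labels from 0 in order of first appearance is an assumption; splitgrammar() may assign identifiers differently.

```python
import math


def label_ids(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    toid = {}
    for label in labels:
        toid.setdefault(label, len(toid))
    tolabel = {ident: label for label, ident in toid.items()}
    return toid, tolabel


def neglogprob(prob):
    """Negative natural log of a probability; lower cost means more likely."""
    return -math.log(prob)


toid, tolabel = label_ids(['S', 'NP', 'VP', 'NP'])
cost = neglogprob(0.5)
recovered = math.exp(-cost)  # back to the original probability
```

With negative log-probabilities, the cost of a derivation is the sum of its rule costs instead of a product of many small numbers, which avoids floating-point underflow.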

discodop.gen.yfarray(yf)

Convert yield function represented as 2D sequence to an array object.

discodop.gen.arraytoyf(args, lengths)

Inverse of yfarray().