discodop.util

Miscellaneous code to avoid cyclic imports.

Functions

genericcompressor(cmd, filename[, encoding, …]) Run command line compressor on file and return file object.
genericdecompressor(cmd, filename[, encoding]) Run command line decompressor on file and return file object.
graphemecenter(text, width[, fillchar]) Return text centered in a string of grapheme length width (not len()).
graphemelength(text) Return number of graphemes in string.
merge(*iterables[, key]) Generator that performs an n-way merge of sorted iterables.
openread(filename[, encoding]) Open stdin/file for reading; decompress gz/lz4/zst files on-the-fly.
readbytes(filename) Read bytes from stdin/file; decompress gz/lz4/zst files on-the-fly.
run(*popenargs, **kwargs) Run command with arguments and return (returncode, stdout, stderr).
slice_bounds(seq, slice_obj[, allow_step]) Calculate the effective (start, stop) bounds of a slice.
tokenize(text) A basic tokenizer following English/French PTB/FTB conventions.
which(program[, exception]) Return first match for program in search path.
workerfunc(func) Wrap a multiprocessing worker function to produce a full traceback.

Classes

Entry(key, value, count) A PyAgenda entry.
OrderedSet([iterable]) A frozen, ordered set which maintains a regular list/tuple and set.
PyAgenda([iterable]) Priority Queue implemented with array-based heap.

discodop.util.which(program, exception=True)

Return first match for program in search path.

Parameters: exception – By default, ValueError is raised when program is not found. Pass False to return None instead.
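
The lookup behaviour described above can be sketched with the standard library's shutil.which. The name which_sketch is hypothetical, and this is an illustration under that assumption rather than the module's actual implementation.

```python
import shutil


def which_sketch(program, exception=True):
    """Return the first match for program in the search path.

    Hypothetical stand-in for illustration; relies on shutil.which.
    """
    path = shutil.which(program)
    if path is None and exception:
        # Mirrors the documented default: a missing program is an error.
        raise ValueError('%r not found in search path' % program)
    return path


# With exception=False a missing program yields None instead of raising.
missing = which_sketch('no-such-program-xyz-12345', exception=False)
```

Returning None is convenient when the program is optional, while the default of raising suits callers that cannot continue without it.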

discodop.util.workerfunc(func)

Wrap a multiprocessing worker function to produce a full traceback.
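
Exceptions raised in a worker process normally reach the parent without the worker's own stack trace. One way to preserve it, sketched here under the assumption that the wrapper re-raises with the formatted traceback as the message, is a decorator like the following. The names workerfunc_sketch and divide are hypothetical.

```python
import functools
import traceback


def workerfunc_sketch(func):
    """Wrap func so that any exception carries the full traceback text."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() captures the traceback as it looked inside the
            # worker; put it in the message so it survives pickling.
            raise RuntimeError(traceback.format_exc()) from None
    return wrapper


@workerfunc_sketch
def divide(x):
    """Toy worker function used only for this example."""
    return 1 // x
```

Calling divide(0) raises a RuntimeError whose message contains the original ZeroDivisionError traceback, which is the information that would otherwise be lost across the process boundary.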

discodop.util.genericdecompressor(cmd, filename, encoding='utf8')

Run command line decompressor on file and return file object.

Parameters:
  • cmd – executable in path with gzip-like command line interface; e.g., gzip, zstd, lz4, bzip2, lzop
  • filename – the file to decompress.
  • encoding – if None, mode is binary; otherwise, text.
Raises: ValueError – if command returns an error.
Returns: a file-like object that must be used in a with-statement; supports .read() and iteration, but not seeking.

discodop.util.genericcompressor(cmd, filename, encoding='utf8', compresslevel=8)

Run command line compressor on file and return file object.

Parameters:
  • cmd – executable in path with gzip-like command line interface; e.g., gzip, zstd, lz4, bzip2, lzop
  • filename – the compressed output file.
  • encoding – if None, mode is binary; otherwise, text.
Raises: ValueError – if command returns an error.
Returns: a file-like object that must be used in a with-statement; supports .write() but not seeking.

discodop.util.openread(filename, encoding='utf8')

Open stdin/file for reading; decompress gz/lz4/zst files on-the-fly.

Parameters: encoding – if None, mode is binary; otherwise, text.
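
The idea of choosing a reader from the file name can be sketched with the standard library alone. This sketch makes several assumptions that are not stated above: it handles only the gz case through the gzip module (lz4 and zst would need external tools or third-party packages), it treats the file name '-' as stdin, and the name openread_sketch is hypothetical.

```python
import gzip
import os
import sys
import tempfile


def openread_sketch(filename, encoding='utf8'):
    """Open stdin or a file for reading; decompress .gz files on the fly."""
    mode = 'rb' if encoding is None else 'rt'
    if filename == '-':  # assumption: '-' denotes stdin
        return sys.stdin.buffer if encoding is None else sys.stdin
    if filename.endswith('.gz'):
        return gzip.open(filename, mode, encoding=encoding)
    return open(filename, mode, encoding=encoding)


# Usage: write a small gzip file, then read it back as text and as bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt.gz')
    with gzip.open(path, 'wt', encoding='utf8') as out:
        out.write('hello')
    with openread_sketch(path) as inp:
        text = inp.read()
    with openread_sketch(path, encoding=None) as inp:
        raw = inp.read()
```

Passing encoding=None switches to binary mode, matching the parameter description above.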

discodop.util.readbytes(filename)

Read bytes from stdin/file; decompress gz/lz4/zst files on-the-fly.

discodop.util.slice_bounds(seq, slice_obj, allow_step=False)

Calculate the effective (start, stop) bounds of a slice.

Takes into account None indices and negative indices.

Parameters: allow_step – If true, the slice object may have a non-None step, in which case the returned tuple is (start, stop, step).
Returns: tuple (start, stop, 1), such that 0 <= start <= stop <= len(seq)
Raises: ValueError – if slice_obj.step is not None and allow_step is false.
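
The normalisation of None and negative indices can be illustrated with the built-in slice.indices method. This is a sketch, not the module's code: slice_bounds_sketch is a hypothetical name, and as a simplifying assumption it accepts only positive steps when allow_step is true.

```python
def slice_bounds_sketch(seq, slice_obj, allow_step=False):
    """Return (start, stop, step) with 0 <= start <= stop <= len(seq)."""
    step = slice_obj.step
    if step is not None and not allow_step:
        raise ValueError('slices with steps are not supported')
    if step is not None and step <= 0:
        # Simplification of this sketch: negative steps are not handled.
        raise ValueError('this sketch only handles positive steps')
    # slice.indices() resolves None and negative indices and clamps the
    # result to the length of the sequence.
    start, stop, _ = slice(slice_obj.start, slice_obj.stop).indices(len(seq))
    # An empty slice such as [3:1] is reported as (3, 3) so start <= stop.
    stop = max(start, stop)
    return start, stop, (1 if step is None else step)


bounds = slice_bounds_sketch('abcde', slice(None, -1))
```

Here bounds is (0, 4, 1): the missing start becomes 0 and the negative stop is counted from the end of the five-character string.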

class discodop.util.OrderedSet(iterable=None)

A frozen, ordered set which maintains a regular list/tuple and set.

The set is indexable. Equality is defined without regard for order.
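
A minimal sketch of this design keeps a tuple for ordered, indexed access next to a frozenset for membership tests and equality. The class name OrderedSetSketch and its attribute names are hypothetical; the real class may differ in detail.

```python
class OrderedSetSketch:
    """Immutable ordered set backed by a tuple and a frozenset."""

    def __init__(self, iterable=None):
        seen = set()
        items = []
        for item in iterable or ():
            if item not in seen:  # keep only the first occurrence
                seen.add(item)
                items.append(item)
        self.seq = tuple(items)        # preserves insertion order
        self.theset = frozenset(seen)  # fast membership and equality

    def __getitem__(self, index):
        return self.seq[index]

    def __len__(self):
        return len(self.seq)

    def __iter__(self):
        return iter(self.seq)

    def __contains__(self, item):
        return item in self.theset

    def __eq__(self, other):
        if not isinstance(other, OrderedSetSketch):
            return NotImplemented
        return self.theset == other.theset  # order is ignored

    def __hash__(self):
        return hash(self.theset)


letters = OrderedSetSketch('abca')
```

Because equality and hashing both use the frozenset, two instances with the same members in a different order compare equal and hash alike.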

class discodop.util.PyAgenda(iterable=None)

Priority Queue implemented with array-based heap.

Implements decrease-key and remove operations by marking entries as invalid. Provides dictionary-like interface.

Can be initialized with an iterable; equivalent values are preserved in insertion order and the best priorities are retained on duplicate keys.

peekitem()

Get the current best (key, value) pair; keep it on the agenda.

pop(key)

Returns: the value for agenda[key]; the key is removed from the agenda.

popitem()

Returns: the best scoring (key, value) pair, which is removed from the agenda.

update(*a, **kw)

Change score of items given a sequence of (key, value) pairs.

clear()

Remove all items from agenda.

keys()

Returns: keys in agenda.

values()

Returns: values in agenda.

items()

Returns: (key, value) pairs in agenda.
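
The invalidation technique mentioned in the PyAgenda description can be sketched with heapq: instead of searching the heap to change or remove an entry, the old entry is flagged as invalid and skipped when it reaches the top. The class AgendaSketch is hypothetical, covers only part of the interface, and assumes that a lower value means a better priority.

```python
import heapq
import itertools


class AgendaSketch:
    """Priority queue with lazy invalidation (lower value is better)."""

    def __init__(self, iterable=None):
        self.heap = []      # entries: [value, count, key, valid]
        self.mapping = {}   # key -> its current valid entry
        self.counter = itertools.count()  # tie-breaker: insertion order
        for key, value in iterable or ():
            # On duplicate keys, retain the best (lowest) value.
            if key not in self.mapping or value < self.mapping[key][0]:
                self[key] = value

    def __setitem__(self, key, value):
        if key in self.mapping:
            self.mapping[key][3] = False  # mark the old entry invalid
        entry = [value, next(self.counter), key, True]
        self.mapping[key] = entry
        heapq.heappush(self.heap, entry)

    def __getitem__(self, key):
        return self.mapping[key][0]

    def __contains__(self, key):
        return key in self.mapping

    def __len__(self):
        return len(self.mapping)

    def peekitem(self):
        # Drop invalidated entries that have surfaced at the top.
        while self.heap and not self.heap[0][3]:
            heapq.heappop(self.heap)
        if not self.heap:
            raise KeyError('agenda is empty')
        value, _, key, _ = self.heap[0]
        return key, value

    def popitem(self):
        key, value = self.peekitem()
        heapq.heappop(self.heap)
        del self.mapping[key]
        return key, value

    def pop(self, key):
        entry = self.mapping.pop(key)
        entry[3] = False  # the stale heap entry is skipped later
        return entry[0]


agenda = AgendaSketch([('a', 3), ('b', 1), ('a', 2), ('c', 1)])
```

The counter makes entries with equal values come out in insertion order, and it also guarantees that heap comparisons never reach the key or the validity flag.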

discodop.util.merge(*iterables, key=None)

Generator that performs an n-way merge of sorted iterables.

>>> list(merge([0, 1, 2], [0, 1, 2, 3]))
[0, 0, 1, 1, 2, 2, 3]

Similar to heapq.merge, but key can be specified.

NB: while a sort key may be specified, the individual iterables must already be sorted with this key.
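
An n-way merge of this kind can be sketched with a heap that holds one pending item per input. The function name merge_sketch is hypothetical and the code is an illustration of the technique, not the module's implementation. Note that heapq.merge itself accepts a key argument since Python 3.5.

```python
import heapq


def merge_sketch(*iterables, key=None):
    """Yield items from already sorted iterables in sorted order."""
    if key is None:
        key = lambda item: item
    done = object()  # sentinel marking an exhausted iterator
    heap = []
    for index, iterable in enumerate(iterables):
        iterator = iter(iterable)
        first = next(iterator, done)
        if first is not done:
            # index breaks ties, so items themselves are never compared.
            heap.append((key(first), index, first, iterator))
    heapq.heapify(heap)
    while heap:
        _, index, item, iterator = heap[0]
        yield item
        following = next(iterator, done)
        if following is done:
            heapq.heappop(heap)
        else:
            heapq.heapreplace(
                heap, (key(following), index, following, iterator))


by_length = list(merge_sketch(['a', 'ccc'], ['bb', 'dddd'], key=len))
```

In this usage both inputs are already sorted by length, as the note above requires, and by_length is ['a', 'bb', 'ccc', 'dddd'].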

discodop.util.tokenize(text)

A basic tokenizer following English/French PTB/FTB conventions.

Adapted from nltk.tokenize.TreebankWordTokenizer.
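
To give a flavour of Penn Treebank style tokenization, here is a deliberately small regular-expression sketch. It covers only a few English rules (punctuation, a sentence-final period, and common contractions); the function name tokenize_sketch and the exact rule set are assumptions for illustration and do not reproduce the module's tokenizer.

```python
import re


def tokenize_sketch(text):
    """Split text into tokens using a few PTB-like rules."""
    # Surround common punctuation marks with spaces.
    text = re.sub(r'([,;:?!()"])', r' \1 ', text)
    # Separate a sentence-final period from the preceding word.
    text = re.sub(r'\.\s*$', ' .', text)
    # Split the negative contraction: "don't" -> "do n't".
    text = re.sub(r"(\w)(n't)\b", r'\1 \2', text)
    # Split other clitics: "it's" -> "it 's", "we'll" -> "we 'll".
    text = re.sub(r"(\w)('(?:s|re|ve|ll|d|m))\b", r'\1 \2', text)
    return text.split()


tokens = tokenize_sketch("I don't know, it's fine.")
```

The result is ['I', 'do', "n't", 'know', ',', 'it', "'s", 'fine', '.'], which shows the treebank convention of detaching clitics and punctuation from words.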

discodop.util.run(*popenargs, **kwargs)

Run command with arguments and return (returncode, stdout, stderr).

All arguments are the same as for the subprocess.Popen constructor.
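
A helper with this return shape can be sketched on top of subprocess.Popen and communicate(). The name run_sketch is hypothetical, and capturing stdout and stderr through pipes by default is an assumption of this sketch.

```python
import subprocess
import sys


def run_sketch(*popenargs, **kwargs):
    """Run a command and return (returncode, stdout, stderr)."""
    # Assumption: capture both streams unless the caller overrides them.
    kwargs.setdefault('stdout', subprocess.PIPE)
    kwargs.setdefault('stderr', subprocess.PIPE)
    with subprocess.Popen(*popenargs, **kwargs) as proc:
        # communicate() reads both pipes to the end and waits for exit.
        stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr


# Usage: run the current Python interpreter as a portable test command.
returncode, out, err = run_sketch([sys.executable, '-c', 'print("hi")'])
```

Using communicate() rather than reading the pipes one after the other avoids a deadlock when the child fills one pipe while the parent is blocked on the other.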