API Reference (auto-generated)

Morphological Analyzer

class pymorphy2.analyzer.MorphAnalyzer(path=None, result_type=<class 'pymorphy2.analyzer.Parse'>)[source]

Morphological analyzer for the Russian language.

For a given word it can find all possible inflectional paradigms and thus compute all possible tags and normal forms.

The analyzer uses morphological word features and a lexicon (a dictionary compiled from the XML available at OpenCorpora.org); for unknown words, a heuristic algorithm is used.

Create a MorphAnalyzer object:

>>> import pymorphy2
>>> morph = pymorphy2.MorphAnalyzer()

MorphAnalyzer uses dictionaries from the pymorphy2-dicts package (which can be installed via pip install pymorphy2-dicts).

Alternatively (e.g. if you have your own precompiled dictionaries), either set the PYMORPHY2_DICT_PATH environment variable to the dictionary path, or pass the path argument to the pymorphy2.MorphAnalyzer constructor:

>>> morph = pymorphy2.MorphAnalyzer('/path/to/dictionaries') 
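
The lookup described above can be sketched roughly as follows. This is a minimal illustration and not pymorphy2's own code: the helper name resolve_dict_path and the precedence (explicit argument first, then the environment variable) are assumptions; only the variable name PYMORPHY2_DICT_PATH comes from this reference.

```python
import os

ENV_VARIABLE = "PYMORPHY2_DICT_PATH"  # name taken from this reference


def resolve_dict_path(path=None, environ=None):
    """Hypothetical helper: pick a dictionary folder.

    Assumed order: explicit argument, then the environment variable.
    Returns None when neither is set; a caller would then fall back
    to the installed pymorphy2-dicts package.
    """
    environ = os.environ if environ is None else environ
    if path is not None:
        return path
    return environ.get(ENV_VARIABLE)
```

Passing environ explicitly keeps the sketch easy to try without touching the real process environment.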

By default, methods of this class return parsing results as Parse namedtuples. This has performance implications under CPython, so if you need maximum speed, pass result_type=None to make the analyzer return plain unwrapped tuples:

>>> morph = pymorphy2.MorphAnalyzer(result_type=None)
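
The practical difference between the two result types can be illustrated without the library. The sketch below uses a hypothetical, shortened stand-in called SimpleParse (the real Parse has more fields); it only shows that index access works for both forms, while attribute access requires the namedtuple.

```python
from collections import namedtuple

# Hypothetical, shortened stand-in for a parse result (not the real Parse).
SimpleParse = namedtuple("SimpleParse", "word tag normal_form")

wrapped = SimpleParse("кошки", "NOUN", "кошка")
plain = ("кошки", "NOUN", "кошка")

# Index access works for both, so it survives a switch to result_type=None.
assert wrapped[2] == plain[2] == "кошка"

# Attribute access is only available on the namedtuple.
assert wrapped.normal_form == "кошка"
assert not hasattr(plain, "normal_form")
```
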
TagClass[source]
classmethod choose_dictionary_path(path=None)[source]
decline(word)[source]

Return parses for all possible word forms.

dict_meta[source]
env_variable = u'PYMORPHY2_DICT_PATH'
inflect(word, required_grammemes)[source]

Return a list of parsed words that are closest to word and have all required_grammemes.

iter_known_word_parses(prefix=u'')[source]

Return an iterator over parses of dictionary words that start with a given prefix (the default empty prefix means “all words”).

normal_forms(word)[source]

Return a list of word normal forms.

parse(word)[source]

Analyze the word and return a list of Parse namedtuples:

Parse(word, tag, normal_form, para_id, idx, _estimate)

(or plain tuples if result_type=None was used in constructor).

tag(word)[source]
word_is_known(word, strict_ee=False)[source]

Check if a word is in the dictionary. Pass strict_ee=True if the word is guaranteed to have correct е/ё letters.

Note

Dictionary words are not always correct words; the dictionary also contains incorrect forms which are commonly used. So for spellchecking tasks this method should be used with extra care.
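
The idea behind strict_ee can be sketched with a plain set of words. This is an assumption-laden illustration, not the library's algorithm: the function is_known and the folding rule (treat ё as е on both sides when not strict) are hypothetical.

```python
def is_known(word, known_words, strict_ee=False):
    """Hypothetical lookup illustrating the strict_ee idea."""
    if word in known_words:
        return True
    if strict_ee:
        # The caller vouches for е/ё, so no tolerant matching is attempted.
        return False

    # Non-strict mode (assumed rule): ignore the е/ё distinction.
    def fold(w):
        return w.replace("ё", "е")

    return fold(word) in {fold(w) for w in known_words}


known = {"ёжик"}
```

With this sketch, "ежик" is accepted in the default mode and rejected when strict_ee=True, while "ёжик" is accepted in both.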

class pymorphy2.analyzer.Parse[source]

Parse result wrapper.

inflect(required_grammemes)[source]
is_known[source]

True if this form is a known dictionary form.

lexeme[source]

A lexeme this form belongs to.

normalized[source]

A Parse instance for self.normal_form.

paradigm[source]

Tagset

Utils for working with grammatical tags.

class pymorphy2.tagset.OpencorporaTag(tag)[source]

Wrapper class for OpenCorpora.org tags.

Warning

In order to work properly, the class has to be globally initialized with actual grammemes (using the _init_grammemes method).

Pymorphy2 initializes it when loading a dictionary; it may not be a good idea to use this class directly. If possible, use morph_analyzer.TagClass instead.

Example:

>>> from pymorphy2 import MorphAnalyzer
>>> morph = MorphAnalyzer()
>>> Tag = morph.TagClass # get an initialized Tag class
>>> tag = Tag('VERB,perf,tran plur,impr,excl')
>>> tag
OpencorporaTag('VERB,perf,tran plur,impr,excl')

Tag instances have attributes for accessing grammemes:

>>> print(tag.POS)
VERB
>>> print(tag.number)
plur
>>> print(tag.case)
None

Available attributes are: POS, animacy, aspect, case, gender, involvement, mood, number, person, tense, transitivity and voice.

You may check whether a grammeme is in a tag, or whether all grammemes from a given set are in a tag:

>>> 'perf' in tag
True
>>> 'nomn' in tag
False
>>> 'Geox' in tag
False
>>> set(['VERB', 'perf']) in tag
True
>>> set(['VERB', 'perf', 'sing']) in tag
False

To guard against typos, an exception is raised for unknown grammemes:

>>> 'foobar' in tag
Traceback (most recent call last):
...
ValueError: Grammeme is unknown: foobar
>>> set(['NOUN', 'foo', 'bar']) in tag
Traceback (most recent call last):
...
ValueError: Grammemes are unknown: {'bar', 'foo'}
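
The containment behaviour shown above (single grammeme, set of grammemes, error on unknown names) can be mimicked with a small toy class. ToyTag, its KNOWN vocabulary and its error text are hypothetical; the real OpencorporaTag takes its vocabulary from the loaded dictionary and may be implemented differently.

```python
class ToyTag:
    """Hypothetical miniature of a tag class; not OpencorporaTag itself."""

    # Assumed closed vocabulary; the real one is loaded from a dictionary.
    KNOWN = frozenset(["VERB", "NOUN", "perf", "tran", "sing", "plur",
                       "impr", "excl", "nomn"])

    def __init__(self, tag):
        # 'VERB,perf,tran plur,impr,excl' -> individual grammemes
        self.grammemes = frozenset(tag.replace(" ", ",").split(","))

    def __contains__(self, item):
        # Accept either one grammeme (str) or a collection of grammemes.
        wanted = {item} if isinstance(item, str) else set(item)
        unknown = wanted - self.KNOWN
        if unknown:
            raise ValueError("Unknown grammeme(s): %s" % sorted(unknown))
        return wanted <= self.grammemes


tag = ToyTag("VERB,perf,tran plur,impr,excl")
```

Rejecting unknown names up front means a misspelled grammeme fails loudly instead of silently evaluating to False.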

This also works for attributes:

>>> tag.POS == 'plur'
Traceback (most recent call last):
...
ValueError: 'plur' is not a valid grammeme for this attribute.
grammemes[source]

A frozenset with grammemes for this tag.

updated_grammemes(required)[source]

Return a new set of grammemes with required grammemes added and incompatible grammemes removed.
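
One way to picture "added and incompatible grammemes removed" is with groups of mutually exclusive grammemes. The sketch below is an assumption, not the library's implementation: the EXCLUSIVE_GROUPS table is a hypothetical, partial list, and the function is a free function rather than a method.

```python
# Assumed incompatibility groups; a real dictionary defines many more.
EXCLUSIVE_GROUPS = [
    frozenset(["sing", "plur"]),          # number
    frozenset(["nomn", "gent", "datv"]),  # case (partial list)
]


def updated_grammemes(current, required):
    """Hypothetical sketch: add `required`, drop what conflicts with it."""
    required = set(required)
    result = set(current)
    for group in EXCLUSIVE_GROUPS:
        if required & group:
            # A required grammeme replaces the old value of its category.
            result -= group
    return result | required
```

For example, asking for plur on a tag that contains sing yields a set with plur and without sing, while unrelated grammemes are left alone.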

Command-Line Interface

Usage:

pymorphy dict compile <XML_FILE> [--out <PATH>] [--force] [--verbose] [--min_ending_freq <NUM>] [--min_paradigm_popularity <NUM>] [--max_suffix_length <NUM>]
pymorphy dict download_xml <OUT_FILE> [--verbose]
pymorphy dict mem_usage [--dict <PATH>] [--verbose]
pymorphy dict make_test_suite <XML_FILE> <OUT_FILE> [--limit <NUM>] [--verbose]
pymorphy dict meta [--dict <PATH>]
pymorphy _parse <IN_FILE> <OUT_FILE> [--dict <PATH>] [--verbose]
pymorphy -h | --help
pymorphy --version

Options:

-v --verbose                        Be more verbose
-f --force                          Overwrite target folder
-o --out <PATH>                     Output folder name [default: dict]
--limit <NUM>                       Min. number of words per gram. tag [default: 100]
--min_ending_freq <NUM>             Prediction: min. number of suffix occurrences [default: 2]
--min_paradigm_popularity <NUM>     Prediction: min. number of lexemes for the paradigm [default: 3]
--max_suffix_length <NUM>           Prediction: max. length of prediction suffixes [default: 5]
--dict <PATH>                       Dictionary folder path

Low-level Utilities for OpenCorpora Dictionaries

pymorphy2.opencorpora_dict.parse is a module for parsing OpenCorpora XML dictionaries.

class pymorphy2.opencorpora_dict.parse.ParsedDictionary

ParsedDictionary(lexemes, links, grammemes, version, revision)

grammemes

Alias for field number 2

lexemes

Alias for field number 0

links

Alias for field number 1

revision

Alias for field number 4

version

Alias for field number 3
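
"Alias for field number N" is the standard namedtuple wording: each attribute is another name for a tuple position. The sketch below builds a local stand-in with the same field names as the signature above; the values are placeholders, not real dictionary data.

```python
from collections import namedtuple

# Local stand-in with the field names from the signature above.
ParsedDictionary = namedtuple(
    "ParsedDictionary", "lexemes links grammemes version revision")

d = ParsedDictionary(lexemes={}, links=[], grammemes=[],
                     version="0.0", revision="0")  # placeholder values

# Attribute access and index access return the same object.
assert d.lexemes is d[0]
assert d.grammemes is d[2]

# Being a tuple, the result can also be unpacked positionally.
lexemes, links, grammemes, version, revision = d
```

The same reading applies to the CompiledDictionary and LoadedDictionary namedtuples documented in this section.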

pymorphy2.opencorpora_dict.parse.parse_opencorpora_xml(filename)[source]

Parse OpenCorpora dict XML and return a ParsedDictionary namedtuple.

pymorphy2.opencorpora_dict.compile is a module for converting OpenCorpora dictionaries to pymorphy2 representation.

class pymorphy2.opencorpora_dict.compile.CompiledDictionary

CompiledDictionary(gramtab, suffixes, paradigms, words_dawg, prediction_suffixes_dawgs, parsed_dict, prediction_options)

gramtab

Alias for field number 0

paradigms

Alias for field number 2

parsed_dict

Alias for field number 5

prediction_options

Alias for field number 6

prediction_suffixes_dawgs

Alias for field number 4

suffixes

Alias for field number 1

words_dawg

Alias for field number 3

pymorphy2.opencorpora_dict.compile.compile_parsed_dict(parsed_dict, prediction_options=None)[source]

Return compacted dictionary data.

pymorphy2.opencorpora_dict.compile.convert_to_pymorphy2(opencorpora_dict_path, out_path, overwrite=False, prediction_options=None)[source]

Convert a dictionary from OpenCorpora XML format to Pymorphy2 compacted format.

out_path should be the name of the folder where the dictionaries will be put.

pymorphy2.opencorpora_dict.storage is a module for saving and loading pymorphy2 dictionaries.

class pymorphy2.opencorpora_dict.storage.LoadedDictionary

LoadedDictionary(meta, gramtab, suffixes, paradigms, words, prediction_prefixes, prediction_suffixes_dawgs, Tag, paradigm_prefixes)

Tag

Alias for field number 7

gramtab

Alias for field number 1

meta

Alias for field number 0

paradigm_prefixes

Alias for field number 8

paradigms

Alias for field number 3

prediction_prefixes

Alias for field number 5

prediction_suffixes_dawgs

Alias for field number 6

suffixes

Alias for field number 2

words

Alias for field number 4

pymorphy2.opencorpora_dict.storage.load_dict(path, gramtab_format=u'opencorpora-int')[source]

Load a pymorphy2 dictionary. path is the name of a folder containing dictionary data.

pymorphy2.opencorpora_dict.storage.save_compiled_dict(compiled_dict, out_path)[source]

Save a compiled_dict to out_path. out_path should be the name of the folder where the dictionaries will be put.

Various Utilities

pymorphy2.utils.combinations_of_all_lengths(it)[source]

Return an iterable with all possible combinations of items from it:

>>> for comb in combinations_of_all_lengths('ABC'):
...     print("".join(comb))
A
B
C
AB
AC
BC
ABC
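
A function with this behaviour can be written with itertools. The version below is a sketch that reproduces the output shown above; it is not claimed to be the library's source.

```python
import itertools


def combinations_of_all_lengths(it):
    """Sketch: yield combinations of length 1, 2, ..., len(it)."""
    items = list(it)
    return itertools.chain.from_iterable(
        itertools.combinations(items, n) for n in range(1, len(items) + 1)
    )
```
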
pymorphy2.utils.download_bz2(url, out_fp, chunk_size=262144, on_chunk=<function <lambda>>)[source]

Download a bz2-encoded file from url and write it to out_fp file.
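
The chunked decompression that such a download implies can be sketched with the standard bz2 module. Everything here is an assumption made for illustration: the helper decompress_bz2_stream is hypothetical, an in-memory buffer stands in for the HTTP response, and on_chunk is assumed to take no arguments.

```python
import bz2
import io


def decompress_bz2_stream(in_fp, out_fp, chunk_size=262144,
                          on_chunk=lambda: None):
    """Hypothetical core of a bz2 download: decompress chunk by chunk."""
    decompressor = bz2.BZ2Decompressor()
    while True:
        chunk = in_fp.read(chunk_size)
        if not chunk:
            break
        out_fp.write(decompressor.decompress(chunk))
        on_chunk()  # e.g. progress reporting


payload = b"example dictionary data" * 100
source = io.BytesIO(bz2.compress(payload))  # stands in for an HTTP response
target = io.BytesIO()
calls = []
decompress_bz2_stream(source, target, chunk_size=16,
                      on_chunk=lambda: calls.append(1))
```

Decompressing incrementally keeps memory use bounded by chunk_size rather than by the size of the whole file.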

pymorphy2.utils.json_read(filename, **json_options)[source]

Read an object from the JSON file filename.

pymorphy2.utils.json_write(filename, obj, **json_options)[source]

Create the file filename with obj serialized to JSON.
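
A pair of helpers with these signatures could look like the sketch below. The UTF-8 encoding and the forwarding of **json_options to json.dump and json.load are assumptions; the library's own handling may differ.

```python
import json
import os
import tempfile


def json_write(filename, obj, **json_options):
    """Sketch: serialize obj to filename as UTF-8 JSON (encoding assumed)."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(obj, f, **json_options)


def json_read(filename, **json_options):
    """Sketch: load an object back from a UTF-8 JSON file."""
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f, **json_options)


# Round trip through a temporary folder with placeholder data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    json_write(path, {"key": "значение"}, ensure_ascii=False, indent=2)
    restored = json_read(path)
```
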

pymorphy2.utils.largest_group(iterable, key)[source]

Find a group of largest elements (according to key).

>>> s = [-4, 3, 5, 7, 4, -7]
>>> largest_group(s, abs)
[7, -7]
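
The behaviour in the example above can be reproduced in a few lines. This is a sketch consistent with the documented output, not necessarily the library's implementation; returning an empty list for empty input is an assumption.

```python
def largest_group(iterable, key):
    """Sketch: all elements whose key equals the maximum key."""
    items = list(iterable)
    if not items:
        return []  # assumed behaviour for empty input
    best = max(key(x) for x in items)
    # Keep the original order of the matching elements.
    return [x for x in items if key(x) == best]
```
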
pymorphy2.utils.longest_common_substring(data)[source]

Return a longest common substring of a list of strings:

>>> longest_common_substring(["apricot", "rice", "cricket"])
'ric'
>>> longest_common_substring(["apricot", "banana"])
'a'
>>> longest_common_substring(["foo", "bar", "baz"])
''

See http://stackoverflow.com/questions/2892931/.
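
A brute-force sketch that reproduces the three results above is given here. It is an illustration rather than the library's code: it tries candidate substrings of the shortest input from longest to shortest, and which substring wins a tie is an artifact of this particular search order.

```python
def longest_common_substring(data):
    """Brute-force sketch: try candidates from longest to shortest."""
    if not data:
        return ""
    base = min(data, key=len)  # a common substring cannot be longer than this
    for length in range(len(base), 0, -1):
        for start in range(len(base) - length + 1):
            candidate = base[start:start + length]
            if all(candidate in s for s in data):
                return candidate
    return ""
```
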

class pymorphy2.dawg.PredictionSuffixesDAWG(data=None)[source]

DAWG for storing prediction data.

class pymorphy2.dawg.WordsDawg(data=None)[source]

DAWG for storing words.
