Source code for pymorphy2.tokenizers

# -*- coding: utf-8 -*-
import re
# Splits on any character that is not a word character, "_" or "-";
# "+" is also treated as a separator. The capturing group makes
# re.split() keep the separators in its result.
GROUPING_SPACE_REGEX = re.compile('([^\w_-]|[+])', re.U)

def simple_word_tokenize(text):
    """ Split text into tokens. Don't split by hyphen. """
    return [t for t in GROUPING_SPACE_REGEX.split(text)
            if t and not t.isspace()]
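As a quick illustration of the behavior (example inputs are my own, not from the module): because the regex keeps separators via its capturing group, punctuation survives as standalone tokens, while hyphenated words are left whole.

```python
import re

GROUPING_SPACE_REGEX = re.compile(r'([^\w_-]|[+])', re.U)

def simple_word_tokenize(text):
    """ Split text into tokens. Don't split by hyphen. """
    return [t for t in GROUPING_SPACE_REGEX.split(text)
            if t and not t.isspace()]

# Punctuation becomes separate tokens:
print(simple_word_tokenize('Hello, world!'))   # ['Hello', ',', 'world', '!']

# Hyphenated words stay intact (with re.U, \w also matches Cyrillic):
print(simple_word_tokenize('что-нибудь ещё'))  # ['что-нибудь', 'ещё']

# '+' is an explicit separator and is kept as a token:
print(simple_word_tokenize('a+b'))             # ['a', '+', 'b']
```

Empty strings and pure-whitespace fragments produced by `re.split` are filtered out by the `if t and not t.isspace()` condition.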