Source code for pymorphy2.tokenizers

# -*- coding: utf-8 -*-
import re
# Splits on any character that is not a word character, "_" or "-";
# "+" is also treated as a separator. The capturing group makes
# re.split() keep the separators in its result.
GROUPING_SPACE_REGEX = re.compile('([^\w_-]|[+])', re.U)

def simple_word_tokenize(text):
    """ Split text into tokens. Don't split by hyphen. """
    return [t for t in GROUPING_SPACE_REGEX.split(text)
            if t and not t.isspace()]
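As a quick illustration of the behavior (example inputs are my own, not from the module): because the regex keeps separators via its capturing group, punctuation survives as standalone tokens, while hyphenated words are left whole.

```python
import re

GROUPING_SPACE_REGEX = re.compile(r'([^\w_-]|[+])', re.U)

def simple_word_tokenize(text):
    """ Split text into tokens. Don't split by hyphen. """
    return [t for t in GROUPING_SPACE_REGEX.split(text)
            if t and not t.isspace()]

# Punctuation becomes separate tokens:
print(simple_word_tokenize('Hello, world!'))   # ['Hello', ',', 'world', '!']

# Hyphenated words stay intact (with re.U, \w also matches Cyrillic):
print(simple_word_tokenize('что-нибудь ещё'))  # ['что-нибудь', 'ещё']

# '+' is an explicit separator and is kept as a token:
print(simple_word_tokenize('a+b'))             # ['a', '+', 'b']
```

Empty strings and pure-whitespace fragments produced by `re.split` are filtered out by the `if t and not t.isspace()` condition.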