pip_services3_expressions.tokenizers.generic.GenericWordState module

class pip_services3_expressions.tokenizers.generic.GenericWordState.GenericWordState

Bases: pip_services3_expressions.tokenizers.IWordState.IWordState

A word state reads a word from a scanner and returns it as a token. As with other states, the tokenizer hands the job of reading to this state based on an initial character. Thus, the tokenizer decides which characters may begin a word, and this state determines which characters may appear as the second or a later character in a word. These are typically different sets of characters; in particular, digits commonly appear inside a word but not as its initial character.

By default, the following characters may appear in a word. The method set_word_chars() allows customizing this.

From    To
‘a’     ‘z’
‘A’     ‘Z’
‘0’     ‘9’

as well as: minus sign, underscore, and apostrophe.
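The default set can be pictured as a list of inclusive code-point ranges. The following is a standalone sketch of that idea, not the library's implementation; `DEFAULT_WORD_RANGES` and `is_default_word_char` are hypothetical names introduced only for illustration.

```python
# Standalone sketch of the default word-character set described above.
# DEFAULT_WORD_RANGES and is_default_word_char are illustrative names,
# not part of pip_services3_expressions.
DEFAULT_WORD_RANGES = [
    (ord("a"), ord("z")),
    (ord("A"), ord("Z")),
    (ord("0"), ord("9")),
    (ord("-"), ord("-")),  # minus sign
    (ord("_"), ord("_")),  # underscore
    (ord("'"), ord("'")),  # apostrophe
]


def is_default_word_char(ch: str) -> bool:
    """Return True if the single character ch falls in any default range."""
    code = ord(ch)
    return any(low <= code <= high for low, high in DEFAULT_WORD_RANGES)


print(is_default_word_char("x"))  # True
print(is_default_word_char("7"))  # True
print(is_default_word_char(" "))  # False
```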

clear_word_chars()

Clears all word character definitions, including the defaults.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Reads a word from the scanner and returns it as the tokenizer’s next token.

Parameters
  • scanner – A scanner over the text to be tokenized.

  • tokenizer – The tokenizer that controls the process.

Returns

The next token from the top of the stream.
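The reading loop behind a word state can be sketched as follows. This is a minimal standalone illustration, assuming the scanner is positioned at the first character of a word; `MiniScanner`, `is_word_char`, and `read_word` are hypothetical stand-ins, and the real IScanner and GenericWordState interfaces may differ.

```python
# Standalone sketch of how a word state might consume a word.
# MiniScanner, is_word_char and read_word are hypothetical stand-ins;
# they are not the library's IScanner or GenericWordState.
class MiniScanner:
    """A tiny character scanner over a string."""

    def __init__(self, text: str):
        self._text = text
        self._pos = 0

    def read(self) -> str:
        """Return the next character, or '' at end of input."""
        if self._pos >= len(self._text):
            return ""
        ch = self._text[self._pos]
        self._pos += 1
        return ch

    def unread(self) -> None:
        """Push the last character read back onto the input."""
        if self._pos > 0:
            self._pos -= 1


def is_word_char(ch: str) -> bool:
    # Assumed set: ASCII letters, digits, minus, underscore, apostrophe.
    return ch != "" and ((ch.isascii() and ch.isalnum()) or ch in "-_'")


def read_word(scanner: MiniScanner) -> str:
    """Collect characters until one is not a word character."""
    chars = []
    ch = scanner.read()
    while is_word_char(ch):
        chars.append(ch)
        ch = scanner.read()
    if ch != "":
        scanner.unread()  # leave the terminating character for the next state
    return "".join(chars)


scanner = MiniScanner("don't-stop now")
word = read_word(scanner)
print(word)  # don't-stop
```

The terminating character is pushed back so that whichever state handles it next (for example a whitespace state) sees it intact.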

set_word_chars(from_symbol: int, to_symbol: int, enable: bool)

Enables or disables the characters in the given range as valid characters for the part of a word after the first character. Note that the tokenizer, not this state, determines which characters are valid as the first character of a word.

Parameters
  • from_symbol – Character code of the first character in the interval.

  • to_symbol – Character code of the last character in the interval.

  • enable – True if this state should use characters in the given range.
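The range-based enable and disable behaviour can be sketched with a plain set of character codes. This is a standalone illustration that mirrors the signature documented above and assumes the interval is inclusive at both ends; `WordCharTable` and its `is_word_char` helper are hypothetical and not the library's class.

```python
# Standalone sketch of range-based enable/disable, mirroring the
# set_word_chars(from_symbol, to_symbol, enable) signature above.
# WordCharTable is a hypothetical stand-in, not the library's class.
class WordCharTable:
    def __init__(self):
        self._enabled = set()  # character codes currently allowed in words

    def set_word_chars(self, from_symbol: int, to_symbol: int, enable: bool) -> None:
        codes = range(from_symbol, to_symbol + 1)  # assumed inclusive interval
        if enable:
            self._enabled.update(codes)
        else:
            self._enabled.difference_update(codes)

    def clear_word_chars(self) -> None:
        self._enabled.clear()

    def is_word_char(self, ch: str) -> bool:
        return ord(ch) in self._enabled


table = WordCharTable()
table.set_word_chars(ord("a"), ord("z"), True)
table.set_word_chars(ord("0"), ord("9"), True)
table.set_word_chars(ord("."), ord("."), True)   # allow dots inside words
table.set_word_chars(ord("0"), ord("9"), False)  # then disallow digits

print(table.is_word_char("."))  # True
print(table.is_word_char("5"))  # False
```

Because the arguments are integers, a single character is enabled by passing the same code for both ends of the interval, as in `ord(".")` above.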