pip_services3_expressions.tokenizers.generic package

Module contents

class pip_services3_expressions.tokenizers.generic.CCommentState

Bases: pip_services3_expressions.tokenizers.generic.CppCommentState.CppCommentState

This state will either delegate to a comment-handling state, or return a token with just a slash in it.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Either delegate to a comment-handling state, or return a token with just a slash in it.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

class pip_services3_expressions.tokenizers.generic.CppCommentState

Bases: pip_services3_expressions.tokenizers.generic.GenericCommentState.GenericCommentState

This state will either delegate to a comment-handling state, or return a token with just a slash in it.

get_multi_line_comment(scanner: pip_services3_expressions.io.IScanner.IScanner) → str

Read everything up to the closing star and slash and return it as the comment string.

Parameters

scanner – A scanner to read from.

get_single_line_comment(scanner: pip_services3_expressions.io.IScanner.IScanner) → str

Read everything up to the end of line and return it as the comment string.
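
A hedged sketch of calling these helpers directly. It assumes a StringScanner implementation of IScanner is exported from pip_services3_expressions.io, and that each helper reads from the scanner's current position and returns the consumed comment text:

    from pip_services3_expressions.io import StringScanner  # assumed export
    from pip_services3_expressions.tokenizers.generic import CppCommentState

    state = CppCommentState()

    # Consume a single-line comment: everything up to the end of line.
    scanner = StringScanner("// a comment\nnext line")
    single = state.get_single_line_comment(scanner)

    # Consume a multi-line comment: everything up to the closing */.
    scanner = StringScanner("/* spans\nlines */ rest")
    multi = state.get_multi_line_comment(scanner)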

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Either delegate to a comment-handling state, or return a token with just a slash in it.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

class pip_services3_expressions.tokenizers.generic.GenericCommentState

Bases: pip_services3_expressions.tokenizers.ICommentState.ICommentState

A CommentState object returns a comment from a scanner.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Return a comment token from a scanner.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

class pip_services3_expressions.tokenizers.generic.GenericNumberState

Bases: pip_services3_expressions.tokenizers.INumberState.INumberState

A NumberState object returns a number from a scanner. This state’s idea of a number allows an optional, initial minus sign, followed by one or more digits. A decimal point and another string of digits may follow these digits.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Gets the next token from the stream, starting from the character linked to this state.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.
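
A minimal sketch of the number state in isolation, assuming a StringScanner implementation of IScanner and that the state consumes characters from the scanner's current position:

    from pip_services3_expressions.io import StringScanner  # assumed export
    from pip_services3_expressions.tokenizers.generic import GenericNumberState, GenericTokenizer

    state = GenericNumberState()
    scanner = StringScanner("-123.45")

    # Reads the optional minus sign, the integer digits, the decimal
    # point, and the trailing digits described above.
    token = state.next_token(scanner, GenericTokenizer())
    print(token.value)  # expected: "-123.45"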

class pip_services3_expressions.tokenizers.generic.GenericQuoteState

Bases: pip_services3_expressions.tokenizers.IQuoteState.IQuoteState

A quoteState returns a quoted string token from a scanner. This state will collect characters until it sees a match to the character that the tokenizer used to switch to this state. For example, if a tokenizer uses a double-quote character to enter this state, then next_token will search for another double-quote until it finds one or finds the end of the scanner.

decode_string(value: str, quote_symbol: int) → Optional[str]

Decodes a string value.

Parameters
  • value – A string value to be decoded.

  • quote_symbol – The character code of the quote symbol.

Returns

A decoded string.

encode_string(value: str, quote_symbol: int) → Optional[str]

Encodes a string value.

Parameters
  • value – A string value to be encoded.

  • quote_symbol – The character code of the quote symbol.

Returns

An encoded string.
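
Because decode_string and encode_string operate on plain strings, they can be tried in isolation. A small sketch, with expected results inferred from the descriptions above:

    from pip_services3_expressions.tokenizers.generic import GenericQuoteState

    state = GenericQuoteState()

    # Wrap a value in the quote character, which is passed as a character code.
    encoded = state.encode_string("abc", ord('"'))    # expected: '"abc"'

    # Strip the surrounding quotes back off.
    decoded = state.decode_string(encoded, ord('"'))  # expected: 'abc'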

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Return a quoted string token from a scanner. This method will collect characters until it sees a match to the character that the tokenizer used to switch to this state.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

class pip_services3_expressions.tokenizers.generic.GenericSymbolState

Bases: pip_services3_expressions.tokenizers.ISymbolState.ISymbolState

The idea of a symbol is a character that stands on its own, such as an ampersand or a parenthesis. For example, when tokenizing the expression (is_ready) & (is_willing), a typical tokenizer would return 7 tokens, including one for each parenthesis and one for the ampersand. Thus a series of symbols such as )&( becomes three tokens, while a series of letters such as is_ready becomes a single word token.

Multi-character symbols are an exception to the rule that a symbol is a standalone character. For example, a tokenizer may want less-than-or-equals to tokenize as a single token. This class provides a method for establishing which multi-character symbols an object of this class should treat as single symbols. This allows, for example, “cat <= dog” to tokenize as three tokens, rather than splitting the less-than and equals symbols into separate tokens.

By default, this state recognizes the following multi-character symbols: !=, :-, <=, >=

add(value, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)

Add a multi-character symbol.

Parameters
  • value – The symbol to add, such as “=:=”

  • token_type – The token type to return when this symbol is matched.
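
For example, registering a hypothetical =:= operator as an additional multi-character symbol:

    from pip_services3_expressions.tokenizers.generic import GenericSymbolState
    from pip_services3_expressions.tokenizers.TokenType import TokenType

    state = GenericSymbolState()

    # In addition to the defaults (!=, :-, <=, >=), treat "=:=" as one symbol.
    state.add("=:=", TokenType.Symbol)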

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Return a symbol token from a scanner.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

class pip_services3_expressions.tokenizers.generic.GenericTokenizer

Bases: pip_services3_expressions.tokenizers.AbstractTokenizer.AbstractTokenizer

Implements a default tokenizer class.
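
A minimal usage sketch. It assumes the tokenizer inherits a tokenize_buffer() helper from AbstractTokenizer that splits a whole string into tokens:

    from pip_services3_expressions.tokenizers.generic import GenericTokenizer

    tokenizer = GenericTokenizer()

    # Tokenize a buffer with the default word, number, symbol, quote,
    # whitespace, and comment states.
    for token in tokenizer.tokenize_buffer("cat <= dog"):
        print(token.type, repr(token.value))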

class pip_services3_expressions.tokenizers.generic.GenericWhitespaceState

Bases: pip_services3_expressions.tokenizers.IWhitespaceState.IWhitespaceState

A whitespace state ignores whitespace (such as blanks and tabs), and returns the tokenizer’s next token. By default, all characters from 0 to 32 are whitespace.

clear_whitespace_chars()

Clears definitions of whitespace characters.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Ignore whitespace (such as blanks and tabs), and return the tokenizer’s next token.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

set_whitespace_chars(from_symbol: int, to_symbol: int, enable: bool)

Establish the given characters as whitespace to ignore.

Parameters
  • from_symbol – First character index of the interval.

  • to_symbol – Last character index of the interval.

  • enable – True if this state should ignore characters in the given range.
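
For example, restricting whitespace to just blanks and tabs instead of the default 0 to 32 range:

    from pip_services3_expressions.tokenizers.generic import GenericWhitespaceState

    state = GenericWhitespaceState()

    # Drop the default range, then re-enable only space and tab.
    state.clear_whitespace_chars()
    state.set_whitespace_chars(ord(' '), ord(' '), True)
    state.set_whitespace_chars(ord('\t'), ord('\t'), True)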

class pip_services3_expressions.tokenizers.generic.GenericWordState

Bases: pip_services3_expressions.tokenizers.IWordState.IWordState

A wordState returns a word from a scanner. Like other states, a tokenizer transfers the job of reading to this state, depending on an initial character. Thus, the tokenizer decides which characters may begin a word, and this state determines which characters may appear as a second or later character in a word. These are typically different sets of characters; in particular, it is typical for digits to appear as parts of a word, but not as the initial character of a word.

By default, the following characters may appear in a word. The method set_word_chars() allows customizing this.

From    To
'a'     'z'
'A'     'Z'
'0'     '9'

as well as: minus sign, underscore, and apostrophe.

clear_word_chars()

Clears definitions of word chars.

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token

Return a word token from a scanner.

Parameters
  • scanner – A scanner over the text being tokenized.

  • tokenizer – A tokenizer that controls the tokenization process.

Returns

The next token from the top of the stream.

set_word_chars(from_symbol: int, to_symbol: int, enable: bool)

Establish characters in the given range as valid characters for part of a word after the first character. Note that the tokenizer must determine which characters are valid as the beginning character of a word.

Parameters
  • from_symbol – First character index of the interval.

  • to_symbol – Last character index of the interval.

  • enable – True if this state should use characters in the given range.
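
For example, a hypothetical customization that allows '$' inside words and disallows apostrophes:

    from pip_services3_expressions.tokenizers.generic import GenericWordState

    state = GenericWordState()

    # Allow '$' as a second-or-later word character...
    state.set_word_chars(ord('$'), ord('$'), True)
    # ...and stop treating apostrophes as word characters.
    state.set_word_chars(ord("'"), ord("'"), False)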

class pip_services3_expressions.tokenizers.generic.SymbolNode(parent: Optional[SymbolNode], character: int)

Bases: object

A SymbolNode object is a member of a tree that contains all possible prefixes of allowable symbols. Multi-character symbols appear in a SymbolNode tree with one node for each character.

For example, the symbol =:~ will appear in a tree as three nodes. The first node contains an equals sign, and has a child; that child contains a colon and has a child; this third child contains a tilde, and has no children of its own. If the colon node had another child for a dollar sign character, then the tree would contain the symbol =:$.

A tree of SymbolNode objects collaborates to read a (potentially multi-character) symbol from an input stream. A root node with no character of its own finds an initial node that represents the first character in the input. This node checks whether the next character in the stream matches one of its children. If so, the node delegates its reading task to that child. This approach walks down the tree, pulling symbols from the input that match the path down the tree.

When a node does not have a child that matches the next character, we will have read the longest possible symbol prefix. This prefix may or may not be a valid symbol. Consider a tree that has had =:~ added and has not had =: added. In this tree, of the three nodes that contain =:~, only the first and third contain complete symbols. If, say, the input contains =:a, the colon node will not have a child that matches the ‘a’ and so it will stop reading. The colon node has to “unread”: it must push back its character, and ask its parent to unread. Unreading continues until it reaches an ancestor that represents a valid symbol.
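
The unread behavior can be sketched with the SymbolRootNode subclass documented below, assuming a StringScanner implementation of IScanner. Here =:~ and = are registered as symbols, but =: is not:

    from pip_services3_expressions.io import StringScanner  # assumed export
    from pip_services3_expressions.tokenizers.generic import SymbolRootNode
    from pip_services3_expressions.tokenizers.TokenType import TokenType

    root = SymbolRootNode()
    root.add("=:~", TokenType.Symbol)
    root.add("=", TokenType.Symbol)

    # Reading "=:a" walks down to the ':' node, finds no child for 'a',
    # and unreads back to the deepest valid ancestor: the '=' node.
    token = root.next_token(StringScanner("=:a"))
    print(token.value)  # expected: "="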

add_descendant_line(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)

Add a line of descendants that represent the characters in the given string.

Parameters
  • value – The string whose characters become the line of descendants.

  • token_type – The token type assigned to the node for the final character.

ancestry() → str

Show the symbol this node represents.

Returns

The symbol this node represents.

deepest_read(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode

Find the descendant that takes as many characters as possible from the input.

Parameters

scanner – A scanner to read from.

ensure_child_with_char(value: int)

Find or create a child for the given character.

find_child_with_char(value) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode

Find a child with the given character.

Parameters

value – The character to find a child for.

property token_type

unread_to_valid(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode

Unwind to a valid node; this node is “valid” if its ancestry represents a complete symbol. If this node is not valid, put back the character and ask the parent to unwind.

Parameters

scanner – A scanner to read from.

property valid

class pip_services3_expressions.tokenizers.generic.SymbolRootNode

Bases: pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode

This class is a special case of a SymbolNode. A SymbolRootNode object has no symbol of its own, but has children that represent all possible symbols.

add(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.Token.Token

Return a symbol token from a scanner.

Parameters

scanner – A scanner to read from.

Returns

A symbol token read from the scanner.
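
A short sketch, assuming a StringScanner implementation of IScanner: once <= has been added, next_token reads it as a single symbol token rather than two:

    from pip_services3_expressions.io import StringScanner  # assumed export
    from pip_services3_expressions.tokenizers.generic import SymbolRootNode
    from pip_services3_expressions.tokenizers.TokenType import TokenType

    root = SymbolRootNode()
    root.add("<=", TokenType.Symbol)

    token = root.next_token(StringScanner("<= dog"))
    print(token.value)  # expected: "<="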