pip_services3_expressions.tokenizers.generic package
Submodules
- pip_services3_expressions.tokenizers.generic.CCommentState module
- pip_services3_expressions.tokenizers.generic.CppCommentState module
- pip_services3_expressions.tokenizers.generic.GenericCommentState module
- pip_services3_expressions.tokenizers.generic.GenericNumberState module
- pip_services3_expressions.tokenizers.generic.GenericQuoteState module
- pip_services3_expressions.tokenizers.generic.GenericSymbolState module
- pip_services3_expressions.tokenizers.generic.GenericTokenizer module
- pip_services3_expressions.tokenizers.generic.GenericWhitespaceState module
- pip_services3_expressions.tokenizers.generic.GenericWordState module
- pip_services3_expressions.tokenizers.generic.SymbolNode module
- pip_services3_expressions.tokenizers.generic.SymbolRootNode module
Module contents
-
class
pip_services3_expressions.tokenizers.generic.
CCommentState
Bases:
pip_services3_expressions.tokenizers.generic.CppCommentState.CppCommentState
This state will either delegate to a comment-handling state, or return a token with just a slash in it.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Either delegate to a comment-handling state, or return a token with just a slash in it.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
CppCommentState
Bases:
pip_services3_expressions.tokenizers.generic.GenericCommentState.GenericCommentState
This state will either delegate to a comment-handling state, or return a token with just a slash in it.
-
get_multi_line_comment
(scanner: pip_services3_expressions.io.IScanner.IScanner) → str Ignore everything up to a closing star and slash, and then return the tokenizer’s next token.
- Parameters
scanner –
-
get_single_line_comment
(scanner: pip_services3_expressions.io.IScanner.IScanner) → str Ignore everything up to an end-of-line and return the tokenizer’s next token.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Either delegate to a comment-handling state, or return a token with just a slash in it.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
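The single- and multi-line comment behaviour described above can be sketched without the library. This is an illustrative re-implementation, not the class's actual code; it assumes the scanner is modelled as a plain string plus an index.

```python
def get_single_line_comment(text: str, pos: int):
    """Collect characters up to, but not including, the end of line."""
    start = pos
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    return text[start:pos], pos


def get_multi_line_comment(text: str, pos: int):
    """Collect characters up to and including the closing star-slash."""
    start = pos
    while pos < len(text):
        if text[pos:pos + 2] == "*/":
            return text[start:pos + 2], pos + 2
        pos += 1
    return text[start:pos], pos  # unterminated comment: take the rest
```

For example, reading `"// hi\nx = 1"` from position 0 yields the comment `// hi` and stops before the newline, leaving it for the whitespace state.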
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericCommentState
Bases:
pip_services3_expressions.tokenizers.ICommentState.ICommentState
A CommentState object returns a comment from a scanner.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Gets a comment from the scanner and returns it as the next token.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericNumberState
Bases:
pip_services3_expressions.tokenizers.INumberState.INumberState
A NumberState object returns a number from a scanner. This state’s idea of a number allows an optional, initial minus sign, followed by one or more digits. A decimal point and another string of digits may follow these digits.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Gets the next token from the stream started from the character linked to this state.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
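The number format described above (optional leading minus, digits, optional decimal point followed by more digits) can be sketched as a stand-alone reader. This is a hypothetical illustration over a string-plus-index scanner, not the library's implementation.

```python
def read_number(text: str, pos: int = 0) -> str:
    """Read an optional minus sign, digits, then an optional '.' with digits."""
    start = pos
    if pos < len(text) and text[pos] == "-":
        pos += 1
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos < len(text) and text[pos] == ".":
        lookahead = pos + 1
        while lookahead < len(text) and text[lookahead].isdigit():
            lookahead += 1
        if lookahead > pos + 1:  # consume the dot only if digits follow it
            pos = lookahead
    return text[start:pos]
```

Here `read_number("-123.45 rest")` yields `-123.45`, while a trailing dot with no digits after it, as in `42.`, is left for the next state.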
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericQuoteState
Bases:
pip_services3_expressions.tokenizers.IQuoteState.IQuoteState
A quoteState returns a quoted string token from a scanner. This state will collect characters until it sees a match to the character that the tokenizer used to switch to this state. For example, if a tokenizer uses a double-quote character to enter this state, then
next_token
will search for another double-quote until it finds one or reaches the end of the scanner.
-
decode_string
(value: str, quote_symbol: int) → Optional[str] Decodes a string value.
- Parameters
value – A string value to be decoded.
quote_symbol – A string quote character.
- Returns
A decoded string.
-
encode_string
(value: str, quote_symbol: int) → Optional[str] Encodes a string value.
- Parameters
value – A string value to be encoded.
quote_symbol – A string quote character.
- Returns
An encoded string.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a quoted string token from a scanner. This method will collect characters until it sees a match to the character that the tokenizer used to switch to this state.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
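The collect-until-matching-quote behaviour can be sketched as follows. This is an illustration under the same string-plus-index assumption as above, not the class's code.

```python
def read_quoted(text: str, pos: int):
    """Collect characters until the opening quote repeats or input ends."""
    quote = text[pos]  # the character that switched the tokenizer to this state
    end = pos + 1
    while end < len(text) and text[end] != quote:
        end += 1
    if end < len(text):
        end += 1       # include the closing quote
    return text[pos:end], end
```

An unterminated quote simply collects to the end of the input, mirroring the "or finds the end of the scanner" clause above.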
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericSymbolState
Bases:
pip_services3_expressions.tokenizers.ISymbolState.ISymbolState
The idea of a symbol is a character that stands on its own, such as an ampersand or a parenthesis. For example, when tokenizing the expression (is_ready)&(is_willing), a typical tokenizer would return 7 tokens, including one for each parenthesis and one for the ampersand. Thus a series of symbols such as )&( becomes three tokens, while a series of letters such as is_ready becomes a single word token.
Multi-character symbols are an exception to the rule that a symbol is a standalone character. For example, a tokenizer may want less-than-or-equals to tokenize as a single token. This class provides a method for establishing which multi-character symbols an object of this class should treat as single symbols. This allows, for example, “cat <= dog” to tokenize as three tokens, rather than splitting the less-than and equals symbols into separate tokens.
By default, this state recognizes the following multi-character symbols: !=, :-, <=, >=
-
add
(value, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType) Add a multi-character symbol.
- Parameters
value – The symbol to add, such as “=:=”
token_type –
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a symbol token from a scanner.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericTokenizer
Bases:
pip_services3_expressions.tokenizers.AbstractTokenizer.AbstractTokenizer
Implements a default tokenizer class.
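A default tokenizer's job is to dispatch on the first character of the input to the right state. The following toy dispatcher illustrates that idea; GenericTokenizer's real states and token types are richer than this sketch.

```python
def tokenize(text: str):
    """Split text into (type, value) pairs by dispatching on the first char."""
    tokens, pos = [], 0
    while pos < len(text):
        ch, end = text[pos], pos + 1
        if ch.isspace():                                   # whitespace state
            while end < len(text) and text[end].isspace():
                end += 1
            tokens.append(("whitespace", text[pos:end]))
        elif ch.isdigit():                                 # number state
            while end < len(text) and text[end].isdigit():
                end += 1
            tokens.append(("number", text[pos:end]))
        elif ch.isalpha() or ch == "_":                    # word state
            while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                end += 1
            tokens.append(("word", text[pos:end]))
        else:                                              # symbol state
            tokens.append(("symbol", ch))
        pos = end
    return tokens
```

For instance, `tokenize("(is_ready)& b")` produces a symbol for each parenthesis and the ampersand, and a single word token for `is_ready`.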
-
class
pip_services3_expressions.tokenizers.generic.
GenericWhitespaceState
Bases:
pip_services3_expressions.tokenizers.IWhitespaceState.IWhitespaceState
A whitespace state ignores whitespace (such as blanks and tabs), and returns the tokenizer’s next token. By default, all characters from 0 to 32 are whitespace.
-
clear_whitespace_chars
() Clears definitions of whitespace characters.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Ignore whitespace (such as blanks and tabs), and return the tokenizer’s next token.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
set_whitespace_chars
(from_symbol: int, to_symbol: int, enable: bool) Establish the given characters as whitespace to ignore.
- Parameters
from_symbol – First character index of the interval.
to_symbol – Last character index of the interval.
enable – True if this state should ignore characters in the given range.
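The whitespace table behind set_whitespace_chars and clear_whitespace_chars can be sketched as a map from character code to a flag. This mirrors the described behaviour (codes 0 to 32 whitespace by default) but is not the library's code.

```python
class WhitespaceChars:
    """Tracks which character codes count as whitespace (0..32 by default)."""

    def __init__(self):
        self._flags = {}
        self.set_whitespace_chars(0, 32, True)

    def set_whitespace_chars(self, from_symbol: int, to_symbol: int, enable: bool):
        """Mark every code in the inclusive range as whitespace (or not)."""
        for code in range(from_symbol, to_symbol + 1):
            self._flags[code] = enable

    def clear_whitespace_chars(self):
        """Forget all whitespace definitions."""
        self._flags.clear()

    def is_whitespace(self, ch: str) -> bool:
        return self._flags.get(ord(ch), False)
```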
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericWordState
Bases:
pip_services3_expressions.tokenizers.IWordState.IWordState
A wordState returns a word from a scanner. Like other states, a tokenizer transfers the job of reading to this state, depending on an initial character. Thus, the tokenizer decides which characters may begin a word, and this state determines which characters may appear as a second or later character in a word. These are typically different sets of characters; in particular, it is typical for digits to appear as parts of a word, but not as the initial character of a word.
By default, the following characters may appear in a word. The method
set_word_chars()
allows customizing this.
From ‘a’ to ‘z’
From ‘A’ to ‘Z’
From ‘0’ to ‘9’
as well as: minus sign, underscore, and apostrophe.
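A sketch of the word-reading rule, assuming the default character set above and that the tokenizer has already accepted text[pos] as a word start. This is illustrative, not the class's implementation (and isalnum() here is broader than the ASCII ranges listed, since it accepts Unicode letters).

```python
def read_word(text: str, pos: int) -> str:
    """Collect second-and-later word characters after an accepted first char."""
    def is_word_char(ch: str) -> bool:
        # letters, digits, minus, underscore, apostrophe
        return ch.isalnum() or ch in "-_'"

    end = pos + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[pos:end]
```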
-
clear_word_chars
() Clears definitions of word chars.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a word token from the scanner, collecting characters for as long as they are valid word characters.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
set_word_chars
(from_symbol: int, to_symbol: int, enable: bool) Establish characters in the given range as valid characters for part of a word after the first character. Note that the tokenizer must determine which characters are valid as the beginning character of a word.
- Parameters
from_symbol – First character index of the interval.
to_symbol – Last character index of the interval.
enable – True if this state should use characters in the given range.
-
-
class
pip_services3_expressions.tokenizers.generic.
SymbolNode
(parent: Optional[SymbolNode], character: int) Bases:
object
A SymbolNode object is a member of a tree that contains all possible prefixes of allowable symbols. Multi-character symbols appear in a SymbolNode tree with one node for each character.
For example, the symbol =:~ will appear in a tree as three nodes. The first node contains an equals sign, and has a child; that child contains a colon and has a child; this third child contains a tilde, and has no children of its own. If the colon node had another child for a dollar sign character, then the tree would contain the symbol =:$.
A tree of SymbolNode objects collaborates to read a (potentially multi-character) symbol from an input stream. A root node with no character of its own finds an initial node that represents the first character in the input. This node checks whether the next character in the stream matches one of its children. If so, the node delegates its reading task to that child. This approach walks down the tree, pulling symbols from the input that match the path down the tree.
When a node does not have a child that matches the next character, we will have read the longest possible symbol prefix. This prefix may or may not be a valid symbol. Consider a tree that has had =:~ added and has not had =: added. In this tree, of the three nodes that contain =:~, only the first and third contain complete symbols. If, say, the input contains =:a, the colon node will not have a child that matches the ‘a’ and so it will stop reading. The colon node has to “unread”: it must push back its character, and ask its parent to unread. Unreading continues until it reaches an ancestor that represents a valid symbol.
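The tree walk and unread behaviour described above can be condensed into a small sketch. Method names mirror the documented ones, but the scanner is simplified to a string plus an index, so this is an illustration rather than the actual class.

```python
class Node:
    """One character in a prefix tree of multi-character symbols."""

    def __init__(self, parent=None, character=""):
        self.parent, self.character = parent, character
        self.children = {}
        self.valid = False   # True if the path to this node is a complete symbol

    def add_descendant_line(self, value: str):
        """Add one child per character; mark the last node as a valid symbol."""
        if not value:
            self.valid = True
            return
        child = self.children.setdefault(value[0], Node(self, value[0]))
        child.add_descendant_line(value[1:])

    def deepest_read(self, text: str, pos: int):
        """Walk down the tree as far as the input allows."""
        if pos < len(text) and text[pos] in self.children:
            return self.children[text[pos]].deepest_read(text, pos + 1)
        return self, pos

    def unread_to_valid(self, pos: int):
        """Push characters back until an ancestor completes a symbol."""
        if self.valid or self.parent is None:
            return self, pos
        return self.parent.unread_to_valid(pos - 1)


def next_symbol(root: Node, text: str, pos: int = 0) -> str:
    node, end = root.deepest_read(text, pos)
    node, end = node.unread_to_valid(end)
    return text[pos:end]
```

With "=" and "=:~" added, the input "=:a" reads down to the colon node, finds no child for the 'a', and unreads back to the valid "=" node, exactly the scenario in the paragraph above.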
-
add_descendant_line
(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType) Add a line of descendants that represent the characters in the given string.
- Parameters
value –
token_type –
-
deepest_read
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Find the descendant that takes as many characters as possible from the input.
- Parameters
scanner –
-
find_child_with_char
(value) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Find a child with the given character.
- Parameters
value –
-
property
token_type
-
unread_to_valid
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Unwind to a valid node; this node is “valid” if its ancestry represents a complete symbol. If this node is not valid, put back the character and ask the parent to unwind.
- Parameters
scanner –
-
property
valid
-
-
class
pip_services3_expressions.tokenizers.generic.
SymbolRootNode
Bases:
pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode
This class is a special case of a SymbolNode. A SymbolRootNode object has no symbol of its own, but has children that represent all possible symbols.
-
add
(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.Token.Token Return a symbol string from a scanner.
- Parameters
scanner – A scanner to read from
- Returns
A symbol string from a scanner.
-