pip_services3_expressions.tokenizers.generic package
Submodules
- pip_services3_expressions.tokenizers.generic.CCommentState module
- pip_services3_expressions.tokenizers.generic.CppCommentState module
- pip_services3_expressions.tokenizers.generic.GenericCommentState module
- pip_services3_expressions.tokenizers.generic.GenericNumberState module
- pip_services3_expressions.tokenizers.generic.GenericQuoteState module
- pip_services3_expressions.tokenizers.generic.GenericSymbolState module
- pip_services3_expressions.tokenizers.generic.GenericTokenizer module
- pip_services3_expressions.tokenizers.generic.GenericWhitespaceState module
- pip_services3_expressions.tokenizers.generic.GenericWordState module
- pip_services3_expressions.tokenizers.generic.SymbolNode module
- pip_services3_expressions.tokenizers.generic.SymbolRootNode module
Module contents
-
class
pip_services3_expressions.tokenizers.generic.
CCommentState
Bases:
pip_services3_expressions.tokenizers.generic.CppCommentState.CppCommentState
This state will either delegate to a comment-handling state, or return a token with just a slash in it.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Either delegate to a comment-handling state, or return a token with just a slash in it.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
CppCommentState
Bases:
pip_services3_expressions.tokenizers.generic.GenericCommentState.GenericCommentState
This state will either delegate to a comment-handling state, or return a token with just a slash in it.
-
get_multi_line_comment
(scanner: pip_services3_expressions.io.IScanner.IScanner) → str Ignore everything up to a closing star and slash, and then return the tokenizer’s next token.
- Parameters
scanner –
-
get_single_line_comment
(scanner: pip_services3_expressions.io.IScanner.IScanner) → str Ignore everything up to an end-of-line and return the tokenizer’s next token.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Either delegate to a comment-handling state, or return a token with just a slash in it.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
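The single- and multi-line comment behaviour described above can be sketched without the library. This is an illustrative re-implementation, not the class's actual code; it assumes the scanner is modelled as a plain string plus an index.

```python
def get_single_line_comment(text: str, pos: int):
    """Collect characters up to, but not including, the end of line."""
    start = pos
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    return text[start:pos], pos


def get_multi_line_comment(text: str, pos: int):
    """Collect characters up to and including the closing star-slash."""
    start = pos
    while pos < len(text):
        if text[pos:pos + 2] == "*/":
            return text[start:pos + 2], pos + 2
        pos += 1
    return text[start:pos], pos  # unterminated comment: take the rest
```

For example, reading `"// hi\nx = 1"` from position 0 yields the comment `// hi` and stops before the newline, leaving it for the whitespace state.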
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericCommentState
Bases:
pip_services3_expressions.tokenizers.ICommentState.ICommentState
A CommentState object returns a comment from a scanner.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Gets a comment from the scanner and returns it as the next token.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericNumberState
Bases:
pip_services3_expressions.tokenizers.INumberState.INumberState
A NumberState object returns a number from a scanner. This state’s idea of a number allows an optional, initial minus sign, followed by one or more digits. A decimal point and another string of digits may follow these digits.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Gets the next token from the stream started from the character linked to this state.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
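The number format described above (optional leading minus, digits, optional decimal point followed by more digits) can be sketched as a stand-alone reader. This is a hypothetical illustration over a string-plus-index scanner, not the library's implementation.

```python
def read_number(text: str, pos: int = 0) -> str:
    """Read an optional minus sign, digits, then an optional '.' with digits."""
    start = pos
    if pos < len(text) and text[pos] == "-":
        pos += 1
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos < len(text) and text[pos] == ".":
        lookahead = pos + 1
        while lookahead < len(text) and text[lookahead].isdigit():
            lookahead += 1
        if lookahead > pos + 1:  # consume the dot only if digits follow it
            pos = lookahead
    return text[start:pos]
```

Here `read_number("-123.45 rest")` yields `-123.45`, while a trailing dot with no digits after it, as in `42.`, is left for the next state.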
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericQuoteState
Bases:
pip_services3_expressions.tokenizers.IQuoteState.IQuoteState
A quoteState returns a quoted string token from a scanner. This state will collect characters until it sees a match to the character that the tokenizer used to switch to this state. For example, if a tokenizer uses a double-quote character to enter this state, then
next_token
will search for another double-quote until it finds one or reaches the end of the scanner.
-
decode_string
(value: str, quote_symbol: int) → Optional[str] Decodes a string value.
- Parameters
value – A string value to be decoded.
quote_symbol – A string quote character.
- Returns
A decoded string.
-
encode_string
(value: str, quote_symbol: int) → Optional[str] Encodes a string value.
- Parameters
value – A string value to be encoded.
quote_symbol – A string quote character.
- Returns
An encoded string.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a quoted string token from a scanner. This method will collect characters until it sees a match to the character that the tokenizer used to switch to this state.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
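The collect-until-matching-quote behaviour can be sketched as follows. This is an illustration under the same string-plus-index assumption as above, not the class's code.

```python
def read_quoted(text: str, pos: int):
    """Collect characters until the opening quote repeats or input ends."""
    quote = text[pos]  # the character that switched the tokenizer to this state
    end = pos + 1
    while end < len(text) and text[end] != quote:
        end += 1
    if end < len(text):
        end += 1       # include the closing quote
    return text[pos:end], end
```

An unterminated quote simply collects to the end of the input, mirroring the "or finds the end of the scanner" clause above.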
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericSymbolState
Bases:
pip_services3_expressions.tokenizers.ISymbolState.ISymbolState
The idea of a symbol is a character that stands on its own, such as an ampersand or a parenthesis. For example, when tokenizing the expression (is_ready)&(is_willing), a typical tokenizer would return 7 tokens, including one for each parenthesis and one for the ampersand. Thus a series of symbols such as )&( becomes three tokens, while a series of letters such as is_ready becomes a single word token.
Multi-character symbols are an exception to the rule that a symbol is a standalone character. For example, a tokenizer may want less-than-or-equals to tokenize as a single token. This class provides a method for establishing which multi-character symbols an object of this class should treat as single symbols. This allows, for example, “cat <= dog” to tokenize as three tokens, rather than splitting the less-than and equals symbols into separate tokens.
By default, this state recognizes the following multi-character symbols: !=, :-, <=, >=
-
add
(value, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType) Add a multi-character symbol.
- Parameters
value – The symbol to add, such as “=:=”
token_type –
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a symbol token from a scanner.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericTokenizer
Bases:
pip_services3_expressions.tokenizers.AbstractTokenizer.AbstractTokenizer
Implements a default tokenizer class.
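A default tokenizer's job is to dispatch on the first character of the input to the right state. The following toy dispatcher illustrates that idea; GenericTokenizer's real states and token types are richer than this sketch.

```python
def tokenize(text: str):
    """Split text into (type, value) pairs by dispatching on the first char."""
    tokens, pos = [], 0
    while pos < len(text):
        ch, end = text[pos], pos + 1
        if ch.isspace():                                   # whitespace state
            while end < len(text) and text[end].isspace():
                end += 1
            tokens.append(("whitespace", text[pos:end]))
        elif ch.isdigit():                                 # number state
            while end < len(text) and text[end].isdigit():
                end += 1
            tokens.append(("number", text[pos:end]))
        elif ch.isalpha() or ch == "_":                    # word state
            while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                end += 1
            tokens.append(("word", text[pos:end]))
        else:                                              # symbol state
            tokens.append(("symbol", ch))
        pos = end
    return tokens
```

For instance, `tokenize("(is_ready)& b")` produces a symbol for each parenthesis and the ampersand, and a single word token for `is_ready`.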
-
class
pip_services3_expressions.tokenizers.generic.
GenericWhitespaceState
Bases:
pip_services3_expressions.tokenizers.IWhitespaceState.IWhitespaceState
A whitespace state ignores whitespace (such as blanks and tabs), and returns the tokenizer’s next token. By default, all characters from 0 to 32 are whitespace.
-
clear_whitespace_chars
() Clears definitions of whitespace characters.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Ignore whitespace (such as blanks and tabs), and return the tokenizer’s next token.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
set_whitespace_chars
(from_symbol: int, to_symbol: int, enable: bool) Establish the given characters as whitespace to ignore.
- Parameters
from_symbol – First character index of the interval.
to_symbol – Last character index of the interval.
enable – True if this state should ignore characters in the given range.
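The whitespace table behind set_whitespace_chars and clear_whitespace_chars can be sketched as a map from character code to a flag. This mirrors the described behaviour (codes 0 to 32 whitespace by default) but is not the library's code.

```python
class WhitespaceChars:
    """Tracks which character codes count as whitespace (0..32 by default)."""

    def __init__(self):
        self._flags = {}
        self.set_whitespace_chars(0, 32, True)

    def set_whitespace_chars(self, from_symbol: int, to_symbol: int, enable: bool):
        """Mark every code in the inclusive range as whitespace (or not)."""
        for code in range(from_symbol, to_symbol + 1):
            self._flags[code] = enable

    def clear_whitespace_chars(self):
        """Forget all whitespace definitions."""
        self._flags.clear()

    def is_whitespace(self, ch: str) -> bool:
        return self._flags.get(ord(ch), False)
```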
-
-
class
pip_services3_expressions.tokenizers.generic.
GenericWordState
Bases:
pip_services3_expressions.tokenizers.IWordState.IWordState
A wordState returns a word from a scanner. Like other states, a tokenizer transfers the job of reading to this state, depending on an initial character. Thus, the tokenizer decides which characters may begin a word, and this state determines which characters may appear as a second or later character in a word. These are typically different sets of characters; in particular, it is typical for digits to appear as parts of a word, but not as the initial character of a word.
By default, the following characters may appear in a word. The method
set_word_chars()
allows customizing this.
From ‘a’ to ‘z’
From ‘A’ to ‘Z’
From ‘0’ to ‘9’
as well as: minus sign, underscore, and apostrophe.
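A sketch of the word-reading rule, assuming the default character set above and that the tokenizer has already accepted text[pos] as a word start. This is illustrative, not the class's implementation (and isalnum() here is broader than the ASCII ranges listed, since it accepts Unicode letters).

```python
def read_word(text: str, pos: int) -> str:
    """Collect second-and-later word characters after an accepted first char."""
    def is_word_char(ch: str) -> bool:
        # letters, digits, minus, underscore, apostrophe
        return ch.isalnum() or ch in "-_'"

    end = pos + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[pos:end]
```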
-
clear_word_chars
() Clears definitions of word chars.
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) → pip_services3_expressions.tokenizers.Token.Token Return a word token from the scanner, collecting characters for as long as they are valid word characters.
- Parameters
scanner – A textual string to be tokenized.
tokenizer – A tokenizer class that controls the process.
- Returns
The next token from the top of the stream.
-
set_word_chars
(from_symbol: int, to_symbol: int, enable: bool) Establish characters in the given range as valid characters for part of a word after the first character. Note that the tokenizer must determine which characters are valid as the beginning character of a word.
- Parameters
from_symbol – First character index of the interval.
to_symbol – Last character index of the interval.
enable – True if this state should use characters in the given range.
-
-
class
pip_services3_expressions.tokenizers.generic.
SymbolNode
(parent: Optional[SymbolNode], character: int) Bases:
object
A SymbolNode object is a member of a tree that contains all possible prefixes of allowable symbols. Multi-character symbols appear in a SymbolNode tree with one node for each character.
For example, the symbol =:~ will appear in a tree as three nodes. The first node contains an equals sign, and has a child; that child contains a colon and has a child; this third child contains a tilde, and has no children of its own. If the colon node had another child for a dollar sign character, then the tree would contain the symbol =:$.
A tree of SymbolNode objects collaborates to read a (potentially multi-character) symbol from an input stream. A root node with no character of its own finds an initial node that represents the first character in the input. This node checks whether the next character in the stream matches one of its children. If so, the node delegates its reading task to that child. This approach walks down the tree, pulling symbols from the input that match the path down the tree.
When a node does not have a child that matches the next character, we will have read the longest possible symbol prefix. This prefix may or may not be a valid symbol. Consider a tree that has had =:~ added and has not had =: added. In this tree, of the three nodes that contain =:~, only the first and third contain complete symbols. If, say, the input contains =:a, the colon node will not have a child that matches the ‘a’ and so it will stop reading. The colon node has to “unread”: it must push back its character, and ask its parent to unread. Unreading continues until it reaches an ancestor that represents a valid symbol.
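The tree walk and unread behaviour described above can be condensed into a small sketch. Method names mirror the documented ones, but the scanner is simplified to a string plus an index, so this is an illustration rather than the actual class.

```python
class Node:
    """One character in a prefix tree of multi-character symbols."""

    def __init__(self, parent=None, character=""):
        self.parent, self.character = parent, character
        self.children = {}
        self.valid = False   # True if the path to this node is a complete symbol

    def add_descendant_line(self, value: str):
        """Add one child per character; mark the last node as a valid symbol."""
        if not value:
            self.valid = True
            return
        child = self.children.setdefault(value[0], Node(self, value[0]))
        child.add_descendant_line(value[1:])

    def deepest_read(self, text: str, pos: int):
        """Walk down the tree as far as the input allows."""
        if pos < len(text) and text[pos] in self.children:
            return self.children[text[pos]].deepest_read(text, pos + 1)
        return self, pos

    def unread_to_valid(self, pos: int):
        """Push characters back until an ancestor completes a symbol."""
        if self.valid or self.parent is None:
            return self, pos
        return self.parent.unread_to_valid(pos - 1)


def next_symbol(root: Node, text: str, pos: int = 0) -> str:
    node, end = root.deepest_read(text, pos)
    node, end = node.unread_to_valid(end)
    return text[pos:end]
```

With "=" and "=:~" added, the input "=:a" reads down to the colon node, finds no child for the 'a', and unreads back to the valid "=" node, exactly the scenario in the paragraph above.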
-
add_descendant_line
(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType) Add a line of descendants that represent the characters in the given string.
- Parameters
value –
token_type –
-
deepest_read
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Find the descendant that takes as many characters as possible from the input.
- Parameters
scanner –
-
find_child_with_char
(value) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Find a child with the given character.
- Parameters
value –
-
property
token_type
-
unread_to_valid
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode Unwind to a valid node; this node is “valid” if its ancestry represents a complete symbol. If this node is not valid, put back the character and ask the parent to unwind.
- Parameters
scanner –
-
property
valid
-
-
class
pip_services3_expressions.tokenizers.generic.
SymbolRootNode
Bases:
pip_services3_expressions.tokenizers.generic.SymbolNode.SymbolNode
This class is a special case of a SymbolNode. A SymbolRootNode object has no symbol of its own, but has children that represent all possible symbols.
-
add
(value: str, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)
-
next_token
(scanner: pip_services3_expressions.io.IScanner.IScanner) → pip_services3_expressions.tokenizers.Token.Token Return a symbol string from a scanner.
- Parameters
scanner – A scanner to read from
- Returns
A symbol string from a scanner.
-