pip_services3_expressions.tokenizers.generic.GenericSymbolState module

class pip_services3_expressions.tokenizers.generic.GenericSymbolState.GenericSymbolState

Bases: pip_services3_expressions.tokenizers.ISymbolState.ISymbolState

The idea of a symbol is a character that stands on its own, such as an ampersand or a parenthesis. For example, when tokenizing the expression “(is_ready)& (is_willing)”, a typical tokenizer would return 7 tokens, including one for each parenthesis and one for the ampersand. Thus a series of symbols such as “)&(” becomes three tokens, while a series of letters such as “is_ready” becomes a single word token.
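The seven-token count can be checked with a minimal, library-independent sketch. The regular expression and the name simple_tokenize are illustrative assumptions and are not part of pip_services3_expressions.

```python
import re

# Words (letters, digits, underscores) stay whole; every other
# non-space character becomes its own one-character token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Hypothetical helper: split text into word and single-symbol tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = simple_tokenize("(is_ready)& (is_willing)")
print(tokens)       # ['(', 'is_ready', ')', '&', '(', 'is_willing', ')']
print(len(tokens))  # 7
```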

Multi-character symbols are an exception to the rule that a symbol is a standalone character. For example, a tokenizer may want less-than-or-equals to tokenize as a single token. This class provides a method for establishing which multi-character symbols an object of this class should treat as single symbols. This allows, for example, “cat <= dog” to tokenize as three tokens, rather than splitting the less-than and equals symbols into separate tokens.

By default, this state recognizes the following multi-character symbols: !=, :-, <=, >=
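One way to picture the effect of the default set is a longest-match scan: at each symbol character, try the longest registered symbol first and fall back to a single character. The sketch below is a standalone illustration under that assumption; split_symbols and DEFAULT_SYMBOLS are hypothetical names, and the class itself may be implemented differently.

```python
# The default multi-character symbols listed above.
DEFAULT_SYMBOLS = {"!=", ":-", "<=", ">="}


def split_symbols(text, symbols=DEFAULT_SYMBOLS):
    """Hypothetical helper: split text into words and symbols,
    preferring the longest registered symbol at each position."""
    longest = max((len(s) for s in symbols), default=1)
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            # Consume a whole word.
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            # Try the longest candidate first; a single character always matches.
            for size in range(longest, 0, -1):
                candidate = text[i:i + size]
                if size == 1 or candidate in symbols:
                    tokens.append(candidate)
                    i += len(candidate)
                    break
    return tokens


print(split_symbols("cat <= dog"))                 # ['cat', '<=', 'dog']
print(split_symbols("cat <= dog", symbols=set()))  # ['cat', '<', '=', 'dog']
```

With an empty symbol set the less-than and equals characters split apart, which is the behaviour the multi-character registration is meant to avoid.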

add(value, token_type: pip_services3_expressions.tokenizers.TokenType.TokenType)

Add a multi-character symbol.

Parameters
  • value – The symbol to add, such as “=:=”

  • token_type – The token type to assign to the symbol.
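To illustrate what registering a symbol changes, the following sketch keeps a small table that maps each symbol to a token type. SymbolTable, Kind, and longest_match are hypothetical stand-ins written for this example; they are not the library's classes, and Kind only approximates the role of TokenType.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical stand-in for TokenType."""
    SYMBOL = "symbol"
    COMPARISON = "comparison"


class SymbolTable:
    """Hypothetical registry of multi-character symbols."""

    def __init__(self):
        self._types = {}
        # Seed with the default multi-character symbols.
        for symbol in ("!=", ":-", "<=", ">="):
            self.add(symbol, Kind.SYMBOL)

    def add(self, value, token_type):
        """Register a multi-character symbol and its token type."""
        self._types[value] = token_type

    def longest_match(self, text, start):
        """Return the longest registered symbol at `start`,
        or the single character found there."""
        best = text[start]
        for symbol in self._types:
            if len(symbol) > len(best) and text.startswith(symbol, start):
                best = symbol
        return best, self._types.get(best, Kind.SYMBOL)


table = SymbolTable()
print(table.longest_match("a =:= b", 2))  # ('=', <Kind.SYMBOL: 'symbol'>)

table.add("=:=", Kind.COMPARISON)
print(table.longest_match("a =:= b", 2))  # ('=:=', <Kind.COMPARISON: 'comparison'>)
```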

next_token(scanner: pip_services3_expressions.io.IScanner.IScanner, tokenizer: pip_services3_expressions.tokenizers.ITokenizer.ITokenizer) -> pip_services3_expressions.tokenizers.Token.Token

Return a symbol token from a scanner.

Parameters
  • scanner – A scanner over the text to be tokenized.

  • tokenizer – A tokenizer class that controls the process.

Returns

The next token from the top of the stream.
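Reading a symbol from a character stream usually means reading ahead and then pushing back whatever did not belong to the symbol. The sketch below shows that idea with a hypothetical MiniScanner offering only read() and unread(); it is an assumption-laden illustration, not the library's scanner or its next_token implementation.

```python
class MiniScanner:
    """Hypothetical scanner: read one character, or step back one character."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        # Return the next character, or "" at the end of the input.
        if self._pos >= len(self._text):
            return ""
        ch = self._text[self._pos]
        self._pos += 1
        return ch

    def unread(self):
        self._pos = max(0, self._pos - 1)


def next_symbol(scanner, symbols):
    """Read the longest registered symbol, or a single character."""
    value = scanner.read()  # a lone character always counts as a symbol
    accepted = value
    # Keep reading while some registered symbol could still extend the text.
    while any(s.startswith(value) and len(s) > len(value) for s in symbols):
        ch = scanner.read()
        if ch == "":
            break
        value += ch
        if value in symbols:
            accepted = value
    # Push back everything read past the last accepted symbol.
    for _ in range(len(value) - len(accepted)):
        scanner.unread()
    return accepted


scanner = MiniScanner("<=5")
print(next_symbol(scanner, {"!=", ":-", "<=", ">="}))  # <=
print(scanner.read())                                  # 5
```

Because the extra characters are pushed back, the next state in the tokenizer sees the stream exactly where the symbol ended.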