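I am trying to define lexer rules for PostgreSQL SQL. What I am after is a negative lookahead for the '-' character, roughly like this: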
OP_MINUS: '-' ! ( '-' ) .
Here is the original definition of what a PostgreSQL operator name can be:
The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.
A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
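To make the quoted restrictions concrete, here is a minimal Python sketch that checks whether a string is an allowed operator name. The function name is_valid_operator_name and the constants are hypothetical names chosen for illustration; this only restates the rules quoted above and is not PostgreSQL's own lexer.

```python
# Characters that may appear in an operator name, per the quoted text.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
# Subset whose presence allows a multicharacter name to end in + or -.
SPECIAL = set("~!@#%^&|`?")
# NAMEDATALEN-1 with the default NAMEDATALEN (assumed unchanged here).
MAX_LEN = 63


def is_valid_operator_name(name):
    """Return True if name satisfies the restrictions quoted above."""
    if not name or len(name) > MAX_LEN:
        return False
    if any(ch not in OP_CHARS for ch in name):
        return False
    # Comment starters may not appear anywhere in the name.
    if "--" in name or "/*" in name:
        return False
    # A multicharacter name ending in + or - needs a SPECIAL character.
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in SPECIAL for ch in name)
    return True
```

With this check, "@-" is accepted and "*-" is rejected, which matches the example in the quoted text.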
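You can use a semantic predicate in a lexer rule to perform lookahead (or lookbehind) without consuming characters. For example, the following rule covers several of the restrictions on an operator.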
OPERATOR
    : ( [+*<>=~!@#%^&|`?]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )+
    ;
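For readers who want to see what the two predicates do outside of ANTLR, here is a rough Python sketch of the same idea: peek at the next character without consuming it, and stop the operator before a comment start. scan_operator is a hypothetical helper written for this illustration, not part of ANTLR or of any generated lexer.

```python
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def scan_operator(text, pos=0):
    """Return the run of operator characters starting at pos,
    stopping before a '--' or '/*' comment start."""
    end = pos
    while end < len(text) and text[end] in OP_CHARS:
        # Peek one character ahead without consuming it, which is
        # the role _input.LA(1) plays in the predicates above.
        nxt = text[end + 1] if end + 1 < len(text) else ""
        if text[end] == "-" and nxt == "-":
            break
        if text[end] == "/" and nxt == "*":
            break
        end += 1
    return text[pos:end]
```

For the input "@--x" this returns "@", leaving "--x" to be picked up as a line comment.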
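However, the rule above does not address the restriction on having + or - at the end of an operator. To handle that in the simplest way possible, I would probably separate the two cases into different rules.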
// this rule does not allow + or - at the end of an operator
OPERATOR
    : ( [*<>=~!@#%^&|`?]
      | ( '+'
        | '-' {_input.LA(1) != '-'}?
        )+
        [*<>=~!@#%^&|`?]
      | '/' {_input.LA(1) != '*'}?
      )+
    ;
// this rule allows + or - at the end of an operator and sets the type to OPERATOR
// it requires a character from the special subset to appear
OPERATOR2
    : ( [*<>=+]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )*
      [~!@#%^&|`?]
      OPERATOR?
      ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      -> type(OPERATOR)
    ;
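A hand-written reference can be handy for sanity-checking the two rules above on sample input. The Python sketch below scans a run of operator characters up to a comment start, then backs off trailing + or - when no character from the special subset is present. next_operator is a hypothetical name, and the sketch assumes the trimmed characters are lexed as separate tokens afterwards; it is not derived from the grammar and has not been compared against it.

```python
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
SPECIAL = set("~!@#%^&|`?")


def next_operator(text, pos=0):
    """Return the operator token starting at pos ('' if there is none)."""
    end = pos
    while end < len(text) and text[end] in OP_CHARS:
        # Stop before a comment start; nothing past it is consumed.
        if text[end:end + 2] in ("--", "/*"):
            break
        end += 1
    token = text[pos:end]
    # Without a SPECIAL character, a multicharacter operator
    # may not end in + or -, so give those characters back.
    if not any(ch in SPECIAL for ch in token):
        while len(token) > 1 and token[-1] in "+-":
            token = token[:-1]
    return token
```

For example, "*-1" yields "*" (the - is left for the next token), while "@-1" yields "@-" because @ is in the special subset.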