
Writing a Jison Parser from Scratch (5/10): A Brief Overview of How a Jison Parser's Grammar Works

笪昌翰
2023-12-01


  1. Writing a Jison Parser from Scratch (1/10): Jison, Not JSON
  2. Writing a Jison Parser from Scratch (2/10): The Right Way to Learn a Parser Generator
  3. Writing a Jison Parser from Scratch (3/10): "Well begun is half done" (Aristotle, Politics)
  4. Writing a Jison Parser from Scratch (4/10): The Jison Grammar Format in Detail
  5. Writing a Jison Parser from Scratch (5/10): A Brief Overview of How a Jison Parser's Grammar Works

The Jison grammar code for this example

The previous installment covered the syntax of the Jison grammar used in this example.

%lex

%%

[^\n\r]+    { return 'LINE'; }
[\n\r]+     { return 'EOL'; }
<<EOF>>     { return 'EOF'; }

/lex

%%

p
    : ll EOF
        { console.log($1); }
    ;

ll
    : ll l
        { $$ = $1 + $2; }
    | l
        { $$ = $1; }
    ;

l
    : LINE EOL
        { $$ = $1 + $2; }
    ;

As noted earlier, Jison does not run this code directly; it compiles it into a parser. The generated parser code appears at the end of this article.

A glance at the generated parser shows it is far more complex than the grammar it came from. When learning how Jison works, it is therefore enough to analyze the grammar above: the generated parser strictly follows the grammar it was generated from, so in practice you almost never need to read the generated code itself.

How Jison lexical analysis works

As described above, a Jison grammar has two parts: lexical analysis and semantic analysis. The generated parser works in the same order, running lexical analysis first and semantic analysis second, so let us start with the lexical part.

In this example, the lexical analysis code is the part between %lex and /lex:

%lex

%%

[^\n\r]+    { return 'LINE'; }
[\n\r]+     { return 'EOL'; }
<<EOF>>     { return 'EOF'; }

Since this example simply parses line by line, the first few lines of the target data are enough to walk through. The excerpt is:

com.apple.xpc.launchd.domain.system = {
	type = system
	handle = 0
	active count = 648

The first character is c, which matches [^\n\r]+ but not [\n\r]+, so the first token is LINE. Token matching is greedy, so this first LINE extends to the end of the line, i.e. com.apple.xpc.launchd.domain.system = {. The unparsed remainder is

 
	type = system
	handle = 0
	active count = 648

Note that the first line here is empty. This is not a typo: the LINE regex [^\n\r]+ does not consume the line's trailing newline character \r, so that \r is still unconsumed, and the remainder therefore appears to start with an empty line.

The next unconsumed character is \r. Checking the lexer rules, it matches [\n\r]+, so the second token is EOL. The tokens matched so far are

LINE EOL

The strings these two tokens matched are

"com.apple.xpc.launchd.domain.system = {" "\r"

Quotes and escape sequences are used here to make the newline character visible. The unconsumed remainder is

	type = system
	handle = 0
	active count = 648

Since the newline \r has now been consumed, the remainder no longer starts with an empty line; it starts with the second line of the original data. Matching continues the same way, and after all four lines the tokens are

LINE EOL
LINE EOL
LINE EOL
LINE EOL

The spaces and line breaks here are only for readability; the token stream itself contains neither. The strings these tokens matched are

"com.apple.xpc.launchd.domain.system = {" "\r"
"	type = system" "\r"
"	handle = 0" "\r"
"	active count = 648" "\r"

Again, quotes and escapes make the newlines visible. The two listings correspond one-to-one, and nothing is left unconsumed. Lexical analysis is still not finished, though, because one rule remains:

<<EOF>>     { return 'EOF'; }

It matches EOF at the end of the input. This rule also shows that the lexer's matching syntax is based on regular expressions plus a few extensions; <<EOF>> is one such extension. The final result of lexical analysis is

LINE EOL
LINE EOL
LINE EOL
LINE EOL
EOF

Again, the spaces and line breaks are only for readability; the token stream contains neither. Similarly, the full data set tokenizes to

LINE EOL
LINE EOL

......

LINE EOL
LINE EOL
EOF

with the repeating LINE EOL pairs elided in the middle.
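The greedy, first-match-wins tokenization walked through above can be sketched in plain JavaScript. This is a simplified illustration, not Jison's generated lexer; the function name tokenize is made up:

```javascript
// Simplified sketch of the lexer rules above (not Jison's generated code).
// Each regex is anchored at the current position; the first rule that
// matches wins, and the + quantifier makes each match greedy.
function tokenize(input) {
  const rules = [
    { re: /^[^\n\r]+/, token: 'LINE' }, // a whole line, up to the newline
    { re: /^[\n\r]+/, token: 'EOL' },   // the newline character(s)
  ];
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const { re, token } of rules) {
      const m = rest.match(re);
      if (m) {
        tokens.push({ token, text: m[0] });
        rest = rest.slice(m[0].length); // consume the matched text
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error('Unrecognized text: ' + rest);
  }
  tokens.push({ token: 'EOF', text: '' }); // the <<EOF>> rule
  return tokens;
}

console.log(tokenize('type = system\rhandle = 0\r').map(t => t.token).join(' '));
// prints: LINE EOL LINE EOL EOF
```

One side effect worth noticing: [\n\r]+ swallows a run of consecutive newlines as a single EOL token, so blank lines disappear, which is harmless for this grammar because every l only needs one LINE EOL pair.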

How Jison semantic analysis works

Once lexical analysis finishes, Jison moves on to semantic analysis. In this example the semantic analysis code is the part after /lex:

/lex

%%

p
    : ll EOF
        { console.log($1); }
    ;

ll
    : ll l
        { $$ = $1 + $2; }
    | l
        { $$ = $1; }
    ;

l
    : LINE EOL
        { $$ = $1 + $2; }
    ;

Since this example parses line by line and is very simple, we reuse the result of the lexical analysis above:

LINE EOL
LINE EOL
LINE EOL
LINE EOL
EOF

The first two tokens are LINE and EOL. Checking the semantic analysis code, they match

l
    : LINE EOL
    ;

After this match the result is

l
	LINE EOL
LINE EOL
LINE EOL
LINE EOL
EOF

For clarity, this article uses indentation to show the match relationships in semantic analysis.

The resulting l in turn matches the second alternative of ll:

ll
    : ll l
    | l
    ;

After this match the result is

ll
	l
		LINE EOL
LINE EOL
LINE EOL
LINE EOL
EOF

Now the first symbol is ll, and the semantic analysis has no rule matching a lone ll. The first two symbols are ll and LINE; no rule matches those either. Nor does any rule match the first three symbols ll, LINE, and EOL.

Although the semantic analysis defines no rule that directly matches the three symbols ll LINE EOL, they can still be combined as follows

ll
	ll
	l
		LINE EOL

That is, the LINE and EOL on the second line combine into an l, which then combines with the ll on the first line via the ll l alternative into a new ll. After this match the result is

ll
	ll
		l
			LINE EOL
	l
		LINE EOL

The tokens still unmatched are

LINE EOL
LINE EOL
EOF

Because lexical analysis matches tokens one by one, it is easy to assume that semantic analysis likewise starts by applying a single rule across the whole token stream. In this example, one might expect everything to first be matched as

l
l
l
l
EOF

and then combined further. The flaw in this view is that multiple groupings are possible; for instance, the next step might produce

ll
	l
ll
	l
ll
	l
ll
	l
EOF

at which point no rule applies any more. Matching from the beginning instead gives

ll
	ll
		l
			LINE EOL
	l
		LINE EOL
LINE EOL
LINE EOL
EOF

Then the ll l alternative matches again, combining the next l into the ll:

ll
	ll
		ll
			l
				LINE EOL
		l
			LINE EOL
	l
		LINE EOL
LINE EOL
EOF

The ll l alternative matches once more:

ll
	ll
		ll
			ll
				l
					LINE EOL
			l
				LINE EOL
		l
			LINE EOL
	l
		LINE EOL
EOF

Finally, the ll and the EOF combine under the p rule

p
    : ll EOF
    ;

The result is

p
	ll
		ll
			ll
				ll
					l
						LINE EOL
				l
					LINE EOL
			l
				LINE EOL
		l
			LINE EOL
	EOF

Comparing this with the failed grouping above shows why matching tokens from the beginning matters.

This chain of ll nodes is exactly the recursive rule covered in the previous installment.
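The from-the-beginning matching shown above is, in effect, a shift-reduce process: shift one token onto a stack, then reduce the top of the stack for as long as some rule matches. The following is a hand-written sketch of that idea, not the LALR table machinery Jison actually generates; in particular, the check that a lone l becomes ll only when no ll precedes it stands in for the real parser's lookahead decision, and parseTokens is a made-up name:

```javascript
// Hand-rolled sketch of reduce-as-you-go matching (not Jison's LALR tables).
// After each shift, keep reducing the top of the stack while a rule matches.
function parseTokens(tokens) {
  const stack = [];
  for (const tok of tokens) {
    stack.push(tok); // shift one token
    let reduced = true;
    while (reduced) {
      reduced = false;
      const n = stack.length;
      if (n >= 2 && stack[n - 2] === 'LINE' && stack[n - 1] === 'EOL') {
        stack.splice(n - 2, 2, 'l');   // l : LINE EOL
        reduced = true;
      } else if (n >= 2 && stack[n - 2] === 'll' && stack[n - 1] === 'l') {
        stack.splice(n - 2, 2, 'll');  // ll : ll l
        reduced = true;
      } else if (n >= 1 && stack[n - 1] === 'l' && stack[n - 2] !== 'll') {
        stack.splice(n - 1, 1, 'll');  // ll : l (only when no ll precedes,
        reduced = true;                // standing in for real lookahead)
      } else if (n >= 2 && stack[n - 2] === 'll' && stack[n - 1] === 'EOF') {
        stack.splice(n - 2, 2, 'p');   // p : ll EOF
        reduced = true;
      }
    }
  }
  return stack;
}

console.log(parseTokens(['LINE', 'EOL', 'LINE', 'EOL', 'EOF']));
// the whole stream reduces to the single start symbol p
```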

How actions work in Jison semantic analysis

The walkthrough above ignored actions. In Jison, actions are attached to the semantic analysis rules, so once the matching process is understood, the actions can simply be plugged in.

To tell them apart, number the LINE tokens in the lexical analysis result:

LINE1 EOL
LINE2 EOL
LINE3 EOL
LINE4 EOL
EOF

and similarly number the nodes in the semantic analysis result:

p
	ll4
		ll3
			ll2
				ll1
					l1
						LINE1 EOL
				l2
					LINE2 EOL
			l3
				LINE3 EOL
		l4
			LINE4 EOL
	EOF

The action of grammar rule l is

l
    : LINE EOL
        { $$ = $1 + $2; }
    ;

Substituting gives

l1 = LINE1 + EOL
l2 = LINE2 + EOL
l3 = LINE3 + EOL
l4 = LINE4 + EOL

Grammar rule ll has two alternatives, each with its own action:

ll
    : ll l
        { $$ = $1 + $2; }
    | l
        { $$ = $1; }
    ;

Whichever alternative matches, its action runs. For example, ll1 matches the lone-l alternative and executes { $$ = $1; }, giving

ll1 = l1
    = LINE1 + EOL

ll2 matches the ll l alternative and executes { $$ = $1 + $2; }, giving

ll2 = ll1 + l2
    = LINE1 + EOL + LINE2 + EOL

And so on:

ll3 = ll2 + l3
    = LINE1 + EOL + LINE2 + EOL + LINE3 + EOL

ll4 = ll3 + l4
    = LINE1 + EOL + LINE2 + EOL + LINE3 + EOL + LINE4 + EOL

Finally, p is matched:

p
    : ll EOF
        { console.log($1); }
    ;

Its action { console.log($1); } is not an assignment this time; it is equivalent to

console.log(ll4);

which expands to

console.log(LINE1 + EOL + LINE2 + EOL + LINE3 + EOL + LINE4 + EOL);

The output is exactly the input to the lexical analysis: this example splits the data into lines and then reassembles it unchanged.
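Carrying the actions through those same reductions shows how the final value is built. Again a simplified sketch rather than Jison's actual implementation; parseWithActions is a made-up name, and each stack entry's value field plays the role of $$:

```javascript
// Sketch of shift-reduce matching with the semantic actions attached.
// Each reduce computes $$ from the matched children's values, so the final
// value handed to p's action is the original text reassembled.
function parseWithActions(tokens) {
  // tokens: [{ token, text }], as produced by the lexer
  const stack = []; // entries { sym, value }; value plays the role of $$
  const top = (k) => stack[stack.length - k];
  let result;
  for (const tok of tokens) {
    stack.push({ sym: tok.token, value: tok.text }); // shift
    let reduced = true;
    while (reduced) {
      reduced = false;
      const n = stack.length;
      if (n >= 2 && top(2).sym === 'LINE' && top(1).sym === 'EOL') {
        const [a, b] = stack.splice(-2);                     // l : LINE EOL
        stack.push({ sym: 'l', value: a.value + b.value });  // { $$ = $1 + $2; }
        reduced = true;
      } else if (n >= 2 && top(2).sym === 'll' && top(1).sym === 'l') {
        const [a, b] = stack.splice(-2);                     // ll : ll l
        stack.push({ sym: 'll', value: a.value + b.value }); // { $$ = $1 + $2; }
        reduced = true;
      } else if (n >= 1 && top(1).sym === 'l' && (n < 2 || top(2).sym !== 'll')) {
        const [a] = stack.splice(-1);                        // ll : l
        stack.push({ sym: 'll', value: a.value });           // { $$ = $1; }
        reduced = true;
      } else if (n >= 2 && top(2).sym === 'll' && top(1).sym === 'EOF') {
        const [a] = stack.splice(-2);                        // p : ll EOF
        result = a.value; // { console.log($1); } would print this
        reduced = true;
      }
    }
  }
  return result;
}

const toks = [
  { token: 'LINE', text: 'type = system' }, { token: 'EOL', text: '\r' },
  { token: 'LINE', text: 'handle = 0' },    { token: 'EOL', text: '\r' },
  { token: 'EOF', text: '' },
];
console.log(JSON.stringify(parseWithActions(toks)));
// prints "type = system\rhandle = 0\r"
```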

The generated Jison parser code for this example

/* parser generated by jison 0.4.18 */
/*
  Returns a Parser object of the following structure:

  Parser: {
    yy: {}
  }

  Parser.prototype: {
    yy: {},
    trace: function(),
    symbols_: {associative list: name ==> number},
    terminals_: {associative list: number ==> name},
    productions_: [...],
    performAction: function anonymous(yytext, yyleng, yylineno, yy, yystate, $$, _$),
    table: [...],
    defaultActions: {...},
    parseError: function(str, hash),
    parse: function(input),

    lexer: {
        EOF: 1,
        parseError: function(str, hash),
        setInput: function(input),
        input: function(),
        unput: function(str),
        more: function(),
        less: function(n),
        pastInput: function(),
        upcomingInput: function(),
        showPosition: function(),
        test_match: function(regex_match_array, rule_index),
        next: function(),
        lex: function(),
        begin: function(condition),
        popState: function(),
        _currentRules: function(),
        topState: function(),
        pushState: function(condition),

        options: {
            ranges: boolean           (optional: true ==> token location info will include a .range[] member)
            flex: boolean             (optional: true ==> flex-like lexing behaviour where the rules are tested exhaustively to find the longest match)
            backtrack_lexer: boolean  (optional: true ==> lexer regexes are tested in order and for each matching regex the action code is invoked; the lexer terminates the scan when a token is returned by the action code)
        },

        performAction: function(yy, yy_, $avoiding_name_collisions, YY_START),
        rules: [...],
        conditions: {associative list: name ==> set},
    }
  }


  token location info (@$, _$, etc.): {
    first_line: n,
    last_line: n,
    first_column: n,
    last_column: n,
    range: [start_number, end_number]       (where the numbers are indexes into the input string, regular zero-based)
  }


  the parseError function receives a 'hash' object with these members for lexer and parser errors: {
    text:        (matched text)
    token:       (the produced terminal token, if any)
    line:        (yylineno)
  }
  while parser (grammar) errors will also provide these members, i.e. parser errors deliver a superset of attributes: {
    loc:         (yylloc)
    expected:    (string describing the set of expected tokens)
    recoverable: (boolean: TRUE when the parser has a error recovery rule available for this particular error)
  }
*/
var step_1_line = (function(){
var o=function(k,v,o,l){for(o=o||{},l=k.length;l--;o[k[l]]=v);return o},$V0=[1,4],$V1=[5,7];
var parser = {trace: function trace () { },
yy: {},
symbols_: {"error":2,"p":3,"ll":4,"EOF":5,"l":6,"LINE":7,"EOL":8,"$accept":0,"$end":1},
terminals_: {2:"error",5:"EOF",7:"LINE",8:"EOL"},
productions_: [0,[3,2],[4,2],[4,1],[6,2]],
performAction: function anonymous(yytext, yyleng, yylineno, yy, yystate /* action[1] */, $$ /* vstack */, _$ /* lstack */) {
/* this == yyval */

var $0 = $$.length - 1;
switch (yystate) {
case 1:
 console.log($$[$0-1]); 
break;
case 2: case 4:
 this.$ = $$[$0-1] + $$[$0]; 
break;
case 3:
 this.$ = $$[$0]; 
break;
}
},
table: [{3:1,4:2,6:3,7:$V0},{1:[3]},{5:[1,5],6:6,7:$V0},o($V1,[2,3]),{8:[1,7]},{1:[2,1]},o($V1,[2,2]),o($V1,[2,4])],
defaultActions: {5:[2,1]},
parseError: function parseError (str, hash) {
    if (hash.recoverable) {
        this.trace(str);
    } else {
        var error = new Error(str);
        error.hash = hash;
        throw error;
    }
},
parse: function parse(input) {
    var self = this, stack = [0], tstack = [], vstack = [null], lstack = [], table = this.table, yytext = '', yylineno = 0, yyleng = 0, recovering = 0, TERROR = 2, EOF = 1;
    var args = lstack.slice.call(arguments, 1);
    var lexer = Object.create(this.lexer);
    var sharedState = { yy: {} };
    for (var k in this.yy) {
        if (Object.prototype.hasOwnProperty.call(this.yy, k)) {
            sharedState.yy[k] = this.yy[k];
        }
    }
    lexer.setInput(input, sharedState.yy);
    sharedState.yy.lexer = lexer;
    sharedState.yy.parser = this;
    if (typeof lexer.yylloc == 'undefined') {
        lexer.yylloc = {};
    }
    var yyloc = lexer.yylloc;
    lstack.push(yyloc);
    var ranges = lexer.options && lexer.options.ranges;
    if (typeof sharedState.yy.parseError === 'function') {
        this.parseError = sharedState.yy.parseError;
    } else {
        this.parseError = Object.getPrototypeOf(this).parseError;
    }
    function popStack(n) {
        stack.length = stack.length - 2 * n;
        vstack.length = vstack.length - n;
        lstack.length = lstack.length - n;
    }
    _token_stack:
        var lex = function () {
            var token;
            token = lexer.lex() || EOF;
            if (typeof token !== 'number') {
                token = self.symbols_[token] || token;
            }
            return token;
        };
    var symbol, preErrorSymbol, state, action, a, r, yyval = {}, p, len, newState, expected;
    while (true) {
        state = stack[stack.length - 1];
        if (this.defaultActions[state]) {
            action = this.defaultActions[state];
        } else {
            if (symbol === null || typeof symbol == 'undefined') {
                symbol = lex();
            }
            action = table[state] && table[state][symbol];
        }
                    if (typeof action === 'undefined' || !action.length || !action[0]) {
                var errStr = '';
                expected = [];
                for (p in table[state]) {
                    if (this.terminals_[p] && p > TERROR) {
                        expected.push('\'' + this.terminals_[p] + '\'');
                    }
                }
                if (lexer.showPosition) {
                    errStr = 'Parse error on line ' + (yylineno + 1) + ':\n' + lexer.showPosition() + '\nExpecting ' + expected.join(', ') + ', got \'' + (this.terminals_[symbol] || symbol) + '\'';
                } else {
                    errStr = 'Parse error on line ' + (yylineno + 1) + ': Unexpected ' + (symbol == EOF ? 'end of input' : '\'' + (this.terminals_[symbol] || symbol) + '\'');
                }
                this.parseError(errStr, {
                    text: lexer.match,
                    token: this.terminals_[symbol] || symbol,
                    line: lexer.yylineno,
                    loc: yyloc,
                    expected: expected
                });
            }
        if (action[0] instanceof Array && action.length > 1) {
            throw new Error('Parse Error: multiple actions possible at state: ' + state + ', token: ' + symbol);
        }
        switch (action[0]) {
        case 1:
            stack.push(symbol);
            vstack.push(lexer.yytext);
            lstack.push(lexer.yylloc);
            stack.push(action[1]);
            symbol = null;
            if (!preErrorSymbol) {
                yyleng = lexer.yyleng;
                yytext = lexer.yytext;
                yylineno = lexer.yylineno;
                yyloc = lexer.yylloc;
                if (recovering > 0) {
                    recovering--;
                }
            } else {
                symbol = preErrorSymbol;
                preErrorSymbol = null;
            }
            break;
        case 2:
            len = this.productions_[action[1]][1];
            yyval.$ = vstack[vstack.length - len];
            yyval._$ = {
                first_line: lstack[lstack.length - (len || 1)].first_line,
                last_line: lstack[lstack.length - 1].last_line,
                first_column: lstack[lstack.length - (len || 1)].first_column,
                last_column: lstack[lstack.length - 1].last_column
            };
            if (ranges) {
                yyval._$.range = [
                    lstack[lstack.length - (len || 1)].range[0],
                    lstack[lstack.length - 1].range[1]
                ];
            }
            r = this.performAction.apply(yyval, [
                yytext,
                yyleng,
                yylineno,
                sharedState.yy,
                action[1],
                vstack,
                lstack
            ].concat(args));
            if (typeof r !== 'undefined') {
                return r;
            }
            if (len) {
                stack = stack.slice(0, -1 * len * 2);
                vstack = vstack.slice(0, -1 * len);
                lstack = lstack.slice(0, -1 * len);
            }
            stack.push(this.productions_[action[1]][0]);
            vstack.push(yyval.$);
            lstack.push(yyval._$);
            newState = table[stack[stack.length - 2]][stack[stack.length - 1]];
            stack.push(newState);
            break;
        case 3:
            return true;
        }
    }
    return true;
}};
/* generated by jison-lex 0.3.4 */
var lexer = (function(){
var lexer = ({

EOF:1,

parseError:function parseError(str, hash) {
        if (this.yy.parser) {
            this.yy.parser.parseError(str, hash);
        } else {
            throw new Error(str);
        }
    },

// resets the lexer, sets new input
setInput:function (input, yy) {
        this.yy = yy || this.yy || {};
        this._input = input;
        this._more = this._backtrack = this.done = false;
        this.yylineno = this.yyleng = 0;
        this.yytext = this.matched = this.match = '';
        this.conditionStack = ['INITIAL'];
        this.yylloc = {
            first_line: 1,
            first_column: 0,
            last_line: 1,
            last_column: 0
        };
        if (this.options.ranges) {
            this.yylloc.range = [0,0];
        }
        this.offset = 0;
        return this;
    },

// consumes and returns one char from the input
input:function () {
        var ch = this._input[0];
        this.yytext += ch;
        this.yyleng++;
        this.offset++;
        this.match += ch;
        this.matched += ch;
        var lines = ch.match(/(?:\r\n?|\n).*/g);
        if (lines) {
            this.yylineno++;
            this.yylloc.last_line++;
        } else {
            this.yylloc.last_column++;
        }
        if (this.options.ranges) {
            this.yylloc.range[1]++;
        }

        this._input = this._input.slice(1);
        return ch;
    },

// unshifts one char (or a string) into the input
unput:function (ch) {
        var len = ch.length;
        var lines = ch.split(/(?:\r\n?|\n)/g);

        this._input = ch + this._input;
        this.yytext = this.yytext.substr(0, this.yytext.length - len);
        //this.yyleng -= len;
        this.offset -= len;
        var oldLines = this.match.split(/(?:\r\n?|\n)/g);
        this.match = this.match.substr(0, this.match.length - 1);
        this.matched = this.matched.substr(0, this.matched.length - 1);

        if (lines.length - 1) {
            this.yylineno -= lines.length - 1;
        }
        var r = this.yylloc.range;

        this.yylloc = {
            first_line: this.yylloc.first_line,
            last_line: this.yylineno + 1,
            first_column: this.yylloc.first_column,
            last_column: lines ?
                (lines.length === oldLines.length ? this.yylloc.first_column : 0)
                 + oldLines[oldLines.length - lines.length].length - lines[0].length :
              this.yylloc.first_column - len
        };

        if (this.options.ranges) {
            this.yylloc.range = [r[0], r[0] + this.yyleng - len];
        }
        this.yyleng = this.yytext.length;
        return this;
    },

// When called from action, caches matched text and appends it on next action
more:function () {
        this._more = true;
        return this;
    },

// When called from action, signals the lexer that this rule fails to match the input, so the next matching rule (regex) should be tested instead.
reject:function () {
        if (this.options.backtrack_lexer) {
            this._backtrack = true;
        } else {
            return this.parseError('Lexical error on line ' + (this.yylineno + 1) + '. You can only invoke reject() in the lexer when the lexer is of the backtracking persuasion (options.backtrack_lexer = true).\n' + this.showPosition(), {
                text: "",
                token: null,
                line: this.yylineno
            });

        }
        return this;
    },

// retain first n characters of the match
less:function (n) {
        this.unput(this.match.slice(n));
    },

// displays already matched input, i.e. for error messages
pastInput:function () {
        var past = this.matched.substr(0, this.matched.length - this.match.length);
        return (past.length > 20 ? '...':'') + past.substr(-20).replace(/\n/g, "");
    },

// displays upcoming input, i.e. for error messages
upcomingInput:function () {
        var next = this.match;
        if (next.length < 20) {
            next += this._input.substr(0, 20-next.length);
        }
        return (next.substr(0,20) + (next.length > 20 ? '...' : '')).replace(/\n/g, "");
    },

// displays the character position where the lexing error occurred, i.e. for error messages
showPosition:function () {
        var pre = this.pastInput();
        var c = new Array(pre.length + 1).join("-");
        return pre + this.upcomingInput() + "\n" + c + "^";
    },

// test the lexed token: return FALSE when not a match, otherwise return token
test_match:function(match, indexed_rule) {
        var token,
            lines,
            backup;

        if (this.options.backtrack_lexer) {
            // save context
            backup = {
                yylineno: this.yylineno,
                yylloc: {
                    first_line: this.yylloc.first_line,
                    last_line: this.last_line,
                    first_column: this.yylloc.first_column,
                    last_column: this.yylloc.last_column
                },
                yytext: this.yytext,
                match: this.match,
                matches: this.matches,
                matched: this.matched,
                yyleng: this.yyleng,
                offset: this.offset,
                _more: this._more,
                _input: this._input,
                yy: this.yy,
                conditionStack: this.conditionStack.slice(0),
                done: this.done
            };
            if (this.options.ranges) {
                backup.yylloc.range = this.yylloc.range.slice(0);
            }
        }

        lines = match[0].match(/(?:\r\n?|\n).*/g);
        if (lines) {
            this.yylineno += lines.length;
        }
        this.yylloc = {
            first_line: this.yylloc.last_line,
            last_line: this.yylineno + 1,
            first_column: this.yylloc.last_column,
            last_column: lines ?
                         lines[lines.length - 1].length - lines[lines.length - 1].match(/\r?\n?/)[0].length :
                         this.yylloc.last_column + match[0].length
        };
        this.yytext += match[0];
        this.match += match[0];
        this.matches = match;
        this.yyleng = this.yytext.length;
        if (this.options.ranges) {
            this.yylloc.range = [this.offset, this.offset += this.yyleng];
        }
        this._more = false;
        this._backtrack = false;
        this._input = this._input.slice(match[0].length);
        this.matched += match[0];
        token = this.performAction.call(this, this.yy, this, indexed_rule, this.conditionStack[this.conditionStack.length - 1]);
        if (this.done && this._input) {
            this.done = false;
        }
        if (token) {
            return token;
        } else if (this._backtrack) {
            // recover context
            for (var k in backup) {
                this[k] = backup[k];
            }
            return false; // rule action called reject() implying the next rule should be tested instead.
        }
        return false;
    },

// return next match in input
next:function () {
        if (this.done) {
            return this.EOF;
        }
        if (!this._input) {
            this.done = true;
        }

        var token,
            match,
            tempMatch,
            index;
        if (!this._more) {
            this.yytext = '';
            this.match = '';
        }
        var rules = this._currentRules();
        for (var i = 0; i < rules.length; i++) {
            tempMatch = this._input.match(this.rules[rules[i]]);
            if (tempMatch && (!match || tempMatch[0].length > match[0].length)) {
                match = tempMatch;
                index = i;
                if (this.options.backtrack_lexer) {
                    token = this.test_match(tempMatch, rules[i]);
                    if (token !== false) {
                        return token;
                    } else if (this._backtrack) {
                        match = false;
                        continue; // rule action called reject() implying a rule MISmatch.
                    } else {
                        // else: this is a lexer rule which consumes input without producing a token (e.g. whitespace)
                        return false;
                    }
                } else if (!this.options.flex) {
                    break;
                }
            }
        }
        if (match) {
            token = this.test_match(match, rules[index]);
            if (token !== false) {
                return token;
            }
            // else: this is a lexer rule which consumes input without producing a token (e.g. whitespace)
            return false;
        }
        if (this._input === "") {
            return this.EOF;
        } else {
            return this.parseError('Lexical error on line ' + (this.yylineno + 1) + '. Unrecognized text.\n' + this.showPosition(), {
                text: "",
                token: null,
                line: this.yylineno
            });
        }
    },

// return next match that has a token
lex:function lex () {
        var r = this.next();
        if (r) {
            return r;
        } else {
            return this.lex();
        }
    },

// activates a new lexer condition state (pushes the new lexer condition state onto the condition stack)
begin:function begin (condition) {
        this.conditionStack.push(condition);
    },

// pop the previously active lexer condition state off the condition stack
popState:function popState () {
        var n = this.conditionStack.length - 1;
        if (n > 0) {
            return this.conditionStack.pop();
        } else {
            return this.conditionStack[0];
        }
    },

// produce the lexer rule set which is active for the currently active lexer condition state
_currentRules:function _currentRules () {
        if (this.conditionStack.length && this.conditionStack[this.conditionStack.length - 1]) {
            return this.conditions[this.conditionStack[this.conditionStack.length - 1]].rules;
        } else {
            return this.conditions["INITIAL"].rules;
        }
    },

// return the currently active lexer condition state; when an index argument is provided it produces the N-th previous condition state, if available
topState:function topState (n) {
        n = this.conditionStack.length - 1 - Math.abs(n || 0);
        if (n >= 0) {
            return this.conditionStack[n];
        } else {
            return "INITIAL";
        }
    },

// alias for begin(condition)
pushState:function pushState (condition) {
        this.begin(condition);
    },

// return the number of states currently on the stack
stateStackSize:function stateStackSize() {
        return this.conditionStack.length;
    },
options: {},
performAction: function anonymous(yy,yy_,$avoiding_name_collisions,YY_START) {
var YYSTATE=YY_START;
switch($avoiding_name_collisions) {
case 0: return 7; 
break;
case 1: return 8; 
break;
case 2: return 5; 
break;
}
},
rules: [/^(?:[^\n\r]+)/,/^(?:[\n\r]+)/,/^(?:$)/],
conditions: {"INITIAL":{"rules":[0,1,2],"inclusive":true}}
});
return lexer;
})();
parser.lexer = lexer;
function Parser () {
  this.yy = {};
}
Parser.prototype = parser;parser.Parser = Parser;
return new Parser;
})();


if (typeof require !== 'undefined' && typeof exports !== 'undefined') {
exports.parser = step_1_line;
exports.Parser = step_1_line.Parser;
exports.parse = function () { return step_1_line.parse.apply(step_1_line, arguments); };
exports.main = function commonjsMain (args) {
    if (!args[1]) {
        console.log('Usage: '+args[0]+' FILE');
        process.exit(1);
    }
    var source = require('fs').readFileSync(require('path').normalize(args[1]), "utf8");
    return exports.parser.parse(source);
};
if (typeof module !== 'undefined' && require.main === module) {
  exports.main(process.argv.slice(1));
}
}