bable官网推荐的 compiler 原理(实现了一个小型的compiler),本文主要是摘抄思想并理解。
For an awesome tutorial on compilers, check out the-super-tiny-compiler,
which also explains how Babel itself works on a high level.
git地址:https://github.com/jamiebuilds/the-super-tiny-compiler
本文案例:将lisp语言描述转为c语言描述
LISP C
2 + 2 (add 2 2) add(2, 2)
4 - 2 (subtract 4 2) subtract(4, 2)
2 + (4 - 2) (add 2 (subtract 4 2)) add(2, subtract(4, 2))
compiler的流程
Most compilers break down into three primary stages: Parsing, Transformation,
and Code Generation
1. Parsing
Parsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis.
Lexical Analysis
Takes the raw code and splits it apart into these things called
tokens by a thing called a tokenizer (or lexer).
Tokens are an array of tiny little objects that describe an isolated
piece of the syntax. They could be numbers, labels, punctuation,
operators, whatever.
Syntactic Analysis
Takes the tokens and reformats them into a representation that
describes each part of the syntax and their relation to one another.
This is known as an intermediate representation or Abstract Syntax
Tree.
An Abstract Syntax Tree, or AST for short, is a deeply nested object
that represents code in a way that is both easy to work with and
tells us a lot of information.
案例
const input = '(add 2 (subtract 4 2))';
词法分析结果: Tokens
[
{ type: 'paren', value: '(' },
{ type: 'name', value: 'add' },
{ type: 'number', value: '2' },
{ type: 'paren', value: '(' },
{ type: 'name', value: 'subtract' },
{ type: 'number', value: '4' },
{ type: 'number', value: '2' },
{ type: 'paren', value: ')' },
{ type: 'paren', value: ')' },
]
语法分析结果:AST
{
type: 'Program',
body: [{
type: 'CallExpression',
name: 'add',
params: [{
type: 'NumberLiteral',
value: '2',
}, {
type: 'CallExpression',
name: 'subtract',
params: [{
type: 'NumberLiteral',
value: '4',
}, {
type: 'NumberLiteral',
value: '2',
}]
}]
}]
}
2. Transformation
AST 转换成 new AST
{
type: 'Program',
body: [{
type: 'ExpressionStatement',
expression: {
type: 'CallExpression',
callee: {
type: 'Identifier',
name: 'add'
},
arguments: [{
type: 'NumberLiteral',
value: '2'
}, {
type: 'CallExpression',
callee: {
type: 'Identifier',
name: 'subtract'
},
arguments: [{
type: 'NumberLiteral',
value: '4'
}, {
type: 'NumberLiteral',
value: '2'
}]
}]
}
}]
};
3. Code Generation
基于new AST生成代码
const output = 'add(2, subtract(4, 2));';
总结
/**
* 1. input => tokenizer => tokens
* 2. tokens => parser => ast
* 3. ast => transformer => newAst
* 4. newAst => generator => output
*/
function compiler(input) {
let tokens = tokenizer(input);
let ast = parser(tokens);
let newAst = transformer(ast);
let output = codeGenerator(newAst);
// and simply return the output!
return output;
}