the-super-tiny-compiler

段溪叠
2023-12-01

bable官网推荐的 compiler 原理(实现了一个小型的compiler),本文主要是摘抄思想并理解。

For an awesome tutorial on compilers, check out the-super-tiny-compiler,
 which also explains how Babel itself works on a high level.

git地址:https://github.com/jamiebuilds/the-super-tiny-compiler

本文案例:将lisp语言描述转为c语言描述


                   LISP                      C
 
    2 + 2          (add 2 2)                 add(2, 2)
    4 - 2          (subtract 4 2)            subtract(4, 2)
    2 + (4 - 2)    (add 2 (subtract 4 2))    add(2, subtract(4, 2))

compiler的流程

Most compilers break down into three primary stages: Parsing, Transformation,
and Code Generation

  • Parsing is taking raw code and turning it into a more abstract
    representation of the code.
  • Transformation takes this abstract representation and manipulates to
    do whatever the compiler wants it to.
  • Code Generation takes the transformed representation of the code and
    turns it into new code.

1. Parsing
Parsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis.

Lexical Analysis

  • Takes the raw code and splits it apart into these things called
    tokens by a thing called a tokenizer (or lexer).

  • Tokens are an array of tiny little objects that describe an isolated
    piece of the syntax. They could be numbers, labels, punctuation,
    operators, whatever.

Syntactic Analysis

  • Takes the tokens and reformats them into a representation that
    describes each part of the syntax and their relation to one another.
    This is known as an intermediate representation or Abstract Syntax
    Tree.

  • An Abstract Syntax Tree, or AST for short, is a deeply nested object
    that represents code in a way that is both easy to work with and
    tells us a lot of information.

案例

const input  = '(add 2 (subtract 4 2))';

词法分析结果: Tokens

[
    { type: 'paren',  value: '('        },
    { type: 'name',   value: 'add'      },
    { type: 'number', value: '2'        },
    { type: 'paren',  value: '('        },
    { type: 'name',   value: 'subtract' },
    { type: 'number', value: '4'        },
    { type: 'number', value: '2'        },
    { type: 'paren',  value: ')'        },
    { type: 'paren',  value: ')'        },
]

语法分析结果:AST

{
 type: 'Program',
 body: [{
   type: 'CallExpression',
   name: 'add',
   params: [{
     type: 'NumberLiteral',
     value: '2',
   }, {
     type: 'CallExpression',
     name: 'subtract',
     params: [{
       type: 'NumberLiteral',
       value: '4',
     }, {
       type: 'NumberLiteral',
       value: '2',
     }]
   }]
 }]
}

2. Transformation

  • This just takes the AST from the last step and makes changes to it.
    It can manipulate the AST in the same language or it can translate it
    into an entirely new language.
  • When transforming the AST we can manipulate nodes by
    adding/removing/replacing properties, we can add new nodes, remove
    nodes, or we could leave the existing AST alone and create an
    entirely new one based on it.

AST 转换成 new AST

{
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'Identifier',
        name: 'add'
      },
      arguments: [{
        type: 'NumberLiteral',
        value: '2'
      }, {
        type: 'CallExpression',
        callee: {
          type: 'Identifier',
          name: 'subtract'
        },
        arguments: [{
          type: 'NumberLiteral',
          value: '4'
        }, {
          type: 'NumberLiteral',
          value: '2'
        }]
      }]
    }
  }]
};

3. Code Generation

基于new AST生成代码

const output = 'add(2, subtract(4, 2));';

总结

/** 
 *   1. input  => tokenizer   => tokens
 *   2. tokens => parser      => ast
 *   3. ast    => transformer => newAst
 *   4. newAst => generator   => output
 */

function compiler(input) {
  let tokens = tokenizer(input);
  let ast    = parser(tokens);
  let newAst = transformer(ast);
  let output = codeGenerator(newAst);

  // and simply return the output!
  return output;
}
 类似资料: