The Definitive ANTLR Reference: Preface

洪博艺

2023-12-01

In August 1993, I finished school and drove my overloaded moving van to Minnesota to start working. My office mate was a curmudgeonly astrophysicist named Kevin, who has since become a good friend. Kevin has told me on multiple occasions that only physicists do real work and that programmers merely support physicists. Because all I do is build language tools to support programmers, I am at least two levels of indirection away from doing anything useful. Now, Kevin also claims that Fortran 77 is a good enough language for anybody and, for that matter, that Fortran 66 is probably sufficient, so one might question his judgment. But, concerning my usefulness, he was right: I am fundamentally lazy and would much rather work on something that made other people productive than actually do anything useful myself. This attitude has led to my guiding principle:

Why program by hand in five days what you can spend five years of your life automating?

Here’s the point: The first time you encounter a problem, writing a formal, general, and automatic mechanism is expensive and is usually overkill. From then on, though, you are much faster and better at solving similar problems because of your automated tool. Building tools can also be much more fun than your real job. Now that I’m a professor, I have the luxury of avoiding real work for a living.

My passion for the last two decades has been ANTLR, ANother Tool for Language Recognition. ANTLR is a parser generator that automates the construction of language recognizers. It is a program that writes other programs.

From a formal language description, ANTLR generates a program that determines whether sentences conform to that language. By adding code snippets to the grammar, the recognizer becomes a translator. The code snippets compute output phrases based upon computations on input phrases. ANTLR is suitable for the simplest and the most complicated language recognition and translation problems. With each new release, ANTLR becomes more sophisticated and easier to use. ANTLR is extremely popular with 5,000 downloads a month and is included on all Linux and OS X distributions. It is widely used because it:

* Generates human-readable code that is easy to fold into other applications
* Generates powerful recursive-descent recognizers using LL(*), an extension to LL(k) that uses arbitrary lookahead to make decisions
* Tightly integrates StringTemplate (see http://www.stringtemplate.org), a template engine specifically designed to generate structured text such as source code
* Has a graphical grammar development environment called ANTLRWorks (see http://www.ANTLR.org/works) that can debug parsers generated in any ANTLR target language
* Is actively supported with a good project website and a high-traffic mailing list (see http://www.ANTLR.org:8080/pipermail/ANTLR-interest/)
* Comes with complete source under the BSD license
* Is extremely flexible and automates or formalizes many common tasks
* Supports multiple target languages such as Java, C#, Python, Ruby, Objective-C, C, and C++

Perhaps most importantly, ANTLR is much easier to understand and use than many other parser generators. It generates essentially what you would write by hand when building a recognizer and uses technology that mimics how your brain generates and recognizes language (see Chapter 2, The Nature of Computer Languages).

You generate and recognize sentences by walking their implicit tree structure, from the most abstract concept at the root to the vocabulary symbols at the leaves. Each subtree represents a phrase of a sentence and maps directly to a rule in your grammar. ANTLR’s grammar and resulting top-down recursive-descent recognizers thus feel very natural. ANTLR’s fundamental approach dovetails with your innate language process.
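To make the rule-to-subtree mapping concrete, here is a minimal hand-written sketch in Python of a top-down recursive-descent recognizer with embedded actions. It is not ANTLR output; the toy grammar, the `Parser` class, and the `tokenize` and `evaluate` helpers are all assumptions made up for illustration. Each grammar rule becomes one method, and the chain of method calls traces the implicit tree from the root rule down to the tokens at the leaves.

```python
import re

# Hypothetical toy grammar (an illustration, not taken from the book):
#   expr : term (('+' | '-') term)* ;
#   term : atom ('*' atom)* ;
#   atom : INT | '(' expr ')' ;

TOKEN = re.compile(r"\s*(\d+|[-+*()])")

def tokenize(text):
    """Split the input into integer and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """One method per grammar rule; the call stack mirrors the parse tree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.i += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.match(op)
            rhs = self.term()
            # Embedded action: compute an output value from the matched phrases.
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.atom()
        while self.peek() == "*":
            self.match("*")
            value *= self.atom()
        return value

    def atom(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.i += 1
            return int(tok)
        self.match("(")
        value = self.expr()
        self.match(")")
        return value

def evaluate(text):
    """Recognize the whole input and return the value the actions computed."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("1 + 2 * (3 - 1)"))
```

Removing the arithmetic from `expr` and `term` leaves a pure recognizer that only accepts or rejects input; the actions are what turn it into a small translator.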

Why a Completely New Version of ANTLR?

For the past four years, I have been working feverishly to design and build ANTLR v3, the subject of this book. ANTLR v3 is a completely rewritten version and represents the culmination of twenty years of language research. Most ANTLR users will instantly find it familiar, but many of the details are different. ANTLR retains its strong mojo in this new version while correcting a number of deficiencies, quirks, and weaknesses of ANTLR v2 (I felt free to break backward compatibility in order to achieve this). Specifically, I didn’t like the following about v2 (see http://www.ANTLR.org/blog/ANTLR3/ANTLR2.bashing.tml):

* The v2 lexers were very slow albeit powerful.
* There were no unit tests for v2.
* The v2 code base was impenetrable. The code was never refactored to clean it up, partially for fear of breaking it without unit tests.
* The linear approximate LL(k) parsing strategy was a bit weak.
* Building a new language target duplicated vast swaths of logic and print statements.
* The AST construction mechanism was too informal.
* A number of common tasks were not easy (such as obtaining the text matched by a parser rule).
* It lacked the semantic predicates hoisting of ANTLR v1 (PCCTS).
* The v2 license/contributor trail was loose and made big companies afraid to use it (translator's note: for fear of copyright problems).

ANTLR v3 is my answer to the issues in v2. ANTLR v3 has a very clean and well-organized code base with lots of unit tests. ANTLR generates extremely powerful LL(*) recognizers that are fast and easy to read.

Many common tasks are now easy by default. For example, reading in some input, tweaking it, and writing it back out while preserving whitespace is easy. ANTLR v3 also reintroduces semantic predicates hoisting. ANTLR’s license is now BSD, and all contributors must sign a “certificate of origin” (see http://www.ANTLR.org/license.html). ANTLR v3 provides significant functionality beyond v2 as well:

* Powerful LL(*) parsing strategy that supports more natural grammars and makes it easier to build them
* Auto-backtracking mode that shuts off all grammar analysis warnings, forcing the generated parser to simply figure things out at runtime
* Partial parsing result memoization to guarantee linear time complexity during backtracking at the cost of some memory
* Jean Bovet’s ANTLRWorks GUI grammar development environment
* StringTemplate template engine integration that makes generating structured text such as source code easy
* Formal AST construction rules that map input grammar alternatives to tree grammar fragments, making actions that manually construct ASTs no longer necessary
* Dynamically scoped attributes that allow distant rules to communicate
* Improved error reporting and recovery for generated recognizers
* Truly retargetable code generator; building a new target is a matter of defining StringTemplate templates that tell ANTLR how to generate grammar elements such as rule and token references
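The memoization point in the list above can be illustrated with a small Python sketch. It shows the general idea only, not ANTLR's actual implementation; the toy rule `e : '(' e ')' 'a' | '(' e ')' 'b' | 'x' ;` and the helper names `count_rule_calls` and `nested` are assumptions invented for this example. A backtracking recognizer tries the first alternative, fails at the final token, and then re-parses the same nested phrase for the second alternative. Caching the result of the rule at each input position avoids that repeated work, at the cost of one table entry per position.

```python
def count_rule_calls(text, memoize):
    """Recognize `text` with rule e; return (accepted, rule bodies executed)."""
    memo = {}      # start offset -> end offset, or None if e failed there
    calls = 0

    def e(pos):
        nonlocal calls
        if memoize and pos in memo:
            return memo[pos]          # reuse an earlier partial result
        calls += 1
        result = None
        if text.startswith("x", pos):
            result = pos + 1
        elif text.startswith("(", pos):
            # Try each alternative in order, backtracking to pos + 1 each time.
            for suffix in (")a", ")b"):
                end = e(pos + 1)
                if end is not None and text.startswith(suffix, end):
                    result = end + len(suffix)
                    break
        if memoize:
            memo[pos] = result
        return result

    accepted = e(0) == len(text)
    return accepted, calls

def nested(depth):
    """Build an input that forces backtracking at every nesting level."""
    return "(" * depth + "x" + ")b" * depth

for depth in (5, 10):
    plain = count_rule_calls(nested(depth), memoize=False)
    cached = count_rule_calls(nested(depth), memoize=True)
    print(depth, plain, cached)
```

Without the cache, the number of times the rule body runs roughly doubles with each extra nesting level; with it, the body runs once per starting position, so the work grows linearly with the input.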

This book also gives v3 a serious advantage over v2. Professionally edited and complete documentation is a big deal to developers. You can find more information about the history of ANTLR and its contributions to parsing theory on the ANTLR website (see http://www.ANTLR.org/history.html and http://www.ANTLR.org/contributions.html).

Who Is This Book For?

The primary audience for this book is the practicing software developer, though it is suitable for junior and senior computer science undergraduates. This book is specifically targeted at any programmer interested in learning to use ANTLR to build interpreters and translators for domain-specific languages. Beginners and experts alike will need this book to use ANTLR v3 effectively. For the most part, the level of discussion is accessible to the average programmer. Portions of Part III, however, require some language experience to fully appreciate. Although the examples in this book are written in Java, their substance applies equally well to the other language targets such as C, C++, Objective-C, Python, C#, and so on. Readers should know Java to get the most out of the book.

What’s in This Book?

This book is the best, most complete source of information on ANTLR v3 that you’ll find anywhere. The free, online documentation provides enough to learn the basic grammar syntax and semantics but doesn’t explain ANTLR concepts in detail. This book helps you get the most out of ANTLR and is required reading to become an advanced user. In particular, Part III provides the only thorough explanation available anywhere of ANTLR’s LL(*) parsing strategy.

This book is organized as follows.

Part I introduces ANTLR, describes how the nature of computer languages dictates the nature of language recognizers, and provides a complete calculator example.

Part II is the main reference section and provides all the details you’ll need to build large and complex grammars and translators.

Part III treks through ANTLR’s predicated-LL(*) parsing strategy and explains the grammar analysis errors you might encounter. Predicated-LL(*) is a totally new parsing strategy, and Part III is essentially the only written documentation you’ll find for it. You’ll need to be familiar with the contents in order to build complicated translators.

Readers who are totally new to grammars and language tools should follow the chapter sequence in Part I as is.

Chapter 1, Getting Started with ANTLR, will familiarize you with ANTLR’s basic idea;

Chapter 2, The Nature of Computer Languages, gets you ready to study grammars more formally in Part II; and

Chapter 3, A Quick Tour for the Impatient, gives your brain something concrete to consider (translator's note: something along the lines of a "hello world" example).

Familiarize yourself with the ANTLR details in Part II, but I suggest trying to modify an existing grammar as soon as you can. After you become comfortable with ANTLR’s functionality, you can attempt your own translator from scratch. When you get grammar analysis errors from ANTLR that you don’t understand, then you need to dive into Part III to learn more about LL(*).

Those readers familiar with ANTLR v2 should probably skip directly to Chapter 3, A Quick Tour for the Impatient, to figure out how v3 differs. Chapter 4, ANTLR Grammars, is also a good place to look for features that v3 changes or improves on.

If you are familiar with an older tool, such as YACC [Joh79], I recommend starting from the beginning of the book as if you were totally new to grammars and language tools. If you’re used to JavaCC (see https://javacc.dev.java.net) or another top-down parser generator, you can probably skip Chapter 2, The Nature of Computer Languages, though it is one of my favorite chapters.

I hope you enjoy this book and ANTLR v3 as much as I have enjoyed writing them!
Terence Parr
March 2007
University of San Francisco
