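MonetDB Server Reference Manual V5.0
1.1 This reference manual describes MonetDB's functions, system architecture, services, and best practices. The HTML version is generated with makeinfo and the PDF version with pdflatex; other formats such as XML and DOC are also available.
The manual and the logo are owned by CWI.
1.2 How to read this reference manual:
If you are interested in the technical details, start with Section 1.69 [Design Overview]. Chapter 3 [MAL Reference] discusses the MAL language and performance optimization.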
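1.3 Features and limitations
When should you consider MonetDB?
1. A high-performance database system
2. A multi-model system
3. A column-store database kernel
4. A widely used database
5. An extensible database
6. An open-source system
When should you not consider MonetDB?
1. Persistent object caches
2. High-performance financial OLTP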
The SQL implementation provides full transaction control and recovery. MonetDB has not been designed with a strong focus on security. Scaling over multiple machines: MonetDB does not yet provide a centrally controlled, distributed database infrastructure. Instead, it is moving towards an architecture in which multiple autonomous MonetDB instances join together to process a large, distributed workload.
Key features of MonetDB:
1. The kernel source code is written in ANSI C and is POSIX compliant.
2. The source code of the application interface libraries complies with the latest language versions.
3. The source code is written in a literate programming style, to keep the code and its documentation close together.
4. The source code is compiled and tested on many platforms with different compiler options to ensure portability.
5. The source code is built with the GNU toolkit (e.g. Automake, Autoconf, and Libtool) for portability.
6. The source code is tested heavily on a daily basis and scrutinized using the Valgrind toolkit.
Core features of the MonetDB server:
1. A fully decomposed storage scheme using memory-mapped files.
2. It supports scalable databases on 32- and 64-bit platforms.
3. Connectivity is provided through TCP/IP sockets and SSH on many platforms.
4. Index selection, creation, and maintenance are automatic.
5. The relational operators materialize their results and are self-optimizing.
6. The operations are cache- and memory-aware for high performance.
7. The database back-end is multi-threaded and guards a single physical database instance.
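To make the fully decomposed storage scheme concrete, the following Python sketch stores a table as one array per column and answers a query by reading only the columns it needs. The names `ColumnTable`, `select_eq`, and `project` are illustrative assumptions, not MonetDB APIs, and plain lists stand in for memory-mapped files.

```python
from typing import Any, Dict, List


class ColumnTable:
    """Toy decomposed (column-wise) table: one list per attribute.

    Hypothetical illustration only; it does not mirror MonetDB's
    internal data structures.
    """

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns

    def select_eq(self, column: str, value: Any) -> List[int]:
        """Return the row positions where `column` equals `value`.

        The result is fully materialized as a list of positions.
        """
        return [pos for pos, v in enumerate(self.columns[column]) if v == value]

    def project(self, column: str, positions: List[int]) -> List[Any]:
        """Fetch the values of `column` at the given row positions."""
        data = self.columns[column]
        return [data[pos] for pos in positions]


people = ColumnTable({
    "name": ["ann", "bob", "cid"],
    "city": ["Amsterdam", "Utrecht", "Amsterdam"],
    "age": [31, 45, 27],
})

# Equivalent of: SELECT name FROM people WHERE city = 'Amsterdam'
# Only the "city" and "name" columns are read; "age" is never touched.
hits = people.select_eq("city", "Amsterdam")
names = people.project("name", hits)
print(names)
```

In this sketch each operator returns a complete intermediate result (a list of positions, then a list of values), which loosely corresponds to the "operators materialize their results" feature listed above.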
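1.5 Manual generation
The MonetDB code base consists of a large collection of modules. The Mx tool expands the *.mx files into the sources used to compile the system. The manual is generated in Texinfo format.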
The components for the reference manual are extracted by
Mx -i -B -H1 <filename>.mx
which generates the file <filename>.bdy.texi. Running makeinfo on that file produces the desired output format. The PDF version is built with
pdflatex <filename>.tex
1.68 Development roadmap
The information is organized around the major system components.
Server Roadmap
1. Parallelism: Exploitation of multi-core systems calls for renewed attention to parallel processing in the MonetDB kernel. Stress testing of concurrent processing may reveal race conditions that have so far gone undetected.
2. Streaming Data: A separate area is support for streaming database functionality. It requires additions to the way we support I/O channels and schedule query plans.
3. Functional Enhancements: Support for geographical applications is under way. It consists of a concise library for managing geometric types.
SQL Roadmap
The SQL front-end is intended to provide all features available in SQL:2003. Features on the roadmap:
1. Window functions
2. Full-text retrieval
3. Support for multimedia objects
4. Replication service
5. GIS
6. General column and table constraint enforcement
7. Internationalization of the character sets
8. Full outer-join queries
The following features are not expected to be supported:
1. Cursor-based processing, because the execution engine is not based on the iterator model used in other engines. Simulating a cursor-based scheme would be prohibitively expensive from a performance point of view.
2. Multiple transaction isolation levels. Coarse-grained isolation is provided using table-level locks.
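The contrast between the iterator model and an engine whose operators produce whole results can be sketched as follows. This is a simplified Python illustration rather than MonetDB code, and both function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, int]


def iterator_model(rows: List[Row], min_age: int) -> Iterator[str]:
    """Tuple-at-a-time: each next() call pulls one row through the plan."""
    for name, age in rows:
        if age >= min_age:
            yield name


def column_at_a_time(names: List[str], ages: List[int], min_age: int) -> List[str]:
    """Column-at-a-time: each operator consumes and produces a whole column."""
    # Operator 1: selection over the full "age" column, materialized.
    positions = [i for i, age in enumerate(ages) if age >= min_age]
    # Operator 2: projection of the "name" column at those positions.
    return [names[i] for i in positions]


rows = [("ann", 31), ("bob", 45), ("cid", 27)]
names = [r[0] for r in rows]
ages = [r[1] for r in rows]

cursor = iterator_model(rows, 30)
first = next(cursor)                            # a cursor can stop after one row
everything = column_at_a_time(names, ages, 30)  # the full result exists at once
print(first, everything)
```

In the second style there is no per-row state to resume from, so a cursor would have to be simulated on top of an already materialized result. That is one way to read the performance concern in item 1.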
1.70 Design considerations
Redesign of the MonetDB software stack was driven by the need to reduce the effort to extend the system into novel directions and to reduce the Total Execution Cost (TEC).
The TEC is composed of several cost factors:
A) API message handling
P) Parsing and semantic analysis
O) Optimization and plan generation
D) Data access to the persistent store
E) Execution of the query terms
R) Result delivery to the application
Choosing an architecture for processing database operations presupposes an intuition about how the cost will be distributed. In an OLTP setting you expect most of the cost to be in (P,O), while in OLAP it will be in (D,E,R). In a distributed setting the components (O,D,E) are dominant. Web applications would focus on (A,E,R).
Such a simple characterization ignores the wide variation that can be experienced at each level. To illustrate, in D) and R) it makes a big difference whether the data is already in the cache or still on disk. With E) it makes a big difference whether you are comparing two integers, evaluating a mathematical function (e.g., a Gaussian), or evaluating a regular expression on a string. As a result, intense optimization in one area may become completely invisible because it is overshadowed by other cost factors.
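A small worked example with invented numbers shows how an optimization in one area can be overshadowed by another cost factor. Treating the TEC as a plain sum of the six factors is an assumption made for illustration; the text says only that the TEC is composed of them.

```python
# Hypothetical per-query costs in milliseconds; all numbers are invented.
costs = {"A": 1.0, "P": 2.0, "O": 3.0, "D": 80.0, "E": 10.0, "R": 4.0}


def total_execution_cost(factors: dict) -> float:
    """Assumed additive model: TEC = A + P + O + D + E + R."""
    return sum(factors.values())


baseline = total_execution_cost(costs)

# Make the optimizer (O) three times faster: 3.0 ms -> 1.0 ms.
faster_optimizer = dict(costs, O=1.0)
# Alternatively, halve data access (D), e.g. because the data is already cached.
cached_data = dict(costs, D=40.0)

gain_optimizer = 1 - total_execution_cost(faster_optimizer) / baseline
gain_cached = 1 - total_execution_cost(cached_data) / baseline
print(f"optimizer gain: {gain_optimizer:.0%}, cached-data gain: {gain_cached:.0%}")
```

With these invented numbers, tripling the optimizer's speed improves the total by about 2%, while halving data access improves it by about 40%.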
The Version 5 infrastructure is designed to ease addressing each of these cost factors in a well-defined way, while retaining the flexibility to combine the components needed for a particular situation. It results in an architecture where you assemble the components for a particular application domain and hardware platform.
The primary interface to the database kernel is still based on the exchange of text in the form of queries and simply formatted results. This interface is designed for ease of interpretation and versatility, and it is flexible enough to accommodate system debugging and application tool development. Although a textual interface can degrade performance, our experience with earlier system versions showed that the overhead can be kept within acceptable bounds. Moreover, a textual interface reduces the programming effort otherwise needed to develop test and application programs. The trend towards XML as the language for tool interaction supports this decision.
1.7.2 MAL
The target language for a query compiler is the MonetDB Assembly Language (MAL). It was designed to ease code generation and fast interpretation by the server. The compiler produces algebraic query plans, which are turned into physical execution plans by the MAL optimizers.
The design and implementation of MAL takes the functionality offered previously a significant step further. To name a few:
• All instructions are strongly typed before being executed.
• It supports polymorphic functions. They act as templates that produce strongly typed instantiations when needed.
• It uses function-style expressions in which each assignment instruction can receive multiple target results; each instruction forms a point in the dataflow graph.
• It supports co-routines (Factories) to build streaming applications.
• Properties are associated with the program code for ease of optimization and scheduling.
• It can be readily extended with user defined types and function modules.
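Two of the properties listed above, polymorphic functions that act as templates and instructions with multiple target results, can be mimicked in Python. The sketch below is an analogy only: it is not MAL syntax, and `PolymorphicFunction`, `instantiate`, and `min_max` are hypothetical names.

```python
from typing import Callable, Dict, Tuple, Type


class PolymorphicFunction:
    """Template that yields one strongly typed instance per element type."""

    def __init__(self, name: str, body: Callable) -> None:
        self.name = name
        self.body = body
        self.instances: Dict[Type, Callable] = {}

    def instantiate(self, elem_type: Type) -> Callable:
        """Create (once) and return the instance specialized for elem_type."""
        if elem_type not in self.instances:
            def typed(values: list) -> Tuple:
                # The type check happens before the body is executed.
                for v in values:
                    if type(v) is not elem_type:
                        raise TypeError(
                            f"{self.name}<{elem_type.__name__}> got {type(v).__name__}"
                        )
                return self.body(values)
            self.instances[elem_type] = typed
        return self.instances[elem_type]


# One call that delivers two target results.
min_max = PolymorphicFunction("min_max", lambda values: (min(values), max(values)))

min_max_int = min_max.instantiate(int)
min_max_str = min_max.instantiate(str)

lo, hi = min_max_int([4, 1, 9])
first, last = min_max_str(["pear", "apple"])
print(lo, hi, first, last)
```

Each instantiation is created on demand and cached, so asking for the `int` version twice returns the same typed function.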
1.73 Execution engine
The execution engine comes in several flavors. The default is a simple, sequential MAL interpreter.
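A simple, sequential interpreter can be pictured as a loop that executes one instruction after another and stores each result in a variable. The Python sketch below uses an invented instruction format and invented operation names (`const`, `select_gt`, `count`); it illustrates the idea and does not reproduce the MAL interpreter.

```python
from typing import Callable, Dict, List, Tuple

# One instruction: (target variable, operation name, arguments).
Instruction = Tuple[str, str, Tuple]

OPERATIONS: Dict[str, Callable] = {
    "const": lambda value: value,
    "select_gt": lambda column, bound: [v for v in column if v > bound],
    "count": lambda column: len(column),
}


def run_sequential(plan: List[Instruction]) -> Dict[str, object]:
    """Execute the plan one instruction at a time, in program order."""
    variables: Dict[str, object] = {}
    for target, op, args in plan:
        # Strings that name an existing variable are dereferenced;
        # everything else is passed through as a constant.
        resolved = [variables[a] if isinstance(a, str) and a in variables else a
                    for a in args]
        variables[target] = OPERATIONS[op](*resolved)
    return variables


plan = [
    ("ages", "const", ([31, 45, 27],)),
    ("old", "select_gt", ("ages", 30)),
    ("n", "count", ("old",)),
]
result = run_sequential(plan)
print(result["n"])
```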
1.74 Session Scenarios
In MonetDB multiple languages, optimizers, and execution engines can be combined at run time to satisfy a wide user-community. Such an assemblage of components is called a scenario and consists of a reader, parser, optimizer, tactic scheduler and engine. These hooks allow for both linked-in and external components.
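The five hooks that make up a scenario can be pictured as a pipeline of interchangeable functions. The following Python sketch is a hypothetical model: the `Scenario` class, its field names, and the toy language are assumptions for illustration, not MonetDB structures.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Scenario:
    """Hypothetical container for the five hooks named in the text."""
    name: str
    reader: Callable[[str], str]
    parser: Callable[[str], Any]
    optimizer: Callable[[Any], Any]
    scheduler: Callable[[Any], Any]
    engine: Callable[[Any], Any]

    def run(self, raw_input: str) -> Any:
        """Pass one request through the hooks in order."""
        text = self.reader(raw_input)
        plan = self.parser(text)
        plan = self.optimizer(plan)
        plan = self.scheduler(plan)
        return self.engine(plan)


# A toy "sum" language: the request "1 2 0 3" means "add these numbers".
toy = Scenario(
    name="toy",
    reader=lambda raw: raw.strip(),
    parser=lambda text: [int(tok) for tok in text.split()],
    optimizer=lambda plan: [n for n in plan if n != 0],  # drop no-op terms
    scheduler=lambda plan: sorted(plan),                 # pick an execution order
    engine=lambda plan: sum(plan),
)

print(toy.run("  1 2 0 3\n"))
```

Swapping any single hook, for example a different parser for another language, yields a new scenario without touching the other components.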
1.75 Scenario management
Scenarios are captured in modules; they can be dynamically loaded and remain active until the system is brought to a halt. The first time a scenario xyz is used, the system looks for a scenario initialization routine xyzinitSystem() and executes it. It is typically used to prepare the server for language specific interactions. Thereafter its components are set to those required by the scenario and the client initialization takes place.
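The "initialize on first use" behaviour described above can be sketched in Python as follows. The registry, the function names, and the log are assumptions for illustration; only the idea of a per-scenario initialization routine that runs once, followed by client initialization, comes from the text.

```python
from typing import Callable, Dict, List

# Hypothetical registry: scenario name -> system initialization routine.
init_routines: Dict[str, Callable[[], None]] = {}
initialized: set = set()
log: List[str] = []


def register_scenario(name: str, init_system: Callable[[], None]) -> None:
    """Record the routine that plays the role of <name>initSystem()."""
    init_routines[name] = init_system


def use_scenario(name: str, client: str) -> None:
    """Run system initialization on first use, then client initialization."""
    if name not in initialized:
        init_routines[name]()   # executed only the first time the scenario is used
        initialized.add(name)
    log.append(f"client {client} set up for {name}")


register_scenario("xyz", lambda: log.append("xyz system init"))
use_scenario("xyz", "c1")
use_scenario("xyz", "c2")
print(log)
```

The second client reuses the already initialized scenario, so the system-level routine appears in the log only once.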