High-Performance Computing -- HPCC -- Features

刁丰羽
2023-12-01

Original: http://hpccsystems.com/Why-HPCC/features

Translated by 那海蓝蓝; the translator's text and notes are set in 【】.

HPCC is a proven, battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. The key features of the platform are:

Hardware

  • Processing clusters use commodity off-the-shelf (COTS) hardware
  • Utilizes typical rack-mounted blade servers with Intel or AMD processors, local memory and disk, connected to a high-speed communications switch (usually Gigabit Ethernet connections) or a hierarchy of communications switches, depending on the total size of the cluster
  • Clusters are usually homogeneous (all processors are configured identically), but this is not a requirement

Available Configurations

  • Thor, the Data Refinery, is the extraction, transformation and loading engine 【Translator's note: that is, "ETL", although the ETL implementations of different vendors and open source projects differ from one another】
  • Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities

File Systems

  • Distributed File System
  • Thor distributed file system (Thor DFS) is optimized for Big Data ETL
  • Roxie distributed file system (Roxie DFS) is optimized for highly concurrent query processing

Core Software

  • Linux Operating System
  • Services for job execution
  • Services for distributed file system access
  • A Thor cluster is also configured with a master node and multiple slave nodes
  • A Roxie cluster is a peer-coupled cluster where each node runs Server and Agent tasks for query execution and key and file processing
  • The file system on the Roxie cluster is a distributed index-based file system which uses a custom B+Tree structure for data storage
  • Indexes and data supporting queries are pre-built on Thor clusters and deployed to Roxie, with portions of the index and data stored on each node

Additional Software

  • ECL Agent, which acts on behalf of a client program to manage the execution of a job on a Thor cluster
  • Roxie file system, which is optimized for highly concurrent query processing
  • ESP Server (Enterprise Services Platform), which provides authentication, logging, security, and other services for the job execution and Web services environment
  • Dali server, which functions as the system data store for job workunit information and provides naming services for the distributed file systems

Tools

  • ECL IDE, the program development environment
  • ECL code migration tool
  • Distributed File Utility (DFU)
  • Environment Configuration Utility
  • Roxie Configuration Utility
  • ECLWatch, a Web-based utility program for monitoring the HPCC environment, which includes queue management, distributed file system management, job monitoring, and system performance monitoring tools

Additional Information

About the HPCC Distributed File System

The Thor DFS is record-oriented and uses a local Linux file system to store file parts. Files are initially loaded (sprayed) across nodes, and each node holds a single file part, which can be empty, for each distributed file. Files are divided on even record/document boundaries specified by the user. The architecture is master/slave, with name services and file mapping information stored on a separate server. Only one local file per node is required to represent a distributed file. Read/write access is supported between clusters configured in the same environment. Special adapters allow files from external databases such as MySQL to be accessed, so that transactional data can be integrated with DFS data and incorporated into batch jobs.

The Roxie DFS utilizes distributed B+Tree index files containing key information and data stored in local files on each node.
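The idea of spraying a file into one part per node, split only on record boundaries, can be illustrated with a minimal Python sketch. The `spray` function and its even-split rule are assumptions made for illustration; they are not the Distributed File Utility's actual algorithm or API.

```python
def spray(records, num_nodes):
    """Split records into num_nodes contiguous parts, never cutting a record.

    Hypothetical helper: part sizes differ by at most one record, and a
    part may be empty when there are fewer records than nodes.
    """
    base, extra = divmod(len(records), num_nodes)
    parts = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes each take one additional record.
        size = base + (1 if node < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts


records = [f"rec{i}" for i in range(10)]
for node, part in enumerate(spray(records, 4)):
    print(node, part)
```

Every node ends up with exactly one part per distributed file, and concatenating the parts in node order gives back the original file.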

Redundancy

The DFS for Thor and Roxie stores replicas of file parts on other configurable nodes to protect against disk and node failure. The Thor system offers either automatic or manual node swap and warm-start following a node failure, and jobs are restarted from the last checkpoint or persist. Replicas are automatically used while copying data to the new node. The Roxie system continues running following a node failure with a reduced number of nodes.
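A small Python sketch can make the replica fallback concrete. It assumes that part i lives on node i and that its replica sits on the next node (wrapping around). The text only says replicas go to "other configurable nodes", so this placement rule and both function names are hypothetical.

```python
def place_replicas(num_nodes, offset=1):
    """Map each primary node to the node holding its replica (assumed rule)."""
    return {node: (node + offset) % num_nodes for node in range(num_nodes)}


def node_for_part(part, replica_of, failed):
    """Return the node to read `part` from, falling back to the replica.

    Assumes part i is stored on node i. `failed` is a set of node numbers
    that are currently down.
    """
    if part not in failed:
        return part
    backup = replica_of[part]
    if backup in failed:
        raise RuntimeError(f"part {part}: primary and replica are both down")
    return backup


replica_of = place_replicas(4)
print(node_for_part(2, replica_of, failed=set()))  # primary is healthy
print(node_for_part(2, replica_of, failed={2}))    # served from the replica
```

With a single replica per part, losing any one node leaves every part readable, which matches the failure mode the paragraph above describes.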

Job Execution Environment

Thor utilizes a master/slave processing architecture. Processing steps defined in an ECL job can specify local (data is processed separately on each node) or global (data is processed across all nodes) operation. Multiple processing steps for a procedure are executed automatically as part of a single job, based on an optimized execution graph for a compiled ECL dataflow program. A single Thor cluster can be configured to run multiple jobs concurrently, reducing latency, if adequate CPU and memory resources are available on each node. Middleware components including an ECLAgent, ECLServer, and Dali Server provide the client interface and manage execution of the job, which is packaged as a workunit.

Roxie utilizes a multiple server/agent architecture to process ECL programs accessed by queries, using server tasks that act as a manager for each query and multiple agent tasks, as needed, to retrieve and process data for the query.
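The difference between local and global operation can be shown with a sort over data that is already split across nodes. The sketch below is plain Python standing in for the semantics only. It is not ECL and not how Thor implements either mode, and the function names are made up for the example.

```python
import heapq

# One list of records per node (three nodes in this toy example).
parts = [[7, 2, 9], [4, 8, 1], [6, 3, 5]]


def local_sort(parts):
    """Local operation: each node sorts only its own part."""
    return [sorted(part) for part in parts]


def global_sort(parts):
    """Global operation: the result is ordered across all nodes.

    Sorts each part, merges the sorted parts, then hands contiguous ranges
    back to the nodes, keeping each node's original record count.
    """
    merged = list(heapq.merge(*local_sort(parts)))
    result, start = [], 0
    for part in parts:
        result.append(merged[start:start + len(part)])
        start += len(part)
    return result


print(local_sort(parts))   # [[2, 7, 9], [1, 4, 8], [3, 5, 6]]
print(global_sort(parts))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The local form needs no data movement between nodes, which is why it is cheaper when the job only requires order within each part.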

Programming Language

ECL is the primary programming language for the HPCC environment. ECL is compiled into optimized C++, which is then compiled into DLLs for execution on the Thor and Roxie platforms. ECL can include inline C++ code encapsulated in functions. External services can be written in any language and compiled into shared libraries of functions callable from ECL. A pipe interface allows execution of external programs written in any language to be incorporated into jobs.
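The general pattern behind a pipe interface is to stream records to an external program's standard input and read the transformed records back from its standard output. The Python sketch below shows that pattern only. `pipe_through` is a hypothetical helper, not the ECL pipe interface, and the external program here is just a second Python process that upper-cases each line.

```python
import subprocess
import sys


def pipe_through(command, records):
    """Send records (one per line) to an external program and collect its output lines."""
    result = subprocess.run(
        command,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external program exits with an error
    )
    return result.stdout.splitlines()


# Stand-in external program: upper-cases every line it reads.
upper = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]

print(pipe_through(upper, ["alpha", "beta"]))  # ['ALPHA', 'BETA']
```

Because the contract is only lines in and lines out, the external program can be written in any language, which is the point the paragraph above makes.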

Database Capabilities

The HPCC platform includes the capability to build multi-key, multivariate indexes on DFS files. These indexes can be used to improve performance and provide keyed access for batch jobs on a Thor system, or be used to support development of queries deployed to Roxie systems. Keyed access to data is supported directly in the ECL language.
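To illustrate what keyed access over a multi-key index buys, the sketch below builds an index on two fields and looks records up by a full key or by a leading key prefix. A sorted list searched with `bisect` stands in for the B+Tree index files mentioned earlier. The record layout and function names are assumptions for the example.

```python
from bisect import bisect_left

records = [
    {"last": "Smith", "first": "Ann"},
    {"last": "Jones", "first": "Bob"},
    {"last": "Smith", "first": "Carl"},
    {"last": "Adams", "first": "Dee"},
]


def build_index(records, keys):
    """Return a sorted list of (key tuple, record position) entries."""
    return sorted((tuple(r[k] for k in keys), pos) for pos, r in enumerate(records))


def lookup(index, prefix):
    """Return positions of records whose key starts with `prefix`."""
    prefix = tuple(prefix)
    # Binary search to the first entry whose key is >= the prefix.
    start = bisect_left(index, (prefix,))
    hits = []
    for key, pos in index[start:]:
        if key[:len(prefix)] != prefix:
            break  # entries are sorted, so no later entry can match
        hits.append(pos)
    return hits


index = build_index(records, ["last", "first"])
print(lookup(index, ["Smith"]))          # [0, 2]
print(lookup(index, ["Smith", "Carl"]))  # [2]
```

The lookup touches only the matching range of the index instead of scanning every record, which is the performance benefit the section attributes to keyed access.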

Online Query and Data Warehouse Capabilities

The Roxie system configuration in the HPCC platform is specifically designed to provide data warehouse capabilities for structured queries and data analysis applications. Roxie is a high-performance platform capable of supporting thousands of users and providing sub-second response time, depending on the application.
