Chapter 11. Capabilities

小牛编辑
2023-12-01

Table of Contents

11.1. Data Security

11.2. Data Integrity

11.3. Data Integration

11.4. Availability and Reliability

11.5. Capacity

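11.1. Data Security

Some data may need to be protected from unauthorized access (e.g., theft or modification). Neo4j does not deal with data encryption explicitly, but it supports all means built into the Java programming language and the JVM to protect data by encrypting it before it is stored. Furthermore, data can easily be secured by running the database on a data store that is encrypted at the file system level. Finally, data protection should be considered in the upper layers of the surrounding system in order to prevent problems with scraping, malicious data insertion, and other threats.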
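As an illustration of encrypting a value with the JVM's built-in facilities before it is stored, here is a minimal sketch using the standard `javax.crypto` API. The class `PropertyCipher` and its methods are hypothetical helpers, not part of Neo4j, and the choice of AES-GCM with a 128-bit key is an assumption made for the example. Key management is out of scope here.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of Neo4j: encrypts a value before it is stored.
final class PropertyCipher {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns Base64(nonce || ciphertext and tag), suitable for a string property.
    static String encrypt(SecretKey key, String plainText) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + cipherText.length)
                    .put(iv).put(cipherText).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): splits off the nonce, then decrypts and verifies the tag.
    static String decrypt(SecretKey key, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The string returned by `encrypt` would be what the application stores as a property value; only code holding the key can recover the original with `decrypt`.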

11.2. Data Integrity

11.2.1. Core Graph Engine

11.2.2. Different Data Sources

In order to keep data consistent, there need to be mechanisms and structures that guarantee the integrity of all stored data. In Neo4j, data integrity is maintained for the core graph engine together with other data sources, as described below.

11.2.1. Core Graph Engine

In Neo4j, the whole data model is stored as a graph on disk and persisted as part of every committed transaction. In the storage layer, Relationships, Nodes, and Properties have direct pointers to each other. This maintains integrity without the need for data duplication between the different backend store files.

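11.2.2. Different Data Sources

In some scenarios, the core graph engine is combined with other systems in order to achieve optimal performance for non-graph lookups.

11.3. Data Integration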

11.3.1. Event-based Synchronization

11.3.2. Periodic Synchronization

11.3.3. Periodic Full Export/Import of Data

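Most enterprises rely primarily on relational databases to store their data, but this may cause performance limitations. In some of these cases, Neo4j can be used as an extension to supplement search and lookup for faster decision making. However, in any situation where multiple data repositories contain the same data, synchronization can be an issue. In some applications, it is acceptable for the search platform to be slightly out of sync with the relational database. In others, tight data integrity (e.g., between Neo4j and the RDBMS) is necessary. Typically, this has to be addressed both for data changing in real time and for bulk data changes happening in the RDBMS. A few strategies for synchronizing integrated data follow.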

11.3.1. Event-based Synchronization

In this scenario, all data stores, both RDBMS and Neo4j, are fed with domain-specific events via an event bus. Thus, the data held in the different backends is not actually synchronized but rather replicated.
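The replication idea can be sketched with a tiny in-memory event bus. Everything here is hypothetical: `EventBus`, the `CustomerRenamed` event, and the two maps standing in for the RDBMS and Neo4j are illustrations only, and a real system would use a messaging product and issue actual SQL and graph writes in the subscribers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class EventReplication {
    // Hypothetical domain event; the fields are illustrative only.
    static final class CustomerRenamed {
        final long customerId;
        final String newName;

        CustomerRenamed(long customerId, String newName) {
            this.customerId = customerId;
            this.newName = newName;
        }
    }

    // Minimal synchronous event bus: every subscriber sees every event.
    static final class EventBus {
        private final List<Consumer<CustomerRenamed>> subscribers = new ArrayList<>();

        void subscribe(Consumer<CustomerRenamed> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(CustomerRenamed event) {
            subscribers.forEach(s -> s.accept(event));
        }
    }

    // Stand-ins for the two backends (assumption: simple id -> name stores).
    static final Map<Long, String> relationalCopy = new HashMap<>();
    static final Map<Long, String> graphCopy = new HashMap<>();

    static void run() {
        EventBus bus = new EventBus();
        // Each backend applies the same event independently of the other.
        bus.subscribe(e -> relationalCopy.put(e.customerId, e.newName));
        bus.subscribe(e -> graphCopy.put(e.customerId, e.newName));

        bus.publish(new CustomerRenamed(42L, "Ada"));
        bus.publish(new CustomerRenamed(42L, "Ada Lovelace"));
    }
}
```

Neither store reads from the other; both end up with the same state only because both received the same events in the same order.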

11.3.2. Periodic Synchronization

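Another viable scenario is the periodic export of the latest changes in the RDBMS to Neo4j via some form of SQL query. This allows a small amount of latency in the synchronization, but has the advantage of using the RDBMS as the master for all data purposes. The same process can be applied with Neo4j as the master data source.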
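One common way to find "the latest changes" is a watermark on a last-modified column. The sketch below assumes such a column exists; the `Row` type, the table and column names in the comment, and the map standing in for the Neo4j side are all hypothetical, and the filtering is done in memory so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PeriodicSync {
    // Hypothetical row of a relational table with a last-modified timestamp.
    static final class Row {
        final long id;
        final String name;
        final long updatedAt;

        Row(long id, String name, long updatedAt) {
            this.id = id;
            this.name = name;
            this.updatedAt = updatedAt;
        }
    }

    // In-memory equivalent of a query such as (names are assumptions):
    //   SELECT id, name, updated_at FROM customer WHERE updated_at > ?
    static List<Row> changedSince(List<Row> table, long watermark) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            if (row.updatedAt > watermark) {
                changed.add(row);
            }
        }
        return changed;
    }

    // Applies the changes to the graph-side copy and returns the new watermark.
    static long syncOnce(List<Row> table, Map<Long, String> graphCopy, long watermark) {
        long newWatermark = watermark;
        for (Row row : changedSince(table, watermark)) {
            graphCopy.put(row.id, row.name);
            newWatermark = Math.max(newWatermark, row.updatedAt);
        }
        return newWatermark;
    }
}
```

A scheduler would call `syncOnce` at a fixed interval and persist the returned watermark between runs. Note that this simple scheme does not see deleted rows; those would need tombstones or a separate change log.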

11.3.3. Periodic Full Export/Import of Data

Using the Batch Inserter tools for Neo4j, even large amounts of data can be imported into the database in a very short time. Thus, a full export from the RDBMS and import into Neo4j becomes possible. If the propagation lag between the RDBMS and Neo4j is not a big issue, this is a very viable solution.

11.4. Availability and Reliability

11.4.1. Operational Availability

11.4.2. Disaster Recovery/ Resiliency

Most mission-critical systems require the database subsystem to be accessible at all times. Neo4j ensures availability and reliability through a few different strategies.

11.4.1. Operational Availability

In order not to create a single point of failure, Neo4j supports different approaches which provide transparent fallback and/or recovery from failures.

Online backup (Cold spare)

In this approach, a single instance of the master database is used, with Online Backup enabled. In case of a failure, the backup files can be mounted onto a new Neo4j instance and reintegrated into the application.

Online Backup High Availability (Hot spare)

Here, a Neo4j "backup" instance listens to online transfers of changes from the master. In the event of a failure of the master, the backup is already running and can directly take over the load.

High Availability cluster

This approach uses a cluster of database instances, with one (read/write) master and a number of (read-only) slaves. Failing slaves can simply be restarted and brought back online. Alternatively, a new slave may be added by cloning an existing one. Should the master instance fail, a new master will be elected by the remaining cluster nodes.
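The text does not describe how the remaining nodes choose a new master, so the following is only a sketch of one plausible rule, stated as an assumption: an election succeeds only if a strict majority of the configured members is reachable, and the winner is the reachable instance with the highest committed transaction id (ties broken by lowest server id). The `Instance` type and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class MasterElection {
    // Hypothetical view of one cluster member.
    static final class Instance {
        final int serverId;
        final long lastCommittedTx;
        final boolean alive;

        Instance(int serverId, long lastCommittedTx, boolean alive) {
            this.serverId = serverId;
            this.lastCommittedTx = lastCommittedTx;
            this.alive = alive;
        }
    }

    // Returns the new master, or empty if no majority of members is reachable.
    static Optional<Instance> elect(List<Instance> cluster) {
        List<Instance> reachable = new ArrayList<>();
        for (Instance instance : cluster) {
            if (instance.alive) {
                reachable.add(instance);
            }
        }
        // Quorum: strictly more than half of the configured members.
        if (reachable.size() * 2 <= cluster.size()) {
            return Optional.empty();
        }
        // Assumed rule: most up-to-date instance wins, lowest id breaks ties.
        Comparator<Instance> mostUpToDate =
                Comparator.comparingLong((Instance i) -> i.lastCommittedTx)
                        .thenComparing(
                                Comparator.comparingInt((Instance i) -> i.serverId).reversed());
        return reachable.stream().max(mostUpToDate);
    }
}
```

Requiring a majority is what prevents two halves of a partitioned cluster from each electing their own master.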

11.4.2. Disaster Recovery/ Resiliency

In case of a breakdown of a major part of the IT infrastructure, there need to be mechanisms in place that enable the fast recovery and regrouping of the remaining services and servers. In Neo4j, there are different components that are suitable to be part of a disaster recovery strategy.

Prevention

- Online Backup High Availability to other locations outside the current data center.

- Online Backup to different file system locations: this is a simpler form of backup, applying changes directly to backup files; it is thus more suited for local backup scenarios.

- Neo4j High Availability cluster: a cluster of one write-master Neo4j server and a number of read-slaves, getting transaction logs from the master. Write-master failover is handled by quorum election among the read-slaves for a new master.

Detection

- SNMP and JMX monitoring can be used for the Neo4j database.

Correction

- Online Backup: A new Neo4j server can be started directly on the backed-up files and take over new requests.

- Neo4j High Availability cluster: A broken Neo4j read slave can be reinserted into the cluster, getting the latest updates from the master. Alternatively, a new server can be inserted by copying an existing server and applying the latest updates to it.

11.5. Capacity

11.5.1. File Sizes

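Neo4j relies on Java's non-blocking I/O subsystem for all file handling. Furthermore, while the storage file layout is optimized for interconnected data, Neo4j does not require raw devices. Thus, file sizes are limited only by the underlying operating system's capacity to handle large files. Physically, there is no built-in limit on the file handling capacity in Neo4j. Neo4j tries to memory-map as much of the underlying store files as possible. If the available RAM is not sufficient to keep all data in RAM, Neo4j will use buffers in some cases, dynamically reallocating the memory-mapped high-performance I/O windows to the regions with the most I/O activity. Thus, ACID speed degrades gracefully as RAM becomes the limiting factor.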
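To make the idea of a memory-mapped window concrete, here is a minimal sketch using the standard Java NIO API (`FileChannel.map`). It is not Neo4j's storage code: the class `MappedWindow`, the 8-byte record layout, and the temporary file are assumptions chosen for illustration. It maps only the region that holds one record instead of the whole file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWindow {
    static final int RECORD_SIZE = Long.BYTES; // hypothetical fixed-size record

    // Maps a read-write window over one record and stores the value in it.
    static void writeRecord(Path storeFile, long recordIndex, long value) {
        try (FileChannel channel = FileChannel.open(storeFile,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer window = channel.map(
                    FileChannel.MapMode.READ_WRITE, recordIndex * RECORD_SIZE, RECORD_SIZE);
            window.putLong(0, value);
            window.force(); // ask the OS to write the mapped pages to the device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps a read-only window over the same record and reads the value back.
    static long readRecord(Path storeFile, long recordIndex) {
        try (FileChannel channel = FileChannel.open(storeFile, StandardOpenOption.READ)) {
            MappedByteBuffer window = channel.map(
                    FileChannel.MapMode.READ_ONLY, recordIndex * RECORD_SIZE, RECORD_SIZE);
            return window.getLong(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes record 1000 of a temporary file and reads it back.
    static long demo() {
        try {
            Path file = Files.createTempFile("store", ".db");
            file.toFile().deleteOnExit();
            writeRecord(file, 1_000, 0xCAFEL);
            return readRecord(file, 1_000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a window covers only part of the file, a process can work with files larger than the available RAM by moving its windows to the regions that are currently in use.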

11.5.2. Read speed

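Enterprises want to optimize the use of hardware to deliver the maximum business value from available resources. Neo4j's approach to reading data makes the best possible use of all available hardware resources. Neo4j does not block or lock any read operations; thus, there is no danger of deadlocks in read operations and no need for read transactions. With threaded read access to the database, queries can run simultaneously on as many processors as are available. This provides very good scale-up scenarios with bigger servers.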

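11.5.3. Write speed

Write speed is a consideration for many enterprise applications. However, there are two different scenarios: sustained continuous operation and bulk access (e.g., backup, initial load, or batch loading). To support the disparate requirements of these scenarios, Neo4j supports two modes of writing to the storage layer. In transactional, ACID-compliant normal operation, the isolation level is maintained and read operations can occur at the same time as the writing process. At every commit, the data is persisted to disk and can be recovered to a consistent state upon system failure. This requires disk write access and a real flushing of data. Thus, the write speed of Neo4j on a single server in continuous mode is limited by the I/O capacity of the hardware. Consequently, the use of fast SSDs is highly recommended for production scenarios.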

Neo4j has a Batch Inserter that operates directly on the store files. This mode does not provide transactional security, so it can only be used when there is a single write thread. Because data is written sequentially, and never flushed to the logical logs, huge performance boosts are achieved. The Batch Inserter is optimized for non-transactional bulk import of large amounts of data.
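The cost difference between the two write modes comes largely from how often data is forced to the device. The sketch below illustrates that with plain Java NIO; it does not use or reproduce Neo4j's transaction log or Batch Inserter, and the class `FlushModes` and its record format are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

final class FlushModes {
    // Appends the records and returns how many times data was forced to the device.
    static int write(Path file, List<String> records, boolean flushEveryCommit) {
        int flushes = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (String record : records) {
                ByteBuffer buffer =
                        ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                if (flushEveryCommit) {
                    channel.force(true); // durable after every "commit"
                    flushes++;
                }
            }
            if (!flushEveryCommit) {
                channel.force(true); // bulk mode: a single flush at the end
                flushes++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return flushes;
    }

    // Writes three records to a temporary file in the chosen mode.
    static int demo(boolean flushEveryCommit) {
        try {
            Path file = Files.createTempFile("writes", ".log");
            file.toFile().deleteOnExit();
            return write(file, Arrays.asList("a", "b", "c"), flushEveryCommit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a flush per commit, throughput is bounded by how fast the device completes each flush, which is why faster storage helps the transactional mode. The bulk variant trades that durability guarantee for speed.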

11.5.4. Data size

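In Neo4j, data size is mainly limited by the address space of the primary keys for Nodes, Relationships, Properties, and RelationshipTypes. Currently, the address space is as follows:

- Nodes: 2^35 (~34 billion)

- Relationships: 2^35 (~34 billion)

- Properties: 2^36 (~68 billion)

- Relationship types: 2^15 (~32,000)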
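The approximate figures follow directly from the number of address bits. As a quick check, this small sketch computes the exact values; the class name `AddressSpace` is made up for the example.

```java
final class AddressSpace {
    // Number of distinct identifiers that fit in the given number of bits.
    static long capacity(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println("nodes:              " + capacity(35)); // 34359738368
        System.out.println("relationships:      " + capacity(35)); // 34359738368
        System.out.println("properties:         " + capacity(36)); // 68719476736
        System.out.println("relationship types: " + capacity(15)); // 32768
    }
}
```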