Chapter 13. Data Import

小牛编辑
2023-12-01

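For high-performance data import, the batch insert facilities described in this chapter are recommended. Other ways to import data into Neo4j include using Gremlin graph import (see Section 18.18.2, “Load a sample graph”) or using the Geoff notation (see http://geoff.nigelsmall.net/).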

13.1. Batch Insertion

13.1.1. Batch Inserter Examples

13.1.2. Batch Graph Database

13.1.3. Index Batch Insertion

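Neo4j has a batch insertion facility intended for initial imports. It bypasses transactions and other checks in favor of performance, which is useful when you have a large dataset that needs to be loaded once.

Batch insertion is included in the neo4j-kernel component, which is part of all Neo4j distributions and editions.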

Be aware of the following points when using batch insertion:

- The intended use is for initial import of data.

- Batch insertion is not thread safe.

- Batch insertion is non-transactional.

- Unless shutdown is successfully invoked at the end of the import, the database files will be corrupt.

Warning

Always perform batch insertion in a single thread (or use synchronization so that only one thread at a time accesses the batch inserter) and invoke shutdown when finished.
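The two rules in the warning can be sketched in plain Java. This is only an illustration under stated assumptions: `NodeInserter` is a hypothetical stand-in for the two batch inserter methods used here (it is not the Neo4j interface), `InMemoryInserter` is a toy implementation, and `SynchronizedInserter` shows one possible way to let only one thread at a time reach the underlying inserter. The `try`/`finally` block makes sure shutdown is invoked even if an insert fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the batch inserter methods used below;
// NOT the Neo4j interface, just enough to show the pattern.
interface NodeInserter {
    long createNode(Map<String, Object> properties);
    void shutdown();
}

// Toy in-memory implementation (an assumption for illustration only).
class InMemoryInserter implements NodeInserter {
    final List<Map<String, Object>> nodes = new ArrayList<Map<String, Object>>();
    boolean shutDown = false;

    public long createNode(Map<String, Object> properties) {
        if (shutDown) {
            throw new IllegalStateException("already shut down");
        }
        nodes.add(new HashMap<String, Object>(properties)); // copy, so callers may reuse their map
        return nodes.size() - 1;                            // node id = position in the list
    }

    public void shutdown() {
        shutDown = true;
    }
}

// Serializes all access so only one thread at a time touches the delegate.
class SynchronizedInserter implements NodeInserter {
    private final NodeInserter delegate;

    SynchronizedInserter(NodeInserter delegate) {
        this.delegate = delegate;
    }

    public synchronized long createNode(Map<String, Object> properties) {
        return delegate.createNode(properties);
    }

    public synchronized void shutdown() {
        delegate.shutdown();
    }
}

class SafeImportSketch {
    // Creates one node per name on the calling thread and always shuts down afterwards.
    static int importNames(NodeInserter inserter, List<String> names) {
        int created = 0;
        try {
            for (String name : names) {
                Map<String, Object> properties = new HashMap<String, Object>();
                properties.put("name", name);
                inserter.createNode(properties);
                created++;
            }
        } finally {
            inserter.shutdown(); // reached even if an insert throws
        }
        return created;
    }

    public static void main(String[] args) {
        InMemoryInserter store = new InMemoryInserter();
        int created = importNames(new SynchronizedInserter(store), Arrays.asList("Mattias", "Chris"));
        System.out.println(created + " nodes created, shut down: " + store.shutDown);
    }
}
```

Running the whole import on a single thread is usually the simpler choice; a wrapper of this kind only matters if several threads must share one inserter.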

13.1.1. Batch Inserter Examples

Creating a batch inserter is similar to how you normally create data in the database, but in this case the low-level BatchInserter interface is used. As we have already pointed out, you can't have multiple threads using the batch inserter concurrently without external synchronization.

Tip

The source code of the examples is found here: BatchInsertExampleTest.java

To get hold of a BatchInserter, use BatchInserters and then go from there:

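// Imports assume the org.neo4j.unsafe.batchinsert package layout;
// adjust them if your Neo4j version places these classes elsewhere.
import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

BatchInserter inserter = BatchInserters.inserter( "target/batchinserter-example" );
Map<String, Object> properties = new HashMap<String, Object>();
properties.put( "name", "Mattias" );
long mattiasNode = inserter.createNode( properties );
properties.put( "name", "Chris" );
long chrisNode = inserter.createNode( properties );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
// To set properties on the relationship, use a properties map
// instead of null as the last parameter.
inserter.createRelationship( mattiasNode, chrisNode, knows, null );
inserter.shutdown();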

To get good performance you will probably want to set some configuration settings for the batch inserter. Read Section 21.9.2, “Batch insert example” for information on configuring a batch inserter. This is how to start a batch inserter with configuration options:

Map<String, String> config = new HashMap<String, String>();
config.put( "neostore.nodestore.db.mapped_memory", "90M" );
BatchInserter inserter = BatchInserters.inserter(
        "target/batchinserter-example-config", config );
// Insert data here ... and then shut down:
inserter.shutdown();

In case you have stored the configuration in a file, you can load it like this:

// Needs java.io.File and org.neo4j.helpers.collection.MapUtil in addition
// to the imports of the previous examples.
Map<String, String> config = MapUtil.load( new File(
        "target/batchinsert-config" ) );
BatchInserter inserter = BatchInserters.inserter(
        "target/batchinserter-example-config", config );
// Insert data here ... and then shut down:
inserter.shutdown();
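To give an idea of what such a configuration file might contain, here is a minimal sketch that uses only the JDK. It assumes the file is in plain key=value properties format, and `parseConfig` and `loadConfig` are hypothetical helpers written for this illustration; they are not part of Neo4j and are not a description of how MapUtil.load is implemented. The setting name is the one used in the configuration example above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class ConfigFileSketch {
    // Hypothetical helper: turns key=value lines into a String-to-String map.
    static Map<String, String> parseConfig(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> config = new HashMap<String, String>();
        for (String key : props.stringPropertyNames()) {
            config.put(key, props.getProperty(key));
        }
        return config;
    }

    // Hypothetical helper: reads the settings from a file on disk.
    static Map<String, String> loadConfig(Path file) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return parseConfig(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a one-line sample file, then read it back.
        Path file = Files.createTempFile("batchinsert-config", ".properties");
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write("neostore.nodestore.db.mapped_memory=90M\n");
        }
        Map<String, String> config = loadConfig(file);
        System.out.println(config); // prints {neostore.nodestore.db.mapped_memory=90M}
        Files.delete(file);
    }
}
```

A map built this way has the same shape as the `config` map that is passed to BatchInserters.inserter in the examples above.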

13.1.2. Batch Graph Database

If you already have code for data import written against the normal Neo4j API, you could consider using a batch inserter that exposes that API.

Note

This will not perform as well as using the BatchInserter API directly.

Also be aware of the following:

- Starting a transaction or invoking Transaction.finish() or Transaction.success() will do nothing.

- Invoking the Transaction.failure() method will generate a NotInTransaction exception.

- Node.delete() and Node.traverse() are not supported.

- Relationship.delete() is not supported.

- Event handlers and indexes are not supported.

- GraphDatabaseService.getRelationshipTypes(), getAllNodes() and getAllRelationships() are not supported.

With these precautions in mind, this is how to do it:

// GraphDatabaseService, Node, RelationshipType and DynamicRelationshipType
// come from org.neo4j.graphdb.
GraphDatabaseService batchDb =
        BatchInserters.batchDatabase( "target/batchdb-example" );
Node mattiasNode = batchDb.createNode();
mattiasNode.setProperty( "name", "Mattias" );
Node chrisNode = batchDb.createNode();
chrisNode.setProperty( "name", "Chris" );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
mattiasNode.createRelationshipTo( chrisNode, knows );
batchDb.shutdown();

Tip

The source code of the example is found here: BatchInsertExampleTest.java

13.1.3. Index Batch Insertion

For general notes on batch insertion, see Section 13.1, “Batch Insertion”.

Indexing during batch insertion is done using BatchInserterIndex instances, which are provided via BatchInserterIndexProvider. An example:

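// Imports assume the org.neo4j.unsafe.batchinsert package layout;
// adjust them if your Neo4j version places these classes elsewhere.
import java.util.Map;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserters;

BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider =
        new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex actors =
        indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
// Cache lookups on the "name" key.
actors.setCacheCapacity( "name", 100000 );

Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
long node = inserter.createNode( properties );
actors.add( node, properties );

// Make the changes visible for reading; use this sparingly, as it requires IO.
actors.flush();

// Make sure to shut down the index provider as well.
indexProvider.shutdown();
inserter.shutdown();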

The configuration parameters are the same as described in Section 14.10, “Configuration and fulltext indexes”.

Best practices

Here are some pointers for getting the most performance out of BatchInserterIndex:

- Try to avoid flushing too often, because each flush makes all additions since the last flush visible to the querying methods, and publishing those changes can carry a performance penalty.

- Work in phases that are as large as possible, each consisting of either only writes or only reads, and don't forget to flush after a write phase so that the changes become visible to the querying methods.

- Enable caching for keys you know you're going to do lookups for later on to increase performance significantly (though insertion performance may degrade slightly).

Note

Changes to the index become available for reading only after they have been flushed to disk. Thus, for optimal performance, read and lookup operations should be kept to a minimum during batch insertion, since they involve IO and have a negative impact on speed.
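The write-phase, flush, read-phase pattern from the pointers above can be modelled with a small toy class. This is a sketch under stated assumptions: `BufferedIndexSketch` is a hypothetical in-memory class, not BatchInserterIndex, and it only imitates the behaviour described in this section, namely that additions are invisible to lookups until flush is called. The generated names of the form `actor-N` are made-up sample data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an index whose additions become visible to lookups only after flush().
// Hypothetical class for illustration; it is not BatchInserterIndex.
class BufferedIndexSketch {
    private final Map<String, List<Long>> visible = new HashMap<String, List<Long>>();
    private final List<Map.Entry<String, Long>> pending = new ArrayList<Map.Entry<String, Long>>();
    int flushCount = 0;

    // Buffers an addition; it cannot be looked up yet.
    void add(long node, String name) {
        pending.add(new AbstractMap.SimpleEntry<String, Long>(name, node));
    }

    // Publishes all buffered additions; stands in for the costly IO step.
    void flush() {
        for (Map.Entry<String, Long> entry : pending) {
            visible.computeIfAbsent(entry.getKey(), k -> new ArrayList<Long>()).add(entry.getValue());
        }
        pending.clear();
        flushCount++;
    }

    // Looks up node ids by name, seeing only flushed additions.
    List<Long> get(String name) {
        return visible.getOrDefault(name, Collections.<Long>emptyList());
    }

    public static void main(String[] args) {
        BufferedIndexSketch actors = new BufferedIndexSketch();

        // Write phase: many additions, no lookups and no flushes in between.
        actors.add(0, "Keanu Reeves");
        for (int i = 1; i < 1000; i++) {
            actors.add(i, "actor-" + i);
        }
        System.out.println(actors.get("Keanu Reeves")); // prints [] (not flushed yet)

        // One flush at the end of the write phase, then the read phase starts.
        actors.flush();
        System.out.println(actors.get("Keanu Reeves")); // prints [0]
        System.out.println("flushes: " + actors.flushCount); // prints flushes: 1
    }
}
```

Flushing after every single addition would publish the same data, but with a thousand flushes instead of one, which is the overhead the first pointer warns about.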