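Chapter 13. Data Import
For high-performance data import, the batch insert facilities described in this chapter are recommended. Other ways to import data into Neo4j include using Gremlin graph import (see Section 18.18.2, “Load a sample graph”) or using the Geoff notation (see http://geoff.nigelsmall.net/).
13.1. Batch Insertion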
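Neo4j has a batch insertion facility intended for initial imports, which bypasses transactions and other checks in favor of performance. This is useful when you have a big dataset that needs to be loaded once.
Batch insertion is included in the neo4j-kernel component, which is part of all Neo4j distributions and editions.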
Be aware of the following points when using batch insertion:
- The intended use is for initial import of data.
- Batch insertion is not thread safe.
- Batch insertion is non-transactional.
- Unless shutdown is successfully invoked at the end of the import, the database files will be corrupt.
Warning: Always perform batch insertion in a single thread (or use synchronization so that only one thread at a time accesses the batch inserter) and invoke shutdown when finished.
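The two rules in the warning can be sketched in plain Java. The example below is a minimal illustration, not Neo4j code: `Inserter` is a hypothetical stand-in for the batch inserter that declares only the two operations needed here, and `SynchronizedImport` is an assumed helper name. It serializes access through a single lock and calls `shutdown` in a `finally` block, so shutdown is attempted even if the import fails part-way.

```java
import java.util.List;
import java.util.Map;

final class SynchronizedImport {
    /** Hypothetical stand-in for the batch inserter; not the Neo4j interface. */
    interface Inserter {
        long createNode(Map<String, Object> properties);
        void shutdown();
    }

    private final Inserter inserter;
    private final Object lock = new Object();

    SynchronizedImport(Inserter inserter) {
        this.inserter = inserter;
    }

    /** Only one thread at a time can reach the underlying inserter. */
    long createNode(Map<String, Object> properties) {
        synchronized (lock) {
            return inserter.createNode(properties);
        }
    }

    /** Creates one node per row and always invokes shutdown afterwards. */
    static int importAll(Inserter inserter, List<Map<String, Object>> rows) {
        SynchronizedImport guarded = new SynchronizedImport(inserter);
        int created = 0;
        try {
            for (Map<String, Object> row : rows) {
                guarded.createNode(row);
                created++;
            }
        } finally {
            // Runs on success and on failure alike.
            inserter.shutdown();
        }
        return created;
    }
}
```

If the import only ever runs on one thread, the lock is unnecessary and the `try`/`finally` part alone is enough.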
13.1.1. Batch Inserter Examples
Creating a batch inserter is similar to how you normally create data in the database, but in this case the low-level BatchInserter interface is used. As we have already pointed out, you can’t have multiple threads using the batch inserter concurrently without external synchronization.
Tip: The source code of the examples is found here: BatchInsertExampleTest.java
To get hold of a BatchInserter, use BatchInserters and then go from there:

    import java.util.HashMap;
    import java.util.Map;

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.unsafe.batchinsert.BatchInserter;
    import org.neo4j.unsafe.batchinsert.BatchInserters;

    BatchInserter inserter = BatchInserters.inserter( "target/batchinserter-example" );
    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put( "name", "Mattias" );
    long mattiasNode = inserter.createNode( properties );
    properties.put( "name", "Chris" );
    long chrisNode = inserter.createNode( properties );
    RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
    // To set properties on the relationship, use a properties map
    // instead of null as the last parameter.
    inserter.createRelationship( mattiasNode, chrisNode, knows, null );
    inserter.shutdown();

To gain good performance you probably want to set some configuration settings for the batch inserter. Read Section 21.9.2, “Batch insert example” for information on configuring a batch inserter. This is how to start a batch inserter with configuration options:

    Map<String, String> config = new HashMap<String, String>();
    config.put( "neostore.nodestore.db.mapped_memory", "90M" );
    BatchInserter inserter = BatchInserters.inserter(
            "target/batchinserter-example-config", config );
    // Insert data here ... and then shut down:
    inserter.shutdown();

In case you have stored the configuration in a file, you can load it like this:

    import java.io.File;
    import org.neo4j.helpers.collection.MapUtil;

    // Read the settings from a file instead of building the map in code.
    Map<String, String> config = MapUtil.load( new File(
            "target/batchinsert-config" ) );
    BatchInserter inserter = BatchInserters.inserter(
            "target/batchinserter-example-config", config );
    // Insert data here ... and then shut down:
    inserter.shutdown();

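For reference, a configuration file of this kind can be read with nothing but the JDK. The sketch below assumes the file uses the standard Java properties format (one `key=value` pair per line). `ConfigFiles.load` is a hypothetical helper written for this illustration; it is not part of Neo4j and is not claimed to match what `MapUtil.load` does internally. The result has the same `Map<String, String>` shape that is passed to `BatchInserters.inserter` above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class ConfigFiles {
    private ConfigFiles() {
    }

    /** Reads key=value pairs from a properties file into a String map. */
    static Map<String, String> load(File file) throws IOException {
        Properties properties = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            properties.load(in);
        }
        Map<String, String> config = new HashMap<String, String>();
        for (String key : properties.stringPropertyNames()) {
            config.put(key, properties.getProperty(key));
        }
        return config;
    }
}
```

A file containing the single line `neostore.nodestore.db.mapped_memory=90M` would produce a map with that one entry.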
13.1.2. Batch Graph Database
If you already have code for importing data written against the normal Neo4j API, you could consider using a batch inserter that exposes that API.
Note: This will not perform as well as using the BatchInserter API directly.
Also be aware of the following:
- Starting a transaction or invoking Transaction.finish() or Transaction.success() will do nothing.
- Invoking the Transaction.failure() method will generate a NotInTransaction exception.
- Node.delete() and Node.traverse() are not supported.
- Relationship.delete() is not supported.
- Event handlers and indexes are not supported.
- GraphDatabaseService.getRelationshipTypes(), getAllNodes() and getAllRelationships() are not supported.
With these precautions in mind, this is how to do it:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;

    GraphDatabaseService batchDb =
            BatchInserters.batchDatabase( "target/batchdb-example" );
    Node mattiasNode = batchDb.createNode();
    mattiasNode.setProperty( "name", "Mattias" );
    Node chrisNode = batchDb.createNode();
    chrisNode.setProperty( "name", "Chris" );
    RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
    mattiasNode.createRelationshipTo( chrisNode, knows );
    batchDb.shutdown();

Tip: The source code of the example is found here: BatchInsertExampleTest.java
13.1.3. Index Batch Insertion
For general notes on batch insertion, see Section 13.1, “Batch Insertion”.
Indexing during batch insertion is done using BatchInserterIndex instances, which are provided via BatchInserterIndexProvider. An example:

    import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
    import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
    import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;

    BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
    BatchInserterIndexProvider indexProvider =
            new LuceneBatchInserterIndexProvider( inserter );
    BatchInserterIndex actors =
            indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
    // Cache lookups on the "name" key.
    actors.setCacheCapacity( "name", 100000 );

    Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
    long node = inserter.createNode( properties );
    actors.add( node, properties );

    // Make the changes visible for reading; use this sparingly, it requires IO!
    actors.flush();

    // Make sure to shut down the index provider as well
    indexProvider.shutdown();
    inserter.shutdown();

The configuration parameters are the same as described in Section 14.10, “Configuration and fulltext indexes”.
Best practices
Here are some pointers for getting the most performance out of BatchInserterIndex:
- Try to avoid flushing too often, because each flush makes all additions since the last flush visible to the querying methods, and publishing those changes can be a performance penalty.
- Have phases that are as big as possible, where each phase is either only writes or only reads, and don’t forget to flush after a write phase so that those changes become visible to the querying methods.
- Enable caching for keys you know you’re going to do lookups for later on to increase performance significantly (though insertion performance may degrade slightly).
Note: Changes to the index become available for reading only after they have been flushed to disk. For optimal performance, read and lookup operations should therefore be kept to a minimum during batch insertion, since they involve IO and have a negative impact on speed.
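The effect of grouping writes and reads into separate phases can be illustrated with a toy model. The class below is not Neo4j's index implementation. `ToyIndex` and its methods are hypothetical names for a sketch in which additions sit in a pending buffer until `flush` publishes them, and a counter records how many flushes (standing in for the expensive step) were needed.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: additions become visible to lookups after flush(). */
final class ToyIndex {
    private final Map<String, List<Long>> visible = new HashMap<String, List<Long>>();
    private final List<Map.Entry<String, Long>> pending =
            new ArrayList<Map.Entry<String, Long>>();
    private int flushCount = 0;

    /** Buffers an addition; lookups cannot see it yet. */
    void add(long node, String name) {
        pending.add(new SimpleEntry<String, Long>(name, node));
    }

    /** Publishes all buffered additions and counts the flush. */
    void flush() {
        for (Map.Entry<String, Long> entry : pending) {
            List<Long> nodes = visible.get(entry.getKey());
            if (nodes == null) {
                nodes = new ArrayList<Long>();
                visible.put(entry.getKey(), nodes);
            }
            nodes.add(entry.getValue());
        }
        pending.clear();
        flushCount++;
    }

    /** Returns only the nodes published by earlier flushes. */
    List<Long> get(String name) {
        List<Long> nodes = visible.get(name);
        return nodes == null ? new ArrayList<Long>() : new ArrayList<Long>(nodes);
    }

    int flushCount() {
        return flushCount;
    }

    /** One write phase, a single flush, then one read phase. */
    static int phasedImport(ToyIndex index, List<String> names) {
        long node = 0;
        for (String name : names) {
            index.add(node++, name);
        }
        index.flush();
        int hits = 0;
        for (String name : names) {
            if (!index.get(name).isEmpty()) {
                hits++;
            }
        }
        return hits;
    }
}
```

In this model, interleaving each `add` with a lookup would require one flush per addition, whereas the phased version needs a single flush for the whole batch.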