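Original data insertion: preprocess the data, then create the entity nodes and the relationships between them one at a time with CQL statements, e.g.: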
paper = Node(self.Paper, title=line[0],author=line[1],organ=line[2],keyword=line[3])
self.graph.create(paper)
Update: building on the above, write the entity nodes and the relationships between them to CSV files, then import the data with Neo4j's LOAD CSV.
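A minimal Python sketch of the preprocessing step that produces these CSV files. It assumes each paper record is a dict keyed like the paper_node.csv header, and that multiple authors or keywords in one field are separated by ";" (both are assumptions about the data). The file names and the name/name2 relationship columns match the LOAD CSV statements in this section; the helper names are hypothetical. Deduplicating relationship rows here is an extra step not in the original.

```python
import csv
from pathlib import Path

# Header of paper_node.csv; must match the line.<field> names used by LOAD CSV.
PAPER_FIELDS = ["title", "author", "keyword", "srcDatabase", "source",
                "download", "quote", "year", "url"]


def split_multi(value, sep=";"):
    """Split a multi-valued field such as 'A; B' (the ';' separator is an assumption)."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def write_csv(path, header, rows):
    """Write a UTF-8 CSV file with a header row, as LOAD CSV WITH HEADERS expects."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def export_for_load_csv(papers, out_dir):
    """Turn paper dicts into node and relationship CSV files (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paper_rows, authors, keywords = [], set(), set()
    paper_author, paper_keyword = set(), set()  # sets drop duplicate relationships
    for paper in papers:
        paper_rows.append([paper.get(field, "") for field in PAPER_FIELDS])
        for author in split_multi(paper.get("author", "")):
            authors.add(author)
            paper_author.add((paper["title"], author))
        for keyword in split_multi(paper.get("keyword", "")):
            keywords.add(keyword)
            paper_keyword.add((paper["title"], keyword))
    write_csv(out / "paper_node.csv", PAPER_FIELDS, paper_rows)
    write_csv(out / "author_node.csv", ["name"], [[a] for a in sorted(authors)])
    write_csv(out / "keyword_node.csv", ["name"], [[k] for k in sorted(keywords)])
    # Relationship files use the name/name2 columns read by the MATCH statements.
    write_csv(out / "paper_author_relation.csv", ["name", "name2"], sorted(paper_author))
    write_csv(out / "paper_keyword_relation.csv", ["name", "name2"], sorted(paper_keyword))
```

By default, file:/// URLs in LOAD CSV resolve against Neo4j's import directory, so the generated files belong there before the import is run.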
Insert Paper nodes
LOAD CSV WITH HEADERS FROM "file:///paper_node.csv" AS line
CREATE (p:Paper{title: line.title,author:line.author,keyword:line.keyword,srcDatabase:line.srcDatabase, source:line.source, download:line.download, quote:line.quote, year:line.year, url:line.url})
Insert Author nodes
LOAD CSV WITH HEADERS FROM "file:///author_node.csv" AS line
create (p:Author{name:line.name})
CREATE CONSTRAINT ON (c:`作者`)
ASSERT c.id IS UNIQUE
Insert Keyword nodes
LOAD CSV WITH HEADERS FROM "file:///keyword_node.csv" AS line
create (p:Keyword{name:line.name})
Create indexes and constraints
CREATE INDEX ON :Paper(title)
CREATE INDEX ON :Keyword(name)
CREATE INDEX ON :Author(name)
CREATE CONSTRAINT ON (c:Keyword)
ASSERT c.id IS UNIQUE
CREATE CONSTRAINT ON (c:Paper)
ASSERT c.id IS UNIQUE
CREATE CONSTRAINT ON (c:Author)
ASSERT c.id IS UNIQUE
Insert relationships
LOAD CSV WITH HEADERS FROM "file:///paper_author_relation.csv" AS line
MATCH (entity1:Paper{title:line.name}),(entity2:Author{name:line.name2})
where not (entity1)-[:Author]->(entity2)
CREATE (entity1)-[r:Author]->(entity2)
LOAD CSV WITH HEADERS FROM "file:///paper_keyword_relation.csv" AS line
MATCH (entity1:Paper{title:line.name}),(entity2:Keyword{name:line.name2})
where not (entity2)-[:is_Keyword]->(entity1)
CREATE (entity2)-[:is_Keyword]->(entity1)
LOAD CSV WITH HEADERS FROM "file:///paper_author_relation.csv" AS line
MATCH (entity1:Paper{title:line.name}),(entity2:Author{name:line.name2})
where not (entity2)-[:is_Author]->(entity1)
CREATE (entity2)-[:is_Author]->(entity1)
LOAD CSV WITH HEADERS FROM "file:///paper_keyword_relation.csv" AS line
MATCH (entity1:Paper{title:line.name}),(entity2:Keyword{name:line.name2})
where not (entity1)-[:Keyword]->(entity2)
CREATE (entity1)-[:Keyword]->(entity2)
MATCH p=()-[r:Keyword]->() delete r
Create schema indexes
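Use the Cypher statement CREATE INDEX ON :Label(property): in the browser at http://172.18.34.25:7474/browser/, create a schema index for each property that will be queried. A newly created index is at first only in the POPULATING state; restart the database and close the page.
Run the ":schema" command to list every index in the current database and check whether each one is ONLINE.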
explain match data=(na)-[r]->(nb:company{name:'ss'}) return data
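EXPLAIN shows how a CQL query will be executed; check the last line of the plan to see whether an index is used (for example NodeIndexSeek rather than NodeByLabelScan).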
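When queries run from a Python script rather than the browser, the same check can be automated. A minimal sketch, assuming the plan is the nested dict that recent versions of the official neo4j Python driver expose as session.run("EXPLAIN ...").consume().plan, with operatorType and children keys; the function names are hypothetical, and the test is a simple heuristic that looks for operator names containing "Index".

```python
def operator_types(plan):
    """Collect operator names from a plan dict, parents before children.

    Assumed shape: {"operatorType": str, "children": [plan, ...], ...}.
    """
    names = [plan.get("operatorType", "")]
    for child in plan.get("children", []):
        names.extend(operator_types(child))
    return names


def uses_index(plan):
    """True if any operator looks like an index operator (NodeIndexSeek, NodeUniqueIndexSeek, ...)."""
    return any("Index" in name for name in operator_types(plan))
```

Some server versions append a suffix such as "@neo4j" to operator names, which is why the check uses a substring match instead of equality.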
Tune the Neo4j configuration file
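Open neo4j.conf under neo4j-community-3.3.7/conf/ and edit it. According to several sources, giving the JVM more heap memory can speed up queries: uncomment the two lines dbms.memory.heap.initial_size=512m and dbms.memory.heap.max_size=512m and set suitable values (a larger maximum heap generally helps, but it must stay below the machine's physical memory).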
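Editing neo4j.conf by hand is enough; the sketch below performs the same edit in Python for scripted setups. It assumes the heap settings appear in the file either commented out (e.g. #dbms.memory.heap.max_size=512m) or not at all, and it only understands sizes written as a number plus k, m or g. The function names and example sizes are hypothetical; it rejects a maximum heap that is not smaller than the physical memory passed in, following the rule above.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
HEAP_KEYS = ("dbms.memory.heap.initial_size", "dbms.memory.heap.max_size")
_HEAP_LINE = re.compile(r"\s*#?\s*(dbms\.memory\.heap\.(?:initial_size|max_size))\s*=")


def to_bytes(size):
    """Convert sizes like '512m' or '4g' to bytes (only k/m/g suffixes handled)."""
    m = re.fullmatch(r"(\d+)([kmg])", size.strip().lower())
    if not m:
        raise ValueError(f"unsupported size: {size!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def set_heap(conf_text, initial, maximum, physical_bytes):
    """Return conf_text with both heap lines uncommented and set to the given sizes."""
    if to_bytes(maximum) >= physical_bytes:
        raise ValueError("max heap must be smaller than physical memory")
    if to_bytes(initial) > to_bytes(maximum):
        raise ValueError("initial heap must not exceed max heap")
    wanted = dict(zip(HEAP_KEYS, (initial, maximum)))
    seen, out = set(), []
    for line in conf_text.splitlines():
        m = _HEAP_LINE.match(line)
        if m:  # commented or not, rewrite the setting with the new value
            key = m.group(1)
            line = f"{key}={wanted[key]}"
            seen.add(key)
        out.append(line)
    # Append any heap setting that was missing from the file entirely.
    out += [f"{k}={v}" for k, v in wanted.items() if k not in seen]
    return "\n".join(out) + "\n"
```

Neo4j reads neo4j.conf only at startup, so the database has to be restarted for the new heap sizes to take effect.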
Optimize Cypher queries
Use WHERE sparingly; where possible, put node properties directly into the MATCH pattern.
Use UNWIND to process a whole list of rows in one query instead of sending one query per row.
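A minimal sketch of the UNWIND approach from Python, which also follows the previous tip by putting properties in the pattern. It assumes a run callable with the shape of the official driver's session.run(query, **parameters); the query constants, function names and default batch size of 1000 are hypothetical. MERGE is used here instead of the original CREATE so that re-running the import does not create duplicates.

```python
from itertools import islice

# Hypothetical queries: each call handles a whole batch passed as $rows.
AUTHOR_UNWIND = (
    "UNWIND $rows AS row "
    "MERGE (a:Author {name: row.name})"
)
PAPER_AUTHOR_UNWIND = (
    "UNWIND $rows AS row "
    "MATCH (p:Paper {title: row.name}) "
    "MATCH (a:Author {name: row.name2}) "
    "MERGE (a)-[:is_Author]->(p)"
)


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_with_unwind(run, query, rows, batch_size=1000):
    """Send rows in batches via run(query, rows=batch); returns the number of rows sent."""
    sent = 0
    for chunk in batches(rows, batch_size):
        run(query, rows=chunk)
        sent += len(chunk)
    return sent
```

With the official driver this would be called as import_with_unwind(session.run, AUTHOR_UNWIND, rows) inside a with driver.session() as session: block. Batching keeps each auto-commit transaction bounded instead of sending one huge list or one round trip per row.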
Right after Neo4j starts, the data needs to be warmed up (loaded into memory) before queries run at full speed.
Open the Neo4j command-line interface and run the following query to warm it up:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN count(n.name) + count(r);
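The same warm-up can also be triggered from a startup script. A minimal sketch, assuming a run callable like the official driver's session.run whose result offers .single(); the function name and the timing measurement are illustrative additions, not part of the original procedure.

```python
import time

# The warm-up query from above: touches every node and outgoing relationship.
WARMUP_QUERY = (
    "MATCH (n) "
    "OPTIONAL MATCH (n)-[r]->() "
    "RETURN count(n.name) + count(r)"
)


def warm_up(run):
    """Run the warm-up query once; return (count, elapsed seconds)."""
    start = time.perf_counter()
    record = run(WARMUP_QUERY).single()
    elapsed = time.perf_counter() - start
    return record[0], elapsed
```

Running it twice and comparing the elapsed times gives a rough idea of how much the cache helps.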