DGL Notes 1: Representing Graphs with DGL

孔宇

2023-12-01

How DGL Represents a Graph

Original tutorial: How Does DGL Represent A Graph?

DGL Notes 1: Representing Graphs with DGL
DGL Notes 2: Identifying Nodes with DGL
DGL Notes 3: Writing Your Own GNN Model

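Today we'll look at how DGL represents a graph. We'll cover the following:

Building a graph from scratch
Assigning node and edge features to a graph
Querying properties of a DGL graph
Transforming a DGL graph into other graphs
Loading and saving DGL graphs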

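Minimal Installation

I recommend installing into a conda virtual environment. I'm on a Mac, so I installed the CPU version; choose the build that fits your machine. I'll only touch on installation briefly here.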

conda install jupyter
conda install pytorch torchvision torchaudio -c pytorch
conda install -c dglteam dgl

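Building a DGL Graph

DGL represents a directed graph as a DGLGraph object. Nodes in the graph are numbered consecutively starting from 0. A graph is usually built by specifying the number of nodes together with a list of source nodes and a list of destination nodes.

For example, the code below builds a star graph with five leaf nodes. The center node has ID 0, and each edge starts at the center and points to one of the leaves.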

import dgl
import numpy as np
import torch

g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]), num_nodes=6)
# Equivalently, PyTorch LongTensors can be used
g = dgl.graph((torch.LongTensor([0, 0, 0, 0, 0]), torch.LongTensor([1, 2, 3, 4, 5])), num_nodes=6)

# If the node count can be inferred from the edge list (largest node ID + 1), num_nodes can be omitted
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))

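In this graph, edges also have consecutive IDs starting from 0, and the edges are ordered exactly as in the source and destination lists. In other words, when creating g we never specify edges separately: they are generated from the source list [0, 0, 0, 0, 0] and the destination list [1, 2, 3, 4, 5], with edge i going from the i-th source node to the i-th destination node.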

# Print the source and destination nodes of every edge
print(g.edges())

Out:

(tensor([0, 0, 0, 0, 0]), tensor([1, 2, 3, 4, 5]))
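To make the edge-ID rule concrete, here is a minimal pure-Python sketch of a graph stored as two parallel lists (COO format). It is not DGL's implementation; `CooGraph` and its methods are hypothetical names used only to illustrate that edge `i` is `(src[i], dst[i])`, and that when `num_nodes` is omitted the node count is the largest node ID plus one.

```python
class CooGraph:
    """Hypothetical minimal graph stored as parallel source/destination lists."""

    def __init__(self, src, dst, num_nodes=None):
        if len(src) != len(dst):
            raise ValueError("src and dst must have the same length")
        self.src = list(src)
        self.dst = list(dst)
        # Infer the node count as the largest node ID plus one.
        inferred = max(self.src + self.dst) + 1 if self.src else 0
        if num_nodes is None:
            num_nodes = inferred
        elif num_nodes < inferred:
            raise ValueError("num_nodes is smaller than the largest node ID + 1")
        self.num_nodes = num_nodes

    def num_edges(self):
        return len(self.src)

    def edges(self):
        # Edge i is (src[i], dst[i]); edge IDs simply follow list order.
        return self.src, self.dst

    def find_edge(self, eid):
        return self.src[eid], self.dst[eid]


g = CooGraph([0, 0, 0, 0, 0], [1, 2, 3, 4, 5])
print(g.num_nodes, g.num_edges())  # 6 5
print(g.find_edge(2))              # (0, 3)

# An explicit num_nodes larger than the inferred count adds isolated nodes.
g7 = CooGraph([0, 0], [1, 2], num_nodes=7)
print(g7.num_nodes)                # 7
```

Passing `num_nodes` explicitly, as in the first `dgl.graph` call above, is how you get isolated nodes that no edge touches.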

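As an aside: what we just built is a directed graph. To represent an undirected graph, build it as a bidirected graph, with one edge in each direction. How to build a bidirected graph is covered later.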

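Assigning Node and Edge Features

The nodes and edges of a graph usually carry attributes. In the real world, the entity a node represents can have many kinds of attributes; a "person" entity, for instance, might have a gender, an age and a name.

In a DGLGraph, however, attributes are stored as tensors, so a given attribute must have the same shape for every node (or every edge). Since our goal here is learning graph neural networks, we'll call these attributes "features".

We use ndata and edata to assign features to nodes and edges, respectively.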

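# Assign a 3-dimensional feature vector to each of the 6 nodes.
g.ndata['x'] = torch.randn(6, 3)

# Assign a 4-dimensional feature vector to each of the 5 edges.
g.edata['a'] = torch.randn(5, 4)

# Assign a 5x4 feature matrix to each of the 6 nodes.
# Note that in DGL, node and edge features can be multi-dimensional.
g.ndata['y'] = torch.randn(6, 5, 4)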

print(g.edata['a'])

Out:

 tensor([[ 0.5495, -0.6351, -0.7856,  0.2069],
         [-0.6389,  1.0886,  1.7315,  0.8709],
         [-0.7497,  0.7965, -0.6145, -0.8878],
         [-1.2389, -1.1952,  0.3827,  1.6328],
         [ 0.7168,  1.2491,  0.7941,  0.1229]])

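Note that we gave the nodes two features here, x and y. These are feature names: g.ndata behaves like a dictionary in which each name is a key mapping to one tensor that stores that feature for all nodes, with row i belonging to node i. In the same way, you can add more features under other names.

The official tutorial also offers some further advice:

For categorical attributes (e.g. gender, occupation), consider converting them to integers or one-hot encodings.

For variable-length string content (e.g. news articles), consider applying a language model.

For images, consider applying a computer vision model such as a CNN.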
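The dictionary-like behaviour of ndata/edata, and the rule that each feature needs exactly one row per node (or per edge), can be sketched in plain Python. `FeatureView` is a hypothetical class for illustration only; DGL itself stores tensors and performs its own shape checks, and this sketch only checks the row count.

```python
class FeatureView:
    """Hypothetical dict-like store mimicking the idea behind g.ndata / g.edata."""

    def __init__(self, num_items):
        self.num_items = num_items  # number of nodes (or edges)
        self._data = {}

    def __setitem__(self, name, rows):
        # Every feature needs exactly one row per node (or per edge).
        if len(rows) != self.num_items:
            raise ValueError(
                f"feature '{name}' has {len(rows)} rows, expected {self.num_items}"
            )
        self._data[name] = rows

    def __getitem__(self, name):
        return self._data[name]

    def keys(self):
        return list(self._data)


ndata = FeatureView(num_items=6)  # 6 nodes
edata = FeatureView(num_items=5)  # 5 edges

ndata["x"] = [[0.0, 0.0, 0.0] for _ in range(6)]  # shape (6, 3)
edata["a"] = [[1.0] * 4 for _ in range(5)]         # shape (5, 4)

try:
    edata["bad"] = [[1.0] * 4 for _ in range(6)]   # 6 rows for 5 edges
except ValueError as err:
    print(err)  # feature 'bad' has 6 rows, expected 5
```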
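The first piece of advice, turning a categorical attribute into integer IDs and one-hot vectors, might look like the following sketch. `encode_categories` is a hypothetical helper, and assigning IDs in order of first appearance is an arbitrary choice.

```python
def encode_categories(values):
    """Map each category to an integer ID and a one-hot vector.

    IDs are assigned in order of first appearance (an arbitrary choice).
    """
    vocab = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    ids = [vocab[v] for v in values]
    one_hot = [[1 if j == i else 0 for j in range(len(vocab))] for i in ids]
    return vocab, ids, one_hot


occupations = ["teacher", "engineer", "teacher", "doctor"]
vocab, ids, one_hot = encode_categories(occupations)
print(vocab)    # {'teacher': 0, 'engineer': 1, 'doctor': 2}
print(ids)      # [0, 1, 0, 2]
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

With PyTorch, the integer IDs could be wrapped with `torch.tensor(ids)` and then one-hot encoded with `torch.nn.functional.one_hot` before being assigned to `g.ndata`.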

Querying Graph Structure

A DGLGraph object provides various methods for querying the structure of the graph.

# Number of nodes
print(g.num_nodes())
# Number of edges
print(g.num_edges())
# Out-degree of the center node 0
print(g.out_degrees(0))
# In-degree of the center node 0; every edge points away from it, so this is 0
print(g.in_degrees(0))
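For the star graph, these four calls should print 6, 5, 5 and 0. Degrees fall straight out of the edge lists: each edge u -> v adds one to u's out-degree and one to v's in-degree. The sketch below (with a hypothetical `degrees` helper, not DGL's code) computes both for every node.

```python
def degrees(src, dst, num_nodes):
    """Count out-degrees and in-degrees from parallel source/destination lists."""
    out_deg = [0] * num_nodes
    in_deg = [0] * num_nodes
    for u, v in zip(src, dst):
        out_deg[u] += 1  # edge u -> v leaves u
        in_deg[v] += 1   # and enters v
    return out_deg, in_deg


src = [0, 0, 0, 0, 0]
dst = [1, 2, 3, 4, 5]
out_deg, in_deg = degrees(src, dst, num_nodes=6)
print(out_deg)  # [5, 0, 0, 0, 0, 0]
print(in_deg)   # [0, 1, 1, 1, 1, 1]
```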

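Graph Transformations

Subgraphs

DGL provides many APIs for transforming a graph into other structures, such as extracting a subgraph.

# Induce a subgraph from nodes 0, 1 and 3 of the original graph.
sg1 = g.subgraph([0, 1, 3])
# Induce a subgraph from edges 0, 1 and 3 of the original graph.
sg2 = g.edge_subgraph([0, 1, 3])

Through dgl.NID and dgl.EID, we can get the mapping from the subgraph's nodes and edges back to the original graph: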

# The original IDs of each node in sg1
print(sg1.ndata[dgl.NID])
# The original IDs of each edge in sg1
print(sg1.edata[dgl.EID])
# The original IDs of each node in sg2
print(sg2.ndata[dgl.NID])
# The original IDs of each edge in sg2
print(sg2.edata[dgl.EID])

Out:

tensor([0, 1, 3])
tensor([0, 2])
tensor([0, 1, 2, 4])
tensor([0, 1, 3])

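This makes it easy to relate a subgraph back to the original graph. For example, sg1 keeps only edge 0 (0→1) and edge 2 (0→3), because those are the only edges whose endpoints are both among nodes 0, 1 and 3. Conversely, sg2 keeps edges 0, 1 and 3 and therefore contains their endpoints, nodes 0, 1, 2 and 4.

In addition, subgraph and edge_subgraph copy the node and edge features of the original graph into the subgraph: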
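The two kinds of subgraph can be sketched on the raw edge lists. `node_subgraph` and `edge_subgraph` below are hypothetical pure-Python helpers, not DGL's API; they return the original node IDs (what `dgl.NID` stores), the original edge IDs (what `dgl.EID` stores) and the relabelled edges. For simplicity they sort the kept IDs, which matches the outputs above but does not claim to reproduce DGL's exact ordering rules.

```python
def node_subgraph(src, dst, nodes):
    """Node-induced subgraph: keep the given nodes and every edge between them."""
    nid = sorted(set(nodes))
    new_id = {old: new for new, old in enumerate(nid)}  # original -> subgraph ID
    eid, sub_src, sub_dst = [], [], []
    for e, (u, v) in enumerate(zip(src, dst)):
        if u in new_id and v in new_id:
            eid.append(e)
            sub_src.append(new_id[u])
            sub_dst.append(new_id[v])
    return nid, eid, sub_src, sub_dst


def edge_subgraph(src, dst, edges):
    """Edge-induced subgraph: keep the given edges and their endpoints."""
    eid = sorted(set(edges))
    nid = sorted({src[e] for e in eid} | {dst[e] for e in eid})
    new_id = {old: new for new, old in enumerate(nid)}
    sub_src = [new_id[src[e]] for e in eid]
    sub_dst = [new_id[dst[e]] for e in eid]
    return nid, eid, sub_src, sub_dst


src = [0, 0, 0, 0, 0]
dst = [1, 2, 3, 4, 5]

nid1, eid1, s1, d1 = node_subgraph(src, dst, [0, 1, 3])
print(nid1, eid1)  # [0, 1, 3] [0, 2]
print(s1, d1)      # [0, 0] [1, 2]

nid2, eid2, s2, d2 = edge_subgraph(src, dst, [0, 1, 3])
print(nid2, eid2)  # [0, 1, 2, 4] [0, 1, 3]
print(s2, d2)      # [0, 0, 0] [1, 2, 3]
```

Note the relabelling: original node 3 becomes node 2 in sg1, so edge 0→3 appears in the subgraph as 0→2.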

# The original node feature of each node in sg1
print(sg1.ndata['x'])
# The original edge feature of each edge in sg1
print(sg1.edata['a'])
# The original node feature of each node in sg2
print(sg2.ndata['x'])
# The original edge feature of each edge in sg2
print(sg2.edata['a'])

Out:

tensor([[ 0.6390, -1.1941,  1.3503],
        [-0.7303,  1.0482,  1.4569],
        [ 0.4740, -0.8831, -1.0024]])
tensor([[ 0.5495, -0.6351, -0.7856,  0.2069],
        [-0.7497,  0.7965, -0.6145, -0.8878]])
tensor([[ 0.6390, -1.1941,  1.3503],
        [-0.7303,  1.0482,  1.4569],
        [ 0.6915, -0.0305,  1.8400],
        [ 0.1856,  1.2294, -0.7548]])
tensor([[ 0.5495, -0.6351, -0.7856,  0.2069],
        [-0.6389,  1.0886,  1.7315,  0.8709],
        [-1.2389, -1.1952,  0.3827,  1.6328]])
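Conceptually, copying features into a subgraph is a row gather: the subgraph's feature rows are the original rows at the IDs stored under dgl.EID (or dgl.NID). That is why sg1's two edge-feature rows above equal rows 0 and 2 of g.edata['a']. The sketch uses a hypothetical `gather_rows` helper and toy features; with PyTorch tensors the same gather could be written as `g.edata['a'][sg1.edata[dgl.EID]]`.

```python
def gather_rows(features, ids):
    """Return copies of the rows of `features` selected by the original IDs, in order."""
    return [list(features[i]) for i in ids]


# Toy edge features for the 5 edges of the star graph: row e belongs to edge e.
edge_feat = [[float(e)] * 4 for e in range(5)]

eid_sg1 = [0, 2]     # original edge IDs kept by g.subgraph([0, 1, 3])
eid_sg2 = [0, 1, 3]  # original edge IDs kept by g.edge_subgraph([0, 1, 3])

print(gather_rows(edge_feat, eid_sg1))
# [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
print(gather_rows(edge_feat, eid_sg2))
# [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
```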

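Adding Reverse Edges

Another common transformation is dgl.add_reverse_edges, which adds a reverse edge for every edge in the original graph. This is handy when you want to build a bidirected graph. To stress it again: if you want an undirected graph, it is best to build it as a bidirected graph.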

newg = dgl.add_reverse_edges(g)
newg.edges()

Out:

(tensor([0, 0, 0, 0, 0, 1, 2, 3, 4, 5]),
 tensor([1, 2, 3, 4, 5, 0, 0, 0, 0, 0]))
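The output shows the layout: the original edges keep IDs 0 to 4, and the reverse of edge e is appended with ID e + 5. A pure-Python sketch of that layout follows; `add_reverse_edges` here is a hypothetical helper that only handles the edge lists, not DGL's function (which also deals with features).

```python
def add_reverse_edges(src, dst):
    """Append edge v -> u for every edge u -> v; original edges keep their IDs."""
    new_src = list(src) + list(dst)
    new_dst = list(dst) + list(src)
    return new_src, new_dst


src, dst = add_reverse_edges([0, 0, 0, 0, 0], [1, 2, 3, 4, 5])
print(src)  # [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
print(dst)  # [1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
```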

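Saving and Loading Graphs

What if we want to save a graph we've built and read it back later?
Use dgl.save_graphs to save graphs and dgl.load_graphs to load them back. dgl.load_graphs returns two items: the list of graphs and a dictionary of graph labels. We don't use labels here, so the second item is assigned to _: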

# Save graphs
dgl.save_graphs('graph.dgl', g)
dgl.save_graphs('graphs.dgl', [g, sg1, sg2])

# Load graphs
(g,), _ = dgl.load_graphs('graph.dgl')
print(g)
(g, sg1, sg2), _ = dgl.load_graphs('graphs.dgl')
print(g)
print(sg1)
print(sg2)

Out:

Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=3, num_edges=2,
      ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=4, num_edges=3,
      ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'a': Scheme(shape=(4,), dtype=torch.float32)})

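After running save_graphs, two .dgl files appear in the current directory: graph.dgl holds the original graph, and graphs.dgl holds the original graph together with the two subgraphs. Note that the loaded subgraphs still carry the _ID features (dgl.NID and dgl.EID), which record their original node and edge IDs.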
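What gets persisted is the structure plus every named feature, and loading returns a list of graphs plus a labels dictionary. The sketch below mimics that round trip with JSON purely for illustration; `save_graphs_json`, `load_graphs_json` and the dict layout are hypothetical and have nothing to do with DGL's binary .dgl format.

```python
import json
import os
import tempfile


def save_graphs_json(path, graphs, labels=None):
    """Hypothetical stand-in: write a list of graph dicts plus optional labels as JSON."""
    with open(path, "w") as f:
        json.dump({"graphs": graphs, "labels": labels or {}}, f)


def load_graphs_json(path):
    """Return (graph_list, labels), mirroring the pair that dgl.load_graphs returns."""
    with open(path) as f:
        blob = json.load(f)
    return blob["graphs"], blob["labels"]


# Each graph is a dict holding its structure and its named features.
star = {
    "num_nodes": 6,
    "src": [0, 0, 0, 0, 0],
    "dst": [1, 2, 3, 4, 5],
    "ndata": {"x": [[0.0, 0.0, 0.0]] * 6},
    "edata": {"a": [[1.0] * 4] * 5},
}

path = os.path.join(tempfile.mkdtemp(), "graphs.json")
save_graphs_json(path, [star], labels={"glabel": [1]})
(loaded,), labels = load_graphs_json(path)
print(loaded == star, labels)  # True {'glabel': [1]}
```

The `(loaded,), labels = ...` unpacking mirrors the `(g,), _ = dgl.load_graphs(...)` pattern used above.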
