Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs
如何分块?
-
Voronoi Digram 划分 The GVD computation can be easily implemented in the vertex- centric computing model, by performing multi-source BFS. Specif- ically, in superstep 1, each source s sets block(s) = s and broad- casts it to the neighbors; for each non-source vertex v, block(v) is unassigned. Finally, the vertex votes to halt. In superstep i (i > 1), if block(v) is unassigned, v sets block(v) to an arbitrary source received, and broadcasts block(v) to its neighbors before voting to halt. Otherwise, v votes to halt directly. When the process con- verges, we have block(v) = si for each v ∈ V C(si).
-
2D划分
The first job is vertex-centric and works as follows: (1)each worker samples a subset of its vertices with probability psamp and sends the sample to the master; (2)the master first partitions the sampled vertices into nx slots by the x-coordinates, and then each slot is further partitioned into ny slots by the y-coordinates.
如何管理块? (计算, 消息传递)
Block-centric algorithm. Our block-centric solution operates in VB-mode. Each vertex maintains the same fields as in the vertex- centric algorithm, and blocks do not maintain any information. In each superstep, V-compute() is first executed for all vertices, where a vertex v finds w∗ from the incoming messages as in the vertex- centric algorithm. However, now v votes to halt only if d(w∗) ≥ dist(v). Otherwise, v updates ⟨prev(v), dist(v)⟩ = ⟨w∗, d(w∗)⟩ but stays active. Then, B-compute() is executed, where each block B collects all its active vertices v into a priority queue Q (with dist(v) as the key), and makes these vertices vote to halt. B- compute() then runs Dijkstra’s algorithm on B using Q, which re- moves the vertex v ∈ Q with the smallest value of dist(v) from Q for processing each time. The out-neighbors u ∈ Γ(v) are updated as follows. For each u ∈ V (B), if dist(v)+l(v, u) < dist(u), we update ⟨prev(u), dist(u)⟩ to be ⟨v, dist(v)+l(v, u)⟩, and insert u into Q with key dist(u) if u ∈/ Q, or update dist(u) if u is already in Q. For each u ̸∈ V (B), a message ⟨v, dist(v) + l(v, u)⟩ is sent to u. B votes to halt when Q becomes empty. In the next superstep, if a vertex u receives a message, u is activated along with its block, and the block-centric computation repeats.
-
Highlight, page 1
For processing graphs with a large diameter δ, the message (or neighbor) propagation paradigm of the vertex-centric model often leads to algorithms that require O(δ) rounds (also called super- steps) of computation. -
Underline, page 1
For example, a single-source shortest path algorithm in [11] takes 10,789 supersteps on a USA road network. -
Underline, page 1
Apart from spatial networks, some large web graphs also have large diameters (from a few hundred to thousands). For example, the vertex-centric system in [14] takes 2,450 rounds for computing strongly connected components on a web graph. -
Highlight, page 2
loading: each worker loads a portion of vertices from HDFS into main-memory; the workers then exchange ver- tices through the network (by hashing over vertex ID) so that each worker wi finally holds all and only those vertices assigned to wi -
Highlight, page 3
BTC -
Highlight, page 3
Friendster -
Highlight, page 3
USA Road -
Highlight, page 3
On the contrary, the block- centric model works on G and a high-degree vertex involves at most O(n/b) messages each round, where n is the number of vertices in the giant CC. -
Highlight, page 3
The idea is to broadcast the smallest vertex ID seen so far by each vertex v, denoted by min(v). -
Highlight, page 3
Each vertex be- longs to a unique block, and let block(v) be the ID of the block that v belongs to. -
Highlight, page 4
Similar to a vertex in Pregel, a block in Blogel also has a com- pute() function. We use B-compute() and V-compute() to denote the compute() function of a block and a vertex, respectively. -
Highlight, page 4
A block has access to all its vertices, and can send messages to any block B or vertex v as long as worker(B) or worker(v) is available. -
Highlight, page 4
Each B-worker maintains two message buffers, one for exchang- ing vertex-level messages and the other for exchanging block-level messages. A block also has a state indicating whether it is active, and may vote to halt. -
Highlight, page 6
Block-centric algorithm. Our block-centric solution operates in VB-mode. Each vertex maintains the same fields as in the vertex- centric algorithm, and blocks do not maintain any information. In each superstep, V-compute() is first executed for all vertices, where a vertex v finds w∗ from the incoming messages as in the vertex- centric algorithm. However, now v votes to halt only if d(w∗) ≥ dist(v). Otherwise, v updates ⟨prev(v), dist(v)⟩ = ⟨w∗, d(w∗)⟩ but stays active. Then, B-compute() is executed, where each block B collects all its active vertices v into a priority queue Q (with dist(v) as the key), and makes these vertices vote to halt. B- compute() then runs Dijkstra’s algorithm on B using Q, which re- moves the vertex v ∈ Q with the smallest value of dist(v) from Q for processing each time. The out-neighbors u ∈ Γ(v) are updated as follows. For each u ∈ V (B), if dist(v)+l(v, u) < dist(u), we update ⟨prev(u), dist(u)⟩ to be ⟨v, dist(v)+l(v, u)⟩, and insert u into Q with key dist(u) if u ∈/ Q, or update dist(u) if u is already in Q. For each u ̸∈ V (B), a message ⟨v, dist(v) + l(v, u)⟩ is sent to u. B votes to halt when Q becomes empty. In the next superstep, if a vertex u receives a message, u is activated along with its block, and the block-centric computation repeats. -
Highlight, page 6
v finds the in-neighbor w∗ such that d(w∗) is the small- est among all d(w) received. -
Text Note, page 6
-
Highlight, page 7
We first review the Graph Voronoi Diagram (GVD) [3] of an undirected unweighted graph G = (V, E). -
Highlight, page 8
The sampling rate psamp decides the number of blocks, and usually a small value as 0.1% is a good choice. -
Highlight, page 8
As for the stopping parameters, γ is usually set as 90%, and pmax as 10% with f = 2, so that there will not be too many rounds of multi-source BFS. -
Highlight, page 8
The GVD computation can be easily implemented in the vertex- centric computing model, by performing multi-source BFS. -
Highlight, page 8
Specif- ically, in superstep 1, each source s sets block(s) = s and broad- casts it to the neighbors; for each non-source vertex v, block(v) is unassigned. Finally, the vertex votes to halt. In superstep i (i > 1), if block(v) is unassigned, v sets block(v) to an arbitrary source received, and broadcasts block(v) to its neighbors before voting to halt. Otherwise, v votes to halt directly. When the process con- verges, we have block(v) = si for each v ∈ V C(si). -
Highlight, page 8
The multi-source BFS has linear workload since each vertex only broadcasts a message to its neighbors when block(v) is assigned, and thus the total messages exchanged by all vertices is bounded by O(|E|). -
Highlight, page 8
Initially, each vertex v samples itself as a source with probability psamp. Then, multi- source BFS is performed to partition the vertices into blocks. -
Highlight, page 8
The first job is vertex-centric and works as follows: (1)each worker samples a subset of its vertices with probability psamp and sends the sample to the master; (2)the master first partitions the sampled vertices into nx slots by the x-coordinates, and then each slot is further partitioned into ny slots by the y-coordinates. -
Highlight, page 9
Data Type |V| |E| AVG Deg Max Deg Web Graphs WebUK directed 133,633,040 5,507,679,822 41.21 22,429 WebBase directed 118,142,155 1,019,903,190 8.63 3,841 Social Networks Friendster undirected 65,608,366 3,612,134,270 55.06 5,214 LiveJournal directed 10,690,276 224,614,770 21.01 1,053,676 RDF BTC undirected 164,732,473 772,822,094 4.69 1,637,619 Spatial Networks USA Road undirected 23,947,347 58,333,344 2.44 9 Euro Road undirected 18,029,721 44,826,904 2.49 12 -
Highlight, page 10
Giraph++ [18] proposed a graph coarsening method to reduce the size of the input graph so that METIS can run on the smaller graph.