**Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing**
ROBERT RYAN MCCUNE, TIM WENINGER, and GREG MADEY, University of Notre Dame
-
Highlight, page
implements user-defined programs from the perspective of a vertex rather than a graph -
Highlight, page
interdependencies -
Highlight, page
However, for sequential graph algorithms, which require random access to all graph data, poor locality and the indivisibility of the graph structure cause time- and resource-intensive pointer chasing between storage mediums in order to access each datum. -
Highlight, page
MapReduce does not natively support iterative algorithms -
Highlight, page
identifying the tradeoffs in component implementations and providing data-driven discussion -
Highlight, page
subgraph-centric, or hybrid, frameworks -
Highlight, page
Think-like-a-vertex frameworks are platforms that iteratively execute a user-defined program over vertices of a graph. -
Highlight, page
Many graph problems can be solved by both a sequential, shared-memory algorithm and a distributed, vertex-centric algorithm. -
Highlight, page
TLAV frameworks are highly scalable and inherently parallel, with manageable intermachine communication. -
Highlight, page
a given vertex to all other vertices in a graph -
Highlight, page
This algorithm, considered a distributed version of Bellman-Ford [Lynch 1996], is shown in Algorithm 1. -
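The single-source shortest paths computation highlighted here (Algorithm 1 in the survey, a distributed Bellman-Ford) can be sketched as a synchronous vertex program. The sketch below simulates supersteps and per-vertex message queues in plain Python; the function and variable names are illustrative, not taken from any framework.

```python
from collections import defaultdict

INF = float("inf")

def sssp(graph, source):
    """Synchronous vertex-centric SSSP (distributed Bellman-Ford sketch).

    graph: dict mapping vertex -> list of (neighbor, edge_weight).
    Each superstep, every vertex with pending messages takes the minimum
    incoming distance candidate; if it improves on the stored value, the
    vertex updates itself and sends (value + weight) along each out-edge.
    """
    dist = {v: INF for v in graph}
    inbox = defaultdict(list)
    inbox[source].append(0)           # initial message activates the source
    while inbox:                      # halt when no messages remain in flight
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            candidate = min(msgs)
            if candidate < dist[v]:   # vertex program: update and propagate
                dist[v] = candidate
                for u, w in graph[v]:
                    outbox[u].append(candidate + w)
        inbox = outbox                # barrier: messages delivered next superstep
    return dist
```

Swapping the message queues between supersteps stands in for the global synchronization barrier of the BSP model described in the surrounding highlights.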
Highlight, page
respective edge -
Highlight, page
detailing -
Highlight, page
proceeds -
Highlight, page
Communication -
Highlight, page
vertex program execution -
Highlight, page
interdependent -
Highlight, page
thorough understanding -
Highlight, page
interrelation -
Highlight, page
a sequential presentation -
Highlight, page
discussed earlier. -
Highlight, page
conceptually simple -
Highlight, page
the overhead becomes largely amortized for large graphs. -
Underline, page
synchronous systems are often implemented along with message-passing communication -
Highlight, page
batch messaging -
Highlight, page
low computation-to-communication ratio -
Underline, page
synchronization -
Underline, page
accounted for over 80% of the total running time [Chen et al. 2014a], -
Highlight, page
underutilized -
Highlight, page
the curse of the last reducer -
Highlight, page
straggler -
Highlight, page
lightweight -
Highlight, page
straggler -
Highlight, page
outperform -
Highlight, page
Theoretical and empirical research -
Highlight, page
One disadvantage, however, is that asynchronous execution cannot take advantage of batch messaging optimizations -
Highlight, page 1
at the expense of added complexity, not only from scheduling logic, but also from maintaining data consistency. Asynchronous systems typically implement shared memory -
Highlight, page 1
pseudo-supersteps [Chen et al. 2014a] -
Highlight, page 1
dynamic scheduling within a single superstep [Wang et al. 2013]. -
Highlight, page 1
reduces the number of supersteps by decoupling intraprocessor computation from the interprocessor communication and synchronization [Chen et al. 2014a]. -
Highlight, page 1
P++ framework [Zhou et al. 2014] -
Underline, page 1
boundary nodes -
Highlight, page 1
local nodes -
Highlight, page 1
pseudo-supersteps -
Highlight, page 1
GraphHP and P++, is the KLA paradigm [Harshvardhan et al. 2014] -
Highlight, page 1
KLA -
Highlight, page 1
is allowed for a certain number of levels before a synchronous round -
Highlight, page 1
KLA has multiple traversals of asynchronous execution before coordinating a round of synchronous execution -
Highlight, page 1
GRACE exposes a programming interface that, from within a given superstep, allows for prioritized execution of vertices and selective receiving of messages outside of the previous superstep. -
Highlight, page 1
Motivated by the necessity for execution mode dynamism -
Highlight, page 1
PowerSwitch’s heuristics can accurately predict throughput -
Highlight, page 1
data for one process is directly and immediately accessible by another process -
Underline, page 1
In the case of the former -
Highlight, page 1
Otherwise -
Highlight, page 1
is flushed when it reaches a certain capacity, sending messages over the network in batches -
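The batching behavior highlighted here — outgoing messages buffered per destination worker and flushed over the network once a buffer fills — can be sketched as follows. `BatchingChannel`, `send_fn`, and `capacity` are illustrative names for this sketch, not any framework's API.

```python
class BatchingChannel:
    """Sketch of batched message passing: messages bound for a remote
    worker accumulate in a per-worker buffer that is flushed (sent as a
    single network call) once it reaches `capacity`."""

    def __init__(self, send_fn, capacity=1024):
        self.send_fn = send_fn        # stand-in for the network send call
        self.capacity = capacity
        self.buffers = {}             # worker id -> list of pending messages

    def send(self, worker, msg):
        buf = self.buffers.setdefault(worker, [])
        buf.append(msg)
        if len(buf) >= self.capacity:
            self.flush(worker)

    def flush(self, worker):
        buf = self.buffers.pop(worker, [])
        if buf:
            self.send_fn(worker, buf)  # one batched transfer instead of many

    def flush_all(self):               # e.g., called at the superstep barrier
        for w in list(self.buffers):
            self.flush(w)
```

Amortizing many small messages into one transfer is what makes batch messaging attractive for synchronous systems, at the cost of the delivery latency the asynchronous-execution highlights point out.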
Highlight, page 1
Shared memory avoids the additional memory overhead constituted by messages and doesn't require intermediate processing by workers. -
Highlight, page 1
Chandy-Misra locking -
Highlight, page 1
derivative with -
Highlight, page 1
prioritized execution and low communication overhead -
Highlight, page 1
Cyclops is a synchronous shared-memory framework [Chen et al. 2014b] -
Highlight, page 1
a third method called active messages is implemented in the GRE framework [Yan et al. 2014b]. -
Highlight, page 1
Within the GRE architecture, active messages combine the process of sending and receiving messages, removing the need to store an intermediate state, like message queues or edge data -
Highlight, page 1
The GRE framework modifies the data graph into an Agent-Graph. -
Highlight, page 1
The Agent-Graph adds combiner and scatter vertices to the original graph in order to reduce intermachine messaging. -
Highlight, page 1
costly -
Highlight, page 1
IBM’s X-Pregel [Bao and Suzumura 2013], -
Highlight, page 1
plateaus -
Highlight, page 1
Computation for the combiner must be commutative and associative because order cannot be guaranteed -
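The requirement that combiner computation be commutative and associative can be illustrated with a min-combiner (the kind used to fold shortest-paths messages): because `min` satisfies both properties, any delivery order, and any grouping into partial combines, yields the same result. A minimal sketch:

```python
def combine_min(messages):
    """Fold all messages destined for one vertex into a single value.

    Delivery order is not guaranteed, so the fold must be commutative
    (order of arguments doesn't matter) and associative (grouping
    doesn't matter); min is both.
    """
    result = messages[0]
    for m in messages[1:]:
        result = min(result, m)
    return result
```

A non-commutative fold such as subtraction would give different answers for different arrival orders, which is why frameworks restrict combiners to operations like min, sum, or max.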
Underline, page 1
Conversely -
Highlight, page 1
For algorithms that combine vertices into a supervertex, like Boruvka’s Minimum Spanning Tree [Chung and Condon 1996] -
Highlight, page 1
counterintuitively -
Highlight, page 1
ACM Computing Surveys, Vol. 48, No. 2, Article 25, Publication date: October 2015.
Pregel, including its open-source implementations [Avery 2011; Seo et al. 2010] and several related variants [Salihoglu and Widom 2013; Bao and Suzumura 2013; Redekopp et al. 2013]. -
Highlight, page 1
The user provides two functions: one that executes across each vertex in the active subset and another that executes across all outgoing edges in the subset. -
Highlight, page 1
Edge Centric. The X-Stream framework provides an edge-centric two-phase Scatter-Gather programming model [Roy et al. 2013], -
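The two-phase edge-centric model highlighted here can be sketched with a toy connected-components computation: the scatter phase streams over *edges* and emits updates, and the gather phase streams over the updates and applies them to vertices. This is a simplified illustration in the spirit of X-Stream's model, not its actual API.

```python
def edge_centric_cc(num_vertices, edges, max_iters=100):
    """Edge-centric scatter-gather sketch: connected-component labels.

    edges: list of (src, dst) pairs for an undirected graph.
    Scatter makes one sequential pass over the edge list, emitting each
    endpoint's current label toward the other endpoint; gather makes one
    pass over the update stream, keeping the minimum label per vertex.
    """
    label = list(range(num_vertices))
    for _ in range(max_iters):
        # scatter phase: stream the edge list, never random-access it
        updates = [(dst, label[src]) for src, dst in edges] + \
                  [(src, label[dst]) for src, dst in edges]
        # gather phase: stream the updates, applying min-label
        changed = False
        for v, lab in updates:
            if lab < label[v]:
                label[v] = lab
                changed = True
        if not changed:
            break
    return label
```

Streaming edges sequentially is the point of the edge-centric design: it trades extra iterations for purely sequential access to the (much larger) edge set.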
Highlight, page 1
Push Versus Pull. The flow of information for vertex programs can be characterized as data being pushed or pulled [Nguyen et al. 2013; Hant et al. 2014; Cheng et al. 2012]. -
Highlight, page 1
Ligra is a single-machine graph-processing framework that dynamically switches between push- and pull-based operators based on a threshold. -
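The threshold-based switching highlighted here can be sketched as a hybrid BFS: a small frontier *pushes* along its out-edges, while a large frontier makes every unvisited vertex *pull* from its in-edges looking for a frontier parent. The threshold fraction and names below are illustrative assumptions, not Ligra's actual interface.

```python
def bfs_hybrid(adj, radj, source, threshold_frac=0.05):
    """BFS that switches between push and pull per level.

    adj:  adjacency lists (out-edges); radj: reverse adjacency (in-edges).
    Push scans frontier out-edges (cheap for small frontiers); pull scans
    unvisited vertices' in-edges (cheap for large, dense frontiers).
    """
    n = len(adj)
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        nxt = set()
        if len(frontier) < threshold_frac * n:       # push from the frontier
            for v in frontier:
                for u in adj[v]:
                    if u not in level:
                        level[u] = depth
                        nxt.add(u)
        else:                                        # pull toward unvisited vertices
            for u in range(n):
                if u in level:
                    continue
                if any(p in frontier for p in radj[u]):
                    level[u] = depth
                    nxt.add(u)
        frontier = nxt
    return level
```

Either direction computes the same levels; the switch only changes which side of each edge does the work.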
Highlight, page 1
interpartition edges -
Highlight, page 1
However, for graphs of even medium size, the high computational cost and necessary random access of the entire graph render METIS and related heuristics impractical. -
Highlight, page 1
The densification problem is addressed in Vaquero et al. [2014], wherein -
Highlight, page 1
More advanced label propagation schemes for partitioning are presented in Wang et al. [2014] and Slota et al. [2014]. -
Highlight, page 1
GraphBuilder [Jain et al. 2013] is a similar library that, in addition to partitioning, supports an extensive variety of graph-loading-related processing tasks. -
Highlight, page 2
the partitioner may temporarily store a vertex and decide the partitioning later [Stanton and Kliot 2012]; -
Highlight, page 2
however, experiments demonstrate that performance remains relatively consistent for breadth-first, depth-first, and random orderings of a graph [Stanton and Kliot 2012; Tsourakakis et al. 2014]. -
Highlight, page 2
linear deterministic greedy (LDG), a heuristic that assigns a vertex to the partition with which it shares the most edges, weighted by a penalty function linearly associated with a partition's remaining capacity. -
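The LDG heuristic as described can be sketched directly: score each partition by its overlap with the arriving vertex's neighbors, scaled by a linear penalty on how full the partition already is. The capacity handling and tie-breaking below are illustrative assumptions, since the highlight does not fix those details.

```python
def ldg_partition(stream, k, capacity):
    """Linear deterministic greedy (LDG) streaming partitioning sketch.

    stream: iterable of (vertex, neighbor_list) in arrival order.
    Each vertex goes to the non-full partition maximizing
    |P_i ∩ neighbors| * (1 - |P_i| / capacity).
    """
    parts = [set() for _ in range(k)]
    assignment = {}
    for v, neighbors in stream:
        nbrs = set(neighbors)
        candidates = [i for i in range(k) if len(parts[i]) < capacity]
        def score(i):
            return len(parts[i] & nbrs) * (1 - len(parts[i]) / capacity)
        best = max(candidates, key=score)   # ties: lowest-index partition
        parts[best].add(v)
        assignment[v] = best
    return assignment
```

The penalty term is what keeps the greedy edge-locality objective from piling every vertex into one partition.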
Highlight, page 2
restreaming graph partitioning model, where a streaming partitioner is provided access to previous stream results [Nishimura and Ugander 2013]. -
Highlight, page 2
eightfold -
Highlight, page 2
thorough analysis -
Highlight, page 2
too computationally expensive -
Highlight, page 2
Good workload balance for skewed degree distributions can also be achieved with degree-based hashing [Xie et al. 2014]. -
Highlight, page 2
vary drastically over the course of computation, which creates processing imbalances and increases runtime. -
Highlight, page 2
such as graph coarsening [Wang et al. 2014]. -
Highlight, page 2
According to Salihoglu and Widom [2013], a dynamic repartitioning strategy must directly address (1) how to select vertices to reassign, (2) how and when to move the assigned vertices, and (3) how to locate the reassigned vertices. -
Highlight, page 2
the loading and partitioning can be performed in parallel [Salihoglu and Widom 2013]. -
Highlight, page 2
XPregel [Bao and Suzumura 2013] supports multithreading by dividing a partition into a user-defined number of subpartitions -
Highlight, page 2
When a failure occurs, the system rolls back to the most recently saved point, all partitions are reloaded, and the entire system resumes processing from the checkpoint. -
Highlight, page 2
GraphLab [Low et al. 2012] implements asynchronous vertex checkpointing, based on Chandy-Lamport [Chandy and Lamport 1985] snapshots, -
Highlight, page 2
they are highly scalable while providing a simple programming interface, -
Highlight, page 2
dictated -
Highlight, page 2
GraphChi. The seminal single-machine TLAV framework is GraphChi [Kyrola et al. 2012], -
Highlight, page 2
PathGraph also implements a path-centric compact storage system that improves compactness and locality [Yuan et al. 2014]. Because most iterative graph algorithms involve path traversal, -
Highlight, page 2
retaining -
Highlight, page 2
in varying degrees -
Highlight, page 2
shared either through vertex programs on boundary nodes or, in the case of Blogel, directly between subgraphs -
Highlight, page 2
Collectively, subgraph-centric frameworks dramatically outperform TLAV frameworks, often by orders of magnitude in terms of computing time, number of messages, and total supersteps [Tian et al. 2013; Yan et al. 2014a]. -
Highlight, page 2
A vertex subset interface is implemented in Ligra [Shun and Blelloch 2013]. -
Highlight, page 2
Similarly, the Single Pivot (SP) optimization [Salihoglu and Widom 2014], first presented in Quick et al. [2012], also temporarily adopts a global view. -
Highlight, page 2
the Parallel Boost Graph Library [Gregor and Lumsdaine 2005] -
Highlight, page 2
first-class citizens -
Highlight, page 2
Databases offer local or online queries, such as one-hop neighbors, whereas TLAV systems iteratively process the entire graph offline in batch -
Highlight, page 3
Distributed algorithms are a mature field of study [Lynch 1996], -
Highlight, page 3
Not all graphs are large enough to necessitate distributed processing, and not all graph problems need the whole graph to be computed iteratively.