Collaborative Task Engagement (CTE) is a software decomposition technique for nested patterns that sustains high warp execution efficiency across irregular inputs and provides portable performance.
CTE assigns a group of coarse-grained tasks to a warp and lets the threads inside the warp carry out the expanded list of fine-grained tasks collaboratively, thereby avoiding both over-subscription and under-subscription of threads.
[ Paper ]
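The CTE idea can be illustrated with a small CPU-side simulation. This is a minimal sketch rather than the published implementation: the function name `cte_schedule`, the prefix-sum-plus-binary-search mapping, and the tiny warp size in the usage lines are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def cte_schedule(fine_counts, warp_size=WARP_SIZE):
    """Simulate one warp working through a group of coarse-grained tasks.

    fine_counts[i] is the number of fine-grained tasks that coarse task i
    expands into. Returns one list per warp step; each list holds the
    (coarse_task, fine_index) pair handled by each active lane.
    """
    # Exclusive prefix sum: offsets[i] is where coarse task i starts in the
    # expanded (flattened) list of fine-grained tasks.
    offsets = [0] + list(accumulate(fine_counts))
    total = offsets[-1]

    steps = []
    for base in range(0, total, warp_size):
        step = []
        for lane in range(min(warp_size, total - base)):
            flat = base + lane
            # Binary search for the coarse task that owns this flat index.
            coarse = bisect_right(offsets, flat) - 1
            step.append((coarse, flat - offsets[coarse]))
        steps.append(step)
    return steps


# Hypothetical input: three coarse tasks with 3, 0 and 2 fine-grained tasks,
# processed by a 4-lane "warp" to keep the output readable.
for step in cte_schedule([3, 0, 2], warp_size=4):
    print(step)
```

Every step except possibly the last keeps all lanes busy regardless of how unevenly the fine-grained tasks are distributed over the coarse ones, which is the property the description above refers to.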
Collaborative Context Collection (CCC) is a compiler technique that increases warp execution efficiency in the presence of thread divergence caused either by differing task assignments within a warp or by intra-warp load imbalance.
CCC collects the relevant registers of divergent threads in a warp-specific stack allocated in fast shared memory, and restores them only when full utilization of the warp lanes becomes feasible.
[ Paper ] [ Slides ]
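A simplified CPU simulation of the CCC idea is sketched below. It is not the compiler transformation itself: `run_with_ccc`, the `needs_heavy` predicate, and the use of a plain Python list in place of the shared-memory stack are assumptions made for illustration.

```python
def run_with_ccc(items, needs_heavy, warp_size=32):
    """Simulate a warp that defers a divergent (heavy) code path.

    Each loop iteration hands one item to every lane. A lane whose item
    needs the heavy path saves its context on a warp-wide stack instead of
    diverging. The heavy path runs only when the stack can feed all lanes.
    Returns the batches of contexts that executed the heavy path together.
    """
    stack = []    # stands in for the warp-specific stack in shared memory
    batches = []

    for base in range(0, len(items), warp_size):
        # Cheap part of the loop body: every lane checks its own item.
        for item in items[base:base + warp_size]:
            if needs_heavy(item):
                stack.append(item)  # save context, skip the heavy path for now

        # Heavy part: run only with a full warp's worth of saved contexts.
        while len(stack) >= warp_size:
            batches.append([stack.pop() for _ in range(warp_size)])

    # After the loop, leftover contexts are drained in one partial batch.
    if stack:
        batches.append(stack[::-1])
    return batches


# Hypothetical workload: odd items take the heavy path, 4-lane "warp".
batches = run_with_ccc(list(range(10)), lambda x: x % 2 == 1, warp_size=4)
print(batches)
```

In this toy run the heavy path executes twice, once with all four lanes active and once for the leftover context. Without deferral it would execute in each of the three loop iterations with at most two lanes active.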
Warp Segmentation (WS) is a method that greatly enhances GPU device utilization in vertex-centric graph processing. It dynamically assigns an appropriate number of SIMD threads to each vertex, whose neighbor list may be irregularly sized, while employing the compact CSR representation to maximize the graph size that can be kept inside GPU global memory.
In addition, Vertex Refinement (VR) addresses the challenge of judiciously using the limited bandwidth available for transferring data between GPUs via the PCIe bus.
Using WS and VR together provides scalable and SIMD-efficient vertex-centric graph processing.
[ Paper ] [ Slides ] [ Download and Usage ]
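The way WS maps lanes onto CSR neighbor lists can be sketched on the CPU as follows. This is an illustrative simulation under stated assumptions: the function name `warp_segmentation_sum`, the sum reduction, and the serial accumulation (a GPU version would reduce within each segment in parallel) are not taken from the released code.

```python
from bisect import bisect_right


def warp_segmentation_sum(row_ptr, col_idx, values, warp_size=32):
    """For every vertex, sum the values of its neighbors, visiting edges the
    way a warp would: warp_size consecutive edges per step, with each lane
    locating the vertex that owns its edge by binary search.

    row_ptr / col_idx form a CSR graph; values[v] is the value of vertex v.
    """
    num_vertices = len(row_ptr) - 1
    result = [0] * num_vertices

    for v_begin in range(0, num_vertices, warp_size):
        v_end = min(v_begin + warp_size, num_vertices)
        # Slice of row_ptr for the vertices assigned to this warp.
        local_ptr = row_ptr[v_begin:v_end + 1]

        # Sweep the warp's edges in chunks of warp_size lanes.
        for e_base in range(local_ptr[0], local_ptr[-1], warp_size):
            for lane in range(min(warp_size, local_ptr[-1] - e_base)):
                e = e_base + lane
                # Segment lookup: which vertex does edge e belong to?
                owner = v_begin + bisect_right(local_ptr, e) - 1
                result[owner] += values[col_idx[e]]
    return result


# Hypothetical 4-vertex graph: vertex 0 has 3 neighbors, vertex 1 has none.
row_ptr = [0, 3, 3, 5, 6]
col_idx = [1, 2, 3, 0, 3, 0]
values = [10, 20, 30, 40]
print(warp_segmentation_sum(row_ptr, col_idx, values, warp_size=4))
```

Because lanes are assigned to edges rather than to vertices, a vertex with many neighbors simply occupies more lanes and a vertex with none occupies no lanes at all.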
PaRMAT is a tool designed to create very large RMAT graphs on machines with a limited amount of memory.
It offers several options for the generated graph, such as directed or undirected edges, disallowing duplicate edges, and sorting the output.
[ Download and Usage ]
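For readers unfamiliar with RMAT graphs, the generic recursive-quadrant generation scheme and the options listed above can be sketched as follows. This is not PaRMAT's algorithm or command-line interface: the function `rmat_edges`, its parameter names, and the default quadrant probabilities are assumptions, and the sketch keeps every edge in memory, which is exactly the limitation PaRMAT is designed to avoid.

```python
import random


def rmat_edges(scale, num_edges, a=0.45, b=0.22, c=0.22,
               directed=True, allow_duplicates=False, sort_output=True,
               seed=0):
    """Generate num_edges RMAT edges over 2**scale vertices.

    At each of `scale` levels an edge falls into one of four adjacency-matrix
    quadrants with probabilities a, b, c and d = 1 - a - b - c.
    """
    rng = random.Random(seed)
    n = 1 << scale
    capacity = n * n if directed else n * (n + 1) // 2
    if not allow_duplicates and num_edges > capacity:
        raise ValueError("more unique edges requested than the graph can hold")

    edges, seen = [], set()
    while len(edges) < num_edges:
        src = dst = 0
        for _ in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass                 # top-left quadrant
            elif r < a + b:
                dst |= 1             # top-right quadrant
            elif r < a + b + c:
                src |= 1             # bottom-left quadrant
            else:
                src |= 1             # bottom-right quadrant
                dst |= 1
        if not directed and src > dst:
            src, dst = dst, src      # store each undirected edge once
        edge = (src, dst)
        if not allow_duplicates:
            if edge in seen:
                continue             # reject and redraw a duplicate edge
            seen.add(edge)
        edges.append(edge)

    if sort_output:
        edges.sort()
    return edges


# Hypothetical small graph: 16 vertices, 40 unique undirected edges.
print(rmat_edges(4, 40, directed=False)[:5])
```

Rejecting duplicates by redrawing is simple but slows down as the requested edge count approaches the graph's capacity, so the sketch is only suitable for sparse, small graphs.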
Stadium Hashing (Stash) is a GPU hashing method that accompanies the hash table with a compact data structure named ticket-board to enable scalable out-of-core hashing.
Because it uses double hashing, Stash supports concurrent mixed operations, and a novel technique named Collaborative Lanes enhances the warp execution efficiency of the hashing procedure.
[ Paper ] [ Slides ]
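The interplay between a compact ticket-board and a double-hashed table can be sketched sequentially as below. Everything specific here is an assumption made for illustration: the class `StashSketch`, one availability flag per bucket, the two hash functions, and the restriction to distinct non-negative integer keys. A GPU version would claim tickets atomically and keep the table itself out of core.

```python
class StashSketch:
    """Toy hash table paired with a ticket-board of one flag per bucket."""

    def __init__(self, num_buckets):
        # num_buckets should be prime so the double-hashing probe sequence
        # visits every bucket.
        self.n = num_buckets
        self.ticket_board = [True] * num_buckets  # True = bucket still free
        self.table = [None] * num_buckets         # stands in for the large table
        self.table_accesses = 0                   # counts touches of the table

    def _probe(self, key):
        # Double hashing: a second hash picks the probe stride.
        start = key % self.n
        step = 1 + key % (self.n - 1)
        for i in range(self.n):
            yield (start + i * step) % self.n

    def insert(self, key, value):
        """Insert a key assumed not to be present. Returns False if full."""
        for b in self._probe(key):
            if self.ticket_board[b]:
                self.ticket_board[b] = False      # claim the ticket
                self.table[b] = (key, value)
                self.table_accesses += 1
                return True
        return False

    def lookup(self, key):
        for b in self._probe(key):
            if self.ticket_board[b]:
                return None                       # free ticket: key is absent
            self.table_accesses += 1
            if self.table[b][0] == key:
                return self.table[b][1]
        return None


t = StashSketch(11)
t.insert(5, "a")
t.insert(16, "b")        # collides with key 5, lands on the next probe slot
print(t.lookup(16), t.lookup(7))
```

The point of the sketch is that probing and misses are resolved on the small ticket-board, so the large table is touched only when a bucket is known to be occupied.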
CuSha is a CUDA-based vertex-centric graph processing framework that uses G-Shards and Concatenated Windows representations.
Compared to CSR and the Virtual Warp-Centric method, these representations increase overall warp utilization and memory access efficiency, leading to faster graph processing on the GPU.
[ Paper ] [ Slides ] [ Download and Usage ] [ Download CPU multi-threaded vertex-centric CSR graph processor ]
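A rough idea of how a shard-based layout can be built from an edge list is sketched below. It rests on assumptions not stated on this page: that a shard groups edges by destination-vertex range, that edges inside a shard are ordered by source, and that a window is the slice of a shard whose sources fall in another shard's vertex range. The helper names `build_g_shards` and `window` are hypothetical and do not come from the CuSha code base.

```python
from bisect import bisect_left


def build_g_shards(edges, num_vertices, vertices_per_shard):
    """Group (src, dst) edges into shards by destination range, then order
    the edges inside each shard by source vertex."""
    num_shards = (num_vertices + vertices_per_shard - 1) // vertices_per_shard
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // vertices_per_shard].append((src, dst))
    for shard in shards:
        shard.sort(key=lambda e: e[0])  # stable sort by source vertex
    return shards


def window(shard, i, vertices_per_shard):
    """Return the contiguous slice of `shard` whose source vertices belong to
    the vertex range of shard i."""
    sources = [src for src, _ in shard]
    lo = bisect_left(sources, i * vertices_per_shard)
    hi = bisect_left(sources, (i + 1) * vertices_per_shard)
    return shard[lo:hi]


# Hypothetical 4-vertex graph split into 2 shards of 2 vertices each.
edges = [(0, 1), (3, 0), (2, 3), (1, 2), (0, 3), (3, 2)]
shards = build_g_shards(edges, num_vertices=4, vertices_per_shard=2)
print(shards)
print(window(shards[1], 0, 2))
```

Ordering each shard by source makes every window a contiguous run of entries, which is what allows consecutive threads to read consecutive memory locations.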