Filed under: Papers
This paper evaluates features of graph coloring algorithms implemented on graphics processing units (GPUs), comparing coloring heuristics and thread decompositions. As compared to prior work on graph coloring for other parallel architectures, we find that the large number of cores and relatively high global memory bandwidth of a GPU lead to different strategies for the paral lel implementation. Specifically, we find that a simple uniform block partitioning is very effective on GPUs and our parallel coloring heuristics lead to the same or fewer colors than prior approaches for distributed-memory cluster architecture. Our a lgorithm resolves many coloring conflicts across partitioned blocks on the GPU by iterating through the coloring process, befo re returning to the CPU to resolve remaining conflicts. With this approach we get as few color (if not fewer) than the best se quential graph coloring algorithm and performance is close to the fastest sequential graph coloring algorithms which have poor color quality.