TY - GEN
T1 - Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization
AU - Yesil, Serif
AU - Heidarshenas, Azin
AU - Morrison, Adam
AU - Torrellas, Josep
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
AB - Graph analytics applications often target large-scale web and social networks, which are typically power-law graphs. Graph algorithms can often be recast as generalized Sparse Matrix-Vector multiplication (SpMV) operations, making SpMV optimization important for graph analytics. However, executing SpMV on large-scale power-law graphs results in highly irregular memory access patterns with poor cache utilization. Worse, we find that existing SpMV locality and vectorization optimizations are largely ineffective on modern out-of-order (OOO) processors - they are not faster (or only marginally so) than the standard Compressed Sparse Row (CSR) SpMV implementation. To improve performance for power-law graphs on modern OOO processors, we propose Locality-Aware Vectorization (LAV). LAV is a new approach that leverages a graph's power-law nature to extract locality and enable effective vectorization for SpMV-like memory access patterns. LAV splits the input matrix into a dense and a sparse portion. The dense portion is stored in a new representation, which is vectorization-friendly and exploits data locality. The sparse portion is processed using the standard CSR algorithm. We evaluate LAV with several graphs on an Intel Skylake-SP processor, and find that it is faster than CSR (and prior approaches) by an average of 1.5x. LAV reduces the number of DRAM accesses by 35% on average, with only a 3.3% memory overhead.
KW - Graph Algorithms
KW - Locality optimizations
KW - SIMD
KW - Sparse Matrix Vector Products
KW - Vectorization
UR - http://www.scopus.com/inward/record.url?scp=85102347966&partnerID=8YFLogxK
U2 - 10.1109/SC41405.2020.00090
DO - 10.1109/SC41405.2020.00090
M3 - Conference contribution
AN - SCOPUS:85102347966
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2020
PB - IEEE Computer Society
T2 - 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Y2 - 9 November 2020 through 19 November 2020
ER -