TY - JOUR
T1 - Signal processing for a reverse-GPS wildlife tracking system
T2 - CPU and GPU implementation experiences
AU - Rubinpur, Yaniv
AU - Toledo, Sivan
N1 - Publisher Copyright: © 2021 John Wiley & Sons Ltd.
PY - 2022/6/25
Y1 - 2022/6/25
AB - We present robust high-performance implementations of signal-processing tasks performed by a high-throughput wildlife tracking system called ATLAS. The system tracks radio transmitters attached to wild animals by estimating the time of arrival of radio packets at multiple receivers (base stations). Time-of-arrival estimation of wideband radio signals is computationally expensive, especially in acquisition mode (when the time of transmission is not known, even approximately). These computations are a bottleneck that limits the throughput of the system. A few years ago we developed a sequential high-performance CPU implementation of these computations, and more recently a GPU implementation. Both strive to balance performance with simplicity, maintainability, and development effort, as most real-world codes do. The article describes the two implementations and carefully evaluates their performance. The evaluations indicate that the GPU implementation dramatically improves performance and power-performance relative to the sequential CPU implementation running on a desktop CPU typical of the computers in current base stations. Performance improves by more than 50X on a high-end GPU and by more than 4X on a GPU platform that consumes almost 5 times less power than the CPU platform. Performance-per-Watt ratios also improve (by more than 16X), and so do price-performance ratios.
KW - CUDA
KW - GPU
KW - arrival time estimation
KW - digital signal processing
UR - http://www.scopus.com/inward/record.url?scp=85110959280&partnerID=8YFLogxK
U2 - 10.1002/cpe.6506
DO - 10.1002/cpe.6506
M3 - Article
AN - SCOPUS:85110959280
SN - 1532-0626
VL - 34
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 14
M1 - e6506
ER -