TY - GEN
T1 - ShRing: Networking with Shared Receive Rings
T2 - 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023
AU - Pismenny, Boris
AU - Morrison, Adam
AU - Tsafrir, Dan
N1 - Publisher Copyright:
© OSDI 2023. All rights reserved.
PY - 2023
Y1 - 2023
AB - Multicore systems parallelize to accommodate incoming Ethernet traffic, allocating one receive (Rx) ring with ≥1Ki entries per core by default. This ring size is sufficient to absorb packet bursts of single-core workloads. But the combined size of all Rx buffers (pointed to by all Rx rings) can exceed the size of the last-level cache. We observe that, in this case, NIC and CPU memory accesses are increasingly served by main memory, which might incur nonnegligible overheads when scaling to hundreds of incoming gigabits per second. To alleviate this problem, we propose “shRing,” which shares each Rx ring among several cores when networking memory bandwidth consumption is high. ShRing thus adds software synchronization costs, but this overhead is offset by the smaller memory footprint. We show that, consequently, shRing increases the throughput of NFV workloads by up to 1.27x, and that it reduces their latency by up to 38x. The substantial latency reduction occurs when shRing shortens the per-packet processing time to a value smaller than the packet interarrival time, thereby preventing overload conditions.
UR - http://www.scopus.com/inward/record.url?scp=85173214894&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85173214894
T3 - Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023
SP - 949
EP - 968
BT - Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023
PB - USENIX Association
Y2 - 10 July 2023 through 12 July 2023
ER -