TY - GEN
T1 - A sampling-based approach to accelerating queries in log management systems
AU - Wagner, Tal
AU - Schkufza, Eric
AU - Wieder, Udi
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/20
Y1 - 2016/10/20
N2 - Log management systems are common in industry and an essential part of a system administrator's toolkit. Examples include Splunk, ELK, Log Insight, Sexilog, and more. Logs in these systems are characterized by a small number of predefined fields such as timestamp and host, with the bulk of an entry being unstructured text. System administrators query these logs using a combination of range constraints over predefined fields and patterns or regular expressions over the text portion of the message. These queries are both complex and diverse. We propose a method for maintaining a subset of these logs in a much smaller database known as a sublog. Because queries are issued against a much smaller data set, they run to completion quickly and avoid common scaling bottlenecks. However, the improvement in performance comes at a price. Because we only consider a subset of the original data, we are only able to provide approximate responses. Nonetheless, the reduction in accuracy is minimal and we are able to produce high-quality, high-performance results.
AB - Log management systems are common in industry and an essential part of a system administrator's toolkit. Examples include Splunk, ELK, Log Insight, Sexilog, and more. Logs in these systems are characterized by a small number of predefined fields such as timestamp and host, with the bulk of an entry being unstructured text. System administrators query these logs using a combination of range constraints over predefined fields and patterns or regular expressions over the text portion of the message. These queries are both complex and diverse. We propose a method for maintaining a subset of these logs in a much smaller database known as a sublog. Because queries are issued against a much smaller data set, they run to completion quickly and avoid common scaling bottlenecks. However, the improvement in performance comes at a price. Because we only consider a subset of the original data, we are only able to provide approximate responses. Nonetheless, the reduction in accuracy is minimal and we are able to produce high-quality, high-performance results.
KW - Log management systems
KW - Log messages
KW - Stratified sampling
UR - http://www.scopus.com/inward/record.url?scp=84997108973&partnerID=8YFLogxK
U2 - 10.1145/2984043.2989221
DO - 10.1145/2984043.2989221
M3 - Conference contribution
AN - SCOPUS:84997108973
T3 - SPLASH Companion 2016 - Companion Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity
SP - 37
EP - 38
BT - SPLASH Companion 2016 - Companion Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications
A2 - Visser, Eelco
PB - Association for Computing Machinery, Inc
T2 - 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity, SPLASH Companion 2016
Y2 - 30 October 2016 through 4 November 2016
ER -