TY - JOUR
T1 - Synthesis of Longitudinal Human Location Sequences
T2 - Balancing Utility and Privacy
AU - Benarous, Maya
AU - Toch, Eran
AU - Ben-Gal, Irad
N1 - Publisher Copyright: © 2022 Association for Computing Machinery.
PY - 2022/7/30
Y1 - 2022/7/30
N2 - People's location data are continuously tracked from various devices and sensors, enabling an ongoing analysis of sensitive information that can violate people's privacy and reveal confidential information. Synthetic data have been used to generate representative location sequences while maintaining users' privacy. Nonetheless, the tradeoff between privacy and accuracy has not been addressed systematically. In this article, we analyze the use of different synthetic data generation models for long location sequences, including long short-term memory networks (LSTMs), Markov Chains (MC), and variable-order Markov models (VMMs). We employ different performance measures, such as data similarity and privacy, and discuss the inherent tradeoff. Furthermore, we introduce other measurements to quantify each of these measures. Based on the anonymous data of 300 thousand cellular-phone users, our work offers a road map for developing policies for synthetic data generation processes. We propose a framework for building data generation models and evaluating their effectiveness regarding those accuracy and privacy measures.
AB - People's location data are continuously tracked from various devices and sensors, enabling an ongoing analysis of sensitive information that can violate people's privacy and reveal confidential information. Synthetic data have been used to generate representative location sequences while maintaining users' privacy. Nonetheless, the tradeoff between privacy and accuracy has not been addressed systematically. In this article, we analyze the use of different synthetic data generation models for long location sequences, including long short-term memory networks (LSTMs), Markov Chains (MC), and variable-order Markov models (VMMs). We employ different performance measures, such as data similarity and privacy, and discuss the inherent tradeoff. Furthermore, we introduce other measurements to quantify each of these measures. Based on the anonymous data of 300 thousand cellular-phone users, our work offers a road map for developing policies for synthetic data generation processes. We propose a framework for building data generation models and evaluating their effectiveness regarding those accuracy and privacy measures.
KW - Synthetic data
KW - location sequences
KW - long short-term memory network (LSTM)
KW - privacy
UR - http://www.scopus.com/inward/record.url?scp=85141187434&partnerID=8YFLogxK
U2 - 10.1145/3529260
DO - 10.1145/3529260
M3 - Article
AN - SCOPUS:85141187434
SN - 1556-4681
VL - 16
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 6
M1 - 118
ER -