Synthesis of Longitudinal Human Location Sequences: Balancing Utility and Privacy

Maya Benarous, Eran Toch, Irad Ben-Gal

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

People's location data are continuously tracked by various devices and sensors, enabling ongoing analysis of sensitive information that can violate people's privacy and reveal confidential information. Synthetic data have been used to generate representative location sequences while maintaining users' privacy. Nonetheless, the tradeoff between privacy and accuracy has not been addressed systematically. In this article, we analyze the use of different synthetic data generation models for long location sequences, including long short-term memory networks (LSTMs), Markov chains (MCs), and variable-order Markov models (VMMs). We employ different performance measures, such as data similarity and privacy, and discuss the inherent tradeoff between them. Furthermore, we introduce additional measurements to quantify each of these measures. Based on the anonymous data of 300 thousand cellular-phone users, our work offers a road map for developing policies for synthetic data generation processes. We propose a framework for building data generation models and evaluating their effectiveness with respect to these accuracy and privacy measures.
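
As a rough illustration of the simplest model family named in the abstract, the sketch below fits a first-order Markov chain to a few location sequences and samples a synthetic one. It is not the authors' implementation: the function names (`fit_markov_chain`, `generate_sequence`), the toy location labels, and the fixed random seed are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Count first-order transitions: location -> Counter of next locations."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def generate_sequence(transitions, start, length, rng):
    """Sample a synthetic sequence by walking the fitted chain."""
    seq = [start]
    while len(seq) < length:
        counts = transitions.get(seq[-1])
        if not counts:  # dead end: no successor was ever observed
            break
        locations = list(counts.keys())
        weights = list(counts.values())
        seq.append(rng.choices(locations, weights=weights, k=1)[0])
    return seq


# Toy, made-up sequences (hypothetical labels, not data from the article).
real = [
    ["home", "work", "gym", "home"],
    ["home", "work", "home"],
    ["home", "cafe", "work", "home"],
]
model = fit_markov_chain(real)
synthetic = generate_sequence(model, "home", 6, random.Random(0))
print(synthetic)
```

Every transition in the sampled sequence was observed in the training data, which hints at the tradeoff the abstract describes: the more closely a generator follows real transitions, the more similar its output is to the source data, and the more it may reveal about that data.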

Original language: English
Article number: 118
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 16
Issue number: 6
DOIs
State: Published - 30 Jul 2022

Funding

Funder: Israel Ministry of Science
Funder number: 3-12460

    Keywords

    • Synthetic data
    • location sequences
    • long short term memory network (LSTM)
    • privacy
