TY - GEN
T1 - Optimal Distribution-Free Sample-Based Testing of Subsequence-Freeness
AU - Ron, Dana
AU - Rosin, Asaf
N1 - Publisher Copyright: Copyright © 2021 by SIAM
PY - 2021
Y1 - 2021
N2 - In this work, we study the problem of testing subsequence-freeness. For a given subsequence (word) w = w_1 ... w_k, a sequence (text) T = t_1 ... t_n is said to contain w if there exist indices 1 ≤ i_1 < ... < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. While the large majority of research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the “standard” sample-based model (i.e., under the uniform distribution), the algorithm is given samples (i, t_i), where each i is selected independently and uniformly at random. The algorithm should distinguish between the case that T is w-free and the case that T is ε-far from being w-free (i.e., more than an ε-fraction of its symbols must be modified to make it w-free). Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that O(k^2 log k / ε) samples suffice for this testing task. We obtain the following results.
• The number of samples sufficient for sample-based testing (under the uniform distribution) is O(k/ε). This upper bound builds on a characterization that we present for the distance of a text T from w-freeness in terms of the maximum number of copies of w in T, where these copies must obey certain restrictions.
• We prove a matching lower bound, which holds for every word w. This implies that the above upper bound is tight.
• The same upper bound holds in the more general distribution-free sample-based model. In this model, the algorithm receives samples (i, t_i), where i is distributed according to an arbitrary distribution p (and the distance from w-freeness is measured with respect to p).
We highlight the fact that although we require the testing algorithm to work for every distribution and to use only samples, the complexity we obtain matches a known lower bound for a special case of the seemingly easier problem of testing subsequence-freeness under the uniform distribution with queries (Canonne et al., Theory of Computing, 2019).
UR - http://www.scopus.com/inward/record.url?scp=85105292626&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105292626
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 337
EP - 356
BT - ACM-SIAM Symposium on Discrete Algorithms, SODA 2021
A2 - Marx, Daniel
PB - Association for Computing Machinery
T2 - 32nd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2021
Y2 - 10 January 2021 through 13 January 2021
ER -
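The containment definition in the abstract, and the natural one-sided sample-based tester it suggests, can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' algorithm or analysis: the function names `contains_subsequence` and `sample_based_test`, the `weights` argument standing in for the distribution p, and the constant `c` in the sample size c·k/ε are all hypothetical choices made here. The tester rejects only when the sampled symbols, read in index order, contain w, so a w-free text is always accepted (one-sided error); no accuracy guarantee is claimed for the particular constant used.

```python
import math
import random
from typing import Optional, Sequence


def contains_subsequence(text: Sequence[str], word: Sequence[str]) -> bool:
    """True iff `word` occurs in `text` as a (not necessarily contiguous) subsequence.

    Greedy leftmost matching: one scan of `text`, advancing in `word`
    whenever its next needed symbol appears, decides containment.
    """
    j = 0  # number of leading symbols of `word` matched so far
    for symbol in text:
        if j < len(word) and symbol == word[j]:
            j += 1
    return j == len(word)


def sample_based_test(
    text: Sequence[str],
    word: Sequence[str],
    eps: float,
    weights: Optional[Sequence[float]] = None,
    c: float = 4.0,
    rng: Optional[random.Random] = None,
) -> bool:
    """Hypothetical one-sided tester: True means accept, False means reject.

    `weights` plays the role of the distribution p over positions
    (None means uniform); `c` is an illustrative constant, not one from the paper.
    """
    rng = rng or random.Random()
    m = math.ceil(c * len(word) / eps)  # sample size of order k / eps
    # Draw m positions i ~ p independently; each sample is the pair (i, text[i]).
    indices = rng.choices(range(len(text)), weights=weights, k=m)
    # Read the sampled symbols in index order, keeping each position once,
    # so the result is a genuine subsequence of `text`.
    sampled = [text[i] for i in sorted(set(indices))]
    # Reject only when the sample itself contains `word`: a w-free text is
    # never rejected, which is what one-sided error requires.
    return not contains_subsequence(sampled, word)
```

Deduplicating the sampled indices matters: if one position were sampled twice and kept twice, a single occurrence of a repeated letter (for example, one `a` against w = `aa`) could be mistaken for a copy of w, which would break one-sided error.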