TY - JOUR
T1 - Sample-Based Distance-Approximation for Subsequence-Freeness
AU - Cohen Sidon, Omer
AU - Ron, Dana
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/8
Y1 - 2024/8
N2 - In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) w = w_1…w_k, a sequence (text) T = t_1…t_n is said to contain w if there exist indices 1 ≤ i_1 < ⋯ < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is Θ(k/ϵ). Denoting by Δ(T,w,p) the distance of T to w-freeness under a distribution p:[n]→[0,1], we are interested in obtaining an estimate Δ̂ such that |Δ̂ − Δ(T,w,p)| ≤ δ with probability at least 2/3, for a given error parameter δ. Our main result is a sample-based distribution-free algorithm whose sample complexity is Õ(k^2/δ^2). We first present an algorithm that works when the underlying distribution p is uniform, and then show how it can be modified to work for any (unknown) distribution p. We also show that a quadratic dependence on 1/δ is necessary.
AB - In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) w = w_1…w_k, a sequence (text) T = t_1…t_n is said to contain w if there exist indices 1 ≤ i_1 < ⋯ < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is Θ(k/ϵ). Denoting by Δ(T,w,p) the distance of T to w-freeness under a distribution p:[n]→[0,1], we are interested in obtaining an estimate Δ̂ such that |Δ̂ − Δ(T,w,p)| ≤ δ with probability at least 2/3, for a given error parameter δ. Our main result is a sample-based distribution-free algorithm whose sample complexity is Õ(k^2/δ^2). We first present an algorithm that works when the underlying distribution p is uniform, and then show how it can be modified to work for any (unknown) distribution p. We also show that a quadratic dependence on 1/δ is necessary.
KW - Distance-approximation
KW - Property testing
KW - Sample-based
KW - Subsequence-freeness
UR - http://www.scopus.com/inward/record.url?scp=85192800256&partnerID=8YFLogxK
U2 - 10.1007/s00453-024-01233-4
DO - 10.1007/s00453-024-01233-4
M3 - Article
AN - SCOPUS:85192800256
SN - 0178-4617
VL - 86
SP - 2519
EP - 2556
JO - Algorithmica
JF - Algorithmica
IS - 8
ER -