TY - JOUR

T1 - Sample-Based Distance-Approximation for Subsequence-Freeness

AU - Cohen Sidon, Omer

AU - Ron, Dana

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/8

Y1 - 2024/8

N2 - In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) $w = w_1 \dots w_k$, a sequence (text) $T = t_1 \dots t_n$ is said to contain $w$ if there exist indices $1 \le i_1 < \dots < i_k \le n$ such that $t_{i_j} = w_j$ for every $1 \le j \le k$. Otherwise, $T$ is $w$-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is $\Theta(k/\epsilon)$. Denoting by $\Delta(T,w,p)$ the distance of $T$ to $w$-freeness under a distribution $p:[n] \to [0,1]$, we are interested in obtaining an estimate $\hat{\Delta}$, such that $|\hat{\Delta} - \Delta(T,w,p)| \le \delta$ with probability at least 2/3, for a given error parameter $\delta$. Our main result is a sample-based distribution-free algorithm whose sample complexity is $\tilde{O}(k^2/\delta^2)$. We first present an algorithm that works when the underlying distribution $p$ is uniform, and then show how it can be modified to work for any (unknown) distribution $p$. We also show that a quadratic dependence on $1/\delta$ is necessary.

AB - In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) $w = w_1 \dots w_k$, a sequence (text) $T = t_1 \dots t_n$ is said to contain $w$ if there exist indices $1 \le i_1 < \dots < i_k \le n$ such that $t_{i_j} = w_j$ for every $1 \le j \le k$. Otherwise, $T$ is $w$-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is $\Theta(k/\epsilon)$. Denoting by $\Delta(T,w,p)$ the distance of $T$ to $w$-freeness under a distribution $p:[n] \to [0,1]$, we are interested in obtaining an estimate $\hat{\Delta}$, such that $|\hat{\Delta} - \Delta(T,w,p)| \le \delta$ with probability at least 2/3, for a given error parameter $\delta$. Our main result is a sample-based distribution-free algorithm whose sample complexity is $\tilde{O}(k^2/\delta^2)$. We first present an algorithm that works when the underlying distribution $p$ is uniform, and then show how it can be modified to work for any (unknown) distribution $p$. We also show that a quadratic dependence on $1/\delta$ is necessary.

KW - Distance-approximation

KW - Property testing

KW - Sample-based

KW - Subsequence-freeness

UR - http://www.scopus.com/inward/record.url?scp=85192800256&partnerID=8YFLogxK

U2 - 10.1007/s00453-024-01233-4

DO - 10.1007/s00453-024-01233-4

M3 - Article

AN - SCOPUS:85192800256

SN - 0178-4617

VL - 86

SP - 2519

EP - 2556

JO - Algorithmica

JF - Algorithmica

IS - 8

ER -