Good-bootstrap: simultaneous confidence intervals for large alphabet distributions

Daniel Marton, Amichai Painsky*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Simultaneous confidence intervals (SCI) for multinomial proportions are a cornerstone of count data analysis and a key component in many applications. A variety of schemes have been introduced over the years, mostly focussing on an asymptotic regime where the sample is large and the alphabet size is relatively small. In this work we introduce a new SCI framework which considers the large alphabet setup. Our proposed framework utilises bootstrap sampling with the Good-Turing probability estimator as a plug-in distribution. We demonstrate the favourable performance of our proposed method in synthetic and real-world experiments. Importantly, we provide an exact analytical expression for the bootstrapped statistic, which replaces the computationally costly sampling procedure. Our proposed framework is publicly available at the first author's GitHub page.
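The idea of bootstrapping from a Good-Turing plug-in distribution can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce their analytical expression. The function names `good_turing_plugin` and `bootstrap_half_width` are hypothetical, as are the choice of a maximum-deviation statistic, the fallback to the empirical frequency when the next frequency-of-frequency count is zero, and the assumption that the alphabet size is known (unseen symbols appear as zero counts).

```python
import math
import random
from collections import Counter


def good_turing_plugin(counts):
    """Return a probability vector built from a count vector (zeros = unseen symbols).

    Assumption: plain Good-Turing where the next frequency-of-frequency count is
    non-zero, the empirical frequency otherwise, followed by renormalisation.
    """
    n = sum(counts)
    freq_of_freq = Counter(counts)  # N_r: number of symbols seen exactly r times
    unseen = freq_of_freq.get(0, 0)
    probs = []
    for r in counts:
        if r == 0:
            # Good-Turing missing mass N_1 / n, shared equally by unseen symbols.
            p = freq_of_freq.get(1, 0) / (n * unseen)
        elif freq_of_freq.get(r + 1, 0) > 0:
            p = (r + 1) * freq_of_freq[r + 1] / (n * freq_of_freq[r])
        else:
            p = r / n  # fallback: empirical frequency
        probs.append(p)
    total = sum(probs)
    return [p / total for p in probs]


def bootstrap_half_width(counts, alpha=0.05, n_boot=500, seed=0):
    """Estimate a common interval half-width by sampling from the plug-in.

    Assumption: the statistic is the largest absolute deviation between the
    resampled empirical frequencies and the plug-in probabilities.
    """
    rng = random.Random(seed)
    n = sum(counts)
    k = len(counts)
    plug_in = good_turing_plugin(counts)
    stats = []
    for _ in range(n_boot):
        draws = rng.choices(range(k), weights=plug_in, k=n)
        resampled = Counter(draws)
        stats.append(max(abs(resampled.get(i, 0) / n - plug_in[i]) for i in range(k)))
    stats.sort()
    # Empirical (1 - alpha) quantile of the bootstrapped statistic.
    index = min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)
    return stats[index]


if __name__ == "__main__":
    counts = [3, 2, 1, 1, 0, 0]  # toy sample over a six-symbol alphabet
    width = bootstrap_half_width(counts)
    n = sum(counts)
    for c in counts:
        low = max(0.0, c / n - width)
        high = min(1.0, c / n + width)
        print(f"count={c}: [{low:.3f}, {high:.3f}]")
```

The sampling loop here is the costly step that, according to the abstract, the paper replaces with an exact analytical expression for the bootstrapped statistic.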

Original language: English
Pages (from-to): 1177-1191
Number of pages: 15
Journal: Journal of Nonparametric Statistics
Volume: 36
Issue number: 4
State: Published - 2024

Funding

Funder: Israel Science Foundation
Funder number: 963/21

    Keywords

    • Simultaneous confidence intervals
    • count data
    • Good-Turing
    • large alphabet
    • multinomial distribution
