TY - JOUR
T1 - Consistent distribution-free K-sample and independence tests for univariate random variables
AU - Heller, Ruth
AU - Heller, Yair
AU - Kaufman, Shachar
AU - Brill, Barak
AU - Gorfine, Malka
N1 - Publisher Copyright: © 2016 Ruth Heller and Yair Heller and Shachar Kaufman and Barak Brill and Malka Gorfine.
PY - 2016/2/1
Y1 - 2016/2/1
AB - A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While for detecting simple relationships coarse partitions may be best, for detecting complex relationships a great gain in power can be achieved by considering finer partitions. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove those are consistent too. We provide polynomial-time algorithms, which are critical for computing the suggested test statistics efficiently. We show that the power of the regularized tests is excellent compared to existing tests, and almost as powerful as the tests based on the optimal (yet unknown in practice) partition size, in simulations as well as on a real data example.
KW - Bivariate distribution
KW - HHG R package
KW - Mutual information
KW - Nonparametric test
KW - Statistical independence
KW - Two-sample test
UR - http://www.scopus.com/inward/record.url?scp=84979939249&partnerID=8YFLogxK
AN - SCOPUS:84979939249
SN - 1532-4435
VL - 17
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -