Sufficient dimensionality reduction

Amir Globerson, Naftali Tishby

Research output: Contribution to journalArticlepeer-review

62 Scopus citations


Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised learning. It is also a well studied problem in statistics known as the analysis of cross-classified data. One principled approach to this problem is to represent the data in low dimension with minimal loss of (mutual) information contained in the original data. In this paper we introduce an information theoretic nonlinear method for finding such a most informative dimension reduction. In contrast with previously introduced clustering based approaches, here we extract continuous feature functions directly from the co-occurrencematrix. In a sense, we automatically extract functions of the variables that serve as approximate sufficient statistics for a sample of one variable about the other one. Our method is different from dimensionality reduction methods which are based on a specific, sometimes arbitrary, metric or embedding. Another interpretation of our method is as generalized - multi-dimensional - non-linear regression, where rather than fitting one regression function through two dimensional data, we extract d-regression functions whose expectation values capture the information among the variables. It thus presents a new learning paradigm that unifies aspects from both supervised and unsupervised learning. The resulting dimension reduction can be described by two conjugate d-dimensional differential manifolds that are coupled through Maximum Entropy I-projections. The Riemannian metrics of these manifolds are determined by the observed expectation values of our extracted features. Following this geometric interpretation we present an iterative information projection algorithm for finding such features and prove its convergence. Our algorithm is similar to the method of "association analysis" in statistics, though the feature extraction context as well as the information theoretic and geometric interpretation are new. The algorithm is illustrated by various synthetic co-occurrence data. It is then demonstrated for text categorization and information retrieval and proves effective in selecting a small set of features, often improving performance over the original feature set.

Original languageEnglish
Pages (from-to)1307-1331
Number of pages25
JournalJournal of Machine Learning Research
StatePublished - Mar 2003
Externally publishedYes


  • Association Analysis
  • Feature Extraction
  • Information Geometry
  • Maximum Entropy
  • Mutual Information


Dive into the research topics of 'Sufficient dimensionality reduction'. Together they form a unique fingerprint.

Cite this