Abstract
We examine some mathematical aspects of learning unknown mappings with the Mixture of Experts Model (MEM). Specifically, we observe that the MEM is at least as powerful as a class of neural networks, in a sense that will be made precise. Upper bounds on the approximation error are established for a wide class of target functions. The general theorem states that ‖f − f_n‖_p ≤ c/n^{r/d} for f ∈ W_p^r(L) (a Sobolev class over [−1,1]^d), where f_n belongs to an n-dimensional manifold of normalized ridge functions. The same bound holds for the MEM as a special case of the above. The stochastic error, in the context of learning from independent and identically distributed (i.i.d.) examples, is also examined. An asymptotic analysis establishes the limiting behavior of this error in terms of certain pseudoinformation matrices. These results substantiate the intuition behind the MEM and motivate applications.
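For orientation, the objects appearing in the bound can be written out explicitly. The sketch below uses a standard softmax-gated parameterization of a mixture of n experts, which is one common instance of a normalized ridge function; it is offered as an illustration under that assumption, not as the exact manifold analyzed in the paper.

```latex
% One common mixture-of-experts form: a softmax (normalized ridge) gate
% combining n expert maps h_i. This parameterization is an illustrative
% assumption, not necessarily the precise manifold used in the paper.
f_n(x) \;=\; \sum_{i=1}^{n}
  \frac{\exp\!\big(a_i^{\top} x + b_i\big)}
       {\sum_{j=1}^{n} \exp\!\big(a_j^{\top} x + b_j\big)}\; h_i(x),
\qquad x \in [-1,1]^d .

% Approximation bound stated in the abstract: for every target f in the
% Sobolev class W_p^r(L) over [-1,1]^d there is such an f_n with
\big\| f - f_n \big\|_p \;\le\; \frac{c}{\,n^{r/d}\,},
% where the constant c does not depend on n.
```

Here the h_i stand in for the individual experts; in the MEM each expert is itself a simple parametric model, and the softmax gate supplies the normalization.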
Original language | English |
---|---|
Pages (from-to) | 1010-1025 |
Number of pages | 16 |
Journal | IEEE Transactions on Information Theory |
Volume | 44 |
Issue number | 3 |
DOIs | |
State | Published - 1998 |
Externally published | Yes |
Keywords
- Approximation error
- Estimation error
- Mixture of experts