Figure: The circles indicate 10 Gaussians approximating the input density distribution. $x^c$ indicates the known input; $x^u$ is unknown.
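We assume that a neural network $NN(x)$ has been trained to predict $E(y \mid x)$, the expectation of $y$ given $x$. During recall we would like to know the network's prediction based on an incomplete input vector $x = (x^c, x^u)$, where $x^c$ denotes the known inputs and $x^u$ the unknown inputs. The optimal prediction given the known features can be written as (Ahmad and Tresp, 1993)
$$E(y \mid x^c) = \int E(y \mid x^c, x^u)\, P(x^u \mid x^c)\, dx^u \approx \frac{1}{P(x^c)} \int NN(x^c, x^u)\, P(x^c, x^u)\, dx^u .$$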
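Similarly, for a network trained to estimate class probabilities, $NN_i(x) \approx P(\mathrm{class}_i \mid x)$, simply substitute $P(\mathrm{class}_i \mid x^c)$ for $E(y \mid x^c)$ and $NN_i(x^c, x^u)$ for $NN(x^c, x^u)$ in the last equation.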
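As an illustration, the prediction with one missing input can be sketched by averaging the network output over the unknown input, weighted by the input density, and dividing by the marginal density of the known input. Everything in the sketch below is an assumption made for illustration: the function `nn` is a toy stand-in for a trained network, the input density is a zero-mean, unit-variance bivariate Gaussian with correlation `RHO`, and the integral is replaced by a Riemann sum on a uniform grid. For this toy case the exact answer is available analytically, which allows a sanity check.

```python
import numpy as np

RHO = 0.6  # assumed correlation between the known and the unknown input


def nn(xc, xu):
    """Hypothetical stand-in for a trained network NN(x^c, x^u)."""
    return xc + xu ** 2


def joint_density(xc, xu):
    """Assumed input density P(x^c, x^u): standard bivariate Gaussian with correlation RHO."""
    norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - RHO ** 2))
    quad = (xc ** 2 - 2.0 * RHO * xc * xu + xu ** 2) / (1.0 - RHO ** 2)
    return norm * np.exp(-0.5 * quad)


def predict_with_missing(xc, grid):
    """Approximate E(y | x^c) by a Riemann sum over the unknown input x^u."""
    dx = grid[1] - grid[0]
    density = joint_density(xc, grid)               # P(x^c, x^u) along the grid
    numerator = np.sum(nn(xc, grid) * density) * dx
    marginal = np.sum(density) * dx                 # P(x^c)
    return numerator / marginal


grid = np.linspace(-8.0, 8.0, 4001)
xc = 1.0
estimate = predict_with_missing(xc, grid)

# For this toy setup x^u | x^c is Gaussian with mean RHO * x^c and variance 1 - RHO^2,
# so E(x^c + (x^u)^2 | x^c) = x^c + (RHO * x^c)^2 + (1 - RHO^2).
exact = xc + (RHO * xc) ** 2 + (1.0 - RHO ** 2)
print(estimate, exact)
```

With a single missing input a one-dimensional grid suffices; each additional missing input adds another grid dimension.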
The integrals in the last equations can be problematic. In the worst case they have to be approximated numerically (Tresp, Ahmad and Neuneier, 1994), which is costly, since the computation is exponential in the number of missing inputs. For networks of normalized Gaussians, closed-form solutions to the integrals exist (Ahmad and Tresp, 1993). The following section shows how to approximate the integral efficiently for a large class of algorithms.
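To make the closed-form case concrete, the following sketch compares grid integration with an analytic shortcut for a small network of normalized Gaussians. It rests on assumptions that are not stated in this section: the Gaussian units are axis-aligned (diagonal covariance), and the input density is modeled by the same units with equal mixing weights. Under those assumptions the product of network output and density is a weighted sum of Gaussians, so integrating out the unknown dimension amounts to evaluating each unit on the known dimension only. The centers, widths, and output weights are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical toy network: 3 axis-aligned Gaussian units in 2-D.
centers = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 1.0]])
widths = np.array([[1.0, 0.5], [0.7, 1.0], [1.2, 0.8]])  # per-dimension std devs
out_weights = np.array([0.5, 2.0, -1.0])


def gauss(x, mu, sigma):
    """Normalized 1-D Gaussian density, broadcast over arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def closed_form(xc):
    """E(y | x^c) using only the known dimension (index 0) of each unit."""
    g = gauss(xc, centers[:, 0], widths[:, 0])
    return np.sum(out_weights * g) / np.sum(g)


def grid_integration(xc, grid):
    """Same quantity via a Riemann sum over the unknown dimension (index 1)."""
    gc = gauss(xc, centers[:, 0], widths[:, 0])[:, None]        # shape (3, 1)
    gu = gauss(grid[None, :], centers[:, 1:2], widths[:, 1:2])  # shape (3, n)
    units = gc * gu                                             # G_i(x^c, x^u)
    density = units.mean(axis=0)                                # assumed P(x^c, x^u)
    network = (out_weights[:, None] * units).sum(axis=0) / units.sum(axis=0)
    # The grid spacing cancels in the ratio, so it is omitted.
    return np.sum(network * density) / np.sum(density)


grid = np.linspace(-6.0, 10.0, 4001)
analytic = closed_form(0.3)
numeric = grid_integration(0.3, grid)
print(analytic, numeric)
```

The analytic version costs one Gaussian evaluation per unit regardless of how many inputs are missing, whereas a grid with $k$ points per missing dimension needs $k^m$ network evaluations for $m$ missing inputs.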