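Assuming the standard signal-plus-Gaussian-noise model, we obtain for a complete sample

$$P(y^k \mid x^k) = G\big(y^k; \mathrm{NN}_w(x^k), \sigma\big),$$

where $w$ is the set of weights in the network and $G(\cdot\,; c, \sigma)$ denotes a normal density centered at $c$ with covariance $\sigma^2 I$. For an incomplete sample with known components $x_c^k$ and missing components $x_u$,

$$P(y^k \mid x_c^k) = \int G\big(y^k; \mathrm{NN}_w(x_c^k, x_u), \sigma\big)\, P(x_u \mid x_c^k)\, dx_u .$$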
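Using the same approximation as in Section 2.2, with Parzen kernel width $\sigma_x$ on the known inputs,

$$P(y^k \mid x_c^k) \approx \frac{\sum_l G\big(y^k; \mathrm{NN}_w(x_c^k, x_u^l), \sigma\big)\, G(x_c^k; x_c^l, \sigma_x)}{\sum_l G(x_c^k; x_c^l, \sigma_x)},$$

where $l$ sums over all complete samples. As before, we substitute the corresponding components of the complete training data for the missing ones. The log-likelihood $L$, a function of the network weights $w$, can be calculated as

$$L = \sum_k \log P(y^k \mid x^k),$$

where $x^k$ can be either complete or incomplete. The maximum likelihood solution consists of finding the weights $w$ that maximize the log-likelihood. Using the approximation of Equation 1, we obtain the gradient for an incomplete sample as

$$\frac{\partial \log P(y^k \mid x_c^k)}{\partial w} \approx \frac{1}{\sigma^2} \sum_l P(l \mid x_c^k, y^k)\, \big(y^k - \mathrm{NN}_w(x_c^k, x_u^l)\big)^{\top} \frac{\partial \mathrm{NN}_w(x_c^k, x_u^l)}{\partial w} \tag{3}$$

with the posterior weight of complete sample $l$

$$P(l \mid x_c^k, y^k) = \frac{G\big(y^k; \mathrm{NN}_w(x_c^k, x_u^l), \sigma\big)\, G(x_c^k; x_c^l, \sigma_x)}{\sum_{l'} G\big(y^k; \mathrm{NN}_w(x_c^k, x_u^{l'}), \sigma\big)\, G(x_c^k; x_c^{l'}, \sigma_x)}$$

(compare Tresp, Ahmad and Neuneier, 1994).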
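The following is a minimal sketch, not the authors' implementation, of the Parzen-approximated likelihood of one incomplete sample and its gradient with respect to the weights. It assumes a hypothetical linear stand-in `nn` for the network (so the derivative of the output with respect to weight $w_i$ is simply input $x_i$), a scalar target, and separate widths `sigma_y` for the output noise and `sigma_x` for the Parzen kernel on the known inputs. The names `gauss`, `nn` and `incomplete_log_lik_and_grad` are illustrative only. Each complete sample $l$ supplies the missing components $x_u^l$, and its contribution to the gradient is weighted by its normalized likelihood given the known inputs and the target.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def gauss(x: Sequence[float], c: Sequence[float], sigma: float) -> float:
    """Normal density G(x; c, sigma) with covariance sigma^2 * I."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (d / 2.0)


def nn(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical stand-in for NN_w: a linear model with scalar output."""
    return sum(wi * xi for wi, xi in zip(w, x))


def incomplete_log_lik_and_grad(
    w: Sequence[float],
    xc: Vector,                          # known input components x_c^k
    y: float,                            # target y^k
    complete: List[Tuple[Vector, Vector]],  # (x_c^l, x_u^l) of complete samples
    sigma_y: float,                      # output noise width (assumed)
    sigma_x: float,                      # Parzen kernel width (assumed)
) -> Tuple[float, Vector]:
    """Parzen-approximated log P(y | x_c) and its gradient w.r.t. w."""
    terms: List[float] = []   # G(y; NN_w(x_c, x_u^l), sigma_y) * G(x_c; x_c^l, sigma_x)
    inputs: List[Vector] = []
    outputs: List[float] = []
    kx_sum = 0.0
    for xc_l, xu_l in complete:
        x_full = xc + xu_l    # fill the missing components from sample l
        out = nn(w, x_full)
        kx = gauss(xc, xc_l, sigma_x)
        kx_sum += kx
        terms.append(gauss([y], [out], sigma_y) * kx)
        inputs.append(x_full)
        outputs.append(out)
    total = sum(terms)
    # Raises ValueError if every Gaussian underflows to zero.
    log_lik = math.log(total / kx_sum)
    grad = [0.0] * len(w)
    for t, x_full, out in zip(terms, inputs, outputs):
        p_l = t / total       # posterior weight P(l | x_c, y)
        for i, xi in enumerate(x_full):
            # d NN_w / d w_i = x_i for the linear stand-in
            grad[i] += p_l * (y - out) / sigma_y ** 2 * xi
    return log_lik, grad
```

For a full training step one would add the ordinary gradient of each complete sample and sum over $k$; swapping in a real network only changes `nn` and its derivative. With many inputs or small widths, evaluating the Gaussians in log space (log-sum-exp) would avoid underflow of `kx_sum` and `total`.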