If we assume an additive noise model, we can again use the same update rules as in regression. Often, however, it is preferable to use the cross-entropy as the log-likelihood function. We consider only the two-class problem, in which the target is now either one or zero. We obtain
and for missing data
We obtain for a complete pattern
and for an incomplete pattern
This equation looks more formidable than it really is once one realizes that the target is either zero or one. Using the approximation for the density, we obtain
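The role of the zero-or-one target and of the density approximation can be illustrated with a small Python sketch. It rests on several assumptions that the text above does not fix: the network output is read as the probability of class one, the density is approximated by Gaussian kernels of a common width `sigma` centered on the complete training patterns, and a single logistic unit stands in for the network. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nn(x, w, b):
    """Stand-in for the network (assumption): output read as P(class 1 | x)."""
    return sigmoid(x @ w + b)

def bernoulli_likelihood(p, y):
    """p**y * (1 - p)**(1 - y); since y is 0 or 1 this is just p or 1 - p."""
    return np.where(y == 1, p, 1.0 - p)

def incomplete_likelihood(x, known, y, centers, sigma, w, b):
    """Approximate P(y | known inputs) for one pattern with missing inputs.

    x       : input vector; entries where `known` is False are ignored
    known   : boolean mask marking the observed inputs
    centers : complete training inputs used as kernel centers, shape (N, D)
    sigma   : common kernel width (assumed, not taken from the text)
    """
    # Weight each kernel center by its distance in the known dimensions only.
    d2 = np.sum((centers[:, known] - x[known]) ** 2, axis=1)
    weights = np.exp(-0.5 * (d2 - d2.min()) / sigma**2)  # shift for stability
    weights /= weights.sum()
    # Fill the missing dimensions with the values of each center.
    filled = np.where(known, x, centers)  # broadcasts to shape (N, D)
    p = nn(filled, w, b)
    return float(np.sum(weights * bernoulli_likelihood(p, y)))
```

With a single kernel center the weighted sum has one term, so the value reduces to the complete-pattern likelihood evaluated at the filled-in input. For any set of centers, the values for `y = 0` and `y = 1` sum to one.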
As an example, a network trained for economic predictions should not become useless if one of the indicator variables becomes unavailable or meaningless because of economic or political changes. Rather, we could treat such a variable as unavailable information. Similarly, in control applications it might not be necessary to stop a production line simply because a sensor has failed: the information that the failing sensor delivered might also be contained in the measurements of other sensors, although this relationship might not be obvious.
For the log-likelihood we really need the logarithm of this
such that the gradient becomes
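Taking the logarithm changes the gradient only through the chain rule: the derivative of log L is the derivative of L divided by L. The following Python sketch shows this for a likelihood written as a weighted sum over candidate completions of one incomplete pattern. The weighted-sum form, the logistic stand-in for the network, and the names `filled` and `weights` are assumptions made for illustration, not the authors' notation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_lik_and_grad(w, filled, weights, y):
    """Return log L and d(log L)/dw for L = sum_j weights[j] * P(y | filled[j]).

    filled  : (N, D) candidate completions of one incomplete pattern
    weights : (N,) nonnegative weights summing to one
    y       : target, either 0 or 1
    """
    p = sigmoid(filled @ w)               # stand-in network output (assumption)
    lik_j = p if y == 1 else 1.0 - p      # y is either zero or one
    L = np.sum(weights * lik_j)
    sign = 1.0 if y == 1 else -1.0
    # Derivative of each term with respect to w, shape (N, D).
    dlik_j = sign * (p * (1.0 - p))[:, None] * filled
    dL = np.sum(weights[:, None] * dlik_j, axis=0)
    return np.log(L), dL / L              # chain rule: d log L = dL / L
```

A quick way to gain confidence in such a gradient is to compare it with central finite differences of the log-likelihood for a few random weight vectors.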