The problem of missing data (incomplete feature vectors) is of great practical and theoretical interest. In many applications it is important to know how to react if the available information is incomplete, if sensors fail, or if sources of information become unavailable. For example, when a sensor fails in a production process, it might not be necessary to stop everything if sufficient information is implicitly contained in the remaining sensor data. Similarly, in economic forecasting, one might want to continue using a predictor even when an input variable becomes meaningless (for example, due to political changes in a country). As we have elaborated in earlier papers, heuristics such as substituting the mean for an unknown feature can lead to solutions that are far from optimal (Ahmad and Tresp, 1993; Tresp, Ahmad, and Neuneier, 1994). Biological systems must deal continuously with the problem of unknown or uncertain features, and they are certainly extremely good at it. From a biological point of view, it is therefore interesting to ask which solutions to this problem can be derived from theory and whether these solutions are in any way related to the way biology deals with the problem (compare Brunelli and Poggio, 1991). Finally, having efficient methods for dealing with missing features allows a novel pruning strategy: if the quality of the prediction is not affected when an input is pruned, we can remove it and either use our solutions for prediction with missing inputs or retrain the model without that input (Tresp, Hollatz, and Ahmad, 1995).
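The suboptimality of mean substitution can be seen in a minimal numerical sketch (the function and the uniform input density below are illustrative assumptions, not taken from the cited papers): for a nonlinear map, plugging in the mean of a missing input generally differs from the optimal prediction under squared loss, which is the expectation of the output over the input density.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "trained network": the true function y = x1**2,
# with the missing input x1 drawn uniformly from [-1, 1].
f = lambda x1: x1 ** 2
x1 = rng.uniform(-1.0, 1.0, size=100_000)

# Mean substitution: replace the missing x1 by its mean (close to 0),
# giving the prediction f(0) = 0.
mean_substitution = f(x1.mean())

# Optimal prediction under squared loss: the expectation of the
# output over the input density, E[x1**2] = 1/3 for this density.
optimal = f(x1).mean()

print(mean_substitution)  # close to 0.0
print(optimal)            # close to 1/3
```

Because f is nonlinear, f(E[x1]) and E[f(x1)] can differ arbitrarily; here mean substitution predicts roughly 0 where the optimal prediction is roughly 1/3.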
In Ahmad and Tresp (1993) and in Tresp, Ahmad, and Neuneier (1994), equations for training and recall were derived in a probabilistic setting (compare also Buntine and Weigend, 1991; Ghahramani and Jordan, 1994). For general feedforward neural networks, the solution took the form of an integral that must be approximated using numerical integration techniques, and the computational complexity of these solutions grows exponentially with the number of missing features. In those two publications, we could obtain efficient algorithms only for networks of normalized Gaussian basis functions. It is therefore of great practical interest to find efficient ways of dealing with missing inputs for general feedforward neural networks, which are more commonly used in applications. In this paper we describe an efficient approximation to the problem of missing information that is applicable to a large class of learning algorithms, including feedforward networks. The main results are Equation 2 (recall) and Equation 3 (training). One major advantage of the proposed solution is that its complexity does not increase with the number of missing inputs. The solutions can easily be generalized to the problem of uncertain (noisy) inputs.
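To make the integral concrete, a Monte Carlo sketch of recall with one missing input is shown below. The network, the Gaussian input model, and all parameter values are illustrative assumptions (this is not the approximation proposed in the paper): the prediction marginalizes the network output over samples from the conditional density of the missing input given the observed one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trained network on two inputs.
def network(x1, x2):
    return np.tanh(1.5 * x1 - 0.5 * x2)

# Assumed fitted input model: (x1, x2) standard bivariate Gaussian
# with correlation rho, so p(x1 | x2) is Gaussian with these moments.
rho = 0.8
x2_observed = 1.0
cond_mean = rho * x2_observed        # E[x1 | x2]
cond_std = np.sqrt(1.0 - rho ** 2)   # std of x1 given x2

# Recall with x1 missing: approximate E[network(x1, x2) | x2] by
# averaging the network output over samples from p(x1 | x2).
samples = rng.normal(cond_mean, cond_std, size=50_000)
y_marginal = network(samples, x2_observed).mean()

# Naive alternative: plug the conditional mean straight into the net.
y_plugin = network(cond_mean, x2_observed)
```

Because the network is nonlinear, `y_marginal` and `y_plugin` differ; with several missing inputs the sampling (or quadrature grid) must cover a space whose volume grows exponentially in their number, which is exactly the cost the paper's approximation avoids.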