
The emergence of new, challenging real-world applications has exposed serious
issues with current approaches to model adaptation in machine learning. Existing
theory and algorithms focus on fitting the available training data, but
cannot provide worst-case guarantees in mission-critical applications. Vapnik's
statistical learning theory is of little use for model selection, as the bounds on
generalisation error it predicts are too loose to be informative, and rely on the
assumption that training and testing data come from the same (unknown) distribution. The
crucial question is: what exactly can one infer from a training set?

Maximum entropy classifiers provide a significant example, due to their simplicity
and widespread application. There, the entropy of the sought joint (or
conditional) probability distribution of data and class is maximised, following
the maximum entropy principle that the least informative distribution which
matches the available evidence should be chosen. Having picked a set of feature
functions, selected to efficiently encode the training information, the joint distribution
is constrained so that the expectation of each feature function under the
maximum entropy distribution equals its empirical expectation on the training
data. Two assumptions underlie this approach: (i) training and test data come
from the same probability distribution, and (ii) the empirical expectation
computed from the training data is correct, so that the model expectation
should match it exactly. Both are rather strong, and work against generalisation power.
A way around this issue is to adopt as models convex sets of probability
distributions, rather than standard probability measures.
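
To make the expectation-matching constraint concrete, the following is a minimal sketch of a standard (not generalised) conditional maximum entropy classifier. It relies on the well-known fact that the maximum entropy solution has the exponential form p(c|x) proportional to exp(sum_k lambda_k phi_k(x, c)), and that the gradient of the average log-likelihood is the empirical feature expectation minus the model feature expectation. Everything specific here is an assumption made for illustration: the function names (softmax, feature_expectation, fit_maxent), the choice of feature functions phi(x, c) as the input vector placed in the block of class c, the synthetic two-class data, and the plain gradient ascent with a fixed step size.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

def feature_expectation(class_weights, X):
    """Average of phi(x, c) = onehot(c) (outer) x over the samples,
    where row i of class_weights weighs the classes for sample i."""
    return class_weights.T @ X / X.shape[0]

def fit_maxent(X, y, n_classes, lr=0.5, n_iter=5000):
    """Fit p(c|x) proportional to exp(W[c] . x) by gradient ascent on the
    average log-likelihood (illustrative optimiser, not from the source)."""
    onehot = np.eye(n_classes)[y]
    empirical = feature_expectation(onehot, X)   # expectations seen in training data
    W = np.zeros((n_classes, X.shape[1]))        # Lagrange multipliers lambda
    for _ in range(n_iter):
        P = softmax(X @ W.T)                     # current model p(c|x_i)
        # Gradient = empirical expectation - model expectation.
        W += lr * (empirical - feature_expectation(P, X))
    return W, empirical

# Hypothetical synthetic data: two overlapping Gaussian classes plus a bias feature.
rng = np.random.default_rng(0)
n_per_class = 100
X0 = rng.normal(loc=-0.5, scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, 2))
X = np.hstack([np.vstack([X0, X1]), np.ones((2 * n_per_class, 1))])
y = np.repeat([0, 1], n_per_class)

W, empirical = fit_maxent(X, y, n_classes=2)
model_exp = feature_expectation(softmax(X @ W.T), X)
gap = np.abs(empirical - model_exp).max()
print(f"largest expectation mismatch: {gap:.2e}")
```

At convergence the mismatch is close to zero, which is exactly assumption (ii) at work: the fitted model reproduces the training expectations, whether or not those expectations are representative of the test data.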


Relevant papers:


Fabio Cuzzolin, The geometry of uncertainty: The geometry of imprecise probabilities, Artificial Intelligence: Foundations, Theory, and Algorithms
(http://www.springer.com/series/13900), Springer-Verlag, 2018 (in press)


Fabio Cuzzolin, Generalised max entropy classifiers, Proceedings of the
Fifth International Conference on the Theory of Belief Functions (BELIEF 2018), Compiegne, France, September 2018

