A TUTORIAL FOR OOD GENERALIZATION

Introduction

Modern machine learning techniques have demonstrated excellent capabilities in many areas, such as computer vision, natural language processing, and recommender systems. Although these models can surpass human performance under experimental conditions, many researchers have revealed their vulnerability when they are exposed to data drawn from a different distribution. This large gap arises from the violation of a fundamental assumption on which most existing learning models are built: that training and test data are independent and identically distributed (a.k.a. the i.i.d. assumption). In many real-world cases the i.i.d. assumption can hardly be satisfied, especially in high-stakes applications such as healthcare, military, and autonomous driving. There, the ability to generalize under distribution shift (a.k.a. out-of-distribution generalization) matters more than generalization within the training distribution.
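
To make this failure mode concrete, here is a minimal toy sketch, assuming a hypothetical binary task with one "core" feature whose relation to the label never changes and one "spurious" feature whose correlation with the label flips at test time. The generator `make_data` and its agreement rates are made up for illustration, and the "model" simply picks whichever single feature is most accurate on the training set (a tiny stand-in for empirical risk minimization).

```python
import random

def make_data(n, spurious_agreement, seed):
    """Hypothetical toy data: label y in {0, 1}; the core feature equals y
    75% of the time, the spurious feature equals y with probability
    `spurious_agreement` (which is what shifts between train and test)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        core = y if rng.random() < 0.75 else 1 - y
        spurious = y if rng.random() < spurious_agreement else 1 - y
        data.append(((core, spurious), y))
    return data

def accuracy(predict, data):
    """Fraction of examples the classifier labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

train = make_data(10_000, spurious_agreement=0.95, seed=0)
test_iid = make_data(10_000, spurious_agreement=0.95, seed=1)  # same distribution
test_ood = make_data(10_000, spurious_agreement=0.05, seed=2)  # correlation flipped

# Tiny hypothesis class: predict the label from a single feature.
candidates = {"core": lambda x: x[0], "spurious": lambda x: x[1]}

# Keep the feature with the best training accuracy.
best_name = max(candidates, key=lambda k: accuracy(candidates[k], train))
model = candidates[best_name]

print(best_name)                               # "spurious"
print(accuracy(model, test_iid))               # roughly 0.95
print(accuracy(model, test_ood))               # roughly 0.05
print(accuracy(candidates["core"], test_ood))  # roughly 0.75
```

The spurious feature wins on the training data and scores about 95% on i.i.d. test data, yet drops to about 5% once the correlation flips, while the core feature stays near 75% in both cases.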

  • To help you get started with OOD (Out-of-Distribution) generalization research faster, we have designed a simple hands-on tutorial with several runnable cases. Feel free to play with it, as well as with the NICO++ dataset.

  • The out-of-distribution generalization problem is closely related to many popular research fields, such as transfer learning, federated learning, and algorithmic fairness and interpretability. To facilitate research on OOD generalization, we have written a survey paper, the first effort to systematically and comprehensively discuss the OOD generalization problem. We categorize the large body of existing methods into three main classes according to their position in the learning pipeline, as illustrated by the figure below.

OOD Tutorial
  • Along with the survey paper, we also build and maintain a website with an up-to-date paper list.

  • When dealing with OOD generalization, many researchers have proposed leveraging contextual information to make learned models more stable. The context labels in NICO++ are one such example: each context label defines a class-dependent marginal distribution of features.

  • In the Common Contexts Generalization challenge, we assume that all classes in a given task share the same contexts, and that each common context defines a joint distribution of features and labels. Recent work on domain generalization generally fits this setting (although the relevant literature calls a common context a domain). The key idea in common contexts generalization is to learn a model that is invariant with respect to the changing contexts, and many methods have been proposed for this purpose.

  • However, in real situations we often encounter more challenging OOD problems in which the contexts are not aligned across classes, or are not even known in advance, leading to the Hybrid Contexts Generalization challenge. This corresponds to a more general form of OOD generalization and remains an open question for the machine learning community. One strand of literature targeting hybrid contexts generalization is self-supervised learning, which adds auxiliary tasks (e.g., solving puzzles) to the learning procedure. Another way to improve generalization is to incorporate causal inference into predictive modeling, with stable learning as a typical paradigm. In addition, new optimization algorithms (e.g., Distributionally Robust Optimization) have been proposed to ensure robustness in a more theoretically grounded way.
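
As a rough illustration of how an invariance objective and a DRO-style objective differ from plain average-risk minimization, the sketch below scores two single-feature classifiers on two hypothetical training contexts that share a stable "core" feature but disagree on how reliable a "spurious" feature is. Everything here is an assumption for illustration: the generator `make_context`, the agreement rates, and the weight `lam` are made up; the invariance term is the variance of per-context risks (one penalty in the spirit of V-REx, not the only invariance method); and the worst-case objective is the group-DRO flavour of DRO restricted to the observed contexts, not the general DRO formulation. Models are chosen from a tiny hypothesis class rather than trained by gradient descent.

```python
import random
from statistics import mean, pvariance

def make_context(n, spurious_agreement, seed):
    """Hypothetical context: the core feature equals y 75% of the time in every
    context; the spurious feature's agreement with y varies per context."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        core = y if rng.random() < 0.75 else 1 - y
        spurious = y if rng.random() < spurious_agreement else 1 - y
        data.append(((core, spurious), y))
    return data

def risk(predict, data):
    """0-1 loss of a classifier on one context."""
    return sum(predict(x) != y for x, y in data) / len(data)

# Two training contexts (domains) with different spurious correlations.
contexts = [make_context(10_000, 0.9, seed=0), make_context(10_000, 0.7, seed=1)]

# Tiny hypothesis class: predict the label from a single feature.
models = {"core": lambda x: x[0], "spurious": lambda x: x[1]}

def erm(name):
    # Average risk over contexts (equal-sized contexts, so this is pooled ERM).
    return mean(risk(models[name], c) for c in contexts)

def invariance_penalized(name, lam=10.0):
    # Average risk plus lam * population variance of per-context risks.
    risks = [risk(models[name], c) for c in contexts]
    return mean(risks) + lam * pvariance(risks)

def worst_case(name):
    # Worst risk over the observed contexts (group-DRO-style objective).
    return max(risk(models[name], c) for c in contexts)

for objective in (erm, invariance_penalized, worst_case):
    chosen = min(models, key=objective)
    print(objective.__name__, "->", chosen)
# erm -> spurious
# invariance_penalized -> core
# worst_case -> core
```

Average risk favours the spurious feature (about 0.20 versus 0.25), while the variance penalty (about 0.20 + 10 x 0.01 = 0.30) and the worst-case risk (about 0.30) both favour the core feature. Narrowing the gap between contexts (for example agreement rates of 0.9 and 0.8) would make the worst-case objective prefer the spurious feature again, since its worst training risk (about 0.20) would fall below the core feature's 0.25.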