NON-IID IMAGE DATASET WITH CONTEXTS

Dataset Description

The NICO++ dataset is designed specifically for OOD (Out-of-Distribution) image classification. It simulates a real-world setting in which the testing distribution may shift arbitrarily from the training distribution, violating the I.I.D. hypothesis that most ML methods rely on. Research directions that the dataset supports include, but are not limited to, Domain Generalization or Domain Adaptation (when the testing distribution is known) and general OOD generalization (when the testing distribution is unknown).

The basic idea behind the dataset is to label each image with both its main concept/category (e.g. dog) and the context (e.g. on grass) in which that concept appears. By adjusting the proportions of different contexts in the training and testing data, one can flexibly control the degree of distribution shift and study different kinds of Non-I.I.D. settings.
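As a rough illustration of how context proportions can control the degree of shift, here is a minimal Python sketch. It is not part of the NICO++ release: the function name `split_by_context`, the `(image_id, category, context)` tuple format, and the `leak_ratio` parameter are all assumptions made for this example. Images whose context is in `train_contexts` go to training; images from other contexts go to testing, except for a `leak_ratio` fraction that is mixed into training to soften the shift.

```python
import random


def split_by_context(samples, train_contexts, leak_ratio=0.0, seed=0):
    """Split samples into train/test sets by context (hypothetical helper).

    samples: list of (image_id, category, context) tuples (assumed format).
    train_contexts: set of contexts that make up the training distribution.
    leak_ratio: probability that a sample from any other context is also
        placed in training; 0.0 gives fully disjoint contexts.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for sample in samples:
        context = sample[2]
        if context in train_contexts:
            train.append(sample)
        elif rng.random() < leak_ratio:
            train.append(sample)  # softens the train/test shift
        else:
            test.append(sample)
    return train, test


# Toy usage with made-up image ids and contexts.
contexts = ["on grass", "in water", "on snow", "on grass"]
samples = [(f"img_{i}", "dog", ctx) for i, ctx in enumerate(contexts)]
train, test = split_by_context(samples, {"on grass"}, leak_ratio=0.0)
```

With `leak_ratio=0.0` the training and testing contexts are disjoint, which corresponds to the strongest shift this sketch can produce; raising `leak_ratio` toward 1.0 moves more samples from the other contexts into the training set.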

[Figure: Common Context]

[Figure: Unique Context]

Statistics

To boost the heterogeneity and availability of NICO++, its contexts are divided into two types: 1) 10 common contexts that are aligned across all categories, covering nature, season, humanity, and light; 2) 10 unique contexts specific to each of the 80 categories, including attributes (e.g. action, color), background, camera shooting angle, accompanying objects, and so on.

Download

The released data (for the NICO challenge) is available here[Dropbox] or here[Tsinghua Cloud]. You are also free to use NICO++ data in your research for non-commercial purposes.

Copyright

Please note that the NICO++ dataset does not own the copyright of the images. We make such material available in an effort to advance understanding of technological, scientific, and cultural issues. The material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes. If you wish to use copyrighted material on this site or in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. (adapted from Christopher Thomas)

Citing

The NICO++ paper is available here.

If you find the dataset useful for your research, please consider citing the paper.

@misc{zhang2022nico,
      title={NICO++: Towards Better Benchmarking for Domain Generalization},
      author={Xingxuan Zhang and Yue He and Renzhe Xu and Han Yu and Zheyan Shen and Peng Cui},
      year={2022},
      eprint={2204.08040},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Original Versions

If you want the previous version of NICO, which is relatively small in scale, you can find it here.