Thomas Mensink; Efstratios Gavves; Zeynep Akata; Cees G.M. Snoek
We live in the age of Big Data, featuring
huge image and video datasets. Despite their size,
however, we cannot guarantee sufficient annotations for
all possible concepts. Moreover, while annotations are
easy to obtain for common object concepts, such as ball
or helicopter, this is not straightforward for more exotic
concepts like a “lagerphone” (a percussion musical
instrument): not only do the available images not suffice,
but often the annotations can be made only by experts.
In the absence of annotations we promote zero-shot
learning, where the combination of a) existing classifiers
and b) semantic, cross-concept mappings between these
classifiers allows for building novel classifiers without
resorting to any visual examples. From a more
philosophical point of view, zero-shot learning relates to
the ability to “learn new things” and to “reason over what
is learned”. While a DeepNet can reason (almost)
perfectly over the 1,000 concepts it is trained on, it cannot
reason over any new concept, nor explain novel
concepts in terms of what is already known. In this
tutorial we focus on zero-shot learning for Computer
Vision.
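
As a concrete illustration of how a) existing classifiers and b) semantic, cross-concept mappings combine into a novel classifier, consider the following minimal Python sketch, written in the spirit of convex-combination approaches such as ConSE (Norouzi et al.): the probabilities produced by the seen-class classifiers weight the seen classes' semantic embeddings, and the resulting image embedding is matched to the nearest unseen-class embedding. All names, shapes, and the random embeddings below are illustrative assumptions, not the tutorial's reference implementation.

    import numpy as np

    # Hypothetical toy setup: 5 seen classes with trained classifiers,
    # 3 unseen classes, and a 50-d semantic embedding per class (in
    # practice, e.g., attribute vectors or word embeddings).
    rng = np.random.default_rng(0)
    n_seen, n_unseen, dim = 5, 3, 50
    seen_emb = rng.normal(size=(n_seen, dim))      # seen-class semantics
    unseen_emb = rng.normal(size=(n_unseen, dim))  # unseen-class semantics

    def zero_shot_predict(seen_scores):
        """Label an image with an unseen class, using only the scores
        of the existing seen-class classifiers."""
        # Softmax turns classifier scores into seen-class probabilities.
        p = np.exp(seen_scores - seen_scores.max())
        p /= p.sum()
        # Semantic embedding of the image: probability-weighted convex
        # combination of the seen classes' semantic vectors.
        img_emb = p @ seen_emb
        # Nearest unseen class by cosine similarity.
        sims = unseen_emb @ img_emb / (
            np.linalg.norm(unseen_emb, axis=1) * np.linalg.norm(img_emb))
        return int(np.argmax(sims))

    # Example: an image whose seen-class classifiers favour class 2 is
    # mapped near that class's semantics and labelled with the closest
    # unseen class.
    print(zero_shot_predict(np.array([0.1, 0.3, 2.5, 0.2, 0.0])))

Note how no visual example of the unseen classes is ever used: the only bridge from the trained classifiers to the novel concepts is the shared semantic embedding space.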