Understanding semantic similarity among images is at the core of a wide range of computer graphics and computer vision applications. However, the visual context of an image is often ambiguous, since images can be perceived with emphasis on different attributes. In this paper, we present a method for learning the semantic visual similarity among images, inferring their latent attributes, and embedding the images into multiple spaces, one per latent attribute. We formulate the multi-embedding problem as an optimization that evaluates the embedded distances with respect to qualitative crowdsourced clusterings. The key idea of our approach is to collect and embed qualitative pairwise tuples that share the same attributes within clusters. To ensure that similarity attributes are shared across multiple measures, image classification clusters are presented to users, who solve them. The collected image clusters are then converted into groups of tuples, which are fed into our group optimization algorithm that jointly infers the attribute similarity and the multi-attribute embedding. The resulting multi-attribute embedding allows retrieving similar objects in different attribute spaces. Experimental results show that our approach outperforms state-of-the-art multi-embedding approaches on various datasets, and we demonstrate the use of the multi-attribute embedding in an image retrieval application.
- Semantic similarity
- Visual retrieval
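To make the idea of embedding qualitative tuples into multiple attribute spaces concrete, below is a minimal sketch of one plausible formulation: a per-attribute hinge loss over triplets, optimized by gradient descent. The function name, the triplet representation, and the squared-distance margin loss are illustrative assumptions, not the paper's actual group optimization.

```python
import numpy as np

def multi_attribute_embedding(n_items, triplets, n_attrs, dim=2,
                              margin=1.0, lr=0.05, epochs=200, seed=0):
    """Learn one embedding per attribute from qualitative triplets.

    triplets: list of (attr, a, p, n) meaning: under attribute `attr`,
    item `a` is more similar to item `p` than to item `n`.
    Minimizes the hinge loss max(0, margin + d(a,p)^2 - d(a,n)^2)
    independently in each attribute space (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    # One embedding matrix per attribute space.
    X = [rng.normal(scale=0.1, size=(n_items, dim)) for _ in range(n_attrs)]
    for _ in range(epochs):
        for attr, a, p, n in triplets:
            E = X[attr]
            dap = E[a] - E[p]
            dan = E[a] - E[n]
            # Update only if the constraint is violated by the margin.
            if margin + dap @ dap - dan @ dan > 0:
                E[a] -= lr * 2 * (dap - dan)  # gradient w.r.t. anchor
                E[p] -= lr * 2 * (-dap)       # pull positive closer
                E[n] -= lr * 2 * dan          # push negative away
    return X

# Toy usage: two attributes that disagree on which items are similar.
triplets = [(0, 0, 1, 2),  # attribute 0: item 0 is closer to 1 than to 2
            (1, 0, 2, 1)]  # attribute 1: item 0 is closer to 2 than to 1
X = multi_attribute_embedding(4, triplets, n_attrs=2)
```

After training, distances in each space reflect that space's own notion of similarity: item 0 sits nearer item 1 in attribute space 0, but nearer item 2 in attribute space 1.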