Many image editing applications rely on the analysis of image patches. In this paper, we present a method to analyze patches by embedding them to a vector space, in which the Euclidean distance reflects patch similarity. Inspired by Word2Vec, we term our approach Patch2Vec. However, there is a significant difference between words and patches. Words have a fairly small and well defined dictionary. Image patches, on the other hand, have no such dictionary and the number of different patch types is not well defined. The problem is aggravated by the fact that each patch might contain several objects and textures. Moreover, Patch2Vec should be universal because it must be able to map never-seen-before texture to the vector space. The mapping is learned by analyzing the distribution of all natural patches. We use Convolutional Neural Networks (CNN) to learn Patch2Vec. In particular, we train a CNN on labeled images with a triplet-loss objective function. The trained network encodes a given patch to a 128D vector. Patch2Vec is evaluated visually, qualitatively, and quantitatively. We then use several variants of an interactive single-click image segmentation algorithm to demonstrate the power of our method.
- Categories and Subject Descriptors (according to ACM CCS)
- I.4.10 [Image Processing and Computer Vision]: Image Representation—Multidimensional