To learn a visual code in an unsupervised manner, one may attempt to capture those features of the stimulus set that would contribute significantly to a statistically efficient representation (as dictated, e.g., by the Minimum Description Length principle). Paradoxically, all the candidate features in this approach need to be known before statistics over them can be computed. This paradox may be circumvented by confining the repertoire of candidate features to actual scene fragments, which resemble the “what+where” receptive fields found in the ventral visual stream in primates. We describe a single-layer network that learns such fragments from unsegmented raw images of structured objects. The learning method combines fast imprinting in the feedforward stream with lateral interactions to achieve single-epoch unsupervised acquisition of spatially localized features that can support systematic treatment of structured objects .