TY - GEN
T1 - Learning Object Permanence from Video
AU - Shamsian, Aviv
AU - Kleinfeld, Ofri
AU - Globerson, Amir
AU - Chechik, Gal
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each-other. Intensive studies in developmental psychology suggest that object permanence is a challenging task that is learned through extensive experience. Here we introduce the setup of learning Object Permanence from labeled videos. We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object and (4) carried by a containing object. The fourth subtask, where a target object is carried by a containing object, is particularly challenging because it requires a system to reason about a moving location of an invisible object. We then present a unified deep architecture that learns to predict object location under these four scenarios. We evaluate the architecture and system on a new dataset based on CATER, with per-frame labels, and find that it outperforms previous localization methods and various baselines.
AB - Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each-other. Intensive studies in developmental psychology suggest that object permanence is a challenging task that is learned through extensive experience. Here we introduce the setup of learning Object Permanence from labeled videos. We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object and (4) carried by a containing object. The fourth subtask, where a target object is carried by a containing object, is particularly challenging because it requires a system to reason about a moving location of an invisible object. We then present a unified deep architecture that learns to predict object location under these four scenarios. We evaluate the architecture and system on a new dataset based on CATER, with per-frame labels, and find that it outperforms previous localization methods and various baselines.
KW - Object Permanence
KW - Reasoning
KW - Video Analysis
UR - http://www.scopus.com/inward/record.url?scp=85092927418&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-58517-4_3
DO - 10.1007/978-3-030-58517-4_3
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85092927418
SN - 9783030585167
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 35
EP - 50
BT - Computer Vision – ECCV 2020 - 16th European Conference, Proceedings
A2 - Vedaldi, Andrea
A2 - Bischof, Horst
A2 - Brox, Thomas
A2 - Frahm, Jan-Michael
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th European Conference on Computer Vision, ECCV 2020
Y2 - 23 August 2020 through 28 August 2020
ER -