TY - GEN
T1 - Spatio-temporal action graph networks
AU - Herzig, Roei
AU - Levi, Elad
AU - Xu, Huijuan
AU - Gao, Hang
AU - Brosh, Eli
AU - Wang, Xiaolong
AU - Globerson, Amir
AU - Darrell, Trevor
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance. Activity recognition models that represent object interactions explicitly have the potential to learn in a more efficient manner than those that represent scenes with global descriptors. We propose a novel inter-object graph representation for activity recognition based on a disentangled graph embedding with direct observation of edge appearance. In contrast to prior efforts, our approach uses explicit appearance for high-order relations derived from object-object interaction, formed over regions that are the union of the spatial extent of the constituent objects. We employ a novel factored embedding of the graph structure, disentangling a representation hierarchy formed over spatial dimensions from that found over temporal variation. We demonstrate the effectiveness of our model on the Charades activity recognition benchmark, as well as a new dataset of driving activities focusing on multi-object interactions with near-collision events. Our model offers significantly improved performance compared to baseline approaches without object-graph representations or with previous graph-based models.
KW - Activity recognition
KW - Autonomous driving
KW - Collisions
KW - Graph neural network
UR - http://www.scopus.com/inward/record.url?scp=85082448798&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2019.00288
DO - 10.1109/ICCVW.2019.00288
M3 - Conference contribution
AN - SCOPUS:85082448798
T3 - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
SP - 2347
EP - 2356
BT - Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Y2 - 27 October 2019 through 28 October 2019
ER -