TY - JOUR
T1 - Ultra-Range Gesture Recognition using a web-camera in Human–Robot Interaction
AU - Bamani, Eran
AU - Nissinman, Eden
AU - Meir, Inbar
AU - Koenigsberg, Lisa
AU - Sintov, Avishai
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human–Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem, aiming for a recognition distance of up to 25 m in the context of HRI. We propose the URGR framework, a novel deep-learning approach that uses solely a simple RGB camera. Gesture inference is based on a single image. First, a novel super-resolution model termed High-Quality Network (HQ-Net) uses a set of self-attention and convolutional layers to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT), which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition at ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments, achieving a 96% recognition rate on average.
KW - Graph Convolutional Network
KW - Human–Robot Interaction
KW - Ultra-Range Gesture Recognition
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85190779979&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2024.108443
DO - 10.1016/j.engappai.2024.108443
M3 - Article
AN - SCOPUS:85190779979
SN - 0952-1976
VL - 132
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 108443
ER -