TY - JOUR
T1 - Learning to Throw with a Handful of Samples Using Decision Transformers
AU - Monastirsky, Maxim
AU - Azulay, Osher
AU - Sintov, Avishai
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2023/2/1
Y1 - 2023/2/1
AB - Throwing objects by a robot extends its reach and has many industrial applications. While analytical models can provide efficient performance, they require accurate estimation of system parameters. Reinforcement Learning (RL) algorithms can provide an accurate throwing policy without prior knowledge. However, they require an extensive amount of real-world samples, which may be time-consuming and, most importantly, pose danger. Training in simulation, on the other hand, would most likely result in poor performance on the real robot. In this letter, we explore the use of Decision Transformers (DT) and their ability to transfer a simulation-based policy to the real world. Contrary to RL, we re-frame the problem as sequence modelling and train a DT by supervised learning. The DT is trained offline on data collected from a far-from-reality simulation through random actions, without any prior knowledge of how to throw. Then, the DT is fine-tuned on a handful (∼5) of real throws. Results on various objects show accurate throws reaching an error of approximately 4 cm. Also, the DT can extrapolate and accurately throw to goals that are out-of-distribution to the training data. We additionally show that a few expert throw samples, and no pre-training in simulation, are sufficient for training an accurate policy.
KW - Reinforcement learning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85144752233&partnerID=8YFLogxK
U2 - 10.1109/LRA.2022.3229266
DO - 10.1109/LRA.2022.3229266
M3 - Article
AN - SCOPUS:85144752233
SN - 2377-3766
VL - 8
SP - 576
EP - 583
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 2
ER -