Abstract

This paper introduces LeTO, a method for learning constrained visuomotor policies with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end to end in a safe and constraint-controlled fashion without extra modules. Our method allows constraint information to be introduced during training, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with respect to demonstrations. This “gray box” method marries optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it generates trajectories that are less uncertain, of higher quality, and smoother than those of existing imitation learning methods. LeTO thus provides a practical example of how to integrate neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO.

Note to Practitioners: LeTO is driven by the goal of developing an imitation learning algorithm capable of generating safe and constraint-satisfying robotic behaviors. The idea of imitation learning is to enable the robot to learn from human demonstrations of certain tasks. The robot can then perform the learned tasks autonomously. Thanks to the powerful representational and fitting capabilities of neural networks, imitation learning can let robots perform complex manipulation tasks. However, neural networks often exhibit a certain level of uncertainty and lack theoretical safety guarantees. For robotic systems, it is crucial that robot behaviors meet specific constraints; otherwise, the system may not be sufficiently reliable. Therefore, we introduce LeTO, an approach that integrates trajectory optimization with neural networks to generate actions that not only achieve manipulation tasks, but also comply with constraints. This improves the interpretability, safety, and reliability of robot policies acquired through imitation learning, facilitating their deployment in scenarios with high safety requirements.
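
As a rough illustration of the kind of problem a trajectory optimization layer might solve, the sketch below takes a raw action sequence and returns one that stays inside box limits while trading off closeness to the raw sequence against smoothness. It is a minimal sketch under stated assumptions, not the formulation used in LeTO: the quadratic cost, the box constraint, the projected gradient solver, and all names (`smooth_constrained_trajectory`, `roughness`, `raw_actions`) are hypothetical, and only the forward solve is shown, without the differentiation through the solver that training would require.

```python
def smooth_constrained_trajectory(reference, lower, upper,
                                  smooth_weight=1.0, iters=500):
    """Projected gradient descent on a box-constrained smoothing QP.

    Minimizes sum_t (a_t - r_t)^2 + smooth_weight * sum_t (a_{t+1} - a_t)^2
    subject to lower <= a_t <= upper. Illustrative only; this cost and
    solver are assumptions, not the method described in the paper.
    """
    n = len(reference)
    # Start from the reference clipped into the feasible box.
    traj = [min(max(r, lower), upper) for r in reference]
    # Step size 1/L, where L = 2 * (1 + 4 * smooth_weight) upper-bounds
    # the Lipschitz constant of the cost gradient.
    step = 1.0 / (2.0 * (1.0 + 4.0 * smooth_weight))
    for _ in range(iters):
        grad = []
        for t in range(n):
            g = 2.0 * (traj[t] - reference[t])  # tracking term
            if t > 0:  # smoothness term with the previous step
                g += 2.0 * smooth_weight * (traj[t] - traj[t - 1])
            if t < n - 1:  # smoothness term with the next step
                g -= 2.0 * smooth_weight * (traj[t + 1] - traj[t])
            grad.append(g)
        # Gradient step followed by projection onto the box constraint.
        traj = [min(max(a - step * g, lower), upper)
                for a, g in zip(traj, grad)]
    return traj


def roughness(traj):
    """Sum of squared differences between consecutive actions."""
    return sum((b - a) ** 2 for a, b in zip(traj, traj[1:]))


# Hypothetical raw network output that violates the limits [0, 1].
raw_actions = [0.0, 0.9, -0.2, 1.4, 0.3, 1.1]
safe_actions = smooth_constrained_trajectory(raw_actions, lower=0.0, upper=1.0)
```

Because every iterate is projected back onto the box, the returned sequence satisfies the limits by construction, and the smoothness weight controls how strongly it departs from the raw sequence. In a learned policy, a layer of this kind would additionally need gradients with respect to its inputs so that the network can be trained through it.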

Comments

This is the author-accepted manuscript of Z. Xu and Y. She, "LeTO: Learning Constrained Visuomotor Policy With Differentiable Trajectory Optimization," in IEEE Transactions on Automation Science and Engineering. (c) 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. The version of record is available at DOI: 10.1109/TASE.2024.3486542.

Keywords

Robots; Imitation learning; Trajectory optimization; Safety; Training; Recurrent neural networks; Uncertainty; Supervised learning; Robot learning; Real-time systems; Robotic manipulation; Differentiable optimization

Date of this Version

2024
