Abstract
UniT is an approach to tactile representation learning that uses a VQGAN to learn a compact latent space serving as the tactile representation. The representation is trained on tactile images of a single, simple object, yet it generalizes to unseen objects and can be transferred zero-shot to various downstream tasks, including perception tasks and manipulation policy learning. Our benchmarks on in-hand 3D and 6D pose estimation and a tactile classification task show that UniT outperforms existing visual and tactile representation learning methods. Additionally, UniT's effectiveness in policy learning is demonstrated across three real-world tasks involving diverse manipulated objects and complex robot-object-environment interactions. Through extensive experimentation, UniT is shown to be a simple-to-train, plug-and-play, yet widely effective method for tactile representation learning.
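The sketch below illustrates, under stated assumptions, the general pattern the abstract describes: a pretrained encoder is frozen, its compact latent map is used as the tactile representation, and only a lightweight downstream head (here, a hypothetical pose-regression head) is trained on top. It is not the authors' implementation; the small CNN stands in for a pretrained VQGAN encoder, and all module names are illustrative.

```python
# Minimal sketch (not the authors' code): reuse a frozen, pretrained encoder's
# latent map as a tactile representation for a downstream task head.
import torch
import torch.nn as nn

class FrozenTactileEncoder(nn.Module):
    """Stand-in for a pretrained VQGAN encoder producing a compact latent map."""
    def __init__(self, latent_channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_channels, 4, stride=2, padding=1),
        )
        for p in self.parameters():   # frozen: the representation is reused as-is
            p.requires_grad = False

    def forward(self, x):
        return self.net(x)

class PoseHead(nn.Module):
    """Lightweight downstream head trained on top of the frozen representation."""
    def __init__(self, latent_channels: int = 8, out_dim: int = 6):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(latent_channels, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, z):
        return self.head(z)

if __name__ == "__main__":
    encoder, head = FrozenTactileEncoder(), PoseHead()
    tactile_batch = torch.randn(4, 3, 224, 224)   # dummy tactile images
    with torch.no_grad():
        latents = encoder(tactile_batch)          # compact latent representation
    pose = head(latents)                          # e.g. 6D pose prediction
    print(pose.shape)                             # torch.Size([4, 6])
```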
Keywords
robot sensing systems, training, representation learning, decoding, image reconstruction, visualization, autoencoders, imitation learning, robots, force, force and tactile sensing
Date of this Version
2025
Recommended Citation
Xu, Zhengtong; Uppuluri, Raghava; Zhang, Xinwei; Fitch, Cael; Crandall, Philip Glen; Shou, Wan; Wang, Dongyi; and She, Yu, "UniT: Data Efficient Tactile Representation with Generalization to Unseen Objects" (2025). School of Industrial Engineering Faculty Publications. Paper 20.
https://docs.lib.purdue.edu/iepubs/20
Comments
This is the author-accepted manuscript of "UniT: Data Efficient Tactile Representation with Generalization to Unseen Objects," Zhengtong Xu, Raghava Uppuluri, Xinwei Zhang, Cael Fitch, Philip Glen Crandall, Wan Shou, Dongyi Wang, Yu She, IEEE Robotics and Automation Letters (RAL), 2025. (c) 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. The version of record can be found at DOI: 10.1109/LRA.2025.3559835.