Research Objective
To transfer object position detection learned in a simulation environment to the real world, using only a very limited dataset of real images while leveraging a large dataset of synthetic images.
Research Results
The proposed transfer learning method, built on two VAEs, effectively bridges the gap between synthetic and real images, enabling precise object position detection with an average error of 1.5 mm to 3.5 mm. The method performs well under different lighting conditions, with distractor objects present, and on various backgrounds, and has practical applications in real-world robotic tasks such as pick-and-place.
Research Limitations
The method requires a simulation environment to generate synthetic images and still needs a small dataset of real images for training. Performance may degrade on highly complex or previously unseen object textures and backgrounds.
1: Experimental Design and Method Selection:
The method uses two variational autoencoders (VAEs) to generate common pseudo-synthetic images from synthetic and real images. A convolutional neural network (CNN) is then trained to detect object positions from these common images.
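For concreteness, a minimal PyTorch sketch of the two components is given below: a convolutional VAE that maps an image into the common (pseudo-synthetic) domain, and a CNN that regresses the object position from such an image. The input size (64x64 RGB), latent dimension, layer widths, class names, and the 2-D position output are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (PyTorch), assuming 64x64 RGB inputs and a 2-D position target.
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Convolutional VAE that maps an input image to the common image domain."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

class PositionCNN(nn.Module):
    """CNN that regresses the object position from a common-domain image."""
    def __init__(self, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, out_dim),  # predicted (x, y) position
        )

    def forward(self, x):
        return self.net(x)
```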
2: Sample Selection and Data Sources:
Synthetic images were generated in a Gazebo simulation environment, and real images were captured using a Kinect camera.
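A hypothetical loading sketch for the two image sets is shown below. The directory layout, CSV label format, and column names are assumptions made purely for illustration; images are assumed to already be 64x64 to match the architecture sketched above.

```python
# Hypothetical data layout: synthetic frames exported from Gazebo with their
# ground-truth object positions, and real frames captured with a Kinect,
# each indexed by a CSV with columns: file, x, y.
import csv
from pathlib import Path

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class SimRealDataset(Dataset):
    """Loads (image, position) pairs from an index CSV with columns: file, x, y."""
    def __init__(self, root, index_csv="labels.csv"):
        self.root = Path(root)
        with open(self.root / index_csv, newline="") as f:
            self.rows = list(csv.DictReader(f))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        row = self.rows[i]
        img = read_image(str(self.root / row["file"])).float() / 255.0
        pos = torch.tensor([float(row["x"]), float(row["y"])])
        return img, pos

# Usage sketch: a large synthetic set and a small real set (paths are assumptions).
# synthetic = SimRealDataset("data/gazebo_synthetic")
# real      = SimRealDataset("data/kinect_real")
```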
3: List of Experimental Equipment and Materials:
Gazebo simulation environment, Kinect camera, 3D printed objects, and textured household objects.
4: Experimental Procedures and Operational Workflow:
Train the two VAEs sequentially so that both synthetic and real images are mapped to common images. Train a CNN to detect object positions on the common images generated from the synthetic images. Detect real object positions by passing the real images through the corresponding VAE and the trained CNN, as sketched below.
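The sketch below lays out this sequence under the architecture and dataset classes assumed above: stage 1 trains a VAE on synthetic images, stage 2 trains a second VAE on real images (the paper's mechanism for forcing both VAEs into the same common domain is not reproduced here), stage 3 trains the position CNN on common images generated from synthetic data, and inference passes a real image through its VAE and the CNN. Loss weighting, optimizer settings, and epoch counts are assumptions.

```python
# Workflow sketch of the three training stages and the inference path.
# Assumes the VAE, PositionCNN, and SimRealDataset classes sketched above.
import torch
import torch.nn.functional as F

def vae_loss(recon, target, mu, logvar, beta=1.0):
    """Reconstruction MSE plus KL divergence (beta weighting is an assumption)."""
    rec = F.mse_loss(recon, target, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld

def train_vae(vae, loader, epochs=20, lr=1e-3):
    """Stage 1 / stage 2: train a VAE to reconstruct its input images."""
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(epochs):
        for img, _ in loader:
            recon, mu, logvar = vae(img)
            loss = vae_loss(recon, img, mu, logvar)
            opt.zero_grad(); loss.backward(); opt.step()
    return vae

def train_position_cnn(cnn, vae_sim, loader, epochs=20, lr=1e-3):
    """Stage 3: regress positions from common images generated from synthetic data,
    where ground-truth positions come from the simulator."""
    opt = torch.optim.Adam(cnn.parameters(), lr=lr)
    vae_sim.eval()
    for _ in range(epochs):
        for img, pos in loader:
            with torch.no_grad():
                common, _, _ = vae_sim(img)      # synthetic -> common image
            loss = F.mse_loss(cnn(common), pos)  # position regression loss
            opt.zero_grad(); loss.backward(); opt.step()
    return cnn

@torch.no_grad()
def predict_position(real_img, vae_real, cnn):
    """Inference: real image -> common image via the real-image VAE -> position."""
    common, _, _ = vae_real(real_img.unsqueeze(0))
    return cnn(common).squeeze(0)

# Usage sketch (batch sizes are assumptions):
# sim_loader  = torch.utils.data.DataLoader(synthetic, batch_size=64, shuffle=True)
# real_loader = torch.utils.data.DataLoader(real, batch_size=16, shuffle=True)
# vae_sim  = train_vae(VAE(), sim_loader)
# vae_real = train_vae(VAE(), real_loader)
# cnn      = train_position_cnn(PositionCNN(), vae_sim, sim_loader)
```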
5: Data Analysis Methods:
Mean squared error (MSE) was used to evaluate the similarity between synthetic and real images. Prediction errors were measured to assess the accuracy of object position detection.
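A minimal evaluation sketch follows, assuming positions are expressed in millimetres: image-space MSE between two images (e.g. a synthetic image and its real or pseudo-synthetic counterpart) and the Euclidean position error used to report the accuracy figures.

```python
# Evaluation sketch: image similarity via MSE and position error in millimetres.
import torch
import torch.nn.functional as F

def image_mse(img_a, img_b):
    """Mean squared error between two images with values in [0, 1]."""
    return F.mse_loss(img_a, img_b).item()

def position_error_mm(pred_xy, true_xy):
    """Euclidean distance between predicted and true positions, both in mm."""
    return torch.linalg.norm(pred_xy - true_xy).item()

# Example: average error over a small labelled real test set.
# errors = [position_error_mm(predict_position(img, vae_real, cnn), pos)
#           for img, pos in real_test]
# print(f"mean error: {sum(errors) / len(errors):.1f} mm")
```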