Research Objective
To design a novel multimodal CNN architecture for real-time, high-precision object detection by exploiting complementary input cues in addition to color information alone.
Research Findings
The paper introduces a fast multimodal fusion network for object detection that applies multiscale merging layers at the mid-level to fuse features from two separate modalities. The network outperforms other fusion models in speed and meets the real-time requirement without loss of accuracy. Experiments on the NYUD2 RGB-D dataset show that the fusion network improves mAP by almost 10.0% over a single-modality SSD baseline and processes images in real time at 35 FPS.
Research Limitations
The paper does not explicitly mention the limitations of the research.
1:Experimental Design and Method Selection:
The paper presents a one-stage architecture that fuses multiscale mid-level features from two individual feature extractors for end-to-end object detection.
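To make the fusion design concrete, below is a minimal PyTorch sketch of mid-level multiscale fusion between two modality streams. The tiny backbones, channel sizes, and the concatenate-then-1x1-convolution merging layers are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch (PyTorch): mid-level multiscale fusion of two modality streams.
import torch
import torch.nn as nn

class ModalityBackbone(nn.Module):
    """Tiny stand-in for a per-modality feature extractor returning multiscale features."""
    def __init__(self, in_ch):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # higher-resolution mid-level feature map
        f2 = self.stage2(f1)  # lower-resolution mid-level feature map
        return [f1, f2]

class MidLevelFusionNet(nn.Module):
    """Fuses per-scale features from the RGB and depth streams with merging layers."""
    def __init__(self):
        super().__init__()
        self.rgb_net = ModalityBackbone(in_ch=3)    # color stream
        self.depth_net = ModalityBackbone(in_ch=1)  # depth stream (single-channel here)
        # One merging layer per scale: concatenate both modalities, then reduce channels.
        self.merge = nn.ModuleList([
            nn.Conv2d(64 * 2, 64, kernel_size=1),
            nn.Conv2d(128 * 2, 128, kernel_size=1),
        ])

    def forward(self, rgb, depth):
        rgb_feats = self.rgb_net(rgb)
        depth_feats = self.depth_net(depth)
        fused = [m(torch.cat([r, d], dim=1))
                 for m, r, d in zip(self.merge, rgb_feats, depth_feats)]
        return fused  # multiscale fused maps, to be consumed by SSD-style detection heads

net = MidLevelFusionNet()
out = net(torch.randn(1, 3, 300, 300), torch.randn(1, 1, 300, 300))
print([f.shape for f in out])
```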
2:Sample Selection and Data Sources:
The challenging NYUD2 dataset is used for evaluation, with 795 training images and 654 testing images.
3:List of Experimental Equipment and Materials:
A single Nvidia GTX 1080 GPU and an Intel 7700K CPU are used for experiments.
4:Experimental Procedures and Operational Workflow:
The network is trained in three stages: fine-tuning on RGB images, supervision transfer to the depth stream, and training of the fusion network.
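The supervision-transfer stage is the least standard of the three, so here is a minimal PyTorch sketch of the general technique, assuming a frozen RGB teacher, an untrained depth student, and a feature-mimicking MSE loss; these specifics are assumptions for illustration, not the paper's training recipe.

```python
# Hypothetical sketch (PyTorch) of supervision transfer: a frozen RGB network supervises
# the depth network by feature mimicking. Backbones, loss, and optimizer are illustrative.
import torch
import torch.nn as nn

def make_backbone(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

rgb_net = make_backbone(3)    # stage 1: assumed already fine-tuned on RGB images
depth_net = make_backbone(1)  # stage 2: to be trained from the RGB teacher
for p in rgb_net.parameters():
    p.requires_grad = False   # teacher stays frozen during supervision transfer

optimizer = torch.optim.SGD(depth_net.parameters(), lr=1e-3, momentum=0.9)
mimic_loss = nn.MSELoss()

# One illustrative step on a random paired RGB/depth batch.
rgb, depth = torch.randn(4, 3, 300, 300), torch.randn(4, 1, 300, 300)
with torch.no_grad():
    target = rgb_net(rgb)  # mid-level features from the RGB teacher
loss = mimic_loss(depth_net(depth), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"mimicking loss: {loss.item():.4f}")
# Stage 3 would then train the fusion network end-to-end with the detection loss on NYUD2.
```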
5:Data Analysis Methods:
The accuracy is evaluated with mean average precision (mAP), and the speed is measured as the runtime of a single forward pass of the network.
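A minimal sketch of how a single-forward-pass runtime and an FPS figure can be measured in PyTorch; the stand-in model, warm-up count, and synchronization points are assumptions, not the paper's benchmarking protocol.

```python
# Hypothetical sketch (PyTorch): timing forward passes to report average runtime and FPS.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
x = torch.randn(1, 3, 300, 300, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations to exclude start-up cost
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    n_runs = 100
    for _ in range(n_runs):
        model(x)         # one forward operation of the network
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"avg forward time: {elapsed / n_runs * 1000:.1f} ms, FPS: {n_runs / elapsed:.1f}")
```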