Research Objective
To propose an end-to-end deep learning architecture for semantic segmentation of high spatial resolution remote sensing images, incorporating local and global contextual information to improve accuracy.
Research Findings
The proposed end-to-end framework effectively extracts local and global deep features for semantic segmentation, outperforming state-of-the-art methods on the ISPRS Vaihingen and Potsdam datasets. Auxiliary losses further improve accuracy, demonstrating the network's capability to handle complex urban scenes.
Research Limitations
The input images contain only three bands (infrared, red, green), leaving unused the fourth band available in some high-resolution remote sensing imagery. Digital surface models (DSMs) are not incorporated, so height information that could aid classification is missed.
1: Experimental Design and Method Selection:
The methodology involves adapting the Pyramid Scene Parsing Network (PSPNet) with ResNet-101-v2 for feature learning and a pyramid pooling module for multi-scale global context extraction. Multiple auxiliary losses are added to optimize the network.
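The pyramid pooling module described above can be sketched as follows. This is a minimal NumPy illustration of the idea only: the real PSPNet module also applies a 1x1 convolution to each pooled branch before concatenation, and the bin sizes (1, 2, 3, 6) follow the original PSPNet design, which this summary does not spell out.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        for j in range(bins):
            hs, he = i * h // bins, (i + 1) * h // bins
            ws, we = j * w // bins, (j + 1) * w // bins
            out[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

def upsample_nearest(feat, h, w):
    """Nearest-neighbour upsampling of a (C, bh, bw) grid to (C, H, W)."""
    c, bh, bw = feat.shape
    rows = np.arange(h) * bh // h
    cols = np.arange(w) * bw // w
    return feat[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input features with pooled-and-upsampled context
    branches, giving each position access to multi-scale global context."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)
```

The 1-bin branch carries a single global average per channel, while the finer grids preserve coarse spatial layout; concatenating them with the original features is what injects global context into the per-pixel classification.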
2: Sample Selection and Data Sources:
Two public datasets, Vaihingen and Potsdam from ISPRS, are used. Vaihingen has 16 training and 17 testing tiles; Potsdam has 24 training and 14 testing tiles.
3: List of Experimental Equipment and Materials:
A machine with two Nvidia Titan X GPUs, the Caffe deep learning framework, and a ResNet-101-v2 model pre-trained on ImageNet.
4: Experimental Procedures and Operational Workflow:
Images are divided into patches, augmented with random mirroring, rescaling, and cropping. The network is trained with a 'poly' learning rate policy, momentum, and weight decay. Testing involves predicting labels for entire images.
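The 'poly' learning rate policy mentioned above decays the base rate as lr = base_lr * (1 - iter / max_iter) ** power. A minimal sketch (the function name is ours, and power = 0.9 is the value conventionally used with this policy; the summary does not state the exact hyperparameters):

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """'poly' policy: smoothly decay the learning rate from base_lr to 0
    over max_iter iterations."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power
```

Compared with step decay, the poly schedule shrinks the rate continuously, which is the common choice for training PSPNet-style segmentation networks.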
5: Data Analysis Methods:
Evaluation uses confusion matrix metrics: recall, precision, F1 score, and overall accuracy, with Cochran's Q test for statistical significance.
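The confusion-matrix metrics can be computed as below. This is a sketch with an illustrative function name; rows are assumed to hold reference labels and columns predictions, and the Cochran's Q significance test is not shown.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class recall, precision, and F1 plus overall accuracy from a
    confusion matrix (rows = reference labels, columns = predictions)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP)
    f1 = 2 * precision * recall / (precision + recall)
    overall_acc = tp.sum() / cm.sum()
    return recall, precision, f1, overall_acc
```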