Research Objective
To propose a more efficient fully convolutional network for dense semantic labeling of high-resolution remote sensing imagery. The network combines the advantages of atrous spatial pyramid pooling (ASPP) with those of an encoder-decoder structure, with the aim of detecting objects at multiple scales more reliably and restoring sharper object boundaries.
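The idea behind ASPP can be illustrated with a minimal one-dimensional sketch: the same input is filtered in parallel by atrous (dilated) kernels whose taps are spaced `rate` samples apart, so each branch sees a different amount of context. Everything below is an assumption made for illustration, not the paper's implementation: the function names, the rates (1, 2, 4), the all-ones kernel, and the use of one shared kernel for every branch. A real ASPP module learns separate 2D weights per branch and fuses the branch outputs.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Same-size 1D atrous (dilated) convolution with zero padding.

    As in most deep learning frameworks, the kernel is not flipped
    (this is cross-correlation).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # taps outside the signal read zero padding
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp1d(signal, kernel, rates):
    """Apply the kernel at several rates in parallel, one branch per rate.

    Simplification (assumption): all branches share one kernel and the
    outputs are returned unfused.
    """
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    signal = [0.0] * 9
    signal[4] = 1.0            # unit impulse in the middle of the signal
    kernel = [1.0, 1.0, 1.0]   # illustrative weights, not learned
    for rate, response in aspp1d(signal, kernel, rates=(1, 2, 4)).items():
        print(rate, effective_kernel_size(len(kernel), rate), response)
```

With a 3-tap kernel, rates 1, 2, and 4 cover spans of 3, 5, and 9 samples respectively, which is how the parallel branches respond to objects of different sizes without adding weights.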
Research Results
The proposed model combines ASPP and an encoder-decoder structure with a multi-scale loss function and achieves higher accuracy on the Potsdam and Vaihingen datasets than the methods it is compared against. It handles multi-scale objects better and produces sharper boundaries, and post-processing refines the results further. The study highlights the potential for improved semantic segmentation in remote sensing applications, but notes that future research needs less labor-intensive annotation methods.
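This summary does not state the form of the multi-scale loss. One common formulation, assumed here purely for illustration, is a weighted sum of pixel-wise cross-entropy losses computed on predictions at several output resolutions. The function names, the per-scale weights, and the toy probabilities below are all hypothetical.

```python
import math


def pixel_cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class over all pixels.

    probs:  one list of class probabilities per pixel
    labels: one true class index per pixel
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def multi_scale_loss(per_scale, weights):
    """Weighted sum of the cross-entropy losses at each scale (assumed form).

    per_scale: list of (probs, labels) pairs, one pair per output resolution
    weights:   one hypothetical weight per scale
    """
    if len(per_scale) != len(weights):
        raise ValueError("need exactly one weight per scale")
    return sum(
        w * pixel_cross_entropy(probs, labels)
        for w, (probs, labels) in zip(weights, per_scale)
    )


if __name__ == "__main__":
    # Two pixels at full resolution, one pixel at half resolution.
    full_res = ([[0.5, 0.5], [0.5, 0.5]], [0, 1])
    half_res = ([[1.0, 0.0]], [0])
    print(multi_scale_loss([full_res, half_res], weights=(1.0, 0.5)))
```

Supervising coarser outputs alongside the full-resolution one is one way a loss can push a network to be correct at several scales; whether the paper weights or combines its scales this way is not stated here.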
Limitations
The method relies on manually annotated ground truth, which is labor-intensive to produce. The network's complexity may demand significant computational resources, and the improvements over state-of-the-art methods are marginal (e.g., accuracy gains of 0.4-0.6%). Future work could explore semi-supervised or weakly supervised methods to reduce the annotation effort.