Research Objective
To propose a novel high-level image representation based on image attributes for scene classification, with the aim of reducing the semantic gap between low-level features and high-level scene semantics.
Research Findings
The attribute-based representation significantly outperforms existing methods on scene classification tasks, effectively narrowing the semantic gap and mitigating issues such as the semantic hierarchy problem. Future work will focus on testing the representation in other applications and on developing more efficient network structures.
Research Limitations
The time complexity is higher than that of low-level representations because of the complexity of the network. Performance depends on the quality and size of the training data, and the method may not generalize well to all types of images without further tuning.
1: Experimental Design and Method Selection:
The study uses a deep convolutional neural network (CNN) pre-trained on ImageNet and fine-tuned on the COCO dataset within a multi-label classification framework that uses an element-wise logistic loss function. Attributes are extracted from image captions to form a vocabulary, and max-pooling is applied for feature extraction.
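Below is a minimal NumPy sketch of an element-wise logistic (sigmoid cross-entropy) loss of the kind described above, where each attribute in the vocabulary is treated as an independent binary label; the shapes and toy values are illustrative and not taken from the paper.

```python
import numpy as np

def elementwise_logistic_loss(logits, labels):
    """Element-wise logistic (sigmoid cross-entropy) loss for multi-label
    attribute prediction: each attribute is an independent binary problem.

    logits: (batch, num_attributes) raw network outputs
    labels: (batch, num_attributes) binary indicators (1 = attribute present)
    """
    # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(logits).
    loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

# Toy example: 2 images, 4 attributes.
logits = np.array([[2.0, -1.0, 0.5, -3.0],
                   [0.1,  1.5, -2.0, 0.0]])
labels = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]], dtype=float)
print(elementwise_logistic_loss(logits, labels))
```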
2: Sample Selection and Data Sources:
Four datasets are used: Sports, Indoor, Outdoor, and 15 Scene datasets. The COCO dataset provides images and captions for attribute vocabulary construction.
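The summary does not spell out the full caption-processing pipeline, but the attribute vocabulary is built from COCO caption text; the sketch below illustrates one plausible frequency-based selection step. The captions, `vocab_size`, and `min_word_len` values are hypothetical stand-ins.

```python
from collections import Counter
import re

def build_attribute_vocabulary(captions, vocab_size=256, min_word_len=3):
    """Keep the most frequent caption terms as candidate attributes.
    A real pipeline would also handle stop words, lemmatization, and
    multi-word attributes; this only sketches the frequency step."""
    counts = Counter()
    for caption in captions:
        for token in re.findall(r"[a-z]+", caption.lower()):
            if len(token) >= min_word_len:
                counts[token] += 1
    return [word for word, _ in counts.most_common(vocab_size)]

# Toy captions standing in for COCO annotations.
captions = [
    "a man riding a horse on a beach",
    "two dogs playing with a ball in the grass",
    "a group of people sitting around a wooden table",
]
print(build_attribute_vocabulary(captions, vocab_size=10))
```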
3: List of Experimental Equipment and Materials:
A server with an Intel Xeon Haswell processor, an Intel C612 motherboard chipset, and a TITANXXTREME-12GD graphics card. Software includes the CxxNet framework.
4: Experimental Procedures and Operational Workflow:
Pre-train the CNN on ImageNet, fine-tune it on COCO with the multi-label loss, predict attributes for test images using the fine-tuned model, and classify scenes using linear binary classifiers.
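The last step, training linear binary (one-vs-rest) classifiers on top of the attribute representation, might look like the scikit-learn sketch below; the random arrays stand in for the attribute predictions produced by the fine-tuned CNN, and the choice of LinearSVC with C=1.0 is an assumption rather than the paper's exact setup.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_attributes, n_classes = 200, 50, 256, 8

# Stand-in attribute representations and scene labels (in practice these
# would be the fine-tuned CNN's per-attribute scores for each image).
X_train = rng.random((n_train, n_attributes))
y_train = rng.integers(0, n_classes, n_train)
X_test = rng.random((n_test, n_attributes))
y_test = rng.integers(0, n_classes, n_test)

# Scene classification with linear binary (one-vs-rest) classifiers.
clf = OneVsRestClassifier(LinearSVC(C=1.0))
clf.fit(X_train, y_train)
print("accuracy:", (clf.predict(X_test) == y_test).mean())
```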
5: Data Analysis Methods:
Classification accuracy is evaluated using average class accuracy. Comparisons are made with state-of-the-art methods, and robustness is tested with varying training sample sizes.
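Average class accuracy weights every class equally rather than every sample, which matters when the scene classes are imbalanced; the small sketch below (with made-up labels) shows the metric.

```python
import numpy as np

def average_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (each class weighted equally),
    unlike overall accuracy, which weights classes by frequency."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Toy example with an imbalanced class distribution.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(average_class_accuracy(y_true, y_pred))  # (4/4 + 1/2) / 2 = 0.75
```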