Research Objective
To address the mislabeling problem in satellite image data by proposing an ensemble margin-based approach for identifying, eliminating, and correcting mislabeled training data.
Research Findings
The margin-based approach to handling mislabeled data is significantly more effective than the majority vote method for both removing and correcting mislabeled data. Future work will investigate mislabeling and class imbalance, two major training data issues, within the same ensemble margin framework.
Limitations
The method's effectiveness may be limited by the imbalance ratio of the dataset and by the actual mislabeling rate, which is unknown and assumed to be lower than the artificial mislabeling rate considered.
1:Experimental Design and Method Selection:
The study uses random forests as a robust ensemble classifier and relies on a scheme that iteratively recalculates the margins of the training instances.
2:Sample Selection and Data Sources:
Four satellite image datasets were used, each divided into training, validation, and test sets. Artificial label noise was injected by randomly selecting a 20% subset of each original dataset and corrupting its labels.
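The noise injection step can be sketched as follows. This is a minimal illustration, not the authors' code: the summary states only that a random 20% subset is chosen, so replacing each selected label with a uniformly chosen different class is an assumption, and the function name `inject_label_noise` and its parameters are hypothetical.

```python
import random


def inject_label_noise(labels, classes, rate=0.2, seed=0):
    """Return (noisy_labels, corrupted_indices).

    A random subset of size rate * len(labels) is selected and each
    selected label is replaced by a different class.
    ASSUMPTION: the replacement class is drawn uniformly from the other
    classes; the source does not say how corrupted labels are chosen.
    """
    rng = random.Random(seed)
    n_corrupt = int(round(rate * len(labels)))
    corrupted = sorted(rng.sample(range(len(labels)), n_corrupt))
    noisy = list(labels)
    for i in corrupted:
        alternatives = [c for c in classes if c != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, corrupted
```

Keeping the list of corrupted indices makes it possible to check later how many of the injected errors a filter actually finds.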
3:List of Experimental Equipment and Materials:
Random forests and boosting ensembles were implemented with 100 pruned trees.
4:Experimental Procedures and Operational Workflow:
The method involves calculating the ensemble margin of each training instance, ordering misclassified instances according to their margin values, and iteratively removing or correcting mislabeled data.
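One pass of this workflow might look like the sketch below. It rests on several assumptions that the summary does not confirm: the margin is taken as the votes for the given label minus the largest vote count for any other class, divided by the number of trees; misclassified instances are those with a negative margin; the most negative margins are handled first; and correction relabels an instance to the class the ensemble voted for most. All function names are hypothetical, and retraining the ensemble between iterations is left out.

```python
from collections import Counter


def ensemble_margin(votes, label):
    """Margin in [-1, 1]: (votes for label - max votes for another class) / n_trees.

    ASSUMPTION: this is one common margin definition; the source does
    not state which definition the study uses.
    """
    counts = Counter(votes)
    v_label = counts.get(label, 0)
    v_other = max((n for c, n in counts.items() if c != label), default=0)
    return (v_label - v_other) / len(votes)


def rank_suspects(all_votes, labels):
    """Indices of misclassified instances, most negative margin first."""
    suspects = []
    for i, (votes, y) in enumerate(zip(all_votes, labels)):
        m = ensemble_margin(votes, y)
        if m < 0:  # the ensemble prefers a class other than the given label
            suspects.append((m, i))
    suspects.sort()
    return [i for _, i in suspects]


def filter_step(all_votes, labels, n_handle, mode="remove"):
    """Remove or correct the n_handle most suspicious instances.

    Returns (kept_indices, labels_of_kept_instances).
    """
    targets = set(rank_suspects(all_votes, labels)[:n_handle])
    if mode == "remove":
        keep = [i for i in range(len(labels)) if i not in targets]
        return keep, [labels[i] for i in keep]
    corrected = list(labels)
    for i in targets:
        # Relabel to the class with the most votes (an assumed correction rule).
        corrected[i] = Counter(all_votes[i]).most_common(1)[0][0]
    return list(range(len(labels))), corrected
```

In an iterative scheme, the ensemble would be retrained on the output of `filter_step` and the margins recomputed before the next pass.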
5:Data Analysis Methods:
The effectiveness of the method is assessed by comparing the classification accuracy of boosting on test sets with and without mislabeled data filtering.
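The comparison reduces to a difference of two test accuracies, as in the sketch below. The function names are hypothetical, and the predictions are assumed to come from boosting models trained elsewhere on the unfiltered and filtered training sets.

```python
def accuracy(y_true, y_pred):
    """Fraction of test instances whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def filtering_gain(y_test, pred_unfiltered, pred_filtered):
    """Test accuracy with filtering minus test accuracy without it.

    A positive value means the model trained on the filtered (or
    corrected) training set performed better on the test set.
    """
    return accuracy(y_test, pred_filtered) - accuracy(y_test, pred_unfiltered)
```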