Research Objective
To propose a novel computational method for spectral identification that combines two elements: (a) careful selection, preprocessing, and augmentation of the training dataset, and (b) a machine learning method that classifies multi-dimensional, noisy data effectively while resisting overfitting.
Research Findings
The proposed method, particularly with Extra Trees, exceeds the best classification accuracy previously reported on the corrected RRUFF data. This indicates that ensemble-based methods can efficiently learn and discriminate the subtle differences among chemical mixtures in spectral samples without a heavy computational burden.
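As a rough illustration of the kind of classifier named above, the following minimal sketch trains scikit-learn's `ExtraTreesClassifier` on synthetic one-peak spectra. It is not the authors' pipeline: the data generator `make_spectrum`, the three peak positions, the noise level, and the hyperparameters are all assumptions chosen for illustration, and no RRUFF data is used.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split


def make_spectrum(center, rng, n_points=200, noise=0.05):
    """Synthetic one-peak spectrum: a Gaussian at `center` plus white noise."""
    x = np.arange(n_points)
    peak = np.exp(-0.5 * ((x - center) / 4.0) ** 2)
    return peak + rng.normal(0.0, noise, n_points)


rng = np.random.default_rng(0)

# Hypothetical classes: each label is defined only by its peak position.
centers = {0: 50, 1: 100, 2: 150}
samples_per_class = 40

# Each sample gets a small random peak jitter of -2..2 bins.
X = np.array([
    make_spectrum(center + rng.integers(-2, 3), rng)
    for center in centers.values()
    for _ in range(samples_per_class)
])
y = np.array([label for label in centers for _ in range(samples_per_class)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Extra Trees: an ensemble of randomized decision trees.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real mineral spectra have many overlapping peaks and far more classes, so accuracy on this toy problem says nothing about the reported results; the sketch only shows the shape of the training and evaluation loop.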
Limitations
The method's performance depends on the quality and representativeness of the training dataset. The preprocessing and augmentation steps require careful tuning so that they do not alter the identity of the compound represented by a given peak.
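The tuning concern can be made concrete with a small sketch of a bounded augmentation, assuming that a shift of a few bins and low-amplitude noise leave the compound identifiable. The function names `augment` and `peak_index`, the `max_shift` and `noise_sd` parameters, and the triangular test spectrum are hypothetical; the source does not specify which augmentations were used.

```python
import random


def augment(spectrum, max_shift=2, noise_sd=0.01, rng=None):
    """Return a copy shifted by at most `max_shift` bins with Gaussian noise added."""
    rng = rng or random.Random()
    shift = rng.randint(-max_shift, max_shift)
    n = len(spectrum)
    out = []
    for i in range(n):
        # Clamp at the edges instead of wrapping around the spectrum.
        j = min(max(i - shift, 0), n - 1)
        out.append(spectrum[j] + rng.gauss(0.0, noise_sd))
    return out


def peak_index(spectrum):
    """Index of the highest-intensity bin."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)


# Hypothetical spectrum: a single triangular peak centered at bin 20.
spectrum = [max(0.0, 1.0 - abs(i - 20) / 5.0) for i in range(40)]

augmented = augment(spectrum, max_shift=2, noise_sd=0.01, rng=random.Random(0))
drift = abs(peak_index(augmented) - peak_index(spectrum))
print(f"peak moved by {drift} bin(s)")
```

Checking the peak drift after augmentation is one simple guard: if `max_shift` or `noise_sd` is set too high, the dominant peak can move or be swamped, and the augmented sample may no longer represent the same compound.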