Research Objective
To identify robust features at the various layers of a neural network using Random Forests, evaluate their classification performance in isolation, propose methods for assessing feature saliency, and validate intuitions about the features a network learns.
Research Findings
The study demonstrates that the number of features in neural networks can be reduced by up to 50% without significant loss of accuracy, indicating redundant (repeated) information. Outer layers show higher discriminatory power, and pooling layers concentrate information. Random Forests trained on CNN features can outperform the CNN itself in some cases, providing tools for network analysis and potential improvements in training efficiency.
Research Limitations
The experiments are limited to two specific datasets (Hand and MNIST) and their network architectures, so the feature reduction methods may not generalize to all types of neural networks or tasks. The computational cost of training multiple Random Forests could also be high for very large datasets.
1:Experimental Design and Method Selection:
The methodology uses Guided Regularized Random Forests (GRRF) and shadow features to evaluate feature saliency in trained neural networks. The theoretical grounding includes the entropy-based split criteria of Random Forests and statistical tests such as the Wilcoxon rank-sum test.
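The shadow-feature idea can be sketched with scikit-learn's plain `RandomForestClassifier` (GRRF itself is not available in scikit-learn). All data below is a synthetic stand-in, and the permutation-based shadow construction is one common variant (as in Boruta-style selection), not necessarily the paper's exact procedure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for features extracted from a network layer:
# two informative columns (0 and 1) plus pure noise.
n, d = 300, 6
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Shadow copies: same marginal distribution per column, but row-shuffled
# so any association with the labels is destroyed.
X_shadow = np.apply_along_axis(rng.permutation, 0, X)
X_aug = np.hstack([X, X_shadow])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
real_imp = rf.feature_importances_[:d]
shadow_imp = rf.feature_importances_[d:]

# A real feature "survives" if its importance beats the strongest shadow.
survivors = np.where(real_imp > shadow_imp.max())[0]
print(survivors)
```

The surviving set should contain the two informative columns; noise columns land near the shadow baseline and are filtered out.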
2:Sample Selection and Data Sources:
Two datasets are used: a Hand dataset with 105,000 training and 10,000 test samples captured with a Time of Flight camera, and the MNIST handwritten digit dataset. Features are extracted from various layers of pre-trained CNNs (e.g., LeNet for MNIST and a custom network for Hand data).
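The per-layer evaluation can be sketched as follows, training one Random Forest per layer's feature matrix and comparing held-out accuracy. The layer names, matrices, and signal strengths are hypothetical stand-ins for activations extracted from a trained CNN, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical per-layer activations for a 3-class toy problem;
# later layers carry a stronger class signal in this sketch.
n = 600
y = rng.integers(0, 3, size=n)
layers = {
    "conv1": rng.normal(size=(n, 32)) + y[:, None] * 0.1,
    "pool1": rng.normal(size=(n, 16)) + y[:, None] * 0.3,
    "fc1":   rng.normal(size=(n, 8))  + y[:, None] * 0.6,
}

# One forest per layer, evaluated in isolation on a held-out split.
acc = {}
for name, X in layers.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    acc[name] = rf.score(X_te, y_te)
print(acc)
```

Comparing the per-layer accuracies in this way mirrors the study's question of how much discriminatory power each layer carries on its own.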
3:List of Experimental Equipment and Materials:
A Time of Flight camera for the Hand dataset; no specific models or brands are mentioned. Computational resources for training neural networks and Random Forests.
4:Experimental Procedures and Operational Workflow:
Networks are trained for 10,000 iterations with batch size 5. Features from all layers are extracted for the training sets. GRRF is applied with varying γ parameters to select feature subsets. Shadow features are created by duplicating features and randomizing labels. Random Forests are trained and evaluated for accuracy and feature survival rates.
5:Data Analysis Methods:
Accuracy is measured for the classification tasks, and feature survival rates are calculated per layer. Statistical analysis uses the Wilcoxon rank-sum test to obtain p-values comparing real and shadow features.
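The rank-sum comparison can be reproduced with `scipy.stats.ranksums`. The importance scores below are synthetic stand-ins for the real and shadow feature scores a forest would produce:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)

# Hypothetical importance scores for real features vs. their shadow
# counterparts (illustrative values, not from the paper).
real_scores = rng.normal(loc=0.08, scale=0.02, size=30)
shadow_scores = rng.normal(loc=0.02, scale=0.01, size=30)

# Two-sided Wilcoxon rank-sum test: a small p-value indicates the real
# scores are not drawn from the same distribution as the shadow scores.
stat, p_value = ranksums(real_scores, shadow_scores)
print(p_value)
```

A significant p-value supports the claim that the selected features carry genuine label information rather than noise.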