-
Prediction model optimization using full model selection with regression trees demonstrated with FTIR data from bovine milk
Abstract: Predictive modeling is the development of a model that is best able to predict an outcome based on given input variables. Model algorithms are the different processes used to define the functions that transform the data within models. Common algorithms include logistic regression (LR), linear discriminant analysis (LDA), classification and regression trees (CART), naïve Bayes (NB), and k-nearest neighbor (KNN). Data preprocessing options, such as feature extraction and reduction, and model algorithms are commonly selected empirically in epidemiological studies even though these decisions can significantly affect model performance. Accordingly, full model selection (FMS) methods were developed to provide a systematic approach to selecting predictive modeling methods; however, current limitations of FMS, such as its dependency on user-selected hyperparameters, have prevented its routine incorporation into analyses for model performance optimization. Here we present the use of regression trees as an innovative method to apply FMS. Regression tree FMS (rtFMS) requires the development of a model for every combination of predictive modeling method options under consideration. The iterated cross-validation performances of these models are then passed through a regression tree for selection of a final model. We demonstrate the benefits of rtFMS using a milk Fourier transform infrared spectroscopy dataset, wherein we build prediction models for two blood metabolic health parameters in dairy cows, nonesterified fatty acids (NEFA) and β-hydroxybutyric acid (BHBA). The goal for building NEFA and BHBA prediction models is to provide a milk-based screening tool for metabolic health in dairy cattle that can be incorporated automatically in milk analysis routines. These models could be used in conjunction with physical exams, cow-side tests, and other indications to initiate medical intervention.
In contrast to previously reported FMS methods, rtFMS is not a black box, is simple to implement and interpret, has no hyperparameters, and illustrates the relative importance of modeling options. Additionally, rtFMS allows for indirect comparisons among models developed using different datasets. Finally, rtFMS eliminates user bias due to personal preference for certain methods and removes the dependency on published comparisons of methods. Thus, rtFMS provides clear benefits over the empirical selection of data preprocessing options and model algorithms.
Keywords: Prediction model, Fourier-transform infrared spectra, Regression tree, Preprocessing, Full model selection
Updated 2025-09-23 15:23:52
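The rtFMS idea summarized in the abstract above (one model per combination of options, then a regression tree over the cross-validation scores) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the option names in `OPTIONS`, the function `toy_cv_score`, and all score values are hypothetical stand-ins, and only the root split of the tree is computed.

```python
from itertools import product
from statistics import mean

# Hypothetical modeling options; illustrative only, not the paper's grid.
OPTIONS = {
    "preprocessing": ["none", "first_derivative"],
    "feature_reduction": ["none", "pca"],
    "algorithm": ["LR", "LDA", "CART", "NB", "KNN"],
}

def toy_cv_score(combo):
    """Stand-in for an iterated cross-validation score (made-up numbers)."""
    score = 0.60
    if combo["preprocessing"] == "first_derivative":
        score += 0.10
    if combo["feature_reduction"] == "pca":
        score += 0.02
    if combo["algorithm"] in ("LR", "LDA"):
        score += 0.05
    return score

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Find the binary split (option == level) that most reduces the SSE of scores."""
    total = sse([r["score"] for r in rows])
    best = None
    for option, levels in OPTIONS.items():
        for level in levels:
            left = [r["score"] for r in rows if r[option] == level]
            right = [r["score"] for r in rows if r[option] != level]
            if not left or not right:
                continue
            gain = total - sse(left) - sse(right)
            if best is None or gain > best["gain"]:
                best = {"option": option, "level": level, "gain": gain,
                        "mean_left": mean(left), "mean_right": mean(right)}
    return best

# One row per combination of modeling options, each with its (toy) CV score.
rows = []
for values in product(*OPTIONS.values()):
    combo = dict(zip(OPTIONS.keys(), values))
    rows.append({**combo, "score": toy_cv_score(combo)})

root = best_split(rows)
print(root["option"], round(root["gain"], 4))
```

With these made-up scores the root split falls on `preprocessing`, because that option was given the largest effect. In a real analysis the scores would come from actual cross-validation runs, and the tree would be grown further to rank the remaining options.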
-
Tilted Photovoltaic Energy Outputs in Outdoor Environments
Abstract: The direction and environment of photovoltaics (PVs) may influence their energy output. The practical PV performance under various conditions should be estimated, particularly during initial design stages when PV model types are unknown. Previous studies have focused on a limited number of PV projects and required the details of many PV models; furthermore, the resulting models can be sensitive to the specific case. Using 18 projects conducted in 7 locations around the world (latitude 29.5°–51.25°N), we developed polynomials for the crystalline silicon PV energy output from different accessible input variables. A regression tree effectively evaluated the correlations of the outcomes with the input variables and identified those of high importance. The coefficient of determination, which indicates the proportion of variance in the output explained by the inputs, was higher than 0.65 for 14 of the 18 projects when the polynomial was developed using accessible variables such as global horizontal solar radiation. However, individual equations should be derived for horizontal cases, indicating that a universal polynomial for crystalline silicon PVs with a tilt angle in the range 0°–66° can be difficult to develop. The proposed model will contribute to evaluating the performance of PVs with low and medium tilt angles for places of similar climates.
Keywords: real-time estimation, regression tree, photovoltaic efficiency, polynomial, universal model
Updated 2025-09-19 17:13:59
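The abstract above evaluates polynomial models by their coefficient of determination. As a rough illustration of that metric, the sketch below fits a quadratic by least squares and computes R². The input and output values are invented toy numbers (not data from the study), and the function names `fit_polynomial`, `predict`, and `r_squared` are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution; coeffs[i] multiplies x**i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - known) / aug[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Invented toy data: x stands in for an accessible input such as
# global horizontal radiation, y for the energy output (arbitrary units).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 5.9, 17.2, 33.8, 57.1, 85.9]

coeffs = fit_polynomial(xs, ys, degree=2)
r2 = r_squared(ys, [predict(coeffs, x) for x in xs])
print(coeffs, r2)
```

The normal equations are used here only to keep the sketch dependency-free; for higher degrees or more inputs, a numerically stabler solver from a linear algebra library would be the usual choice.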
-
[IEEE IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium - Valencia (2018.7.22-2018.7.27)] The Potential of Sentinel Satellites for Large Area Aboveground Forest Biomass Mapping
Abstract: Estimation of aboveground forest biomass is critical for regional carbon policies and sustainable forest management. Both passive optical remote sensing and active microwave remote sensing can play an important role in the monitoring of forest biomass. In this study, the recently launched Sentinel-2 Multi Spectral Instrument satellite and Sentinel-1 SAR satellite systems were evaluated and integrated to investigate the relative strengths of each sensor for mapping aboveground forest biomass at a regional scale. The Australian state of Victoria, with its wide range of forest vegetation, was chosen as the study area to demonstrate the scalability and transferability of the approach. In this study, aboveground forest biomass (AGB) was defined as the tons of carbon per hectare for the aboveground components (stem, branches, leaves) of all live large trees greater than 10 cm in diameter at breast height over bark (DBHOB). Sentinel-2 and Sentinel-1 data were fused within a machine learning framework using a boosted regression tree model and high-quality ground survey data. Multi-criteria evaluations showed that the two independent and fundamentally different Sentinel satellite systems were together able to provide robust estimates (R² of 0.62, RMSE of 32.2 t C ha⁻¹) of aboveground forest biomass, with each sensor compensating for the weaknesses of the other (cloud perturbations and spectral saturation for Sentinel-2, and sensitivity to ground moisture for Sentinel-1). As archives for Sentinel-2 and Sentinel-1 continue to grow, mapping aboveground forest biomass and dynamics at moderate resolution over large regions should become increasingly feasible.
Keywords: Sentinel-2, machine learning, data fusion, Sentinel-1, Victoria, boosted regression tree model, Australia, biomass estimation
Updated 2025-09-04 15:30:14
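The abstract above fuses two sensors with a boosted regression tree. The sketch below shows the general technique only, as gradient boosting of depth-one trees (stumps) under squared error with two input features. It is not the study's model: the feature values, biomass values, and all function names are invented for illustration.

```python
from math import sqrt

def fit_stump(rows, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are always non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [r for row, r in zip(rows, residuals) if row[f] <= t]
            right = [r for row, r in zip(rows, residuals) if row[f] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    return best[1:]

def stump_predict(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm

def boost(rows, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for row, p in zip(rows, preds)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def rmse(ys, preds):
    return sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

# Invented toy data: (optical index, radar backscatter in dB) -> biomass.
rows = [(0.2, -12.0), (0.4, -10.5), (0.5, -9.0),
        (0.7, -8.5), (0.8, -7.0), (0.9, -6.5)]
ys = [20.0, 45.0, 60.0, 90.0, 110.0, 120.0]

model = boost(rows, ys)
baseline_rmse = rmse(ys, [sum(ys) / len(ys)] * len(ys))
train_rmse = rmse(ys, [predict(model, row) for row in rows])
print(baseline_rmse, train_rmse)
```

The errors printed here are training errors on six toy points, so they say nothing about generalization; a real evaluation would use held-out ground survey plots, as the reported R² and RMSE in the abstract presumably do.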