Optimum molecular descriptors based on 89 machine learning methods for predicting the recovery rate of pesticides in crops by GC-MS
Posters | 2020 | Agilent Technologies
Instrumentation: GC/MSD, Software
Industries: Food & Agriculture
Manufacturer: Agilent Technologies
Summary
Importance of the Topic
Pesticide residues in crops are a critical food safety concern. Gas chromatography–mass spectrometry (GC-MS) is widely used for routine residue analysis, but recovery rates can fluctuate because of matrix effects. Optimizing the molecular descriptors used for predictive modeling can improve accuracy and efficiency in pesticide quantification.
Objectives and Study Overview
This work aimed to identify the optimal set of molecular descriptors (MDs) for predicting pesticide recovery rates in fruits and vegetables by GC-MS. A total of 178 descriptors derived via the rcdk package were considered, and 89 machine learning regression methods were evaluated to build robust predictive models.
Methodology
- Correlation analysis of all 178 MDs using the Pearson correlation coefficient (threshold r = 0.7) to flag highly correlated pairs.
- Graph clustering via DPClus to visualize descriptor relationships and select representative, weakly correlated MDs.
- Descriptor groups MD-r1a, MD-r1b and MD-r1c defined through sequential selection steps to balance correlation reduction and information retention.
- Regression models trained with selected MD subsets and compared against full descriptor set using prediction error metrics.
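The correlation-and-clustering steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the poster's implementation: the poster used rcdk, corrr and DPClus, whereas this sketch uses only the standard library, substitutes plain connected components for DPClus's density-based clustering, and picks the alphabetically first member of each cluster as its representative. The descriptor names and values are hypothetical, and the poster's MD-r1a/MD-r1b/MD-r1c selection rules are not reproduced.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (sx * sy)


def correlated_pairs(columns, threshold=0.7):
    """Return descriptor pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


def clusters_from_pairs(pairs):
    """Group descriptors into connected components of the correlation graph.

    Assumption: connected components stand in for DPClus here.
    """
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(sorted(component))
    return clusters


def select_descriptors(columns, threshold=0.7):
    """Keep uncorrelated descriptors plus one representative per cluster."""
    clusters = clusters_from_pairs(correlated_pairs(columns, threshold))
    clustered = {name for cluster in clusters for name in cluster}
    unclustered = [name for name in sorted(columns) if name not in clustered]
    # Assumption: the first name in each sorted cluster is its representative.
    representatives = [cluster[0] for cluster in clusters]
    return sorted(unclustered + representatives)


# Hypothetical toy descriptors, not data from the poster.
toy = {
    "md_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "md_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to md_a
    "md_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to the others
}
selected = select_descriptors(toy)
```

In this toy case md_a and md_b form one cluster and md_c stays on its own, so one descriptor is dropped. A real workflow would likely choose representatives by a more informed rule, for example the cluster member most correlated with the recovery rate.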
Instrumentation Used
- GC-MS system for pesticide residue analysis in crop matrices.
- Software: R packages rcdk for MD calculation and corrr for correlation analysis (including its stretch function); DPClus for graph clustering.
Main Results and Discussion
- Correlation analysis identified 118 strongly correlated MDs, which were grouped into 28 clusters; 83 descriptors were selected for modeling.
- Of 89 machine learning methods, 57 showed improved prediction error with descriptor selection, while 32 performed worse.
- Notable improvements observed with bagEarthGCV, projection pursuit regression (ppr), and several sparse modeling algorithms.
- Some simple linear models exhibited degraded performance after descriptor reduction, highlighting method-specific sensitivity.
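The per-method comparison behind these counts amounts to checking, for each regression method, whether its prediction error dropped after descriptor selection. The sketch below shows one way such a tally could be computed. The function name, the method names, and the error values are all hypothetical, and the use of a single error value per method (for example RMSE) is an assumption, since the summary does not name the metric.

```python
def compare_errors(full_errors, reduced_errors):
    """Classify each method by whether descriptor selection lowered its error."""
    improved, worse, unchanged = [], [], []
    for method in sorted(full_errors):
        before = full_errors[method]
        after = reduced_errors[method]
        if after < before:
            improved.append(method)
        elif after > before:
            worse.append(method)
        else:
            unchanged.append(method)
    return {"improved": improved, "worse": worse, "unchanged": unchanged}


# Hypothetical error values for illustration only; not results from the poster.
full = {"method_a": 12.0, "method_b": 11.5, "method_c": 10.0}
reduced = {"method_a": 10.5, "method_b": 9.8, "method_c": 10.9}
summary = compare_errors(full, reduced)
```

Applied to all 89 methods, the lengths of the "improved" and "worse" lists would correspond to the 57 and 32 reported above.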
Benefits and Practical Applications
- Streamlined descriptor set reduces multicollinearity and speeds up model development.
- Enhanced predictive accuracy supports more reliable pesticide residue quantification in QA/QC workflows.
- Generalizable selection protocol can be adapted for other analytes and analytical platforms.
Future Trends and Opportunities
- Integration of deep learning and automated feature engineering for descriptor optimization.
- Extension of the workflow to liquid chromatography–mass spectrometry and other food matrices.
- Development of web-based tools for real-time descriptor selection and model deployment.
Conclusion
A systematic workflow combining correlation analysis and graph clustering effectively reduced descriptor redundancy while preserving predictive power. The selected MDs improved machine learning performance for pesticide recovery prediction by GC-MS, offering a reproducible strategy for analytical method development.
References
- Serino T, Nakamura S, Takigawa Y, Anumol T, Altaf-Ul-Amin M, Kanaya S. Comprehensive Machine Learning Prediction of GC/MS Pesticide Recovery Based on the Molecular Fingerprinting for Food QA/QC; Poster TP-298, ASMS; June 2019.
- Garg A, Tai K. Comparison of Statistical and Machine Learning Methods in Modelling of Data with Multicollinearity. Int J Model Ident Control. 2013;18:295-312.
- Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and Implementation of an Algorithm for Detection of Protein Complexes in Large Interaction Networks. BMC Bioinformatics. 2006;7:207.
Content was automatically generated from an original PDF document using AI and may contain inaccuracies.