Comprehensive machine learning prediction of GC/MS pesticide recovery based on the molecular fingerprinting for food QA/QC
Posters | 2019 | Agilent Technologies
Instrumentation: GC/MSD
Industries: Food & Agriculture
Manufacturer: Agilent Technologies
Summary
Significance of the Topic
Pesticide residues in food commodities are strictly regulated to ensure consumer safety. Gas chromatography coupled with mass spectrometry (GC/MS) is a standard technique for detecting trace amounts of pesticides, but recovery rates can vary widely depending on chemical structure, sample matrix, and preparation protocols. Developing predictive models for recovery efficiency can streamline method development and improve QA/QC in food analysis laboratories.
Objectives and Study Overview
This study aimed to construct and evaluate machine learning regression models capable of predicting GC/MS recovery rates of 248 pesticides across seven types of crops. By leveraging molecular descriptors derived from SMILES representations, the work sought to identify optimal algorithms that balance prediction accuracy and computational cost.
Methodology and Instrumentation
- Sample Preparation and GC/MS Analysis: Crops were treated under Japan’s Positive List guidelines, spiked at multiple levels (20, 50, 100, 200 ppb), and analyzed by GC/MS to obtain recovery rates.
- Molecular Descriptor Generation: Unique SMILES strings for the 248 pesticides were retrieved from PubChem. Using the rcdk package in R, 224 descriptors were computed; removing descriptors with missing values left 178.
- Machine Learning Framework: The caret package in R was used to test 89 regression models, including 69 ordinary learning methods (e.g., linear models, kernel approaches, neural networks, PLS variants) and 20 ensemble methods (e.g., random forest, gradient boosting, bagging).
- Performance Metrics: Prediction Error (PE) via 10-fold cross-validation, execution time (ET), and a Generalization Performance Index (PEk) were calculated to rank methods across seven crop matrices.
- Instrumentation: Agilent Technologies GC/MS system; R programming environment; caret and rcdk packages for model building and descriptor assignment.
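The descriptor filtering and cross-validation steps listed above can be sketched in a few lines. The study itself used R with caret and rcdk; the Python below is only an illustration under stated assumptions. The function names, the k-nearest-neighbour placeholder learner, the synthetic data, and the choice of root-mean-square error as the prediction error are all assumptions, since the summary does not give the exact PE definition.

```python
import math
import random
import time


def drop_incomplete_descriptors(rows):
    """Keep only descriptor columns that have a value for every compound."""
    n_cols = len(rows[0])
    keep = [
        j for j in range(n_cols)
        if all(row[j] is not None and not math.isnan(row[j]) for row in rows)
    ]
    return [[row[j] for j in keep] for row in rows], keep


def knn_predict(train_x, train_y, query, k=3):
    """Placeholder learner: average the targets of the k nearest points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_validated_rmse(x, y, n_folds=10, seed=0):
    """Estimate prediction error as the RMSE over held-out folds (assumed PE)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            squared_errors.append((knn_predict(train_x, train_y, x[i]) - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic stand-in data: 60 "compounds", 3 "descriptors", one with gaps.
rng = random.Random(42)
raw, recovery = [], []
for n in range(60):
    d1, d2 = rng.uniform(0, 1), rng.uniform(0, 1)
    gappy = None if n % 7 == 0 else rng.uniform(0, 1)
    raw.append([d1, gappy, d2])
    recovery.append(70 + 20 * d1 - 10 * d2 + rng.gauss(0, 1))

start = time.perf_counter()
descriptors, kept = drop_incomplete_descriptors(raw)
pe = cross_validated_rmse(descriptors, recovery)
et = time.perf_counter() - start
print(f"kept columns: {kept}, PE (RMSE): {pe:.2f}, ET: {et:.4f} s")
```

Timing the whole fit-and-validate loop alongside the error gives the two quantities (PE and ET) that the ranking is based on. Swapping `knn_predict` for another learner leaves the rest unchanged.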
Main Results and Discussion
Correlation analysis found no strong relationship between any single descriptor and recovery rate (Pearson r ranged from -0.254 to 0.523), indicating the need for multivariate models. Execution times ranged from 0.83 to 7,394 seconds. Among ordinary methods, SBC (in the Centroid kNN category) achieved the lowest PE; among ensemble methods, xgbLinear (eXtreme Gradient Boosting Linear) offered superior accuracy at moderate computational cost. The four top performers across all metrics were SBC, xgbLinear, monmlp, and ppr.
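As a rough illustration of the single-descriptor screening described above, the following sketch computes Pearson r for each descriptor column against recovery and reports the range. The descriptor names and all numbers are hypothetical toy values, not data from the study, and the helper names are assumptions introduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_range(descriptor_columns, recovery):
    """Return (min r, max r) over all descriptor columns."""
    rs = [pearson_r(col, recovery) for col in descriptor_columns.values()]
    return min(rs), max(rs)


# Hypothetical toy values, not data from the study.
toy_descriptors = {
    "descriptor_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "descriptor_b": [2.0, 1.0, 4.0, 3.0, 5.0],
}
toy_recovery = [72.0, 75.0, 71.0, 80.0, 78.0]

low, high = correlation_range(toy_descriptors, toy_recovery)
print(f"r ranges from {low:.3f} to {high:.3f}")
```

When every per-descriptor r stays well away from ±1, as reported in the study, no single descriptor explains recovery on its own, which is the argument for moving to multivariate models.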
Benefits and Practical Applications
- Reduces empirical trial-and-error in method development by preselecting high-performance algorithms.
- Enhances reliability and reproducibility of pesticide analysis in QA/QC laboratories.
- Streamlines resource allocation by highlighting models that balance speed and prediction quality.
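One way to make "balancing speed and prediction quality" concrete is to keep only the models that no other model beats on both criteria (a Pareto front). This is a sketch of that idea, not the ranking procedure used in the poster. The model names and the (PE, ET) pairs below are hypothetical.

```python
def pareto_front(results):
    """Return names of models not dominated on (prediction error, execution time).

    A model is dominated if another model is at least as good on both
    criteria and strictly better on at least one (lower is better for both).
    The result is sorted by ascending prediction error.
    """
    front = []
    for name, (pe, et) in results.items():
        dominated = any(
            o_pe <= pe and o_et <= et and (o_pe < pe or o_et < et)
            for other, (o_pe, o_et) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: results[n][0])


# Hypothetical (PE, ET in seconds) pairs; not values from the study.
toy_results = {
    "model_a": (9.5, 1.2),
    "model_b": (8.1, 40.0),
    "model_c": (8.4, 900.0),
    "model_d": (12.0, 0.9),
}
print(pareto_front(toy_results))
```

Here `model_c` drops out because `model_b` is both more accurate and faster. The remaining models each represent a different trade-off, so the final choice depends on how much run time a laboratory is willing to spend.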
Future Trends and Opportunities
- Expansion to deep learning architectures and automated feature selection to capture complex structure–recovery relationships.
- Incorporation of larger, more diverse food matrices and pesticide classes for broader model generalization.
- Development of real-time prediction tools integrated with chromatographic software for on-the-fly method optimization.
Conclusion
A comprehensive machine learning approach was established to predict GC/MS pesticide recovery in food matrices using molecular fingerprints. The study identified SBC and xgbLinear as optimal regression methods, providing a data-driven strategy for accelerating analytical method development in food safety applications.