Discrimination of Coffee Bean Species Based on Aroma Compounds
Applications | 2026 | Shimadzu
Instrumentation: GC/MSD, GC/MS/MS, GC/QQQ, Software
Industries: Food & Agriculture
Manufacturer: Shimadzu
Summary
Significance of the topic
Accurate and rapid discrimination of coffee species (primarily Coffea arabica vs. Coffea canephora, i.e. Robusta) matters for quality control, adulteration detection, flavor profiling, and supply-chain integrity in coffee production and trading. Aroma compound profiling by GC-MS/MS, combined with targeted database matching and chemometric modeling, offers a practical alternative to more laborious approaches (e.g., solvent extraction and NMR-based marker analysis), with a faster workflow, higher sensitivity, and a direct link between chemical markers and sensory descriptors.
Objectives and overview of the study
This application study evaluated whether headspace solid-phase microextraction (HS-SPME) coupled to triple-quadrupole GC-MS/MS can reliably discriminate Arabica and Robusta roasted coffee beans. Specific aims were to: (1) obtain comprehensive volatile profiles using a Smart Aroma Database for identification and MRM quantification; (2) assess separation and characteristic compounds using multivariate statistics (PCA, PLS-DA, clustering); and (3) build and validate a classification model (SVM) to assign unknown samples to species.
Methodology
Sample preparation and acquisition
- Commercial roasted coffee beans from four Arabica brands and two Robusta brands were milled; 1 g aliquots were sealed in screw vials and analyzed in triplicate.
- Volatiles were concentrated by HS-SPME and introduced directly to the GC-MS/MS with no solvent extraction or concentration steps.
Chromatography and mass spectrometry conditions (concise)
- Instrumentation: GCMS-TQ8040 NX triple-quadrupole MS with AOC-6000 autosampler.
- Column: SH-I-5Sil MS (30 m × 0.25 mm I.D., 0.25 µm).
- Injection: splitless, vaporizer 250 °C.
- Oven program: 50 °C (hold 5 min), then 10 °C/min to 250 °C, held until a total run time of 35 min.
- Carrier: helium. Ionization: EI; interface 250 °C; ion source 200 °C.
- Acquisition modes: full-scan (m/z 35–400) for exploratory alignment and targeted MRM using Smart Aroma Database for higher sensitivity.
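As a quick consistency check on the oven program listed above, the run time can be reconstructed from its segments. The sketch below is illustrative arithmetic only; the function name is hypothetical, and it assumes a single linear ramp with whatever time remains after the ramp spent as an isothermal final hold.

```python
def oven_program_minutes(start_c, initial_hold_min, ramp_c_per_min, final_c, total_min):
    """Return (ramp time, final hold time) in minutes for a single-ramp GC oven program."""
    ramp_min = (final_c - start_c) / ramp_c_per_min
    # Assumption: time left after the initial hold and the ramp is the final hold.
    final_hold_min = total_min - initial_hold_min - ramp_min
    if final_hold_min < 0:
        raise ValueError("total run time is shorter than initial hold plus ramp")
    return ramp_min, final_hold_min

# Values taken from the conditions list: 50 °C, 5 min hold, 10 °C/min, 250 °C, 35 min total.
ramp_min, final_hold_min = oven_program_minutes(50, 5, 10, 250, 35)
print(ramp_min, final_hold_min)  # 20.0 10.0
```

Under these assumptions the ramp takes 20 min and the final hold at 250 °C lasts 10 min, which matches the stated 35 min total.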
Data processing and chemometrics
- Peak alignment and exploratory analysis: Signpost MS (alignment assigned ~210 compounds from scan data).
- Targeted MRM identifications and quantification used Smart Aroma Database (contains ~500 aroma compounds with MRM transitions); 178 database compounds were identified in MRM mode and 175 compounds without missing values were used for multivariate modeling.
- Multivariate workflows in eMSTAT Solution: unsupervised PCA, supervised PLS-DA, hierarchical clustering and dendrograms, visualization (loading plots, box plots), and supervised classification using Support Vector Machine (SVM) with leave-one-brand-out validation.
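The variable-selection step described above, in which only compounds without missing values are carried into multivariate modeling, can be sketched in plain Python. This is a minimal illustration and not the eMSTAT Solution implementation: the compound names and peak areas are hypothetical placeholders, and the autoscaling step is an assumed, commonly used pretreatment that the summary does not specify.

```python
from statistics import mean, stdev

# Hypothetical peak-area table: compound -> one value per sample (None = not detected).
peak_areas = {
    "compound_a": [120.0, 130.0, 40.0, 45.0],
    "compound_b": [80.0, 85.0, 30.0, 25.0],
    "compound_c": [10.0, None, 90.0, 95.0],  # missing in one sample
}

def complete_variables(table):
    """Keep only compounds that have a value in every sample."""
    return {name: vals for name, vals in table.items()
            if all(v is not None for v in vals)}

def autoscale(values):
    """Mean-center and scale to unit sample standard deviation (assumed pretreatment)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

kept = complete_variables(peak_areas)
scaled = {name: autoscale(vals) for name, vals in kept.items()}
print(sorted(kept))  # ['compound_a', 'compound_b']
```

In the study, the analogous filter reduced the 178 identified compounds to the 175 used for modeling.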
Instrumentation used
- GCMS-TQ8040 NX triple quadrupole gas chromatograph–mass spectrometer
- AOC-6000 multifunctional autosampler (HS-SPME automation)
- SH-I-5Sil MS GC column (30 m × 0.25 mm I.D., 0.25 µm)
- Smart Aroma Database (for GC-MS and GC-MS/MS; includes MRM conditions and sensory descriptors)
- LabSolutions Insight GCMS (for chromatogram review)
- Signpost MS (alignment and peak assignment)
- eMSTAT Solution (statistical analysis, PCA, PLS-DA, SVM model building)
Main results and discussion
- Exploratory scan data: alignment produced ~210 features and hierarchical clustering separated Arabica and Robusta into distinct clusters, indicating species-specific volatile patterns are present and detectable with HS-SPME GC-MS.
- Targeted MRM data: 178 aroma compounds from the Smart Aroma Database were identified; using 175 complete variables, PCA and PLS-DA produced clear separation by species along PC1 and in supervised models, confirming reproducible chemical differences with improved sensitivity and selectivity in MRM mode.
- Characteristic markers: Arabica samples were relatively enriched in compounds such as 5-methylfurfural, acetoin, and furaneol acetate, which are associated with roasted, caramel, and creamy notes. Robusta was comparatively enriched in p-vinylguaiacol, which is linked to medicinal or clove-like off-notes and bitterness. Both patterns are consistent with the sensory distinctions between the species.
- Visualization and interpretation: eMSTAT Solution box plots and loading plots enabled straightforward identification of compounds contributing most to PC1 separation; these link analytical signals to sensory descriptors from the Smart Aroma Database.
- Classification performance: SVM discrimination models were each built from three of the four Arabica brands plus both Robusta brands, with the remaining Arabica brand withheld for testing (leave-one-brand-out). Every withheld Arabica brand was correctly classified as Arabica with scores ≥85 (several at 100), indicating practical discriminative power for sample-level assignment.
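The leave-one-brand-out design described above can be sketched as a fold generator: each Arabica brand is withheld in turn while the other Arabica brands and both Robusta brands form the training set. Brand labels such as `A1` and `R1` are placeholders, and the classifier itself is deliberately omitted, because the SVM was built in eMSTAT Solution and its settings are not given in this summary.

```python
REPLICATES = 3  # each brand was analyzed in triplicate

# Placeholder brand labels: four Arabica brands and two Robusta brands.
brands = {"A1": "Arabica", "A2": "Arabica", "A3": "Arabica", "A4": "Arabica",
          "R1": "Robusta", "R2": "Robusta"}

# One record per injection: (brand, species, replicate number).
samples = [(b, sp, r) for b, sp in brands.items() for r in range(1, REPLICATES + 1)]

def leave_one_arabica_brand_out(samples):
    """Yield (held_out_brand, train, test), withholding one Arabica brand per fold."""
    arabica_brands = sorted({b for b, sp, _ in samples if sp == "Arabica"})
    for held_out in arabica_brands:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

folds = list(leave_one_arabica_brand_out(samples))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))  # first line: A1 15 3
```

Splitting by brand rather than by injection keeps replicates of the same brand from appearing in both the training and test sets, so each fold tests the model on a brand it has not seen.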
Benefits and practical applications
- Rapid, low-preparation workflow: HS-SPME autosampling eliminates labor-intensive extraction and concentration steps, enabling higher throughput and simpler routine testing.
- High sensitivity and specificity: MRM acquisition combined with a curated aroma database improves compound identification confidence and quantitative reproducibility compared with scan-only approaches.
- End-to-end analytics: integration of instrument control, targeted databases, and user-friendly chemometrics (eMSTAT) allows labs without deep chemometrics expertise to build and validate discrimination models quickly.
- Use cases include authenticity testing (detection of Robusta adulteration in Arabica-labeled products), sensory quality control, raw material verification, and research into aroma–sensory relationships.
Future trends and possibilities for use
- Expanded databases and transferability: enlarging MRM libraries with retention indices and matrix-specific calibrations will improve identification across roasting levels and origins.
- Hybrid approaches: combining volatile profiling with complementary data streams (e.g., lipid markers by NMR or LC-MS, DNA-based species markers) can increase robustness for complex blends or processed products.
- Machine learning advances: larger training sets and more diverse samples will enable more generalizable classifiers (multi-class origin, roast degree, blend fraction estimation) and probabilistic adulteration detection.
- On-site and rapid screening: further automation and miniaturized GC-MS solutions could enable in-field screening at import/export points or roasting facilities.
Conclusion
The study demonstrates that HS-SPME GC-MS/MS with targeted MRM and a curated Smart Aroma Database, combined with user-oriented chemometrics (eMSTAT Solution), provides a rapid, sensitive, and practical method to discriminate Arabica and Robusta roasted coffee beans and to identify characteristic aroma markers. The workflow reduces sample pretreatment, yields reproducible compound lists, and enables reliable classification models suitable for quality control and authenticity screening.
Reference
- Gunning Y., et al. Food Chemistry, 248, 52–60 (2018).
Content was automatically generated from an original PDF document using AI and may contain inaccuracies.