GcDUO: A Batch Processing Framework Integrating PARAFAC and PARAFAC2 for GC×GC-MS Data Analysis
Presentations | 2026 | Institut d’Investigació Sanitària Pere Virgili | MDCW
Instrumentation: GCxGC, GC/MSD, GC/TOF, Software
Industries: Other
Summary
Significance of the Topic
Comprehensive two-dimensional gas chromatography coupled to mass spectrometry (GC×GC-MS) is essential for complex mixture analysis in environmental, petrochemical, food and biological research due to its high separation power and structural information capabilities. However, the resulting data complexity requires advanced computational methods for deconvolution, alignment and compound annotation at scale.
Objectives and Study Overview
This study introduces GcDUO, an open-source framework that integrates PARAFAC and PARAFAC2 tensor decompositions to automate batch processing of GC×GC-MS datasets, aiming to improve non-targeted compound detection, annotation efficiency and reproducibility across samples.
Methodology and Instrumentation
The GcDUO pipeline converts raw GC×GC-MS chromatograms in standard CDF format into four-dimensional data arrays (samples × first-dimension retention time (RT1) × second-dimension retention time (RT2) × m/z). Key processing steps include region-of-interest selection via a watershed segmentation algorithm, noise reduction and baseline correction, followed by:
- PARAFAC decomposition to extract trilinear components representing chromatographic elution profiles and mass spectra.
- Consensus scoring combining spectral dot-product matching and retention index alignment against reference libraries.
- PARAFAC2 modeling to accommodate retention time shifts across batches and refine quantitative profiles.
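The consensus scoring step listed above can be pictured with a small sketch. It is only an illustration under stated assumptions, not GcDUO's actual scoring rule: the function names, the linear retention index (RI) penalty, the 30-unit tolerance and the 0.7/0.3 weighting are all hypothetical choices made for this example. Spectra are represented as plain `{m/z: intensity}` dictionaries.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Normalized dot product of two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ri_score(ri_observed, ri_reference, tolerance=30.0):
    """Score in [0, 1]; falls linearly to 0 at the (assumed) RI tolerance."""
    return max(0.0, 1.0 - abs(ri_observed - ri_reference) / tolerance)


def consensus_score(spectrum, ri_observed, entry, spectral_weight=0.7):
    """Weighted mix of spectral and RI agreement (weights are assumptions)."""
    spectral = cosine_similarity(spectrum, entry["spectrum"])
    retention = ri_score(ri_observed, entry["ri"])
    return spectral_weight * spectral + (1.0 - spectral_weight) * retention


def best_match(spectrum, ri_observed, library):
    """Return the library entry with the highest consensus score."""
    return max(library, key=lambda e: consensus_score(spectrum, ri_observed, e))


# Hypothetical library: two entries with nearly identical spectra but different RI.
library = [
    {"name": "compound A", "ri": 1000.0, "spectrum": {43: 100.0, 57: 80.0, 71: 40.0}},
    {"name": "compound B", "ri": 1100.0, "spectrum": {43: 100.0, 57: 78.0, 71: 42.0}},
]
observed = {43: 95.0, 57: 79.0, 71: 41.0}
match = best_match(observed, 1095.0, library)
print(match["name"])
```

In this toy case the two spectra are almost indistinguishable, so the retention index term decides the match. That is the kind of situation in which combining both sources of evidence helps.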
Instrumentation used:
- Comprehensive GC×GC-MS system
- Data acquisition in CDF file format
- Open-source Python libraries for tensor analysis (e.g. TensorLy, SciPy)
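The PARAFAC step named in the methodology can be sketched as a plain alternating least squares (ALS) fit written with NumPy. This is a minimal illustration, not GcDUO's implementation. It assumes a three-way array arranged as elution time × m/z × samples, uses synthetic data, and applies no non-negativity constraints or convergence checks. The helper names `khatri_rao`, `unfold`, `parafac_als` and `reconstruct` are hypothetical.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product; row index runs over (row of a, row of b)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding with the remaining modes flattened in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def parafac_als(x, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` trilinear (PARAFAC/CP) model to a 3-way array by ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product via the Hadamard identity.
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def reconstruct(factors):
    """Rebuild the 3-way array from its three factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


# Synthetic example: two co-eluting compounds measured in five samples.
rng = np.random.default_rng(42)
t = np.arange(20)
elution = np.stack(
    [np.exp(-0.5 * ((t - 8) / 2.0) ** 2), np.exp(-0.5 * ((t - 12) / 2.0) ** 2)],
    axis=1,
)                                   # elution profiles, shape (20, 2)
spectra = rng.random((30, 2))       # mass spectra, shape (30, 2)
amounts = rng.random((5, 2))        # relative amounts per sample, shape (5, 2)
x = np.einsum("ir,jr,kr->ijk", elution, spectra, amounts)

factors = parafac_als(x, rank=2)
rel_error = np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_error:.2e}")
```

Each column of the three returned factor matrices corresponds to one component: an elution profile, a mass spectrum and a per-sample amount. In practice a maintained routine, such as the PARAFAC implementation in TensorLy, would normally be preferred over a hand-written loop.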
Key Results and Discussion
On public training datasets (FruityBeer and BreathMix), GcDUO achieved performance comparable to commercial software (ChromaTOF) with over 90% compound recall. It accurately annotated 23 out of 27 compounds in FruityBeer (with three library omissions) and all 12 compounds in BreathMix, using retention index consensus to recover low signal-to-noise features. Quantitative peak area measurements showed close agreement, demonstrating robustness against retention time variability and noise.
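The annotation counts quoted above can be turned into percentages with a few lines of arithmetic. The second FruityBeer figure assumes that the three library omissions are among the four compounds that were not annotated and can therefore be left out of the denominator. That reading is an assumption made for this example and is not stated in the summary.

```python
def rate(hits, total):
    """Percentage of annotated compounds."""
    return 100.0 * hits / total


fruitybeer_all = rate(23, 27)             # all 27 expected compounds
fruitybeer_in_library = rate(23, 27 - 3)  # assumed: omit compounds absent from the library
breathmix = rate(12, 12)

print(f"FruityBeer, all compounds:      {fruitybeer_all:.1f}%")
print(f"FruityBeer, library-resolvable: {fruitybeer_in_library:.1f}%")
print(f"BreathMix:                      {breathmix:.1f}%")
```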
Benefits and Practical Applications
- An open-source solution requiring no proprietary software or expert tuning.
- Automated batch processing enhances throughput in non-targeted metabolomics, environmental monitoring and flavor profiling.
- Reproducible annotation and quantitation despite dataset-specific retention shifts.
Future Trends and Opportunities
- Integration with machine learning classifiers for automated compound class prediction.
- Extension to other hyphenated techniques (e.g. LC×LC-MS).
- Development of cloud-based or GUI-enabled versions for broader accessibility.
- Community-driven library sharing and expanded retention index databases.
Conclusions
GcDUO delivers a scalable, open-source framework that combines PARAFAC and PARAFAC2 for robust GC×GC-MS data analysis, matching commercial performance while lowering barriers to entry and supporting reproducible workflows in complex mixture profiling.
References
- Llambrich M., van der Kloet F.M., Sementé L., et al. GcDUO: A Batch Processing Framework Integrating PARAFAC and PARAFAC2 for GC×GC-MS Data Analysis. Briefings in Bioinformatics, 2025. DOI:10.1093/bib/bbaf080
- FruityBeer dataset, DOI:10.7910/DVN/KA5BTU
- BreathMix dataset, DOI:10.5281/zenodo.13947810
Content was automatically generated from an original PDF document using AI and may contain inaccuracies.