Automating and Extending Comprehensive Two-Dimensional Gas Chromatography Data Processing by Interfacing Open-Source and Commercial Software
- Photo: Analytical Chemistry 2020 92 (20), 13953-13960: graphical abstract
In a research article published in the ACS journal Analytical Chemistry, researchers from the University of Leicester, UK, and the Leicester NIHR Biomedical Research Center describe a new approach to comprehensive two-dimensional gas chromatography data processing that integrates free and proprietary tools within a single platform.
Comprehensive two-dimensional gas chromatography (GC×GC) is an advanced analytical method used for detailed chemical analysis, but it produces complex datasets that are difficult to manage. The new approach uses an underutilized interface in commercial software, the command line, to integrate free and open-source tools with proprietary ones within a single platform, tailoring the workflow to the individual researcher. The integrated system was tested on a large-scale GC×GC metabolomics dataset: bespoke and published external algorithms, run from within the commercial software, automatically corrected retention-time variation, reducing it on average from 8 and 16% CV to less than 1 and 2% in the first and second dimensions, respectively. The interface not only facilitates workflow automation but also increases the connectivity among chemometric tools, opening the way to integration with broader data management systems.
The original article
Automating and Extending Comprehensive Two-Dimensional Gas Chromatography Data Processing by Interfacing Open-Source and Commercial Software
Michael J. Wilde, Bo Zhao, Rebecca L. Cordell, Wadah Ibrahim, Amisha Singapuri, Neil J. Greening, Chris E. Brightling, Salman Siddiqui, Paul S. Monks, and Robert C. Free
Analytical Chemistry 2020 92 (20), 13953-13960
DOI: 10.1021/acs.analchem.0c02844
licensed under CC-BY 4.0
Abstract
Comprehensive two-dimensional gas chromatography (GC×GC) is a powerful analytical tool for both nontargeted and targeted analyses. However, there is a need for more integrated workflows for processing and managing the resultant high-complexity datasets. End-to-end workflows for processing GC×GC data are challenging and often require multiple tools or software to process a single dataset. We describe a new approach, which uses an existing underutilized interface within commercial software to integrate free and open-source/external scripts and tools, tailoring the workflow to the needs of the individual researcher within a single software environment. To demonstrate the concept, the interface was successfully used to complete a first-pass alignment on a large-scale GC×GC metabolomics dataset. The analysis was performed by interfacing bespoke and published external algorithms within a commercial software environment to automatically correct the variation in retention times captured by a routine reference standard. Variation in ¹tR and ²tR was reduced on average from 8 and 16% CV prealignment to less than 1 and 2% postalignment, respectively. The interface enables automation and creation of new functions and increases the interconnectivity between chemometric tools, providing a window for integrating data-processing software with larger informatics-based data management platforms.
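The alignment result above is reported as a coefficient of variation (CV) of retention times. As a minimal sketch of how such a figure can be computed, the snippet below takes the retention times of one reference compound across several runs and returns the CV as a percentage. The retention-time values are invented for illustration, and the use of the sample standard deviation is an assumption; the article does not state which estimator was used.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Return the coefficient of variation (sample SD / mean) as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate variation")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical retention times (minutes) of one reference compound across
# five runs, before and after alignment. These numbers are illustrative only.
before = [10.0, 11.0, 9.5, 10.8, 9.2]
after = [10.0, 10.05, 9.98, 10.02, 9.97]

print(f"prealignment CV:  {percent_cv(before):.1f}%")
print(f"postalignment CV: {percent_cv(after):.1f}%")
```

Computing the CV per reference compound and per dimension, then averaging, would give summary figures comparable in form to those quoted in the abstract.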
Introduction
Advanced analytical technologies are revealing new chemical complexities in our environment (e.g., wastewaters and air quality), our bodies (e.g., metabolome and microbiome), and our food and commodities (e.g., packaging, materials, and petrochemicals). Comprehensive two-dimensional gas chromatography (GC×GC) is a technique that (theoretically) affords unparalleled separation of volatile and semivolatile matrices, providing a powerful analytical tool for both nontargeted and targeted analyses. (1−3) The increased peak capacity and separation of individual compounds within complex mixtures can benefit studies focused on biomarker discovery and signature profiling, such as global metabolomics. (4−7) Targeted methods for screening, such as biomonitoring persistent organic pollutants, benefit from less-extensive sample preparations and increased confidence in chemical assignment. (8,9)
Much work has been done to advance the hardware to make the technique more accessible and affordable and reach new chromatographic optima. (1,10,11) However, the complexity of the resultant datasets makes it difficult to automate, reproduce, and share data and data workflows. Increased adoption of GC×GC for routine analyses and for larger scale discovery analysis is at risk from the lagging development of more integrated platforms for data processing, management, and storage.
Commercially available pieces of software for processing GC×GC data (coupled to both univariate, e.g., flame ionization detectors, and multivariate detectors, e.g., mass spectrometric detectors, herein described collectively as “GC×GC” data) are powerful. They have high functionality and advanced graphical user interfaces (GUIs) (Table S1). However, despite efforts to make the software user friendly (e.g., by introducing guided workflows), users can require extensive periods of learning (with one or two dedicated experienced users per group). They are also expensive and often instrument-specific, restricting analysts to one type of software, and the data output may be incompatible with other software or require time-consuming export steps.
Furthermore, commercial or original equipment manufacturer (OEM) software for processing GC×GC data is not flexible enough for the different workflows that exist within the various applications of GC×GC. Although it is not economically viable for vendors to create custom functions on an individual basis, researchers need tools that maintain a level of flexibility to meet the demands of different projects and experimental designs. The locked-down nature of proprietary systems, although some are scriptable, makes it difficult to embed them in custom data-processing pipelines.
For instance, the processing of chromatographic data for the generation of a peak table (i.e., data matrix), including baseline correction, peak detection, and the critical step of alignment, is often only one part of a wider workflow (Figure 1). These steps can be preceded by sample collection and quality control steps, followed by multivariate analysis and compound identification, finally uploading the data to a data management platform or repository (Figure 1).
Analytical Chemistry - Anal. Chem. 2020, 92, 20, 13953-13960 - Figure 1. Flow diagram depicting the complexity and layers of GC×GC data processing within a wider data workflow. Preprocessing includes steps such as baseline correction and signal smoothing; postprocessing includes steps such as feature extraction/peak detection; data reduction can include application of multivariate analysis and machine learning techniques.
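To make the idea of chained processing stages concrete, here is a deliberately simplified sketch that turns raw signals into a peak table. It works on one-dimensional toy signals rather than real GC×GC data, uses a crude minimum-subtraction baseline and a local-maximum peak picker, and leaves out alignment to stay short. All function names and sample values are hypothetical and do not come from the article or from any GC×GC software.

```python
def baseline_correct(signal):
    """Subtract the minimum so the lowest point sits at zero (crude baseline)."""
    floor = min(signal)
    return [x - floor for x in signal]


def detect_peaks(signal, threshold):
    """Return indices of local maxima whose height exceeds the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            peaks.append(i)
    return peaks


def build_peak_table(signals, threshold):
    """Run every sample through the same stages.

    Returns {sample name: [(index, corrected height), ...]}.
    """
    table = {}
    for name, raw in signals.items():
        corrected = baseline_correct(raw)
        table[name] = [(i, corrected[i]) for i in detect_peaks(corrected, threshold)]
    return table


# Toy detector traces for two hypothetical samples.
signals = {
    "sample_a": [5, 5, 9, 5, 5, 12, 6, 5],
    "sample_b": [2, 2, 2, 7, 2, 2, 2, 2],
}
print(build_peak_table(signals, threshold=2))
```

Because each stage is an ordinary function, any one of them could be swapped for an external tool without changing the rest of the chain, which is the kind of flexibility the workflow in Figure 1 calls for.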
Several chemometric methods using free and open-source software (FOSS) or bespoke tools and scripts have been developed for processing GC×GC data (Table S1). (12) Such tools overcome issues of accessibility and cost; they are often compatible with multiple file types for wider adoption; and the open-source programming language allows modification by the user.
Nonetheless, poor-quality supporting documentation and a lack of user-friendly GUIs can make it very time-consuming to determine how to use their functions. They can also be niche to certain application areas or address only one task or specific part of a workflow, for example, an algorithm for performing alignment or peak detection. These tools are a great resource that enhances the analyst’s tool kit; however, their number only increases the complexity of the task faced by analysts in choosing the optimum workflow for their data. This means that the user might have to swap between multiple pieces of software to process a single dataset. In addition, they are often specific to one programming language (unless produced as libraries with wrappers in different languages); not all are readily accessible (i.e., some are published as “in-house”); and although the package or script may be made freely available, the environment/platform it runs on may not be.
Consequently, analysts report unique combinations of both commercial software and FOSS. The powerful core functions of commercial software are used as a foundation (acquiring and converting raw data files at minimum) and then expanded on using external FOSS alongside manual editing to produce a data workflow. (4,5,13−19) This is done in discrete laborious steps, which are difficult to report and reproduce.
Herein, we provide the first description of interfacing open-source and commercial software for processing GC×GC data. The objectives were to demonstrate the use of an underused interface that exists within commercial software (i) for automating steps that were not previously possible within the software GUI, (ii) for creating new functions, (iii) for integrating open-source and bespoke code with commercial software functions, and (iv) to highlight its future potential for increasing interconnectivity between chemometric tools.
To keep the proof of concept simple, the command-line interface was used in a popular piece of software for GC×GC data processing. We illustrate its use for automating the alignment of a metabolomics dataset as an example to provide analysts with the knowledge needed to accelerate and automate their own workflows. The new method makes it possible to combine the powerful functionality of commercial software and the flexibility of FOSS to produce a custom data workflow within a single software environment, tailoring the commercial software and workflow to the needs of the individual researcher.
Experimental Details
Analysis by GC×GC was conducted as described previously using an Agilent 7890A gas chromatograph fitted with a G3486A CFT flow modulator. (20) The instrument was coupled to a TD-100xr thermal desorption autosampler (Markes International Ltd, Llantrisant, UK). Sample tubes were placed in trays, typically six sample tubes per tray along with one tube loaded with the n-alkane and aromatic mixture and, once every four trays, a further tube loaded with a reference indoor air mixture. (20)
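A reference mixture run alongside the samples gives a set of known compounds whose retention times can anchor a correction. The sketch below shows one simple way such anchors could be used: piecewise-linear interpolation that maps an observed retention time onto a target scale. This is an illustrative technique chosen for brevity, not the algorithm used in the article; it handles a single dimension only, and the anchor values are made up.

```python
from bisect import bisect_right


def align_retention_time(t, observed_refs, target_refs):
    """Map retention time t onto the target scale.

    observed_refs: reference-compound retention times in this run (ascending).
    target_refs:   retention times of the same compounds on the target scale.
    Values outside the anchor range are extrapolated from the nearest segment.
    """
    if len(observed_refs) != len(target_refs) or len(observed_refs) < 2:
        raise ValueError("need two or more matched reference anchors")
    if any(b <= a for a, b in zip(observed_refs, observed_refs[1:])):
        raise ValueError("observed_refs must be strictly increasing")
    # Find the segment containing t, clamped to the first or last segment.
    i = bisect_right(observed_refs, t) - 1
    i = max(0, min(i, len(observed_refs) - 2))
    x0, x1 = observed_refs[i], observed_refs[i + 1]
    y0, y1 = target_refs[i], target_refs[i + 1]
    return y0 + (t - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical anchors: three reference compounds, observed vs. target (minutes).
observed = [5.0, 10.0, 20.0]
target = [5.5, 10.0, 19.0]

for rt in (7.5, 15.0):
    print(rt, "->", align_retention_time(rt, observed, target))
```

Each anchor maps exactly onto its target value, and peaks between anchors are shifted in proportion to their position within the segment.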
Exhaled volatile organic compounds (VOCs) were collected from breath using the ReCIVA device (Owlstone Medical, Cambridge, UK) within a prospective real-world observational study involving adults presenting with self-reported acute breathlessness. (20,21) Written informed consent was obtained from all participants. The study protocol was approved by the National Research Ethics Service Committee East Midlands (REC number: 16/LO/1747) IRAS 198921.
Data were acquired in MassHunter GC–MS Acquisition B.07.04.2260 (Agilent Technologies Ltd, Stockport, UK), and the data were processed using the command-line interface within the GC Image v2.8 suite (GC Image, LLC, Lincoln, NE, USA). Step-by-step examples of the data analysis are described in further detail. The files used in the exemplar methods have been made available with supporting documentation via a repository at https://github.com/rcfgroup/gc-automation.
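Driving software through a command-line interface generally comes down to launching one command per data file and checking that it succeeded. The following is a minimal, generic sketch of that pattern in Python. It does not reproduce the GC Image command syntax, which is not given in this text: the command is supplied by the caller, the placeholder simply invokes the Python interpreter, and the file names are hypothetical. The real commands should be taken from the vendor documentation or the repository linked above.

```python
import subprocess
import sys
from pathlib import Path


def run_batch(input_files, build_command):
    """Run one external command per input file and collect the exit codes.

    build_command is a caller-supplied function that turns a file path into an
    argument list, so no vendor-specific syntax is hard-coded here.
    """
    results = {}
    for path in input_files:
        completed = subprocess.run(
            build_command(Path(path)), capture_output=True, text=True
        )
        results[str(path)] = completed.returncode
    return results


def placeholder_command(path):
    """Stand-in command: the Python interpreter echoing the file name.

    Replace with the real command line for the software being automated.
    """
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", str(path)]


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the loop.
    print(run_batch(["run_001.dat", "run_002.dat"], placeholder_command))
```

Keeping the command construction separate from the loop means the same runner can call commercial software, an open-source script, or both in sequence, and a nonzero exit code flags the file that needs attention.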
...
Conclusions
GC×GC workflows are challenging and often cannot be done in a single software environment. Through the use of an interface, such as the command line, users have the ability to create more powerful and flexible workflows by being able to mold commercial software with their own and others’ scripts and tools developed in FOSS.
The successful interface of open-source and commercial software for processing GC×GC data has been demonstrated. Advantages of the method include the automation of multisoftware steps within a single software environment and the addition of new functionalities within commercial software using open-source scripts. Applications include the development of more user-friendly tools; automated batch processing tailored for and by the individual researcher; the sharing of reproducible workflows; and the integration of data-processing tools and workflows with data standardization platforms (e.g., ISA), repositories, and sample collection platforms (e.g., LabPipe) (45) for the seamless collection, analysis, and processing of high-fidelity datasets.