Title: Combine multiple mass spectral similarity measures for compound identification

Authors: Jun Zhang; Yi Xia; Chun-Hou Zheng; Bing Wang; Xiang Zhang; Peng Chen

Addresses: School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Department of Chemistry, University of Louisville, Louisville, KY 40292, USA; Institute of Health Sciences, Anhui University, Hefei, Anhui 230601, China

Abstract: Compound identification in gas chromatography-mass spectrometry (GC-MS) is usually achieved by comparing a query mass spectrum with a reference spectral library. Rapidly growing spectral libraries require a more powerful spectral similarity measure to achieve the best identification performance. In this study, seven spectral similarity measures were combined to improve identification accuracy. To reduce computation time, the absolute value distance (ABS_VD) similarity measure was used to construct a sub-library, which was then searched by all of the similarity measures. A particle swarm optimisation (PSO) algorithm was first used to find optimised weights for the similarity score of each measure on the training data, and the optimised weights were then applied to the test data. A simulation study using the NIST/EPA/NIH Mass Spectral Library 2005 indicates that the combination of multiple similarity measures performs better than any single similarity measure, with identification accuracy improved by 2.2% for the training data and 1.7% for the test data.
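
The two-stage search described in the abstract (prefilter with ABS_VD, then rank the sub-library by a weighted sum of similarity scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses only two measures instead of seven, the ABS_VD formula shown is an assumed form, the weights are hand-picked placeholders rather than PSO-optimised values, and the toy spectra and all function names are hypothetical.

```python
import math


def abs_value_similarity(query, ref):
    """Assumed ABS_VD-style score: 1 / (1 + sum|q - r| / sum(q))."""
    total = sum(query)
    if total == 0:
        return 0.0
    diff = sum(abs(q - r) for q, r in zip(query, ref))
    return 1.0 / (1.0 + diff / total)


def cosine_similarity(query, ref):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(q * r for q, r in zip(query, ref))
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(r * r for r in ref))
    return dot / norm if norm else 0.0


# Only two measures here for brevity; the paper combines seven.
MEASURES = [abs_value_similarity, cosine_similarity]


def identify(query, library, weights, sub_size):
    """Return (name, combined score) pairs for the sub-library, best first."""
    # Stage 1: keep the sub_size library entries closest to the query by ABS_VD.
    by_abs_vd = sorted(
        library,
        key=lambda name: abs_value_similarity(query, library[name]),
        reverse=True,
    )
    sub_library = by_abs_vd[:sub_size]

    # Stage 2: score each candidate with a weighted sum over all measures.
    scored = []
    for name in sub_library:
        score = sum(w * m(query, library[name]) for w, m in zip(weights, MEASURES))
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy spectra: intensities at four hypothetical m/z bins.
library = {
    "A": [100, 50, 0, 10],
    "B": [0, 80, 100, 5],
    "C": [90, 60, 5, 10],
    "D": [10, 10, 10, 100],
}
query = [98, 52, 0, 10]
weights = [0.6, 0.4]  # placeholder weights; the paper learns these with PSO

ranking = identify(query, library, weights, sub_size=2)
print(ranking[0][0])  # name of the top-ranked candidate
```

In this sketch the prefilter size `sub_size` trades speed against the risk of discarding the correct compound before the combined score is computed; the abstract does not state the sub-library size used in the study.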

Keywords: compound identification; diversity; spectrum matching; multiple similarity measures; gas chromatography mass spectrometry; GC-MS; particle swarm optimisation; PSO; simulation.

DOI: 10.1504/IJDMB.2016.076018

International Journal of Data Mining and Bioinformatics, 2016 Vol.15 No.1, pp.84 - 100

Accepted: 18 Nov 2015
Published online: 21 Apr 2016
