Title: Combine multiple mass spectral similarity measures for compound identification
Authors: Jun Zhang; Yi Xia; Chun-Hou Zheng; Bing Wang; Xiang Zhang; Peng Chen
Addresses: School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China; School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Department of Chemistry, University of Louisville, Louisville, KY 40292, USA; Institute of Health Sciences, Anhui University, Hefei, Anhui 230601, China
Abstract: Compound identification in gas chromatography-mass spectrometry (GC-MS) is usually achieved by comparing a query mass spectrum with a reference spectral library. The rapidly growing spectral library requires a more powerful spectral similarity measure to achieve the best identification performance. In this study, seven spectrum similarity measures were combined to improve the identification accuracy. To reduce the computation time, the absolute value distance (ABS_VD) similarity measure was chosen to construct a sub-library to be searched by all similarity measures. A particle swarm optimisation (PSO) algorithm was first used to find the optimised weights for the similarity score of each similarity measure based on the training data, and the optimised weights were then applied to the test data. A simulation study using the NIST/EPA/NIH Mass Spectral Library 2005 indicates that the combination of multiple similarity measures achieves better performance than any single similarity measure, with the identification accuracy improved by 2.2% and 1.7% for the training data and the test data, respectively.
Keywords: compound identification; diversity; spectrum matching; multiple similarity measures; gas chromatography mass spectrometry; GC-MS; particle swarm optimisation; PSO; simulation.
DOI: 10.1504/IJDMB.2016.076018
International Journal of Data Mining and Bioinformatics, 2016 Vol.15 No.1, pp.84 - 100
Accepted: 18 Nov 2015
Published online: 21 Apr 2016
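
The two-stage search described in the abstract (pre-filter a sub-library with one measure, then rank the candidates by a weighted sum of similarity scores) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy spectra, the hand-picked weights, and the exact form of the absolute-value similarity are all assumptions made for illustration, and only two measures are used rather than the seven in the study. In the paper the weights are found by PSO on training data; here they are simply fixed.

```python
import math

def cosine_similarity(query, ref):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(q * r for q, r in zip(query, ref))
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(r * r for r in ref))
    return dot / norm if norm else 0.0

def abs_value_similarity(query, ref):
    """Illustrative absolute-value-distance similarity (assumed form,
    not taken from the abstract): 1 / (1 + sum|q - r| / sum(q))."""
    total = sum(query)
    if total == 0:
        return 0.0
    diff = sum(abs(q - r) for q, r in zip(query, ref))
    return 1.0 / (1.0 + diff / total)

def build_sub_library(query, library, size):
    """Keep the `size` reference names that score highest on the cheap pre-filter."""
    ranked = sorted(
        library,
        key=lambda name: abs_value_similarity(query, library[name]),
        reverse=True,
    )
    return ranked[:size]

def combined_score(query, ref, measures, weights):
    """Weighted sum of the individual similarity scores."""
    return sum(w * m(query, ref) for m, w in zip(measures, weights))

def identify(query, library, measures, weights, sub_library_size):
    """Return the name of the best-matching reference within the sub-library."""
    candidates = build_sub_library(query, library, sub_library_size)
    return max(
        candidates,
        key=lambda name: combined_score(query, library[name], measures, weights),
    )

# Toy reference library: hypothetical compounds with four intensity bins each.
LIBRARY = {
    "A": [100, 50, 0, 10],
    "B": [10, 80, 60, 0],
    "C": [0, 0, 100, 40],
}
MEASURES = [cosine_similarity, abs_value_similarity]
WEIGHTS = [0.6, 0.4]  # hand-picked here; the study optimises these with PSO
QUERY = [95, 55, 0, 12]

if __name__ == "__main__":
    print(identify(QUERY, LIBRARY, MEASURES, WEIGHTS, sub_library_size=2))
```

Restricting the expensive measures to a small sub-library is what saves computation time: every reference spectrum is scored once by the cheap pre-filter, and only the surviving candidates are scored by all measures.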