Title: Two-stage clustering analysis to detect pattern change of biomarker expression between experimental conditions

Authors: Iksoo Huh; Sunghoon Choi; Youjin Kim; Soo-Yeon Park; Oran Kwon; Taesung Park

Addresses: College of Nursing and Research Institute of Nursing Science, Seoul National University, Jongno-gu, Seoul 03080, South Korea; Department of Statistics, Seoul National University, Gwanak-gu, Seoul 08826, South Korea; Department of Nutritional Science and Food Management, Ewha Womans University, Seodaemun-gu, Seoul 03760, South Korea; Department of Nutritional Science and Food Management, Ewha Womans University, Seodaemun-gu, Seoul 03760, South Korea; Department of Nutritional Science and Food Management, Ewha Womans University, Seodaemun-gu, Seoul 03760, South Korea; Department of Statistics, Seoul National University, Gwanak-gu, Seoul 08826, South Korea

Abstract: In a crossover design, each individual usually undergoes all experimental conditions, and biomarkers are measured repeatedly at serial time points under each condition. To analyse time-dependent patterns of change in biomarkers, clustering algorithms are commonly applied across time points to group together subjects with similar patterns. Among clustering methods, hierarchical and K-means clustering are widely used. However, because they are unsupervised approaches, they do not identify differences in patterns of change between experimental conditions. Therefore, we propose a new two-stage clustering method that focuses on patterns of change. The first stage eliminates non-informative biomarkers using Euclidean distances, and the second stage allocates the remaining biomarkers to predefined patterns using a correlation-based distance. We demonstrate the advantages of the proposed method through simulation and real data analysis.
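
The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which profiles the distances are computed between, how the cut-off is chosen, or what the predefined patterns are. The sketch assumes one mean time profile per biomarker under each of two conditions, takes the Euclidean distance between the two profiles in stage one, and in stage two assigns the between-condition difference profile to the template with the smallest correlation-based distance, taken here as 1 - r. The names `two_stage_cluster`, `correlation_distance`, `min_distance`, and the template shapes are all hypothetical.

```python
import numpy as np


def correlation_distance(x, y):
    """Return 1 - Pearson r, or inf when either vector has zero variance."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0.0:
        return np.inf  # correlation is undefined for a constant profile
    return 1.0 - float(xc @ yc) / denom


def two_stage_cluster(cond_a, cond_b, patterns, min_distance):
    """Illustrative two-stage assignment (assumed design, see lead-in).

    cond_a, cond_b: arrays of shape (n_biomarkers, n_timepoints).
    patterns: dict mapping a pattern name to a template time profile.
    min_distance: assumed stage-one cut-off on the Euclidean distance.
    Returns one label per biomarker; None marks a biomarker that was
    filtered out or could not be matched to any template.
    """
    cond_a = np.asarray(cond_a, dtype=float)
    cond_b = np.asarray(cond_b, dtype=float)
    diff = cond_b - cond_a

    # Stage 1: drop biomarkers whose two condition profiles are close
    # in Euclidean distance (treated here as non-informative).
    informative = np.linalg.norm(diff, axis=1) >= min_distance

    names = list(patterns)
    templates = [np.asarray(patterns[n], dtype=float) for n in names]

    # Stage 2: assign each remaining difference profile to the template
    # with the smallest correlation-based distance.
    labels = [None] * len(diff)
    for i in np.flatnonzero(informative):
        dists = [correlation_distance(diff[i], t) for t in templates]
        best = int(np.argmin(dists))
        if np.isfinite(dists[best]):
            labels[i] = names[best]
    return labels


# Toy data: three biomarkers, four time points (made up for illustration).
cond_a = [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2, 2]]
cond_b = [[1, 2, 3, 4], [1.01, 1.0, 0.99, 1.0], [5, 4, 3, 2]]
patterns = {
    "increasing": [0, 1, 2, 3],
    "decreasing": [3, 2, 1, 0],
    "peak": [0, 1, 1, 0],
}
labels = two_stage_cluster(cond_a, cond_b, patterns, min_distance=1.0)
```

A correlation-based distance compares only the shape of a profile, not its magnitude, which is why a magnitude-sensitive filter such as the Euclidean distance is a natural first step in this sketch: without it, a near-zero noisy profile could still match a template well.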

Keywords: two-stage; pattern clustering; biomarker expression; intervention study; cross-over design.

DOI: 10.1504/IJDMB.2020.108701

International Journal of Data Mining and Bioinformatics, 2020 Vol.23 No.4, pp.299 - 317

Received: 16 Apr 2020
Accepted: 19 Apr 2020

Published online: 27 Jul 2020
