Title: Analysing SEER cancer data using signed maximal frequent itemset networks

Authors: Yunuscan Koçak; Tansel Özyer

Addresses: Department of Computer Engineering, TOBB Economy & Technology University, Ankara, Turkey; Department of Computer Engineering, Ankara Medipol University, Ankara, Turkey

Abstract: Evaluating patient prognosis is important for predicting the effects and consequences of diseases. Systems can find interesting properties within a data set and predict unseen cases. Feature extraction and feature selection are the critical steps. In this work, a novel network-based feature extraction method is presented and tested on two cancer cases, namely (1) lung and bronchus cancer and (2) pancreatic cancer. Named the Signed Maximal Frequent Itemset Network, the proposed method uses maximal frequent itemsets as actors in a network and extracts features by considering their co-occurrence and the structure of the sub-graph. To investigate patterns in prediction, the top ten maximal itemsets are selected with the recursive feature elimination method and their distributions are analysed. The analysis indicates that survival months are low when information on the disease is unknown or blank, and higher when chemotherapy was given and the primary site was labelled, for example as head of the pancreas.

Keywords: cancer data analysis; frequent pattern mining; machine learning; network analysis; signed networks; maximal frequent itemsets; feature selection; lung cancer; pancreatic cancer.

DOI: 10.1504/IJDMB.2021.124106

International Journal of Data Mining and Bioinformatics, 2021 Vol.26 No.1/2, pp.20 - 58

Received: 13 Apr 2021
Received in revised form: 05 Apr 2022
Accepted: 08 Apr 2022

Published online: 13 Jul 2022