Title: A critique of imbalanced data learning approaches for big data analytics

Authors: Amril Nazir

Addresses: Department of Computer Science, Taif University, Al-Hawiya, Saudi Arabia

Abstract: As biomedical research becomes more reliant on multi-disciplinary, multi-institutional collaboration, data sharing is becoming increasingly important for researchers to reuse experiments, pool expertise and validate approaches. However, there are many hurdles to data sharing, including unwillingness to share, the lack of a flexible data model for providing context information for shared data, the difficulty of sharing syntactically and semantically consistent data across distributed institutions, and the high cost of providing tools to share the data. In this work, we develop SciPort, a web-based collaborative platform that supports biomedical data sharing across distributed organisations. SciPort provides a generic metadata model that lets researchers flexibly customise and organise their data. To make data sharing convenient, SciPort provides a central server-based data sharing architecture in which data can be shared with one click by publishing metadata to the central server. To keep shared data consistent, SciPort provides collaborative schema management across distributed sites. To enable semantic consistency, SciPort provides semantic tagging through controlled vocabularies. SciPort is lightweight and can be easily deployed to build data sharing communities for biomedical research.

Keywords: imbalanced big data learning; large-scale imbalanced data analysis; high-dimensional imbalanced data learning.

DOI: 10.1504/IJBIDM.2019.099961

International Journal of Business Intelligence and Data Mining, 2019 Vol.14 No.4, pp.419 - 457

Received: 12 Feb 2017
Accepted: 11 Mar 2017

Published online: 11 Apr 2019
