Title: Text document learning using distributed incremental clustering algorithm: educational certificates

Authors: Archana Chaudhari; Preeti Mulay; Ayushi Agarwal; Krithika Iyer; Saloni Sarbhai

Addresses: Dr. D.Y. Patil Institute of Technology, Pimpri, Pune, India; Savitribai Phule Pune University, Pune, India ' Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India ' Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India ' Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India ' Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India

Abstract: Technological advancements now allow anyone to learn new skills at home or through workshops, and a certificate is typically issued as proof of each skill learned. These certificates, whether digital or handwritten, are usually provided in image format. The information they contain can be analysed to identify which subjects have recently gained popularity and how fields of study at different universities could be improved. This paper therefore proposes a distributed incremental clustering with closeness factor-based algorithm (DIC2FBA) for text clustering. The primary focus is a dataset of faculty development program (FDP) certificates, which contains both textual and numeric data. The proposed system uses an AWS EC2 instance and an AWS S3 bucket to cluster data from multiple sites in iterative and incremental mode. The results obtained with DIC2FBA are compared with those of the K-means modified inter and intra clustering (KM-I2C) algorithm using the silhouette score and the Davies-Bouldin index. The proposed system will help educational institutions understand the popular skill sets of faculty members, which can in turn be used to assess the effectiveness of such programs.

Keywords: distributed incremental clustering; text document learning; educational certificates; faculty development program; FDP; AWS.

DOI: 10.1504/IJBIDM.2023.134315

International Journal of Business Intelligence and Data Mining, 2023 Vol.23 No.4, pp.396 - 410

Received: 09 Feb 2022
Accepted: 18 Jun 2022

Published online: 18 Oct 2023
