Title: Text document learning using distributed incremental clustering algorithm: educational certificates
Authors: Archana Chaudhari; Preeti Mulay; Ayushi Agarwal; Krithika Iyer; Saloni Sarbhai
Addresses: Dr. D.Y. Patil Institute of Technology, Pimpri, Pune, India; Savitribai Phule Pune University, Pune, India • Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India • Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India • Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India • Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Abstract: Technological advancements now allow each of us to learn new skills at home or through workshops, and a certificate is issued as proof of each skill learned. Today, certificates, whether digital or handwritten, are typically provided in image format. This information can be used to analyse which subjects have recently gained popularity and how the fields of study offered at different universities can be improved. This paper therefore proposes the distributed incremental clustering with closeness factor-based algorithm (DIC2FBA) for text clustering. The primary focus is a dataset of faculty development program (FDP) certificates, which contains both textual and numeric data. The proposed system uses an AWS EC2 instance and an AWS S3 bucket to cluster data from multiple sites in an iterative and incremental manner. We further compare the results obtained with DIC2FBA against the K-means modified inter and intra clustering (KM-I2C) algorithm using the silhouette score and the Davies-Bouldin index. The proposed system can help educational institutions identify the popular skill sets of their faculty members, which can in turn be used to assess the effectiveness of such programs.
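The comparison above relies on two standard cluster-quality measures, the silhouette score and the Davies-Bouldin index. As a minimal sketch of what they measure, the Python code below implements their textbook definitions over plain feature vectors (for instance, term-frequency vectors extracted from certificate text). It does not implement DIC2FBA or KM-I2C; the Euclidean distance, the convention of scoring singleton clusters as 0, the handling of coincident centroids, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict

# Illustrative sketch only: textbook silhouette and Davies-Bouldin definitions
# using Euclidean distance (an assumption); this is NOT the paper's DIC2FBA code.


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, so dividing by len - 1 excludes it)
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Average over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    # c_i: centroid (per-dimension mean) of each cluster
    centroids = {
        label: [sum(dim) / len(members) for dim in zip(*members)]
        for label, members in clusters.items()
    }
    # S_i: mean distance of a cluster's members to its centroid
    scatter = {
        label: sum(euclidean(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        ratios = []
        for j in clusters:
            if j == i:
                continue
            d = euclidean(centroids[i], centroids[j])
            # coincident centroids: ratio treated as 0 here (an assumption)
            ratios.append((scatter[i] + scatter[j]) / d if d > 0 else 0.0)
        total += max(ratios)
    return total / len(clusters)


# Usage: two compact, well-separated 2-D clusters
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print(round(silhouette_score(points, labels), 4),
      round(davies_bouldin_index(points, labels), 4))  # 0.9002 0.1
```

A higher silhouette score (at most 1) and a lower Davies-Bouldin index (at least 0) both indicate more compact, better-separated clusters, which is why the two are commonly reported together when comparing clustering algorithms.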
Keywords: distributed incremental clustering; text document learning; educational certificates; faculty development program; FDP; AWS.
DOI: 10.1504/IJBIDM.2023.134315
International Journal of Business Intelligence and Data Mining, 2023, Vol. 23, No. 4, pp. 396-410
Received: 09 Feb 2022
Accepted: 18 Jun 2022
Published online: 18 Oct 2023