Text document categorisation using random forest and C4.5 decision tree classifier
Online publication date: Wed, 16-Aug-2023
by Sumathi Pawar; Manjula Gururaj Rao; Karuna Pandith
International Journal of Computational Systems Engineering (IJCSYSE), Vol. 7, No. 2/3/4, 2023
Abstract: Documentation is a significant and rapidly growing field, and the time available to prepare documentation is limited. Applications of text classification include language and item identification, document indexing, populating hierarchical catalogues of web resources, and word sense disambiguation. A large number of texts serve as documentation, and categorisation strategies have been developed to handle them more efficiently. The proposed system categorises and documents text using two classifiers: random forest, an ensemble learning method, and the C4.5 decision tree. The system's processes include constructing decision tree text classifiers, training the constructed models as part of the implementation, dimension reduction, tf/idf indexing of the documents, clustering the terms using Brown clustering, and running the test dataset through the classifiers as part of document categorisation. The system is implemented with the Orange tool and Python libraries. It was found that the random forest approach increases efficiency owing to proper construction of the text classifiers.
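Two of the steps named above, tf/idf indexing and the split criterion of a C4.5 tree, can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes the common weighting tf × log(N/df) and a binary split on whether a term occurs in a document, since the abstract states neither the tf/idf variant nor how features are split. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed weighting: raw term frequency times log(N / df), where N is the
    number of documents and df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(docs, labels, term):
    """C4.5-style gain ratio for splitting on whether `term` occurs in a document.

    Gain ratio = information gain / split information; C4.5 uses this instead
    of plain information gain to penalise splits with many small branches.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    remainder = 0.0   # weighted entropy of the child nodes
    split_info = 0.0  # entropy of the split proportions themselves
    for part in (present, absent):
        if part:
            p = len(part) / total
            remainder += p * entropy(part)
            split_info -= p * math.log2(p)
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy corpus: already tokenised documents with class labels.
docs = [
    ["ball", "goal", "match"],
    ["goal", "team"],
    ["stock", "market"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

vectors = tfidf(docs)
print(round(vectors[0]["ball"], 3))                  # 1.386 (rare term, higher weight)
print(round(gain_ratio(docs, labels, "goal"), 3))    # 1.0 (separates the classes perfectly)
print(round(gain_ratio(docs, labels, "ball"), 3))    # 0.384
```

A tree learner picks, at each node, the term with the highest gain ratio; a random forest trains many such trees on bootstrap samples with random feature subsets and takes a majority vote. Note that scikit-learn's `DecisionTreeClassifier` implements an optimised CART rather than C4.5, so `criterion="entropy"` there uses information gain, not the gain ratio shown here.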