Title: Predictive modelling for fake news detection using TF-IDF and count vectorizers

Authors: Divya Singhal; Richa Vijay

Addresses: KIET Group of Institutions, Ghaziabad, Uttar Pradesh, India; MRIIRS, Faridabad, Haryana, India

Abstract: Most people choose to get their news quickly and cheaply via the internet, yet this encourages the rapid spread of false information. Today's society depends heavily on data, and by 2023, 120 zettabytes will be released every second. This enormous amount of data is transforming the world thanks to several technologies. As the internet has grown in popularity, people rely on online news sources to stay current on events. With the growth of social media sites such as Instagram, YouTube and Facebook, information reaches people all over the world in a short amount of time. This also allows fake news to proliferate, with an impact on both society and individuals. Fake news must be detected and removed before it causes further harm to the country. Because of how fake news functions, it can be hard to spot. In this paper, we propose a framework for recognising fake news. The research is conducted in Python using the Scikit-Learn and NLP-util libraries. It investigates traditional machine learning models for fake news detection to determine the best approach. Seven classifiers were trained on the data using TF-IDF and count vectorizer features, and the results, which are used to select the best-suited features for the highest accuracy and F1-score, are presented using confusion matrices.
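The difference between the two feature representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy headlines, the whitespace tokenizer and the function names are hypothetical, and the weighting is assumed to follow the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 with L2 row normalisation, which is the default behaviour of scikit-learn's TfidfVectorizer. In practice the library's CountVectorizer and TfidfVectorizer classes would normally be used instead of hand-written functions.

```python
import math
from collections import Counter

# Hypothetical toy headlines; these are not data from the paper.
docs = [
    "scientists confirm water found on mars",
    "shocking miracle cure doctors hate",
    "government confirm new budget",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def count_vectors(docs):
    """Return (vocabulary, raw term-count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) with smoothed IDF and L2 normalisation."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF, assumed here to match scikit-learn's default formula.
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    vectors = []
    for vec in counts:
        weighted = [c * w for c, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])
    return vocab, vectors


vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
```

In the count representation, "confirm" and "mars" both receive a value of 1 in the first headline. Under TF-IDF, "confirm" is weighted lower than "mars" because it also appears in another headline, which is the property that can make TF-IDF features more discriminative for a classifier.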

Keywords: fake news detection; predictive analysis; supervised learning; natural language processing; TF-IDF vector; count-vector; machine learning.

DOI: 10.1504/IJESDF.2024.139672

International Journal of Electronic Security and Digital Forensics, 2024 Vol.16 No.4, pp.503 - 519

Received: 12 Oct 2022
Accepted: 27 Feb 2023

Published online: 05 Jul 2024
