
A Taxonomy of Fake News Classification Techniques: Survey and Implementation Aspects

Posted by Admin: System Admin

Beginner
Abstract

In the present era, social media platforms such as Facebook, WhatsApp, Twitter, and Telegram are significant sources of information distribution, and people often believe what they read there without knowing its origin or genuineness. Social media has become an attractive channel for spreading fake news worldwide because it is easily available, cost-effective, and makes information sharing simple. Fake news can be generated to mislead the community for personal or commercial gain. It can also be used for other ends, such as defaming eminent personalities or influencing the amendment of government policies. To mitigate the harmful consequences of fake news, several studies have sought to detect it with high accuracy. Motivated by these concerns, we present a comprehensive survey of the existing fake news identification techniques in this paper. Then, we select Machine Learning (ML) models, namely Long Short-Term Memory (LSTM), the Passive Aggressive algorithm, Random Forest (RF), and Naïve Bayes (NB), and train them to detect fake news articles on a self-aggregated dataset. We tune hyperparameters such as smoothing, dropout factor, and batch size, and the resulting models show promising results in accuracy and in other evaluation metrics such as F1-score, recall, precision, and Area Under the ROC Curve (AUC). The models are trained on 6335 news articles, with LSTM showing the highest accuracy of 92.34% in predicting fake news and NB showing the highest recall. Based on these results, we propose a hybrid fake news detection technique using NB and LSTM. Finally, challenges and open issues, along with future research directions, are discussed to facilitate further research in this domain.
Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights in this project. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model from this project's data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used on a wide variety of datasets where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
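The evaluation metrics named in the abstract (accuracy, precision, recall, and F1-score) can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the toy labels are assumptions made for illustration, with 1 standing for fake and 0 for real. AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, made up for illustration: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall measures how many of the truly fake articles were caught, so a model with the highest recall misses the fewest fake articles even if its overall accuracy is lower than another model's.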

Existing System & Flaws

Sharma et al. [31] defined the term fake news from a broader perspective, including the scope of its meaning in current usage, as ``A news article or message published and propagated through any media carrying false information regardless the means and motives behind it''. It is worth noting, however, that works such as [32] have redefined fake news as news articles that are purposefully written to misguide or mislead the people reading or listening to the news. However, it can be justified by incorporating fake alternative resources. Fake news is now seen as one of the substantial threats to the nation, democracy, and journalism [33]. Many incidents of fake news spreading via reputed media and web platforms were recorded in 2016 during the United States presidential election. Out of 8,711,000 reactions, comments, and shares generated on fake news web articles, around 7,367,000 were election articles posted by major news portals [34]. Moreover, the economy is also susceptible to the spread of fake news. For instance, fake news that Barack Obama had been injured in an explosion wiped out approximately 130 billion USD in stock value [35]. The dissemination of fake news may also result in stressful conditions and mental health deterioration. Over time, the spread of fake news has raised questions about the integrity of news articles published on online news portals and social media platforms. Social media platforms also play a significant role in disseminating fake news among people worldwide.
Agarwal et al. [26] presented a method for fake news classification. Removing stop words, eliminating white space and punctuation, and lemmatizing words were treated as part of data preprocessing, which reduces the dimensionality of the data [54]. Deokate [59] proposed an SVM-based classification algorithm for identifying fake news that spreads on social media platforms, especially Twitter. It performs efficient text preprocessing on tweets by converting the slang used in them into standard forms. Yang et al. [66] proposed a novel TI-CNN (Text and Image information based CNN) approach, which combines the explicit and latent features of text and image information to detect fake news. The authors used a dataset, offered on Kaggle, that focuses on news about the U.S. presidential election. It contains 20,015 news items, of which 8074 are real and 11,941 are fake. They trained and tested their model on this dataset and achieved strong performance: 0.9220 precision, 0.9277 recall, and a 0.9210 F1-score [66]. Finally, the authors concluded that their model is expandable and could easily be trained on other news features.
Disadvantages
  • The existing methodology does not implement CNN, NB, DT, and LR, which researchers worldwide have used to classify fake news.
  • The existing system does not implement fake news classification using a Support Vector Machine (SVM).
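The preprocessing steps described in this section for [26] (stop-word removal, whitespace and punctuation cleanup) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the `preprocess` function and the tiny `STOP_WORDS` set are assumptions, not the cited authors' code, and lemmatization is omitted because it needs a dictionary-backed tool (for example, NLTK's `WordNetLemmatizer`).

```python
import string

# Tiny illustrative stop-word list (an assumption); real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and"}


def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, and drop stop words."""
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also collapses runs of whitespace.
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("  The senator   was INJURED in an explosion, reports say!  ")
print(tokens)
```

Dropping stop words and punctuation shrinks the vocabulary, which is the sense in which preprocessing reduces the dimensionality of the data before it reaches a classifier.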

Proposed System & Advantages

We present a comprehensive survey and discuss a taxonomy of the AI techniques employed for fake news classification, highlighting their advancements in this domain. We also discuss various sources of fake news dissemination. We implemented the passive aggressive, LSTM, NB, and random forest algorithms for fake news classification. Passive aggressive is well suited to reading data dynamically when huge volumes are generated every second. NB works well on high-dimensional datasets and is extremely fast, with very few tunable parameters. LSTM is used because it is a state-of-the-art technique. Random Forest is efficient on large datasets. The performance evaluation section discusses the results and empirical findings of these methods in detail. Finally, we present the research challenges and open issues concerning the state-of-the-art AI techniques designed for the identification and detection of fake news.
Advantages
  • SVM is a supervised learning technique that has been extensively used for binary classification problems. Its applications have since been extended to multi-class classification problems, for which researchers have developed several approaches. SVM is highly appropriate for tangled tasks like fake news classification.
  • CNN is considered to be the most used architecture among supervised learning architectures. Training a CNN model requires a large amount of input data to fully utilize its capability.
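To illustrate why the passive aggressive algorithm suits data that arrives continuously, the sketch below implements the basic online update for binary classification: the weights are left alone when an example is already classified with sufficient margin (passive) and are otherwise moved by the smallest step that fixes the margin (aggressive). It is a sketch under stated assumptions, not the project's implementation: the names `pa_train` and `pa_predict`, the bag-of-words dictionaries, and the toy samples are all hypothetical, labels are +1 for fake and -1 for real, and the basic variant without an aggressiveness parameter is shown.

```python
def pa_train(samples, epochs=5):
    """Train a binary passive-aggressive classifier.

    samples: list of (features, label) pairs, where features maps
    token -> count and label is +1 (fake) or -1 (real).
    """
    weights = {}
    for _ in range(epochs):
        for features, label in samples:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            loss = max(0.0, 1.0 - label * score)  # hinge loss
            if loss == 0.0:
                continue  # passive: margin already satisfied
            norm_sq = sum(v * v for v in features.values())
            if norm_sq == 0.0:
                continue  # empty example carries no information
            tau = loss / norm_sq  # aggressive: smallest correcting step
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) + tau * label * v
    return weights


def pa_predict(weights, features):
    """Return +1 (fake) or -1 (real) from the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1


# Toy bag-of-words samples, made up for illustration.
samples = [
    ({"shocking": 1, "miracle": 1}, 1),
    ({"senate": 1, "budget": 1}, -1),
    ({"miracle": 1, "cure": 1}, 1),
    ({"budget": 1, "report": 1}, -1),
]
weights = pa_train(samples)
print(pa_predict(weights, {"miracle": 1}), pa_predict(weights, {"budget": 1}))
```

Because each update touches only one example, the model can keep learning from a stream without storing or revisiting earlier articles.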

Software Requirements
  • Operating System: Windows 7 Ultimate
  • Coding Language: Python
  • Front-End: Python
  • Back-End: Django-ORM
  • Designing: HTML, CSS, JavaScript
  • Database: MySQL (WAMP Server)
Hardware Requirements
  • H/W System Configuration:
  • Processor: Pentium IV
  • RAM: 4 GB (min)
  • Hard Disk: 20 GB
  • Keyboard: Standard Windows Keyboard
  • Mouse: Two or Three Button Mouse
  • Monitor: SVGA
