FADOHS: Framework for Detection and Integration of Unstructured Data of Hate Speech on Facebook Using Sentiment and Emotion Analysis

Posted by Admin: System Admin

Beginner
Abstract

Hate speech is a form of expression that assaults a person or a community based on race, origin, religion, sexual orientation, or other attributes. Although it can be expressed in multiple ways, both online and offline, the growing popularity of social media has sharply increased both its use and its severity. Therefore, the aim of this research is to locate and analyze the unstructured data of selected social media posts that intend to spread hate in the comment sections. To address this issue, we propose a novel framework called FADOHS, which combines data analysis and natural language processing strategies, to sensitize all social media providers to the pervasiveness of hate on social media. Specifically, we use sentiment and emotion analysis algorithms to analyze recent posts and comments on these pages. Posts suspected of containing dehumanizing words are processed before being fed to the clustering algorithm for further evaluation. According to the experimental results, the proposed FADOHS framework surpasses the state-of-the-art approach in terms of precision, recall, and F1 scores by approximately 10%.

Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights in this project. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model based on the project data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are applied to a wide variety of datasets for which it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

Existing System & Flaws

Ben-David and Matamoros-Fernandez's related study on overt hatred and covert disrespectful practices on the Internet [7] is based on network and multimodal analyses. It studies information and images found on social media, and it retrieves data from various Facebook pages with content associated with hate speech using tools such as Netvizz [12]. In our proposed framework, we examined the dataset [13] using the Facebook Graph application programming interface (API) and emotional analysis [14].

In another related study, the authors used the Valence Aware Dictionary and sEntiment Reasoner (VADER) tool as a simple rule-based model for general sentiment analysis [15]. The VADER tool uses both qualitative and quantitative methods to analyze and validate the data [16]-[28], and it is attuned to sentiments expressed in microblog-like text. We also used the VADER tool for sentiment analysis (SA). However, unlike the research in [15], we incorporated the JAMMIN tool to perform emotional analysis experiments and, specifically, to track posts with negative comments.

The research in [8] created a typology of hate based on different "hate levels." The authors used morpho-syntactic features, sentiment polarity, and word-embedding lexicons to design and implement two classifiers for the Italian language, one based on a support vector machine (SVM) and one on long short-term memory (LSTM) [8]. In contrast, our approach is designed to uncover hate discourse on Facebook, particularly the "unmistakable" manifestations of hatred posted as comments on divisive topics (e.g., immigration, religion, and race).

The study in [29] highlights future issues that Facebook and Twitter would face in identifying hate speech on their respective platforms. Their tool relied on crowdsourcing. Although their framework has not been fully evaluated, their study includes a quality-of-service (QoS) assessment for platform providers. Thus, they developed an intuitive tool for cracking down on hate speech.
However, although their tool can identify information that does not adhere to QoS policies, we believe such a format is inefficient because of the use of Python programming tools [29]. In this study, we implemented a procedure to filter hate-filled posts and comments on social media.

Related research describes the concept of "platform racism," an emerging form of racial prejudice emanating from social media pages [30]. Hatred itself is a form of discrimination, depending on the culture associated with a particular group. That study also provided annotations and suggested a possible algorithm for compiling the content. Although the experiment revealed important trends relevant to our specific research question, we focused on Facebook as a platform and based both the data extraction and the experimental setup on the seed pages.

The study on online trends leading to offline consequences [31] examined the link between social media and hate crimes using Facebook data. It concluded that social media is often used as a propagation mechanism for hate [31]. Although such a study is important, we believe that the data used by the researchers could be strengthened by analyses such as ours, in which social media analytics are used to target and identify negative comments posted on certain hate-promoting Facebook pages [13]. The authors of a further related study [32] examined the most effective methods for detecting hate speech in written text. Based on this survey and our framework, we conducted various tests to compare the accuracy of the three best methods.

The authors of [33] proposed an optimization approach based on metaheuristic search. The ant lion optimization (ALO) and moth flame optimization (MF) algorithms were designed for the hate speech detection (HSD) problem. This is the first attempt to use optimization algorithms as solution-search strategies for automatic HSD.
An efficient representation scheme and a flexible fitness function were designed for this purpose. However, the FADOHS approach not only identifies unstructured data from Facebook allegedly promoting hate speech, such as commonly discussed topics, but also integrates them by topic into clusters using sentiment and emotion analysis.

In 2019, OpenAI released the Generative Pre-trained Transformer 2 (GPT-2) models [34]-[45]. These were built using transformer decoder blocks. We tested the quality of our dataset by taking the best dataset of our framework (the moderate-level hate speech dataset) and performing several experiments with OpenAI's novel GPT-2 model [34]. The major goal of this experiment was to determine the extent to which our hate speech dataset can enhance the performance of the GPT-2 model.

The literature review in this section provides the background for this study as well as a potential motivation for further investigation. The primary aim of our research is to accurately and efficiently locate social media pages that discuss sensitive topics and to establish a reliable system that categorizes posts and integrates unstructured information with frequently discussed themes that intentionally or unintentionally spread hate discourse.

Disadvantages
  • The system does not implement emotion pattern extraction.
  • The system does not implement the term frequency-inverse document frequency (TF-IDF) approach.
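The rule-based sentiment scoring discussed in this section can be illustrated with a small sketch. It is not the VADER tool itself: the lexicon values, the negation handling (a plain sign flip), and the `-0.05` threshold are assumptions chosen for illustration. Only the normalization `x / sqrt(x^2 + alpha)` with `alpha = 15` follows the form VADER uses for its compound score.

```python
import math
import re

# Toy valence lexicon: hypothetical values, NOT the VADER lexicon.
LEXICON = {"hate": -3.0, "disgusting": -2.5, "awful": -2.0,
           "good": 1.9, "love": 3.0, "welcome": 2.0}
NEGATIONS = {"not", "never", "no"}
ALPHA = 15.0  # normalization constant, as in VADER's compound score


def compound_score(text):
    """Return a sentiment score in (-1, 1); below zero means negative tone."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        # Simplification: flip polarity when the previous token is a negation.
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence
        total += valence
    return total / math.sqrt(total * total + ALPHA)


def flag_negative(comments, threshold=-0.05):
    """Keep only the comments whose score is at or below the threshold."""
    return [c for c in comments if compound_score(c) <= threshold]
```

In a pipeline of this kind, `flag_negative` would be applied to the comments of each crawled post so that only posts with strongly negative comments move on to emotion analysis and clustering.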

Proposed System & Advantages

To achieve this goal, we first identify a set of pages from American-based websites known to discuss controversial topics such as immigration, race, and religion. We use these "seeds," or Facebook IDs, to crawl the Facebook graph and construct a small network using the "follow" relationship. Leveraging graph analysis techniques, we identify the most influential pages spreading hate speech and crawl their latest posts and comments. We then apply sentiment and emotion analysis algorithms to recognize posts with highly negative tones, specifically those suspected of instigating hatred. Finally, we convert each post into a document by concatenating all of its comments and use the K-means algorithm to create clusters of posts based on the topics they discuss [9]-[11]. The resulting framework is able to identify a set of sensitive topics that can promote hate.

The major contributions of this study are as follows. First, we develop a semiautomatic method to discover pages that discuss sensitive topics. Second, we propose an automatic method to cluster posts from pages that discuss specific topics. Finally, we design and implement a new framework for hate speech detection; according to the experimental results, the proposed framework surpasses the state-of-the-art approach in terms of precision, recall, and F1 scores by approximately 10%.

Advantages
  • For each target emotion, an emotion degree score for the matched pattern is calculated using a measure called the pattern frequency-inverse emotion frequency (PF-IEF), a modification of the classic term frequency-inverse document frequency (TF-IDF) approach, together with a new score referred to as the diversity score.
  • The system identifies only hate speech-promoting posts using support vector machines (SVMs), so the topics found by clustering are often sensitive.
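The seed-based crawl and influence ranking described above can be sketched as follows. The text does not say which graph analysis technique is used, so in-degree over the "follow" edges stands in here as an assumption. The `follows` mapping is a hypothetical in-memory substitute for data a real crawler would fetch, and the function names are illustrative only.

```python
from collections import Counter, deque


def crawl_follow_graph(seeds, follows, max_depth=1):
    """Breadth-first expansion from seed page IDs over 'follow' edges.

    `follows` maps a page ID to the IDs it follows (hypothetical stand-in
    for crawled data). Pages at `max_depth` are recorded but not expanded.
    """
    edges = set()
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for target in follows.get(page, []):
            edges.add((page, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


def rank_by_in_degree(edges, top_n=3):
    """Rank pages by how many crawled pages follow them (assumed measure)."""
    in_degree = Counter(target for _, target in edges)
    # Sort by descending in-degree, then by ID for a stable order.
    ranked = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:top_n]]
```

The top-ranked pages would then be the ones whose latest posts and comments are collected for sentiment and emotion analysis.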
Text can be structured in several ways, most of which utilize vector forms.
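As a minimal sketch of one such vector form, the example below turns each post into a document by concatenating its comments, weights terms with the classic TF-IDF formula (term frequency times `log(N / df)`), and groups the vectors with K-means. The whitespace tokenization, the choice of the first `k` vectors as initial centroids, and all function names are assumptions made for illustration, not details taken from the framework.

```python
import math
from collections import Counter


def post_to_document(comments):
    """Concatenate all comments of one post into a single document."""
    return " ".join(comments)


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[t] / total * math.log(n_docs / df[t])
                        for t in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k vectors (an assumption
    made for determinism, not a recommended initialization)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


posts = [["border immigration", "wall"], ["church faith", "religion"],
         ["border immigration policy"], ["church", "faith prayer"]]
documents = [post_to_document(comments) for comments in posts]
vocab, vectors = tfidf_vectors(documents)
labels = kmeans(vectors, k=2)
```

On this toy input, the two posts about borders and immigration share one label and the two about religion share the other. Seeding with the first `k` vectors keeps the run reproducible; a practical system would more likely use random restarts or a library implementation.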

Software Requirements
  • Operating System: Windows 7 Ultimate
  • Coding Language: Python
  • Front End: Python
  • Back End: Django ORM
  • Design: HTML, CSS, JavaScript
  • Database: MySQL (WAMP Server)
Hardware Requirements
  • Processor: Pentium IV
  • RAM: 4 GB (min)
  • Hard Disk: 20 GB
  • Keyboard: Standard Windows Keyboard
  • Mouse: Two- or Three-Button Mouse
  • Monitor: SVGA
