Posted by Admin: System Admin
Microblogging platforms such as Twitter have become indispensable for disseminating valuable information, especially during natural and man-made disasters. People often post multimedia content with images and/or videos to report important information such as casualties, infrastructure damage, and the urgent needs of affected people. Such information can help humanitarian organizations plan an adequate response in a time-critical manner. However, identifying disaster information in a vast number of posts is an arduous task, which calls for an automatic system that can separate actionable from non-actionable disaster-related information on social media. While many studies have shown the effectiveness of combining text and image content for disaster identification, most previous work analyzed only the textual modality and/or applied a traditional recurrent neural network (RNN) or convolutional neural network (CNN), which might degrade performance on long input sequences. This paper presents a multimodal disaster identification system that uses visual and textual data synergistically, conjoining the influential word features with the visual features to classify tweets. Specifically, we use a pretrained convolutional neural network (e.g., ResNet50) to extract visual features and a bidirectional long short-term memory (BiLSTM) network with an attention mechanism to extract textual features. We then aggregate the visual and textual features with a feature fusion approach, followed by a softmax classifier. The evaluations demonstrate that the proposed multimodal system outperforms the existing baselines, including both unimodal and multimodal models, improving performance by approximately 1% and 7%, respectively.
Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights in this project. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model from this project's data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used on a wide variety of datasets where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Aipe et al. [22] also proposed a CNN-based approach, but they focus on multilabel classification rather than simple binary classification to label disaster-related tweets. Similarly, Yu et al. [23] used CNN, logistic regression (LR), and SVM to classify tweets related to different hurricanes into multiple categories. Their CNN-based model outperformed SVM and LR. In contrast to CNN-based approaches, we consider BiLSTMs with attention mechanisms, aiming to better capture dependencies between word tokens. Li et al. [24] studied the feasibility of domain adaptation for analyzing disaster tweets by applying the naive Bayes classifier to the Boston Marathon bombing and Hurricane Sandy datasets. Graf et al. [25] focused on cross-domain classification so that the classifier can be used across different types of disaster events. They employed a cross-domain classifier and used emotional, sentimental, and linguistic features extracted from damage-related tweets. Others have focused on text mining and summarization approaches [26], [27]. For example, Rudra et al. [26] assign tweets to different situational classes and then summarize those tweets. Cameron et al. [27] proposed an Emergency Situation Awareness-Automated Web Text Mining (ESAAWTM) system that detects informative damage-related Twitter messages to inform charitable organizations about the incidents of a disaster. Unlike these systems, which broadly focused on text mining and summarization, we focus specifically on a multi-class classification problem on disaster-related tweets.
Nguyen et al. [29] developed a deep CNN architecture to label social media images with multiple disaster categories (i.e., severe, mild, and no-damage). Similarly, Alam et al. [30] proposed a framework based on a pretrained CNN (VGG16) that can identify disaster images uploaded to online platforms. Daly and Thom [31] culled Flickr images to detect fire events using pretrained classifiers. Finally, Lagerstrom et al. 
[32] developed a system to classify whether an image indicates a fire event or not. In contrast to these works that broadly developed binary classifiers for classifying disaster vs.
Chen et al. [34] studied the relation between images and texts and used visual features along with socially relevant contextual features (e.g., time of posting, the number of comments, retweets) to identify disaster information. Mouzannar et al. [7] explored damage detection by focusing on posts related to human and environmental damage. They used the pretrained Inception model for visual feature extraction and designed a CNN architecture for textual features. Similarly, Rizk et al. [35] proposed a multimodal architecture to classify Twitter data into infrastructure and natural damage categories. Ferda et al. [8] also presented a multimodal approach for classifying tweets on two tasks: an informative task (e.g., informative vs. non-informative) and a humanitarian task (e.g., affected individuals, rescue volunteering or donation effort, infrastructure and utility damage). They used a CNN-based approach for extracting the visual and textual features. Gautam et al. [36] compared unimodal and multimodal methods on the CrisisMMD [37] dataset. They used the late fusion [38] approach for combining the image-tweet pairs. All of these works reported significant performance improvements from multimodal information compared with their counterparts that use unimodal information.
Disadvantages
• The existing CNN-based approach focuses on multilabel classification rather than simple binary classification to label disaster-related tweets.
• The existing system uses CNN, logistic regression, and SVM to classify tweets related to different hurricanes into multiple categories.
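The late fusion idea mentioned in the related work above can be illustrated with a small sketch. It assumes that each unimodal model outputs per-class probabilities and that the two outputs are combined by a weighted average; the source does not state which combination rule Gautam et al. used, so the averaging rule, the function name `late_fusion`, and the toy probabilities are illustrative assumptions only.

```python
def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Combine per-class probabilities from two unimodal models.

    Assumption: a weighted average is used as the fusion rule. This is one
    common form of late fusion, not necessarily the rule used in the cited work.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    image_weight = 1.0 - text_weight
    return [text_weight * t + image_weight * i
            for t, i in zip(text_probs, image_probs)]


# Hypothetical outputs of a text-only and an image-only classifier
# for the classes (informative, non-informative).
text_probs = [0.7, 0.3]
image_probs = [0.4, 0.6]

fused = late_fusion(text_probs, image_probs)
# Predicted class = index of the largest fused probability.
label = max(range(len(fused)), key=fused.__getitem__)
print(fused, label)
```

In late fusion the modalities are modeled separately and only their decisions are merged, whereas a feature fusion approach combines the representations before a single classifier.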
The primary contributions of our work are:
• We propose a multimodal architecture that uses ResNet50 and a BiLSTM recurrent neural network with an attention mechanism to classify damage-related posts by exploiting both visual and textual information.
• We compare the performance of the proposed model with a set of existing unimodal (i.e., image, text) and multimodal classification techniques.
• We empirically evaluate the proposed model on a benchmark dataset and demonstrate, through an intrinsic evaluation, how introducing attention can enhance system performance.
• We perform both quantitative and qualitative analyses to gain deeper insight into the error types, which provides future directions for improving the model.
Advantages
• The proposed system develops an effective computational model for identifying disaster-related information by synergistically integrating features from the visual and textual modalities.
• The proposed system transforms the tweet into a vector representation and then uses an embedding layer to obtain semantic representations (embedding features) of the words.
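The text pipeline and fusion step described above can be sketched in plain Python on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the vocabulary, the random weights, the dot-product attention scoring, and concatenation as the fusion rule are all hypothetical choices, and the embedding vectors stand in for BiLSTM hidden states, which are not computed here. The image vector is a placeholder for pooled ResNet50 features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(hidden_states, context):
    """Weight each token vector by softmax(h_t . context) and sum them.

    `context` would be learned in a real model; here it is random (assumption).
    """
    weights = softmax([dot(h, context) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states))
              for i in range(dim)]
    return pooled, weights


def fuse_and_classify(text_vec, image_vec, W, b):
    """Concatenate both modalities, apply a linear layer, then softmax."""
    fused = text_vec + image_vec  # list concatenation = feature concatenation
    logits = [dot(row, fused) + bias for row, bias in zip(W, b)]
    return softmax(logits)


rng = random.Random(0)
embed_dim, image_dim, num_classes = 4, 3, 3

# Step 1: tweet -> integer vector via a toy vocabulary (hypothetical).
vocab = {"<unk>": 0, "flood": 1, "damaged": 2, "the": 3, "bridge": 4}
tweet = "flood damaged the bridge".split()
token_ids = [vocab.get(tok, vocab["<unk>"]) for tok in tweet]

# Step 2: embedding lookup. These vectors stand in for BiLSTM outputs.
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in vocab]
hidden_states = [embeddings[i] for i in token_ids]

# Step 3: attention pooling gives one text feature vector per tweet.
context = [rng.uniform(-1, 1) for _ in range(embed_dim)]
text_vec, attn = attention_pool(hidden_states, context)

# Step 4: fuse with placeholder image features and classify.
image_vec = [0.2, -0.5, 0.9]
W = [[rng.uniform(-1, 1) for _ in range(embed_dim + image_dim)]
     for _ in range(num_classes)]
b = [0.0] * num_classes
probs = fuse_and_classify(text_vec, image_vec, W, b)
print(attn, probs)
```

The attention weights sum to one over the tokens, so they can be read as the relative influence of each word on the text representation; in a trained model the weights, not the random values used here, would determine which words dominate.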