VIETNAM NATIONAL UNIVERSITY HANOI
INTERNATIONAL SCHOOL
RESEARCH PROPOSAL
Disaster Tweets Classification in Disaster Response with BERT-Based Model
Course Code :
Course Name :
Student name/ID :
Hanoi, May 2023
I Introduction
a Background of the study
As various disasters take place around the globe, disaster management has become a matter of concern for governments, NGOs, and emergency agencies. With good, in-time disaster response, lives can be saved, damage minimized, and recovery efforts accelerated (Brown et al. 2017; Waeckerle 1991).
In recent years, social media platforms like Facebook, Twitter, and Snapchat have become integral parts of an individual's everyday life. Twitter, for example, sees an average of about 10 thousand tweets per second, corresponding to 36 million tweets sent per hour, roughly 867 million per day, and about 316 billion every year (M 2023).
Twitter is well-known for its real-time engagement. In times of crisis, many people choose this platform as their first channel to post updates on the situation, report damage, call for help, give instructions, etc. (Abedin and Babar 2018). Thus, the platform has been used widely in the field of natural and human-made disasters, providing useful insights so that rescue teams can act effectively (Kapoor et al. 2018; Kim and Hastak 2018). However, due to the high volume and velocity of this data, it is nearly impossible to manually monitor and analyze it in search of informative posts, which calls for research into a solution that can automate such a task (Kaur 2019).
b Statement of the problem
Creating an automatic classifier algorithm that can detect informative tweets is challenging because of tweets' unique properties. They are limited in size (maximum 280 characters) and may contain grammar errors, special characters, or unconventional vocabulary that infers different meanings (Nguyen et al. 2016). In a real-world setting, tweet data can be very imbalanced across classes, having more samples in some labels than in others. These characteristics can severely affect the ability of the classifier to learn patterns, make predictions, and accurately mine useful information relating to disasters (Ghosh, Bellinger, and Corizzo 2022).
Furthermore, apart from knowing the informativeness of social media posts, humanitarian organizations are interested in dividing these tweets into different categories to better coordinate their response (Nguyen et al. 2016). Thus, there are two problems to address:
i Informative vs. non-informative tweets: Most posts on Twitter may contain irrelevant information and are not useful for disaster management. In this case, a binary classification solution (labeling tweets as "informative" or "non-informative") can be performed to retain only useful information.
ii Information types of disaster tweets: Informative tweets can fall into different categories, such as damage reports or requests for help, that require more targeted actions. This is represented as a multi-label classification problem.
In consideration of the above issues, this research proposes a system to classify and retrieve disaster-related information from Twitter posts. Particularly, the system's workflow will perform steps of (1) data cleaning, (2) binary filtering of informative tweets, and (3) classifying tweets into different information categories. For these classification tasks, this research utilizes an ensemble learning approach based on Bidirectional Encoder Representations from Transformers (BERT) models.
c Purpose and significance of the study
The main objective of this study is to develop a precise and efficient machine learning system for disaster-related tweet classification by implementing BERT-based models and a robust processing workflow.
To the best of my knowledge, no work has been reported where the binary classification of disaster tweets is chained with multi-label classification tasks. However, in light of the large volume of Twitter data, this method contributes to optimization when deploying the model to a production environment. Ultimately, the research provides a theoretical foundation for implementing a scalable, applicable tool that aids humanitarian organizations in enhancing situational awareness and expediting effective response strategies during times of crisis.
d Research questions and objectives
Research questions:
i How can we determine whether a tweet is disaster-related or not?
ii How can a tweet be classified into different information categories relating to a crisis?
iii How can both aforementioned tasks be combined in a single workflow?
iv How accurate and effective is the proposed system in disaster tweet labeling?
Objectives:
i Methods of processing Twitter data are proposed and applied to clean the data prior to model training.
ii BERT-based models are used to solve the binary and multi-label disaster tweet classification problems.
iii A multi-model system to classify tweets is proposed based on the best-performing models in the research. Its effectiveness is evaluated through comparison with baseline algorithms on the metrics of processing time and accuracy (a minimal evaluation sketch follows).
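To make objective iii concrete, a minimal evaluation harness could time and score any candidate system as follows; `predict_fn` is a hypothetical interface standing in for whichever model is being compared, not part of the proposal itself.

```python
import time
from sklearn.metrics import accuracy_score

def evaluate(predict_fn, texts, labels):
    """Measure the two comparison metrics named above: wall-clock
    processing time and accuracy. `predict_fn` maps a list of tweet
    texts to predicted labels (a hypothetical interface)."""
    start = time.perf_counter()
    preds = predict_fn(texts)
    elapsed = time.perf_counter() - start
    return accuracy_score(labels, preds), elapsed
```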
II Literature Review
a Overview of the field
Recently, many methods have been developed to detect and extract crisis-related information from tweet data, ranging from location estimation and picture analysis to text mining techniques (Kaur 2019; Prasad et al. 2023). A lot of these approaches are based on traditional machine learning models like SVM, Logistic Regression, or Gradient Boosting (Le 2022). However, with the rapid development of Deep Learning and NLP, research efforts in the field that implement these techniques have shown state-of-the-art performance on this classification task, especially those involving Bidirectional Encoder Representations from Transformers (BERT) models (Imran et al. 2018; Le 2022; Ma 2019; Ningsih and Hadiana 2021; Prasad et al. 2023; Zahera et al. 2019).
b Previous studies and findings related to the topic
Considering the main objective of the research, this section will further discuss relevant literature on three topics:
1) Disaster & non-disaster tweet classification: A fine-tuned BERT-LARGE uncased architecture, with 24 transformer blocks, 16 attention heads, and 340M parameters, was proposed by Le (2022) to perform the customized classification task. He also built several machine learning models and paired them with four different vectorizers (TF-IDF, Count Vector, Skip-gram Vector, and CBoW). Evaluated on the Kaggle dataset from the Natural Language Processing with Disaster Tweets competition, the BERT model showed significantly improved performance (F1 score = 0.88) compared to traditional models (max F1 score = 0.81).
On another Kaggle dataset, Ningsih and Hadiana (2021) saw a similar increase when implementing a BERT model. In particular, they added a 0.5 dropout layer on top of the pre-trained BERT model, followed by a dense layer with ReLU activation to generate probabilities for the "real disaster" and "non real disaster" labels. The model achieved an accuracy score of 0.85 on average.
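A minimal PyTorch sketch of a head of this kind is given below; the checkpoint name and the 64-unit hidden width are illustrative assumptions rather than details reported in the paper.

```python
import torch.nn as nn
from transformers import BertModel

class BinaryDisasterClassifier(nn.Module):
    """Pooled BERT output -> dropout(0.5) -> dense ReLU -> one logit."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # assumed checkpoint
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Sequential(
            nn.Linear(768, hidden_size),  # 768 = BERT-base hidden width
            nn.ReLU(),
            nn.Linear(hidden_size, 1),    # sigmoid is applied inside the loss
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(self.dropout(pooled))
```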
Prasad et al. (2023) further highlighted the importance of the choice of stochastic gradient optimizer for pre-trained self-attention language models, using BERT with the AdamW optimizer to perform binary tweet classification. Through experimentation, their approach improved the accuracy of an existing BERT model (Hayashi, Koushik, and Neubig 2016) by up to 18%, reaching 0.82.
2) Multi-label disaster tweet classification: As a submission for the TREC-IS 2019 challenge, Zahera et al. (2019) employed a fine-tuned BERT model with 10 stacked layers on top to filter tweets into 25 categories. Trained on an aggregated TREC-IS dataset (which contains tweets from different disasters worldwide) with two loss functions (binary cross-entropy and focal loss), their proposed methods achieved a median accuracy of 0.85.
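For context, the sketch below shows the standard binary focal loss formulation (Lin et al. 2017) as it would apply to multi-label logits; it is an illustration, not necessarily Zahera et al.'s exact implementation.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss for multi-label outputs: down-weights easy examples
    so training focuses on hard ones. `targets` is a float tensor of
    0s and 1s with the same shape as `logits`."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```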
In the same year, Ma (2019) collected 74,346 samples across 7 labels to perform tweet classification for disaster management. He created four BERT-based models: BERT-BASE, BERT+NL, BERT+CNN, and BERT+LSTM. After testing, all BERT-based models surpassed baseline performance, and BERT+LSTM along with BERT-BASE worked the best. Additionally, although the customized BERT models attain higher precision, the default BERT produces a higher recall score. Later on, Naaz, Abedin, and Rizvi (2021) discovered that using a balanced dataset and conducting a suitable data-splitting strategy can solve this problem in Ma's research and create a better classification result.
3) Ensemble learning approaches to tweet classification: Several multi-modal systems have been proposed to solve the tweet classification problem. Kumar et al. (2022) presented a deep multi-modal neural network that combines long short-term memory (LSTM) and VGG-16 networks to identify disaster-related content using text and images together. Even with tweet texts alone, the system achieved F1-scores varying from 61% to 92%. Additionally, using Twitter images with semantic descriptors (annotations), Rizk et al. (2019) adopted a two-level multi-modal classification scheme that achieved 92.43% accuracy with computational efficiency. Ensemble learning that uses BERT models has also gained popularity in recent years (Cui et al. 2023; Mnassri et al. 2022; Xu, Barth, and Solis 2020); however, it has not yet been implemented for this problem.
c Gaps in the literature and the need for the proposed study
A pattern that we see from the studied literature is that although BERT-based models performed better than traditional machine learning and some deep learning models on the task (with ~80% accuracy / F1-score), they still fall behind multi-modal mechanisms in predicting disaster tweets. Hence, as an effort to further enhance performance, this research creates a novel multi-model (ensemble) system that utilizes BERT-based classifiers for mining crisis-related information from Twitter data. Considering the large volume of data in a real-world implementation of such a system, we also add a BERT binary classification model that can filter out irrelevant tweets before categorizing them into different labels. Potentially, these design suggestions would bring higher accuracy and efficiency to the task compared to single BERT-based models.
d Theoretical framework or conceptual framework
In this section, we will discuss several theoretical concepts that guide the design of our proposed system.
1) BERT & BERT-based models:
BERT is a pre-trained language model proposed by Google AI Language researchers (Devlin et al. 2019) that has become state-of-the-art in natural language processing. In the original research paper, Devlin et al. (2019) explained that BERT uses the Transformer, an attention mechanism that learns the contextual relationships between words or tokens in a text. BERT's distinct characteristic is that it reads text from both directions (bidirectionally), which enables it to understand context based on a word's entire surroundings.
Given that BERT is trained on a massive amount of data, transfer learning with BERT-based models (fine-tuned models built on top of the BERT architecture) achieves high performance on specific NLP tasks like text classification, even with limited labeled data (Naaz et al. 2021). For their benefits in reducing computational resources and time, we choose to use BERT and BERT-based models as the core of our system.
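As an illustration of this transfer-learning workflow, the widely used Hugging Face transformers library can load a pre-trained BERT checkpoint and attach a fresh classification head; the checkpoint name, sequence length, and example tweet below are assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint and attach a new 2-class head
# (the head's weights are randomly initialized and learned during fine-tuning).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(
    ["Forest fire near the town, residents evacuating"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
logits = model(**batch).logits  # fine-tune with cross-entropy on labeled tweets
```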
2) Model chaining:
Model chaining is the practice of dividing a machine learning workflow into dependent parts (models) that perform specific tasks within the whole system. Model chaining helps to improve the scalability, manageability, and flexibility of the system, as new models can be added or modified without affecting the entire workflow. With model chaining, algorithms can be designed and optimized for their respective tasks, which leads to better overall system performance and efficient resource allocation (Friedman 2021). In our research, adding a binary classifier before the multi-label one can make it more efficient and less time-consuming to assign a tweet to a specific, useful category.
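The sketch below illustrates this chaining idea; `clean_tweet`, `binary_model`, and `multilabel_model` are hypothetical components standing in for the stages of the proposed workflow rather than concrete implementations.

```python
def classify_tweet(text, binary_model, multilabel_model, threshold=0.5):
    """Chained inference sketch: the cheap binary filter runs first, so the
    multi-label model only sees tweets predicted to be informative."""
    cleaned = clean_tweet(text)  # hypothetical pre-processing step
    if binary_model.predict_proba(cleaned) < threshold:
        return ["not related or not informative"]
    return multilabel_model.predict(cleaned)  # one or more information types
```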
3) Ensemble deep learning:
Ensemble learning is a machine learning approach that combines multiple models to perform a given task. An ensemble comprises a group of base learners that are trained for the same problem, then integrated to attain a final result (Mnassri et al. 2022). The application of ensemble learning helps to capture diverse patterns; through this, the risk of overfitting/underfitting is mitigated, and significantly more robust prediction/classification results are achieved.
A comprehensive study of ensemble deep learning has been conducted by Ganaie et al. (2022), which details different ensemble mechanisms: classical methods like boosting and stacking, and fusion strategies (paired more commonly with deep learning) like unweighted model averaging or majority voting. To improve disaster tweet classification, we will implement and evaluate our system with the majority voting strategy.
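Majority voting itself is simple to express; the sketch below shows hard voting over the labels predicted by the base learners (the example labels are illustrative).

```python
from collections import Counter

def majority_vote(member_predictions):
    """Hard majority voting: each base learner casts one label; the
    ensemble returns the most frequent one (ties break arbitrarily)."""
    return Counter(member_predictions).most_common(1)[0][0]

# Three hypothetical BERT-based learners disagree on a tweet:
print(majority_vote(["donations", "donations", "caution"]))  # -> donations
```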
III Methodology
a Dataset
Our analyses are based on the publicly available dataset from the Kaggle Natural Language Processing with Disaster Tweets competition. The dataset consists of 10,083 tweets which have been labeled as 0 and 1 for "not disaster" and "disaster" (Ningsih and Hadiana 2021). Real disaster tweets account for 42.97% of the dataset. We eliminate other metadata and keep only the tweet text for processing.
To perform the multi-label classification task, we distribute tweets into 7 information types, following the taxonomy in Ma (2019). These labels are:
- Caution and advice: warnings, tips, and advice by concerned authorities and individuals
- Infrastructure and utilities damage: posts related to damaged objects, buildings, and services
- Donations and volunteering: regarding the donation of food, clothes, medicines, human power, etc.
- Affected individuals: information on injured or dead people, or disaster victims
- Sympathy and emotional support: posts spreading prayers and wishes
- Other useful information: information relating to a disaster
- Not related or not informative: tweets not related or not useful for disaster management
From these tweets, we create 3 datasets:
i D1 – For binary classification: This dataset will contain 10,083 records and have 2 features: tweet texts and binary labels (0 or 1). Since the classes are quite balanced, we will not perform further data augmentation techniques.
ii D2 – For multi-label classification: This dataset contains 4,333 samples (omitting data labeled "Not related or not informative") and also has 2 features: tweet texts and information types. To deal with the imbalance problem, we apply sampling methods to convert this into a balanced dataset with an equal distribution of tweets in each label (one possible approach is sketched after this list).
iii D3 – For testing the overall system: Containing all records with multi-labeling. This dataset will be used to evaluate the performance of our proposed system against multi-label tweet classification models.
On all the datasets, the data is divided with a ratio of 80% for training and 20% for testing.
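One possible realization of the balancing and splitting steps is sketched below; the proposal does not fix a specific sampling method, so naive random oversampling is shown, and `df2` is a hypothetical dataframe holding D2 with `text` and `label` columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def balance_by_label(df, label_col="label", seed=42):
    """Naive random oversampling: grow every class to the majority size,
    then shuffle. One possible choice among the 'sampling methods' above."""
    n_max = df[label_col].value_counts().max()
    parts = [resample(group, replace=True, n_samples=n_max, random_state=seed)
             for _, group in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=seed)

# 80/20 split, stratified so label proportions match across the splits:
# balanced = balance_by_label(df2)
# train_df, test_df = train_test_split(
#     balanced, test_size=0.2, stratify=balanced["label"], random_state=42)
```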
b Data Pre-processing
i Data Cleansing: First, all tweets are transformed to lowercase. Regular expressions are used to remove common elements like emails, URLs, HTML tags, emojis, special characters, etc. Then, abbreviations are replaced with the full expressions we map them to. For fluency purposes, we do not remove stop words (Naaz et al. 2021). A sketch of this cleaning step follows.
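The sketch below is one possible implementation of these cleaning rules; the abbreviation map is an illustrative placeholder for the full mapping.

```python
import re

# Illustrative abbreviation map; the actual mapping would be much larger.
ABBREVIATIONS = {"pls": "please", "govt": "government", "ppl": "people"}

def clean_tweet(text):
    """Cleaning sketch following the steps above; stop words are kept."""
    text = text.lower()
    text = re.sub(r"\S+@\S+", " ", text)                # emails
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"[^\x00-\x7f]", " ", text)           # emojis / non-ASCII
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # special characters
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)
```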
ii Word Embedding: Word embedding is a technique to represent words in a form that can be understood by machine learning algorithms (Prasad et al. 2023). We use the TF-IDF method, which uses word weighting (Le 2022), as the embedding mechanism.
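A minimal TF-IDF setup with scikit-learn might look as follows; the vocabulary size and n-gram range are illustrative hyperparameters, and `train_texts`/`test_texts` stand for the cleaned training and testing splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit on the training split only, then reuse the fitted vocabulary for the
# test split so no information leaks from test data into the features.
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)
```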
c Model Selection
To find the best candidate models for our multi-model system, this study will also propose and measure the performance of several BERT-based models on our two main tasks:
Binary tweet classification models:
i BERT-BASE: Default BERT model with hyperparameter tuning, used as the baseline model.
ii Fine-tuned BERT-LARGE: Add a sequential non-linear layer on top of the BERT-LARGE model.
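A minimal sketch of model ii is given below; the 256-unit hidden width and the Tanh non-linearity are assumptions, as the proposal does not specify them.

```python
import torch.nn as nn
from transformers import BertModel

class FineTunedBertLarge(nn.Module):
    """Sketch of model ii: BERT-LARGE with a non-linear layer on top."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-large-uncased")
        self.head = nn.Sequential(
            nn.Linear(1024, hidden_size),  # 1024 = BERT-large hidden width
            nn.Tanh(),                     # the non-linearity; this choice is an assumption
            nn.Linear(hidden_size, 2),     # informative vs. non-informative
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.head(pooled)
```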