VIETNAM NATIONAL UNIVERSITY HANOI
INTERNATIONAL SCHOOL
RESEARCH PROPOSAL
Disaster Tweets Classification in Disaster Response with BERT-Based Model
Course Code :
Course Name :
Student name/ID :
Hanoi, May 2023
I Introduction
a Background of the study
As various disasters take place around the globe, disaster management has become a matter of concern for governments, NGOs, and emergency agencies. With good, in-time disaster response, lives can be saved, damage minimized, and recovery efforts accelerated (Brown et al. 2017; Waeckerle 1991).
In recent years, social media platforms like Facebook, Twitter, and Snapchat have become integral parts of an individual's everyday life. Twitter, for example, sees an average of about 10 thousand tweets per second, corresponding to 36 million tweets sent per hour, roughly 867 million per day, and about 316 billion every year (M 2023).
Twitter is well-known for its real-time engagement. In times of crisis, many people choose this platform as their first channel to post updates on the situation, report damage, call for help, give instructions, etc. (Abedin and Babar 2018). Thus, the platform has been used widely in the field of natural and human-made disasters, providing useful insights so that rescue teams can act effectively (Kapoor et al. 2018; Kim and Hastak 2018). However, due to the high volume and velocity of this data, it is nearly impossible to manually monitor and analyze it in search of informative posts, which calls for research into a solution that can automate such a task (Kaur 2019).
b Statement of the problem
Creating an automatic classifier algorithm that can detect informative tweets is challenging because of tweets' unique properties. They are limited in size (maximum 280 characters) and may contain grammar errors, special characters, or unconventional vocabulary that infers different meanings (Nguyen et al. 2016). In a real-world setting, tweet data can be very imbalanced across classes, having more samples in some labels than in others. These characteristics can severely affect the ability of the classifier to learn patterns, make predictions, and accurately mine useful information relating to disasters (Ghosh, Bellinger, and Corizzo 2022).
Furthermore, apart from knowing the informativeness of social media posts, humanitarian organizations are interested in dividing these tweets into different categories to better coordinate their response (Nguyen et al. 2016). Thus, there are two problems to address:
i Informative vs. non-informative tweets: Most posts on Twitter may contain irrelevant information and are not useful for disaster management. In this case, a binary classification solution (labeling tweets as "informative" or "non-informative") can be performed to retain only useful information.
ii Information types of disaster tweets: Informative tweets can fall into different categories, such as damage reports or requests for help, that require more targeted actions. This is represented as a multi-label classification problem.
In consideration of the above issues, this research proposes a system to classify and retrieve disaster-related information from Twitter posts. Particularly, the system's workflow will perform steps of (1) data cleaning, (2) binary filtering of informative tweets, and (3) classifying tweets into different information categories. For these classification tasks, this research utilizes an ensemble learning approach based on Bidirectional Encoder Representations from Transformers (BERT) models.
c Purpose and significance of the study
The main objective of this study is to develop a precise and efficient machine learning system for disaster-related tweet classification by implementing BERT-based models and a robust processing workflow.
To the best of my knowledge, no work has been reported where the binary classification of disaster tweets is chained with multi-label classification tasks. However, in light of the large volume of Twitter data, this method contributes to optimization when deploying the model to a production environment. Ultimately, the research provides a theoretical foundation for implementing a scalable, applicable tool that aids humanitarian organizations in enhancing situational awareness and expediting effective response strategies during times of crisis.
d Research questions and objectives
Research questions:
i How can we determine whether a tweet is disaster-related or not?
ii How can a tweet be classified into different information categories relating to a crisis?
iii How can both aforementioned tasks be combined in a single workflow?
iv How accurate and effective is the proposed system in disaster tweet labeling?
Objectives:
i Methods of processing Twitter data are proposed and applied to clean the data prior to model training.
ii BERT-based models are used to solve the binary and multi-label disaster tweet classification problems.
iii A multi-model system to classify tweets is proposed based on the best-performing models in the research. Its effectiveness is evaluated through comparison with baseline algorithms on the metrics of processing time and accuracy (a minimal evaluation sketch follows).
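To make objective iii concrete, a minimal evaluation harness could time and score any candidate system as follows; `predict_fn` is a hypothetical interface standing in for whichever model is being compared, not part of the proposal itself.

```python
import time
from sklearn.metrics import accuracy_score

def evaluate(predict_fn, texts, labels):
    """Measure the two comparison metrics named above: wall-clock
    processing time and accuracy. `predict_fn` maps a list of tweet
    texts to predicted labels (a hypothetical interface)."""
    start = time.perf_counter()
    preds = predict_fn(texts)
    elapsed = time.perf_counter() - start
    return accuracy_score(labels, preds), elapsed
```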
II Literature Review
a Overview of the field
Recently, many methods have been developed to detect and extract crisis-related information from tweet data, ranging from location estimation and picture analysis to text mining techniques (Kaur 2019; Prasad et al. 2023). A lot of these approaches are based on traditional machine learning models like SVM, Logistic Regression, or Gradient Boosting (Le 2022). However, with the rapid development of Deep Learning and NLP, research efforts in the field that implement these techniques have shown state-of-the-art performance on this classification task, especially those involving Bidirectional Encoder Representations from Transformers (BERT) models (Imran et al. 2018; Le 2022; Ma 2019; Ningsih and Hadiana 2021; Prasad et al. 2023; Zahera et al. 2019).
b Previous studies and findings related to the topic
Considering the main objective of the research, this section will further discuss relevant literature on three topics:
1) Disaster & non-disaster tweet classification: A fine-tuned BERT-LARGE uncased architecture, with 24 transformer blocks, 16 attention heads, and 340M parameters, was proposed by Le (2022) to perform the customized classification task. He also built several machine learning models and paired them with four different vectorizers (TF-IDF, Count Vector, Skip-gram Vector, and CBoW). Evaluated on the Kaggle dataset from the Natural Language Processing with Disaster Tweets competition, the BERT model showed significantly improved performance (F1 score = 0.88) compared to traditional models (max F1 score = 0.81).
On another Kaggle dataset, Ningsih and Hadiana (2021) saw a similar increase when implementing a BERT model. In particular, they added a 0.5 dropout layer on top of the pre-trained BERT model, followed by a dense layer with ReLU activation to generate probabilities for the "real disaster" and "non real disaster" labels. The model achieved an accuracy score of 0.85 on average.
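A minimal PyTorch sketch of a head of this kind is given below; the checkpoint name and the 64-unit hidden width are illustrative assumptions rather than details reported in the paper.

```python
import torch.nn as nn
from transformers import BertModel

class BinaryDisasterClassifier(nn.Module):
    """Pooled BERT output -> dropout(0.5) -> dense ReLU -> one logit."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # assumed checkpoint
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Sequential(
            nn.Linear(768, hidden_size),  # 768 = BERT-base hidden width
            nn.ReLU(),
            nn.Linear(hidden_size, 1),    # sigmoid is applied inside the loss
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(self.dropout(pooled))
```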
Prasad et al. (2023) further highlighted the importance of the choice of stochastic gradient optimizer for pre-trained self-attention language models, using BERT with the AdamW optimizer to perform binary tweet classification. Through experimentation, their approach improved the accuracy of an existing BERT model (Hayashi, Koushik, and Neubig 2016) by up to 18%, reaching 0.82.
2) Multi-label disaster tweet classification: As a submission for the TREC-IS 2019 challenge, Zahera et al. (2019) employed a fine-tuned BERT model with 10 stacked layers on top to filter tweets into 25 categories. Trained on an aggregated TREC-IS dataset (which contains tweets from different disasters worldwide) with two loss functions (binary cross-entropy and focal loss), their proposed methods achieved a median accuracy of 0.85.
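For context, the sketch below shows the standard binary focal loss formulation (Lin et al. 2017) as it would apply to multi-label logits; it is an illustration, not necessarily Zahera et al.'s exact implementation.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss for multi-label outputs: down-weights easy examples
    so training focuses on hard ones. `targets` is a float tensor of
    0s and 1s with the same shape as `logits`."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```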
In the same year, Ma (2019) collected 74,346 samples across 7 labels to perform tweet classification for disaster management. He created four BERT-based models: BERT-BASE, BERT+NL, BERT+CNN, and BERT+LSTM. After testing, all BERT-based models surpassed baseline performance, and BERT+LSTM along with BERT-BASE worked the best. Additionally, although the customized BERT models attain higher precision, the default BERT produces a higher recall score. Later on, Naaz, Abedin, and Rizvi (2021) discovered that using a balanced dataset and conducting a suitable data-splitting strategy can solve this problem in Ma's research and create a better classification result.
3) Ensemble learning approaches to tweet classification: Several multi-modal systems have been proposed to solve the tweet classification problem. Kumar et al. (2022) presented a deep multi-modal neural network that combines long short-term memory (LSTM) and VGG-16 networks to identify disaster-related content using text and images together. Even with tweet texts alone, the system achieved F1-scores varying from 61% to 92%. Additionally, using Twitter images with semantic descriptors (annotations), Rizk et al. (2019) adopted a two-level multi-modal classification scheme that achieved 92.43% accuracy with computational efficiency. Ensemble learning that uses BERT models has also gained popularity in recent years (Cui et al. 2023; Mnassri et al. 2022; Xu, Barth, and Solis 2020); however, it has not yet been implemented for this problem.
c Gaps in the literature and the need for the proposed study
A pattern that we see from the studied literature is that although BERT-based models performed better than traditional machine learning and some deep learning models on the task (with ~80% accuracy / F1-score), they still fall behind multi-modal mechanisms in predicting disaster tweets. Hence, as an effort to further enhance performance, this research creates a novel multi-model (ensemble) system that utilizes BERT-based classifiers for mining crisis-related information from Twitter data. Considering the large volume of data in a real-world implementation of such a system, we also add a BERT binary classification model that can filter out irrelevant tweets before categorizing them into different labels. Potentially, these design suggestions would bring higher accuracy and efficiency to the task compared to single BERT-based models.
d Theoretical framework or conceptual framework
In this section, we will discuss several theoretical concepts that guide the design of our proposed system.
1) BERT & BERT-based models:
BERT is a pre-trained language model proposed by Google AI Language researchers (Devlin et al. 2019) that has become state-of-the-art in natural language processing. In the original research paper, Devlin et al. (2019) explained that BERT uses the Transformer, an attention mechanism that learns the contextual relationships between words or tokens in a text. BERT's distinct characteristic is that it reads text from both directions (bidirectionally), which enables it to understand context based on a word's entire surroundings.
Given that BERT is trained on a massive amount of data, transfer learning with BERT-based models (fine-tuned models built on top of the BERT architecture) achieves high performance on specific NLP tasks like text classification, even with limited labeled data (Naaz et al. 2021). For their benefits in reducing computational resources and time, we choose to use BERT and BERT-based models as the core of our system.
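As an illustration of this transfer-learning workflow, the widely used Hugging Face transformers library can load a pre-trained BERT checkpoint and attach a fresh classification head; the checkpoint name, sequence length, and example tweet below are assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint and attach a new 2-class head
# (the head's weights are randomly initialized and learned during fine-tuning).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(
    ["Forest fire near the town, residents evacuating"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
logits = model(**batch).logits  # fine-tune with cross-entropy on labeled tweets
```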
2) Model chaining:
Model chaining is the practice of dividing a machine learning workflow into dependent parts (models) that perform specific tasks within the whole system. Model chaining helps to improve the scalability, manageability, and flexibility of the system, as new models can be added or modified without affecting the entire workflow. With model chaining, algorithms can be designed and optimized for their respective tasks, which leads to better overall system performance and efficient resource allocation (Friedman 2021). In our research, adding a binary classifier before the multi-label one can make it more efficient and less time-consuming to assign a tweet to a specific, useful category.
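The sketch below illustrates this chaining idea; `clean_tweet`, `binary_model`, and `multilabel_model` are hypothetical components standing in for the stages of the proposed workflow rather than concrete implementations.

```python
def classify_tweet(text, binary_model, multilabel_model, threshold=0.5):
    """Chained inference sketch: the cheap binary filter runs first, so the
    multi-label model only sees tweets predicted to be informative."""
    cleaned = clean_tweet(text)  # hypothetical pre-processing step
    if binary_model.predict_proba(cleaned) < threshold:
        return ["not related or not informative"]
    return multilabel_model.predict(cleaned)  # one or more information types
```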
3) Ensemble deep learning:
Ensemble learning is a machine learning approach that combines multiple models to perform a given task. An ensemble comprises a group of base learners that are trained for the same problem, then integrated to attain a final result (Mnassri et al. 2022). The application of ensemble learning helps to capture diverse patterns; through this, the risk of overfitting/underfitting is mitigated, and significantly more robust prediction/classification results are achieved.
A comprehensive study of ensemble deep learning has been conducted by Ganaie et al. (2022), which details different ensemble mechanisms: classical methods like boosting and stacking, and fusion strategies (paired more commonly with deep learning) like unweighted model averaging or majority voting. To improve disaster tweet classification, we will implement and evaluate our system with the majority voting strategy.
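Majority voting itself is simple to express; the sketch below shows hard voting over the labels predicted by the base learners (the example labels are illustrative).

```python
from collections import Counter

def majority_vote(member_predictions):
    """Hard majority voting: each base learner casts one label; the
    ensemble returns the most frequent one (ties break arbitrarily)."""
    return Counter(member_predictions).most_common(1)[0][0]

# Three hypothetical BERT-based learners disagree on a tweet:
print(majority_vote(["donations", "donations", "caution"]))  # -> donations
```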
III Methodology
a Dataset
Our analyses are based on the publicly available dataset from the Kaggle Natural Language Processing with Disaster Tweets competition. The dataset consists of 10,083 tweets which have been labeled as 0 and 1 for "not disaster" and "disaster" (Ningsih and Hadiana 2021). Real disaster tweets account for 42.97% of the dataset. We eliminate other metadata and keep only the tweet text for processing.
To perform the multi-label classification task, we distribute tweets into 7 information types, following the taxonomy in Ma (2019). These labels are:
- Caution and advice: warnings, tips, and advice by concerned authorities and individuals
- Infrastructure and utilities damage: posts related to damaged objects, buildings, and services
- Donations and volunteering: regarding the donation of food, clothes, medicines, human power, etc.
- Affected individuals: information on injured or dead people, or disaster victims
- Sympathy and emotional support: posts spreading prayers and wishes
- Other useful information: information relating to a disaster
- Not related or not informative: tweets not related or not useful for disaster management
From these tweets, we create 3 datasets:
i D1 – For binary classification: This dataset will contain 10,083 records and have 2 features: tweet texts and binary labels (0 or 1). Since the classes are quite balanced, we will not perform further data augmentation techniques.
ii D2 – For multi-label classification: This dataset contains 4,333 samples (omitting data labeled "Not related or not informative") and also has 2 features: tweet texts and information types. To deal with the imbalance problem, we apply sampling methods to convert this into a balanced dataset with an equal distribution of tweets in each label (one possible approach is sketched after this list).
iii D3 – For testing the overall system: Containing all records with multi-labeling. This dataset will be used to evaluate the performance of our proposed system against multi-label tweet classification models.
On all the datasets, the data is divided with a ratio of 80% for training and 20% for testing.
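One possible realization of the balancing and splitting steps is sketched below; the proposal does not fix a specific sampling method, so naive random oversampling is shown, and `df2` is a hypothetical dataframe holding D2 with `text` and `label` columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def balance_by_label(df, label_col="label", seed=42):
    """Naive random oversampling: grow every class to the majority size,
    then shuffle. One possible choice among the 'sampling methods' above."""
    n_max = df[label_col].value_counts().max()
    parts = [resample(group, replace=True, n_samples=n_max, random_state=seed)
             for _, group in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=seed)

# 80/20 split, stratified so label proportions match across the splits:
# balanced = balance_by_label(df2)
# train_df, test_df = train_test_split(
#     balanced, test_size=0.2, stratify=balanced["label"], random_state=42)
```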
b Data Pre-processing
i Data Cleansing: First, all tweets are transformed to lowercase. Regular expressions are used to remove common elements like emails, URLs, HTML tags, emojis, special characters, etc. Then, abbreviations are replaced with the full expressions we map them to. For fluency purposes, we do not remove stop words (Naaz et al. 2021). A sketch of this cleaning step follows.
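The sketch below is one possible implementation of these cleaning rules; the abbreviation map is an illustrative placeholder for the full mapping.

```python
import re

# Illustrative abbreviation map; the actual mapping would be much larger.
ABBREVIATIONS = {"pls": "please", "govt": "government", "ppl": "people"}

def clean_tweet(text):
    """Cleaning sketch following the steps above; stop words are kept."""
    text = text.lower()
    text = re.sub(r"\S+@\S+", " ", text)                # emails
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"[^\x00-\x7f]", " ", text)           # emojis / non-ASCII
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # special characters
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)
```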
ii Word Embedding: Word embedding is a technique to represent words in a form that can be understood by machine learning algorithms (Prasad et al. 2023). We use the TF-IDF method, which uses word weighting (Le 2022), as the embedding mechanism.
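A minimal TF-IDF setup with scikit-learn might look as follows; the vocabulary size and n-gram range are illustrative hyperparameters, and `train_texts`/`test_texts` stand for the cleaned training and testing splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit on the training split only, then reuse the fitted vocabulary for the
# test split so no information leaks from test data into the features.
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)
```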
c Model Selection
To find the best candidate models for our multi-model system, this study will also propose and measure the performance of several BERT-based models on our two main tasks:
Binary tweet classification models:
i BERT-BASE: Default BERT model with hyperparameter tuning, used as the baseline model.
ii Fine-tuned BERT-LARGE: Add a sequential non-linear layer on top of the BERT-LARGE model.
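A minimal sketch of model ii is given below; the 256-unit hidden width and the Tanh non-linearity are assumptions, as the proposal does not specify them.

```python
import torch.nn as nn
from transformers import BertModel

class FineTunedBertLarge(nn.Module):
    """Sketch of model ii: BERT-LARGE with a non-linear layer on top."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-large-uncased")
        self.head = nn.Sequential(
            nn.Linear(1024, hidden_size),  # 1024 = BERT-large hidden width
            nn.Tanh(),                     # the non-linearity; this choice is an assumption
            nn.Linear(hidden_size, 2),     # informative vs. non-informative
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.head(pooled)
```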