International Conference on IoT based Control Networks and Intelligent Systems (ICICNIS 2020)

Emotion Recognition-Based Mental Healthcare Chat-bots: A Survey

*Carol Antony, Bestina Pariyath, Seema Safar, Aswin Sahil, Akash R Nair
Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Kochi 682039, Kerala, India

ABSTRACT

The average human attention span has fallen from 12 seconds to a mere few seconds. This is a direct consequence of man’s tech-saturated lifestyle. Humans increasingly prefer quick and informative digital conversations over time-consuming human-to-human interaction. Concurrently, research on chat-bot technology has increased enormously. Well-trained chat-bots capable of holding a fulfilling and productive conversation have drawn keen interest from users. Ever since the outbreak of the Covid-19 pandemic, our lives have been forced to change drastically. Difficulty adapting to the post-Covid lifestyle has raised concerns about psychological resilience to adversity. The situation calls for immediate attention to mental healthcare. This survey studies recent papers on how emotion recognition, in addition to sentiment analysis, can be integrated into a chat-bot to help identify and resolve a user's mental anguish. It aims to find and analyse the existing methods used to develop a self-sufficient emotion recognition-based chat-bot system that can take up the role of a therapist.

Keywords: Chat-bot - Mental health - Natural Language Understanding - Natural Language Generation - Deep Learning

1 Introduction

1.1 Background

According to the World Health Organization (WHO), “Mental health is a state of well-being in which an individual realizes his or her own abilities, can cope with the normal stresses of life, can work productively, and is able to make a contribution to his or her community.” Despite its direct correlation with
one’s physical well-being, resulting in ill-health tendencies and high mental morbidity rates [1], mental health is one of the most neglected areas of health care. Society still stigmatizes mental health-related issues to a great extent, even in this modern era. This discrimination discourages people facing such issues from bringing them out in the open, and they keep suffering on the inside. We need to recognize trauma as a normal human being’s response to an abnormal scenario and embrace people dealing with it. To pave the way for a healthier society, we need to stop questioning the existence of people who suffer from some kind of mental dysfunction and start supporting them by addressing their mental health issues [2].

The novel coronavirus outbreak has had a significant impact on the functioning of society as a whole and has wreaked havoc on rural and urban lives alike. A natural disaster of this intensity can affect the psychological vulnerability of the mass population. Firstly, those who have contracted the virus and are living in care facilities, those at high risk of carrying the virus, and the frontline health care workers who have direct contact with patients are all at risk of developing mental disorders such as anxiety disorder, post-traumatic stress disorder or clinical depression. This unsolicited situation has also led to touch starvation in many individuals [3]. Humans crave physical touch as much as they need verbal attention. The social distancing norm introduced during the pandemic has heightened concerns about people feeling deprived of affection because of limited or no physical contact.

Electronic copy available at: https://ssrn.com/abstract=3774017

Besides the direct psychological distress caused by the ailment, people dread the thought of being left alone in isolation at homes or hospitals, which
ultimately leads to unhealthy sleep and diet patterns [4]. The fear of not being able to see their loved ones or confirm their well-being can really shake people up. The economic downturn is yet another factor that has contributed greatly to emotional instability. It has particularly hurt rural people who are economically less privileged, owing to uncertainty about their livelihood and the scarcity of new job opportunities. This period has also witnessed a massive increase in the number of victims of intimate partner violence (IPV) [5] and suicide [6]. Therefore, mental health has become one of the most widely discussed subjects these days.

In most issues related to mental health, the affected person merely needs a space to vent. The listener is only required to validate the person’s emotions by understanding their situation and helping them come up with constructive solutions to tackle it. Chatbots have been widely used and experimented with in recent years, and research has begun to lean towards their potential for emotional intelligence [7]. The current scenario has increased the demand for such chatbots. Being isolated from their loved ones and the sudden, disruptive change in lifestyle have taken their toll on people globally and left a good number of them feeling mentally vulnerable and unstable. But owing to social stigma and limited knowledge of mental healthcare, people are often scared to reach out to others. This is where a chatbot can come into play. Knowing that a conversation with a chatbot is anonymous enables users to drop their facade and feel more comfortable about opening up. Most existing chatbots simply give their users faster and easier access to knowledge and are incapable of ascertaining or reciprocating the user's emotions. But once we equip a chatbot with this quality, it can provide emotional support and help reduce the load on therapists at the same time. Therefore, the issues
that arose or got out of hand owing to the pandemic or lack of access to mental health resources can be controlled to an extent with chatbots that can understand the user’s emotions and respond the way a human can.

1.2 Objective

The objective of this paper is to provide a comprehensive overview of the existing research literature on the use of empathetic conversational agents. The paper discusses the most efficient methodologies available to enable a chatbot system to understand a user's emotions from the input utterance through emotion recognition, by employing deep learning methods and Natural Language Understanding (NLU). It aims to give further insight into how to generate a response that is emotionally compatible with the user’s input without losing contextual information, thus enabling the development of therapeutic chat-bots. Finally, in line with the observed gaps in the literature, the paper seeks to provide recommendations for future research on the design and application of this technology.

2 Scope

Chatbots can be used explicitly for diagnosis, prevention, in-time interventions and follow-up in the mental healthcare sector. Recent research in the field suggests that suicidal tendencies and psychological distress can be perceived from social media activity using Artificial Intelligence. In such cases, where immediate action is required, conversational agents can prompt the user to reach out to emergency helpline facilities or alert nearby therapists in order to prevent any mishaps. These agents can also keep the user’s mental health stable by providing regular emotional support to reduce the chance of relapse, and can even facilitate early screening and quicker diagnosis of psychological disorders [9].

Fig 1 - Difference between conventional and empathetic chatbots

Adolescence is
a crucial period for developing and maintaining social and emotional habits important for mental well-being, and youth these days rely extensively on smartphones. At this age, when the child is prone to all kinds of emotional vulnerability and is in the process of becoming an adult, the intervention of an emotionally intelligent chatbot would be very useful. The use of these agents is not limited to youth; adults can also use them to relieve the stress and anxiety that come with the responsibility of tending to family and society, and to cultivate healthy physical habits so that their day becomes more productive. They can be utilized by NGOs and other social or rehabilitation workers to reach out to people in need. In post-Covid times, when many have been forced to stay home, we have realized how such a technology holds the potential to combat the ill-effects of, as well as make up for, the lack of access to in-person healthcare facilities during natural calamities like floods, virus outbreaks and earthquakes.

Low- and medium-income countries still face a shortage of qualified mental health professionals. The doctor-to-patient ratio is too low. Also, there is no provision for insurance coverage for mental health services. Reaching out and seeking help is quite an expensive affair. These conversational agents can revolutionize mental health care in such areas and ease the burden on clinical psychologists by making the needed facilities accessible to anyone, anywhere, at their fingertips.

3 Phases of an empathetic chatbot

The interaction of a user with an empathetic chatbot [8] consists of the following four phases:

• Emotion expression by the user
• Pre-processing of the input utterance
• Sentiment analysis to detect the underlying emotion
• Emotionally appropriate response generation

Fig 2 - Phases of an empathetic chatbot

3.1 Emotion expression

In this initial phase, the input utterance from the user is fed into the system. Emotions are context-sensitive in nature. Often, a sentence expresses multiple emotions simultaneously. Sometimes the emotion might not be apparent, and sometimes the overall tone of the statement might be neutral. Words holding a strong emotional charge allow the system to better interpret the emotional state of the user. Such contextual word associations and emotional information can be identified by sentiment analysis after pre-processing the statement.

A human’s feelings and responses toward stimuli are guided by emotional mental models. Setting the number of emotion categories to be used for classification of tokens at this stage itself lessens the confusion over emotion model selection in later phases. The most popular model available is Ekman’s six basic emotions model [10]: happiness, sadness, fear, anger, disgust, and surprise. Other, less frequently used models for emotion detection are Plutchik’s wheel of emotions [11], which comprises four pairs of primary bipolar emotions (joy versus sadness; anger versus fear; trust versus disgust; and surprise versus anticipation) along with advanced emotions based on differences in the intensity of the primary emotions, and Parrott’s Emotional Layers [12], consisting of thirty-one different emotions. Ekman’s model is preferred owing to its higher accuracy in mental health predictions and its fewer categories, which lead to a faster and more efficient system.

3.2 Pre-processing

Fig 3 - Steps involved in pre-processing

At this stage, the statement must be broken down for further analysis; this is the fundamental step in any NLP method. Given a character sequence, tokenization divides it into smaller, indivisible units,
called tokens, while also stripping all unwanted characters, repetitions and numbers. If acronyms or messaging language are present in the input, we perform normalization to bring it to its standard form. All negations are kept intact to preserve contextual information. After pre-processing, machine learning models are utilized to optimally tokenize each word in the testing set. Tokenization greatly improves the machine’s learning rate and inferences. A good tokenization mechanism takes a domain-specific corpus, gives its terms higher scores, and normalizes the remaining dataset with a generic corpus. Finally, the model assigns parts of speech to each of the tokens and correlations are established.

3.3 Sentiment Recognition

The tokens identified during the pre-processing stage are embedded into binary vector representations, which contain the meaning corresponding to each word. These word vectors are then combined into a matrix representation of the sentence. Attention mechanisms are applied to pick out the crucial information, thereby compacting the matrix while retaining the contextual data. Finally, we predict the target sentiment of the statement. This can be done in the following ways:

Fig 4 - Steps involved in sentiment recognition in automatic and hybrid models

• Rule Based Models: These models make use of a finite set of rules crafted by the system developer to predict the sentiment/emotion within a statement. First, each word in a statement is identified either as one that shows or aids a sentiment or as a neutral word. Then, using rules that deal with all the sentiment deciders, such as sentiment lexicons, adjectives, negators and intensifiers, the sentiment behind the statement is decided. These models work well if we know the exact context of the user’s statement.
• Automatic Models: Such models make use of machine learning techniques that learn from an input dataset. Sentiment analysis is usually modelled as a classifier
algorithm. Here, we need to build a machine learning algorithm and feed it large amounts of input data to train it. The most widely proposed classifier models for this are Naive Bayes, linear regression, Support Vector Machines (SVMs) and deep learning models.
• Hybrid Models: These models combine automatic and rule-based models and are built for improved accuracy. Here, the developer includes the beneficial components of the other two model types to build a better one. They are generally used when the developer cannot be sure of the context of the user’s input but needs accurate outputs.

3.4 Response Generation

Once the sentiment behind the user’s input is identified, the chatbot needs to provide a matching, human-like response. For this, the identified sentiment is fed to the response generator model along with the user’s input. The model then generates a response that corresponds to the user’s emotion.

• Retrieval-based Models: This is the easier approach, wherein a predefined set of queries and responses is fed into the system, which chooses the response that best matches the context and also takes into account the emotion conveyed in the input utterance. These models are best suited for domain-specific applications such as customer care for a particular business. The possible response set is known in advance, hence there is no ambiguity regarding the generated response.
• Generative Models: There is no predefined set of responses in these models, as they generate completely new responses from scratch. This approach is quite advanced and is seldom used in practice. It is difficult to develop bots using this model, as it requires a massive, emotionally varied human-human interaction dataset and a huge amount of training time to achieve contextually accurate responses. But recent years
have seen a lot of advancement in this field.

4 Result Analysis

Fig 5 - Basic architecture of the chatbot

4.1 Development Approaches

Table 1 shows a set of sentiment recognition methods used by researchers in the past few years for classifying social media comments and tweets/posts, while Table 2 shows recently developed response-generation approaches for generating context-relevant and empathetic responses to user input utterances. The tables are the result of the literature review conducted on these topics.

4.2 Datasets available

Tables 3 and 4 list some of the public datasets available on the Kaggle platform to facilitate sentiment analysis and more humanlike response generation.

Table 1 - Sentiment Detection and Analysis Approaches

Authors | Year | Focus | Suggested Approach
--- | --- | --- | ---
Chao Song et al. | 2020 | Short text sentiment analysis [13] | A framework called SAPC is created. It makes use of probabilistic linguistic term sets and support vector machines.
Kashfia Sailunaz et al. | 2019 | To conduct emotion and sentiment analysis on Twitter data [14] | A Naïve Bayes classifier is used to recognize emotion as well as sentiment.
Shailendra Kumar Singh et al. | 2019 | To classify social media texts using sentiment analysis [15] | SentiVerb system: uses the dictionary approach (opinion verb dictionary) and a binary classifier.
Mohsin Manshad Abbasi et al. | 2019 | To summarize emotions from text [16] | Uses Plutchik’s wheel of emotions to classify the emotion conveyed in the text.
Md Rakibul Hasan et al. | 2019 | To conduct sentiment analysis with NLP on Twitter data [17] | Uses the Bag of Words and TF-IDF model concepts to analyse the sentiment in the tweet.
Xian Zhong et al. | 2019 | To develop an emotion classification algorithm based on SPT-CapsNet [18] | Sentiment categorization is done through modifications to SPT-CapsNet (capsule network).
Ankush Chatterjee et al. | 2018 | To understand emotions in text using deep learning and big data [19] | SSBED: a Sentiment and Semantic Based Emotion Detector model is proposed for sentiment recognition.
Guixian Xu et al. | 2017 | To conduct sentiment analysis on social media comment texts [20] | TF-IDF is used to process the comment and the resulting vectors are given as input to a Bi-LSTM model to analyse the sentiment.
Anna Jurek et al. | 2015 | To develop an improved lexicon-based sentiment analysis algorithm for social media analytics [21] | Rule-based approach which considers almost all emotion-determining factors.
Monisha Kanakaraj et al. | 2015 | To conduct sentiment analysis using ensemble classifiers on Twitter data [22] | NLP-based approach to strengthen the sentiment categorization by incorporating semantics in feature vectors and consequently using ensemble methods for the same.

Table 2 - Response Generation Approaches

Authors | Year | Focus | Suggested Approach
--- | --- | --- | ---
Zhang et al. | 2020 | Retrieval-Polished Response Generation for Chatbot [23] | A retrieved response is polished by considering a contextually similar prototype, and the better of the retrieved and polished responses is chosen as the final response. The method uses the background provided by the context and the sentence style provided by the retrieved response.
Kim et al. | 2020 | Knowledge-Grounded Chatbot Based on Dual Wasserstein Generative Adversarial Networks with Effective Attention Mechanisms [24] | Knowledge-grounded chatbots converse using internal and external knowledge on a subject. Well-designed attention mechanisms were proposed to reflect context without topic deviation.
Zhang et al. | 2019 | DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation [25] | An open-domain pre-trained model trained on a Reddit dataset. It uses the multi-layer transformer architecture, which handles issues like content inconsistency, loss of contextual data and lack of emotion encountered in other models.
Wu et al. | 2019 | A Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based Chatbots [26] | A sequential matching framework (SMF) for context-response matching was introduced that can handle important information in a context as well as model the utterance relationships. The models were based on a convolution-pooling technique and an attention mechanism.
Gu et al. | 2019 | Dually Interactive Matching Network (IMN) for Personalized Response Selection in Retrieval-Based Chatbots [27] | An IMN models the matching degree between a context comprising multiple utterances and a candidate response. The DIM model adopts a dual matching architecture: it interactively matches responses to contexts and to personas respectively for ranking response candidates, giving more personalized responses.
Yuan et al. | 2019 | Multi-hop Selector Network (MSN) for Multi-turn Response Selection in Retrieval-based Chatbots [28] | An MSN utilizes a multi-hop selector to select the relevant utterances as context, after which the model matches the filtered context with the candidate response and returns a matching score. This helps mitigate the problem of too much context in a sentence.
Su et al. | 2018 | Response selection and automatic message-response expansion in retrieval-based QA systems using semantic dependency pair model (SDPM) [29] | For response selection, an SDPM is constructed from a matrix representing correlations between the semantic dependencies of the messages and responses. For database expansion, unstructured data from the message boards of various psychological consultation websites are automatically collected, fed to the QA system to find the best-matched response segment, and scored to verify whether the new pair is good enough to be included in the database.
Wu et al. | 2018 | Response selection with topic clues for retrieval-based chatbots [30] | A topic-aware attentive recurrent neural network (TAARNN) was proposed in which topic information is used to facilitate representations of the message and response. RNNs are good at capturing the local structure of a word sequence (syntactic and semantic), and topic awareness reduces the issue of not being able to handle long-term dependencies.
Wu et al. | 2018 | Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots [31] | An unlabelled data set was created by retrieving response candidates, and a weak annotator (pre-trained from large-scale unlabelled human-human conversations) was employed to provide matching signals for the unlabelled input-response pairs, which helped supervise the learning of matching models.

Table 3 - Datasets for Sentiment Recognition

Name of Dataset | Size | Description | Usability
--- | --- | --- | ---
IMDB dataset (Sentiment analysis) in CSV format | 62.81MB | Contains long movie reviews (>200 words) labelled with the sentiment associated with the movie review | 10
Amazon Reviews for Sentiment Analysis | 493.13MB | Contains 3.6M Amazon product reviews, separated into two classes for positive and negative reviews (1,2 stars: negative; 4,5 stars: positive), for learning how to train fastText for sentiment analysis | 6.9
Stanford Sentiment Treebank v2 (SST2) [32] | 46.5MB | Contains 11k+ sentences from movie reviews parsed using the Stanford parser. All phrases were labelled by Mechanical Turk for sentiment | 9.4
Sentiment140 dataset | 227.74MB | Contains 1.6M tweets extracted using the Twitter API, labelled on a scale of 0 to 4, indicating polarity of the tweet | 8.8
Emotions dataset for NLP | 1.97MB | Contains list of documents with corresponding emotion flag | 10
Sentiment Lexicons for 81 Languages [33] | 1.96MB | Contains positive and negative sentiment lexicons for 81 languages | 7.6

Table 4 - Datasets for Response Generation Training

Name of Dataset | Size | Description | Usability
--- | --- | --- | ---
DialoGPT-large | 4.52GB | Contains a trained model for large-scale pretrained dialogue response generation. The model is trained on 147M multi-turn dialogues from Reddit discussion threads | 8.8
Ubuntu Dialogue Corpus | 2.71GB | Contains 26 million turns from natural two-person dialogues | 7.6

5 Discussion

5.1 Principal findings

Chatbots have been developed since 1966, starting with Eliza, the very first chatbot, created by Joseph Weizenbaum. The program was designed to mimic a human conversation. In the decades that followed, chatbot developers built upon this model to strive for more human-like interactions. These were all based on pattern matching and substitution methodology to simulate conversation. The 21st century has brought about new and exciting chatbots which showcase machine learning and other advanced algorithms, due to which they are able to learn from their interactions with humans. Since then, these chatbots have continued to attract people. They have been used as personal assistants that can perform tasks on behalf of their users. They have advanced so much that people even talk to chatbots for their own amusement.

Chatbots have been used for assistance in depression for a long time now. Chatbots exhibit both reactive behavior (an instant response to an input) and proactive behavior (regular healthy notifications), which is why they are apt as an alternative to face-to-face counselling. They can be developed to be companions capable of identifying emotions and playing the role of a therapist. Woebot is one such chatbot developed for Cognitive Behavioral Therapy (CBT). However, it has been found to lack empathy. Research is still being conducted to create a human-like chatbot
which can provide users with an interaction that is as close as possible to a conversation with another human. Using the methodologies reviewed in this paper, it is possible to humanize chatbots further so that they give natural and emotionally accurate responses.

5.2 Assumptions and limitations

Despite massive training, a bot is still a machine, and it is difficult for a machine to capture and comprehend human emotions in their entirety. This can become harmful to users if they become too attached to the chatbot and consider it another human rather than a bot. Emotional dependence on a non-living entity can become quite dangerous, especially when the user’s mental stability relies on it. The usage must be monitored and regulated by a dedicated institution.

The basic limitations of chatbots mainly include the following. Even after incorporating contextual awareness, it is hard to enable the chatbot to recognize sarcasm and other such expressions, which require good world knowledge and a very good understanding of the user. The chatbot may also be unable to process long poetic statements, which a good number of depressed people tend to use. Another chief factor that sets researchers back in their path to developing a universally deliverable counselor bot is the lack of high-quality human-human interaction datasets for training, together with the fact that most of the existing ones are biased in terms of gender, race, sexual orientation etc., which may generate biased conversations that do more harm than good. The chatbot will not be able to give a proper medical diagnosis for users with serious mental health issues and may instead detect them as stress or a temporary ill feeling. A chatbot, therefore, should not be developed with the intention of replacing therapy but as a means to encourage a person to be more conscious and
aware of their moods, behavioral patterns, and potential trigger points, suggesting healthy options for dealing with stress and depressive episodes, and reinforcing techniques learned in therapy such as CBT and SFBT approaches.

5.3 Areas of improvement

The current research on emotionally aware chatbots can be extended into interdisciplinary research, combining knowledge in the fields of mental health, artificial intelligence and computational linguistics. More work may be focused on bringing about visible, tangible behavioral changes rather than merely providing the user with suggestions to tackle their problems. Provision is needed to handle multiple languages and local dialects, mainly in developing countries. More datasets may be created from actual counselling/therapy sessions while maintaining the anonymity of the patients involved. There is also scope to utilize user engagement with smartphones to increase the accuracy and comprehension of human behavior. A dedicated regulating institution is desirable for monitoring the users, handling emergencies decisively, measuring the effectiveness of the bots and being accountable for the well-being of society. Mechanisms to detect user satisfaction after using these chatbots may be useful for better training of the deep learning models and hence maximize the functionality of the chatbots.

6 Conclusion

This work proposes ways to enable a chatbot system to understand the user’s emotion from their input utterance through emotion recognition, by employing deep learning and Natural Language Understanding (NLU). Using the most efficient methodologies, the chatbot will be equipped to generate natural, empathic responses which are well suited to the user’s input on an emotional level. The chatbot can have constructive conversations with the user and perform continuous, comprehensive observation of the user’s emotional and behavioral patterns. Such a chatbot can be deployed in situations