2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

Assessing a Voice-Based Conversational AI Prototype for Banking Application

Chinmoy Deka, Abhishek Shrivastava, and Shiva Sah
Department of Design, Indian Institute of Technology Guwahati, Guwahati, India
chinmoy deka@iitg.ac.in; shri@iitg.ac.in; shiva.sah@alumni.iitg.ac.in

Mridumoni Phukon
Department of Design, Indian Institute of Technology Guwahati, Guwahati, India
mridumonip@iitg.ac.in

Lipsa Routray
Centre for Linguistic Science and Technology, Indian Institute of Technology Guwahati, Guwahati, India
lroutray@iitg.ac.in

Abstract—Conversational AI has tremendous potential in different application domains due to its rapid development and improved accuracy in recognizing natural languages. Researchers have developed numerous applications and have shown state-of-the-art results. However, user acceptance of such Conversational AI applications is imperative for successful deployment. This paper aims to assess a Conversational AI for a banking application in terms of usability, attractiveness, and intuitiveness. For this purpose, two prototype versions were developed with varying dialog designs and visual backgrounds. In the experiment, 40 participants interacted with the prototype versions under the Wizard-of-Oz (WoZ) paradigm and answered three questionnaires measuring their perception of the Conversational AI prototype. Qualitative and quantitative assessment of the questionnaires suggests that the Conversational AI prototype is highly usable, attractive, and intuitive, providing evidence that users will appreciate such Conversational AI in banking applications.

Index Terms—Human-Centered AI, Conversational AI, Voice User Interface

I. INTRODUCTION

Conversational AI deals with the techniques and methods for building software agents that can engage in natural conversations with humans [1]. With the advent of Deep Learning and Big Data, Conversational AI assistants such as Amazon Alexa, Apple's Siri, Google Assistant, and others have become a focal point of interest among users and researchers. The success of such products has sparked researchers' interest in applying Conversational AI agents in different application domains. Researchers have speculated that conversational agents and voice user interfaces may become the universal user interface [2]. However, to implement voice-based Conversational AI in real scenarios, evaluation and usability studies are critical for success. Because building AI is expensive, it is recommended to go through a testing process or iterative design approach in which the experimenter simulates the behavior of the theoretical intelligence of a computer system. Wizard-of-Oz prototyping is one potential solution for this purpose.

Voice-based Conversational AI has shown tremendous potential in different application domains. Austerjost et al. have shown the effectiveness of an intelligent virtual assistant for controlling laboratory instruments using voice commands [3]. Metatla et al., in their study, have demonstrated scenarios, a design space, and an educational Voice User Interface application using Amazon Alexa for inclusive education with visually impaired and sighted pupils [4]. Researchers have recently become interested in implementing Conversational AI interfaces for the banking application domain.
Samuel et al. have developed a voice chatbot using a Raspberry Pi and the Amazon Lex service for a payment application on the Eyowo payment platform [5]. In a similar study, Kaur et al. have developed "Voice Pay", a voice-driven digital payment solution that reduces the steps of the current digital payment process [6]. In another study, Tymoszek et al. have developed an in-vehicle payment system called "DashCam Pay" which uses face and voice commands for user authentication and payments [7]. While these applications are developed for different purposes, their evaluation and usability study is significant for their success. Portet et al. have found, using interviews and Wizard-of-Oz prototyping, that voice technology has great potential for assisting the elderly population [8]. However, there is a dearth of literature on similar assessment studies of Conversational AI applications in the banking domain. Moreover, Browne has suggested Wizard-of-Oz prototyping for incorporating human-centered design in applications that require AI capabilities [9]. This type of assessment provides insights into users' requirements for AI capability in a particular application and reduces the risk of unsuccessful and costly AI endeavors.

In this study, we assess a Conversational AI prototype's usability, attractiveness, and intuitiveness for a banking application using the Wizard-of-Oz prototyping technique. We investigated two different styles of dialog prompts and found that the users highly appreciated the Conversational AI prototypes. This paper attempts to assess Conversational AI for banking applications and provides a foundation for further studies on different user groups.

The rest of the paper is organized as follows: In the next section, we describe the prototype design and methodology in detail. Next, we explain the results, followed by a brief discussion. Finally, the last section gives the conclusion.

II. PROTOTYPE DESIGN

A. Application Design

The designed Conversational AI prototype provides a voice user interface for performing a specific banking transaction, i.e., transferring an amount from one account to another. We developed the application using HTML5, CSS3, Bootstrap, and JavaScript. To incorporate the Wizard-of-Oz feature in the application, we assigned specific keystrokes that trigger the respective dialog prompts from the interface. The application flowchart of the Conversational AI prototype is shown in Fig. 1.

Fig. 1. Application Flowchart of the Banking Transaction
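As an illustration of this Wizard-of-Oz mechanism, the sketch below shows how keystrokes can be bound to pre-recorded prompt audio in plain JavaScript. The key bindings, audio file names, and the activateOrb helper are hypothetical; the paper does not publish the prototype's actual assignments.

```javascript
// Hypothetical Wizard keymap: each keystroke plays one pre-recorded prompt.
// Keys and file names are illustrative, not the study's actual assets.
const promptMap = {
  "1": "audio/default_name.mp3",    // default prompt for the name step
  "2": "audio/default_amount.mp3",  // default prompt for the amount step
  "t": "audio/timeout.mp3",         // timeout prompt (no user response)
  "i": "audio/invalid_input.mp3"    // invalid-input prompt (bad response)
};

document.addEventListener("keydown", (event) => {
  const src = promptMap[event.key];
  if (src) {
    new Audio(src).play();          // the Wizard triggers the chosen prompt
    activateOrb();                  // signal that the system is "listening"
  }
});

// Hypothetical helper: activate the on-screen orb that indicates listening mode.
function activateOrb() {
  document.getElementById("orb").classList.add("listening");
}
```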
B. Dialog Design

Each step in the banking transaction/application flowchart comprises three types of dialog prompts:

1) Default Prompts: Played by default on reaching a certain step in the application. The default prompt describes the expected input from the user.

2) Timeout Prompts: Played when the Conversational AI does not receive a response from the user. Unlike default prompts, timeout prompts are descriptive and explain the anticipated inputs from the user.

3) Invalid-Input Prompts: Played when the user gives incorrect information at a certain transaction step. This prompt is more descriptive and explains the format and type of input anticipated by the system.

We further created two versions with different sets of default prompts and background colors. However, the timeout and invalid-input prompts remain the same for both versions, and both versions follow the same task flow to perform the banking transaction.

1) Prototype Version A: The default prompts used in "prototype version A" are more descriptive and lengthier than those used in "prototype version B." Moreover, the GUI in this version is minimal, with a dark background and contrasting white text. In addition, the interface contains a white orb that activates when the system is in listening mode and is ready to take the user's input, as shown in Fig. 2(A).

2) Prototype Version B: The default prompts in this version are kept as minimal as possible. The GUI in this version is minimal, with a light background and contrasting navy blue text. Like "prototype version A," the interface contains a white orb that activates when the system is in listening mode and is ready to take the user's input, as shown in Fig. 2(B).

Fig. 2. The Conversational AI prototype interfaces: (A) Prototype Version A; (B) Prototype Version B

Some of the differences in default dialogs between "prototype version A" and "prototype version B" are illustrated in Table I. All the dialog prompts are pre-recorded for each step of the flowchart and are played by the Wizard, using the keyboard strokes incorporated in the prototype, while performing the experiment.

TABLE I. ILLUSTRATION OF DIFFERENCES IN DEFAULT PROMPTS IN THE PROTOTYPE VERSIONS

Version               | Default Prompt
Prototype Version A   | Please narrate your name
Prototype Version B   | Your name?
Prototype Version A   | Please narrate the amount you want to transfer
Prototype Version B   | Amount to transfer?
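For concreteness, one step of the flowchart with its three prompt types could be represented as below. This is a minimal sketch: only the default prompts come from Table I, while the timeout and invalid-input wordings are placeholders, since the paper does not publish those recordings.

```javascript
// Minimal sketch of one transaction step and its three prompt types.
// Default prompts are taken from Table I; the timeout and invalid-input
// wordings below are placeholders, not the study's actual recordings.
const amountStep = {
  id: "transfer-amount",
  defaultPrompt: {
    versionA: "Please narrate the amount you want to transfer",
    versionB: "Amount to transfer?"
  },
  // Descriptive re-prompt played when no response is heard:
  timeoutPrompt: "<descriptive explanation of the anticipated input>",
  // Explains the expected format and type after an incorrect response:
  invalidInputPrompt: "<descriptive explanation of the input format>"
};
```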
III. METHOD

A. System Configuration

A 2.60 GHz Intel i7 laptop with 16 GB of RAM and Microsoft Windows 10 was used for the experiment. A web server was installed on the laptop to host the Wizard-of-Oz web application (the Conversational AI). In addition, Microsoft Teams was installed on the laptop to perform the remote testing. Another monitor was extended from the laptop through an HDMI cable to conduct the experiment smoothly. The laptop's display was used to host the remote video-conferencing call with participants, while the extended monitor's display was used to host the Conversational AI application. Finally, test trials were conducted to check that the configured system was working correctly.

B. Task

The experimental task was to transfer an amount of Rs 5000 to a beneficiary named "Ravi Kumar", who holds an account in the same branch, through "Account Transfer", as shown in Fig. 1. A trigger was assigned and displayed on the interface to initiate the transaction. Each participant completed the task only once using the randomly assigned interface.

C. Pre and Post Questionnaires

The methodology used to evaluate the Conversational AI prototype was based on pre- and post-questionnaires administered to the participants. The pre-questionnaire collected the participants' demographic data and information regarding their experience with Conversational AI. The post-questionnaires measured the participants' subjective judgment based on their interaction with the Conversational AI prototype. After each participant completed the task, the "Single Ease Question" (SEQ) questionnaire [10] was administered to measure how easy or difficult the user-system interaction was according to the user's perception. Moreover, the following post-questionnaires were administered to investigate users' subjective opinions about the quality of their interaction with the prototype:
1) AttrakDiff's attractiveness scale [11], [12] was administered to assess the system's usability, effectiveness, efficiency, enjoyment, engagement, and appeal.
2) The System Usability Scale (SUS) [13] was administered to evaluate the system's learnability.
3) The INTUI questionnaire [14] was administered to evaluate the intuitiveness of the system.

D. Participants

We recruited the participants through convenience sampling. A total of 45 individuals volunteered for the user study. However, a few individuals faced issues related to internet connectivity and schedule coordination; thus, five prospective individuals had to drop out of the experiment. In the end, the user study included 40 participants (M = 26.03 years; SD = 3.37; Range = 20-34 years; Female = 19) who interacted with the prototypes of the Conversational AI. A schedule of appointments was set following each participant's availability and the time required to perform each trial (approximately 30 minutes).

E. Settings

The experimenter arranged the following setup before starting the video conferencing with the participants:
1) The laptop's screen, situated in the experimenter's room, displayed the video-conferencing software.
2) The extended monitor connected to the same laptop displayed the Conversational AI application.

The experimenter shared the extended monitor's display with the participant using the screen-sharing feature of Microsoft Teams. Computer audio was enabled while sharing the screen, which allowed the remote participant to hear the dialog prompts of the Conversational AI prototype. The experimenter then acted as the Wizard, controlling the voice-based dialog between the Conversational AI and the participants through the Wizard-of-Oz feature embedded into the Conversational AI prototype (Fig. 3).

Fig. 3. System Setting

F. Protocol

The procedure was designed to fit into a single 30-minute session to avoid effects of fatigue. The structure of the evaluation procedure was the same for the two groups (A and B), except for the interfaces of the Conversational AI prototype displayed to them. Participants belonging to Group A were shown "prototype version A", while participants belonging to Group B were shown "prototype version B". A video-conferencing call was made to the participants using Microsoft Teams as per their availability. A Google Form was administered for the evaluation procedure, which consisted of the following steps:

1) Welcome (all participants): A welcome section was incorporated into the Google Form. It informed the participants about the Conversational AI project and its goal.

2) Informed Consent Section (all participants): All participants were asked to fill in a consent form allowing the experimenters to record their interaction with the Conversational AI and collect some of their data. All participants were informed that their participation was voluntary and that there was no remuneration associated with the experiment.

3) Pre-interview Section (all participants): A pre-interview was administered in the third section of the Google Form to collect information about each participant's age, gender, job position, and experience with Conversational AI products such as Apple Siri, Amazon Alexa, and Google Assistant.
4) Task Description (all participants): The fourth section of the Google Form describes the participant's task of performing the experiment using voice commands with the Conversational AI prototype. The section also describes the details of the task, such as the participant's account details and the beneficiary's account details, required for completing the task. The task was also sent to each participant's messaging service to help them refer to the details during the interaction with the Conversational AI.

5) Presentation of the Conversational AI interface (different for the two participant groups): The participants were asked to pause the form-filling process (after filling up to section 4 of the Google Form) and return to the Microsoft Teams interface. The experimenter started screen sharing with computer audio, sharing the extended monitor that displayed the Conversational AI prototype. A video was played on the experimenter's laptop to verify that the remote participant could hear the remote computer's audio, which was essential for the remote experiment. Once this was confirmed with the participant, the participant was asked to start the interaction and complete the task. The interaction was recorded using the record feature of Microsoft Teams for further analysis. It is worth underlining that two slightly different interfaces were employed, one for each user group. Participants belonging to group A interacted with the interface presented in Fig. 2(A), while participants belonging to group B interacted with the interface presented in Fig. 2(B). Both application prototypes' internal workings were the same, except for the length of the dialog prompts and the background style incorporated into the two versions.

6) Interaction with the Conversational AI (different for the two participant groups):
• Group A: Participants from group A completed the task using "prototype version A", presented in Fig. 2(A).
• Group B: Participants from group B completed the task using "prototype version B", presented in Fig. 2(B).

7) Post-interviews and questionnaires (all participants): After the participant's interaction with the prototype was completed, they were asked to return to the Google Form administered at the beginning of the procedure. The "Single Ease Question" (SEQ) questionnaire was administered first, indicating how easy or difficult the task was. The remaining questionnaires, namely AttrakDiff's attractiveness scale, the System Usability Scale (SUS), and the INTUI questionnaire, were administered in a random sequence to each participant. The random sequence was generated by the experimenter using an Android randomizer application [15] before each trial (an equivalent shuffle is sketched after this list).

8) End of the trial: After each participant completed the evaluation procedure, informal communication was carried out, and the experimenter expressed gratitude for their participation. Recordings of the participants' interactions were stored in appropriate folders for further analysis.
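The randomization in step 7 can be reproduced with a standard Fisher-Yates shuffle. The sketch below is only an illustration; the study itself used an Android randomizer application [15] rather than code.

```javascript
// Illustrative Fisher-Yates shuffle for randomizing the questionnaire
// order in step 7 (the study used an Android randomizer app [15]).
function shuffle(items) {
  const order = [...items];                        // copy; keep input intact
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // random index in 0..i
    [order[i], order[j]] = [order[j], order[i]];   // swap elements i and j
  }
  return order;
}

console.log(shuffle(["AttrakDiff", "SUS", "INTUI"]));
// e.g. ["SUS", "INTUI", "AttrakDiff"]
```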
IV. RESULT

A. Participants' Information

The pre-interview questionnaire was administered (Step 3 of the procedure) to collect demographic information about the participants. The collected data is shown in Table II. It provides insight into the participants' demographics and their level of experience with Conversational AI. It is worth underlining that participants from group B had slightly more experience with Conversational AI than the participants of group A.

TABLE II. DESCRIPTION OF GROUP A AND GROUP B SAMPLES BASED ON THE PRE-INTERVIEW

Data                   | Group A Participants | Group B Participants
Participants           | 20                   | 20
Age (avg/med/range)    | 25.85/26.5/21-30     | 26.2/26/20-34
Gender                 | 10 Males; 10 Females | 11 Males; 9 Females
Experience with CAI:   |                      |
  None                 | 4/20                 | 3/20
  I tried it once      | 3/20                 | 2/20
  I use it sometimes   | 11/20                | 13/20
  I use it frequently  | 2/20                 | 2/20

B. Task Easiness

The SEQ questionnaire was the first section of the post-interview questionnaire administered to the participants after completing the task. It was administered to gauge how easy or difficult it was for the participants to interact with the Conversational AI prototype. The SEQ scores of individual participants were retrieved, and the overall SEQ score for each group was calculated according to [16]. Fig. 4 shows the results of the SEQ questionnaire. Participants from both groups found the interaction easy; however, participants from Group A found the interaction easier than participants from Group B. The overall SEQ scores for Group A and Group B were 6.7 and 6.4, respectively.

Fig. 4. Results of SEQ (Single Ease Question)

C. Usability of the System

The System Usability Scale (SUS) was administered to evaluate the ease of user interaction with the system. The overall SUS scores were 92.25 (SD = 7.16) and 88.63 (SD = 7.97) for group A and group B, respectively. Both versions of the prototype can be graded as A+ (excellent) according to Sauro and Lewis' scale [17]. Thus, we can infer that the system was perceived as very easy to use by the participants. The results for the two groups are similar, but group A participants found the system slightly more usable than group B participants.
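As background on how such scores are derived, SUS responses are scored with a fixed formula [13]: each odd-numbered item contributes (rating − 1), each even-numbered item contributes (5 − rating), and the sum is multiplied by 2.5 to yield a 0-100 score. A minimal sketch with hypothetical ratings:

```javascript
// Standard SUS scoring [13]: ten items rated 1-5; odd-numbered items
// contribute (rating - 1), even-numbered items contribute (5 - rating);
// the sum is multiplied by 2.5 to map onto a 0-100 scale.
function susScore(ratings) {          // ratings: array of 10 values in 1..5
  return ratings.reduce((sum, r, i) =>
    sum + (i % 2 === 0 ? r - 1 : 5 - r), 0) * 2.5;
}

// Hypothetical response pattern from one participant (not study data):
console.log(susScore([5, 1, 5, 2, 4, 1, 5, 1, 5, 2])); // 92.5
```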
D. Attractiveness of the System

The AttrakDiff attractiveness scale evaluated participants' perception of the system's efficiency, enjoyment, engagement, and appeal. The short version of the AttrakDiff scale was administered to the participants due to time constraints. The scale investigated three dimensions of the system:
• the Pragmatic Quality (PQ)
• the Hedonic Quality (HQ)
• the Attractiveness (ATT)

The average scores for both groups are shown on the y-axis of Fig. 6. All three dimensions lie in the highest region, and hence we conclude that the Conversational AI is attractive to the participants. It is further possible to assess the "character" the participants attributed to the system by analyzing the participants' PQ and HQ dimension scores. Both groups of participants perceived the system as "desired", as shown in Fig. 5. Fig. 7 displays the mean values of the word pairs contained in the questionnaire. While most of the word pairs are well resolved, there is a need to work on the cheap-premium word pair for both groups. All the results of the AttrakDiff attractiveness scale were elaborated according to [12].

Fig. 5. AttrakDiff's System Character

Fig. 6. AttrakDiff's dimension scores

Fig. 7. AttrakDiff's word pairs

E. Intuitiveness of the System

The INTUI questionnaire investigated the participants' ability to deal with the system without effort, i.e., the intuitiveness of the system. All the components, namely "Effortlessness", "Gut Feeling", "Magical Experience", "Verbalizability", and "Intuition", were considered. The results from the questionnaire are shown in Fig. 8 and elaborated according to [14]. In terms of "Effortlessness", i.e., the ease of the user interaction, both groups fall on the higher end of the scale. However, for the "Gut Feeling" component, i.e., unconsciously reaching the goal, both groups performed poorly on the scale, and this needs improvement. For the "Magical Experience" component, both groups fall on the higher end of the scale, with group A slightly better than group B; the "Verbalizability" and "Intuition" components for both groups are also on the higher end, with group B performing slightly better than group A.

Fig. 8. INTUI Scores

V. DISCUSSION

According to the respective scales, the Conversational AI prototype was perceived as useful, attractive, and intuitive by the participants. The high grades in the usability study may be attributed to the speech-based user interface and the human-like accuracy of natural language understanding. However, participants from group A rated the prototype slightly higher than group B for most dimensions. On the other hand, participants from group B rated the prototype higher than participants from group A for other dimensions, such as verbalizability and intuition. These differences can be due to the nature of the dialog prompts and the background color of the different prototypes.

There are some limitations in our study which are worth mentioning. The experiment was conducted remotely using video conferencing and not in a natural environment. A study in a natural setting, e.g., an ATM, would have brought different challenges and issues, such as noise and security. However, the study is relevant for web-based and mobile-based applications and can be extended to testing in a physical environment. Moreover, the study analyzes the usability of a Conversational AI prototype that completes a well-defined task. This can be problematic for some users accustomed to personal assistants such as the Amazon Echo Dot, which supports free-form conversation.

VI. CONCLUSION

The results of this study show that Conversational AI is well accepted by users in the banking application domain. The study participants performed the task of transferring money to a person using voice commands with two different versions of a Conversational AI. They found the Conversational AI prototype usable, attractive, and intuitive. Therefore, it is evident that users will accept such Conversational AI applications in the coming years. This study informs the research community that investment in Conversational AI for banking applications can be fruitful. We can infer from the study that human-like natural language understanding is key to the acceptability of such Conversational AI applications. These findings are significant as they provide evidence about users' acceptability of such an application. The study provides a foundation for incorporating NLP and NLU techniques and building a Conversational AI with more functionality to test in physical settings. Furthermore, the Conversational AI application can be extended for the visually impaired population and can be of significant value for hands-free communication at ATMs during the COVID-19 pandemic.

VII. ACKNOWLEDGEMENT

We would like to thank Mr. Ananya Kumar Bhuyan, Senior Manager, Canara Bank, IIT Guwahati Branch, for his kind support and advice on issues related to banking. We extend our gratitude to the participants involved in the experiment. Further, we would like to thank Dr. Bruce Balentine for instilling curiosity in us about voice-based technology and for his valuable comments and advice.

REFERENCES

[1] N. M. Radziwill and M. C. Benton, "Evaluating Quality of Chatbots and Intelligent Conversational Agents," arXiv preprint arXiv:1704.04579, April 2017.
[2] M. Solomon, "If Chatbots Win, Customers Lose, Says Zappos Customer Service Expert," Forbes, March 2017.
[3] J. Austerjost, M. Porr, N. Riedel, et al., "Introducing a Virtual Assistant to the Lab: A Voice User Interface for the Intuitive Control of Laboratory Instruments," SLAS Technology: Translating Life Sciences Innovation, 23(5):476-482, 2018. doi:10.1177/2472630318788040.
[4] O. Metatla, A. Oldfield, T. Ahmed, A. Vafeas, and S. Miglani, "Voice User Interfaces in Schools: Co-designing for Inclusion with Visually-Impaired and Sighted Pupils," Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, Paper 378, 1-15, 2019.
[5] I. Samuel, F. A. Ogunkeye, A. Olajube, and A. Awelewa, "Development of a Voice Chatbot for Payment Using Amazon Lex Service with Eyowo as the Payment Platform," 2020 International Conference on Decision Aid Sciences and Application (DASA), 2020, pp. 104-108. doi:10.1109/DASA51403.2020.9317214.
[6] R. Kaur, R. S. Sandhu, A. Gera, T. Kaur, and P. Gera, "Intelligent Voice Bots for Digital Banking," Smart Systems and IoT: Innovations in Computing, Smart Innovation, Systems and Technologies, vol. 141, Springer, Singapore, 2020.
[7] C. Tymoszek, S. S. Arora, K. Wagner, and A. K. Jain, "DashCam Pay: A System for In-vehicle Payments Using Face and Voice," arXiv preprint arXiv:2004.03756, Sep. 2020.
[8] F. Portet, M. Vacher, C. Golanski, C. Roux, and B. Meillon, "Design and evaluation of a smart home voice interface for the elderly: acceptability and objection aspects," Personal and Ubiquitous Computing, 17:127-144, 2013.
[9] J. T. Browne, "Wizard of Oz Prototyping for Machine Learning Experiences," Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, Paper LBW2621, 1-6, 2019.
[10] D. P. Tedesco and T. S. Tullis, "A comparison of methods for eliciting post-task subjective ratings in usability testing," Usability Professionals Association Annual Conference, 2006.
[11] M. Hassenzahl, "The interplay of beauty, goodness, and usability in interactive products," Human-Computer Interaction, 19:319-349, 2004.
[12] M. Hassenzahl, "AttrakDiff(tm)," Internet resource: http://www.attrakdiff.de, 2014.
[13] J. Brooke, "SUS: A quick and dirty usability scale," Usability Evaluation in Industry, 189:194-200, 1996.
[14] D. Ullrich and S. Diefenbach, "INTUI. Exploring the Facets of Intuitive Interaction," Mensch & Computer, 10:251, 2010.
[15] PavelDev, "Randomizer," Android application.
[16] J. Sauro, "Comparison of three one-question, post-task usability questionnaires," Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1599-1608, 2009.
[17] J. R. Lewis and J. Sauro, "The factor structure of the System Usability Scale," Human Centered Design, LNCS vol. 5619, 2009.