Leveraging PyQt5 for the graphical interface and integrating modules for speech recognition, the system, named JARVIS, provides a tangible user interface for controlling and interacting with the computer.
CONCLUSION AND FUTURE WORK
Future Work
In the future, we can consider letting the machine learn about human emotions so that it can give more sincere advice. The Hugging Face platform, through its Transformers library, APIs, and tools, makes it easy to download and train state-of-the-art pretrained models, which reduces compute cost and saves the time otherwise spent searching for models. Applications for NLP, computer vision, audio, and multimodal tasks are already integrated, which makes the platform well suited to students researching machine learning applications. It can recognize user emotions from text, summarize a text, recognize objects, and even support tasks such as semantic segmentation. With the Transformers datasets and the PyTorch and TensorFlow libraries installed, we can use these features to produce the expected results; however, installation takes quite a long time, learning to use the tools takes additional time, and the libraries occupy a lot of computer memory. Even so, the investment is worthwhile to learn a new technology and apply it to this system to improve product quality and the customer experience.
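The text-emotion idea above could be prototyped with the Transformers pipeline API. The sketch below is an assumption, not part of this project: the `j-hartmann/emotion-english-distilroberta-base` checkpoint is one publicly available emotion classifier, and the first call downloads the model, which illustrates the installation time and disk-space costs discussed above.

```python
# Hedged sketch: classify the emotion of a user utterance with the
# Hugging Face Transformers pipeline API. The model checkpoint is an
# assumed example, not the one used in this thesis.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed checkpoint
)

result = classifier("I'm really happy my voice assistant finally works!")
print(result[0]["label"], round(result[0]["score"], 3))
```

The same `pipeline` interface covers summarization, object detection, and other tasks mentioned above by changing the task string and checkpoint.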
Results Achieved
The results achieved in the development of the voice assistant system showcase a comprehensive range of functionalities. The system excels in voice interaction, effectively understanding and responding to user voice commands. With an NLP prediction accuracy of 90% based on experimental results, the system exhibits a high level of proficiency in language understanding. Notably, the system adopts a dual-part flow, dividing operations into an application for computer tasks and an NLP-integrated application for interactive communication with the computer. This division ensures a clear and efficient workflow: tasks are prioritized when specific established words are detected, and the system seamlessly switches to intent recognition when non-task processing words are encountered.
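The dual-part flow described above can be sketched as a simple dispatcher. The keyword set and handler names below are illustrative stand-ins, not the actual JARVIS implementation:

```python
# Sketch of the dual-part command flow: established task words take
# priority; everything else falls through to intent recognition.
# Keywords and handlers are hypothetical examples.

TASK_KEYWORDS = {"open", "close", "play", "volume", "screenshot"}

def handle_system_task(command: str) -> str:
    # Placeholder for the direct computer-control path.
    return f"[task] executing: {command}"

def handle_intent(command: str) -> str:
    # Placeholder for the NLP intent-recognition path.
    return f"[intent] classifying: {command}"

def dispatch(command: str) -> str:
    """Route a voice command to the task path or the intent path."""
    words = set(command.lower().split())
    if words & TASK_KEYWORDS:
        return handle_system_task(command)
    return handle_intent(command)

print(dispatch("play some music"))    # task path
print(dispatch("how are you today"))  # intent path
```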
The system demonstrates versatility in information retrieval, using web scraping to provide users with information on specified topics. It also excels in media control, playing YouTube videos on request, and offers news reading for the latest updates. In the realm of entertainment, the assistant can tell jokes and share interesting facts. Users can also access real-time weather information for a specified location, ask for the current date and time, connect with IoT devices, and control various aspects of the system, including screen management, ad skipping, and media playback. The system has an easy-to-use interface with only start and exit buttons, contributing to user-friendly interaction. Overall, the achieved results showcase a sophisticated voice assistant system with diverse capabilities, ranging from information retrieval to entertainment and system control.
Contributions

In the development of the JARVIS voice assistant application, several key contributions were made across its components. The process begins with data preprocessing: intent data is loaded and parsed from a JSON file, and the words in each pattern are tokenized and stemmed. In the vocabulary and tag extraction phase that follows, a vocabulary is created from the tokenized and stemmed words, and the unique tags are extracted from the intents. Subsequently, the dataset is prepared by generating training data and corresponding labels based on the bag-of-words model and converting them into NumPy arrays.
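The preprocessing steps above can be sketched as follows. This is a minimal sketch, assuming an `intents.json` structure of `{"intents": [{"tag": ..., "patterns": [...]}]}`; the inline intent data stands in for the real file, and tokenization is a simple whitespace split for brevity:

```python
# Sketch of the intent preprocessing pipeline: tokenize and stem the
# pattern words, build a vocabulary and tag list, then encode each
# pattern as a bag-of-words vector.
import numpy as np
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def tokenize(sentence: str) -> list:
    return sentence.lower().split()

def bag_of_words(tokens: list, vocab: list) -> np.ndarray:
    stems = {stemmer.stem(t) for t in tokens}
    return np.array([1.0 if w in stems else 0.0 for w in vocab],
                    dtype=np.float32)

# Illustrative intent data standing in for the parsed JSON file.
intents = {"intents": [
    {"tag": "greeting", "patterns": ["Hello there", "Hi"]},
    {"tag": "time",     "patterns": ["What time is it"]},
]}

vocab, tags, xy = [], [], []
for intent in intents["intents"]:
    tags.append(intent["tag"])
    for pattern in intent["patterns"]:
        tokens = tokenize(pattern)
        vocab.extend(stemmer.stem(t) for t in tokens)
        xy.append((tokens, intent["tag"]))

vocab = sorted(set(vocab))
X = np.array([bag_of_words(tokens, vocab) for tokens, _ in xy])
y = np.array([tags.index(tag) for _, tag in xy])
print(X.shape, y.shape)
```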
The model architecture is defined using PyTorch, specifying the input size, hidden layer size, and output size. A custom dataset class is implemented for loading the training data, and training configuration parameters, such as the number of epochs, batch size, and learning rate, are set. The model is then trained using PyTorch's DataLoader for efficient loading, and the final state is saved along with the relevant information.
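The training setup can be sketched as below. This is a hedged illustration, not the thesis's exact configuration: the layer sizes, hyperparameters, and the random toy data are assumptions; only the overall shape (a feed-forward classifier, a custom `Dataset`, and a `DataLoader`-driven loop) follows the description above:

```python
# Sketch of the model definition and training loop described above.
# Sizes, hyperparameters, and data are illustrative placeholders.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class NeuralNet(nn.Module):
    """Feed-forward intent classifier: input -> hidden -> hidden -> tags."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x):
        return self.net(x)

class ChatDataset(Dataset):
    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Toy bag-of-words vectors standing in for the real training set.
X = torch.rand(16, 7)
y = torch.randint(0, 2, (16,))

loader = DataLoader(ChatDataset(X, y), batch_size=8, shuffle=True)
model = NeuralNet(input_size=7, hidden_size=8, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5):  # illustrative epoch count
    for words, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(words), labels)
        loss.backward()
        optimizer.step()

# Save the final state along with the sizes needed to rebuild the model.
torch.save({"model_state": model.state_dict(),
            "input_size": 7, "hidden_size": 8, "num_classes": 2},
           "data.pth")
```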
The development extends to the integration of a graphical user interface (GUI), incorporating PyQt5 modules and setting up the GUI using the JarvisUI class. Voice and speech recognition functionalities are implemented using pyttsx3 for text-to-speech synthesis and speech_recognition for voice recognition. The code dynamically selects the computing device (GPU or CPU) for model training and utilizes multithreading to run the voice assistant in the background while keeping the GUI responsive.
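The device selection and background-thread pattern can be sketched as follows; the worker function is a hypothetical stand-in for the real listen-and-respond loop:

```python
# Sketch: pick GPU when available, otherwise CPU, and run the
# assistant loop in a daemon thread so the GUI thread stays responsive.
import threading
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

results = []

def assistant_worker():
    # Placeholder for the continuous listen/respond loop that would
    # run here without blocking the GUI event loop.
    results.append(f"assistant running on {device.type}")

thread = threading.Thread(target=assistant_worker, daemon=True)
thread.start()
thread.join()
print(results[0])
```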
The application responds to user input by loading a pre-trained neural network model and interpreting input through the get_jarvis_response function. A continuous main loop handles voice input, recognizing various commands related to information retrieval, media playback, weather checking, and more. GUI event handling connects the GUI buttons to their corresponding functions, while application initialization sets up the PyQt5 application and main window.
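The GUI wiring can be sketched as below, assuming a window with only start and exit buttons as described above; the class body is an illustration, not the actual JarvisUI code, and the offscreen Qt platform is set so the sketch also runs headlessly:

```python
# Sketch of the PyQt5 setup: a main window with Start and Exit buttons,
# each connected to a handler. Handler bodies are placeholders.
import os
import sys
os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")  # headless-safe

from PyQt5.QtWidgets import (QApplication, QMainWindow, QPushButton,
                             QVBoxLayout, QWidget)

class JarvisUI(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("JARVIS")
        self.started = False

        start_btn = QPushButton("Start")
        exit_btn = QPushButton("Exit")
        # Connect each button to its corresponding function.
        start_btn.clicked.connect(self.start_assistant)
        exit_btn.clicked.connect(self.close)

        layout = QVBoxLayout()
        layout.addWidget(start_btn)
        layout.addWidget(exit_btn)
        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

    def start_assistant(self):
        # Placeholder: the real handler would launch the assistant thread.
        self.started = True

app = QApplication(sys.argv)
window = JarvisUI()
window.start_assistant()  # simulate a click on the Start button
print("started:", window.started)
```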
Additionally, miscellaneous tasks include adjusting the system recursion limit, displaying welcome messages, and responding to user queries based on the recognized commands. This comprehensive development approach ensures the effective functioning of the JARVIS voice assistant across its diverse features.