
7 Conclusion

7.2 Limitations and future work

Adversarial attacks on Vietnamese speech recognition models are still a very new topic. To the best of our knowledge, based on our survey, no published reference on this subject currently exists. Consequently, all of the studies referenced in this thesis were carried out on English-language models. In addition, the Vietnamese speech recognition models available for this research still have many shortcomings, which limited the scope of our work. We intend to address these limitations in future research.

Besides white-box speech classification models, speech-to-text recognition models and black-box speech recognition models are in widespread use today. The problem can therefore be extended to adversarial attacks against those models, which in turn would allow us to study and propose defenses against attacks that may arise in the future. These are two important research directions we plan to pursue next. We also aim to apply the adversarial example generation process as a form of data encryption.


Discrete Fourier Transform

Implementing the Mel-scale filters