Improving communication efficiency of Federated Learning systems

Dinh Thi Quynh Khanh
Department of Computer Vision, MICA
Hanoi University of Science and Technology
Supervisor: Assoc. Prof. Le Thi Lan

In partial fulfillment of the requirements for the degree of Master of Computer Science

August 4, 2023

Acknowledgements

First of all, I would like to express my sincere gratitude to my advisors, Assoc. Prof. Le Thi Lan and Assoc. Prof. Tran Thi Thanh Hai. I still remember the first day I came to their office to begin my master's study. Looking back to the day Prof. Le Thi Lan assigned me the exciting topic of Federated Learning, I realize what a wonderful journey it has been, going from zero knowledge of the area to building a tool that can help the research community and practitioners.

I have learned many valuable lessons from my advisors on becoming a better problem solver. I realized that I need to value all of my (failed) attempts and imperfect initial works, and to see them as a starting point. My advisors have taught me that failures are only temporary. Instead of being discouraged by them, I should use them as feedback to polish my work and move forward. Thanks to them, I see the beauty of research and have finally decided to pursue a PhD in the field. More importantly, I truly know who I want to become and how I can contribute to our community through my work.

I would also like to thank every member of the Comvis laboratory. I had so much joy and so many memories there, and they always made me feel welcome. I could not ask for better fellows.

Finally, I would love to thank my family for their support and encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them.

From the bottom of my heart, thank you!
This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-20-1-4053 and by Hanoi University of Science and Technology (HUST) under project number T2021-SAHEP003.

Abstract

In recent years, Computer Vision has seen a revolution in tasks such as object detection, object classification, and action recognition. Much of this success relies on collecting vast amounts of data, often in a privacy-invasive manner. Federated Learning (FL) is a new discipline of machine learning that allows models to be trained while the data remain decentralized. Instead of sharing data, participants collaboratively train a model by sending only weight updates to a server. FL can therefore help protect data privacy and exploit the computing power of the huge number of endpoint devices. This motivates us to investigate the use of FL in computer vision tasks.

However, state-of-the-art (SOTA) methods for computer vision are mainly based on deep learning models with a huge number of parameters. Training models for computer vision tasks in a federated manner therefore faces several obstacles. Firstly, FL often incurs significantly more communication overhead than centralized learning, since model parameters must be exchanged between the clients and the central server throughout training. Secondly, studying FL for computer vision requires running experiments with complicated programs, because FL is more sophisticated to implement than conventional training. This sometimes makes the field accessible only to researchers and engineers with solid engineering skills, especially for human action recognition (HAR) tasks. In this thesis, we therefore attempt to push forward the development of FL for computer vision by addressing these two challenges.

To mitigate the first issue, we propose model weight compression and encoding during model uploading for Federated Averaging (FedAvg), a widely used FL algorithm. Our weight compression is inspired by the Sparse Ternary Compression (STC) algorithm, with a modification that makes it applicable to FedAvg. We also exploit the characteristics of the compressed weights to encode them, which further reduces the communication cost. The proposed method has been evaluated on one computer vision task, image classification. Experimental results on the MNIST dataset demonstrate that our method reduces the communication cost without considerably worsening model accuracy.

Regarding the second challenge, several FL frameworks have recently been developed to facilitate the study and application of FL to specific tasks. Unfortunately, those frameworks are limited to testing deep models on the object classification task. We therefore introduce a novel framework, called FlowerAction, for studying FL-based human action recognition from video data. To the best of our knowledge, this is the first FL framework for video-based action recognition.

FlowerAction is built upon the Flower framework. It incorporates various components, including:
• data loading and data partitioning;
• widely used network architectures for HAR (e.g., SlowFast, I3D, R3D);
• FL aggregation algorithms (e.g., FedAvg, FedBN, FedPNS, and STC).

First, we present the main components of FlowerAction, which are developed on top of the existing Flower framework. These components interface with external components (data loaders, deep models) and with Flower's internal ones (model communication and aggregation algorithms). We then demonstrate the effectiveness of deep models and FL algorithms in recognizing human actions on benchmark datasets (e.g., HMDB51 and EgoGesture), in both simulated and real distributed training. Our experimental results show that the FlowerAction framework operates properly. It helps researchers approach FL with less effort and run federated training benchmarks quickly. Furthermore, our analysis of the top-k accuracy and communication cost of different FL algorithms could offer instructive suggestions for selecting and deploying deep models for FL-based action recognition in the future.

Contents

List of Acronyms
1 Introduction
1.1 Motivation
1.2 Problem definition
1.3 Contributions
1.4 Outline of the thesis
2 Related works
2.1 Communication-efficient Federated Learning
2.2 Video-based human action recognition (HAR) and federated learning for HAR
2.2.1 Deep learning models for HAR
2.2.2 Federated learning for HAR
2.3 Federated learning frameworks
2.3.1 LEAF
2.3.2 FedVision
2.3.3 Flower
2.3.4 FedCV
2.4 Conclusions
3 Communication cost reduction using sparse ternary compression and encoding for FedAvg
3.1 Proposed method
3.1.1 Layer-wise sparse ternary compression
3.1.2 Model weight encoding
3.2 Experiments
3.3 Conclusion
4 FlowerAction - A Federated Learning framework for Human Action Recognition
4.1 Proposed framework
4.1.1 Framework overview
4.1.2 Workflow of FlowerAction framework
4.1.3 FL algorithms
4.1.3.1 FedAvg (Federated Averaging)
4.1.3.2 FedBN (Federated BatchNorm)
4.1.3.3 FedPNS (Federated Probabilistic Node Selection)
4.1.3.4 STC (Sparse Ternary Compression)
4.1.4 Deep learning models for human action recognition
4.1.4.1 C3D
4.1.4.2 R3D
4.1.4.3 Inflated 3D ConvNet - I3D
4.1.4.4 SlowFast
4.1.5 Data pipeline
4.1.5.1 Data pre-processing
4.1.5.2 Data partitioning
4.1.6 Characteristics of FlowerAction
4.2 Experiments
4.2.1 Human action recognition datasets
4.2.2 Implementation and setup
4.2.3 Experimental results
4.3 Conclusion
5 Conclusions
5.1 Summary of achievement and limitations
5.2 Future works
References

List of Figures

1.1 Next word prediction application in mobile phones [1]
2.1 The overall architecture of FedVision [2]
2.2 The overall architecture of Flower [3]
2.3 Experiments on edge devices are supported by Flower [3]
3.1 Overview of the proposed method: (1) the server sends the global model to clients; (2) clients update the model with their local data; (3) clients apply the proposed weight compression and encoding techniques; (4) clients send local models to the server; (5) the server updates the global model.
3.2 Samples from the MNIST handwritten digit dataset [4]
3.3 Network architecture used in our experiments for MNIST classification
3.4 Accuracy at different values of the compression factor
3.5 Relationship between communication cost and compression factor
3.6 Confusion matrix at p = 73% with non-IID data distribution
3.7 Confusion matrix at p = 78% with IID data distribution
4.1 FlowerAction's main flow
4.2 FlowerAction's architecture. Dashed blocks indicate work in progress.
4.3 Flower's template
4.4 FlowerAction's data pipeline
4.5 IID data partition in FlowerAction

Chapter 5
Conclusions

5.1 Summary of achievement and limitations

In this thesis, we have attempted to overcome the obstacles to applying FL to HAR. We first investigated reducing the communication overhead of FedAvg, a widely used FL algorithm. We followed the sparsification approach and applied compression to model weights instead of gradients. We did so because we believed weight compression could lead to an algorithm that is communication-efficient in both directions (round trip). Our proposed method is a modification of the STC algorithm, adapted to work on model weights. The method was evaluated on the MNIST dataset, and the results were promising: the communication cost was reduced without a significant loss of accuracy. However, we have yet to achieve round-trip compression. Our algorithm should also be tested on more datasets and on different kinds of deep network architectures.

As we continued to study communication-efficient algorithms and to apply FL to HAR, we realized that we first needed a way to measure the communication cost. Only then could we evaluate the effectiveness of our (future) ideas and innovations, yet we could not find an explicit measurement in the literature. We also found that, to accelerate our experiments, we needed a high-quality codebase and a unified benchmark. We therefore built a framework that integrates FL into HAR. Our framework is implemented on top of Flower, a user-friendly FL framework that is currently popular in the FL research community. It supports two FL computing paradigms:
(1) distributed computing, where real communication happens between the server and clients;
(2) standalone simulation, which is suitable for benchmarking model convergence under different system scenarios.

Our framework also provides utilities tailored to HAR problems, such as a video processing pipeline and various model architectures. It also offers FL algorithms, ranging from standard ones like FedAvg to communication-efficient ones like STC. Furthermore, it can be a useful toolbox for researchers working on reducing the communication overhead of FL, as it offers an empirical way to measure this cost. However, we still need to refactor our framework to make it more readable and easier to extend.

One conference paper has been accepted and one journal paper is under revision:
• Thi Quynh Khanh Dinh, Thanh-Hai Tran, Thi-Lan Le, Communication cost reduction using sparse ternary compression and encoding for FedAvg, 2021 International Conference on Information and Communication Technology Convergence (ICTC), 2021.
• Thi Quynh Khanh Dinh, Thanh-Hai Tran, Trung-Kien Tran, Thi-Lan Le, FlowerAction: A federated deep learning framework for video-based human action recognition, IEEE Access (revised).

Moreover, the code is available at: https://github.com/quynhkhanh96/flower-action

5.2 Future works

Now that we have a working toolbox to aid our research, we will continue to improve the framework itself. We will also use it to experiment with more communication-efficient FL algorithms as well as our own ideas. We will explore quantization approaches by
applying them to HAR problems and trying to improve upon them. Our goal is to build algorithms that reduce the communication cost in both directions while still preserving model convergence.

We are aware that this is only a starting point for a toolbox that aims to help the research community bring federated action recognition to its fullest extent. We will continuously upgrade our framework, and all contributions from the research and open-source community are welcome. There is a lot of room for improvement in our framework.

First, we plan to add more HAR models, as several new and exciting architectures that achieve state-of-the-art results have been developed recently, for example ViViT [85] and TimeSformer [86]. Many other FL algorithms can also be implemented in our framework. Because we have yet to try computationally efficient algorithms, we have made it a priority in our development roadmap to implement and benchmark computationally efficient FL methods such as PruneFL [24], FedQNN [87], HeteroFL [88], and ProgFed [89]. Next, we will conduct further benchmarks on other challenging video datasets. We will also run experiments on edge devices once the computational bottleneck is solved by the aforementioned computationally efficient algorithms.

References

[1] A. Hard, K. Rao, R. Mathews, S. Ramaswamy, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage, “Federated learning for mobile keyboard prediction,” arXiv preprint arXiv:1811.03604, 2018.
[2] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen, H. Yu, and Q. Yang, “FedVision: An online visual object detection platform powered by federated learning,” ArXiv, vol. abs/2001.06202, 2020.
[3] D. J. Beutel, T. Topal, A. Mathur, X. Qiu, T. Parcollet, P. P. de Gusmão, and N. D. Lane, “Flower: A friendly federated learning research framework,” arXiv preprint arXiv:2007.14390, 2020.
[4] L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
[5] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB: a large video database for human motion recognition,” in 2011 International Conference on Computer Vision, pp. 2556–2563, IEEE, 2011.
[6] Y. Zhang, C. Cao, J. Cheng, and H. Lu, “EgoGesture: A new dataset and benchmark for egocentric hand gesture recognition,” IEEE Transactions on Multimedia, vol. 20, no. 5, pp. 1038–1050, 2018.
[7] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey on federated learning,” Knowledge-Based Systems, vol. 216, p. 106775, 2021.
[8] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (A. Singh and J. Zhu, eds.), vol. 54 of Proceedings of Machine Learning Research, pp. 1273–1282, PMLR, 20–22 Apr. 2017.
[9] D. Anguita, A. Ghio, L. Oneto, X. Parra, J. L. Reyes-Ortiz, et al., “A public domain dataset for human activity recognition using smartphones,” in ESANN, vol. 3, p. 3, 2013.
[10] L. Huang, Y. Yin, Z. Fu, S. Zhang, H. Deng, and D. Liu, “LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data,” PLoS ONE, vol. 15, no. 4, p. e0230706, 2020.
[11] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 1, pp. 1–12, 2009.
[12] S. Ramaswamy, R. Mathews, K. Rao, and F. Beaufays, “Federated learning for emoji prediction in a mobile keyboard,” arXiv preprint arXiv:1906.04329, 2019.
[13] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Federated learning for ultra-reliable low-latency V2V communications,” in 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–7, IEEE, 2018.
[14] M. Zhang, K. Sapra, S. Fidler, S. Yeung, and J. M. Alvarez, “Personalized federated learning with first order model optimization,” arXiv preprint arXiv:2012.08565, 2020.
[15] Z. Ai, G. Wu, X. Wan, Z. Qi, and Y. Wang, “Towards better personalization: A meta-learning approach for federated recommender systems,” in Knowledge Science, Engineering and Management: 15th International Conference, KSEM 2022, Singapore, August 6–8, 2022, Proceedings, Part II, pp. 520–533, Springer, 2022.
[16] K. Gupta, M. Fournarakis, M. Reisser, C. Louizos, and M. Nagel, “Quantization robust federated learning for efficient inference on heterogeneous devices,” arXiv preprint arXiv:2206.10844, 2022.
[17] I. Ergün, H. U. Sami, and B. Güler, “Communication-efficient secure aggregation for federated learning,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, pp. 3881–3886, IEEE, 2022.
[18] L. Wang, R. Jia, and D. Song, “D2P-Fed: Differentially private federated learning with efficient communication,” arXiv preprint arXiv:2006.13039, 2020.
[19] A. Malekijoo, M. J. Fadaeieslam, H. Malekijou, M. Homayounfar, F. Alizadeh-Shabdiz, and R. Rawassizadeh, “FedZip: A compression framework for communication-efficient federated learning,” arXiv preprint arXiv:2102.01593, 2021.
[20] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek, and H. V. Poor, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3454–3469, 2020.
[21] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for federated learning on user-held data,” arXiv preprint arXiv:1611.04482, 2016.
[22] A. Madi, O. Stan, A. Mayoue, A. Grivet-Sébert, C. Gouy-Pailler, and R. Sirdey, “A secure federated learning framework using homomorphic encryption and verifiable computing,” in 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), pp. 1–8, IEEE, 2021.
[23] Z. Wang and Q. Hu, “Blockchain-based federated learning: A comprehensive survey,” arXiv preprint arXiv:2110.02182, 2021.
[24] Y. Jiang, S. Wang, V. Valls, B. J. Ko, W.-H. Lee, K. K. Leung, and L. Tassiulas, “Model pruning enables efficient federated learning on edge devices,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
[25] H. Seo, J. Park, S. Oh, M. Bennis, and S.-L. Kim, “16 Federated knowledge distillation,” Machine Learning and Wireless Communications, p. 457, 2022.
[26] P. Pareek and A. Thakkar, “A survey on video-based human action recognition: recent updates, datasets, challenges, and applications,” Artificial Intelligence Review, vol. 54, pp. 2259–2322, 2021.
[27] L. Smaira, J. Carreira, E. Noland, E. Clancy, A. Wu, and A. Zisserman, “A short note on the Kinetics-700-2020 human action dataset,” arXiv preprint arXiv:2010.10864, 2020.
[28] J. Liu, A. Shahroudy, M. Perez, G. Wang, L.-Y. Duan, and A. C. Kot, “NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 10, pp. 2684–2701, 2019.
[29] K. Gedamu, Y. Ji, Y. Yang, L. Gao, and H. T. Shen, “Arbitrary-view human action recognition via novel-view action generation,” Pattern Recognition, vol. 118, p. 108043, 2021.
[30] H.-N. Tran, H.-Q. Nguyen, H.-G. Doan, T.-H. Tran, T.-L. Le, and H. Vu, “Pairwise-covariance multi-view discriminant analysis for robust cross-view human action recognition,” IEEE Access, vol. 9, pp. 76097–76111, 2021.
[31] G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?,” in ICML, vol. 2, p. 4, 2021.
[32] X. Song, S. Zhao, J. Yang, H. Yue, P. Xu, R. Hu, and H. Chai, “Spatio-temporal contrastive domain adaptation for action recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9787–9795, 2021.
[33] S. Majumder and N. Kehtarnavaz, “Vision and inertial sensing fusion for human action recognition: A review,” IEEE Sensors Journal, vol. 21, no. 3, pp. 2454–2467, 2020.
[34] S. Caldas, S. M. K. Duddu, P. Wu, T. Li, J. Konečný, H. B. McMahan, V. Smith, and A. Talwalkar, “LEAF: A benchmark for federated settings,” arXiv preprint arXiv:1812.01097, 2018.
[35] C. He, S. Li, J. So, X. Zeng, M. Zhang, H. Wang, X. Wang, P. Vepakomma, A. Singh, H. Qiu, et al., “FedML: A research library and benchmark for federated machine learning,” arXiv preprint arXiv:2007.13518, 2020.
[36] Y. Deng, T. Han, and N. Ansari, “FedVision: Federated video analytics with edge computing,” IEEE Open Journal of the Computer Society, vol. 1, pp. 62–72, 2020.
[37] T. Q. Khanh Dinh, T.-H. Tran, and T.-L. Le, “Communication cost reduction using sparse ternary compression and encoding for FedAvg,” in 2021 International Conference on Information and Communication Technology Convergence (ICTC), pp. 351–356, 2021.
[38] C. Li, D. Niu, B. Jiang, X. Zuo, and J. Yang, “Meta-HAR: Federated representation learning for human activity recognition,” in Proceedings of the Web Conference 2021, pp. 912–922, 2021.
[39] P. Jain, S. Goenka, S. Bagchi, B. Banerjee, and S. Chaterji, “Federated action recognition on heterogeneous embedded devices,” CoRR, vol. abs/2107.12147, 2021.
[40] Z. Xiao, X. Xu, H. Xing, F. Song, X. Wang, and B. Zhao, “A federated learning system with enhanced feature extraction for human activity recognition,” Knowledge-Based Systems, vol. 229, p. 107338, 2021.
[41] S. Ek, F. Portet, P. Lalanda, and G. E. Vega Baez, “Evaluating Federated Learning for human activity recognition,” in Workshop AI for Internet of Things, in conjunction with IJCAI-PRICAI 2020, (Yokohama, Japan), Jan. 2021.
[42] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, pp. 1273–1282, PMLR, 2017.
[43] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, “QSGD: Communication-efficient SGD via gradient quantization and encoding,” Advances in Neural Information Processing Systems, vol. 30, pp. 1709–1720, 2017.
[44] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” arXiv preprint arXiv:1510.00149, 2015.
[45] W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, and H. Li, “TernGrad: Ternary gradients to reduce communication in distributed deep learning,” arXiv preprint arXiv:1705.07878, 2017.
[46] F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, “1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[47] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Robust and communication-efficient federated learning from non-IID data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, pp. 3400–3413, 2019.
[48] A. Mora, L. Foschini, and P. Bellavista, “Structured sparse ternary compression for convolutional layers in federated learning,” in 2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring), pp. 1–5, IEEE, 2022.
[49] S. Hu, J. Goetz, K. Malik, H. Zhan, Z. Liu, and Y. Liu, “FedSynth: Gradient compression via synthetic data in federated learning,” arXiv preprint arXiv:2204.01273, 2022.
[50] R. Dorfman, S. Vargaftik, Y. Ben-Itzhak, and K. Y. Levy, “DoCoFL: Downlink compression for cross-device federated learning,” 2023.
[51] H. H. Pham, L. Khoudour, A. Crouzil, P. Zegers, and S. A. Velastin, “Video-based human action recognition using deep learning: a review,” arXiv preprint arXiv:2208.03775, 2022.
[52] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, “A closer look at spatiotemporal convolutions for action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6450–6459, 2018.
[53] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520, 2018.
[54] A. Ullah, J. Ahmad, K. Muhammad, M. Sajjad, and S. W. Baik, “Action recognition in video sequences using deep bi-directional LSTM with CNN features,” IEEE Access, vol. 6, pp. 1155–1166, 2017.
[55] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[56] V.-M. Khong and T.-H. Tran, “Improving human action recognition with two-stream 3D convolutional neural network,” in 2018 1st International Conference on Multimedia Analysis and Pattern Recognition (MAPR), pp. 1–6, IEEE, 2018.
[57] C. Feichtenhofer, H. Fan, J. Malik, and K. He, “SlowFast networks for video recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6202–6211, 2019.
[58] P. M. Grulich and F. Nawab, “Collaborative edge and cloud neural networks for real-time video processing,” Proceedings of the VLDB Endowment, vol. 11, no. 12, pp. 2046–2049, 2018.
[59] H. Zhang, J. Bosch, and H. H. Olsson, “End-to-end federated learning for autonomous driving vehicles,” in 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, 2021.
[60] A. Raza, K. P. Tran, L. Koehl, S. Li, X. Zeng, and K. Benzaidi, “Lightweight transformer in federated setting for human activity recognition,” arXiv preprint arXiv:2110.00244, 2021.
[61] Y. A. U. Rehman, Y. Gao, J. Shen, P. P. B. de Gusmão, and N. Lane, “Federated self-supervised learning for video understanding,” in European Conference on Computer Vision, pp. 506–522, Springer, 2022.
[62] K. Doshi and Y. Yilmaz, “Federated learning-based driver activity recognition for edge devices,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3338–3346, 2022.
[63] Y. Deng, T. Han, and N. Ansari, “FedVision: Federated video analytics with edge computing,” IEEE Open Journal of the Computer Society, vol. 1, pp. 62–72, 2020.
[64] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen, H. Yu, and Q. Yang, “FedVision: An online visual object detection platform powered by federated learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13172–13179, 2020.
[65] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
[66] T. Li, M. Sanjabi, A. Beirami, and V. Smith, “Fair resource allocation in federated learning,” arXiv preprint arXiv:1905.10497, 2019.
[67] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan, “Adaptive federated optimization,” arXiv preprint arXiv:2003.00295, 2020.
[68] Gerald Combs, “Wireshark.”
[69] Fred Baumgarten, Bernd Eckenfels, Brian Micek, “Netstat.”
[70] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, and J. Passerat-Palmbach, “A generic framework for privacy preserving deep learning,” arXiv preprint arXiv:1811.04017, 2018.
[71] H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Federated learning of deep networks using model averaging,” arXiv preprint arXiv:1602.05629, vol. 2, 2016.
[72] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, “FedBN: Federated learning on non-IID features via local batch normalization,” in International Conference on Learning Representations, 2021.
[73] H. Wu and P. Wang, “Node selection toward faster convergence for federated learning on non-IID data,” IEEE Transactions on Network Science and Engineering, 2022.
[74] S. Golomb, “Run-length encodings (corresp.),” IEEE Transactions on Information Theory, vol. 12, no. 3, pp. 399–401, 1966.
[75] K. Hara, H. Kataoka, and Y. Satoh, “Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6546–6555, 2018.
[76] J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the Kinetics dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308, 2017.
[77] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497, 2015.
[78] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, N. Hoang, and Y. Khazaeni, “Bayesian nonparametric federated learning of neural networks,” in International Conference on Machine Learning, pp. 7252–7261, PMLR, 2019.
[79] T. Nishio and R. Yonetani, “Client selection for federated learning with heterogeneous resources in mobile edge,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–7, 2019.
[80] Y. J. Cho, J. Wang, and G. Joshi, “Client selection in federated learning: Convergence analysis and power-of-choice selection strategies,” arXiv preprint arXiv:2010.01243, 2020.
[81] M. Asad, A. Moustafa, T. Ito, and M. Aslam, “Evaluating the communication efficiency in federated learning algorithms,” 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 552–557, 2020.
[82] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB: a large video database for human motion recognition,” in Proceedings of the International Conference on Computer Vision (ICCV), 2011.
[83] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1933–1941, 2016.
[84] A. Piergiovanni and M. S. Ryoo, “Representation flow for action recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9945–9953, 2019.
[85] A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, “ViViT: A video vision transformer,” 2021.
[86] G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?,” in Proceedings of the International Conference on Machine Learning (ICML), July 2021.
[87] Y. Ji and L. Chen, “FedQNN: a computation-communication efficient federated learning framework for IoT with low-bitwidth neural network quantization,” IEEE Internet of Things Journal, 2022.
[88] E. Diao, J. Ding, and V. Tarokh, “HeteroFL: Computation and communication efficient federated learning for heterogeneous clients,” arXiv preprint arXiv:2010.01264, 2020.
[89] H.-P. Wang, S. Stich, Y. He, and M. Fritz, “ProgFed: effective, communication, and computation efficient federated learning by progressive training,” in International Conference on Machine Learning, pp. 23034–23054, PMLR, 2022.
[90] M. Huang, H. Qian, Y. Han, and W. Xiang, “R(2+1)D-based two-stream CNN for human activities recognition in videos,” in 2021 40th Chinese Control Conference (CCC), pp. 7932–7937, 2021.
[91] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” pp. 1–14, Computational and Biological Learning Society, 2015.
[92] D. R. Beddiar, B. Nini, M. Sabokrou, and A. Hadid, “Vision-based human activity recognition: a survey,” Multimedia Tools and Applications, vol. 79, no. 41, pp. 30509–30555, 2020.
[93] K. Doshi and Y. Yilmaz, “Federated learning-based driver activity recognition for edge devices,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3337–3345, 2022.
[94] B. Zhang, J. Wang, J. Fu, and J. Xia, “Driver action recognition using federated learning,” in 2021 the 7th International Conference on Communication and Information Processing (ICCIP), ICCIP 2021, (New York, NY, USA), pp. 74–77, Association for Computing Machinery, 2022.
[95] X. Ouyang, Z. Xie, J. Zhou, J. Huang, and G. Xing, “ClusterFL: A similarity-aware federated learning system for human activity recognition,” in Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’21, (New York, NY, USA), pp. 54–66, Association for Computing Machinery, 2021.
[96] L. Tu, X. Ouyang, J. Zhou, Y. He, and G. Xing, “FedDL: Federated learning via dynamic layer sharing for human activity recognition,” in Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, SenSys ’21, (New York, NY, USA), pp. 15–28, Association for Computing Machinery, 2021.
[97] K. Sozinov, V. Vlassov, and S. Girdzijauskas, “Human activity recognition using federated learning,” in 2018 IEEE Intl. Conf. on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp. 1103–1111, IEEE, 2018.
[98] P. Jain, S. Goenka, S. Bagchi, B. Banerjee, and S. Chaterji, “Feder-