PEOPLE'S COMMITTEE OF HO CHI MINH CITY
DEPARTMENT OF SCIENCE AND TECHNOLOGY

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF SCIENCE

CITY-LEVEL SCIENCE AND TECHNOLOGY PROGRAM

SUMMARY REPORT ON THE RESULTS OF THE SCIENCE AND TECHNOLOGY RESEARCH TASK

IC DESIGN USING SoC FPGA FOR HIGH-SECURITY IoT APPLICATIONS

(Revised according to the conclusions of the acceptance council dated the 25th of … 2021)

Host organization: University of Science, VNU-HCM
Principal investigator: Dr. Huỳnh Hữu Thuận

Ho Chi Minh City, 2021

UNIVERSITY OF SCIENCE, VNU-HCM
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

Ho Chi Minh City, 02 October 2021

STATISTICAL REPORT ON THE RESULTS OF THE SCIENCE AND TECHNOLOGY RESEARCH TASK

I. GENERAL INFORMATION

Task title: IC design using SoC FPGA for high-security IoT applications
Program/field: Electrical and Electronic Engineering and Information Technology / Engineering and Technology

Principal investigator:
Full name: Huỳnh Hữu Thuận
Date of birth: 09/12/1975
Gender: Male
Academic degree: Doctorate
Position: Dean of the Faculty of Electronics and Telecommunications
Scientific title:
Phone: Office: 38356464; Fax: 38350096; Home: -; Mobile: 0908128458
E-mail: hhthuan@hcmus.edu.vn
Organization: University of Science, VNU-HCM
Organization address: 227 Nguyễn Văn Cừ, District 5, Ho Chi Minh City
Home address: 809/13 Trần Hưng Đạo, Ward 1, District 5

Host organization:
Name: University of Science, VNU-HCM
Phone: 38353193; Fax: 38350096
E-mail:
Website: www.hcmus.edu.vn
Address: 227 Nguyễn Văn Cừ, District 5, Ho Chi Minh City
Head of the organization: Trần Lê Quan
Account number: 3713.0.1056908.00000
Treasury: State Treasury of District 5, Ho Chi Minh City
Governing body of the project: Ho Chi Minh City Department of Science and Technology

II. IMPLEMENTATION STATUS

Task duration:
- According to the signed contract: from December 2018 to December 2020
- Actual: from December 2018 to … 2021
- Extension (if any): from December 2018 to … 2021

Funding and use of funding:

a) Total funding: 1,790,000,000 VND, of which:
+ Funding from the science budget: 1,790,000,000 VND
+ Funding from other sources: 0 VND

b) Disbursement and use of funding from the science budget (unit: million VND):

| Planned: time | Planned: amount | Actual: time (month, year) | Actual: amount | Notes (amount proposed for settlement) |
| 2018 | 895 | 12/2018 | 895 | |
| 2019 | 716 | 12/2019 | 716 | |
| Total | 1,611 | | 1,611 | |

c) Use of funding by expenditure item.

For research projects (unit: VND):
- Labor (scientific and general): planned 1,562,568,500 (science budget 1,562,568,500); actual 1,562,568,500 (science budget 1,562,568,500)
- Materials and energy: planned 0; actual 0
- Equipment and machinery: planned 0; actual 0
- Construction and minor repairs: planned 0; actual 0
- Other expenses: planned 48,431,500 (science budget 48,431,500); actual 48,431,500 (science budget 48,431,500)
- Total: planned 1,611,000,000; actual 1,611,000,000
- Reason for changes (if any): none

For pilot projects (unit: million VND):
- Equipment and machinery purchased: 0
- Workshops newly built or renovated: 0
- Technology support funding: 0
- Labor costs: 0
- Materials and energy: 0
- Rental of equipment and workshops: 0
- Other: 0
- Reason for changes (if any):

Administrative documents issued during the project (decisions and documents of the managing agency from review, funding approval, and contract through any adjustment of time, content, or funding; documents of the host organization such as requests for adjustment, if any):

| Document number, date of issue | Document title | Notes |
| 1436/QĐ-SKHCN, 27/12/2018 | Decision approving the science and technology research task | |
| 45/2018/HĐQKHCN, 27/12/2018 | Contract for performing the science and technology research task | |
| 15/2020/PLHĐQPTKHCN, 29/5/2020 | Appendix to the contract for performing the science and technology research task | |

Collaborating organizations: None

Individuals participating in the task:

| No. | Full name | Specialty | Affiliation |
| 1 | Dr. Huỳnh Hữu Thuận | Electronics | Faculty of Electronics and Telecommunications |
| 2 | Dr. Bùi Trọng Tú | Electronics | Faculty of Electronics and Telecommunications |
| 3 | Dr. Nguyễn Đình Thúc | Information Technology | Faculty of Information Technology |
| 4 | MSc. Bùi An Đông | Electronics | Faculty of Electronics and Telecommunications |
| 5 | MSc. Đỗ Quốc Minh Đăng | Electronics | Faculty of Electronics and Telecommunications |
| 6 | MSc. Huỳnh Quốc Thịnh | Electronics | Faculty of Electronics and Telecommunications |
| 7 | MSc. Trần Tuấn Kiệt | Electronics | Faculty of Electronics and Telecommunications |
| 8 | MSc. Nguyễn Quốc Khoa | Electronics | Faculty of Electronics and Telecommunications |
| 9 | MSc. Nguyễn Quang Anh | Electronics | Faculty of Electronics and Telecommunications |
| 10 | MSc. Hoàng Anh Tuấn | Electronics | Faculty of Electronics and Telecommunications |
| 11 | MSc. Nguyễn Phúc Vinh | Electronics | AMCC company |
| 12 | MSc. Cao Trần Bảo Thương | Electronics | Faculty of Electronics and Telecommunications |

- Reason for changes (if any):

International cooperation: None
Workshops and conferences organized: None

Summary of the main work items (as stated in item 15 of the proposal, excluding scientific workshops and domestic or overseas surveys). All performers are from the University of Science unless noted otherwise.

| No. | Work item (main evaluation milestone) | Planned | Actual | Performed by |
| 1 | Content 1: Research on the IC development flow using SoC FPGA | 4/2019 | 4/2019 | Huỳnh Hữu Thuận, Bùi Trọng Tú, Đỗ Quốc Minh Đăng, Huỳnh Quốc Thịnh |
| 2 | Content 2: Research and implementation of the AES data encryption/decryption core on SoC FPGA | 7/2019 | 7/2019 | Đỗ Quốc Minh Đăng, Cao Trần Bảo Thương, Trần Tuấn Kiệt, Nguyễn Đình Thúc, Huỳnh Hữu Thuận, Bùi Trọng Tú |
| 3 | Content 3: Research and implementation of the RSA data encryption/decryption core on SoC FPGA | 7/2019 | 7/2019 | Đỗ Quốc Minh Đăng, Cao Trần Bảo Thương, Trần Tuấn Kiệt, Nguyễn Đình Thúc, Huỳnh Hữu Thuận, Bùi Trọng Tú |
| 4 | Content 4: Research and implementation of the DSA (SHA2) data encryption/decryption core on SoC FPGA | 8/2019 | 8/2019 | Đỗ Quốc Minh Đăng, Cao Trần Bảo Thương, Trần Tuấn Kiệt, Nguyễn Đình Thúc, Huỳnh Hữu Thuận, Bùi Trọng Tú |
| 5 | Content 5: Accessing the data encryption/decryption cores from the ARM core | 11/2019 | 11/2019 | Đỗ Quốc Minh Đăng, Cao Trần Bảo Thương, Trần Tuấn Kiệt, Nguyễn Đình Thúc, Huỳnh Hữu Thuận |
| 6 | Content 6: Research on the design flow of an MCU using an ARM core | 12/2019 | 12/2019 | Huỳnh Quốc Thịnh, Đỗ Quốc Minh Đăng, Cao Trần Bảo Thương, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Nguyễn Phúc Vinh (AMCC) |
| 7 | Content 7: Interface cores: SPI, I2C, UART, parallel input/output | 12/2019 | 12/2019 | Nguyễn Quốc Khoa, Nguyễn Quang Anh, Hoàng Anh Tuấn, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh, Đỗ Quốc Minh Đăng |
| 8 | Content 8: Interfacing the IoT platform with Wi-Fi | 3/2020 | 3/2020 | Nguyễn Quốc Khoa, Nguyễn Quang Anh, Hoàng Anh Tuấn, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh |
| 9 | Content 9: Interfacing the IoT platform with LoRa | 4/2020 | 4/2020 | Bùi Trọng Tú, Nguyễn Quốc Khoa, Nguyễn Quang Anh, Hoàng Anh Tuấn, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh |
| 10 | Content 10: Interfacing the IoT platform with BLE | 5/2020 | 5/2020 | Bùi Trọng Tú, Nguyễn Quốc Khoa, Nguyễn Quang Anh, Hoàng Anh Tuấn, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh |
| 11 | Content 11: Interfacing with sensors in a fire-alarm system using BLE Mesh | 6/2020 | 6/2020 | Lê Đức Trị, Bùi Trọng Tú, Nguyễn Quốc Khoa, Nguyễn Quang Anh, Hoàng Anh Tuấn, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh |
| 12 | Content 12: System integration | 7/2020 | 7/2020 | Đỗ Quốc Minh Đăng, Lê Đức Trị, Trần Tuấn Kiệt, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh |
| 13 | Content 13: Full system integration, testing, and evaluation | 11/2020 | 5/2021 | Bùi Trọng Tú, Đỗ Quốc Minh Đăng, Huỳnh Hữu Thuận, Huỳnh Quốc Thịnh, Lê Đức Trị |

- Reason for changes (if any):

III. SCIENCE AND TECHNOLOGY PRODUCTS OF THE TASK

Products created:

a) Type I products:

Prototype: an IoT platform using SoC FPGA that supports cryptographic encryption/decryption, with the IoT communication standards LoRa, Wi-Fi, and BLE and the peripheral interfaces SPI, I2C, UART, and GPIO.

Key quality indicators (the planned and achieved columns list the same values):
- LoRa: end node at 868 MHz and 434 MHz (other frequencies are handled similarly)
- Wi-Fi: 100 Mbps
- BLE: compatible with version 4.0 and later
- Cryptographic cores: operating at 150 - 300 Mbps; RSA 1024-bit; DSA based on the corresponding RSA core; AES; SHA-2

2020 4th International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom)

Security Enhancement for IoT Systems Based on SoC FPGA Platforms

Huu-Thuan Huynh (1,2,3), Tuan-Kiet Tran (1,2,3), Tan-Phat Dang (1,2,3), and Trong-Tu Bui (1,2,3)
(1) Faculty of Electronics and Telecommunications
(2) University of Science, Ho Chi Minh City, Vietnam
(3) Vietnam National University, Ho Chi Minh City, Vietnam
Emails: (hhthuan, ttkiet, dtphat, bttu)@fetel.hcmus.edu.vn

Abstract: Security is an important issue in the era of IoT, where a large number of devices are connected. In this paper, we propose a design to enhance the security of IoT connections using SoC FPGA platforms. The design combines the processing and system management capabilities of ARM processors with the flexibility and customizability of FPGA technology. In particular, we have developed a Digital Signature cryptosystem on a DE10-Standard SoC FPGA board using the built-in ARM processor core and two self-developed IP cores acting as 1024-bit RSA and 256-bit SHA co-processors. Furthermore, we applied the DMA technique to achieve high-speed data transfer. As a result, the proposed cryptosystem is compact but achieves high performance even at low frequencies. In more detail, the DMA operating at 150 MHz achieves a speed of 1200 Mbps. The 1024-bit RSA core and the 256-bit SHA core, operating at 50 MHz and 100 MHz, have throughputs of 25 Kbps and above 700 Mbps, respectively.

Index Terms: Digital Signature, RSA 1024, SHA 256, DMA, SoC FPGA, IoT

I. INTRODUCTION

In 2012, the International
Telecommunication Union (ITU) defined IoT as a global infrastructure for the information society that allows everything to connect and provides advanced services based on existing interoperable information and communication technologies. Specifically, IoT devices connect and share information to provide advanced services that can improve quality of life and give greater insight into business, in areas such as smart cities, smart security and emergencies, and smart agriculture and animal farming [1]. However, the messages of IoT devices are exchanged over the worldwide, publicly accessible network, so they are vulnerable to security breaches in the form of hacking, phishing, etc. Furthermore, most IoT devices are physical devices such as sensors, terminals, and gateway devices, which are not smart enough to protect themselves. Therefore, a specific mechanism is needed to protect them from security risks.

Security is one of the most important issues of IoT, but the security of IoT systems and solutions for these issues have not received enough attention. Mendez et al. [2] showed that the number of scientific publications on IoT security was much lower than the number on new applications and technologies. In 2018, numerous works addressed IoT security challenges, such as the secure vault, a multi-key based mutual authentication mechanism proposed by Trusit Shah et al. [3], requirements for the confidentiality and completeness of data [4], and the protection of real-time sensor data from IoT devices by blockchain technology [5].

978-1-7281-6866-1/20/$31.00 ©2020 IEEE

From our survey of security research on IoT systems, we found that few scientific publications address cryptographic algorithms and their acceleration by dedicated hardware, so this is an important area for research. In particular, research on cryptographic algorithms to improve security in IoT systems not only follows a familiar approach, since transmission protocol standards still rely on these algorithms, but also opens a direction in supporting hardware design, where SoC FPGA platforms are an inevitable trend today. Moreover, the results of research projects on hardware design and integration can lead to real products that serve Industry 4.0 in general, and the IoT field as well as system design, fabrication, and synthesis in particular. Furthermore, when an SoC FPGA is used to enhance the security of IoT systems, the proposed system can be adapted and extended to implement Edge Computing [6], which is also a research trend.

SoC FPGAs have been applied for many purposes. Liu, Yuan, et al. [7] built a prototype IoT device protected by an encrypted FPGA bitstream and an FPGA-based encrypted system boot image. Al-Asli et al. [8] offered a secure way to establish a symmetric session between IoT devices and the cloud with an FPGA-based symmetric re-encryption scheme. Hong et al. [9] proposed an FPGA-based neural accelerator suited to small IoT devices with limited resources.

This paper presents a prototype system that performs the Digital Signature scheme on SoC FPGA platforms to enhance the security of IoT systems. We describe methods to increase the computation speed of the RSA and SHA-2 algorithms and implement them on the FPGA. Additionally, we present the architecture of the connections between the FPGA cores, the HPS, and the SDRAM, which obtains high-speed data access through the DMA technique. Section II briefly presents the Digital Signature scheme, which uses a public-key algorithm and a hash function to protect data and authenticate users or devices. Section III describes the architecture of our proposed system and the implementation of the RSA 1024-bit and SHA 256-bit algorithms on FPGA using Verilog HDL. Section IV discusses experimental results on the system's throughput, maximum operating frequencies, and hardware resource utilization. Finally, Section V presents our conclusion.

II. DIGITAL SIGNATURE CRYPTOSYSTEM

A digital signature is a set of parameters used for authenticating the identity of the signatory and the integrity of the signed data. In the digital signature scheme, a private key is used to generate the signature, and the corresponding public key, which is related to the private key but is not the same, is used to verify that signature. Each user or device possesses a pair consisting of a private key and a public key. Public keys are shared with other users. Private keys, however, are secret, and only the owner of a private key can perform signature generation. Users who have the sender's public key can verify the signature; this is called signature verification.

The digital signature scheme uses the RSA 1024 and SHA 256 algorithms to generate and verify signatures. In this scheme, the hash function SHA 256 creates a message digest from the input message. At the sender side, the RSA 1024 algorithm signs this message digest by encrypting it with the sender's private key to generate the signature, which is attached to the message and sent together with it to the receiver. At the receiver side, the message digest (D) is first produced from the received message using the same hash function. Then, RSA 1024 decrypts the received signature to obtain the received message digest (D'). If the digest (D') matches the digest (D), the signature is authentic and the message is accepted; otherwise, it is rejected.

III. PROPOSED SYSTEM

A. System architecture

Figure 1 shows our proposed system architecture. It consists of two main parts, the hard processor system (HPS) and the FPGA. The HPS includes the ARM cores and the SDRAM memory, which is shared between the HPS and the FPGA. The FPGA side has two IP cores for specific purposes, asymmetric encryption and hashing: RSA 1024 bits and the SHA-2 256 bits hash function. Moreover, we applied the DMA technique to these IP cores to obtain high throughput. Finally, we adopted this system as a co-processor that acts as a Digital Signature cryptosystem for verifying data in IoT systems.

[Figure 1: System architecture. DE10-Standard board with the SoC FPGA: the HPS side (ARM running Linux with driver and application, SDRAM) and the FPGA side (RSA 1024 and SHA 256 cores, each with DMA and FIFO).]

B. RSA 1024 bits IP core

1) Montgomery's algorithm: The RSA algorithm, one of the best-known public-key cryptosystems, is based on the difficulty of factoring large integers and is used in information systems to provide confidentiality and authenticity. The encryption (digital signature generation) and decryption (digital signature verification) operations of the RSA algorithm are performed by computing Equation 1 and Equation 2, respectively:

$$C = M^{e} \bmod n \qquad (1)$$

$$M = C^{d} \bmod n \qquad (2)$$

where $C$ is the ciphertext and $M$ is the plaintext. The modulus $n$, private exponent $d$, and public exponent $e$ together form a public key $(e, n)$ and a private key $(d, n)$, which are calculated based on the algorithm in [10]. Because these numbers are extremely large, the modular exponentiations of RSA 1024 bits cannot be computed efficiently using common techniques. Montgomery's algorithm [11] is a fast and effective method for calculating modular multiplications. Instead of computing $z = a \cdot b \bmod n$, this algorithm calculates $Z = \mathrm{MonPro}(A, B, n) = A \cdot B \cdot r^{-1} \bmod n$, where $A = a \cdot r \bmod n$ and $B = b \cdot r \bmod n$. To speed up the calculation of modular exponentials, we used the binary method for computing the power operation based on the work [12]. Also, we
replaced the exponentiation operation with a series of squaring and multiplication operations modulo $n$. Let length be the number of bits in the exponent $b$, and let $j$ index those bits. The following exponentiation algorithm is one way to compute $z = a^{b} \bmod n$. Algorithm 1 applies MonPro to compute the modular exponential; we call it the PowerMonMod function.

Algorithm 1: z = PowerMonMod(a, b, n)
Input: a, b, n
Output: $z = a^{b} \bmod n$
1: $r = 2^{1024}$, $A = a \cdot r \bmod n$, $Z = 1 \cdot r \bmod n$
2: $n_0 = r - n^{-1} \bmod r$
3: length = GetLength(b)
4: for j = length − 1 downto 0
5: Z = MonPro(Z, Z)
6: if (GetBit(b, j) == 1) then
7: Z = MonPro(Z, A)
8: end if
9: end for
10: return z = MonPro(1, Z)

[Figure 2: RSA 1024's architecture. Blocks: RegA, RegB, RegN, RegN0, RegZ1, Mod2048_1, Mod2048_2, MonModInverse, MonPro, MonProMux, GetBit, Sub2048, and PowModMon_Control.]

[Figure 3: RSA 1024 IP with DMA interfaces.]

2) Hardware implementation: We implemented RSA 1024 bits as an IP core based on Montgomery's algorithm. This core performs signature generation when executing Equation 1 and signature verification when executing Equation 2, depending on the input data. The main function of the RSA 1024 IP core is to calculate the modular exponential of 1024-bit numbers, $oZ = iA^{iB} \bmod iN$, where iA is the input data, iB is the public key or private key, iN is the modulus, and oZ is the output result. Because our purpose is to achieve high-speed calculation of the RSA 1024 operations, we used 1024-bit numbers as the inputs of this module; the padding scheme used to create the 1024-bit numbers is not discussed in this paper.

Figure 2 shows the architecture of the RSA 1024 IP core with its primary inputs, outputs, and detailed internal structure. Specifically, five modules (RegA, RegB, RegN, RegN0, and RegZ1) store temporary values and perform logical shifts on them, which are used to implement multiplication and division. The Mod2048 module computes the modular reduction of 2048-bit numbers, and the MonModInverse module calculates the modular inverse. The GetBit module is used to obtain the bit length of the value B, and Sub2048 computes the modular subtraction. The MonPro submodule is the key component: it implements Montgomery's algorithm to speed up the modular exponentiation. Algorithm 1 presents the detailed implementation of the modular exponential based on Montgomery's algorithm in the FPGA.

In addition, we integrated the RSA 1024 IP core with the DMA technique so that it reads and writes data directly from and to the SDRAM to achieve high-speed data transfer. Figure 3 shows the connections between the RSA core and the three DMA interfaces: Master Read, Master Write, and Control/Status. The Master Read interface reads data from the SDRAM and writes it to a FIFO to supply the input values for the RSA core. Similarly, the Master Write interface reads data from the output FIFO and writes it to the SDRAM through the bus. The last interface is the Control/Status interface, a slave module used to configure the RSA core and check its status. The entire subsystem, consisting of the RSA core and the three interfaces, is named the RSA Controller. The architecture of the connection between the HPS, the SDRAM, and the RSA Controller is shown in Figure 1.

C. SHA-2 256 bits IP core

1) Algorithm overview: SHA stands for Secure Hash Algorithm. SHA 256 bits is a member of SHA-2, a set of cryptographic hash functions (SHA 224, SHA 256, SHA 384, and SHA 512) designed by the National Security Agency (NSA) and published in 2001 by NIST as a U.S. Federal Information Processing Standard. The SHA 256 algorithm [13] has two main stages, Preprocessing and Hashing. In the Preprocessing stage, the padding task is handled first; its purpose is to ensure that the length of the input message is a multiple of 512 bits. After padding, the message is parsed into $N$ 512-bit message blocks $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$. In the Hashing stage, the scheduling and compression tasks run concurrently, and the message blocks are processed one at a time. Beginning with a fixed initial hash value $H^{(0)}$ [13], we sequentially compute:

$$H^{(i)} = H^{(i-1)} + C_{M^{(i)}}\left(H^{(i-1)}\right) \qquad (3)$$

where $C$ is the SHA 256 compression function, $+$ means word-wise addition modulo $2^{32}$, and $H^{(N)}$ is the hash value after the $N$-th message block. The workflow of the Hashing stage, including the scheduling and compression tasks, is presented in [13]. The scheduling task takes the original 512-bit message block as input, expands its sixteen 32-bit words $W_0$ to $W_{15}$ into 64 words $W_0$ to $W_{63}$, and provides one word for every round of the compression function. This is done according to Equation 4:

$$W_j = \begin{cases} M_j^{(i)} & \text{if } 0 \le j \le 15 \\ \sigma_1(W_{j-2}) + W_{j-7} + \sigma_0(W_{j-15}) + W_{j-16} & \text{if } 16 \le j \le 63 \end{cases} \qquad (4)$$

After the scheduling task produces its first output, the compression task becomes active. In this task, the state registers a, b, c, d, e, f, g, and h are initialized to the pre-determined 32-bit constants H0 to H7 for the first message block, and to the current intermediate hash value for the following blocks. The compression function executes in 64 rounds, and then the output value of the registers is added to the previous intermediate hash value using addition modulo $2^{32}$ to give the new intermediate hash value (Equation 3).

[Figure 4: SHA 256's architecture. Scheduler and Compressor, with H0 to H7 loaded into the state registers a to h.]

2) Hardware implementation: SHA 256 bits accepts input messages with a maximum length of $2^{64} - 1$ bits and a block size of 512 bits. Internally, it processes these blocks as 32-bit words. At the output, it produces a 256-bit message digest. The architecture of our hash logic core is shown in Figure 4; it has three submodules: Scheduler, Compressor, and Control. Specifically, we implemented the Preprocessing stage, which includes padding and dividing the message into N message blocks, in the HPS (ARM processor). We implemented the scheduling and compression tasks in hardware on the FPGA and named them the Scheduler and the Compressor, respectively. Their structures are shown in Figures 5a and 5b, respectively.

In the Scheduler module, we load the first sixteen 32-bit words in parallel using the iLoad signal. The input messages arrive on iMessage, which comes from a dedicated FIFO with a data width of 512 bits. If iLoad = 1'b1, the sixteen 32-bit words $W_0$ to $W_{15}$ are assigned from iMessage. After that, iLoad = 1'b0 and the Scheduler proceeds according to Equation 4. Also, following the work of Kurt K. Ting et al. [14], we made the output of the Scheduler $W_j + K_j$ to speed up the computation in the Compressor module. In the same approach, we implemented the
iLoad for the Compressor module so that it can load either the intermediate hash value or the constant values H0 to H7 and write them to the state registers a, b, c, d, e, f, g, and h in parallel. To obtain the highest throughput, we implemented the Scheduler and the Compressor as a pipeline.

[Figure 5: Scheduler and Compressor module. (a) Scheduler; (b) Compressor, with inputs $W_j + K_j$ and the Maj and Ch functions.]

Finally, we applied the DMA technique to increase the speed of reading and writing data between the SHA 256 bits IP core and the SDRAM. Similar to the RSA 1024 IP core (Figure 3), the SHA 256 IP core has three interfaces for DMA: Master Read, Master Write, and Control/Status. The Master Read interface reads the message blocks iMessage from the SDRAM and writes them to a FIFO, and the Master Write interface reads the output hash digest from a FIFO and writes it to the SDRAM. The HPS configures the SHA 256 IP core and checks its status through the Control/Status interface.

IV. EXPERIMENTAL RESULTS

The Digital Signature cryptosystem depicted in Figure 1 has been prototyped in a single Cyclone V SoC FPGA (5CSXFC6D6F31C6) chip on the Intel DE10-Standard board. The two IP cores, RSA 1024-bit and SHA 256-bit, were simulated, synthesized, and placed and routed using Intel Quartus Prime Version 18.0 software. Design verification was performed via simulations in ModelSim, and the complete system was validated through evaluation tests on the SoC hardware prototype. The whole system in Figure 1 used nearly 54% of the hardware resources of the Cyclone V SoC FPGA chip. In particular, according to the logic synthesis results, the RSA 1024 and SHA 256 IP cores use 44% and 4% of the hardware resources of the Cyclone V chip, respectively; these figures include only the IP cores and internal FIFOs, and exclude the HPS processor and the bus.

We also built a custom Linux kernel and drivers for our proposed system. They configure the hardware system, for example by downloading the binary stream that configures the FPGA, initializing the SDRAM, and configuring the DMA parameters. They also provide the interface through which user space communicates with the hardware, via the driver of the bridges between the HPS and the FPGA in SoC FPGA platforms.

Furthermore, we estimated the throughput of these IP cores based on Equation 5:

$$\text{Throughput (Mbps)} = \frac{\text{Length (bits)} \times F\ \text{(MHz)}}{\text{Counter}} \qquad (5)$$

where Length is the length of the input data in bits, F is the operating frequency, and Counter is the number of clock cycles the IP core needs to complete its computation. The RSA 1024 IP core achieves a throughput of 25 Kbps when operated at 50 MHz, and the throughput of the SHA 256 IP core is above 700 Mbps at 100 MHz. Furthermore, we applied the DMA technique to obtain high-speed read and write operations, and obtained a throughput of 1200 Mbps at 150 MHz for the DMA operation.

Table I and Table II compare our implementation results for the RSA 1024 and SHA 256 IP cores with known published results [15] - [22]. Our RSA 1024 IP core operates at 50 MHz and has a throughput of about 25 Kbps, using nearly 18.41 K Logic Elements. Some previous works report higher throughput but operate at higher frequencies; at an equivalent frequency, our result is comparable to, and in some cases better than, those results. The SHA 256 IP core reaches a throughput of over 700 Mbps at 100 MHz and uses 1632 Logic Elements. It operates at a higher frequency than most of the compared designs, with a correspondingly higher throughput, so it is on par with existing studies and better in some cases.

Table I: Comparison results for RSA 1024 IP core

| Work | Technology | Resource | Frequency (MHz) | Throughput (Kbps) |
| Hong [15] | VLSI | 14 K (Gate) | 300 | 13.1 |
| Liu [16] | FPGA | 5.3 K (CLB) | 317.6 | 150 |
| Costa [17] | CMOS | 107 K (NAND) | 125 | 121.27 |
| Sun [18] | VLSI | 37.5 K (Gate) | 100 | 10.84 |
| Our results | FPGA | 18.4 K (LEs) | 50 | 25 |

Table II: Comparison results for SHA 256 IP core

| Work | Technology | Resource | Frequency (MHz) | Throughput (Mbps) |
| Togan [19] | FPGA | - | 80 | 630 |
| Li [20] | FPGA | 841 (Slices) | 81.6 | 615 |
| Rote [21] | FPGA | 905 (Slices) | 271 | 2040 |
| He [22] | FPGA | 10918 (LEs) | 87.08 | 655.66 |
| Our results | FPGA | 1632 (LEs) | 100 | > 700 |

V. CONCLUSIONS

In this paper, we have built a system that implements a Digital Signature cryptosystem on SoC FPGA platforms to provide confidentiality and authenticity of data between IoT devices. We implemented two complete FPGA IP cores: RSA 1024-bit for asymmetric cryptography and SHA 256 for hashing. Additionally, we built a custom Linux kernel and drivers for the two IP cores, and additional drivers for other elements on the HPS side. Through this work, we have evaluated several algorithms and techniques for implementing cryptographic algorithms on FPGA at high speed, such as Montgomery's algorithm for RSA 1024, the customized data path of SHA 256, and the pipeline and DMA techniques. Furthermore, our FPGA SoC system offers low cost and high performance, and is therefore suitable for gateway devices in IoT systems.

ACKNOWLEDGMENT

This work was supported by the Science and Technology Fund Number 45/2018/HD-QKHCN of the Ho Chi Minh City Department of Science and Technology.

REFERENCES

[1] Hassija, Vikas, et al. "A survey on IoT security: application areas, security threats, and solution architectures." IEEE Access 7 (2019): 82721-82743.
[2] Mendez, et al. "Internet of things: Survey on security and privacy." arXiv preprint arXiv:1707.01879 (2017).
[3] Shah, Trusit, and S. Venkatesan. "Authentication of IoT device and IoT server using secure vaults." 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And
Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE) IEEE, 2018 [4] Sicari, et al ”A policy enforcement framework for Internet of Things applications in the smart health.” Smart Health (2017): 39-74 [5] Miller, Dennis ”Blockchain and the internet of things in the industrial sector.” IT Professional 20.3 (2018): 15-18 [6] Alrowaily, Mohammed, and Zhuo Lu ”Secure edge computing in iot systems: Review and case studies.” 2018 IEEE/ACM Symposium on Edge Computing (SEC) IEEE, 2018 [7] Liu, Yuan, et al ”Study of secure boot with a FPGA-based IoT device.” 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) IEEE, 2017 [8] Al-Asli, Mohammed, Muhammad ES Elrabaa, and M Abu-Amara ”FPGA-Based Symmetric Re-Encryption Scheme to Secure Data Processing for Cloud-Integrated Internet of Things.” IEEE Internet of Things Journal 6.1 (2018): 446-457 [9] Hong, et al ”A FPGA-based neural accelerator for small IoT devices.” 2017 International SoC Design Conference (ISOCC) IEEE, 2017 [10] Rivest, Ronald L., Adi Shamir, and Leonard Adleman ”A method for obtaining digital signatures and public-key cryptosystems.” Communications of the ACM 21.2 (1978): 120-126 [11] Montgomery, Peter L ”Modular multiplication without trial division.” Mathematics of computation 44.170 (1985): 519-521 [12] Mukaida, Kenji, et al ”Design of high-speed and area-efficient Montgomery modular multiplier for RSA algorithm.” 2004 Symposium on VLSI Circuits Digest of Technical Papers IEEE, 2004 [13] Standard, Secure Hash ”National Institute of Standards and Technology (NIST), FIPSPublication 180-2, Aug 2002.” [14] Ting, Kurt K., et al ”An FPGA based SHA-256 processor.” International Conference on Field Programmable Logic and Applications Springer, Berlin, Heidelberg, 2002 [15] Hong, Jin-Hua, and Wen-Jie Li ”A Novel and Scalable RSA Cryptosystem Based on 32-Bit Modular Multiplier.” 2008 IEEE Computer Society Annual Symposium on VLSI IEEE, 
2008.
[16] Liu, Jizhong, and Jinming Dong. "Design and implementation of an efficient RSA crypto-processor." 2010 IEEE International Conference on Progress in Informatics and Computing. Vol. IEEE, 2010.
[17] da Costa, Caio A., et al. "A 1024 bit RSA coprocessor in CMOS." 2013 25th International Conference on Microelectronics (ICM). IEEE, 2013.
[18] Sun, Chi-Chia, et al. "VLSI design of an RSA encryption/decryption chip using systolic array based architecture." International Journal of Electronics 103.9 (2016): 1538-1549.
[19] Togan, Mihai, Adrian Floarea, and Gigi Budariu. "Design and implementation of cryptographic modules on FPGA." Proceedings of the Applied Mathematics and Informatics (2010): 149-154.
[20] Li, Chanjuan, et al. "Cost-efficient data cryptographic engine based on FPGA." 2011 Fourth International Conference on Ubi-Media Computing. IEEE, 2011.
[21] Rote, Manoj D., N. Vijendran, and David Selvakumar. "High performance SHA-2 core using the Round Pipelined Technique." 2015 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT). IEEE, 2015.
[22] He, Zhenhao, Liji Wu, and Xiangmin Zhang. "High-speed Pipeline Design for HMAC of SHA-256 with Masking Scheme." 2018 12th IEEE International Conference on Anti-counterfeiting, Security, and Identification (ASID). IEEE, 2018.

LoRa Gateway Based on SoC FPGA Platforms

Tan-Phat Dang, Tuan-Kiet Tran, Trong-Tu Bui, and Huu-Thuan Huynh
Faculty of Electronics and Telecommunications
University of Science, Ho Chi Minh City, Vietnam
Vietnam National University, Ho Chi Minh City, Vietnam
Emails: (dtphat, ttkiet, bttu, hhthuan)@fetel.hcmus.edu.vn

Abstract: The term Internet of Things (IoT) is increasingly popular, and the IoT is seen as essential infrastructure in many fields. In particular, LoRa (Long Range) is gradually becoming the first choice for long-range, low-energy communication. The implementation of a LoRaWAN system on the available platforms (The Things Network, ChirpStack
Network Server) has certain limitations because it depends on the Internet. In this article, we build a single-channel LoRa gateway on SoC FPGA platforms while keeping the LoRaWAN conventions; namely, we apply the processing power of the ARM Cortex-A9 core on the DE10-Nano and the customizability of the FPGA to build a general-purpose AES-128 core for security (authentication and encryption). We adopt the DMA technique for AES-128 to achieve high bandwidth. Specifically, the AES-128 core with DMA, operating at 150 MHz, achieves a bandwidth of 1200 Mbps and takes up 17% of the resources. In addition, the processing bandwidth of the gateway is 115.6 Kbps with the ARM Cortex-A9 clocked at 800 MHz.

Index Terms: LoRa, LoRaWAN, AES-128, DMA, SoC FPGA, IoT

I. INTRODUCTION

In recent years, things have increasingly been connected into systems in order to exchange data and messages and to behave appropriately in a given environment; the term IoT is used to describe such systems. In the IoT, wireless communication makes up a large portion of data transmission. An inevitable trend in IoT wireless communication is that end devices should communicate over a wide range and have a long battery life. Low Power Wide Area Network (LPWAN) is therefore the term for wireless technologies that feature long-range communication, small packets, low power, and low bandwidth. LPWANs are becoming popular in IoT environments as they gradually overcome the limitations of short-range, high-energy wireless technologies [1]. In particular, LoRaWAN is becoming a strong candidate for long-range, low-energy, and low-cost applications [2].

In today's LoRaWAN systems, the gateway acts as a bridge that relays messages from the end devices to the processing center (The Things Network, ChirpStack Network Server, etc.) and vice versa. The gateway and the processing center are connected by IP communication over the Internet, so the processing time depends on the Internet connection, which is not suitable for rural areas. Besides,
limiting the duty cycle in LoRaWAN systems is also a matter of concern [3]. Many gateways are deployed on a Raspberry Pi embedded computer to manage and process the data locally [4], [5]. Specifically, [4] uses a simple LoRa protocol (SLP) with a Raspberry Pi 3B+ board as a smart gateway that processes LoRa communication and monitors information. Moreover, the gateway receives from and responds to the end devices through a single-channel LoRa interface provided by the Dragino LoRa shield, while the end devices are implemented on an Arduino Uno board with a Dragino LoRa shield. In [5], the system is built on the model of end device, gateway, and cloud IoT services. In particular, the gateway sends and receives LoRa messages, decodes them, manages the device connections, and uploads data to cloud services. The gateway uses a Raspberry Pi 3B+ that communicates with the LoRa module over UART.

978-0-7381-3196-2/21/$31.00 ©2021 IEEE

This paper aims to design a simple gateway on an SoC FPGA platform, because an SoC FPGA allows the hardware to be customized. Specifically, when the gateway receives data, the authentication and decryption process takes considerable time, so this paper discusses implementing the AES-128 algorithm in hardware. Although we build a simple gateway, we still use the LoRaWAN features: the packet format, the security mechanisms (authentication and encryption), and the basic operational procedures, such as activating an end device by Activation By Personalization (ABP) and the definitions of uplink and downlink. We adopt these LoRaWAN conventions because we aim for a flexible gateway that can both work locally as a simple gateway and interoperate with existing network server platforms. We focus on the security part, which in LoRaWAN includes authentication using AES-CMAC (AES Cipher-based Message Authentication Code) with the Network Session Key (NwkSKey), and data encryption using AES-CTR (AES Counter) with the Application
Session Key (AppSKey). These two algorithms are both based on the AES-128 algorithm.

Over the years, the AES algorithm has been studied and implemented extensively on FPGA hardware. An on-the-fly architecture was proposed to share complex components: it processes the encryption or decryption block in parallel with the key generation block, i.e., a round key is generated in each iteration and used immediately for encryption or decryption, and in the next iteration this round key is the input for generating the next one [6]. The works [7]-[9] analyzed the compatibility of the S-Box and InvS-Box in order to combine them using composite arithmetic in the field GF(2^8) instead of the LUT method, reducing the number of gates and resources. Beyond optimizing the overall architecture or merging SubBytes and InvSubBytes, the paper [10] also addresses resource reduction and speed-up for MixColumns and InvMixColumns; it simplifies the architecture and integrates MixColumns and InvMixColumns so that they share the same XOR gates.

[Figure 1: LoRaWAN architecture (end devices, LoRa connection, gateway, Internet, Network Server, Application Server; NwkSKey and AppSKey)]

[Figure 2: LoRaWAN's packet format (MHDR, FHDR, FPORT, FRMPayload, MIC)]

In summary, this paper presents a LoRa system in which a simple gateway is built on an SoC FPGA platform with a single-channel LoRa module. The gateway can send and receive LoRa packets and process and respond according to the LoRaWAN packet structure. Additionally, the gateway receives user requests through the user interface and stores them locally or online on cloud storage services. An SPI core is implemented on the FPGA to communicate with the LoRa module, and the AES hardware is designed according to our proposed architecture. Furthermore, we present the architecture of the connection between the SDRAM and the AES core through the DMA technique. Specifically, Part II outlines LoRaWAN and the
security techniques in the network. Part III gives an overview of the gateway system built on the SoC FPGA, detailing the hardware components (SPI core, AES core, and the DMA mechanism) and describing how the software works. Part IV discusses the results achieved, and Part V concludes.

II. BACKGROUND KNOWLEDGE

A. LoRaWAN

LoRaWAN defines the communication protocol and system architecture for the network, while LoRa operates at the physical layer and provides the modulation and demodulation of the signal for long-range transmission. LoRaWAN uses a star topology in order to save energy. End devices send packets to the gateway, which then forwards them to the Network Server via mobile network, Ethernet, etc. (Figure 1). The Network Server is in charge of managing the network status. The SX13xx chips, manufactured by Semtech, are used for multi-channel gateways and are suitable for LoRaWAN systems, while the SX12xx chips operate on only one channel and are often used for end devices.

B. Security in LoRaWAN

For secure communication, LoRaWAN applies AES block encryption, which is used throughout the network, from the end device to the Network Server and on to the Application Server. LoRaWAN provides a network authentication mechanism for packets transmitted from the end device to the Network Server, using the AES-CMAC algorithm with the NwkSKey, and a mechanism to encrypt data from the end device to the Application Server, using AES-CTR with the AppSKey. The security process is carried out in both directions: uplink (data sent from the end device) and downlink (data sent to the end device). The AES-CMAC algorithm calculates the protection code for the transmitted data; in LoRaWAN, the Message Integrity Code (MIC) is calculated over the MAC Header (MHDR), Frame Header (FHDR), Frame Port (FPORT), and Frame Payload (FRMPayload), as in Figure 2. The Frame Payload data is encrypted with AES-CTR. An interesting feature of AES-CTR
is that the encryption and decryption processes are the same, so only the encryption routine is needed for both, and the length of the generated ciphertext need not be a multiple of 16: it is exactly equal to the actual length of the data, i.e., the length of the Frame Payload. To verify the integrity of the data, MIC validation is performed by recalculating the MIC from the received packet and comparing it with the attached MIC.

1) AES-CMAC: The AES-CMAC algorithm [11] is based on the AES block cipher. It calculates the message authentication code in two phases: the first phase generates two subkeys, and the second phase uses these two subkeys and the primary secret key to process the message. The subkey generation (Algorithm 1) takes as input a 128-bit secret key K, which is simply the AES-128 key. AES-128 encryption is performed with the secret key K on the all-zero data block, and the resulting ciphertext L is used to generate the keys K1 and K2 in turn. K1 is generated by checking the MSB of L: if it is zero then K1 = L << 1, otherwise K1 = (L << 1) ⊕ 0x87. K2 is generated by checking the MSB of K1: if it is zero then K2 = K1 << 1, otherwise K2 = (K1 << 1) ⊕ 0x87. Once K1 and K2 are obtained, they are returned for use in AES-CMAC.

[Figure 3: Simple LoRa gateway system (end devices, LoRa connection, gateway + Network Server)]

After obtaining the two keys K1 and K2, Algorithm 2 calculates the number of blocks n, each 16 bytes long, rounding up so that the blocks cover the whole message M. It then checks n: if n = 0, the number of blocks is set to 1 and flag is set to 0. Otherwise, the length of M is taken modulo 16; if this value is 0 then flag is set to 1, and otherwise to 0. If flag equals 1, the last data block Mn is a complete 16-byte block and Mlast is calculated by taking the last data block Mn ⊕ K1,
otherwise, Mlast = padd(Mn) ⊕ K2. The next step initializes the variable X to 0 and i to 1. Then n − 1 iterations are performed; each iteration computes Y = X ⊕ Mi and then X as the AES-128 encryption of the message Y under the secret key K. Finally, Y is calculated as Mlast ⊕ X, and the result T is the AES-128 encryption of Y under the key K.

2) AES-CTR: AES-CTR is a mode of operation of AES. AES-CTR takes as input a block that contains a counter field; the counter is incremented for each successive 16-byte block. This input block is encrypted with the key, and the output is XORed with the plaintext to generate the ciphertext. The length of the ciphertext is equal to that of the plaintext, and this length is not necessarily a multiple of 16. For decryption, AES-CTR uses the same input blocks and key as in encryption to produce the same encrypted output, which is XORed with the ciphertext generated by the encryption. In this way, AES-CTR retrieves the original plaintext.

Algorithm 1: (K1, K2) = GenerateSubKey(K)
Input: K
Output: K1, K2
L := AES128Encrypt(K, constZero)
if MSB(L) = 0 then
    K1 := L << 1
else
    K1 := (L << 1) ⊕ 0x87
end if
if MSB(K1) = 0 then
    K2 := K1 << 1
else
    K2 := (K1 << 1) ⊕ 0x87
end if
return (K1, K2)

Algorithm 2: T = AESCMAC(K, M, len)
Input: K, M, len
Output: T
(K1, K2) := GenerateSubKey(K)
n := ceil(len/16)
if n = 0 then
    n := 1
    flag := 0
else
    if (len mod 16) = 0 then
        flag := 1
    else
        flag := 0
    end if
end if
if flag = 1 then
    Mlast := Mn ⊕ K1
else
    Mlast := padd(Mn) ⊕ K2
end if
X := 0
i := 1
for i ← 1 to (n − 1)
    Y := X ⊕ Mi
    X := AES128Encrypt(K, Y)
end for
Y := X ⊕ Mlast
T := AES128Encrypt(K, Y)
return T

III. SYSTEM IMPLEMENTATION

A. LoRa system

In this study, a simple LoRa system is deployed with the
gateway to receive and process packets, as shown in Figure 3. The gateway is deployed on the DE10-Nano board and uses the LoRa SX1278 module to receive data from end devices via LoRa communication. The purpose of deploying on an SoC FPGA board is to use the power of the ARM Cortex-A9 CPU together with hardware components (AES) to increase the processing speed and the speed of responses to end devices. The gateway receives, processes, and stores data, either locally or on other devices or services via the Internet. Its local processing and storage capabilities help to reduce the downlink response time to end devices.

B. Integration system

The integrated architecture is shown in Figure 4. It consists of two main parts, the hard processor system (HPS) and the FPGA. The HPS contains the ARM core and the SDRAM controller, which is shared by the HPS and the FPGA. On the FPGA side there are two IP cores: the SPI core, used for communication with the LoRa module, and the AES core, used to enhance security, namely the security of the LoRaWAN packet format in the LoRa system. In addition, the DMA technique is applied to the AES core to increase bandwidth.

[Figure 4: LoRa gateway system on SoC FPGA (HPS: ARM, SDRAM controller; FPGA: DMA, FIFO, AES, SPI, GPIO)]

[Figure 5: AES core architecture (DataIn, AES Encrypt/Decrypt, DataOut, RAM, Secret Key, Key Expansion)]

C. SPI core

The SPI core is built from logic elements on the FPGA to communicate with the LoRa module. The SPI core acts as the master and uses the pins Master Out Slave In (MOSI), Master In Slave Out (MISO), Slave Select (SS), and Serial Clock (SCLK) to transfer data between the DE10-Nano board (SPI master) and the LoRa module (SPI slave). According to the SX1278 datasheet, the SPI slave of the module has three modes of operation, but in this design only single mode is applied. Each read or write transaction consists of 16 bits on the MOSI line: the first bit selects read or write (1 for write, 0 for read), then comes the 7-bit address, with the MSB (most significant bit) sent first,
followed by the 8-bit data. For a write, the 8 bits after the address are the value to be written; for a read, those 8 bits are ignored and the 8-bit result is received on the MISO line.

D. AES core

In AES-128 [12] encryption, the process begins by copying the input data into a state matrix, and all subsequent modifications to the data are performed on this state matrix. The initial data is first XORed byte by byte with the corresponding key, a procedure called AddRoundKey; the state matrix then goes through consecutive rounds, the last of which differs slightly from the previous ones (it omits MixColumns). Each round consists of four functions: AddRoundKey, SubBytes, ShiftRows, and MixColumns. After completing 10 rounds, the AES-128 encryption block outputs the encrypted data (ciphertext) and the encryption process finishes. Before any data is encrypted, key generation is performed first. The generated round keys are stored in RAM in two orders: one for encryption, from the secret key to the 10th round key, and the reverse one as the decryption key chain (Figure 5). Generating and storing the keys in advance makes it convenient to encrypt or decrypt long data streams, which suits today's big data processing trend.

AES has four main steps: AddRoundKey, SubBytes, ShiftRows, and MixColumns. SubBytes and MixColumns take up most of the execution time of the four steps, so to save time we use ROM to store the S-Box and InvS-Box transforms. We also fold MixColumns and InvMixColumns into this storage: instead of saving only the S-Box and InvS-Box values, we store the precomputed results of the MixColumns and InvMixColumns transforms as well. For example, S-Box(0x00) = 0x63 and the MixColumns transform has the matrix form [02, 03, 01, 01; 01, 02, 03, 01; 01, 01, 02, 03; 03, 01, 01, 02], so the stored values corresponding to S-Box(0x00) are 0xC66363A5, 0xA5C66363, 0x63A5C663, and 0x6363A5C6. XORing the looked-up values together then completes MixColumns.

E. DMA

The DMA technique is applied to the AES core (Figure 4) to limit CPU usage during memory access. The DMA block includes two main components: Master Read and Master Write. Master Read transfers data from the specified address in SDRAM into a FIFO; once the FIFO is not empty, the AES core takes the data and processes it. The output of the AES core is written to the FIFO of the write branch: the AES core checks whether this FIFO is full and, if it is not, writes its output into it. When Master Write detects that the FIFO holds data, it moves the data from the FIFO to the specified location in SDRAM.

F. Design software

The AES-CMAC algorithm consists of two steps: key generation and encryption. Key generation performs a series of operations: AES-128 encryption, comparison, shift, and bitwise XOR. The encryption process involves AES-128, division, XOR, and comparison. AES-CTR is a mode of operation built on AES-128. We found that both algorithms rely mainly on AES-128 and that AES-128 incurs more latency than the other operations, so we implemented AES-128 in hardware. The gateway operation consists of two processes:

Process 1: Communicates with the SPI core to read and write data to the LoRa module, and performs AES-CMAC and AES-CTR using the AES core with integrated DMA.

Process 2: Post-processes the data and communicates with a local or an online database. This process consists of two threads: one thread receives data from Process 1, and the other periodically checks for configuration changes from the user.

[Figure 6: Processing scheme (two processes connected by a pipe; the second process runs two threads)]

The two processes communicate with each other through a pipe (Figure 6). Process 1 sends data to Process 2, which parses the data using predefined message formats.

IV. EXPERIMENTAL RESULT

[Figure 7: Processing scheme]

The figure describes the operation of the AES and DMA blocks. The green frame marks the DMA Read process from SDRAM to FIFO In: the read signal (oRM_read) and the access address (iRM_rdaddr) are driven, and if the wait-request signal (iRM_waitrequest) is nonzero, the address and read signal continue to be held until no more reads are required. At the same time, returned data accompanied by the data-valid signal (rddatavalid) is accepted by DMA Read. The blue frame marks the step in which the data received by DMA Read is written into FIFO In. The AES encryption/decryption is then performed, and the result of the AES block is written to FIFO Out (black frame). When data appears in FIFO Out, DMA Write begins transferring it to SDRAM (gray frame).

The system is built on the Cyclone V SoC FPGA (5CSEBA6U23I7) chip on the DE10-Nano board. The AES-128 core with DMA and the SPI core are implemented in Verilog and synthesized with Intel Quartus Prime version 18.0. These IPs were tested for functionality in the ModelSim simulation software and, finally, tested on the board using SignalTap. In the synthesis results, AES-128 occupies 15% of the resources, DMA plus AES accounts for 17%, and the SPI core for 1%. In the C code setup, a region of SDRAM is initialized to store packets received via LoRa; the data is stored there, moved to the AES core, and then written back to SDRAM by the DMA at a different memory address. The LoRa Ra-02 module, based on Semtech's SX1278 chip, is configured to operate at 433 MHz with 125 kHz bandwidth, spreading factor 12, and coding rate 4/5. The bandwidth formula is given in Equation (1):

$$T = \frac{L\,(\text{bits}) \times F\,(\text{MHz})}{N} \qquad (1)$$

where T is the bandwidth, L is the data length, F is the operating frequency, and N is the number of clock cycles needed to complete the job.

The bandwidth of the AES-128 process using the DMA technique is 1200 Mbps at 150 MHz. In addition, the bandwidth of the gateway is calculated as the packet size divided by the processing time, where the time is measured in the C code from the start of data reception, through encryption, to the classification of the data (before it is sent to the database). The result was 115.6 Kbps with the ARM Cortex-A9 clocked at 800 MHz.

V. CONCLUSIONS

In this paper,
we build a basic LoRa gateway that still uses the basic LoRaWAN packet format and procedures. We also built an AES-128 core incorporating the DMA technique, and an SPI core, on the FPGA. We built the AES-128 core because AES-CMAC and AES-CTR are both based on AES-128. In addition, security is always an essential requirement in IoT, and AES-128 is a widely used symmetric cipher, so a powerful hardware AES-128 core can easily be reused for other communications that require AES-128 encryption.

ACKNOWLEDGMENT

We are grateful for the support provided by Science and Technology Fund Number 45/2018/HD-QKHCN of the Ho Chi Minh City Department of Science and Technology.

REFERENCES

[1] Raza, Usman, Parag Kulkarni, and Mahesh Sooriyabandara. "Low power wide area networks: An overview." IEEE Communications Surveys & Tutorials 19.2 (2017): 855-873.
[2] Lavric, Alexandru, and Adrian Ioan Petrariu. "LoRaWAN communication protocol: The new era of IoT." 2018 International Conference on Development and Application Systems (DAS). IEEE, 2018.
[3] Adelantado, Ferran, et al. "Understanding the limits of LoRaWAN." IEEE Communications Magazine 55.9 (2017): 34-40.
[4] Li, D., et al. "Design of LoRaWAN Integrated Gateway System Based on Embedded Linux." Microcontroller Embedded Systems 19.007 (2019): 10-14.
[5] Eridani, Dania, Eko Didik Widianto, and Richard Dwi Olympus Augustinus. "Monitoring System in Lora Network Architecture using Smart Gateway in Simple LoRa Protocol." 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). IEEE, 2019.
[6] Opritoiu, Flavius, et al. "A high-speed AES architecture implementation." Proceedings of the 7th ACM International Conference on Computing Frontiers. 2010.
[7] Rao, M. Rajeswara, and R. K. Sharma. "FPGA implementation of combined S-Box and InvS-Box of AES." 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, 2017.
[8] Srinivas, N. S. Sai, and M. D. Akramuddin. "FPGA based hardware
implementation of AES Rijndael algorithm for Encryption and Decryption." 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT). IEEE, 2016.
[9] Sahoo, Oyshee Brotee, Dipak K. Kole, and Hafizur Rahaman. "An optimized S-box for advanced encryption standard (AES) design." 2012 International Conference on Advances in Computing and Communications. IEEE, 2012.
[10] Li, Chung-Yi, et al. "An efficient area-delay product design for MixColumns/InvMixColumns in AES." 2008 IEEE Computer Society Annual Symposium on VLSI. IEEE, 2008.
[11] Song, Junhyuk, et al. "The AES-CMAC algorithm." RFC 4493, June 2006.
[12] NIST. "Announcing the Advanced Encryption Standard (AES)." Federal Information Processing Standards Publication 197 (2001).
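
As an illustration of the AES-CMAC subkey generation (Algorithm 1) and of the last-block handling in Algorithm 2 from Section II-B, the following Python sketch may help. It is a minimal sketch under stated assumptions, not the gateway's software: the block L = AES-128(K, 0^128) is assumed to be computed elsewhere (for instance by the hardware AES core), the function names are hypothetical, and the sample value of L is the one published in the RFC 4493 test vectors [11].

```python
from typing import Tuple

BLOCK_BYTES = 16
RB = 0x87  # constant XORed in when the shifted-out MSB is 1 (RFC 4493)


def shift_and_reduce(block: bytes) -> bytes:
    """Left-shift a 128-bit block by one bit; XOR 0x87 if the MSB was 1."""
    value = int.from_bytes(block, "big")
    shifted = (value << 1) & ((1 << 128) - 1)
    if value >> 127:  # MSB of the input block
        shifted ^= RB
    return shifted.to_bytes(BLOCK_BYTES, "big")


def generate_subkeys(l_block: bytes) -> Tuple[bytes, bytes]:
    """Derive (K1, K2) from L = AES-128(K, 0^128); L is assumed given."""
    k1 = shift_and_reduce(l_block)
    k2 = shift_and_reduce(k1)
    return k1, k2


def last_block(message: bytes, k1: bytes, k2: bytes) -> bytes:
    """Compute M_last: complete last block uses K1, otherwise pad and use K2."""
    n = max(1, -(-len(message) // BLOCK_BYTES))  # ceil, with n = 1 if empty
    tail = message[(n - 1) * BLOCK_BYTES:]
    if len(tail) == BLOCK_BYTES:
        key = k1
    else:
        # padd(): append a single 1 bit (0x80), then zeros up to 16 bytes
        tail = tail + b"\x80" + b"\x00" * (BLOCK_BYTES - len(tail) - 1)
        key = k2
    return bytes(a ^ b for a, b in zip(tail, key))


# L for the RFC 4493 example key 2b7e1516 28aed2a6 abf71588 09cf4f3c
L_BLOCK = bytes.fromhex("7df76b0c1ab899b33e42f047b91b546f")
K1, K2 = generate_subkeys(L_BLOCK)
print(K1.hex(), K2.hex())
```

Only shifts, comparisons, and XORs appear here, which is consistent with the observation in Section III-F that the AES-128 encryption itself dominates the latency.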
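
The combined S-Box and MixColumns storage described in Section III-D can be reproduced numerically. The sketch below is an assumption about how such table words might be computed in software; it is not the ROM generation flow used for the FPGA design, and the function names are hypothetical. It derives the four 32-bit words for one S-Box output byte and shows that XORing one word per input byte yields a MixColumns output column.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 02 in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def table_words(s: int) -> list:
    """Return the four MixColumns words for an already-substituted byte s."""
    s2 = xtime(s)   # 02 * s
    s3 = s2 ^ s     # 03 * s
    word = (s2 << 24) | (s << 16) | (s << 8) | s3  # matrix column (02, 01, 01, 03)
    words = [word]
    for _ in range(3):
        word = ((word >> 8) | (word << 24)) & 0xFFFFFFFF  # rotate right by one byte
        words.append(word)
    return words


def mix_column(column: list) -> int:
    """MixColumns of one 4-byte column as the XOR of four table words."""
    result = 0
    for position, byte in enumerate(column):
        result ^= table_words(byte)[position]
    return result


# S-Box(0x00) = 0x63, the example used in the text
print([hex(w) for w in table_words(0x63)])
```

For the byte 0x63 this gives the same four values quoted in the text, so one ROM lookup per byte followed by three XORs replaces the separate SubBytes and MixColumns steps.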
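
The 16-bit single-access SPI transaction described in Section III-C (one read/write bit, a 7-bit address, then 8 data bits, MSB first) can be sketched as follows. This is an illustrative model only: the helper name and the register addresses in the usage lines are hypothetical, and the actual SPI core is implemented in Verilog.

```python
def sx1278_frame(address: int, write: bool, data: int = 0x00) -> int:
    """Pack a single-access frame: [wnr bit][7-bit address][8-bit data]."""
    if not 0 <= address <= 0x7F:
        raise ValueError("address must fit in 7 bits")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in 8 bits")
    wnr = 1 if write else 0  # 1 = write access, 0 = read access
    return (wnr << 15) | (address << 8) | data


def frame_bits(frame: int) -> str:
    """Bits in the order they are shifted out on MOSI (MSB first)."""
    return format(frame, "016b")


# Write 0x81 to register address 0x01 (illustrative values)
print(frame_bits(sx1278_frame(0x01, write=True, data=0x81)))
# Read register address 0x42: the data byte on MOSI is a don't-care
print(frame_bits(sx1278_frame(0x42, write=False)))
```

In a read, the last 8 bits on MOSI are ignored by the module and the register value is sampled on MISO during the same 8 clock cycles.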
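
Equation (1) can be checked against the reported AES figures. As a rough worked example, assuming L = 128 bits, i.e. one AES block per job (an assumption, since the text does not state L explicitly), the reported 1200 Mbps at 150 MHz would correspond to N = 16 clock cycles per block:

$$N = \frac{L \times F}{T} = \frac{128 \times 150}{1200} = 16$$

Under the same assumption, halving N would double the bandwidth at the same clock frequency, which is why the number of cycles per block matters as much as the clock rate.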