2019 International Conference on Advanced Technologies for Communications (ATC)

A Variable Precision Approach for Deep Neural Networks

Xuan-Tuyen Tran, Duy-Anh Nguyen, Duy-Hieu Bui, Xuan-Tu Tran*
SISLAB, VNU University of Engineering and Technology – 144 Xuan Thuy road, Cau Giay, Hanoi, Vietnam
*Corresponding author's email: tutx@vnu.edu.vn

Abstract— Deep Neural Network (DNN) architectures have recently been considered a major breakthrough for a variety of applications. Because of the high computing capability they require, DNNs have been unsuitable for many embedded applications. Many works have tried to optimize the key operations, which are multiply-and-add, in hardware for smaller area, higher throughput, and lower power consumption. One way to optimize these factors is to use reduced bit accuracy; for example, Google's TPU used only 8-bit integer operations for DNN inference. Based on the characteristics of the different layers in a DNN, the bit accuracy can be changed further to save hardware area, reduce power consumption, and increase throughput. This work investigates a hardware implementation of multiply-and-add with a variable bit precision that can be adjusted at computation time. The proposed design can calculate the sum of several products with a bit precision of up to 16 bits. Hardware implementation results on the Xilinx Virtex-7 VC707 FPGA development kit show that our design occupies a smaller area and can run at a higher frequency of 310 MHz, while the same functionality implemented conventionally, with or without DSP48 blocks, can only run at a frequency of 102 MHz. In addition, to demonstrate that the proposed design is effective for deep neural network architectures, it has also been integrated into a neural network for MNIST handwritten digit recognition. The simulation and verification results show that the proposed system achieves an accuracy of up to 88%.

Keywords: Deep learning, neural network, variable weight bit precision, throughput, power consumption

I. INTRODUCTION

Deep learning using neural network architectures has exploded with fascinating and promising results. With many achievements in image classification [1], speech recognition [2], and genomics [3], deep learning has considerably surpassed traditional algorithms based on handcrafted features. This outstanding ability in feature extraction makes deep neural networks a promising candidate for many artificial intelligence applications. Nevertheless, one of the most serious concerns is their high computational complexity, which poses a real difficulty for the application of this approach. While general-purpose platforms such as CPUs and GPUs do not meet the speed and energy-efficiency requirements of real-time applications, specifically designed hardware makes the FPGA the next candidate for implementing neural network algorithms. To realize the algorithms effectively on this computational platform, various accelerator techniques have been introduced. Previous works tried to reduce the computational complexity and data storage by using quantization [4] or approximation [5]. These works show that a DNN can be accelerated by using various bit precisions for the operations in its different layers. For example, Google's TPU shows that, for the inference mode of DNNs, an 8-bit fixed-point precision with integer arithmetic is good enough for a large number of applications.
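As a concrete illustration of what such reduced-precision arithmetic looks like, the C fragment below quantizes a few toy weights and activations to signed 8-bit fixed-point values and accumulates their products in a wider 32-bit register. The values, the helper quantize_q8(), and the Q1.6 format are illustrative assumptions only, not details of the TPU or of the design proposed in this paper.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Quantize a real value to a signed 8-bit fixed-point number with
 * 'frac_bits' fractional bits, saturating at the int8 range.        */
static int8_t quantize_q8(double x, int frac_bits) {
    double scaled = round(x * (double)(1 << frac_bits));
    if (scaled >  127.0) scaled =  127.0;
    if (scaled < -128.0) scaled = -128.0;
    return (int8_t)scaled;
}

int main(void) {
    /* Toy weights and activations (hypothetical values). */
    double w[4] = { 0.53, -0.21, 0.87, -0.40 };
    double a[4] = { 0.12,  0.95, -0.33, 0.71 };
    int frac_bits = 6;               /* Q1.6 format for both operands     */

    int32_t acc = 0;                 /* wide accumulator, as in int8 MACs */
    double  ref = 0.0;
    for (int i = 0; i < 4; i++) {
        acc += (int32_t)quantize_q8(w[i], frac_bits) *
               (int32_t)quantize_q8(a[i], frac_bits);
        ref += w[i] * a[i];
    }
    /* A product of two Q1.6 values carries 2*frac_bits fractional bits. */
    double approx = (double)acc / (double)(1 << (2 * frac_bits));
    printf("float = %f, 8-bit fixed-point = %f, error = %f\n",
           ref, approx, ref - approx);
    return 0;
}
```

The printed error term is the quantization error introduced by the 8-bit operands; keeping it small enough is the concern addressed in the rest of the paper.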
However, reducing bit precision carelessly will degrade the network accuracy through accumulated errors. Therefore, this reduction should be considered as an integral part not only of the inference mode but also of the training phase and the backpropagation steps. The bit precision of a fixed-point format comprises the number of bits used to represent the integer part and the fractional part of the data in a DNN and of the corresponding arithmetic operations. In addition, reducing the bit width (i.e., the total number of bits used to represent the integer and fractional parts) decreases the memory bandwidth and the data storage space compared with the standard 32-bit or 16-bit computation. In conventional approaches, where the bit precision is fixed, the calculation is carried out at a specific accuracy even when the required accuracy is lower than the actual one. More specifically, individual DNNs may have different optimal bit precisions in different layers; using the worst-case numerical precision for all values in the network therefore wastes energy, lowers performance, and increases power consumption.

This work implements a MAC (multiply-and-accumulate) unit with variable bit precision. The multiplier is eliminated by processing the bits of the weights serially, and the accuracy is adjustable by ending the calculation once the designed accuracy is reached. This module enables run-time adjustment of the accuracy of a DNN. It can be used in both the inference data-path and the backpropagation data-path to test DNNs with different bit accuracies. To prove that the proposed design can be applied effectively in DNN architectures, it has been integrated into a specific feed-forward neural network for handwritten digit recognition. The system has been evaluated on the MNIST database and achieves an accuracy of 88%.

The proposed computational method makes three main contributions. Firstly, it provides the capability to change the precision depending on different requirements. Secondly, it allows parallel computation, which speeds up the calculation. Thirdly, the proposed MAC unit integrates a "zero-skipping" mechanism, which makes the design adaptive and more effective when processing data that contain many zero values.

The remaining parts of this paper are organized as follows. Section II investigates the variation in precision requirements that motivates this work. The algorithm and the hardware architecture of the proposed computation unit are presented in Section III. To demonstrate its applicability, the proposed module has been integrated into a specific neural network architecture for handwritten digit recognition; the overall system architecture is described in Section IV. The simulation and implementation results, including verification experiments with numerous test cases and the hardware implementation results on the Xilinx Virtex-7 VC707 development kit, are reported in Section V. Finally, Section VI concludes the paper and outlines potential directions for future research.

II. RESEARCH ON NUMERICAL PRECISION IN DNNS

A. The variability of precision requirements across and within DNNs

Different DNNs can have different fixed-point bit-width requirements for representing their data. In fact, each DNN has been trained to obtain its own weights to operate effectively, so the same fixed bit precision used for various DNN systems may reduce flexibility and waste energy: a fixed bit precision cannot exploit the precision variability among different architectures. Many experiments conducted in [6] show that different networks require different numbers of weight bits. The accuracy obtained with fixed-point representations of different bit widths is compared in Figure 1 for four neural networks: AlexNet [7], SqueezeNet [8], GoogLeNet [9], and VGG-16 [10].

Figure 1: The accuracy of different DNNs with different bit precision [11]
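Curves like those in Figure 1 are obtained by quantizing a trained network's weights to each candidate bit width and re-measuring accuracy on a test set. The self-contained sketch below measures only the RMS quantization error of synthetic weights as a rough proxy for that experiment; the uniform weight distribution and the [-1, 1) range are assumptions made purely for illustration and are not data from [6] or [11].

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Round x to a signed fixed-point value with 'bits' total bits,
 * assuming x lies in [-1, 1) so all non-sign bits are fractional. */
static double quantize(double x, int bits) {
    double scale = (double)(1 << (bits - 1));
    double q = round(x * scale);
    if (q >  scale - 1) q =  scale - 1;
    if (q < -scale)     q = -scale;
    return q / scale;
}

int main(void) {
    enum { N = 10000 };
    static double w[N];
    for (int i = 0; i < N; i++)                 /* synthetic "weights" */
        w[i] = 2.0 * rand() / RAND_MAX - 1.0;

    for (int bits = 4; bits <= 16; bits += 2) { /* sweep the bit width */
        double mse = 0.0;
        for (int i = 0; i < N; i++) {
            double e = w[i] - quantize(w[i], bits);
            mse += e * e;
        }
        printf("%2d-bit weights: RMS quantization error = %.6f\n",
               bits, sqrt(mse / N));
    }
    return 0;
}
```

The error roughly doubles for every bit removed, which is the mechanism behind the accuracy degradation visible in Figure 1 at small bit widths.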
It is clear from Figure 1 that AlexNet and SqueezeNet achieve nearly full accuracy with 7-bit fixed-point numbers, while GoogLeNet and VGG-16 need 10 bits and 11 bits, respectively, to achieve a reliable result. In other words, the minimum weight bit precision at which a given DNN architecture reaches minimal accuracy loss varies from network to network. This inconsistency in bit-width requirements across different DNNs poses a challenge for hardware implementation. Exploiting this result by calculating data with a variable bit width will generally improve performance and make the system adaptable to situations in which some accuracy loss can be tolerated.

On the other hand, the works in [11] and [12] show that different layers can use different bit widths to represent their data with minimal accuracy loss. For example, AlexNet needs 10 bits for the first layer but needs fewer bits for the eighth layer with only 1% accuracy loss. The bit widths of the different layers of AlexNet and GoogLeNet, with the corresponding accuracy, are illustrated in Figure 2. This means that hardware accelerators should provide different bit precisions for the different layers of these networks.

Figure 2: Different layers in a DNN have different optimal data widths [12]

B. The desired accuracy depends on the specific requirement

In some applications, a reduced accuracy can be tolerated in order to trade it off against other factors such as speed or energy consumption. In such cases, it is valuable for the accuracy of the DNN to be adjustable and controllable to an optimal point. The desired accuracy of each layer in the DNN can be simulated and decided using software or accelerated hardware. This accuracy might be adjusted not only after the training process but also during training with backpropagation. With this information, the hardware-accelerated DNN can be more efficient, with a lower hardware area, higher throughput, and lower power consumption.

III. VARIABLE BIT PRECISION APPROACH

A. Basic implementation

The procedure for calculating the sum of products with the variable bit precision method is illustrated in Figure 3.

Figure 3: A partial sum is produced after each clock cycle; with 8-bit weights, the calculation completes after eight clock cycles
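To make the idea concrete, the following C model sketches one plausible realization of a bit-serial, variable-precision sum of products: the weight bits are consumed one position per "clock cycle" for all products in parallel, the loop stops after the requested number of weight bits, and a cycle whose weight-bit column is all zero is skipped. The function name, the most-significant-bit-first ordering, and the use of unsigned weights are assumptions made for illustration; this is a software sketch, not the paper's RTL design.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit-serial sum of products sum_i(w[i] * x[i]).  Weights are processed
 * one bit position per "clock cycle", most significant bit first, so
 * stopping after 'used_bits' cycles keeps the largest contributions and
 * simply lowers the effective weight precision.                         */
int64_t bitserial_dot(const uint16_t *w, const int32_t *x, int n,
                      int total_bits, int used_bits) {
    int64_t acc = 0;
    for (int k = 0; k < used_bits; k++) {        /* one cycle per weight bit */
        int bit = total_bits - 1 - k;            /* current bit position     */
        int64_t column = 0;
        for (int i = 0; i < n; i++)              /* done by parallel adders  */
            if ((w[i] >> bit) & 1u)              /* in the hardware version  */
                column += x[i];
        if (column != 0)                         /* zero-skipping: an all-   */
            acc += column * ((int64_t)1 << bit); /* zero column costs no add */
    }
    return acc;
}

int main(void) {
    uint16_t w[4] = { 200, 15, 0, 129 };         /* example 8-bit weights    */
    int32_t  x[4] = { 3, -7, 11, 2 };            /* example activations      */
    /* With 8-bit weights the exact result needs 8 cycles; fewer cycles   */
    /* give an approximation that uses only the high-order weight bits.   */
    printf("8 cycles (exact):  %lld\n", (long long)bitserial_dot(w, x, 4, 8, 8));
    printf("4 cycles (approx): %lld\n", (long long)bitserial_dot(w, x, 4, 8, 4));
    return 0;
}
```

In hardware, the inner loop corresponds to adders operating in parallel on one weight-bit column per cycle, so the number of cycles, and hence the throughput and energy, scales directly with the selected precision, and all-zero columns can be skipped entirely.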