This advertorial is sponsored by Intel®

Introduction

Most commercial applications today use 32-bit floating point (fp32) for training and inference workloads. Various researchers have demonstrated that both deep learning training and inference can be performed with lower precision, using 16-bit multipliers for training and 8-bit multipliers or fewer for inference with minimal to no loss in accuracy (higher precision, 16 bits rather than 8 bits, is usually needed during training to accurately represent the gradients during the backpropagation phase). Using these lower precisions (training with 16-bit multipliers accumulated to 32 bits or more, and inference with 8-bit multipliers accumulated to 32 bits) will likely become the standard over the next year, in particular for convolutional neural networks (CNNs).
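To make the multiply-at-low-precision, accumulate-at-higher-precision idea concrete, here is a minimal sketch in NumPy (not taken from the article; the array names and sizes are illustrative): products of int8 or fp16 operands are widened before they are summed into a 32-bit accumulator, which is what keeps the reduced-precision dot product from overflowing or losing the accumulated value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inference-style arithmetic in the scheme described above:
# 8-bit multiplicands, 32-bit accumulator.
a_int8 = rng.integers(-128, 128, size=256, dtype=np.int8)
b_int8 = rng.integers(-128, 128, size=256, dtype=np.int8)

# Widen each product to int32 before summing so the accumulator does not overflow.
acc_int32 = np.sum(a_int8.astype(np.int32) * b_int8.astype(np.int32), dtype=np.int32)

# Training-style arithmetic: 16-bit multiplicands, 32-bit accumulator.
x_fp16 = rng.standard_normal(256, dtype=np.float32).astype(np.float16)
y_fp16 = rng.standard_normal(256, dtype=np.float32).astype(np.float16)
acc_fp32 = np.sum(x_fp16.astype(np.float32) * y_fp16.astype(np.float32), dtype=np.float32)

print(acc_int32, acc_fp32)
```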

There are two main benefits of lower precision. First, many operations are memory-bandwidth bound, and reducing precision allows better use of the cache and eases bandwidth bottlenecks, so data can be moved through the memory hierarchy faster to keep the compute resources busy. Second, hardware can deliver higher operations per second (OPS) at lower precision, because these multipliers require less silicon area and power.
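The bandwidth argument is simple arithmetic; as a rough sketch (assuming NumPy, not from the article), the same tensor stored at lower precision occupies, and therefore moves, two to four times fewer bytes:

```python
import numpy as np

# Bytes occupied by a one-million-element tensor at each precision.
n = 1_000_000
print(np.zeros(n, dtype=np.float32).nbytes)  # 4,000,000 bytes
print(np.zeros(n, dtype=np.float16).nbytes)  # 2,000,000 bytes
print(np.zeros(n, dtype=np.int8).nbytes)     # 1,000,000 bytes
```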

In this article, we review the history of lower numerical precision training and inference and describe how …

Read More on Datafloq



Source link: https://datafloq.com/read/lower-numerical-precision-deep-learning-inference/791
