This advertorial is sponsored by Intel®

Introduction

Most commercial deep learning applications today use 32 bits of floating point (fp32) for training and inference workloads. Various researchers have demonstrated that both deep learning training and inference can be performed with lower precision, using 16-bit multipliers for training and 8-bit multipliers or fewer for inference, with minimal to no loss in accuracy (higher precision, 16 bits vs. 8 bits, is usually needed during training to accurately represent the gradients during the backpropagation phase). Using these lower precisions (training with 16-bit multipliers accumulated to 32 bits or more, and inference with 8-bit multipliers accumulated to 32 bits) will likely become the standard over the next year, in particular for convolutional neural networks (CNNs).
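As a rough illustration of the 8-bit inference arithmetic described above, the sketch below quantizes fp32 activations and weights to int8 with symmetric per-tensor scales, multiplies them as 8-bit integers, and accumulates the products in 32-bit integers before rescaling back to fp32. The shapes, helper name, and scaling scheme are illustrative assumptions, not code from the article.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization of an fp32 array to int8 (illustrative)."""
    max_abs = np.max(np.abs(x))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# fp32 reference data; shapes are arbitrary for the example
activations = np.random.randn(4, 64).astype(np.float32)
weights = np.random.randn(64, 32).astype(np.float32)

qa, sa = quantize_int8(activations)
qw, sw = quantize_int8(weights)

# int8 x int8 products accumulated into int32, then rescaled to fp32
acc_int32 = qa.astype(np.int32) @ qw.astype(np.int32)
result = acc_int32.astype(np.float32) * (sa * sw)

# quantization error relative to the fp32 matmul
print(np.max(np.abs(result - activations @ weights)))
```

Accumulating in int32 rather than int8 is what keeps the sums from overflowing, which is why the article pairs 8-bit multipliers with 32-bit accumulators.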

There are two main benefits of lower precision. First, many operations are memory-bandwidth bound, and reducing precision allows better use of caches and reduces bandwidth bottlenecks, so data can move faster through the memory hierarchy to keep compute resources busy. Second, the hardware may enable higher operations per second (OPS) at lower precision, as these multipliers require less silicon area and power.
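To make the bandwidth point concrete, the short sketch below (an illustration, not from the article) computes the memory footprint of one hypothetical CNN activation tensor at fp32, fp16, and int8; the tensor shape is an arbitrary assumption.

```python
import numpy as np

shape = (32, 256, 56, 56)            # hypothetical batch of CNN activations
elements = np.prod(shape)

# the same tensor shrinks 2x at fp16 and 4x at int8, so more of it fits in
# cache and less traffic hits main memory
for dtype in (np.float32, np.float16, np.int8):
    size_mib = elements * np.dtype(dtype).itemsize / 2**20
    print(f"{np.dtype(dtype).name:>8}: {size_mib:8.1f} MiB")
```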

In this article, we review the history of lower numerical precision training and inference and describe how …

Read more on Datafloq: https://datafloq.com/read/lower-numerical-precision-deep-learning-inference/4791
