https://dl.acm.org/doi/10.1145/2654822.2541967
Abstract
- ML
- pervasive in a broad range of domains,
- in a broad range of systems (embedded to data centers)
- At the same time,
- a small set of ML algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs)
- are proving to be sota across many applications.
- As architectures evolve towards
- heterogeneous multi-cores
- composed of a mix of cores and accelerators,
- an ML accelerator can achieve
- the rare combination of efficiency (due to the small number of target algorithms)
- and broad application scope.
Next paragraph
- Until now, most ML accelerator designs
- focused on
- efficiently implementing the computational part of the algorithms.
- However, recent sota CNNs and DNNs
- are characterized by their large size.
So what to do about such a large size?
- design an accelerator
- for large-scale CNNs and DNNs,
- emphasis on the impact of memory on accelerator design, performance and energy.
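A rough sketch of why network size turns into a memory problem. It counts synapses, operations, and weight storage for one fully connected layer. The layer size (4096 x 4096) and the 2-byte weight width are hypothetical values chosen for illustration; they are not taken from these notes.

```python
def fc_layer_cost(n_inputs: int, n_neurons: int, bytes_per_weight: int = 2) -> dict:
    """Count synapses, operations, and weight bytes for one fully connected layer.

    bytes_per_weight=2 is an assumption made for this sketch.
    """
    synapses = n_inputs * n_neurons      # one weight per (input, neuron) pair
    ops = 2 * synapses                   # one multiply and one add per synapse
    weight_bytes = synapses * bytes_per_weight
    return {
        "synapses": synapses,
        "ops": ops,
        "weight_bytes": weight_bytes,
        # Weight traffic per operation if no weight is reused from on-chip storage.
        "bytes_per_op": weight_bytes / ops,
    }


# Hypothetical layer size, for illustration only.
cost = fc_layer_cost(n_inputs=4096, n_neurons=4096)
print(cost)
```

In such a layer each weight is used once per input vector, so without reuse across inputs the weights have to be streamed from memory as fast as the arithmetic units consume them. That is one way to read the emphasis on memory above.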
Third paragraph
- possible to design an accelerator with
- a high throughput,
- capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions)
- in a small footprint of 3.02 mm^2 and 485 mW;
- compared to a 128-bit 2GHz SIMD processor,
- 117.87x faster,
- reduce the total energy by 21.08x.
- The accelerator characteristics are obtained after layout at 65 nm.
- Such a high throughput in a small footprint
- can
- open up the usage of sota ML algorithms
- in a broad set of systems and
- for a broad set of applications.
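The headline numbers can be turned into efficiency figures with simple arithmetic. A minimal sketch, assuming the 452 GOP/s throughput is sustained at the quoted 485 mW (the notes do not state this explicitly); the function and variable names are made up for illustration.

```python
# Figures quoted in the abstract notes above.
THROUGHPUT_GOPS = 452.0   # GOP/s
AREA_MM2 = 3.02           # mm^2
POWER_W = 0.485           # 485 mW


def efficiency(throughput_gops: float, area_mm2: float, power_w: float) -> dict:
    """Derive throughput per watt, throughput per area, and energy per operation."""
    return {
        "gops_per_watt": throughput_gops / power_w,
        "gops_per_mm2": throughput_gops / area_mm2,
        # W / (op/s) = J/op; scale to picojoules.
        "pj_per_op": power_w / (throughput_gops * 1e9) * 1e12,
    }


metrics = efficiency(THROUGHPUT_GOPS, AREA_MM2, POWER_W)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption this gives roughly 932 GOP/s per watt, about 150 GOP/s per mm^2, and close to 1.07 pJ per operation. The 117.87x and 21.08x figures are comparisons against the SIMD baseline and cannot be derived from these three numbers.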
1. Introduction
- architectures evolve towards
- heterogeneous multi-cores
- composed of a mix of cores and accelerators,
- designing accelerators
- which realize the best possible tradeoff
- between flexibility and efficiency
- is becoming a prominent issue.
Second paragraph
- The first question
- for which category of applications
- one should primarily design accelerators?
- Together with the architecture trend towards heterogeneous multi-cores,
- a second simultaneous and significant trend
- in high-performance and embedded applications is developing:
- many of the emerging high-performance
- and embedded applications,
- from image/video/audio recognition to automatic translation, business analytics, and all forms of robotics, rely on ML techniques.
- This trend even starts to percolate in our community
- half of the benchmarks of PARSEC [2],
- a suite partly introduced to highlight the emergence of new types of applications,
- can be implemented using machine-learning algorithms [4].
Half of the PARSEC benchmarks can be implemented with ML!
- a third and equally remarkable trend in ML
- where a small number of techniques,
- based on nn (especially CNN [27] and DNN [16]),
- proved in the past few years to be state-of-the-art across a broad range of applications [25].
- a unique opportunity to
- design accelerators
- which can realize the best of both worlds:
- significant application scope together with
- high performance and efficiency
- due to the limited number of target algorithms.