2014: DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

楚俊逸
2023-12-01

  • I found this paper at the following URL!

https://dl.acm.org/doi/10.1145/2654822.2541967

  • Belongs to:

Abstract

  • ML is becoming pervasive
    • in a broad range of domains,
    • and in a broad range of systems (from embedded systems to data centers).

  • At the same time,
    • a small set of ML algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs)
  • are proving to be state-of-the-art (sota) across many applications.

  • As architectures evolve towards
    • heterogeneous multi-cores
    • composed of a mix of cores and accelerators,
  • an ML accelerator can achieve
    • the rare combination of efficiency (due to the small number of target algorithms)
    • and broad application scope.

Next paragraph

  • Until now, most ML accelerator designs
    • have focused on
    • efficiently implementing the computational part of the algorithms.
  • However, recent sota CNNs and DNNs
    • are characterized by their large size.

What do we do about that size?

  • design an accelerator
    • for large-scale CNNs and DNNs,
  • with an emphasis on the impact of memory on accelerator design, performance, and energy (a back-of-the-envelope sketch follows below).
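To make the memory point concrete, here is the promised back-of-the-envelope sketch (mine, not the paper's; the layer dimensions and the 16-bit weight width are assumptions for illustration):

```python
# Back-of-the-envelope sketch: why memory dominates for large layers.
# For a fully connected layer with Ni inputs and Nn neurons, every
# synaptic weight is used once per input vector, so weight traffic
# grows as fast as compute does.

Ni, Nn = 4096, 4096         # hypothetical large-layer dimensions
bytes_per_weight = 2        # assuming 16-bit fixed-point weights

macs = Ni * Nn                          # one multiply-accumulate per synapse
weight_bytes = macs * bytes_per_weight  # each weight fetched at least once
ops = 2 * macs                          # multiplies + adds, as GOP/s counts
intensity = ops / weight_bytes          # operations per byte of weight traffic

print(f"ops = {ops/1e6:.1f} M, weight traffic = {weight_bytes/2**20:.1f} MiB")
print(f"arithmetic intensity ~ {intensity:.1f} ops/byte")
# ~1 op/byte: without on-chip buffering and tiling, performance is bounded
# by memory bandwidth -- hence the paper's focus on memory behavior.
```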

Third paragraph

  • possible to design an accelerator with
    • a high throughput,
    • capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron-output additions; see the kernel sketch after this list)
    • in a small footprint of 3.02 mm² and 485 mW;
  • compared to a 128-bit 2GHz SIMD processor,
    • 117.87x faster,
    • and reducing total energy by 21.08x.
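As a reading aid, here is the kernel sketch referenced above: a minimal Python version (mine, not the paper's code) of the "key NN operations" that the 452 GOP/s figure counts, with a hypothetical layer size:

```python
# Sketch of the kernel behind the GOP/s figure: synaptic weight
# multiplications and neuron-output additions in a fully connected layer.

def classifier_layer(x, W):
    """y[n] = sum_i W[n][i] * x[i] -- one multiply + one add per synapse."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

print(classifier_layer([1.0, 2.0], [[0.5, -1.0], [2.0, 0.0]]))  # [-1.5, 2.0]

Ni, Nn = 1024, 1024   # hypothetical layer size (not from the paper)
ops = 2 * Ni * Nn     # multiplies + adds, as GOP/s figures count them
print(f"{ops/1e6:.2f} M ops -> {ops / 452e9 * 1e6:.2f} us at 452 GOP/s")
```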

  • The accelerator characteristics are obtained
    • after layout at 65nm.
  • Such a high throughput in a small footprint
    • can
    • open up the usage of sota ML algorithms
    • in a broad set of systems and
    • for a broad set of applications (a quick arithmetic check on these numbers follows below).
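And the promised arithmetic check, using only the figures quoted above (452 GOP/s, 485 mW, 117.87x, 21.08x); the derived values are my own arithmetic:

```python
# Sanity checks derived from the abstract's headline numbers.

acc_gops   = 452.0    # accelerator throughput, GOP/s
acc_power  = 0.485    # accelerator power, W
speedup    = 117.87   # vs. the 128-bit 2 GHz SIMD baseline
energy_red = 21.08    # total-energy reduction vs. the same baseline

# Implied baseline throughput, assuming the same operation count per task.
simd_gops = acc_gops / speedup

# Accelerator energy per operation: W / (GOP/s) = nJ/op; x1e3 -> pJ/op.
pj_per_op = acc_power / acc_gops * 1e3

print(f"implied SIMD throughput ~ {simd_gops:.2f} GOP/s")
print(f"accelerator energy ~ {pj_per_op:.2f} pJ/op")
print(f"energy reduction = {energy_red}x vs. {speedup}x speedup")
# The energy reduction (21.08x) is far smaller than the speedup (117.87x):
# total energy includes memory traffic, which shrinks much less than
# compute time -- memory again dominates the design.
```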

1. Introduction

  • architectures evolve towards
    • heterogeneous multi-cores
    • composed of a mix of cores and accelerators,
  • designing accelerators
    • which realize the best possible tradeoff
    • between flexibility and efficiency
    • is becoming a prominent issue.

Second paragraph

  • The first question:
    • for which category of applications should one primarily design accelerators?

  • Together with the architecture trend towards
    • accelerators,
  • a second simultaneous and significant trend
    • in high-performance and embedded applications is developing:

  • many of the emerging high-performance
    • and embedded applications,
    • from image/video/audio recognition to automatic translation, business analytics, and all forms of robotics, rely on ML techniques.

  • This trend even starts to percolate in our community
    • where it turns out that
  • half of the benchmarks of PARSEC [2],
    • a suite partly introduced to highlight the emergence of new types of applications,
    • can be implemented using machine-learning algorithms [4].

Half of the PARSEC benchmarks can be implemented with ML!

  • a third and equally remarkable trend in ML
    • where a small number of techniques,
    • based on neural networks (especially CNNs [27] and DNNs [16]),
    • have proved in the past few years to be state-of-the-art across a broad range of applications [25].

  • a unique opportunity to
    • design accelerators
    • which can realize the best of both worlds:
  • significant application scope together with
  • high performance and efficiency
  • due to the limited number of target algorithms.