
Energy and Bandwidth Efficient Sparse Programmable Dataflow Accelerator


Abstract

High-performance neural network accelerator architectures rely on large external memory bandwidth and/or sparse computation paradigms, both of which scale down unfavorably. Current state-of-the-art architectures also include on-chip SRAMs of often more than 150 kB, establishing a lower bound on silicon area from memory alone. This article presents an architecture that exploits programmable dataflow in combination with sparsity to make more efficient use of small on-chip memories. Its control logic supports an enlarged map space through uneven mappings, which a cost-model-driven compiler searches for points that prioritize energy consumption and memory access over throughput. The problem of supporting sparse processing for flexible dataflows is circumvented by an encoding scheme that provides sparsity metadata while still allowing random read and write access. Altogether, the system reduces external memory accesses while requiring only 51 kB of on-chip SRAM and a small silicon area (0.5 mm²). The design achieves an average energy efficiency of 4.4 TOPS/W and 9.7 inferences/s on a sparse AlexNet workload.
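The abstract does not detail the encoding scheme itself. As a minimal sketch of the general idea, the example below assumes a bitmap "sidecar" over dense storage: a per-element nonzero mask supplies sparsity metadata for skipping zero operands, while values stay at fixed addresses so reads and writes remain random-access (unlike compressed formats such as CSR). The class and method names (BitmapTensor, write, read, iter_nonzero) are illustrative, not from the paper.

```python
# Hedged sketch: pairing sparsity metadata with random read/write access.
# This is one plausible realization of the idea, not the paper's encoding.

import numpy as np

class BitmapTensor:
    """Dense value array plus a per-element nonzero bitmap.

    The bitmap tells the compute pipeline which elements to skip,
    without sacrificing O(1) random access to any element.
    """

    def __init__(self, shape):
        self.values = np.zeros(shape, dtype=np.float32)
        self.bitmap = np.zeros(shape, dtype=bool)  # True where value != 0

    def write(self, idx, value):
        # O(1) random write: update value and metadata together.
        self.values[idx] = value
        self.bitmap[idx] = (value != 0)

    def read(self, idx):
        # O(1) random read, no decompression step needed.
        return self.values[idx]

    def iter_nonzero(self):
        # The bitmap lets a processing element visit only nonzero operands.
        for idx in zip(*np.nonzero(self.bitmap)):
            yield idx, self.values[idx]


t = BitmapTensor((4, 4))
t.write((1, 2), 3.5)
t.write((3, 0), -1.0)
print(list(t.iter_nonzero()))  # the two nonzero entries and their coordinates
```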
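Likewise, the cost-model-driven search over an enlarged map space can be illustrated with a toy example. The sketch below enumerates loop tilings for a single layer, including uneven tilings whose tile sizes do not divide the dimensions evenly, and ranks candidates by a crude proxy for energy (external memory accesses) rather than by throughput. The layer dimensions, the word size, and the cost model are assumptions for illustration; only the 51 kB buffer size comes from the abstract.

```python
# Hedged sketch of a cost-model-driven map-space search (toy version).
# The paper's compiler and cost model are not reproduced here.

import math
from itertools import product

DIM = {"C": 96, "K": 256, "X": 13}    # example layer dimensions (assumed)
BUFFER_WORDS = 51 * 1024 // 2         # 51 kB buffer, assuming 16-bit words

def cost(tiles):
    """Estimated external memory accesses for a tiling, or inf if it
    does not fit in the on-chip buffer."""
    tc, tk, tx = tiles
    # Buffer footprint: input, weight, and output tiles must fit on chip.
    footprint = tc * tx + tc * tk + tk * tx
    if footprint > BUFFER_WORDS:
        return math.inf
    # Proxy: each tensor is refetched once per tile of the other dimensions.
    refetch_in  = math.ceil(DIM["K"] / tk)
    refetch_w   = math.ceil(DIM["X"] / tx)
    refetch_out = math.ceil(DIM["C"] / tc)
    return (DIM["C"] * DIM["X"] * refetch_in +
            DIM["C"] * DIM["K"] * refetch_w +
            DIM["K"] * DIM["X"] * refetch_out)

# Uneven mappings: any tile size from 1 to the full dimension is legal,
# not only even divisors, which enlarges the searchable map space.
space = product(range(1, DIM["C"] + 1),
                range(1, DIM["K"] + 1),
                range(1, DIM["X"] + 1))
best = min(space, key=cost)
print("best (tC, tK, tX):", best, "estimated DRAM accesses:", cost(best))
```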

Keywords

Bandwidth

Convolution

DNN accelerator

Encoding

Energy efficiency

Logic

Memory management

System-on-chip

edge AI

flexible dataflow

map space exploration

sparse processing
