Abstract

As modern computing workloads become increasingly data-intensive, the limitations of traditional digital computing hardware, especially the limited communication bandwidth between memory and processors, are becoming more apparent. Surprisingly, the performance and energy efficiency of state-of-the-art systems are constrained not primarily by the computing power of advanced processing architectures but by the low bandwidth of the link between the memory and the processor. This challenge calls for re-architecting digital computing hardware so that communication between the processor and the memory is faster, more energy-efficient, and capable of supporting significantly higher bandwidth. Memory-centric computing has emerged as a novel, alternative computing paradigm that addresses these requirements by integrating processing logic within the memory device itself. By enabling highly localized computing within the memory, memory-centric architectures offer higher energy efficiency and lower computational latency, and they also support massively parallel computation within a compact form factor. However, this paradigm introduces its own design challenges: computing techniques and dataflow designs must be carefully tailored to the constraints of memory architectures. Enhancing functional flexibility, modularity, and design scalability also remains a significant challenge in this research domain. This work addresses the challenge of supporting versatile, heterogeneous computing within the memory while maintaining high energy efficiency and computational parallelism. It does so through a novel, programmable near-memory computing technique that employs a cluster of re-writable look-up tables (LUTs) working collectively to support various logic and arithmetic operations.
Additionally, a custom-designed instruction set architecture enables in-situ programming of this processing architecture. Furthermore, integrating multiple instances of this processing architecture within the memory cell arrays (i.e., banks) minimizes latency and energy loss while maximizing the data-communication bandwidth between the processor and memory. This design also opens a path to efficient scaling, supporting highly versatile, heterogeneous computing workloads on the same computing platform.
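
The idea of re-writable LUTs working collectively can be illustrated with a minimal Python sketch. It assumes 4-bit operands per LUT, a 9-bit table address formed from one carry-in bit and two 4-bit operands, and LUTs chained least-significant nibble first across a wider word. The names `LUT`, `LUTCluster`, `program`, and `run` are hypothetical and do not describe the dissertation's actual circuit or instruction set; the sketch only shows that the same lookup hardware performs a different operation once its table contents are rewritten.

```python
"""Hypothetical sketch: a cluster of re-writable look-up tables (LUTs)
performing logic/arithmetic purely by table look-up. The 4-bit operand
width, the carry-in address bit, and all names are illustrative
assumptions, not the design described in the dissertation."""

from typing import Callable, List

NIBBLE = 4                  # assumed operand width handled by one LUT
MASK = (1 << NIBBLE) - 1    # 0b1111

# An operation is "programmed" as a function of (a, b, carry_in).
Op = Callable[[int, int, int], int]


class LUT:
    """A re-writable table addressed by (carry_in, a, b)."""

    def __init__(self) -> None:
        # 1 carry bit + two 4-bit operands = 9 address bits = 512 entries
        self.table: List[int] = [0] * (1 << (2 * NIBBLE + 1))

    @staticmethod
    def _addr(a: int, b: int, cin: int) -> int:
        return (cin << (2 * NIBBLE)) | (a << NIBBLE) | b

    def program(self, fn: Op) -> None:
        """Rewrite every entry with fn(a, b, carry_in)."""
        for cin in (0, 1):
            for a in range(1 << NIBBLE):
                for b in range(1 << NIBBLE):
                    self.table[self._addr(a, b, cin)] = fn(a, b, cin)

    def lookup(self, a: int, b: int, cin: int = 0) -> int:
        return self.table[self._addr(a, b, cin)]


class LUTCluster:
    """Several LUTs operating together on a wider word, one nibble per LUT."""

    def __init__(self, width: int = 8) -> None:
        assert width % NIBBLE == 0, "width must be a multiple of the LUT width"
        self.luts = [LUT() for _ in range(width // NIBBLE)]

    def program(self, fn: Op) -> None:
        for lut in self.luts:
            lut.program(fn)

    def run(self, x: int, y: int) -> int:
        """Process nibbles least-significant first; result bits above the
        low nibble are passed as carry-in to the next LUT."""
        result, carry = 0, 0
        for i, lut in enumerate(self.luts):
            a = (x >> (i * NIBBLE)) & MASK
            b = (y >> (i * NIBBLE)) & MASK
            out = lut.lookup(a, b, carry)
            result |= (out & MASK) << (i * NIBBLE)
            carry = out >> NIBBLE
        return result  # the final carry-out is dropped (wrap-around)


if __name__ == "__main__":
    cluster = LUTCluster(width=8)
    cluster.program(lambda a, b, cin: a + b + cin)   # ripple-carry addition
    print(cluster.run(200, 100))                     # 44 (300 mod 256)
    cluster.program(lambda a, b, cin: a ^ b)         # bitwise XOR, no carry
    print(cluster.run(0b11001010, 0b10100110))       # 108 (0b01101100)
```

In this sketch, splitting the word into nibbles keeps each table at 512 entries instead of one table indexed by the full operand width; the trade-off is that only operations expressible nibble by nibble with a single propagated bit map directly onto the chain, so richer operations would need additional LUTs or multi-step sequencing.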

Publication Date

5-29-2024

Document Type

Dissertation

Student Type

Graduate

Degree Name

Electrical and Computer Engineering (Ph.D.)

Department, Program, or Center

Electrical and Computer Engineering Technology

College

Kate Gleason College of Engineering

Advisor

Amlan Ganguly

Advisor/Committee Member

Minseok Kwon

Advisor/Committee Member

Mark Indovina

Campus

RIT – Main Campus
