Abstract

Deep Neural Networks (DNNs) have shown remarkable success in speech denoising; however, their high computational and energy requirements make real-time deployment on edge devices challenging. In contrast, Spiking Neural Networks (SNNs) operate on sparse, event-driven spikes, offering a biologically inspired and energy-efficient alternative. In this study, we explore SNN models that leverage the temporal dynamics of spiking neurons to capture long-range dependencies in audio signals. By encoding the input audio into sparse, event-driven spike trains, an SNN can process temporal information while requiring significantly fewer computations than a DNN. We present a real-time speech denoising system that maps noisy audio to sparse spike trains and processes them with diverse SNN architectures designed to exploit both time- and frequency-domain features. We investigate several baseline models and propose the Dual-Signal Transformation Spiking Network, a hybrid model that performs frequency-domain enhancement through spectrogram masking and complements it with raw waveform-based reconstruction in the time domain. Our experiments show that SNNs, especially the Dual-Signal model, achieve competitive denoising performance at substantially lower computational cost, opening up possibilities for efficient, real-time auditory processing on neuromorphic hardware and contributing to the development of practical SNN-based audio denoisers.
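To make the two ideas in the abstract concrete, the sketch below illustrates (a) how audio can be turned into a sparse, event-driven spike train via delta modulation, and (b) how a frequency-domain path can denoise by masking the noisy spectrogram. This is a minimal illustration, not the thesis's actual implementation: the threshold value, window size, and the `mask_fn` placeholder (standing in for the spiking network's predicted time-frequency mask) are assumptions chosen for clarity.

```python
import numpy as np
from scipy.signal import stft, istft

def delta_spike_encode(audio, threshold=0.05):
    """Delta-modulation encoding (illustrative): emit a +1/-1 spike
    whenever the signal moves more than `threshold` away from the
    level at the last spike; emit nothing otherwise. The result is
    a sparse, event-driven representation of the waveform."""
    spikes = np.zeros(len(audio), dtype=np.int8)
    ref = audio[0]
    for t in range(1, len(audio)):
        diff = audio[t] - ref
        if diff > threshold:
            spikes[t] = 1
            ref = audio[t]
        elif diff < -threshold:
            spikes[t] = -1
            ref = audio[t]
    return spikes

def spectral_mask_denoise(noisy, fs, mask_fn, nperseg=512):
    """Frequency-domain path (illustrative): apply a [0, 1] mask to
    the noisy STFT and invert. Here `mask_fn` is a hypothetical
    stand-in for the spiking network that predicts one mask value
    per time-frequency bin from the noisy magnitude spectrogram."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mask = mask_fn(np.abs(Z))          # e.g., SNN-predicted mask
    _, enhanced = istft(Z * mask, fs=fs, nperseg=nperseg)
    return enhanced
```

In the dual-signal design described above, a spectrogram-masking path of this kind would be paired with a second path that refines the raw waveform directly in the time domain, so that errors introduced by masking alone can be corrected on the reconstructed signal.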

Publication Date

1-22-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Alexander Ororbia

Advisor/Committee Member

Aaron Deever

Advisor/Committee Member

Eduardo Coelho De Lima

Campus

RIT – Main Campus
