End-to-end Speech Separation with Neural Networks

Yi Luo

Speech separation has long been an active research topic in the signal processing community because of its importance in a wide range of applications, such as hearable devices and telecommunication systems. It is not only a fundamental problem for all higher-level speech processing tasks, such as automatic speech recognition, natural language understanding, and smart personal assistants, but it also plays an important role in smart earphones and augmented and virtual reality devices. With recent progress in deep neural networks, separation performance has advanced significantly through new problem definitions and model architectures. The most widely used approach in recent years performs separation in the time-frequency domain: a spectrogram or other time-frequency representation is first computed from the mixture signal, and a time-frequency mask is then estimated for each target source. The masks are applied to the mixture's time-frequency representation to extract the target representations, which are then converted back to waveforms by an operation such as the inverse short-time Fourier transform.
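The time-frequency masking pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the STFT settings (256-sample periodic Hann window, hop of 64 samples) and the helper names `stft`, `istft`, `ideal_ratio_masks`, and `mask_separate` are hypothetical. The masks are also computed from the known sources (an oracle ideal ratio mask) as a stand-in for the masks a neural network would estimate. Because real-valued masks only rescale magnitudes, every estimate reuses the mixture phase.

```python
import numpy as np

# Hypothetical STFT settings: 256-sample periodic Hann window, 75% overlap.
N_FFT, HOP = 256, 64
WINDOW = np.hanning(N_FFT + 1)[:-1]

def stft(x):
    """Complex spectrogram of shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[t * HOP : t * HOP + N_FFT] * WINDOW for t in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec, length):
    """Inverse STFT via weighted overlap-add, normalized by the summed squared window."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1) * WINDOW
    out, norm = np.zeros(length), np.zeros(length)
    for t, frame in enumerate(frames):
        out[t * HOP : t * HOP + N_FFT] += frame
        norm[t * HOP : t * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

def ideal_ratio_masks(sources):
    """Oracle masks |S_k| / sum_j |S_j|; a separation network would estimate these instead."""
    mags = np.stack([np.abs(stft(s)) for s in sources])
    return mags / np.maximum(mags.sum(axis=0), 1e-8)

def mask_separate(mixture, masks):
    """Apply one real-valued mask per source to the mixture STFT, then invert.
    Only magnitudes are rescaled, so each estimate keeps the mixture phase."""
    spec = stft(mixture)
    return [istft(mask * spec, len(mixture)) for mask in masks]

# Toy two-source mixture at a hypothetical 16 kHz sampling rate.
rng = np.random.default_rng(0)
n = np.arange(N_FFT + HOP * 60)           # length chosen so the frames tile the signal exactly
s1 = np.sin(2 * np.pi * 440 * n / 16000)  # source 1: a 440 Hz tone
s2 = 0.5 * rng.standard_normal(len(n))    # source 2: white noise
mixture = s1 + s2
est1, est2 = mask_separate(mixture, ideal_ratio_masks([s1, s2]))
```

In a real system the oracle masks would be replaced by network outputs, but the inverse transform and the reuse of the mixture phase would stay the same.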

However, such frequency-domain methods may have difficulty modeling the phase spectrogram, because conventional time-frequency masks often consider only the magnitude spectrogram. Moreover, the training objectives of frequency-domain methods are typically also defined in the frequency domain, which may not be in line with widely used time-domain evaluation metrics such as signal-to-noise ratio and signal-to-distortion ratio. The problem formulation of time-domain, end-to-end speech separation arises naturally to address these disadvantages of frequency-domain systems. End-to-end speech separation networks take the mixture waveform as input and directly estimate the waveforms of the target sources. Following the general pipeline of conventional frequency-domain systems, which contains a waveform encoder, a separator, and a waveform decoder, time-domain systems can be designed in a similar way while significantly improving separation performance. In this dissertation, I focus on multiple aspects of the general problem formulation of end-to-end separation networks, including system design, model architectures, and training objectives.
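The encoder-separator-decoder structure of a time-domain system, trained directly with a waveform-level objective, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the dissertation's models: the frame length, number of basis filters, ReLU encoder, softmax-mask placeholder separator, and all function names are hypothetical, and the weights are random rather than trained. The loss is the negative SNR between estimated and reference waveforms, so the training objective and the evaluation metric are computed in the same (time) domain. The sketch also assumes each estimate is already paired with its matching reference.

```python
import numpy as np

# Hypothetical sizes: frame length L, N basis filters, C sources, 50% frame overlap.
L, N, C = 16, 64, 2
HOP = L // 2

def encoder(x, U):
    """Learnable analysis basis: project overlapping frames onto N filters, then ReLU."""
    n_frames = 1 + (len(x) - L) // HOP
    frames = np.stack([x[t * HOP : t * HOP + L] for t in range(n_frames)])  # (T, L)
    return np.maximum(frames @ U.T, 0.0)  # (T, N), nonnegative

def separator(w, W_sep):
    """Placeholder separator: C masks that sum to 1 across sources (softmax)."""
    logits = np.einsum("tn,cnm->ctm", w, W_sep)   # (C, T, N)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)

def decoder(w, V, length):
    """Learnable synthesis basis: map each frame back to L samples and overlap-add."""
    out = np.zeros(length)
    for t, frame in enumerate(w @ V):  # (T, L)
        out[t * HOP : t * HOP + L] += frame
    return out

def separate(mixture, U, W_sep, V):
    """Waveform in, one waveform per source out."""
    w = encoder(mixture, U)
    masks = separator(w, W_sep)
    return np.stack([decoder(m * w, V, len(mixture)) for m in masks])  # (C, samples)

def snr_db(est, ref, eps=1e-8):
    """Signal-to-noise ratio of an estimated waveform against its reference, in dB."""
    return 10 * np.log10((np.sum(ref ** 2) + eps) / (np.sum((ref - est) ** 2) + eps))

def neg_snr_loss(estimates, references):
    """Time-domain training objective: minimize the negative mean SNR."""
    return -np.mean([snr_db(e, r) for e, r in zip(estimates, references)])

rng = np.random.default_rng(0)
U, V = 0.1 * rng.standard_normal((N, L)), 0.1 * rng.standard_normal((N, L))  # untrained bases
W_sep = 0.1 * rng.standard_normal((C, N, N))
sources = rng.standard_normal((C, L + HOP * 99))  # toy signals; frames tile them exactly
mixture = sources.sum(axis=0)
estimates = separate(mixture, U, W_sep, V)
loss = neg_snr_loss(estimates, sources)
```

In a trained system, `U`, `W_sep`, and `V` would be learned by gradient descent on this loss. Here the sketch only exercises the data flow; for example, because the masks sum to one and the decoder is linear, the decoded estimates sum to the decoded mixture representation.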

I start with a single-channel pipeline, which we refer to as the time-domain audio separation network (TasNet), to validate the advantage of end-to-end separation compared with conventional time-frequency domain pipelines. I then move to the multi-channel scenario and introduce the filter-and-sum network (FaSNet) for both fixed-geometry and ad-hoc geometry microphone arrays. Next, I introduce methods for lightweight network architecture design that allow the models to maintain separation performance while using as little as 2.5% of the model size and 17.6% of the model complexity. After that, I look into training objective functions for end-to-end speech separation and describe two training objectives, one for separating varying numbers of sources and one for improving robustness in reverberant environments. Finally, I take a step back, revisit several problem formulations in the end-to-end separation pipeline, and raise further questions about this framework to be analyzed and investigated in future work.
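Filter-and-sum beamforming, the operation behind FaSNet's name, convolves each microphone channel with its own FIR filter and sums the results. In FaSNet the filters are estimated by a neural network; the sketch below instead uses fixed delay filters (delay-and-sum, a special case of filter-and-sum) purely for illustration. The function name `filter_and_sum`, the signal length, the inter-microphone delay, and the filter length are all hypothetical.

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Filter-and-sum: convolve each channel with its own FIR filter, then sum.
    channels: (M, T) array; filters: (M, K) array. Output is causal, truncated to T samples."""
    T = channels.shape[1]
    return sum(np.convolve(x, h)[:T] for x, h in zip(channels, filters))

rng = np.random.default_rng(0)
T, d, K = 1000, 3, 8  # hypothetical: signal length, inter-mic delay (samples), filter length
target = rng.standard_normal(T)
mic1 = target.copy()
mic2 = np.concatenate([np.zeros(d), target[:-d]])  # mic 2 hears the target d samples later
channels = np.stack([mic1, mic2]) + 0.3 * rng.standard_normal((2, T))  # independent noise per mic

# Delay-and-sum as fixed filters: delay mic 1 by d samples so both copies align, then average.
filters = np.zeros((2, K))
filters[0, d] = 0.5
filters[1, 0] = 0.5
y = filter_and_sum(channels, filters)
```

Averaging the two aligned copies keeps the target intact while halving the power of the independent noise; a learned filter-and-sum system replaces the fixed delays with filters adapted to the actual mixture.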

Language
English

Book Details


Edition Notes

Department: Electrical Engineering.

Thesis advisor: Nima Mesgarani.

Thesis advisor: John N. Wright.

Thesis (Ph.D.)--Columbia University, 2021.

Published in
[New York, N.Y.?]

The Physical Object

Pagination
1 online resource.

ID Numbers

Open Library
OL43793486M
OCLC/WorldCat
1261045082

Source records

marc_columbia MARC record

History

December 11, 2022 Created by MARC Bot import new book