An edition of End-to-end Speech Separation with Neural Networks (2021)

End-to-end Speech Separation with Neural Networks

by Yi Luo

Last edited by MARC Bot, December 11, 2022

Speech separation has long been an active research topic in the signal processing community, owing to its importance in a wide range of applications such as hearable devices and telecommunication systems. It not only serves as a fundamental step for many higher-level speech processing tasks such as automatic speech recognition, natural language understanding, and smart personal assistants, but also plays an important role in smart earphones and augmented and virtual reality devices. With recent progress in deep neural networks, separation performance has advanced significantly through various new problem definitions and model architectures. The most widely used approach in past years performs separation in the time-frequency domain: a spectrogram or other time-frequency representation is first calculated from the mixture signal, and multiple time-frequency masks are then estimated for the target sources. The masks are applied to the mixture's time-frequency representation to extract the target representations, and an operation such as the inverse short-time Fourier transform (iSTFT) is then used to convert them back to waveforms.
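
The mask-based time-frequency pipeline described above can be sketched in NumPy as follows. This is a minimal illustration, not code from the dissertation: the STFT settings (256-sample frames, 50% overlap, square-root Hann window), the function names, and the oracle ratio mask computed from the reference sources (standing in for a network's estimate) are all assumptions. Here the STFT plays the role of a fixed encoder and the iSTFT a fixed decoder. Because the masks are real-valued, they only rescale magnitudes, so every estimate inherits the mixture's phase.

```python
import numpy as np

def sqrt_hann(n_fft):
    # Periodic Hann window, square-rooted. Applied at both analysis and
    # synthesis with 50% overlap, the product (a Hann window) overlap-adds to 1.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])

def stft(x, n_fft=256, hop=128):
    """Frame, window, and FFT a 1-D signal -> complex array (frames, n_fft//2 + 1)."""
    win = sqrt_hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=256, hop=128, length=None):
    """Inverse FFT each frame, apply the synthesis window, and overlap-add."""
    win = sqrt_hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    if length is not None:
        # Zero-pad or trim so the output matches the requested length.
        out = np.pad(out, (0, max(0, length - len(out))))[:length]
    return out

def ratio_masks(references, n_fft=256, hop=128, eps=1e-8):
    """Oracle magnitude ratio masks |S_k| / sum_j |S_j| (illustration only;
    a separation network would estimate these from the mixture alone)."""
    mags = np.stack([np.abs(stft(s, n_fft, hop)) for s in references])
    return mags / (mags.sum(axis=0) + eps)

def separate_with_masks(mixture, masks, n_fft=256, hop=128):
    """Apply real-valued masks to the mixture STFT and invert each one."""
    Y = stft(mixture, n_fft, hop)
    # A real mask rescales |Y| but leaves the mixture phase unchanged.
    return [istft(m * Y, n_fft, hop, length=len(mixture)) for m in masks]

# Usage: a tone plus noise, separated with oracle masks.
rng = np.random.default_rng(0)
t = np.arange(8000)
s1 = np.sin(2 * np.pi * 0.02 * t)        # tonal "source"
s2 = 0.3 * rng.standard_normal(len(t))   # noise-like "source"
mixture = s1 + s2
est1, est2 = separate_with_masks(mixture, ratio_masks([s1, s2]))
```

Reconstruction is exact only where two frames overlap. The first and last half-frame, and any samples past the last full frame, are attenuated or zero, which a practical implementation would handle by padding the input.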

However, such frequency-domain methods may have difficulty modeling the phase spectrogram, as conventional time-frequency masks often consider only the magnitude spectrogram. Moreover, the training objectives for frequency-domain methods are typically also defined in the frequency domain, which may not be in line with widely used time-domain evaluation metrics such as signal-to-noise ratio (SNR) and signal-to-distortion ratio (SDR). Time-domain, end-to-end speech separation arises naturally as a problem formulation that addresses these disadvantages of frequency-domain systems. End-to-end speech separation networks take the mixture waveform as input and directly estimate the waveforms of the target sources. Following the general pipeline of conventional frequency-domain systems, which contains a waveform encoder, a separator, and a waveform decoder, time-domain systems can be designed in a similar way while significantly improving separation performance. In this dissertation, I focus on multiple aspects of the general problem formulation of end-to-end separation networks, including system design, model architectures, and training objectives.
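
To make the mismatch between frequency-domain objectives and time-domain metrics concrete, the sketch below computes the time-domain SNR, uses its negative as a training loss, and contrasts it with a mean-squared error on magnitude spectra. It is an illustration, not code from the dissertation. The function names, the `eps` stabilizer, and the use of a single full-length FFT in place of an STFT are assumptions made for brevity. SDR as computed by toolkits such as BSS Eval involves an additional projection step and is not shown.

```python
import numpy as np

def snr_db(reference, estimate, eps=1e-8):
    """Time-domain SNR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

def negative_snr_loss(reference, estimate):
    """A time-domain training loss: minimizing it maximizes the SNR metric."""
    return -snr_db(reference, estimate)

def magnitude_mse(reference, estimate):
    """A frequency-domain loss on magnitude spectra only; it ignores phase."""
    ref_mag = np.abs(np.fft.rfft(reference))
    est_mag = np.abs(np.fft.rfft(estimate))
    return np.mean((ref_mag - est_mag) ** 2)

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
s_hat = -s  # identical magnitude spectrum, every phase shifted by pi
print(magnitude_mse(s, s_hat))     # ≈ 0: the magnitude loss sees a perfect estimate
print(round(snr_db(s, s_hat), 2))  # -6.02: the time-domain metric does not
```

Negating the reference leaves its magnitude spectrum unchanged, so the magnitude loss is zero while the SNR is about -6 dB. An objective defined directly on the waveform, such as `negative_snr_loss`, penalizes this phase error.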

I start with a single-channel pipeline, which we refer to as the time-domain audio separation network (TasNet), to validate the advantage of end-to-end separation compared with conventional time-frequency domain pipelines. I then move to the multi-channel scenario and introduce the filter-and-sum network (FaSNet) for both fixed-geometry and ad-hoc geometry microphone arrays. Next, I introduce methods for lightweight network architecture design that allow models to maintain separation performance while using as little as 2.5% of the model size and 17.6% of the model complexity. After that, I look into training objective functions for end-to-end speech separation and describe two training objectives, one for separating varying numbers of sources and one for improving robustness in reverberant environments. Finally, I take a step back, revisit several problem formulations in the end-to-end separation pipeline, and raise further questions in this framework to be analyzed and investigated in future work.

Publish Date

2021

Publisher

[publisher not identified]

Language

English

Edition

End-to-end Speech Separation with Neural Networks, 2021, [publisher not identified], in English

Book Details

Edition Notes

Department: Electrical Engineering.

Thesis advisor: Nima Mesgarani.

Thesis advisor: John N. Wright.

Thesis (Ph.D.)--Columbia University, 2021.

Published in: [New York, N.Y.?]

The Physical Object

Pagination: 1 online resource.

ID Numbers

Open Library: OL43793486M
OCLC/WorldCat: 1261045082

Source records

marc_columbia MARC record

History

Created December 11, 2022 by MARC Bot (import new book); 1 revision.