Publications

. The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans. IEEE DSLW, 2021.

Preprint

. Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization. Accepted to IEEE ICASSP, 2021.

Preprint Audio Demo

. Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition. Submitted to Computer Speech and Language, 2021.

Preprint

. ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. IEEE SLT Workshop, 2021.

Preprint

. End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. ISCA Interspeech, 2020.

Preprint Source Document

. End-to-End ASR with Adaptive Span Self-Attention. ISCA Interspeech, 2020.

PDF Source Document

. Significance of Spectral Cues in Automatic Speech Segmentation for Indian Language Speech Synthesizers. Speech Communication, 2020.

Code Source Document

. The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge. CHiME-6 Workshop, 2020.

Preprint PDF Slides Video Source Document Blog

. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings. CHiME-6 Workshop, 2020.

Preprint PDF Source Document

. Far-Field Location Guided Target Speech Extraction using End-to-End Speech Recognition Objectives. IEEE ICASSP, 2020.

PDF Video Source Document Audio Demo

. Attention-based ASR with Lightweight and Dynamic Convolutions. IEEE ICASSP, 2020.

Preprint Source Document

. Speech Enhancement Using End-to-End Speech Recognition Objectives. IEEE WASPAA, 2019.

PDF Poster

. Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors for Reverberant Speech Recognition. IEEE WASPAA, 2019.

PDF

. An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions. arXiv:1904.09049, 2019.

Preprint

. The Hitachi/JHU CHiME-5 System: Advances in Speech Recognition for Everyday Home Environments Using Multiple Microphone Arrays. CHiME-5 Workshop, 2018.

PDF Slides Source Document

. Student-Teacher Learning for BLSTM Mask-based Speech Enhancement. ISCA Interspeech, 2018.

PDF Poster Source Document

. TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice. ISCA Interspeech, 2017.

PDF Source Document

. A Hybrid Approach to Segmentation of Speech Using Signal Processing Cues and Hidden Markov Models. MS thesis, Indian Institute of Technology Madras, 2016.

PDF

. Exploration of Vowel Onset and Offset Points for Hybrid Speech Segmentation. IEEE TENCON, 2015.

Source Document

. Blizzard Challenge 2015: Submission by DONLab, IIT Madras. Blizzard Challenge, 2015.

PDF

. Building Speech Synthesis Systems for Indian Languages. IEEE NCC, 2015.

Source Document

. IIT Madras's Submission to the Blizzard Challenge 2014. Blizzard Challenge, 2014.

PDF

. Group Delay based Phone Segmentation for HTS. IEEE NCC, 2014.

PDF Source Document

. A Syllable Based Statistical Text to Speech System. EUSIPCO, 2013.

PDF Source Document

. A Common Attribute based Unified HTS Framework for Speech Synthesis in Indian Languages. ISCA SSW, 2013.

PDF Source Document