Detailed info

WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

Authors	An Tran, Konstantinos Drossos, Tuomas Virtanen
Title	WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
Abstract	Automated audio captioning (AAC) is a novel task in which a method takes an audio sample as input and outputs a textual description (i.e. a caption) of its contents. Most AAC methods are adapted from the image captioning or machine translation fields. In this work, we present a novel AAC method that explicitly focuses on exploiting the temporal and time-frequency patterns in audio. We employ three learnable processes for audio encoding: two extract the temporal and time-frequency information, and one merges the output of the other two. To generate the caption, we employ the widely used Transformer decoder. We assess our method using the freely available splits of the Clotho dataset. Our results raise the highest previously reported SPIDEr score from 16.2 to 17.3 (higher is better).
ISBN	978-1-6654-0900-1
Conference	2021 29th European Signal Processing Conference (EUSIPCO)
Date	23/08/2021
Location	Dublin, Ireland
Year of Publication, Publisher	2021
Url	https://zenodo.org/record/5723160
DOI	10.23919/EUSIPCO54536.2021.9616340

Key Facts

Project Coordinator: Dr. Sotiris Ioannidis
Institution: Foundation for Research and Technology Hellas (FORTH)
E-mail: marvel-info@marvel-project.eu
Start: 01.01.2021
Duration: 36 months
Participating Organisations: 17
Number of countries: 12

Funding

This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.
