Academic literature on the topic 'Automated audio captioning'
Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles.
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Automated audio captioning.'
Journal articles on the topic "Automated audio captioning"
Bokhove, Christian, and Christopher Downey. "Automated generation of ‘good enough’ transcripts as a first step to transcription of audio-recorded data." Methodological Innovations 11, no. 2 (May 2018): 205979911879074. http://dx.doi.org/10.1177/2059799118790743.
Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. "Racial disparities in automated speech recognition." Proceedings of the National Academy of Sciences 117, no. 14 (March 23, 2020): 7684–89. http://dx.doi.org/10.1073/pnas.1915768117.
Mirzaei, Maryam Sadat, Kourosh Meshgi, Yuya Akita, and Tatsuya Kawahara. "Partial and synchronized captioning: A new tool to assist learners in developing second language listening skill." ReCALL 29, no. 2 (March 2, 2017): 178–99. http://dx.doi.org/10.1017/s0958344017000039.
Guo, Rundong. "Advancing real-time close captioning: blind source separation and transcription for hearing impairments." Applied and Computational Engineering 30, no. 1 (January 22, 2024): 125–30. http://dx.doi.org/10.54254/2755-2721/30/20230084.
Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation." Applied Mathematics and Sciences: An International Journal (MathSJ) 10, no. 1/2 (June 26, 2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.
Nam, Somang, and Deborah Fels. "Simulation of Subjective Closed Captioning Quality Assessment Using Prediction Models." International Journal of Semantic Computing 13, no. 01 (March 2019): 45–65. http://dx.doi.org/10.1142/s1793351x19400038.
Gotmare, Abhay, Gandharva Thite, and Laxmi Bewoor. "A multimodal machine learning approach to generate news articles from geo-tagged images." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (June 1, 2024): 3434. http://dx.doi.org/10.11591/ijece.v14i3.pp3434-3442.
Verma, Neeta. "Assistive Vision Technology using Deep Learning Techniques." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (July 31, 2021): 2695–704. http://dx.doi.org/10.22214/ijraset.2021.36815.
Eren, Aysegul Ozkaya, and Mustafa Sert. "Automated Audio Captioning with Topic Modeling." IEEE Access, 2023, 1. http://dx.doi.org/10.1109/access.2023.3235733.
Xiao, Feiyang, Jian Guan, Qiaoxi Zhu, and Wenwu Wang. "Graph Attention for Automated Audio Captioning." IEEE Signal Processing Letters, 2023, 1–5. http://dx.doi.org/10.1109/lsp.2023.3266114.
Full textDissertations / Theses on the topic "Automated audio captioning"
Labbé, Etienne. "Description automatique des événements sonores par des méthodes d'apprentissage profond." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES054.
In the audio research field, most machine learning systems focus on recognizing a limited number of sound events. However, when a machine interacts with real data, it must be able to handle much more varied and complex situations. To tackle this problem, annotators use natural language, which can summarize any sound information. Automated Audio Captioning (AAC) was introduced recently to develop systems capable of automatically producing a description of any type of sound in text form. The task concerns all kinds of sound events: environmental, urban, and domestic sounds, sound effects, music, and speech. Such a system could serve people who are deaf or hard of hearing, and could improve the indexing of large audio databases. In the first part of this thesis, we present the state of the art of the AAC task through a global description of public datasets, learning methods, architectures, and evaluation metrics. Building on this, we present the architecture of our first AAC system, which obtains encouraging scores on the main AAC metric, SPIDEr: 24.7% on the Clotho corpus and 40.1% on the AudioCaps corpus. In the second part, we explore several aspects of AAC systems. We first focus on evaluation methods through a study of SPIDEr. To this end, we propose a variant called SPIDEr-max, which considers several candidate captions for each audio file and shows that the SPIDEr metric is very sensitive to the predicted words. We then improve our reference system by exploring different architectures and numerous hyperparameters, exceeding the state of the art on AudioCaps (SPIDEr of 49.5%). Next, we explore a multi-task learning method aimed at improving the semantics of the sentences generated by our system. Finally, we build a general and unbiased AAC system called CONETTE, which can generate different types of descriptions approximating those of the target datasets.
In the third and last part, we study the ability of an AAC system to automatically search for audio content in a database. Our approach obtains scores competitive with systems dedicated to this task, while using fewer parameters. We also introduce semi-supervised methods to improve our system using new unlabeled audio data, and we show how pseudo-label generation can impact an AAC model. Finally, we study AAC systems in languages other than English: French, Spanish, and German. In addition, we propose a system capable of producing all four languages at once, and we compare it with systems specialized in each language.
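As a brief illustration of the metrics discussed in the abstract above: SPIDEr is commonly computed as the mean of the CIDEr and SPICE scores, and SPIDEr-max, the variant proposed in the thesis, keeps the best SPIDEr score over several candidate captions generated for one audio file. The sketch below assumes the per-candidate CIDEr and SPICE scores have already been computed by an external captioning-metrics toolkit; the function names are illustrative and not taken from the thesis code.

```python
# Minimal sketch of SPIDEr and the SPIDEr-max variant, assuming
# per-candidate CIDEr and SPICE scores are already available.

def spider(cider: float, spice: float) -> float:
    """SPIDEr: the mean of the CIDEr and SPICE scores."""
    return (cider + spice) / 2.0

def spider_max(candidate_scores: list[tuple[float, float]]) -> float:
    """SPIDEr-max: the best SPIDEr score over all candidate captions
    generated for a single audio file (e.g. via beam search)."""
    return max(spider(c, s) for c, s in candidate_scores)

# Example: two candidate captions for one audio clip
print(spider_max([(0.5, 0.25), (0.25, 0.25)]))  # 0.375
```

Taking the maximum over candidates, rather than scoring only the top beam, is what reveals the metric's sensitivity to individual word choices: near-equivalent candidates can receive very different SPIDEr scores.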
Book chapters on the topic "Automated audio captioning"
M., Nivedita, Asnath Victy Phamila Y., Umashankar Kumaravelan, and Karthikeyan N. "Voice-Based Image Captioning System for Assisting Visually Impaired People Using Neural Networks." In Principles and Applications of Socio-Cognitive and Affective Computing, 177–99. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-6684-3843-5.ch011.
Venturini, Shamira, Michaela Mae Vann, Martina Pucci, and Giulia M. L. Bencini. "Towards a More Inclusive Learning Environment: The Importance of Providing Captions That Are Suited to Learners' Language Proficiency in the UDL Classroom." In Studies in Health Technology and Informatics. IOS Press, 2022. http://dx.doi.org/10.3233/shti220884.
Full textConference papers on the topic "Automated audio captioning"
Kim, Minkyu, Kim Sung-Bin, and Tae-Hyun Oh. "Prefix Tuning for Automated Audio Captioning." In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023. http://dx.doi.org/10.1109/icassp49357.2023.10096877.
Drossos, Konstantinos, Sharath Adavanne, and Tuomas Virtanen. "Automated audio captioning with recurrent neural networks." In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2017. http://dx.doi.org/10.1109/waspaa.2017.8170058.
Chen, Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, and Eng Siong Chng. "Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning." In Interspeech 2022. ISCA, 2022. http://dx.doi.org/10.21437/interspeech.2022-10510.
Kim, Jaeyeon, Jaeyoon Jung, Jinjoo Lee, and Sang Hoon Woo. "EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning." In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024. http://dx.doi.org/10.1109/icassp48485.2024.10446672.
Ye, Zhongjie, Yuqing Wang, Helin Wang, Dongchao Yang, and Yuexian Zou. "FeatureCut: An Adaptive Data Augmentation for Automated Audio Captioning." In 2022 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2022. http://dx.doi.org/10.23919/apsipaasc55919.2022.9980325.
Koh, Andrew, Soham Tiwari, and Chng Eng Siong. "Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning." In 2022 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2022. http://dx.doi.org/10.23919/apsipaasc55919.2022.9980242.
Wijngaard, Gijs, Elia Formisano, Bruno L. Giordano, and Michel Dumontier. "ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds." In 2023 31st European Signal Processing Conference (EUSIPCO). IEEE, 2023. http://dx.doi.org/10.23919/eusipco58844.2023.10289793.
Jain, Arushi, Navaneeth B. R, Shelly Mohanty, R. Sujatha, Sujatha R, Sourabh Tiwari, and Rashmi T. Shankarappa. "Web Framework for Enhancing Automated Audio Captioning Performance for Domestic Environment." In 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2022. http://dx.doi.org/10.1109/icccnt54827.2022.9984255.
Sun, Jianyuan, Xubo Liu, Xinhao Mei, Volkan Kılıç, Mark D. Plumbley, and Wenwu Wang. "Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning." In INTERSPEECH 2023. ISCA, 2023. http://dx.doi.org/10.21437/interspeech.2023-943.
Xu, Xuenan, Heinrich Dinkel, Mengyue Wu, Zeyu Xie, and Kai Yu. "Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9413982.