Academic literature on the topic "Speech diarization"
Consult the topical lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Speech diarization".
Journal articles on the topic "Speech diarization"
Mertens, Robert, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, Ajay Divakaran, and Mark Hasegawa-Johnson. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks". International Journal of Multimedia Data Engineering and Management 3, no. 3 (July 2012): 1–19. http://dx.doi.org/10.4018/jmdem.2012070101.
Astapov, Sergei, Aleksei Gusev, Marina Volkova, Aleksei Logunov, Valeriia Zaluskaia, Vlada Kapranova, Elena Timofeeva, Elena Evseeva, Vladimir Kabarov, and Yuri Matveev. "Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization". Mathematics 9, no. 23 (November 23, 2021): 2998. http://dx.doi.org/10.3390/math9232998.
Lyu, Ke-Ming, Ren-yuan Lyu, and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation". PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.
Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIARIZATION SYSTEMS: A MATHEMATICAL FORMULATION". Applied Mathematics and Sciences An International Journal (MathSJ) 10, no. 1/2 (June 26, 2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.
V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization". Intelligent Decision Technologies 15, no. 4 (January 10, 2022): 561–77. http://dx.doi.org/10.3233/idt-211005.
Murali, Abhejay, Satwik Dutta, Meena Chandra Shekar, Dwight Irvin, Jay Buzhardt, and John H. Hansen. "Towards developing speaker diarization for parent-child interactions". Journal of the Acoustical Society of America 152, no. 4 (October 2022): A61. http://dx.doi.org/10.1121/10.0015551.
Taha, Thaer Mufeed, Zaineb Ben Messaoud, and Mondher Frikha. "Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization". International Journal of Interactive Mobile Technologies (iJIM) 18, no. 03 (February 9, 2024): 88–103. http://dx.doi.org/10.3991/ijim.v18i03.43013.
Kothalkar, Prasanna V., John H. L. Hansen, Dwight Irvin, and Jay Buzhardt. "Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation". Journal of the Acoustical Society of America 155, no. 2 (February 1, 2024): 1198–215. http://dx.doi.org/10.1121/10.0024353.
Sarmah, Kshirod. "Speaker Diarization with Deep Learning Techniques". Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (December 15, 2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.
Lleida, Eduardo, Alfonso Ortega, Antonio Miguel, Virginia Bazán-Gil, Carmen Pérez, Manuel Gómez, and Alberto de Prada. "Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media". Applied Sciences 9, no. 24 (December 11, 2019): 5412. http://dx.doi.org/10.3390/app9245412.
Theses on the topic "Speech diarization"
Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization". Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.
Otterson, Scott. "Use of speaker location features in meeting diarization". Thesis, University of Washington, 2008. http://hdl.handle.net/1773/15463.
Peso, Pablo. "Spatial features of reverberant speech: estimation and application to recognition and diarization". Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/45664.
Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation". Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.
Ishizuka, Kentaro. "Studies on Acoustic Features for Automatic Speech Recognition and Speaker Diarization in Real Environments". 京都大学 (Kyoto University), 2009. http://hdl.handle.net/2433/123834.
Yin, Ruiqing. "Steps towards end-to-end neural speaker diarization". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS261/document.
Texto completoSpeaker diarization is the task of determining "who speaks when" in an audio stream that usually contains an unknown amount of speech from an unknown number of speakers. Speaker diarization systems are usually built as the combination of four main stages. First, non-speech regions such as silence, music, and noise are removed by Voice Activity Detection (VAD). Next, speech regions are split into speaker-homogeneous segments by Speaker Change Detection (SCD), later grouped according to the identity of the speaker thanks to unsupervised clustering approaches. Finally, speech turn boundaries and labels are (optionally) refined with a re-segmentation stage. In this thesis, we propose to address these four stages with neural network approaches. We first formulate both the initial segmentation (voice activity detection and speaker change detection) and the final re-segmentation as a set of sequence labeling problems and then address them with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. In the speech turn clustering stage, we propose to use affinity propagation on top of neural speaker embeddings. Experiments on a broadcast TV dataset show that affinity propagation clustering is more suitable than hierarchical agglomerative clustering when applied to neural speaker embeddings. The LSTM-based segmentation and affinity propagation clustering are also combined and jointly optimized to form a speaker diarization pipeline. Compared to the pipeline with independently optimized modules, the new pipeline brings a significant improvement. In addition, we propose to improve the similarity matrix by bidirectional LSTM and then apply spectral clustering on top of the improved similarity matrix. The proposed system achieves state-of-the-art performance in the CALLHOME telephone conversation dataset. Finally, we formulate sequential clustering as a supervised sequence labeling task and address it with stacked RNNs. To better understand its behavior, the analysis is based on a proposed encoder-decoder architecture. Our proposed systems bring a significant improvement compared with traditional clustering methods on toy examples
Cui, Can. "Séparation, diarisation et reconnaissance de la parole conjointes pour la transcription automatique de réunions" [Joint speech separation, diarization, and recognition for automatic meeting transcription]. Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0103.
Far-field microphone-array meeting transcription is particularly challenging due to overlapping speech, ambient noise, and reverberation. To address these issues, we explored three approaches. First, we employ a multichannel speaker separation model to isolate individual speakers, followed by a single-channel, single-speaker automatic speech recognition (ASR) model to transcribe the separated and enhanced audio. This method effectively enhances speech quality for ASR. Second, we propose an end-to-end multichannel speaker-attributed ASR (MC-SA-ASR) model, which builds on an existing single-channel SA-ASR model and incorporates a multichannel Conformer-based encoder with multi-frame cross-channel attention (MFCCA). Unlike traditional approaches that require a multichannel front-end speech enhancement model, the MC-SA-ASR model handles far-field microphones in an end-to-end manner. We also experimented with different input features for that model, including Mel filterbank and phase features. Lastly, we incorporate a multichannel beamforming and enhancement model as a front-end processing step, followed by a single-channel SA-ASR model to process the enhanced multi-speaker speech signals. We tested different fixed, hybrid, and fully neural network-based beamformers and propose to jointly optimize the neural beamformer and SA-ASR models using the training objective of the latter. In addition to these methods, we developed a meeting transcription pipeline that integrates voice activity detection, speaker diarization, and SA-ASR to process real meeting recordings effectively. Experimental results indicate that, while using a speaker separation model can enhance speech quality, separation errors can propagate to ASR, resulting in suboptimal performance. A guided speaker separation approach proves to be more effective. Our proposed MC-SA-ASR model demonstrates efficiency in integrating multichannel information and the shared information between the ASR and speaker blocks. Experiments with different input features reveal that models trained with Mel filterbank features perform better in terms of word error rate (WER) and speaker error rate (SER) when the number of channels and speakers is low (2 channels with 1 or 2 speakers). However, for settings with 3 or 4 channels and 3 speakers, models trained with additional phase information outperform those using only Mel filterbank features. This suggests that phase information can enhance ASR by leveraging localization information from multiple channels. Although MFCCA-based MC-SA-ASR outperforms the single-channel SA-ASR and MC-ASR models without a speaker block, the joint beamforming and SA-ASR model further improves the performance. Specifically, joint training of the neural beamformer and SA-ASR yields the best performance, indicating that improving speech quality might be a more direct and efficient approach than using an end-to-end MC-SA-ASR model for multichannel meeting transcription. Furthermore, the study of the real meeting transcription pipeline underscores the potential for better end-to-end models. In our investigation into improving speaker assignment in SA-ASR, we found that the speaker block does not effectively help improve the ASR performance. This highlights the need for improved architectures that integrate ASR and speaker information more effectively.
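The fixed-beamforming front end mentioned in this abstract can be illustrated with the classic delay-and-sum beamformer, which aligns the channels toward the target speaker and averages them. This is a generic sketch, not the thesis's model: the array size, sample rate, and steering delays below are invented assumptions, and the cited work additionally evaluates hybrid and neural beamformers.

```python
# Toy delay-and-sum beamformer: align each channel, then average coherently.
import numpy as np

def delay_and_sum(x: np.ndarray, delays_s: np.ndarray, fs: int) -> np.ndarray:
    """x: (n_channels, n_samples) signal; delays_s: per-channel steering
    delays in seconds toward the target speaker."""
    n_ch, n = x.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)          # (n//2 + 1,)
    X = np.fft.rfft(x, axis=1)                      # per-channel spectra
    # Compensate a delay of tau seconds with the phase advance exp(+j*2*pi*f*tau).
    phase = np.exp(2j * np.pi * freqs[None, :] * delays_s[:, None])
    aligned = np.fft.irfft(X * phase, n=n, axis=1)  # time-aligned channels
    return aligned.mean(axis=0)                     # coherent average

# Usage with synthetic data: 4 channels at 16 kHz; the steering delays
# here are arbitrary illustrative values, not a real array geometry.
fs = 16000
x = np.random.randn(4, fs)                          # 1 s of noise per channel
delays = np.array([0.0, 1.25e-4, 2.5e-4, 3.75e-4])  # hypothetical delays (s)
print(delay_and_sum(x, delays, fs).shape)           # (16000,)
```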
Mariotte, Théo. "Traitement automatique de la parole en réunion par dissémination de capteurs" [Automatic speech processing in meetings using distributed sensors]. Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.
This thesis focuses on automatic speech processing, and more specifically on speaker diarization. This task requires the signal to be segmented to identify events such as voice activity, overlapped speech, or speaker changes. This work tackles the scenario where the signal is recorded by a device located in the center of a group of speakers, as in meetings. These conditions lead to a degradation in signal quality due to the distance between the speakers and the device (distant speech). To mitigate this degradation, one approach is to record the signal using a microphone array. The resulting multichannel signal provides information on the spatial distribution of the acoustic field. Two lines of research are explored for speech segmentation using microphone arrays. The first introduces a method combining acoustic features with spatial features. We propose a new set of features based on the circular harmonics expansion. This approach improves segmentation performance under distant-speech conditions while reducing the number of model parameters and improving robustness to changes in the array geometry. The second proposes several approaches that combine channels using self-attention. Different models, inspired by an existing architecture, are developed. Combining channels also improves segmentation under distant-speech conditions, and two of these approaches make feature extraction more interpretable. The proposed distant-speech segmentation systems also improve speaker diarization. Channel combination shows poor robustness to changes in the array geometry at inference time; to avoid this behavior, a learning procedure is proposed that improves robustness in case of array mismatch. Finally, we identified a gap in the public datasets available for distant multichannel automatic speech processing. An acquisition protocol is introduced to build a new dataset that includes speaker position annotations in addition to speaker diarization labels. Overall, this work aims to improve the quality of multichannel distant-speech segmentation. The proposed methods exploit the spatial information provided by microphone arrays while improving robustness to array mismatch.
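The circular harmonics expansion mentioned above amounts to a spatial Fourier series over the microphone angles of a circular array. The sketch below conveys the general idea under stated assumptions (uniform circular array, illustrative order and normalization); it is not the thesis's exact feature definition.

```python
# Rough sketch: spatial Fourier (circular-harmonic) coefficients of one
# STFT frame from a uniform circular microphone array.
import numpy as np

def circular_harmonics(spectra: np.ndarray, mic_angles: np.ndarray,
                       max_order: int) -> np.ndarray:
    """spectra: (n_mics, n_freqs) complex STFT values for one time frame.
    mic_angles: (n_mics,) microphone azimuths in radians.
    Returns (2*max_order + 1, n_freqs) coefficients for orders -N..N."""
    orders = np.arange(-max_order, max_order + 1)
    # b_n(f) = (1/M) * sum_m P_m(f) * exp(-i * n * phi_m)
    basis = np.exp(-1j * orders[:, None] * mic_angles[None, :])  # (orders, mics)
    return basis @ spectra / len(mic_angles)

# Usage on synthetic data: 8-mic uniform circular array, 257 frequency bins.
rng = np.random.default_rng(0)
angles = 2 * np.pi * np.arange(8) / 8
frame = rng.normal(size=(8, 257)) + 1j * rng.normal(size=(8, 257))
coeffs = circular_harmonics(frame, angles, max_order=2)
print(coeffs.shape)  # (5, 257); e.g. |coeffs| could feed a segmentation model
```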
Hsu, Wu-Hua (許吳華). "A Preliminary Study on Speaker Diarization for Automatic Transcription of Broadcast Radio Speech". Thesis, National Taipei University of Technology (國立臺北科技大學), Department of Electronic Engineering, 2018. http://ndltd.ncl.edu.tw/handle/a3z9vr.
We use a Time-Delay Neural Network (TDNN) for speaker diarization. The average DER is 27.74%, which improves on the 31.08% of a GMM baseline. We then use the trained speaker diarization system to label the unannotated speakers in the NER-210 corpus and retrain the ASR system on the resulting speaker-timeline annotations. The experimental results show that the ASR system retrained with this speaker information reduces the CER from 20.01% to 19.13%. In addition, the average CER of the baseline LSTM model in the automatic speech recognition system is 17.2%; the cascaded CNN-TDNN-LSTM model reduces the average CER to 13.12%. Finally, applying confidence-measure-based data selection and adding more word sequences to the language model further reduces the average CER to 9.2%.
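This abstract reports diarization quality as DER and recognition quality as CER. As a reminder of what DER measures, below is the standard definition in code form; the durations in the example are invented for illustration and are not taken from the thesis.

```python
# DER = (false alarm + missed speech + speaker confusion) / total reference speech.
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """All arguments are durations in seconds; returns a fraction."""
    return (false_alarm + missed + confusion) / total_speech

# Example with made-up durations: 30 s false alarm, 50 s missed speech,
# and 60 s confused speaker labels over 500 s of reference speech give
# 28.00% DER, comparable in magnitude to the 27.74% reported above.
print(f"{diarization_error_rate(30.0, 50.0, 60.0, 500.0):.2%}")  # 28.00%
```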
Book chapters on the topic "Speech diarization"
Avdeeva, Anastasia, and Sergey Novoselov. "Deep Speaker Embeddings Based Online Diarization". In Speech and Computer, 24–32. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_3.
Zajíc, Zbyněk, Josef V. Psutka, and Luděk Müller. "Diarization Based on Identification with X-Vectors". In Speech and Computer, 667–78. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_64.
Edwards, Erik, Michael Brenndoerfer, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Nico Axtmann, Mark Miller, and David Suendermann-Oeft. "A Free Synthetic Corpus for Speaker Diarization Research". In Speech and Computer, 113–22. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_13.
Zajíc, Zbyněk, Josef V. Psutka, Lucie Zajícová, Luděk Müller, and Petr Salajka. "Diarization of the Language Consulting Center Telephone Calls". In Speech and Computer, 549–58. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_56.
Edwards, Erik, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Michael Brenndoerfer, Nico Axtmann, Mark Miller, and David Suendermann-Oeft. "Speaker Diarization: A Top-Down Approach Using Syllabic Phonology". In Speech and Computer, 123–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_14.
Kudashev, Oleg, and Alexander Kozlov. "The Diarization System for an Unknown Number of Speakers". In Speech and Computer, 340–44. Cham: Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-01931-4_45.
Nguyen, Trung Hieu, Eng Siong Chng, and Haizhou Li. "Speaker Diarization: An Emerging Research". In Speech and Audio Processing for Coding, Enhancement and Recognition, 229–77. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_8.
Kynych, Frantisek, Jindrich Zdansky, Petr Cerva, and Lukas Mateju. "Online Speaker Diarization Using Optimized SE-ResNet Architecture". In Text, Speech, and Dialogue, 176–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-40498-6_16.
Zajíc, Zbyněk, Jan Zelinka, and Luděk Müller. "Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech". In Speech and Computer, 555–63. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-66429-3_55.
Kunešová, Marie, Marek Hrúz, Zbyněk Zajíc, and Vlasta Radová. "Detection of Overlapping Speech for the Purposes of Speaker Diarization". In Speech and Computer, 247–57. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_26.
Texto completoActas de conferencias sobre el tema "Speech diarization"
Von Neumann, Thilo, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, and Reinhold Haeb-Umbach. "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization". In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 775–79. IEEE, 2024. http://dx.doi.org/10.1109/icasspw62465.2024.10625894.
Lamel, Lori, Jean-Luc Gauvain, and Leonardo Canseco-Rodriguez. "Speaker diarization from speech transcripts". In Interspeech 2004. ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-250.
Bounazou, Hadjer, Nassim Asbai, and Sihem Zitouni. "Speaker Diarization in Overlapped Speech". In 2022 19th International Multi-Conference on Systems, Signals & Devices (SSD). IEEE, 2022. http://dx.doi.org/10.1109/ssd54932.2022.9955684.
Jiang, Yidi, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, and Haizhou Li. "Prompt-Driven Target Speech Diarization". In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024. http://dx.doi.org/10.1109/icassp48485.2024.10446072.
Xie, Jiamin, Leibny Paola García-Perera, Daniel Povey, and Sanjeev Khudanpur. "Multi-PLDA Diarization on Children’s Speech". In Interspeech 2019. ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-2961.
Gebre, Binyam Gebrekidan, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, and Tom Heskes. "Speaker diarization using gesture and speech". In Interspeech 2014. ISCA, 2014. http://dx.doi.org/10.21437/interspeech.2014-141.
Lupu, Eugen, Anca Apatean, and Radu Arsinte. "Speaker diarization experiments for Romanian parliamentary speech". In 2015 International Symposium on Signals, Circuits and Systems (ISSCS). IEEE, 2015. http://dx.doi.org/10.1109/isscs.2015.7204023.
Lyu, Dau-Cheng, Eng-Siong Chng, and Haizhou Li. "Language diarization for code-switch conversational speech". In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6639083.
Imseng, David, and Gerald Friedland. "Robust Speaker Diarization for short speech recordings". In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2009. http://dx.doi.org/10.1109/asru.2009.5373254.
Wang, Yingzhi, Mirco Ravanelli, and Alya Yacoubi. "Speech Emotion Diarization: Which Emotion Appears When?" In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389790.
Reports on the topic "Speech diarization"
Hansen, John H. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. Fort Belvoir, VA: Defense Technical Information Center, October 2015. http://dx.doi.org/10.21236/ada623029.