Ready-made bibliography on the topic "Speech diarization"
Create correct references in APA, MLA, Chicago, Harvard, and many other styles
See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Speech diarization".
An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever such details are available in the work's metadata.
Journal articles on the topic "Speech diarization"
Mertens, Robert, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, Ajay Divakaran and Mark Hasegawa-Johnson. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks". International Journal of Multimedia Data Engineering and Management 3, no. 3 (July 2012): 1–19. http://dx.doi.org/10.4018/jmdem.2012070101.
Astapov, Sergei, Aleksei Gusev, Marina Volkova, Aleksei Logunov, Valeriia Zaluskaia, Vlada Kapranova, Elena Timofeeva, Elena Evseeva, Vladimir Kabarov and Yuri Matveev. "Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization". Mathematics 9, no. 23 (November 23, 2021): 2998. http://dx.doi.org/10.3390/math9232998.
Lyu, Ke-Ming, Ren-yuan Lyu and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation". PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.
Prabhala, Jagat Chaitanya, Venkatnareshbabu K and Ragoju Ravi. "OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIARIZATION SYSTEMS: A MATHEMATICAL FORMULATION". Applied Mathematics and Sciences: An International Journal (MathSJ) 10, no. 1/2 (June 26, 2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.
V, Sethuram, Ande Prasad and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization". Intelligent Decision Technologies 15, no. 4 (January 10, 2022): 561–77. http://dx.doi.org/10.3233/idt-211005.
Murali, Abhejay, Satwik Dutta, Meena Chandra Shekar, Dwight Irvin, Jay Buzhardt and John H. Hansen. "Towards developing speaker diarization for parent-child interactions". Journal of the Acoustical Society of America 152, no. 4 (October 2022): A61. http://dx.doi.org/10.1121/10.0015551.
Taha, Thaer Mufeed, Zaineb Ben Messaoud and Mondher Frikha. "Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization". International Journal of Interactive Mobile Technologies (iJIM) 18, no. 03 (February 9, 2024): 88–103. http://dx.doi.org/10.3991/ijim.v18i03.43013.
Kothalkar, Prasanna V., John H. L. Hansen, Dwight Irvin and Jay Buzhardt. "Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation". Journal of the Acoustical Society of America 155, no. 2 (February 1, 2024): 1198–215. http://dx.doi.org/10.1121/10.0024353.
Kshirod, Kshirod Sarmah. "Speaker Diarization with Deep Learning Techniques". Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (December 15, 2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.
Lleida, Eduardo, Alfonso Ortega, Antonio Miguel, Virginia Bazán-Gil, Carmen Pérez, Manuel Gómez and Alberto de Prada. "Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media". Applied Sciences 9, no. 24 (December 11, 2019): 5412. http://dx.doi.org/10.3390/app9245412.
Pełny tekst źródłaRozprawy doktorskie na temat "Speech diarization"
Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization". Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.
Otterson, Scott. "Use of speaker location features in meeting diarization". Thesis, Connect to this title online; UW restricted, 2008. http://hdl.handle.net/1773/15463.
Peso, Pablo. "Spatial features of reverberant speech: estimation and application to recognition and diarization". Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/45664.
Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation". Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.
Ishizuka, Kentaro. "Studies on Acoustic Features for Automatic Speech Recognition and Speaker Diarization in Real Environments". 京都大学 (Kyoto University), 2009. http://hdl.handle.net/2433/123834.
Yin, Ruiqing. "Steps towards end-to-end neural speaker diarization". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS261/document.
Pełny tekst źródłaSpeaker diarization is the task of determining "who speaks when" in an audio stream that usually contains an unknown amount of speech from an unknown number of speakers. Speaker diarization systems are usually built as the combination of four main stages. First, non-speech regions such as silence, music, and noise are removed by Voice Activity Detection (VAD). Next, speech regions are split into speaker-homogeneous segments by Speaker Change Detection (SCD), later grouped according to the identity of the speaker thanks to unsupervised clustering approaches. Finally, speech turn boundaries and labels are (optionally) refined with a re-segmentation stage. In this thesis, we propose to address these four stages with neural network approaches. We first formulate both the initial segmentation (voice activity detection and speaker change detection) and the final re-segmentation as a set of sequence labeling problems and then address them with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. In the speech turn clustering stage, we propose to use affinity propagation on top of neural speaker embeddings. Experiments on a broadcast TV dataset show that affinity propagation clustering is more suitable than hierarchical agglomerative clustering when applied to neural speaker embeddings. The LSTM-based segmentation and affinity propagation clustering are also combined and jointly optimized to form a speaker diarization pipeline. Compared to the pipeline with independently optimized modules, the new pipeline brings a significant improvement. In addition, we propose to improve the similarity matrix by bidirectional LSTM and then apply spectral clustering on top of the improved similarity matrix. The proposed system achieves state-of-the-art performance in the CALLHOME telephone conversation dataset. Finally, we formulate sequential clustering as a supervised sequence labeling task and address it with stacked RNNs. To better understand its behavior, the analysis is based on a proposed encoder-decoder architecture. Our proposed systems bring a significant improvement compared with traditional clustering methods on toy examples
Cui, Can. "Séparation, diarisation et reconnaissance de la parole conjointes pour la transcription automatique de réunions". Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0103.
Far-field microphone-array meeting transcription is particularly challenging due to overlapping speech, ambient noise, and reverberation. To address these issues, we explore three approaches. First, we employ a multichannel speaker separation model to isolate individual speakers, followed by a single-channel, single-speaker automatic speech recognition (ASR) model to transcribe the separated and enhanced audio. This method effectively enhances speech quality for ASR. Second, we propose an end-to-end multichannel speaker-attributed ASR (MC-SA-ASR) model, which builds on an existing single-channel SA-ASR model and incorporates a multichannel Conformer-based encoder with multi-frame cross-channel attention (MFCCA). Unlike traditional approaches that require a multichannel front-end speech enhancement model, the MC-SA-ASR model handles far-field microphones in an end-to-end manner. We also experiment with different input features for this model, including Mel filterbank and phase features. Lastly, we incorporate a multichannel beamforming and enhancement model as a front-end processing step, followed by a single-channel SA-ASR model to process the enhanced multi-speaker speech signals. We test different fixed, hybrid, and fully neural network-based beamformers and propose to jointly optimize the neural beamformer and SA-ASR models using the training objective of the latter. In addition to these methods, we develop a meeting transcription pipeline that integrates voice activity detection, speaker diarization, and SA-ASR to process real meeting recordings effectively. Experimental results indicate that, while using a speaker separation model can enhance speech quality, separation errors can propagate to ASR, resulting in suboptimal performance. A guided speaker separation approach proves to be more effective. Our proposed MC-SA-ASR model demonstrates efficiency in integrating multichannel information and the information shared between the ASR and speaker blocks. Experiments with different input features reveal that models trained with Mel filterbank features perform better in terms of word error rate (WER) and speaker error rate (SER) when the number of channels and speakers is low (2 channels with 1 or 2 speakers). However, for settings with 3 or 4 channels and 3 speakers, models trained with additional phase information outperform those using only Mel filterbank features. This suggests that phase information can enhance ASR by leveraging localization information from multiple channels. Although the MFCCA-based MC-SA-ASR model outperforms the single-channel SA-ASR and MC-ASR models without a speaker block, the joint beamforming and SA-ASR model further improves performance. Specifically, joint training of the neural beamformer and SA-ASR yields the best performance, indicating that improving speech quality might be a more direct and efficient approach than an end-to-end MC-SA-ASR model for multichannel meeting transcription. Furthermore, the study of the real meeting transcription pipeline underscores the potential for better end-to-end models. In our investigation on improving speaker assignment in SA-ASR, we found that the speaker block does not effectively help improve ASR performance. This highlights the need for improved architectures that more effectively integrate ASR and speaker information.
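To illustrate the beamforming front-end idea, here is a minimal sketch, not the thesis's actual models, of a fixed delay-and-sum beamformer: the channels are aligned and averaged into one enhanced signal that a single-channel SA-ASR model could then transcribe. The integer sample delays are assumed known; a real system would estimate them from time differences of arrival or learn the filters, as the neural beamformers discussed above do.

```python
import numpy as np

def delay_and_sum(signals: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Align each channel by its (assumed known) integer sample delay
    and average the channels into a single enhanced signal.

    signals: (n_channels, n_samples) multichannel recording
    delays:  (n_channels,) integer delays in samples
    """
    n_channels, n_samples = signals.shape
    out = np.zeros(n_samples)
    for ch in range(n_channels):
        out += np.roll(signals[ch], -int(delays[ch]))
    return out / n_channels

# Toy usage: a target signal reaching 4 microphones with different
# delays and independent noise on each channel.
rng = np.random.default_rng(0)
target = rng.normal(size=16000)
delays = np.array([0, 3, 5, 8])
mics = np.stack([np.roll(target, d) + 0.5 * rng.normal(size=16000)
                 for d in delays])
enhanced = delay_and_sum(mics, delays)
print("Residual noise power:", round(float(np.var(enhanced - target)), 4))
```

Averaging four aligned channels reduces the uncorrelated noise power by roughly a factor of four, which is the basic enhancement effect the front-end exploits before ASR.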
Mariotte, Théo. "Traitement automatique de la parole en réunion par dissémination de capteurs". Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.
This thesis work focuses on automatic speech processing, and more specifically on speaker diarization. This task requires the signal to be segmented to identify events such as voice activity, overlapped speech, or speaker changes. This work tackles the scenario where the signal is recorded by a device located in the center of a group of speakers, as in meetings. These conditions lead to a degradation in signal quality due to the distance between the speakers and the device (distant speech). To mitigate this degradation, one approach is to record the signal with a microphone array. The resulting multichannel signal provides information on the spatial distribution of the acoustic field. Two lines of research are explored for speech segmentation using microphone arrays. The first introduces a method combining acoustic features with spatial features. We propose a new set of features based on the circular harmonics expansion. This approach improves segmentation performance under distant-speech conditions while reducing the number of model parameters and improving robustness to changes in the array geometry. The second proposes several approaches that combine channels using self-attention. Different models, inspired by an existing architecture, are developed. Combining channels also improves segmentation under distant-speech conditions. Two of these approaches make feature extraction more interpretable. The proposed distant-speech segmentation systems also improve speaker diarization. Channel combination shows poor robustness to changes in the array geometry during inference. To avoid this behavior, a learning procedure is proposed that improves robustness under array mismatch. Finally, we identified a gap in the public datasets available for distant multichannel automatic speech processing. An acquisition protocol is introduced to build a new dataset, integrating speaker position annotations in addition to speaker diarization. Thus, this work aims to improve the quality of multichannel distant-speech segmentation. The proposed methods exploit the spatial information provided by microphone arrays while improving robustness under array mismatch.
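As a rough illustration of the second line of research, the sketch below, a hypothetical PyTorch module rather than the architecture from the thesis, combines the channels of a multichannel feature tensor with self-attention: at each time frame the channel vectors attend to one another and are then averaged into a single-channel representation for segmentation.

```python
import torch
import torch.nn as nn

class ChannelAttentionCombiner(nn.Module):
    """Combine microphone-array channels with self-attention (sketch)."""

    def __init__(self, feat_dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, feat_dim)
        b, t, c, f = x.shape
        x = x.reshape(b * t, c, f)          # attend across channels per frame
        attended, _ = self.attn(x, x, x)    # (b * t, channels, feat_dim)
        combined = attended.mean(dim=1)     # collapse channels to one stream
        return combined.reshape(b, t, f)

# Toy usage: 4-channel log-Mel features, 100 frames, 64 Mel bins.
feats = torch.randn(2, 100, 4, 64)
combiner = ChannelAttentionCombiner(feat_dim=64)
print(combiner(feats).shape)  # torch.Size([2, 100, 64])
```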
Hsu, Wu-Hua, and 許吳華. "A Preliminary Study on Speaker Diarization for Automatic Transcription of Broadcast Radio Speech". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/a3z9vr.
National Taipei University of Technology (國立臺北科技大學)
Department of Electronic Engineering (電子工程系)
Academic year 106 (ROC calendar)
We use a time-delay neural network (TDNN) for speaker diarization. The average diarization error rate (DER) is 27.74%, better than the 31.08% obtained with a GMM-based system. We then use the trained speaker diarization system to label the unmarked speakers in the NER-210 corpus and retrain the automatic speech recognition (ASR) system on the resulting speaker-labeled timeline. Experimental results show that the ASR system retrained with this speaker information reduces the character error rate (CER) from 20.01% to 19.13%. In addition, the basic LSTM model for the ASR system yields an average CER of 17.2%, which the cascaded CNN-TDNN-LSTM model reduces to 13.12%. Using confidence-measure-based data selection and adding more word sequences to the language model, the average CER is further reduced to 9.2%.
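For reference, the CER figures quoted in this abstract are computed as the Levenshtein edit distance between reference and hypothesis character sequences, divided by the reference length; a minimal self-contained sketch:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Standard dynamic-programming edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Two character substitutions over 19 reference characters: CER ~ 0.105.
print(f"{cer('speaker diarization', 'speeker diarisation'):.3f}")
```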
Book chapters on the topic "Speech diarization"
Avdeeva, Anastasia, and Sergey Novoselov. "Deep Speaker Embeddings Based Online Diarization". In Speech and Computer, 24–32. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_3.
Zajíc, Zbyněk, Josef V. Psutka and Luděk Müller. "Diarization Based on Identification with X-Vectors". In Speech and Computer, 667–78. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_64.
Edwards, Erik, Michael Brenndoerfer, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Nico Axtmann, Mark Miller and David Suendermann-Oeft. "A Free Synthetic Corpus for Speaker Diarization Research". In Speech and Computer, 113–22. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_13.
Zajíc, Zbyněk, Josef V. Psutka, Lucie Zajícová, Luděk Müller and Petr Salajka. "Diarization of the Language Consulting Center Telephone Calls". In Speech and Computer, 549–58. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_56.
Edwards, Erik, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Michael Brenndoerfer, Nico Axtmann, Mark Miller and David Suendermann-Oeft. "Speaker Diarization: A Top-Down Approach Using Syllabic Phonology". In Speech and Computer, 123–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_14.
Kudashev, Oleg, and Alexander Kozlov. "The Diarization System for an Unknown Number of Speakers". In Speech and Computer, 340–44. Cham: Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-01931-4_45.
Nguyen, Trung Hieu, Eng Siong Chng and Haizhou Li. "Speaker Diarization: An Emerging Research". In Speech and Audio Processing for Coding, Enhancement and Recognition, 229–77. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_8.
Kynych, Frantisek, Jindrich Zdansky, Petr Cerva and Lukas Mateju. "Online Speaker Diarization Using Optimized SE-ResNet Architecture". In Text, Speech, and Dialogue, 176–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-40498-6_16.
Zajíc, Zbyněk, Jan Zelinka and Luděk Müller. "Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech". In Speech and Computer, 555–63. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-66429-3_55.
Kunešová, Marie, Marek Hrúz, Zbyněk Zajíc and Vlasta Radová. "Detection of Overlapping Speech for the Purposes of Speaker Diarization". In Speech and Computer, 247–57. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_26.
Pełny tekst źródłaStreszczenia konferencji na temat "Speech diarization"
Von Neumann, Thilo, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix and Reinhold Haeb-Umbach. "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization". In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 775–79. IEEE, 2024. http://dx.doi.org/10.1109/icasspw62465.2024.10625894.
Lamel, Lori, Jean-Luc Gauvain and Leonardo Canseco-Rodriguez. "Speaker diarization from speech transcripts". In Interspeech 2004. ISCA: ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-250.
Bounazou, Hadjer, Nassim Asbai and Sihem Zitouni. "Speaker Diarization in Overlapped Speech". In 2022 19th International Multi-Conference on Systems, Signals & Devices (SSD). IEEE, 2022. http://dx.doi.org/10.1109/ssd54932.2022.9955684.
Jiang, Yidi, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian and Haizhou Li. "Prompt-Driven Target Speech Diarization". In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024. http://dx.doi.org/10.1109/icassp48485.2024.10446072.
Xie, Jiamin, Leibny Paola García-Perera, Daniel Povey and Sanjeev Khudanpur. "Multi-PLDA Diarization on Children’s Speech". In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-2961.
Gebre, Binyam Gebrekidan, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts and Tom Heskes. "Speaker diarization using gesture and speech". In Interspeech 2014. ISCA: ISCA, 2014. http://dx.doi.org/10.21437/interspeech.2014-141.
Lupu, Eugen, Anca Apatean and Radu Arsinte. "Speaker diarization experiments for Romanian parliamentary speech". In 2015 International Symposium on Signals, Circuits and Systems (ISSCS). IEEE, 2015. http://dx.doi.org/10.1109/isscs.2015.7204023.
Lyu, Dau-Cheng, Eng-Siong Chng and Haizhou Li. "Language diarization for code-switch conversational speech". In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6639083.
Imseng, David, and Gerald Friedland. "Robust Speaker Diarization for short speech recordings". In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2009. http://dx.doi.org/10.1109/asru.2009.5373254.
Wang, Yingzhi, Mirco Ravanelli and Alya Yacoubi. "Speech Emotion Diarization: Which Emotion Appears When?" In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389790.
Pełny tekst źródłaRaporty organizacyjne na temat "Speech diarization"
Hansen, John H. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. Fort Belvoir, VA: Defense Technical Information Center, October 2015. http://dx.doi.org/10.21236/ada623029.