Table of contents
A selection of scientific literature on the topic "Speech diarization"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the lists of current articles, books, dissertations, reports, and other scientific sources on the topic "Speech diarization".
Next to every work in the list of references there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scientific publication as a PDF and read an online annotation of the work, if the relevant parameters are available in its metadata.
Journal articles on the topic "Speech diarization"
Mertens, Robert, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, Ajay Divakaran, and Mark Hasegawa-Johnson. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks". International Journal of Multimedia Data Engineering and Management 3, no. 3 (July 2012): 1–19. http://dx.doi.org/10.4018/jmdem.2012070101.
Astapov, Sergei, Aleksei Gusev, Marina Volkova, Aleksei Logunov, Valeriia Zaluskaia, Vlada Kapranova, Elena Timofeeva, Elena Evseeva, Vladimir Kabarov, and Yuri Matveev. "Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization". Mathematics 9, no. 23 (November 23, 2021): 2998. http://dx.doi.org/10.3390/math9232998.
Lyu, Ke-Ming, Ren-yuan Lyu, and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation". PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.
Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIARIZATION SYSTEMS: A MATHEMATICAL FORMULATION". Applied Mathematics and Sciences: An International Journal (MathSJ) 10, no. 1/2 (June 26, 2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.
V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization". Intelligent Decision Technologies 15, no. 4 (January 10, 2022): 561–77. http://dx.doi.org/10.3233/idt-211005.
Murali, Abhejay, Satwik Dutta, Meena Chandra Shekar, Dwight Irvin, Jay Buzhardt, and John H. Hansen. "Towards developing speaker diarization for parent-child interactions". Journal of the Acoustical Society of America 152, no. 4 (October 2022): A61. http://dx.doi.org/10.1121/10.0015551.
Taha, Thaer Mufeed, Zaineb Ben Messaoud, and Mondher Frikha. "Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization". International Journal of Interactive Mobile Technologies (iJIM) 18, no. 03 (February 9, 2024): 88–103. http://dx.doi.org/10.3991/ijim.v18i03.43013.
Kothalkar, Prasanna V., John H. L. Hansen, Dwight Irvin, and Jay Buzhardt. "Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation". Journal of the Acoustical Society of America 155, no. 2 (February 1, 2024): 1198–215. http://dx.doi.org/10.1121/10.0024353.
Sarmah, Kshirod. "Speaker Diarization with Deep Learning Techniques". Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (December 15, 2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.
Lleida, Eduardo, Alfonso Ortega, Antonio Miguel, Virginia Bazán-Gil, Carmen Pérez, Manuel Gómez, and Alberto de Prada. "Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media". Applied Sciences 9, no. 24 (December 11, 2019): 5412. http://dx.doi.org/10.3390/app9245412.
Dissertations on the topic "Speech diarization"
Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization". Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.
Otterson, Scott. "Use of speaker location features in meeting diarization". Thesis, University of Washington, 2008. http://hdl.handle.net/1773/15463.
Peso, Pablo. "Spatial features of reverberant speech: estimation and application to recognition and diarization". Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/45664.
Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation". Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.
Ishizuka, Kentaro. "Studies on Acoustic Features for Automatic Speech Recognition and Speaker Diarization in Real Environments". 京都大学 (Kyoto University), 2009. http://hdl.handle.net/2433/123834.
Yin, Ruiqing. "Steps towards end-to-end neural speaker diarization". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS261/document.
Der volle Inhalt der QuelleSpeaker diarization is the task of determining "who speaks when" in an audio stream that usually contains an unknown amount of speech from an unknown number of speakers. Speaker diarization systems are usually built as the combination of four main stages. First, non-speech regions such as silence, music, and noise are removed by Voice Activity Detection (VAD). Next, speech regions are split into speaker-homogeneous segments by Speaker Change Detection (SCD), later grouped according to the identity of the speaker thanks to unsupervised clustering approaches. Finally, speech turn boundaries and labels are (optionally) refined with a re-segmentation stage. In this thesis, we propose to address these four stages with neural network approaches. We first formulate both the initial segmentation (voice activity detection and speaker change detection) and the final re-segmentation as a set of sequence labeling problems and then address them with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. In the speech turn clustering stage, we propose to use affinity propagation on top of neural speaker embeddings. Experiments on a broadcast TV dataset show that affinity propagation clustering is more suitable than hierarchical agglomerative clustering when applied to neural speaker embeddings. The LSTM-based segmentation and affinity propagation clustering are also combined and jointly optimized to form a speaker diarization pipeline. Compared to the pipeline with independently optimized modules, the new pipeline brings a significant improvement. In addition, we propose to improve the similarity matrix by bidirectional LSTM and then apply spectral clustering on top of the improved similarity matrix. The proposed system achieves state-of-the-art performance in the CALLHOME telephone conversation dataset. Finally, we formulate sequential clustering as a supervised sequence labeling task and address it with stacked RNNs. To better understand its behavior, the analysis is based on a proposed encoder-decoder architecture. Our proposed systems bring a significant improvement compared with traditional clustering methods on toy examples
Cui, Can. "Séparation, diarisation et reconnaissance de la parole conjointes pour la transcription automatique de réunions". Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0103.
Far-field microphone-array meeting transcription is particularly challenging due to overlapping speech, ambient noise, and reverberation. To address these issues, we explore three approaches. First, we employ a multichannel speaker separation model to isolate individual speakers, followed by a single-channel, single-speaker automatic speech recognition (ASR) model to transcribe the separated and enhanced audio. This method effectively enhances speech quality for ASR. Second, we propose an end-to-end multichannel speaker-attributed ASR (MC-SA-ASR) model, which builds on an existing single-channel SA-ASR model and incorporates a multichannel Conformer-based encoder with multi-frame cross-channel attention (MFCCA). Unlike traditional approaches that require a multichannel front-end speech enhancement model, the MC-SA-ASR model handles far-field microphones in an end-to-end manner. We also experiment with different input features for that model, including Mel filterbank and phase features. Lastly, we incorporate a multichannel beamforming and enhancement model as a front-end processing step, followed by a single-channel SA-ASR model to process the enhanced multi-speaker speech signals. We test different fixed, hybrid, and fully neural network-based beamformers and propose to jointly optimize the neural beamformer and SA-ASR models using the training objective of the latter. In addition to these methods, we develop a meeting transcription pipeline that integrates voice activity detection, speaker diarization, and SA-ASR to process real meeting recordings effectively. Experimental results indicate that, while using a speaker separation model can enhance speech quality, separation errors can propagate to ASR, resulting in suboptimal performance; a guided speaker separation approach proves more effective. Our proposed MC-SA-ASR model demonstrates efficiency in integrating multichannel information and the information shared between the ASR and speaker blocks. Experiments with different input features reveal that models trained with Mel filterbank features perform better in terms of word error rate (WER) and speaker error rate (SER) when the number of channels and speakers is low (2 channels with 1 or 2 speakers). However, for settings with 3 or 4 channels and 3 speakers, models trained with additional phase information outperform those using only Mel filterbank features, which suggests that phase information can enhance ASR by leveraging localization information from multiple channels. Although the MFCCA-based MC-SA-ASR model outperforms the single-channel SA-ASR and MC-ASR models without a speaker block, the joint beamforming and SA-ASR model further improves performance. Specifically, joint training of the neural beamformer and SA-ASR yields the best performance, indicating that improving speech quality may be a more direct and efficient approach than an end-to-end MC-SA-ASR model for multichannel meeting transcription. Furthermore, the study of the real meeting transcription pipeline underscores the potential for better end-to-end models. In our investigation of improving speaker assignment in SA-ASR, we found that the speaker block does not effectively help improve ASR performance, which highlights the need for improved architectures that integrate ASR and speaker information more effectively.
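As an illustration of the cross-channel attention idea mentioned in this abstract, here is a minimal sketch in PyTorch in which each time frame attends over the microphone channels, so the model can weight channels by their usefulness. The tensor shapes, layer sizes, and mean-pooling over channels are illustrative assumptions, not the MFCCA architecture of the thesis.

import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Each time frame attends over microphone channels (illustrative)."""
    def __init__(self, feat_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, features)
        b, t, c, f = x.shape
        frames = x.reshape(b * t, c, f)        # one attention pass per frame
        fused, _ = self.attn(frames, frames, frames)
        return fused.mean(dim=1).reshape(b, t, f)  # pool channels after attention

# Example: 2 utterances, 100 frames, 4 microphones, 80 Mel filterbank features.
x = torch.randn(2, 100, 4, 80)
print(CrossChannelAttention(80)(x).shape)      # torch.Size([2, 100, 80])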
Mariotte, Théo. "Traitement automatique de la parole en réunion par dissémination de capteurs". Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.
This thesis focuses on automatic speech processing, and more specifically on speaker diarization. This task requires segmenting the signal to identify events such as voice activity, overlapped speech, and speaker changes. This work tackles the scenario where the signal is recorded by a device located in the center of a group of speakers, as in meetings. These conditions degrade signal quality because of the distance between the speakers and the device (distant speech). To mitigate this degradation, one approach is to record the signal with a microphone array. The resulting multichannel signal provides information about the spatial distribution of the acoustic field. Two lines of research are explored for speech segmentation using microphone arrays. The first combines acoustic features with spatial features. We propose a new set of features based on the circular harmonics expansion. This approach improves segmentation performance under distant-speech conditions while reducing the number of model parameters and improving robustness to changes in the array geometry. The second proposes several approaches that combine channels using self-attention. Different models, inspired by an existing architecture, are developed. Combining channels also improves segmentation under distant-speech conditions, and two of these approaches make feature extraction more interpretable. The proposed distant-speech segmentation systems also improve speaker diarization. Channel combination, however, shows poor robustness to changes in the array geometry at inference time. To avoid this behavior, we propose a training procedure that improves robustness to array mismatch. Finally, we identified a gap in the public datasets available for distant multichannel automatic speech processing, and we introduce an acquisition protocol for a new dataset that includes speaker position annotations in addition to speaker diarization labels. Overall, this work aims to improve the quality of multichannel distant-speech segmentation; the proposed methods exploit the spatial information provided by microphone arrays while improving robustness to array mismatch.
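To give an idea of the spatial features this abstract refers to, here is a minimal sketch of circular harmonics computed for a uniform circular array. The decomposition below is the standard discrete approximation of the circular harmonic expansion of the sound field, B_m = (1/N) * sum_n X_n * exp(-1j * m * phi_n), where X_n is the STFT of channel n and phi_n the microphone azimuth; the array geometry, harmonic orders, and use of magnitude features are illustrative assumptions, not the exact feature set of the thesis.

import numpy as np

def circular_harmonics(stft: np.ndarray, max_order: int = 2) -> np.ndarray:
    """stft: (channels, freq, time) complex STFT of a uniform circular array.
    Returns (2 * max_order + 1, freq, time) circular harmonic coefficients."""
    n_mics = stft.shape[0]
    angles = 2 * np.pi * np.arange(n_mics) / n_mics   # microphone azimuths
    orders = np.arange(-max_order, max_order + 1)
    # Order-m coefficient: average of channels weighted by exp(-1j * m * phi_n).
    steering = np.exp(-1j * np.outer(orders, angles)) / n_mics
    return np.einsum("oc,cft->oft", steering, stft)

# Random data standing in for the STFT of a 7-microphone circular array.
stft = np.random.randn(7, 257, 100) + 1j * np.random.randn(7, 257, 100)
features = np.abs(circular_harmonics(stft))   # magnitude features for segmentation
print(features.shape)                         # (5, 257, 100)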
Hsu, Wu-Hua (許吳華). "A Preliminary Study on Speaker Diarization for Automatic Transcription of Broadcast Radio Speech". Thesis, National Taipei University of Technology, Department of Electronic Engineering, 2018. http://ndltd.ncl.edu.tw/handle/a3z9vr.
We use a time-delay neural network (TDNN) for speaker diarization. It achieves an average DER of 27.74%, improving on the 31.08% of a GMM baseline. The trained speaker diarization system is then used to label the unmarked speakers in the NER-210 corpus, and the ASR system is retrained on the resulting speaker-labeled timelines. Experimental results show that the ASR system retrained with this speaker information reduces the original character error rate (CER) from 20.01% to 19.13%. In addition, the basic LSTM model for automatic speech recognition achieves an average CER of 17.2%, which the serial CNN-TDNN-LSTM model reduces to 13.12%. Finally, using confidence-measure-based data selection and adding more word sequences to the language model, the average CER drops to 9.2%.
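For reference, the diarization error rate (DER) figures quoted in this abstract are typically computed by optimally mapping hypothesis speakers to reference speakers and accumulating missed speech, false alarms, and speaker confusion. A minimal sketch using the pyannote.metrics package (with toy segments, not the NER-210 corpus) might look as follows.

from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Toy reference and hypothesis (times in seconds, labels arbitrary).
reference = Annotation()
reference[Segment(0.0, 10.0)] = "speaker_A"
reference[Segment(10.0, 20.0)] = "speaker_B"

hypothesis = Annotation()
hypothesis[Segment(0.0, 12.0)] = "spk1"    # 2 s of speaker confusion
hypothesis[Segment(12.0, 20.0)] = "spk2"

metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.2%}")  # DER = 10.00%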
Book chapters on the topic "Speech diarization"
Avdeeva, Anastasia, and Sergey Novoselov. "Deep Speaker Embeddings Based Online Diarization". In Speech and Computer, 24–32. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_3.
Zajíc, Zbyněk, Josef V. Psutka, and Luděk Müller. "Diarization Based on Identification with X-Vectors". In Speech and Computer, 667–78. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_64.
Edwards, Erik, Michael Brenndoerfer, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Nico Axtmann, Mark Miller, and David Suendermann-Oeft. "A Free Synthetic Corpus for Speaker Diarization Research". In Speech and Computer, 113–22. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_13.
Zajíc, Zbyněk, Josef V. Psutka, Lucie Zajícová, Luděk Müller, and Petr Salajka. "Diarization of the Language Consulting Center Telephone Calls". In Speech and Computer, 549–58. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_56.
Edwards, Erik, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Michael Brenndoerfer, Nico Axtmann, Mark Miller, and David Suendermann-Oeft. "Speaker Diarization: A Top-Down Approach Using Syllabic Phonology". In Speech and Computer, 123–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_14.
Kudashev, Oleg, and Alexander Kozlov. "The Diarization System for an Unknown Number of Speakers". In Speech and Computer, 340–44. Cham: Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-01931-4_45.
Nguyen, Trung Hieu, Eng Siong Chng, and Haizhou Li. "Speaker Diarization: An Emerging Research". In Speech and Audio Processing for Coding, Enhancement and Recognition, 229–77. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_8.
Kynych, Frantisek, Jindrich Zdansky, Petr Cerva, and Lukas Mateju. "Online Speaker Diarization Using Optimized SE-ResNet Architecture". In Text, Speech, and Dialogue, 176–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-40498-6_16.
Zajíc, Zbyněk, Jan Zelinka, and Luděk Müller. "Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech". In Speech and Computer, 555–63. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-66429-3_55.
Kunešová, Marie, Marek Hrúz, Zbyněk Zajíc, and Vlasta Radová. "Detection of Overlapping Speech for the Purposes of Speaker Diarization". In Speech and Computer, 247–57. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-26061-3_26.
Der volle Inhalt der QuelleKonferenzberichte zum Thema "Speech diarization"
Von Neumann, Thilo, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, and Reinhold Haeb-Umbach. "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization". In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 775–79. IEEE, 2024. http://dx.doi.org/10.1109/icasspw62465.2024.10625894.
Lamel, Lori, Jean-Luc Gauvain, and Leonardo Canseco-Rodriguez. "Speaker diarization from speech transcripts". In Interspeech 2004. ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-250.
Bounazou, Hadjer, Nassim Asbai, and Sihem Zitouni. "Speaker Diarization in Overlapped Speech". In 2022 19th International Multi-Conference on Systems, Signals & Devices (SSD). IEEE, 2022. http://dx.doi.org/10.1109/ssd54932.2022.9955684.
Jiang, Yidi, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, and Haizhou Li. "Prompt-Driven Target Speech Diarization". In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024. http://dx.doi.org/10.1109/icassp48485.2024.10446072.
Xie, Jiamin, Leibny Paola García-Perera, Daniel Povey, and Sanjeev Khudanpur. "Multi-PLDA Diarization on Children's Speech". In Interspeech 2019. ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-2961.
Gebre, Binyam Gebrekidan, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, and Tom Heskes. "Speaker diarization using gesture and speech". In Interspeech 2014. ISCA, 2014. http://dx.doi.org/10.21437/interspeech.2014-141.
Lupu, Eugen, Anca Apatean, and Radu Arsinte. "Speaker diarization experiments for Romanian parliamentary speech". In 2015 International Symposium on Signals, Circuits and Systems (ISSCS). IEEE, 2015. http://dx.doi.org/10.1109/isscs.2015.7204023.
Lyu, Dau-Cheng, Eng-Siong Chng, and Haizhou Li. "Language diarization for code-switch conversational speech". In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6639083.
Imseng, David, and Gerald Friedland. "Robust Speaker Diarization for short speech recordings". In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2009. http://dx.doi.org/10.1109/asru.2009.5373254.
Wang, Yingzhi, Mirco Ravanelli, and Alya Yacoubi. "Speech Emotion Diarization: Which Emotion Appears When?" In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389790.
Der volle Inhalt der QuelleBerichte der Organisationen zum Thema "Speech diarization"
Hansen, John H. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. Fort Belvoir, VA: Defense Technical Information Center, October 2015. http://dx.doi.org/10.21236/ada623029.