Contents
Selection of scientific literature on the topic "Speaker embedding"
Consult the lists of current articles, books, dissertations, reports, and other scientific sources on the topic "Speaker embedding".
Journal articles on the topic "Speaker embedding"
Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification". IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.
Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition". IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.
Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding". JUCS - Journal of Universal Computer Science 26, no. 4 (April 28, 2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.
Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms". Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.
Bahmaninezhad, Fahimeh, Chunlei Zhang, and John H. L. Hansen. "An investigation of domain adaptation in speaker embedding space for speaker recognition". Speech Communication 129 (May 2021): 7–16. http://dx.doi.org/10.1016/j.specom.2021.01.001.
Li, Wenjie, Pengyuan Zhang, and Yonghong Yan. "TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition". Electronics Letters 55, no. 14 (July 2019): 816–19. http://dx.doi.org/10.1049/el.2019.1228.
Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. "Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification". Applied Sciences 9, no. 16 (August 11, 2019): 3295. http://dx.doi.org/10.3390/app9163295.
Liang, Chunyan, Lin Yang, Qingwei Zhao, and Yonghong Yan. "Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification". IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2572–76. http://dx.doi.org/10.1587/transinf.e95.d.2572.
Lin, Weiwei, Man-Wai Mak, Na Li, Dan Su, and Dong Yu. "A Framework for Adapting DNN Speaker Embedding Across Languages". IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2810–22. http://dx.doi.org/10.1109/taslp.2020.3030499.
Byun, Jaeuk, and Jong Won Shin. "Monaural Speech Separation Using Speaker Embedding From Preliminary Separation". IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2753–63. http://dx.doi.org/10.1109/taslp.2021.3101617.
Dissertations on the topic "Speaker embedding"
Cui, Ming. „Experiments in speaker diarization using speaker vectors“. Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.
Speaker diarization is the task of determining "who spoke when?" in an audio or video recording containing an unknown amount of speech and an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Originally it was proposed as a research topic related to automatic speech recognition, with speaker diarization serving as an upstream processing step. In recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our research focuses on existing speaker diarization algorithms. In particular, the thesis addresses the differences between supervised and unsupervised methods. The aim of the thesis is to examine the state-of-the-art algorithms and analyze which algorithm best fits our application scenarios. Its main contributions are (1) an empirical study of speaker diarization algorithms; (2) suitable preprocessing of the corpus data; (3) an audio embedding network for creating d-vectors; (4) experiments on different algorithms and corpora, and a comparison between them; (5) a recommendation for our requirements. The empirical study shows that, for the embedding extraction module, diarization performance can be improved considerably by replacing i-vectors with d-vectors, because the neural networks can be trained on large datasets. Moreover, the differences between supervised and unsupervised methods lie mostly in the clustering module. The thesis uses only d-vectors as input to the diarization networks and selects two main algorithms for comparison: spectral clustering, representing the unsupervised approach, and the unbounded interleaved-state recurrent neural network (UIS-RNN), representing the supervised approach.
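The unsupervised route described above, clustering per-segment d-vectors with a spectral method, can be sketched in a few lines. This is a minimal illustration on synthetic embeddings for the two-speaker case, not the thesis's implementation; all names, dimensions, and the shifted-cosine affinity are illustrative assumptions:

```python
import numpy as np

def spectral_diarization(embeddings):
    """Cluster per-segment speaker embeddings (e.g. d-vectors) into two
    speakers via bare-bones spectral clustering: cosine affinity ->
    graph Laplacian -> sign of the Fiedler vector."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = (X @ X.T + 1.0) / 2.0  # shifted cosine, in [0, 1]; keeps the graph connected
    laplacian = np.diag(affinity.sum(axis=1)) - affinity  # unnormalized Laplacian
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # second-smallest eigenvector
    return (fiedler > 0).astype(int)        # speaker label per segment

# Toy example: two well-separated synthetic "speakers", 5 segments each.
rng = np.random.default_rng(0)
spk_a = rng.normal(1.0, 0.1, size=(5, 8))
spk_b = rng.normal(-1.0, 0.1, size=(5, 8))
labels = spectral_diarization(np.vstack([spk_a, spk_b]))
```

The sign of the Fiedler vector only handles two clusters; for more speakers one would cluster the rows of the k smallest eigenvectors instead.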
Lukáč, Peter. "Verifikace osob podle hlasu bez extrakce příznaků" [Speaker verification without feature extraction]. Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.
Fahlström Myrman, Arvid. "Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks". Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.
Unsupervised learning of speech means automatically finding patterns in speech, e.g. words or speech sounds, without supervision in the form of orthographic transcriptions or prior knowledge of the language. A fundamental problem, however, is that unsupervised speech learning tends to find highly speaker- and context-specific representations of speech. We propose a method for improving the quality of posteriorgrams generated by an unsupervised model, by partitioning the latent classes found by the model. We do this by training a sparse siamese model to find a linear transformation of the given posteriorgrams, extracted from the unsupervised model, into low-dimensional posteriorgrams. The siamese model uses pairs of speech fragments found by unsupervised term discovery, where each pair consists of fragments belonging either to the same class or to different classes. The trained model is then converted into an exact partitioning of the posteriorgrams. We follow the Zero Resource Speech Challenge and evaluate the model using the minimal-pair ABX task. We demonstrate that our method considerably reduces the dimensionality of the posteriorgrams while making them more speaker invariant. This suggests that the model may be useful as a general additional step for improving probabilistic acoustic features from unsupervised models.
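The final step in the abstract above, converting a learned linear transform into an exact partitioning of the latent classes, can be sketched as follows. This is a hypothetical numpy illustration under assumed shapes (the matrix `W` and the function name are not from the thesis): each original class is assigned to the partition with the largest learned weight, and probability mass is summed within each partition.

```python
import numpy as np

def partition_posteriorgram(posteriorgram, W):
    """Collapse a (frames x n_classes) posteriorgram into a low-dimensional
    one by hardening a learned linear map W (n_classes x n_partitions):
    each latent class goes to its argmax partition, probabilities are summed."""
    assignment = W.argmax(axis=1)               # original class -> partition id
    n_partitions = W.shape[1]
    reduced = np.zeros((posteriorgram.shape[0], n_partitions))
    for cls, part in enumerate(assignment):
        reduced[:, part] += posteriorgram[:, cls]
    return reduced                              # rows still sum to 1

# Toy example: 2 frames over 4 latent classes, collapsed to 2 partitions.
P = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
W = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.8],
              [0.3, 0.7]])
reduced = partition_posteriorgram(P, W)
```

Because the hard assignment only regroups classes, each frame's probabilities remain a valid distribution after the reduction.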
Yin, Chung-Ko (尹崇珂). "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.
Huang, Che-Ching (黃喆青). "Speaker Change Detection using Speaker and Articulatory Feature Embeddings". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.
Der volle Inhalt der Quelle國立成功大學
資訊工程學系
107
Nowadays, with the advancement of many related voice-processing technologies, voice-interactive software and products have become increasingly popular. For multi-party dialogue speech, speaker change point detection is needed as a pre-processing step before further analysis. Most previous research on speaker change point detection relies on acoustic features alone. The method proposed in this thesis additionally supplies speaker information from the perspective of articulatory features: differences in pronunciation characteristics are exploited to improve the accuracy of change point detection and achieve a complementary effect. In this thesis, a convolutional neural network is used to train a speaker embedding model that extracts acoustic features from speech and produces a vector representing speaker characteristics. In addition, an articulatory feature (AF) model is trained, and a multi-layer perceptron is used to extract an AF embedding from the speech features. Finally, these two vectors are fed into another multi-layer perceptron to train the speaker change detection model, which helps determine the exact position of change points. Two speech databases were used: VoxCeleb2, a corpus widely used in speaker identification, and LibriSpeech, a corpus widely used in speech and speaker recognition. Three models were trained: the speaker embedding model, the AF embedding model, and the speaker change detection model. The speaker embedding model was trained and its parameter settings evaluated on VoxCeleb2.
The AF embedding model was trained and its parameter settings evaluated on LibriSpeech, and the speaker change detection model was trained on a dialogue corpus composed from VoxCeleb2. On the speaker change detection task, experimental results showed that the proposed method reduced the false alarm rate by 1.94% and increased accuracy by 1.1%, precision by 2.04%, and F1 score by 0.16%. The proposed method outperformed the traditional method and could be applied to products that require speaker change detection.
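The fusion of speaker and AF embeddings for change detection can be shown schematically. In the thesis a trained multi-layer perceptron scores candidate change points; the sketch below substitutes a fixed average of cosine distances between adjacent segments purely to illustrate the data flow, and every name, vector, and threshold is an illustrative assumption:

```python
import numpy as np

def detect_speaker_change(spk_embs, af_embs, threshold=0.5):
    """Score each boundary between adjacent segments by averaging the cosine
    distances of the speaker embeddings and the articulatory-feature (AF)
    embeddings; a boundary whose score exceeds the threshold is flagged as a
    speaker change. (A trained MLP would replace the fixed average.)"""
    def cosine_dist(a, b):
        return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    changes = []
    for i in range(len(spk_embs) - 1):
        score = 0.5 * cosine_dist(spk_embs[i], spk_embs[i + 1]) \
              + 0.5 * cosine_dist(af_embs[i], af_embs[i + 1])
        changes.append(bool(score > threshold))
    return changes

# Toy example: 5 segments, speaker change after segment index 2.
spk = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2)
af = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2)
changes = detect_speaker_change(spk, af)
```

The equal 0.5/0.5 weighting mirrors the complementary role the thesis assigns to the two embedding types, but the actual combination there is learned, not hand-set.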
Books on the topic "Speaker embedding"
Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.
Book chapters on the topic "Speaker embedding"
Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition". In Graph Embedding for Pattern Analysis, 229–60. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.
Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification". In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 369–76. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.
Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification". In Communications in Computer and Information Science, 79–86. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.
Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding". In Speech and Computer, 50–57. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.
Tkachenko, Maxim, Alexander Yamshinin, Mikhail Kotov, and Marina Nastasenko. "Lightweight Embeddings for Speaker Verification". In Speech and Computer, 687–96. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_70.
Cyrta, Pawel, Tomasz Trzciński, and Wojciech Stokowiec. "Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings". In Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017, 107–17. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-67220-5_10.
Ülgen, İsmail Rasim, Mustafa Erden, and Levent M. Arslan. "Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme". In Speech and Computer, 826–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_74.
Desgrippes, Magalie, and Amelia Lambelet. "3. On the Sociolinguistic Embedding of Portuguese Heritage Language Speakers in Switzerland: Socio-Economic Status and Home Literacy Environment (HELASCOT Project)". In Heritage and School Language Literacy Development in Migrant Children, edited by Raphael Berthele and Amelia Lambelet, 34–57. Bristol, Blue Ridge Summit: Multilingual Matters, 2017. http://dx.doi.org/10.21832/9781783099054-004.
Millikan, Ruth. "Embedding Language in the World". In Singular Thought and Mental Files, 251–64. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198746881.003.0012.
Kirk-Giannini, Cameron Domenico, and Ernie Lepore. "Attributions of Attitude (May 22, 1970)". In The Structure of Truth, 66–81. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198842491.003.0005.
Conference papers on the topic "Speaker embedding"
Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding". In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.
Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification". In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.
Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models". In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.
Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding". In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.
Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding". In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.
Georges, Munir, Jonathan Huang, and Tobias Bocklet. "Compact Speaker Embedding: lrx-Vector". In Interspeech 2020. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/interspeech.2020-2106.
Toruk, Mesut, Gokhan Bilgin, and Ahmet Serbes. "Speaker Diarization using Embedding Vectors". In 2020 28th Signal Processing and Communications Applications Conference (SIU). IEEE, 2020. http://dx.doi.org/10.1109/siu49456.2020.9302162.
Karam, Zahi N., and William M. Campbell. "Graph-embedding for speaker recognition". In Interspeech 2010. ISCA: ISCA, 2010. http://dx.doi.org/10.21437/interspeech.2010-726.
Wang, Po-Chin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, and Shan-Wen Hsiao. "Improving Embedding-based Neural-Network Speaker Recognition". In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-8.
Yamamoto, Hitoshi, Kong Aik Lee, Koji Okabe, and Takafumi Koshinaka. "Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding". In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1508.