Academic literature on the topic 'Speaker embedding'
Relevant journal articles, theses, books, book chapters, and conference papers on the topic 'Speaker embedding' are listed below.
Journal articles on the topic "Speaker embedding"
Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification." IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.
Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition." IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.
Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding." JUCS - Journal of Universal Computer Science 26, no. 4 (April 28, 2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.
Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms." Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.
Bahmaninezhad, Fahimeh, Chunlei Zhang, and John H. L. Hansen. "An investigation of domain adaptation in speaker embedding space for speaker recognition." Speech Communication 129 (May 2021): 7–16. http://dx.doi.org/10.1016/j.specom.2021.01.001.
Li, Wenjie, Pengyuan Zhang, and Yonghong Yan. "TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition." Electronics Letters 55, no. 14 (July 2019): 816–19. http://dx.doi.org/10.1049/el.2019.1228.
Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. "Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification." Applied Sciences 9, no. 16 (August 11, 2019): 3295. http://dx.doi.org/10.3390/app9163295.
Liang, Chunyan, Lin Yang, Qingwei Zhao, and Yonghong Yan. "Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2572–76. http://dx.doi.org/10.1587/transinf.e95.d.2572.
Lin, Weiwei, Man-Wai Mak, Na Li, Dan Su, and Dong Yu. "A Framework for Adapting DNN Speaker Embedding Across Languages." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2810–22. http://dx.doi.org/10.1109/taslp.2020.3030499.
Byun, Jaeuk, and Jong Won Shin. "Monaural Speech Separation Using Speaker Embedding From Preliminary Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2753–63. http://dx.doi.org/10.1109/taslp.2021.3101617.
Dissertations / Theses on the topic "Speaker embedding"
Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.
Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It has emerged as an increasingly important and dedicated domain within speech research. It was originally proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. In recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our research focuses on existing algorithms for speaker diarization. In particular, the thesis addresses the differences between supervised and unsupervised methods. The purpose of this thesis is to examine state-of-the-art algorithms and analyze which algorithm best suits our application scenarios. Its main contributions are (1) an empirical study of speaker diarization algorithms; (2) suitable preprocessing of the corpus data; (3) audio embedding networks for creating d-vectors; (4) experiments on different algorithms and corpora and a comparison between them; (5) a recommendation for our requirements. The empirical study shows that, for the embedding extraction module, because the neural networks can be trained on large datasets, diarization performance can be improved considerably by replacing i-vectors with d-vectors. Furthermore, the differences between supervised and unsupervised methods lie mostly in the clustering module. The thesis uses only d-vectors as input to the diarization networks and selects two main algorithms for comparison: spectral clustering, representing the unsupervised approach, and the unbounded interleaved-state recurrent neural network (UIS-RNN), representing the supervised approach.
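As an illustrative aside, not taken from the cited thesis, the unsupervised branch it compares (per-segment d-vectors clustered with spectral clustering) can be sketched roughly as below; the embedding extractor, embedding dimension, and clustering settings are all assumptions.

```python
# Sketch: clustering per-segment speaker embeddings (d-vectors) with
# spectral clustering. The d-vector extraction step is not shown; any
# pretrained speaker-embedding network could stand in for it.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

def diarize(d_vectors: np.ndarray, num_speakers: int) -> np.ndarray:
    """Assign a speaker label to each segment given its d-vector.

    d_vectors: array of shape (num_segments, embedding_dim),
    one embedding per fixed-length speech segment.
    """
    # Cosine similarity lies in [-1, 1]; rescale to [0, 1] so it can be
    # used as a precomputed affinity matrix for spectral clustering.
    affinity = (cosine_similarity(d_vectors) + 1.0) / 2.0
    labels = SpectralClustering(
        n_clusters=num_speakers,
        affinity="precomputed",
        assign_labels="kmeans",
        random_state=0,
    ).fit_predict(affinity)
    return labels

# Toy example with random vectors standing in for real d-vectors.
rng = np.random.default_rng(0)
fake_dvectors = rng.normal(size=(20, 256))
print(diarize(fake_dvectors, num_speakers=2))
```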
Lukáč, Peter. "Verifikace osob podle hlasu bez extrakce příznaků" [Verification of persons by voice without feature extraction]. Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.
Fahlström Myrman, Arvid. "Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks." Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.
Full textObevakad inlärning av tal innebär att automatiskt hitta mönster i tal, t ex ord eller talljud, utan bevakning i form av ortografiska transkriptioner eller tidigare kunskap om språket. Ett grundläggande problem är dock att obevakad talinlärning tenderar att hitta väldigt talar- och kontextspecifika representationer av tal. Vi föreslår en metod för att förbättra kvaliteten av posteriorgram genererade med en obevakad modell, genom att partitionera de latenta klasserna funna av modellen. Vi gör detta genom att träna en gles siamesisk modell för att hitta en linjär transformering av de givna posteriorgrammen, extraherade från den obevakade modellen, till lågdimensionella posteriorgram. Den siamesiska modellen använder sig av talfragmentpar funna med obevakad ordupptäckning, där varje par består av fragment som antingen tillhör samma eller olika klasser. Den färdigtränade modellen görs sedan om till en exakt partitionering av posteriorgrammen. Vi följer Zero Resource Speech Challenge, och evaluerar modellen med hjälp av minimala ordpar-ABX-uppgiften. Vi demonstrerar att vår metod avsevärt minskar posteriorgrammens dimensionalitet, samtidigt som posteriorgrammen blir mer talarinvarianta. Detta antyder att modellen kan vara användbar som ett generellt extra steg för att förbättra probabilistiska akustiska särdrag från obevakade modeller.
Yin, Chung-Ko (尹崇珂). "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.
Huang, Che-Ching (黃喆青). "Speaker Change Detection using Speaker and Articulatory Feature Embeddings." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.
Full text國立成功大學
資訊工程學系
107
With the advancement of voice-processing technologies, voice-interactive software and products have become increasingly popular. For multi-party conversational speech, speaker change point detection is needed as a pre-processing step before further analysis and processing. Most previous research on speaker change point detection relies on acoustic features alone. The method proposed in this thesis additionally supplies speaker information from the perspective of articulatory features, exploiting differences in pronunciation characteristics to improve the accuracy of change point detection and achieve a complementary effect. In this thesis, a convolutional neural network is trained as a speaker embedding model on acoustic features extracted from speech, yielding a vector that represents speaker characteristics. In addition, an articulatory feature (AF) model is trained, and a multi-layer perceptron is used to extract an AF embedding from the speech features. Finally, using these two vectors, another multi-layer perceptron is trained as the speaker change detection model, which helps determine the exact position of the change point. Two speech databases were used: VoxCeleb2, a corpus widely used in speaker identification, and LibriSpeech, a corpus widely used in speech and speaker recognition. Three models were trained: the speaker embedding model, the AF embedding model, and the speaker change detection model. The speaker embedding model was trained and its parameter settings evaluated on VoxCeleb2; the AF embedding model was trained and evaluated on LibriSpeech; and the speaker change detection model was trained on a dialogue corpus composed from VoxCeleb2. On the speaker change detection task, the experimental results showed that the proposed method reduced the false alarm rate by 1.94% and increased accuracy by 1.1%, precision by 2.04%, and F1 score by 0.16%. These results indicate that the proposed method outperforms the traditional approach and could be applied to products that require speaker change detection.
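As a hedged illustration, not the thesis implementation, combining a speaker embedding and an AF embedding from two adjacent windows in a small multi-layer perceptron change detector could look roughly like this; the embedding sizes, layer widths, and fusion by concatenation are assumptions.

```python
# Sketch: an MLP takes the concatenated speaker and articulatory-feature (AF)
# embeddings of the left and right windows around a candidate boundary and
# outputs a logit for "speaker change occurs here".
import torch
import torch.nn as nn

SPK_DIM, AF_DIM = 256, 64  # assumed embedding sizes

class ChangeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = 2 * (SPK_DIM + AF_DIM)  # left and right window embeddings
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),  # change-point logit
        )

    def forward(self, spk_left, af_left, spk_right, af_right):
        x = torch.cat([spk_left, af_left, spk_right, af_right], dim=-1)
        return self.net(x).squeeze(-1)

# Toy forward pass with random tensors standing in for real embeddings.
det = ChangeDetector()
batch = 8
logits = det(torch.randn(batch, SPK_DIM), torch.randn(batch, AF_DIM),
             torch.randn(batch, SPK_DIM), torch.randn(batch, AF_DIM))
print(torch.sigmoid(logits))  # change probabilities per boundary candidate
```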
Books on the topic "Speaker embedding"
Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.
Book chapters on the topic "Speaker embedding"
Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition." In Graph Embedding for Pattern Analysis, 229–60. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.
Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 369–76. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.
Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification." In Communications in Computer and Information Science, 79–86. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.
Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding." In Speech and Computer, 50–57. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.
Tkachenko, Maxim, Alexander Yamshinin, Mikhail Kotov, and Marina Nastasenko. "Lightweight Embeddings for Speaker Verification." In Speech and Computer, 687–96. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_70.
Cyrta, Pawel, Tomasz Trzciński, and Wojciech Stokowiec. "Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings." In Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017, 107–17. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-67220-5_10.
Ülgen, İsmail Rasim, Mustafa Erden, and Levent M. Arslan. "Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme." In Speech and Computer, 826–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_74.
Desgrippes, Magalie, and Amelia Lambelet. "3. On the Sociolinguistic Embedding of Portuguese Heritage Language Speakers in Switzerland: Socio-Economic Status and Home Literacy Environment (HELASCOT Project)." In Heritage and School Language Literacy Development in Migrant Children, edited by Raphael Berthele and Amelia Lambelet, 34–57. Bristol, Blue Ridge Summit: Multilingual Matters, 2017. http://dx.doi.org/10.21832/9781783099054-004.
Millikan, Ruth. "Embedding Language in the World." In Singular Thought and Mental Files, 251–64. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198746881.003.0012.
Kirk-Giannini, Cameron Domenico, and Ernie Lepore. "Attributions of Attitude (May 22, 1970)." In The Structure of Truth, 66–81. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198842491.003.0005.
Conference papers on the topic "Speaker embedding"
Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding." In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.
Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.
Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.
Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.
Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.
Georges, Munir, Jonathan Huang, and Tobias Bocklet. "Compact Speaker Embedding: lrx-Vector." In Interspeech 2020. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/interspeech.2020-2106.
Toruk, Mesut, Gokhan Bilgin, and Ahmet Serbes. "Speaker Diarization using Embedding Vectors." In 2020 28th Signal Processing and Communications Applications Conference (SIU). IEEE, 2020. http://dx.doi.org/10.1109/siu49456.2020.9302162.
Karam, Zahi N., and William M. Campbell. "Graph-embedding for speaker recognition." In Interspeech 2010. ISCA: ISCA, 2010. http://dx.doi.org/10.21437/interspeech.2010-726.
Wang, Po-Chin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, and Shan-Wen Hsiao. "Improving Embedding-based Neural-Network Speaker Recognition." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-8.
Yamamoto, Hitoshi, Kong Aik Lee, Koji Okabe, and Takafumi Koshinaka. "Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding." In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1508.