Academic literature on the topic 'Speaker embedding'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speaker embedding.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Speaker embedding"

1

Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification." IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition." IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding." JUCS - Journal of Universal Computer Science 26, no. 4 (April 28, 2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.

Full text
Abstract:
The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model from which speaker/style adaptation was carried out was a multi-speaker/multi-style model based on 8.5 hours of American English speech data which corresponds to 16 different speaker/style combinations. The results of the experiments show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens a wide range of application of speaker/style dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
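The core idea above, mapping discrete speaker/style combinations to points in a low-dimensional continuous space, can be sketched minimally as a lookup table of learnable vectors. All names and dimensions below are illustrative, not taken from the paper:

```python
import random

random.seed(0)  # deterministic toy initialisation

speakers = ["spk_a", "spk_b"]       # hypothetical speaker IDs
styles = ["neutral", "expressive"]  # hypothetical style labels
EMB_DIM = 4                         # illustrative embedding size

# One learnable vector per (speaker, style) combination, randomly
# initialised as it would be before training.
embedding_table = {
    (spk, sty): [random.gauss(0.0, 0.1) for _ in range(EMB_DIM)]
    for spk in speakers
    for sty in styles
}

def embed(speaker, style):
    """Map a discrete speaker/style pair to its continuous vector."""
    return embedding_table[(speaker, style)]

vec = embed("spk_a", "expressive")
```

During training, gradients would update these vectors so that similar speaker/style combinations end up close together in the space.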
APA, Harvard, Vancouver, ISO, and other styles
4

Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms." Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.

Full text
Abstract:
One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments involves extracting the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention method, speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods to extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination method comprising single-head and split-based multi-head attention shows a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, the speaker recognition performance improves by 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and that the speaker verification system can be further improved by employing combinations of different attention techniques.
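The contrast between average pooling and single-head attentive pooling over frame-level features can be sketched as follows. This is a toy illustration of the general idea, not the authors' implementation; the frames and attention parameter are made up:

```python
import math

def softmax(scores):
    """Normalise scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_pool(frames):
    """Baseline: unweighted mean over the time axis."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def attention_pool(frames, w):
    """Single-head attentive pooling: score each frame with w, then
    take the softmax-weighted mean of the frames."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]

frames = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # toy frame-level features
w = [1.0, -1.0]                                # illustrative attention weights
emb_avg = average_pool(frames)
emb_att = attention_pool(frames, w)
```

A multi-head variant would repeat the scoring with several weight vectors and concatenate the results; the paper's contribution is combining such heads in different ways.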
APA, Harvard, Vancouver, ISO, and other styles
5

Bahmaninezhad, Fahimeh, Chunlei Zhang, and John H. L. Hansen. "An investigation of domain adaptation in speaker embedding space for speaker recognition." Speech Communication 129 (May 2021): 7–16. http://dx.doi.org/10.1016/j.specom.2021.01.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Wenjie, Pengyuan Zhang, and Yonghong Yan. "TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition." Electronics Letters 55, no. 14 (July 2019): 816–19. http://dx.doi.org/10.1049/el.2019.1228.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. "Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification." Applied Sciences 9, no. 16 (August 11, 2019): 3295. http://dx.doi.org/10.3390/app9163295.

Full text
Abstract:
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of the temporal dimension. Our system replaces this reduction mechanism by a phonetic phrase alignment model to keep the temporal structure of each phrase since the phonetic information is relevant in the verification task. Moreover, we can apply a convolutional neural network as front-end, and, thanks to the alignment process being differentiable, we can train the network to produce a supervector for each utterance that will be discriminative to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes the phrase and speaker information providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar size networks that make use of the global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on the RSR2015-Part II. To our knowledge, this system achieves the best published results obtained on this second part.
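The "basic similarity metric" verification step mentioned above can be illustrated with cosine scoring against a tuned threshold. The threshold and vectors below are assumptions for illustration only, not values from the paper:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two supervectors/embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def verify(enrolment_vec, test_vec, threshold=0.7):
    """Accept the trial if similarity exceeds a tuned threshold."""
    return cosine_similarity(enrolment_vec, test_vec) >= threshold

same_speaker = verify([1.0, 0.0, 1.0], [0.9, 0.1, 1.1])   # similar vectors
diff_speaker = verify([1.0, 0.0, 1.0], [-1.0, 1.0, 0.0])  # dissimilar vectors
```

In practice the threshold is calibrated on a development set to balance false acceptances against false rejections.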
APA, Harvard, Vancouver, ISO, and other styles
8

LIANG, Chunyan, Lin YANG, Qingwei ZHAO, and Yonghong YAN. "Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2572–76. http://dx.doi.org/10.1587/transinf.e95.d.2572.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lin, Weiwei, Man-Wai Mak, Na Li, Dan Su, and Dong Yu. "A Framework for Adapting DNN Speaker Embedding Across Languages." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2810–22. http://dx.doi.org/10.1109/taslp.2020.3030499.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Byun, Jaeuk, and Jong Won Shin. "Monaural Speech Separation Using Speaker Embedding From Preliminary Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2753–63. http://dx.doi.org/10.1109/taslp.2021.3101617.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Speaker embedding"

1

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.

Full text
Abstract:
Speaker diarization is the task of determining 'who spoke when?' in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our research focuses on existing speaker diarization algorithms. In particular, the thesis targets the differences between supervised and unsupervised methods. The aim of this thesis is to examine state-of-the-art algorithms and analyze which algorithm is most suitable for our application scenarios. Its main contributions are (1) an empirical study of speaker diarization algorithms; (2) appropriate corpus data pre-processing; (3) an audio embedding network for creating d-vectors; (4) experiments on different algorithms and corpora and a comparison of them; (5) a recommendation for our requirements. The empirical study shows that, for the embedding extraction module, because neural networks can be trained on large datasets, diarization performance can be significantly improved by replacing i-vectors with d-vectors. Moreover, the differences between supervised and unsupervised methods lie mostly in the clustering module. The thesis uses only d-vectors as the input to the diarization network and selects two main algorithms as objects of comparison: Spectral Clustering, representing the unsupervised methods, and the Unbounded Interleaved-state Recurrent Neural Network (UIS-RNN), representing the supervised methods.
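As a rough illustration of the clustering module the thesis compares (it evaluates spectral clustering and UIS-RNN), here is a much simpler greedy cosine-based clustering of segment d-vectors. It is a stand-in sketch with made-up vectors and threshold, not either of the algorithms studied:

```python
import math

def cosine(u, v):
    """Cosine similarity between two d-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def online_cluster(d_vectors, threshold=0.8):
    """Greedy clustering: each segment joins the most similar existing
    speaker cluster (by cosine to its centroid) or starts a new one."""
    centroids, labels = [], []
    for vec in d_vectors:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(vec, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(vec))
            labels.append(len(centroids) - 1)
        else:
            n = labels.count(best)  # current cluster size
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels

# Toy d-vectors for four segments: two speakers, two segments each.
segments = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]
labels = online_cluster(segments)
```

Spectral clustering instead operates offline on the full pairwise-similarity matrix, while UIS-RNN learns speaker-turn behaviour from labelled data, which is the supervised/unsupervised distinction the thesis examines.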
APA, Harvard, Vancouver, ISO, and other styles
2

Lukáč, Peter. "Verifikace osob podle hlasu bez extrakce příznaků." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.

Full text
Abstract:
Speaker verification is a field that is constantly being modernized and improved, striving to meet the demands placed on it in application areas such as authorization systems, forensic analysis, etc. Improvements come from advances in deep learning, the creation of new training and test datasets, and various speaker verification competitions and workshops. In this work, we examine models for speaker verification without feature extraction. Using raw audio tracks as model inputs simplifies input processing, which lowers computational and memory requirements and reduces the number of hyperparameters needed to create features from recordings that influence the results. At present, models without feature extraction do not reach the results of models with feature extraction. Starting from baseline models, we experiment with modern techniques and try to improve model accuracy. The experiments with modern techniques considerably improved the results of the baseline models, but we still did not reach the results of the improved model with feature extraction. The improvement is, however, sufficient for us to create a fusion with that model. Finally, we discuss the achieved results and propose improvements based on them.
APA, Harvard, Vancouver, ISO, and other styles
3

Fahlström, Myrman Arvid. "Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks." Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.

Full text
Abstract:
Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to find a linear transformation of input posteriorgrams, extracted from the unsupervised model, to lower-dimensional posteriorgrams. The siamese model makes use of same-category and different-category speech fragment pairs obtained through unsupervised term discovery. After training, the model is converted into an exact partitioning of the posteriorgrams. We evaluate the model on the minimal-pair ABX task in the context of the Zero Resource Speech Challenge. We are able to demonstrate that our method significantly reduces the dimensionality of standard Gaussian mixture model posteriorgrams, while also making them more speaker invariant. This suggests that the model may be viable as a general post-processing step to improve probabilistic acoustic features obtained by unsupervised learning.
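The final step described above, converting the trained siamese model into an exact partitioning of the posteriorgrams, amounts to summing posterior mass within each group of merged latent classes. A minimal sketch, with an illustrative partition and toy posteriors rather than anything from the thesis:

```python
def partition_posteriorgram(posteriors, partition):
    """Collapse latent classes into groups: each group's posterior is
    the summed posterior mass of its member classes. `partition` maps
    each original class index to a group index."""
    n_groups = max(partition) + 1
    out = []
    for frame in posteriors:
        grouped = [0.0] * n_groups
        for cls, p in enumerate(frame):
            grouped[partition[cls]] += p
        out.append(grouped)
    return out

# Toy posteriorgram: two frames over four latent classes.
frames = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]
partition = [0, 0, 1, 1]  # classes 0,1 -> group 0; classes 2,3 -> group 1
low_dim = partition_posteriorgram(frames, partition)
```

Because each class maps to exactly one group, the output frames remain valid probability distributions, just over fewer, hopefully more speaker-invariant, units.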
APA, Harvard, Vancouver, ISO, and other styles
4

Chung-KoYin and 尹崇珂. "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Che-ChingHuang and 黃喆青. "Speaker Change Detection using Speaker and Articulatory Feature Embeddings." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering
Academic year 107 (ROC calendar)
Nowadays, with the improvement and advancement of many related voice-processing technologies, voice-interactive software and products have become more and more popular. For multi-speaker dialogue audio, speaker change point detection is needed as a pre-processing step before further analysis and processing. Most past research on speaker change point detection is based on acoustic features. The method proposed in this thesis additionally provides speaker information from the perspective of articulatory features, exploiting differences in pronunciation characteristics to improve the accuracy of speaker change point detection and achieve a complementary effect. In this thesis, we use a convolutional neural network to train a speaker embedding model that extracts acoustic features from speech to obtain a vector representing the characteristics of the speaker. In addition, a model of articulatory features (AFs) is trained, and a multi-layer perceptron network is used to extract the AF embedding of speech features. Finally, using these two vectors, another multi-layer perceptron network is used to train the speaker change detection model, which helps determine the exact position of the change point. Two speech databases were utilized in this thesis. The first was VoxCeleb2, a corpus widely used in the field of speaker identification. The second was LibriSpeech, a corpus widely used in the fields of speech and speaker recognition. We mainly trained three models: the speaker embedding model, the AF embedding model, and the speaker change detection model. The speaker embedding model was trained and evaluated on the VoxCeleb2 database, and the articulatory feature embedding model on the LibriSpeech database. For the speaker change detection model, we used a dialogue corpus composed from the VoxCeleb2 database. In the speaker change detection task, the experimental results showed that the proposed method could reduce the false alarm rate by 1.94% and increase accuracy by 1.1%, precision by 2.04%, and F1 score by 0.16%. These results show that the proposed method is superior to the traditional method and could be applied to products that require speaker change detection.
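The detection metrics reported above (false alarm rate, accuracy, precision, F1) follow the standard confusion-count definitions, which a short sketch makes explicit. The counts used here are made up for illustration:

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard detection metrics from confusion counts:
    tp/fp = true/false positives, fn/tn = false/true negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_alarm_rate = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, false_alarm_rate, accuracy

# Made-up confusion counts for illustration.
precision, recall, f1, far, acc = detection_metrics(tp=8, fp=2, fn=2, tn=88)
```

Here a "positive" is a hypothesized change point; a false alarm is a change detected where none occurred, which is why reducing the false alarm rate matters for downstream segmentation.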
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Speaker embedding"

1

Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.

Full text
Abstract:
Slurs are incendiary terms—many deny that sentences containing them can ever be true. And utterances where they occur embedded within normally “quarantining” contexts, like conditionals and indirect reports, can still seem offensive. At the same time, others find that sentences containing slurs can be true; and there are clear cases where embedding does inoculate a speaker from the slur’s offensiveness. This chapter argues that four standard accounts of the “other” element that differentiates slurs from their more neutral counterparts—semantic content, perlocutionary effect, presupposition, and conventional implicature—all fail to account for this puzzling mixture of intuitions. Instead, it proposes that slurs make two distinct, coordinated contributions to a sentence’s conventional communicative role.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Speaker embedding"

1

Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition." In Graph Embedding for Pattern Analysis, 229–60. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 369–76. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification." In Communications in Computer and Information Science, 79–86. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding." In Speech and Computer, 50–57. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Tkachenko, Maxim, Alexander Yamshinin, Mikhail Kotov, and Marina Nastasenko. "Lightweight Embeddings for Speaker Verification." In Speech and Computer, 687–96. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_70.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Cyrta, Pawel, Tomasz Trzciński, and Wojciech Stokowiec. "Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings." In Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017, 107–17. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-67220-5_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ülgen, İsmail Rasim, Mustafa Erden, and Levent M. Arslan. "Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme." In Speech and Computer, 826–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Desgrippes, Magalie, and Amelia Lambelet. "3. On the Sociolinguistic Embedding of Portuguese Heritage Language Speakers in Switzerland: Socio-Economic Status and Home Literacy Environment (HELASCOT Project)." In Heritage and School Language Literacy Development in Migrant Children, edited by Raphael Berthele and Amelia Lambelet, 34–57. Bristol, Blue Ridge Summit: Multilingual Matters, 2017. http://dx.doi.org/10.21832/9781783099054-004.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Millikan, Ruth. "Embedding Language in the World." In Singular Thought and Mental Files, 251–64. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198746881.003.0012.

Full text
Abstract:
Direct reference theories hold that nothing beyond reference is carried from speaker to hearer by singular terms. The chapter argues the same is true of common nouns and most other extensional terms such as terms for properties, places, events, and actions. None of these terms carry descriptions, grasp of paradigm property sets, inferential mandates, or anything else to be “loosened” or “tightened” by pragmatic inference. Both thought and language are directly structured by the structure of the world itself, not by peculiarities of the human mind and not by convention. The route from speech to hearer understanding is indirect, passing, typically, through the hearer’s prior grasp of world structure, a structure that hearers may have idiosyncratic ways of grasping. They may have quite different ways of identifying the same thing; that is, different ways of recognizing when new natural or intentional information about the same is arriving at the sensory surfaces.
APA, Harvard, Vancouver, ISO, and other styles
10

Kirk-Giannini, Cameron Domenico, and Ernie Lepore. "Attributions of Attitude (May 22, 1970)." In The Structure of Truth, 66–81. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198842491.003.0005.

Full text
Abstract:
The subject of Lecture IV is attributions of attitude. In it, Davidson extends his theory of indirect quotation, which had appeared in 1968, to propositional attitude ascriptions more generally. He begins by criticizing rival accounts due to Quine, Scheffler, Church, and Frege. His positive proposal turns on the idea that the complementizer clauses embedded in ascriptions of attitude are not semantically a part of the embedding sentence. According to the paratactic account he favors, attributions of attitude involve demonstrative reference to an utterance of the speaker’s, which is claimed to stand in some relation to some utterance or attitude of the ascribee.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speaker embedding"

1

Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding." In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.

Full text
Abstract:
Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advancement in Deep Learning has sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity reranking, evaluation setting, etc. Based on the tradeoffs of different models, we propose a new generative dialogue model conditioned on speakers as well as context history that outperforms all previous models on both retrieval and generative metrics. Our findings indicate that pretraining speaker embeddings on larger datasets, as well as bootstrapping word and speaker embeddings, can significantly improve performance (up to 3 points in perplexity), and that promoting diversity in using Mutual Information based techniques has a very strong effect in ranking metrics.
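The perplexity metric in which the abstract reports its improvement ("up to 3 points") is the exponential of the average per-token negative log-likelihood. A minimal sketch with toy values, not data from the paper:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A model that assigns uniform probability 1/4 to each of 4 tokens
# has perplexity 4: it is as uncertain as a 4-way coin flip per token.
ppl = perplexity([math.log(0.25)] * 4)
```

Lower perplexity means the speaker-conditioned model assigns higher probability to the held-out dialogue turns.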
APA, Harvard, Vancouver, ISO, and other styles
4

Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Georges, Munir, Jonathan Huang, and Tobias Bocklet. "Compact Speaker Embedding: lrx-Vector." In Interspeech 2020. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/interspeech.2020-2106.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Toruk, Mesut, Gokhan Bilgin, and Ahmet Serbes. "Speaker Diarization using Embedding Vectors." In 2020 28th Signal Processing and Communications Applications Conference (SIU). IEEE, 2020. http://dx.doi.org/10.1109/siu49456.2020.9302162.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Karam, Zahi N., and William M. Campbell. "Graph-embedding for speaker recognition." In Interspeech 2010. ISCA: ISCA, 2010. http://dx.doi.org/10.21437/interspeech.2010-726.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Po-Chin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, and Shan-Wen Hsiao. "Improving Embedding-based Neural-Network Speaker Recognition." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yamamoto, Hitoshi, Kong Aik Lee, Koji Okabe, and Takafumi Koshinaka. "Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding." In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1508.

Full text
APA, Harvard, Vancouver, ISO, and other styles