Selection of scientific literature on the topic "Speaker embedding"


Consult the lists of current articles, books, dissertations, reports, and other scientific sources on the topic "Speaker embedding."

Next to every entry in the bibliography, the option "Add to bibliography" is available. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scientific publication as a PDF and read an online annotation of the work, if the relevant parameters are available in the metadata.

Journal articles on the topic "Speaker embedding"

1

Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification." IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.

2

Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition." IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.

3

Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding." JUCS - Journal of Universal Computer Science 26, no. 4 (April 28, 2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.

Abstract:
The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model from which speaker/style adaptation was carried out was a multi-speaker/multi-style model based on 8.5 hours of American English speech data, corresponding to 16 different speaker/style combinations. The results of the experiments show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens a wide range of applications of speaker/style-dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
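The core embedding idea in this abstract, mapping each discrete speaker/style combination to a point in a continuous low-dimensional space, can be sketched as a simple lookup table. This is an illustrative toy only; the table sizes, the flat indexing scheme, and the random initialization are assumptions, not details taken from the paper:

```python
import numpy as np

# Hypothetical sizes: 4 speakers x 4 styles = 16 combinations, as in the paper's
# setup, each mapped to an 8-dimensional vector (dimension chosen arbitrarily).
rng = np.random.default_rng(0)
n_speakers, n_styles, dim = 4, 4, 8
table = rng.normal(size=(n_speakers * n_styles, dim))  # one row per combination

def speaker_style_embedding(speaker_id, style_id):
    """Map a discrete (speaker, style) pair to a continuous low-dimensional vector."""
    return table[speaker_id * n_styles + style_id]
```

In a real system the table rows would be trained jointly with the synthesis network, so that nearby points correspond to similar voices and styles.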
4

Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms." Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.

Abstract:
One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments involves extracting the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention method, speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods to extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination method comprising single-head and split-based multi-head attentions shows a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, the speaker recognition performance improves by 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and the speaker verification system could be further improved by employing combinations of different attention techniques.
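The attention-based pooling that the abstract contrasts with average pooling can be sketched as follows. This is a minimal numpy illustration under simplifying assumptions (a linear scoring vector per head and concatenated head outputs), not the authors' architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pool(frames, w):
    """Single-head attentive pooling: weight each frame by a learned score
    instead of averaging uniformly. frames: (T, D); w: (D,) scoring vector."""
    weights = softmax(frames @ w)   # (T,) attention weights over the frames
    return weights @ frames         # (D,) utterance-level speaker embedding

def multihead_attentive_pool(frames, W):
    """Multi-head variant: one scoring vector per head; pooled outputs are
    concatenated, so H heads yield an (H * D,) embedding."""
    return np.concatenate([attentive_pool(frames, w) for w in W])
```

With a zero scoring vector the weights are uniform and the pooling reduces to the plain average, which makes the relationship between the two schemes explicit.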
5

Bahmaninezhad, Fahimeh, Chunlei Zhang, and John H. L. Hansen. "An investigation of domain adaptation in speaker embedding space for speaker recognition." Speech Communication 129 (May 2021): 7–16. http://dx.doi.org/10.1016/j.specom.2021.01.001.

6

Li, Wenjie, Pengyuan Zhang, and Yonghong Yan. "TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition." Electronics Letters 55, no. 14 (July 2019): 816–19. http://dx.doi.org/10.1049/el.2019.1228.

7

Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. "Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification." Applied Sciences 9, no. 16 (August 11, 2019): 3295. http://dx.doi.org/10.3390/app9163295.

Abstract:
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of the temporal dimension. Our system replaces this reduction mechanism by a phonetic phrase alignment model to keep the temporal structure of each phrase since the phonetic information is relevant in the verification task. Moreover, we can apply a convolutional neural network as front-end, and, thanks to the alignment process being differentiable, we can train the network to produce a supervector for each utterance that will be discriminative to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes the phrase and speaker information providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar size networks that make use of the global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on the RSR2015-Part II. To our knowledge, this system achieves the best published results obtained on this second part.
8

LIANG, Chunyan, Lin YANG, Qingwei ZHAO, and Yonghong YAN. "Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2572–76. http://dx.doi.org/10.1587/transinf.e95.d.2572.

9

Lin, Weiwei, Man-Wai Mak, Na Li, Dan Su, and Dong Yu. "A Framework for Adapting DNN Speaker Embedding Across Languages." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2810–22. http://dx.doi.org/10.1109/taslp.2020.3030499.

10

Byun, Jaeuk, and Jong Won Shin. "Monaural Speech Separation Using Speaker Embedding From Preliminary Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2753–63. http://dx.doi.org/10.1109/taslp.2021.3101617.


Dissertations on the topic "Speaker embedding"

1

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.

Abstract:
Speaker diarization is the task of determining 'who spoke when?' in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our research focuses on existing speaker diarization algorithms; in particular, the thesis targets the differences between supervised and unsupervised methods. The aim of this thesis is to examine the state-of-the-art algorithms and analyze which algorithm is most suitable for our application scenarios. Its main contributions are (1) an empirical study of speaker diarization algorithms; (2) appropriate corpus data pre-processing; (3) an audio embedding network for creating d-vectors; (4) experiments on different algorithms and corpora and a comparison of them; (5) a recommendation matching our requirements. The empirical study shows that, for the embedding extraction module, because neural networks can be trained with big datasets, diarization performance can be significantly improved by replacing i-vectors with d-vectors. Moreover, the differences between supervised and unsupervised methods lie mostly in the clustering module. The thesis uses only d-vectors as the input to the diarization network and selects two main algorithms as comparison objects: Spectral Clustering represents the unsupervised methods, and the Unbounded Interleaved-state Recurrent Neural Network (UIS-RNN) represents the supervised methods.
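The unsupervised clustering stage of such a pipeline can be sketched with a simple cosine-similarity k-means over L2-normalized d-vectors. This is an illustrative stand-in only; the thesis compares spectral clustering and UIS-RNN, neither of which is reproduced here:

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cluster_dvectors(dvecs, k, iters=20, seed=0):
    """Spherical k-means over L2-normalised d-vectors: assign each segment
    embedding to the center with highest cosine similarity, then re-estimate
    the centers, yielding one speaker label per segment."""
    rng = np.random.default_rng(seed)
    X = l2norm(np.asarray(dvecs, float))
    centers = X[rng.choice(len(X), k, replace=False)]  # random distinct seeds
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = (X @ centers.T).argmax(axis=1)  # cosine-similarity assignment
        for j in range(k):
            if (labels == j).any():
                centers[j] = l2norm(X[labels == j].mean(axis=0))
    return labels
```

Because d-vectors from the same speaker cluster tightly under cosine similarity, even this simple scheme groups segments by speaker when the number of speakers k is known.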
2

Lukáč, Peter. „Verifikace osob podle hlasu bez extrakce příznaků“. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.

Abstract:
Speaker verification is a field that is continually being modernized and improved, striving to meet the demands placed on it in application areas such as authorization systems, forensic analysis, etc. Improvements are achieved thanks to advances in deep learning, the creation of new training and testing datasets, and various speaker verification competitions and workshops. In this work we examine models for speaker verification without feature extraction. Using raw audio as model input simplifies input processing, which lowers the computational and memory requirements and reduces the number of hyperparameters needed to create features from recordings, which affect the results. Currently, models without feature extraction do not reach the results of models with feature extraction. We experiment with modern techniques on baseline models and try to improve their accuracy. Experiments with modern techniques considerably improved the results of the baseline models, but we still did not reach the results of the improved model with feature extraction. The improvement is, however, sufficient for building a fusion with that model. Finally, we discuss the achieved results and propose improvements based on them.
3

Fahlström, Myrman Arvid. „Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks“. Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.

Abstract:
Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to find a linear transformation of input posteriorgrams, extracted from the unsupervised model, to lower-dimensional posteriorgrams. The siamese model makes use of same-category and different-category speech fragment pairs obtained through unsupervised term discovery. After training, the model is converted into an exact partitioning of the posteriorgrams. We evaluate the model on the minimal-pair ABX task in the context of the Zero Resource Speech Challenge. We are able to demonstrate that our method significantly reduces the dimensionality of standard Gaussian mixture model posteriorgrams, while also making them more speaker invariant. This suggests that the model may be viable as a general post-processing step to improve probabilistic acoustic features obtained by unsupervised learning.
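The final step described in the abstract, converting the trained linear siamese transformation into an exact partitioning of the latent classes, amounts to summing the posterior mass of all classes assigned to the same partition cell. A minimal sketch, with the class counts and the partition itself invented for illustration:

```python
import numpy as np

def partition_posteriorgram(post, partition):
    """Collapse a (T, C) posteriorgram over C latent classes into a (T, K)
    posteriorgram by summing the columns whose classes share a partition cell."""
    post = np.asarray(post, float)
    partition = np.asarray(partition)
    C = post.shape[1]
    K = partition.max() + 1
    M = np.zeros((C, K))
    M[np.arange(C), partition] = 1.0  # indicator matrix: class -> cell
    return post @ M
```

Since each column's mass is moved whole into exactly one cell, every frame's posteriors still sum to one, so the output is again a valid (lower-dimensional) posteriorgram.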
4

Yin, Chung-Ko (尹崇珂). "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.

5

Huang, Che-Ching (黃喆青). "Speaker Change Detection using Speaker and Articulatory Feature Embeddings." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.

Abstract:
Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering, academic year 107.
Nowadays, with the improvement and advancement of many related technologies for voice processing, voice-interactive software and products have become more and more popular. For multi-speaker dialogue audio, speaker change point detection is needed as a speech pre-processing step before further analysis and processing. Most past research on speaker change point detection is based on acoustic features. The method proposed in this thesis additionally provides speaker information from the perspective of articulatory features; the difference in pronunciation characteristics improves the accuracy of speaker change point detection and achieves a complementary effect. In this thesis, we use a convolutional neural network to train a speaker embedding model on acoustic features extracted from speech, obtaining a vector that represents the characteristics of the speaker. In addition, a model of articulatory features (AFs) is trained, and a multi-layer perceptron network is used to extract the AF embedding of speech features. Finally, using these two vectors, another multi-layer perceptron network is used to train the speaker change detection model, which helps determine the exact position of a change point. Two speech databases were utilized in this thesis: VoxCeleb2, a corpus widely used in the field of speaker identification, and LibriSpeech, a corpus widely used in the fields of speech and speaker recognition. We mainly trained three models: the speaker embedding model, the AF embedding model, and the speaker change detection model. The speaker embedding model was trained and evaluated on VoxCeleb2; the articulatory feature embedding model was trained and evaluated on LibriSpeech; and the speaker change detection model was trained on a dialogue corpus composed from the VoxCeleb2 database. In the speaker change detection task, the experimental results showed that the proposed method could reduce the false alarm rate by 1.94% and increase accuracy by 1.1%, precision by 2.04%, and F1 score by 0.16%. The experimental results show that the proposed method is superior to the traditional method and can be applied to products that require speaker change detection.
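The fusion step, combining speaker and articulatory-feature embeddings to locate change points, can be sketched with a similarity threshold over adjacent analysis windows. This is an illustrative stand-in: the thesis trains a multi-layer perceptron on the two embeddings, whereas the fixed weighted-cosine rule below is an assumption made for brevity:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_changes(spk_embs, af_embs, thresh=0.5, alpha=0.5):
    """Flag a speaker change between windows i-1 and i when the weighted
    combination of speaker-embedding and AF-embedding similarity drops
    below the threshold. Returns the indices of change points."""
    changes = []
    for i in range(1, len(spk_embs)):
        sim = (alpha * cosine(spk_embs[i - 1], spk_embs[i])
               + (1 - alpha) * cosine(af_embs[i - 1], af_embs[i]))
        if sim < thresh:
            changes.append(i)
    return changes
```

The weight alpha controls how much the decision relies on voice identity versus pronunciation characteristics, which is the complementarity the thesis exploits.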

Books on the topic "Speaker embedding"

1

Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.

Abstract:
Slurs are incendiary terms—many deny that sentences containing them can ever be true. And utterances where they occur embedded within normally “quarantining” contexts, like conditionals and indirect reports, can still seem offensive. At the same time, others find that sentences containing slurs can be true; and there are clear cases where embedding does inoculate a speaker from the slur’s offensiveness. This chapter argues that four standard accounts of the “other” element that differentiates slurs from their more neutral counterparts—semantic content, perlocutionary effect, presupposition, and conventional implicature—all fail to account for this puzzling mixture of intuitions. Instead, it proposes that slurs make two distinct, coordinated contributions to a sentence’s conventional communicative role.

Book chapters on the topic "Speaker embedding"

1

Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition." In Graph Embedding for Pattern Analysis, 229–60. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.

2

Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 369–76. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.

3

Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification." In Communications in Computer and Information Science, 79–86. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.

4

Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding." In Speech and Computer, 50–57. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.

5

Tkachenko, Maxim, Alexander Yamshinin, Mikhail Kotov, and Marina Nastasenko. "Lightweight Embeddings for Speaker Verification." In Speech and Computer, 687–96. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_70.

6

Cyrta, Pawel, Tomasz Trzciński, and Wojciech Stokowiec. "Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings." In Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017, 107–17. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-67220-5_10.

7

Ülgen, İsmail Rasim, Mustafa Erden, and Levent M. Arslan. "Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme." In Speech and Computer, 826–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_74.

8

Desgrippes, Magalie, and Amelia Lambelet. "3. On the Sociolinguistic Embedding of Portuguese Heritage Language Speakers in Switzerland: Socio-Economic Status and Home Literacy Environment (HELASCOT Project)." In Heritage and School Language Literacy Development in Migrant Children, edited by Raphael Berthele and Amelia Lambelet, 34–57. Bristol, Blue Ridge Summit: Multilingual Matters, 2017. http://dx.doi.org/10.21832/9781783099054-004.

9

Millikan, Ruth. "Embedding Language in the World." In Singular Thought and Mental Files, 251–64. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198746881.003.0012.

Abstract:
Direct reference theories hold that nothing beyond reference is carried from speaker to hearer by singular terms. The chapter argues the same is true of common nouns and most other extensional terms such as terms for properties, places, events, and actions. None of these terms carry descriptions, grasp of paradigm property sets, inferential mandates, or anything else to be “loosened” or “tightened” by pragmatic inference. Both thought and language are directly structured by the structure of the world itself, not by peculiarities of the human mind and not by convention. The route from speech to hearer understanding is indirect, passing, typically, through the hearer’s prior grasp of world structure, a structure that hearers may have idiosyncratic ways of grasping. They may have quite different ways of identifying the same thing; that is, different ways of recognizing when new natural or intentional information about the same is arriving at the sensory surfaces.
10

Kirk-Giannini, Cameron Domenico, and Ernie Lepore. "Attributions of Attitude (May 22, 1970)." In The Structure of Truth, 66–81. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198842491.003.0005.

Abstract:
The subject of Lecture IV is attributions of attitude. In it, Davidson extends his theory of indirect quotation, which had appeared in 1968, to propositional attitude ascriptions more generally. He begins by criticizing rival accounts due to Quine, Scheffler, Church, and Frege. His positive proposal turns on the idea that the complementizer clauses embedded in ascriptions of attitude are not semantically a part of the embedding sentence. According to the paratactic account he favors, attributions of attitude involve demonstrative reference to an utterance of the speaker’s, which is claimed to stand in some relation to some utterance or attitude of the ascribee.

Conference papers on the topic "Speaker embedding"

1

Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding." In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.

2

Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.

3

Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.

Abstract:
Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advancement in Deep Learning has sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity reranking, evaluation setting, etc. Based on the tradeoffs of different models, we propose a new generative dialogue model conditioned on speakers as well as context history that outperforms all previous models on both retrieval and generative metrics. Our findings indicate that pretraining speaker embeddings on larger datasets, as well as bootstrapping word and speaker embeddings, can significantly improve performance (up to 3 points in perplexity), and that promoting diversity in using Mutual Information based techniques has a very strong effect in ranking metrics.
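Conditioning a generative dialogue model on the speaker, as described in the abstract, is often implemented by attaching a speaker embedding to every decoder input step. A minimal sketch; the concatenation strategy and the dimensions below are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

def condition_on_speaker(word_embs, speaker_emb):
    """Concatenate a fixed speaker embedding to each word embedding so that
    every decoder input step carries the speaker identity.
    word_embs: (T, D_word); speaker_emb: (D_spk,) -> (T, D_word + D_spk)."""
    word_embs = np.asarray(word_embs, float)
    rep = np.tile(speaker_emb, (word_embs.shape[0], 1))  # repeat per time step
    return np.concatenate([word_embs, rep], axis=1)
```

Pretraining the speaker embedding table on a larger corpus, as the abstract reports, would then amount to initializing `speaker_emb` from that table rather than from scratch.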
4

Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.

5

Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.

6

Georges, Munir, Jonathan Huang, and Tobias Bocklet. "Compact Speaker Embedding: lrx-Vector." In Interspeech 2020. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/interspeech.2020-2106.

7

Toruk, Mesut, Gokhan Bilgin, and Ahmet Serbes. "Speaker Diarization using Embedding Vectors." In 2020 28th Signal Processing and Communications Applications Conference (SIU). IEEE, 2020. http://dx.doi.org/10.1109/siu49456.2020.9302162.

8

Karam, Zahi N., and William M. Campbell. "Graph-embedding for speaker recognition." In Interspeech 2010. ISCA: ISCA, 2010. http://dx.doi.org/10.21437/interspeech.2010-726.

9

Wang, Po-Chin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, and Shan-Wen Hsiao. "Improving Embedding-based Neural-Network Speaker Recognition." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA: ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-8.

10

Yamamoto, Hitoshi, Kong Aik Lee, Koji Okabe, and Takafumi Koshinaka. "Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding." In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1508.
