Academic literature on the topic 'Diarisation de la parole'

Author: Grafiati

Published: 30 November 2024

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Diarisation de la parole.'

Journal articles on the topic "Diarisation de la parole"

Sun, Guangzhi, Chao Zhang, and Philip C. Woodland. "Combination of deep speaker embeddings for diarisation." Neural Networks 141 (September 2021): 372–84. http://dx.doi.org/10.1016/j.neunet.2021.04.020.

Zelenák, M., and J. Hernando. "Speaker overlap detection with prosodic features for speaker diarisation." IET Signal Processing 6, no. 8 (October 1, 2012): 798–804. http://dx.doi.org/10.1049/iet-spr.2011.0233.

Cassetta, Michele. "Parole, parole, parole." Dental Cadmos 87, no. 01 (September 2019): 588. http://dx.doi.org/10.19256/d.cadmos.09.2019.08.

Chiari, Alexis. "Parole parole." Feuillets psychanalytiques N° 2, no. 1 (September 21, 2017): 65–77. http://dx.doi.org/10.3917/fpsy.002.0065.

Richmond, J. L., and B. J. Halkon. "Speaker Diarisation of Vibroacoustic Intelligence from Drone Mounted Laser Doppler Vibrometers." Journal of Physics: Conference Series 2041, no. 1 (October 1, 2021): 012011. http://dx.doi.org/10.1088/1742-6596/2041/1/012011.

Tyszler, Jean-Jacques. "Parole vide, parole pleine, parole imposée." Journal français de psychiatrie 45, no. 1 (2017): 70. http://dx.doi.org/10.3917/jfp.045.0070.

Schiavinato, Jacques. "Parole égarée, parole retrouvée." Revue de psychothérapie psychanalytique de groupe 28, no. 1 (1997): 115–27. http://dx.doi.org/10.3406/rppg.1997.1366.

Abstract:

Lost word, recovered word. The journey of an adolescent taking part in a psychodrama group. How are the processes at work in this group mobilized? How do the intersubjective relations among the group members contribute to intrasubjective working-through? How can one discern the psychic work done by this adolescent in the transference and countertransference movements between him, the other group members, and the therapists? How, more specifically, does psychodrama, through its function of representation, allow this adolescent to project onto the stage the way he may have been mistreated by his internal objects? And how, finally, is a work of decondensation and reorganization of his psychic life accomplished?

Bouville, Jean-Marc. "Parole d’enfant, parole à l’enfant, parole sur l’enfant." Revue de l'enfance et de l'adolescence 94, no. 2 (2016): 7. http://dx.doi.org/10.3917/read.094.0007.

Aguiar, Flavio. "Macounaïma : parole perdue, parole retrouvée." Études françaises 28, no. 2-3 (1992): 59. http://dx.doi.org/10.7202/035881ar.

Tournon, André. "Parole de badin, parole irrécusable." Réforme, Humanisme, Renaissance 76, no. 1 (2013): 107–18. http://dx.doi.org/10.3406/rhren.2013.3294.

Dissertations / Theses on the topic "Diarisation de la parole"

Cui, Can. "Séparation, diarisation et reconnaissance de la parole conjointes pour la transcription automatique de réunions." Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0103.

Abstract:

Far-field microphone-array meeting transcription is particularly challenging due to overlapping speech, ambient noise, and reverberation. To address these issues, we explored three approaches. First, we employ a multichannel speaker separation model to isolate individual speakers, followed by a single-channel, single-speaker automatic speech recognition (ASR) model to transcribe the separated and enhanced audio. This method effectively enhances speech quality for ASR. Second, we propose an end-to-end multichannel speaker-attributed ASR (MC-SA-ASR) model, which builds on an existing single-channel SA-ASR model and incorporates a multichannel Conformer-based encoder with multi-frame cross-channel attention (MFCCA). Unlike traditional approaches that require a multichannel front-end speech enhancement model, the MC-SA-ASR model handles far-field microphones in an end-to-end manner. We also experimented with different input features, including Mel filterbank and phase features, for that model. Lastly, we incorporate a multichannel beamforming and enhancement model as a front-end processing step, followed by a single-channel SA-ASR model to process the enhanced multi-speaker speech signals. We tested different fixed, hybrid, and fully neural network-based beamformers and proposed to jointly optimize the neural beamformer and SA-ASR models using the training objective for the latter. In addition to these methods, we developed a meeting transcription pipeline that integrates voice activity detection, speaker diarization, and SA-ASR to process real meeting recordings effectively. Experimental results indicate that, while using a speaker separation model can enhance speech quality, separation errors can propagate to ASR, resulting in suboptimal performance. A guided speaker separation approach proves to be more effective.
Our proposed MC-SA-ASR model demonstrates the effectiveness of integrating multichannel information and the shared information between the ASR and speaker blocks. Experiments with different input features reveal that models trained with Mel filterbank features perform better in terms of word error rate (WER) and speaker error rate (SER) when the number of channels and speakers is low (2 channels with 1 or 2 speakers). However, for settings with 3 or 4 channels and 3 speakers, models trained with additional phase information outperform those using only Mel filterbank features. This suggests that phase information can enhance ASR by leveraging localization information from multiple channels. Although MFCCA-based MC-SA-ASR outperforms the single-channel SA-ASR and MC-ASR models without a speaker block, the joint beamforming and SA-ASR model further improves the performance. Specifically, joint training of the neural beamformer and SA-ASR yields the best performance, indicating that improving speech quality might be a more direct and efficient approach than using an end-to-end MC-SA-ASR model for multichannel meeting transcription. Furthermore, the study of the real meeting transcription pipeline underscores the potential for better end-to-end models. In our investigation on improving speaker assignment in SA-ASR, we found that the speaker block does not effectively help improve the ASR performance. This highlights the need for improved architectures that more effectively integrate ASR and speaker information.
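
The abstract above reports results as word error rate (WER). As a reader's aid, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the thesis's own scoring tools and text normalisation are not described in the abstract and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Illustrative only: tokens are split on whitespace, with no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.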

Soldi, Giovanni. "Diarisation du locuteur en temps réel pour les objets intelligents." Electronic Thesis or Diss., Paris, ENST, 2016. http://www.theses.fr/2016ENST0061.

Abstract:

On-line speaker diarization aims to detect “who is speaking now” in a given audio stream. The majority of proposed on-line speaker diarization systems have focused on less challenging domains, such as broadcast news and plenary speeches, characterised by long speaker turns and low spontaneity. The first contribution of this thesis is the development of a completely unsupervised adaptive on-line diarization system for challenging and highly spontaneous meeting data. Because the resulting diarization error rates are high, a semi-supervised approach to on-line diarization is proposed, whereby speaker models are seeded with a modest amount of manually labelled data and adapted by an efficient incremental maximum a-posteriori (MAP) adaptation procedure. The error rates obtained may be low enough to support practical applications. The second part of the thesis addresses the problem of phone normalisation when dealing with short-duration speaker modelling. First, Phone Adaptive Training (PAT), a recently proposed technique, is assessed and optimised at the speaker modelling level and in the context of automatic speaker verification (ASV). It is then further developed towards a completely unsupervised system using automatically generated acoustic class transcriptions, whose number is controlled by regression tree analysis. PAT delivers significant improvements in the performance of a state-of-the-art iVector ASV system even when accurate phonetic transcriptions are not available.
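
The abstract above mentions seeding speaker models and adapting them with an incremental MAP procedure. The sketch below illustrates the general idea with the simplest possible case: a mean-only MAP update of a single Gaussian, where a prior pseudo-count (often called a relevance factor) controls how quickly new frames pull the mean away from its seed. The names `SpeakerMean` and `map_update` and the single-Gaussian model are assumptions for illustration; the thesis's actual speaker models and update rule are not specified in the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpeakerMean:
    """Hypothetical one-Gaussian speaker model: a mean plus its evidence weight."""
    mean: List[float]
    weight: float  # prior pseudo-count (relevance factor) plus frames seen so far


def map_update(model: SpeakerMean, frames: Sequence[Sequence[float]]) -> SpeakerMean:
    """Mean-only MAP update with a new batch of feature frames.

    The new mean is the weighted average of the current mean (weighted by the
    evidence accumulated so far) and the batch frames (one unit of weight each).
    Applying this batch after batch gives the same mean as one MAP estimate
    computed on all frames at once.
    """
    n = len(frames)
    if n == 0:
        return model
    dim = len(model.mean)
    batch_sum = [sum(frame[d] for frame in frames) for d in range(dim)]
    total = model.weight + n
    new_mean = [(model.weight * model.mean[d] + batch_sum[d]) / total for d in range(dim)]
    return SpeakerMean(new_mean, total)
```

A model seeded with `SpeakerMean([0.0, 0.0], 16.0)` moves halfway to the data after 16 frames; a larger initial weight makes the seed more resistant to early, possibly mislabelled, frames.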

Mariotte, Théo. "Traitement automatique de la parole en réunion par dissémination de capteurs." Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.

Abstract:

This thesis work focuses on automatic speech processing, and more specifically on speaker diarization. This task requires the signal to be segmented to identify events such as voice activity, overlapped speech, or speaker changes. This work tackles the scenario where the signal is recorded by a device located in the center of a group of speakers, as in meetings. These conditions lead to a degradation in signal quality due to the distance between the speakers and the device (distant speech). To mitigate this degradation, one approach is to record the signal using a microphone array. The resulting multichannel signal provides information on the spatial distribution of the acoustic field. Two lines of research are explored for speech segmentation using microphone arrays. The first introduces a method combining acoustic features with spatial features. We propose a new set of features based on the circular harmonics expansion. This approach improves segmentation performance under distant speech conditions while reducing the number of model parameters and improving robustness in case of change in the array geometry. The second proposes several approaches that combine channels using self-attention. Different models, inspired by an existing architecture, are developed. Combining channels also improves segmentation under distant speech conditions. Two of these approaches make feature extraction more interpretable. The proposed distant speech segmentation systems also improve speaker diarization. Channel combination shows poor robustness to changes in the array geometry during inference. To avoid this behavior, a learning procedure is proposed, which improves the robustness in case of array mismatch. Finally, we identified a gap in the public datasets available for distant multichannel automatic speech processing. An acquisition protocol is introduced to build a new dataset, integrating speaker position annotation in addition to speaker diarization. Thus, this work aims to improve the quality of multichannel distant speech segmentation. The proposed methods exploit the spatial information provided by microphone arrays while improving the robustness in case of array mismatch.

Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.

Abstract:

Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. 
A deeper insight into the segment quality is shown through this metric.
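
For readers unfamiliar with the Diarisation Error Rate (DER) discussed in the abstract above, the following is a minimal frame-level sketch: missed speech, false-alarm speech, and speaker confusion are summed and divided by the amount of reference speech, after choosing the one-to-one mapping between hypothesis and reference labels that maximises agreement. It assumes at most one speaker per frame, no overlap, no forgiveness collar, and a brute-force search over label mappings that is only practical for a handful of speakers. The function name and the frame-label representation are illustrative assumptions, not the scoring procedure used in the thesis.

```python
from itertools import permutations
from typing import Optional, Sequence


def diarisation_error_rate(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Frame-level DER; each item is a speaker label, or None for non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_ids = sorted({r for r, _ in both})
    hyp_ids = sorted({h for _, h in both})
    # Pad with None so that a hypothesis label may stay unmatched.
    targets = ref_ids + [None] * len(hyp_ids)

    # Try every one-to-one assignment of hypothesis labels to reference labels
    # and keep the one with the most correctly attributed frames.
    best_correct = 0
    for assignment in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, assignment))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / ref_speech
```

Because the score only counts mis-attributed time, two hypotheses with very different numbers of segments can obtain the same value, which is the kind of limitation that motivates a segment-based metric.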

Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.

Abstract:

This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and punctuation restoration. While segmentation and diarization may appear to be relatively simple tasks to describe, in practice we find that they are very challenging and are, in general, ill-defined problems. Therefore, we first provide a formalisation of each of the problems as the sub-division of speech within acoustic space and time. Here, we see that the task can become very difficult when we want to partition this domain into our target classes of speakers, whilst avoiding other classes that reside in the same space, such as phonemes. We present a theoretical framework for describing and discussing the tasks as well as introducing existing state-of-the-art methods and research. Current Speaker Diarization systems are notoriously sensitive to hyper-parameters and lack robustness across datasets. Therefore, we present a method which uses a series of oracle experiments to expose the limitations of current systems and to which system components these limitations can be attributed. We also demonstrate how Diarization Error Rate (DER), the dominant error metric in the literature, is not a comprehensive or reliable indicator of overall performance or of error propagation to subsequent downstream tasks. These results inform our subsequent research. We find that, as a precursor to Speaker Diarization, the task of Speech Segmentation is a crucial first step in the system chain.
Current methods typically do not account for the inherent structure of spoken discourse. As such, we explored a novel method which exploits an utterance-duration prior in order to better model the segment distribution of speech. We show how this method improves not only segmentation, but also the performance of subsequent speech recognition, machine translation and speaker diarization systems. Typical ASR transcriptions do not include punctuation and the task of enriching transcriptions with this information is known as ‘punctuation restoration’. The benefit is not only improved readability but also better compatibility with NLP systems that expect sentence-like units such as in conventional machine translation. We show how segmentation and diarization are related tasks that are able to contribute acoustic information that complements existing linguistically-based punctuation approaches. There is a growing demand for speech technology applications in the broadcast media domain. This domain presents many new challenges including diverse noise and recording conditions. We show that the capacity of existing GMM-HMM based speech segmentation systems is limited for such scenarios and present a Deep Neural Network (DNN) based method which offers a more robust speech segmentation method resulting in improved speech recognition performance for a television broadcast dataset. Ultimately, we are able to show that the speech segmentation is an inherently ill-defined problem for which the solution is highly dependent on the downstream task that it is intended for.

Kounadis-Bastian, Dionyssos. "Quelques contributions pour la séparation et la diarisation de sources audio dans des mélanges multicanaux convolutifs." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM012/document.

Abstract:

In this thesis we address the problem of audio source separation (ASS) for multichannel and underdetermined convolutive mixtures through probabilistic modeling. We focus on three aspects of the problem and make three contributions. Firstly, inspired by the empirically well-validated representation of an audio signal known as the local Gaussian signal model (LGM) with non-negative matrix factorization (NMF), we propose a Bayesian extension that overcomes some of the limitations of NMF. We incorporate this representation in a multichannel ASS framework and compare it with the state of the art in ASS, yielding promising results. Secondly, we study how to separate mixtures of moving sources and/or moving microphones. Movements make the acoustic path between sources and microphones time-varying. Time-varying audio mixtures are rarely addressed in the ASS literature. Thus, we begin from a state-of-the-art LGM-with-NMF method designed for separating time-invariant audio mixtures and propose an extension that uses a Kalman smoother to track the acoustic path across time. The proposed method is benchmarked against a block-wise adaptation of that state-of-the-art method (run on time segments) and delivers competitive results on both simulated and real-world mixtures. Lastly, we investigate the link between ASS and the task of audio diarisation. Audio diarisation is the recognition of the time intervals of activity of every speaker/source in the mix. Most state-of-the-art ASS methods assume that the sources emit ceaselessly, a hypothesis that can result in spurious signal estimates for a source in intervals where that source was not emitting. Our aim is for diarisation to aid ASS by indicating the emitting sources at each time frame. To that end, we design a joint framework for simultaneous diarisation and ASS that incorporates a hidden Markov model (HMM) to track the temporal activity of the sources within a state-of-the-art LGM-with-NMF ASS framework. We compare the proposed method with the state of the art in ASS and audio diarisation tasks. We obtain performance comparable to the state of the art in terms of separation and superior in terms of diarisation.

Tevissen, Yannis. "Diarisation multimodale : vers des modèles robustes et justes en contexte réel." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS014.

Abstract:

Speaker diarization, or the task of automatically determining "who spoke, when?" in an audio or video recording, is one of the pillars of modern conversation analysis systems. On television, broadcast content is very diverse and covers nearly every type of conversation, from calm discussions between two people to impassioned debates and interviews in war zones. The archiving and indexing of this content, carried out by the Newsbridge company, requires robust and fair processing methods. In this work, we present two new methods for improving systems' robustness via fusion approaches. The first method focuses on voice activity detection, a necessary pre-processing step for every diarization system. The second is a multimodal approach that takes advantage of the latest advances in natural language processing. We also show that recent advances in diarization systems make the use of speaker diarization realistic, even in critical sectors such as the analysis of large audiovisual archives or the home care of the elderly. Finally, this work presents a new method for evaluating the algorithmic fairness of speaker diarization, with the objective of making its use more responsible.

Ouni, Slim. "Parole Multimodale : de la parole articulatoire à la parole audiovisuelle." Habilitation à diriger des recherches, Université de Lorraine, 2013. http://tel.archives-ouvertes.fr/tel-00927119.

Abstract:

Spoken communication is inherently multimodal. The acoustic signal carries the auditory modality, and the image carries the visual and gestural modality (facial deformations). The speech signal is the consequence of deformations of the vocal tract under the movement of the jaw, lips, tongue, etc., which modulate the excitation signal produced by the vocal cords or by aerodynamic turbulence. These deformations are visible on the face (lips, cheeks, jaw) through the coordination of the various orofacial muscles and the skin deformation they induce. The visual modality provides information complementary to the acoustic signal, and it becomes indispensable when the acoustic signal is degraded, as is the case for hearing-impaired people or in noisy environments. Other modalities can be linked to speech, such as eyebrow movements and the various gestures that express emotion. This latter, suprasegmental modality can, like the visual modality, complement the acoustic or acoustic-visual message. This presentation describes my work on multimodal speech. The multimodal nature of spoken communication is addressed in two different ways: (1) studying the articulatory and acoustic components of speech. I am interested in the articulatory characterisation of sounds and in the link between the articulatory space and the acoustic space. In particular, I am interested in recovering the temporal evolution of the vocal tract from the acoustic signal (also called acoustic-to-articulatory inversion) and in studying the articulatory characterisation of speech through the analysis of articulatory data corpora. (2) studying the acoustic and visual components. In this context, I am interested in the effect of vocal tract deformation on the appearance of the face, which conveys the visual message. Acoustic-visual synthesis is a framework for studying this aspect. In addition, the study of audiovisual intelligibility helps to better understand the mechanisms of audiovisual communication, and also to evaluate the acoustic-visual synthesis system. Finally, I will present my research programme on expressive multimodal speech, which I propose to study as a whole, i.e. by considering the articulatory, acoustic and visual components, together with their intrinsic expressivity, simultaneously. In particular, I propose to address the modelling of the articulatory and facial dynamics of speech in order to produce speech combined with facial expressions.

Zwyssig, Erich Paul. "Speech processing using digital MEMS microphones." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8287.

Abstract:

The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones or tablets. Almost all analogue capacitive microphones are being replaced by digital silicon microphones or MEMS microphones. MEMS microphones perform differently to conventional analogue microphones. Their greatest disadvantage is significantly increased self-noise or decreased SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing, comparing conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection, speaker diarisation (who spoke when), speech separation and speech recognition are examined in detail. In order to carry out this research, different microphone arrays were built using digital MEMS microphones, and corpora were recorded to test existing algorithms and devise new ones. Some corpora that were created for the purpose of this research will be released to the public in 2013. It was found that the most commonly used VAD algorithm in current state-of-the-art diarisation systems is not the best-performing one, i.e. MLP-based voice activity detection consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording given audio data from a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the-art results. The performance of the algorithms and methods presented in this thesis was verified by comparing their output using speech recognition tools and simple MLLR adaptation, and the results are presented as word error rates, an easily comprehensible scale.
To summarise, using speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of the MEMS microphone can be compensated for with well established adaptation techniques such as MLLR. MEMS microphones do not affect voice activity detection and speaker diarisation performance.
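
For readers unfamiliar with the word error rate mentioned in this abstract, the following minimal sketch computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions for illustration; the thesis's own scoring tools are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.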

Vermigli, Vania <1975>. "Parole parole parole… On connait la chanson omaggio ad Alain Resnais e alla musica francese del ‘900." Master's Degree Thesis, Università Ca' Foscari Venezia, 2020. http://hdl.handle.net/10579/17114.

Abstract:

An exhilarating, dreamlike musical film that retraces the history of the finest French music of the twentieth century and its always original and innovative use in cinema. Romantic entanglements, intimate reflections on human weaknesses and satirical situations form the backdrop, in a contemporary Paris, to lip-synced performances of famous songs such as Paroles, Paroles..., Et moi dans mon coin, Je m'en fous pas mal and many others. Music becomes the instrument that evokes the moods, thoughts and emotions of the protagonists. An alternation between lies and reality. A different use of music, in constant connection with the images and the unfolding of the story. Alain Resnais proves himself once again a versatile director and an experimenter with cinematic language as a form of direct communication.

Books on the topic "Diarisation de la parole"

Pozzi, Antonia. Parole. Milano: Garzanti, 1989.

National Audit Office. Parole. London: Stationery Office, 2000.

Cenni, Alessandra, and Onorina Dino, eds. Parole. [Milan, Italy]: Garzanti, 2001.

Connecticut General Assembly, Legislative Program Review and Investigations Committee. Board of Parole and parole services. Hartford, CT: The Committee, 1993.

New York (State). Dept. of Audit and Control. Division of Parole, field parole services. [Albany, N.Y.]: The Office, 1990.

Cattani, Adelino. Come dirlo?: Parole giuste, parole belle. Casoria: Loffredo, 2008.

Cavalleri, Cesare. Persone & parole. Milano: Ares, 1989.

Lazzara, Vito. Parole monche. Torino: Genesi editrice, 1992.

Kästner, Erich. Parole Emil. München: Carl Hanser, 1998.

Fuschini, Francesco. Parole poverette. 2nd ed. Venezia: Marsilio, 1996.

Book chapters on the topic "Diarisation de la parole"

Levesque, Roger J. R. "Parole." In Encyclopedia of Adolescence, 2036–37. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4419-1695-2_685.

Antolak-Saper, Natalia. "Parole." In The Role of the Media in Criminal Justice Policy, 110–45. London: Routledge, 2022. http://dx.doi.org/10.4324/9781003220299-5.

Levesque, Roger J. R. "Parole." In Encyclopedia of Adolescence, 2711–12. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-33228-4_685.

Mitford, Jessica. "Parole." In The American Prison Business, 216–27. London: Routledge, 2023. http://dx.doi.org/10.4324/9781003327424-12.

Gottfredson, Michael R., and Don M. Gottfredson. "Parole Decisions." In Decision Making in Criminal Justice, 229–55. Boston, MA: Springer US, 1988. http://dx.doi.org/10.1007/978-1-4757-9954-5_9.

Ndiaye, Christiane. "Parole ouverte." In Introduction aux littératures francophones, 269–70. Montréal: Presses de l’Université de Montréal, 2004. http://dx.doi.org/10.4000/books.pum.10663.

Danon-Boileau, Laurent. "Parole associative, parole compulsive." In Des psychanalystes en séance, 28–34. Gallimard, 2016. http://dx.doi.org/10.3917/gall.tamet.2016.01.0028.

Soumahoro, Maboula. "Parole noire/Noire parole." In Racismes de France, 276–91. La Découverte, 2020. http://dx.doi.org/10.3917/dec.slaou.2020.01.0276.

"Parole." In Briefs of Leading Cases in Corrections, 259–86. 6th ed. New York: Routledge, 2016. http://dx.doi.org/10.4324/9781315531694-10.

"Parole." In Benchmark. I.B.Tauris, 2003. http://dx.doi.org/10.5040/9780755622566.ch-013.

Conference papers on the topic "Diarisation de la parole"

Bissig, Pascal, Klaus-Tycho Foerster, Simon Tanner, and Roger Wattenhofer. "Distributed discussion diarisation." In 2017 14th IEEE Annual Consumer Communications & Networking Conference (CCNC). IEEE, 2017. http://dx.doi.org/10.1109/ccnc.2017.7983281.

Zhang, Yue, Felix Weninger, Boqing Liu, Maximilian Schmitt, Florian Eyben, and Björn Schuller. "A Paralinguistic Approach To Speaker Diarisation." In MM '17: ACM Multimedia Conference. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3123266.3123338.

Garau, Giulia, Alfred Dielmann, and Hervé Bourlard. "Audio-visual synchronisation for speaker diarisation." In Interspeech 2010. ISCA, 2010. http://dx.doi.org/10.21437/interspeech.2010-704.

Kwon, Youngki, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, and Joon Son Chung. "Adapting Speaker Embeddings for Speaker Diarisation." In Interspeech 2021. ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-448.

Li, Qiujia, Florian L. Kreyssig, Chao Zhang, and Philip C. Woodland. "Discriminative Neural Clustering for Speaker Diarisation." In 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2021. http://dx.doi.org/10.1109/slt48900.2021.9383617.

Milner, Rosanna, and Thomas Hain. "Segment-oriented evaluation of speaker diarisation performance." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. http://dx.doi.org/10.1109/icassp.2016.7472721.

Milner, Rosanna, and Thomas Hain. "DNN-Based Speaker Clustering for Speaker Diarisation." In Interspeech 2016. ISCA, 2016. http://dx.doi.org/10.21437/interspeech.2016-126.

Sun, G., D. Liu, C. Zhang, and P. C. Woodland. "Content-Aware Speaker Embeddings for Speaker Diarisation." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414390.

Albanie, Samuel, Gul Varol, Liliane Momeni, Triantafyllos Afouras, Andrew Brown, Chuhan Zhang, Ernesto Coto, et al. "SeeHear: Signer Diarisation and a New Dataset." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414856.

Heo, Hee-Soo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, and Jee-Weon Jung. "High-Resolution Embedding Extractor for Speaker Diarisation." In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023. http://dx.doi.org/10.1109/icassp49357.2023.10097190.

Reports on the topic "Diarisation de la parole"

DEPARTMENT OF THE ARMY WASHINGTON DC. Boards, Commissions, and Committees: Army Clemency and Parole Board. Fort Belvoir, VA: Defense Technical Information Center, October 1998. http://dx.doi.org/10.21236/ada401997.

Polinsky, A. Mitchell, and Paul Riskind. Deterrence and the Optimal Use of Prison, Parole, and Probation. Cambridge, MA: National Bureau of Economic Research, May 2017. http://dx.doi.org/10.3386/w23436.

Anwar, Shamena, and Hanming Fang. Testing for Racial Prejudice in the Parole Board Release Process: Theory and Evidence. Cambridge, MA: National Bureau of Economic Research, July 2012. http://dx.doi.org/10.3386/w18239.

Kuziemko, Ilyana. Going Off Parole: How the Elimination of Discretionary Prison Release Affects the Social Cost of Crime. Cambridge, MA: National Bureau of Economic Research, September 2007. http://dx.doi.org/10.3386/w13380.

Health hazard evaluation report: HETA-92-0271-2349, District of Columbia, Board of Parole, Washington, D.C. U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, September 1993. http://dx.doi.org/10.26616/nioshheta9202712349.
