
Scientific literature on the topic "Visual speech model"

Author: Grafiati

Published on 10 March 2023


Contents

Journal articles
Theses
Books
Book chapters
Conference proceedings

Browse the thematic lists of journal articles, books, theses, conference reports, and other academic sources on the topic "Visual speech model".


Journal articles on the topic "Visual speech model"

1

Jia, Xi Bin, and Mei Xia Zheng. "Video Based Visual Speech Feature Model Construction." Applied Mechanics and Materials 182-183 (June 2012): 1367–71. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.1367.

Abstract:

This paper aims to give a solution for constructing a Chinese visual speech feature model based on HMMs. We propose and discuss three kinds of representation models of visual speech: lip geometrical features, lip motion features, and lip texture features. The model combines the advantages of local LBP and global DCT texture information, which shows better performance than a single feature. Likewise, the model that combines local LBP and geometrical information performs better than a single feature. By computing the recognition rate of the visemes from the model, the paper shows that the HMM, which describes the dynamics of speech, coupled with the combined feature describing global and local texture, is the best model.

2

Mishra, Saumya, Anup Kumar Gupta, and Puneet Gupta. "DARE: Deceiving Audio–Visual speech Recognition model." Knowledge-Based Systems 232 (November 2021): 107503. http://dx.doi.org/10.1016/j.knosys.2021.107503.

3

Brahme, Aparna, and Umesh Bhadade. "Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition." International Journal of Image and Graphics 20, no. 04 (October 2020): 2050029. http://dx.doi.org/10.1142/s0219467820500291.

Abstract:

In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze the effect of various visual speech units used to transcribe the visual speech on language recognition. We propose a new approach of word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. Also, as opposed to the traditional visemic approach, we propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. The results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases, for an English and Marathi digit classification task. The proposed approach is also evaluated for continuous speech input. The results show that a spoken language identification rate of 50% is possible using only 1 s of speech, even though the WRR using Visual Speech Recognition is below 10%. Also, there is an improvement of about 5% in language discrimination compared to traditional visemic approaches.

4

Metzger, Brian A., John F. Magnotti, Elizabeth Nesbitt, Daniel Yoshor, and Michael S. Beauchamp. "Cross-modal suppression model of speech perception: Visual information drives suppressive interactions between visual and auditory speech in pSTG." Journal of Vision 20, no. 11 (20 October 2020): 434. http://dx.doi.org/10.1167/jov.20.11.434.

5

Hazen, T. J. "Visual model structures and synchrony constraints for audio-visual speech recognition." IEEE Transactions on Audio, Speech and Language Processing 14, no. 3 (May 2006): 1082–89. http://dx.doi.org/10.1109/tsa.2005.857572.

6

Fagel, Sascha. "Merging methods of speech visualization." ZAS Papers in Linguistics 40 (1 January 2005): 19–32. http://dx.doi.org/10.21248/zaspil.40.2005.255.

Abstract:

The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by both the data-based and the rule-based articulation model. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties. An example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems like MIKETALK and VIDEO REWRITE concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads like BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
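
The displacement-vector step mentioned in this abstract can be illustrated with a small sketch. It is not MASSY's code: the function name `deform_vertices`, the articulator names, and all numeric values are hypothetical, and the abstract does not say how articulation parameters are scaled. The sketch only shows the general idea of moving each vertex from its rest position by a weighted sum of per-articulator displacement vectors.

```python
def deform_vertices(rest, displacements, weights):
    """Move each rest vertex by a linear combination of displacement vectors.

    rest          -- list of (x, y, z) rest positions
    displacements -- dict: articulator name -> list of (dx, dy, dz), one per vertex
    weights       -- dict: articulator name -> scalar articulation parameter
    """
    moved = []
    for i, vertex in enumerate(rest):
        # For every axis, add the weighted displacement of each active articulator.
        new_vertex = tuple(
            vertex[axis]
            + sum(w * displacements[name][i][axis] for name, w in weights.items())
            for axis in range(3)
        )
        moved.append(new_vertex)
    return moved


# Hypothetical two-vertex "head" with two made-up articulators.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displacements = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)],
    "lip_round": [(0.0, 0.0, 0.2), (-0.1, 0.0, 0.0)],
}
print(deform_vertices(rest, displacements, {"jaw_open": 0.5, "lip_round": 1.0}))
```

With an empty `weights` dictionary the function returns the rest positions unchanged, which corresponds to a neutral face in this toy setup.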


7

Loh, Marco, Gabriele Schmid, Gustavo Deco, and Wolfram Ziegler. "Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model." Journal of Cognitive Neuroscience 22, no. 2 (February 2010): 240–47. http://dx.doi.org/10.1162/jocn.2009.21202.

Abstract:

Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult in the nonspeech domain as compared to the speech domain. We constructed a biophysically realistic neural network model simulating this experimental evidence. We propose that a stronger connection between modalities in speech underlies the behavioral difference between the speech and the nonspeech domain. This could be the result of more extensive experience with speech stimuli. Because the match-to-sample paradigm does not allow us to draw conclusions concerning the integration of auditory and visual information, we also simulated two further conditions based on the same paradigm, which tested the integration of auditory and visual information within a single stimulus. New experimental data for these two conditions support the simulation results and suggest that audiovisual integration of discordant stimuli is stronger in speech than in nonspeech stimuli. According to the simulations, the connection strength between auditory and visual information, on the one hand, determines how well auditory information can be assigned to visual information, and on the other hand, it influences the magnitude of multimodal integration.


8

Yu, Wentao, Steffen Zeiler, and Dorothea Kolossa. "Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition." Sensors 22, no. 15 (23 July 2022): 5501. http://dx.doi.org/10.3390/s22155501.

Abstract:

Audio-visual speech recognition (AVSR) can significantly improve performance over audio-only recognition for small or medium vocabularies. However, current AVSR, whether hybrid or end-to-end (E2E), still does not appear to make optimal use of this secondary information stream as the performance is still clearly diminished in noisy conditions for large-vocabulary systems. We, therefore, propose a new fusion architecture—the decision fusion net (DFN). A broad range of time-variant reliability measures are used as an auxiliary input to improve performance. The DFN is used in both hybrid and E2E models. Our experiments on two large-vocabulary datasets, the Lip Reading Sentences 2 and 3 (LRS2 and LRS3) corpora, show highly significant improvements in performance over previous AVSR systems for large-vocabulary datasets. The hybrid model with the proposed DFN integration component even outperforms oracle dynamic stream-weighting, which is considered to be the theoretical upper bound for conventional dynamic stream-weighting approaches. Compared to the hybrid audio-only model, the proposed DFN achieves a relative word-error-rate reduction of 51% on average, while the E2E-DFN model, with its more competitive audio-only baseline system, achieves a relative word error rate reduction of 43%, both showing the efficacy of our proposed fusion architecture.
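
For readers unfamiliar with the metric, a relative word-error-rate reduction compares the drop in WER with the baseline WER. The sketch below uses the standard definition; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values: a baseline WER of 10.0% reduced to 4.9%.
print(f"{relative_wer_reduction(10.0, 4.9):.0%}")
```

A relative figure can look large even when the absolute change is small, so it is usually read together with the baseline WER.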


9

How, Chun Kit, Ismail Mohd Khairuddin, Mohd Azraai Mohd Razman, Anwar P. P. Abdul Majeed, and Wan Hasbullah Mohd Isa. "Development of Audio-Visual Speech Recognition using Deep-Learning Technique." MEKATRONIKA 4, no. 1 (27 June 2022): 88–95. http://dx.doi.org/10.15282/mekatronika.v4i1.8625.

Abstract:

Deep learning is a technique within artificial intelligence (AI) that simulates humans' learning behavior. Audio-visual speech recognition is important for the listener to truly understand the emotions behind the spoken words. In this thesis, two different deep learning models, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN), were developed to recognize the speech's emotion from the dataset. The PyTorch framework with the torchaudio library was used. Both models were given the same training, validation, testing, and augmented datasets. Training is stopped when the training loop reaches ten epochs, or when the validation loss does not improve for five epochs. In the end, the highest accuracy and lowest loss of the CNN model on the training dataset are 76.50% and 0.006029 respectively, while the DNN model achieved 75.42% and 0.086643 respectively. Both models were evaluated using a confusion matrix. In conclusion, the CNN model performs better than the DNN model, but needs to improve, as its accuracy on the testing dataset is low and its loss is high.
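
The stopping rule described in this abstract (at most ten epochs, or five epochs without improvement in validation loss) can be sketched as follows. This is an illustration only, not the authors' code: the function name is hypothetical, and it assumes that "improve" means strictly lower than the best validation loss seen so far, which the abstract does not specify.

```python
def epochs_before_stop(val_losses, max_epochs=10, patience=5):
    """Return how many epochs run before training stops.

    val_losses -- validation loss recorded after each epoch, in order
    max_epochs -- hard cap on the number of epochs (ten in the abstract)
    patience   -- epochs without improvement tolerated (five in the abstract)
    """
    best = float("inf")
    stale = 0  # consecutive epochs without a new best validation loss
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Made-up loss curve: the best value is reached at epoch 2, then no improvement.
print(epochs_before_stop([1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 0.5]))
```

In a real training loop the same counter would sit next to the per-epoch validation step, and the model weights from the best epoch would typically be kept.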


10

Holubenko, Nataliia. "Cognitive and Intersemiotic Model of the Visual and Verbal Modes in a Screen Adaptation to Literary Texts." World Journal of English Language 12, no. 6 (18 July 2022): 129. http://dx.doi.org/10.5430/wjel.v12n6p129.

Abstract:

The aim of the study is to examine screen adaptations from the perspective of cognitive and intersemiotic models of the visual and verbal modes. The purpose of the study is to express the specificity of a screen text, which is defined as a combination of three media: speech, image, and music. The scope is to demonstrate the general framework of intersemiotic translation from a new point of view, as a kind of transliteration. The research method relies on semiotic and stylistic analyses: methods of transformation from one sign system into another, applied to prose works with regard to their cognitive as well as narrative and stylistic features (Zhong, Chen, & Xuan, 2021). Thus, the study analyses specific relations between the verbal and visual modes in film adaptations of prose literature, such as a more detailed description of event episodes, the temporal structure of events, and the presentation of the author's and characters' thoughts, that is, their mental activity formulated in indirect speech and in inner speech that is shown only by the actor's intonation. The results of the study made it possible to show the types of inner speech in adaptations: the author's thoughts and the characters' thoughts, which are presented only by the verbal mode, and the visual mode's inner speech, which combines the character's voice and image. One can conclude that, taking into account intersemiotic relations between the visual and verbal spaces, it is possible to explain, for instance, how the words of characters are replaced by their facial expressions, gestures, or intonations.


Theses on the topic "Visual speech model"

1

Somasundaram, Arunachalam. "A facial animation model for expressive audio-visual speech." Columbus, Ohio: Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1148973645.

2

Van Wassenhove, Virginie. "Cortical dynamics of auditory-visual speech: a forward model of multisensory integration." College Park, Md.: University of Maryland, 2004. http://hdl.handle.net/1903/1871.

Abstract:

Thesis (Ph. D.) -- University of Maryland, College Park, 2004.
Thesis research directed by: Neuroscience and Cognitive Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.


3

Cosker, Darren. "Animation of a hierarchical image based facial model and perceptual analysis of visual speech." Thesis, Cardiff University, 2005. http://orca.cf.ac.uk/56003/.

Abstract:

In this thesis a hierarchical image-based 2D talking head model is presented, together with robust automatic and semi-automatic animation techniques, and a novel perceptual method for evaluating visual speech based on the McGurk effect. The novelty of the hierarchical facial model stems from the fact that sub-facial areas are modelled individually. To produce a facial animation, animations for a set of chosen facial areas are first produced, either by key-framing sub-facial parameter values or by using a continuous input speech signal, and then combined into a full facial output. Modelling hierarchically has several attractive qualities. It isolates variation in sub-facial regions from the rest of the face, and therefore provides a high degree of control over different facial parts along with meaningful image-based animation parameters. The automatic synthesis of animations may be achieved using speech not originally included in the training set. The model is also able to automatically animate pauses, hesitations and non-verbal (or non-speech related) sounds and actions. To automatically produce visual speech, two novel analysis and synthesis methods are proposed. The first method utilises a Speech-Appearance Model (SAM), and the second uses a Hidden Markov Coarticulation Model (HMCM) based on a Hidden Markov Model (HMM). To evaluate synthesised animations (irrespective of whether they are rendered semi-automatically or using speech), a new perceptual analysis approach based on the McGurk effect is proposed. This measure provides both an unbiased and quantitative method for evaluating talking head visual speech quality and overall perceptual realism. A combination of this new approach, along with other objective and perceptual evaluation techniques, is employed for a thorough evaluation of hierarchical model animations.

4

Theobald, Barry-John. "Visual speech synthesis using shape and appearance models." Thesis, University of East Anglia, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396720.

5

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Abstract:

Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted within to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM.
Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.

6

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.


7

Mukherjee, Niloy 1978. "Spontaneous speech recognition using visual context-aware language models." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/62380.

Abstract:

Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003.
Includes bibliographical references (p. 83-88).
The thesis presents a novel situationally-aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. An experimental task was created in which people were asked to refer, using speech alone, to objects arranged on a table top. During training, Fuse acquires a grammar and vocabulary from a "show-and-tell" procedure in which visual scenes are paired with verbal descriptions of individual objects. Fuse determines a set of visually salient words and phrases and associates them with a set of visual features. Given a new scene, Fuse uses the acquired knowledge to generate class-based language models conditioned on the objects present in the scene, as well as a spatial language model that predicts the occurrences of spatial terms conditioned on target and landmark objects. The speech recognizer in Fuse uses a weighted mixture of these language models to search for more likely interpretations of user speech in the context of the current scene. During decoding, the weights are updated using a visual attention model which redistributes attention over objects based on partially decoded utterances. The dynamic situationally-aware language models enable Fuse to jointly infer spoken language utterances underlying speech signals as well as the identities of the target objects they refer to. In an evaluation of the system, visual situationally-aware language modeling shows a significant decrease, of more than 30%, in speech recognition and understanding error rates. The underlying ideas of situation-aware speech understanding that have been developed in Fuse may be applied in numerous areas, including assistive and mobile human-machine interfaces.
by Niloy Mukherjee.
S.M.
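
The "weighted mixture of language models" mentioned in this abstract can be illustrated with a toy sketch. It is not Fuse's implementation: the function name, the two component models, and the weights are all hypothetical, the models are reduced to simple word-probability tables rather than class-based models conditioned on a scene, and the attention-driven weight updates described in the abstract are not modelled.

```python
def mixture_probability(word, models, weights):
    """Probability of `word` under a weighted mixture of word-probability tables.

    models  -- list of dicts mapping word -> probability
    weights -- list of non-negative mixture weights, one per model
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise the weights so the mixture is itself a probability distribution.
    return sum(
        (w / total) * model.get(word, 0.0) for model, w in zip(models, weights)
    )


# Made-up component models: one over object words, one over spatial terms.
object_lm = {"cup": 0.6, "ball": 0.4}
spatial_lm = {"left": 0.5, "above": 0.5}
print(mixture_probability("cup", [object_lm, spatial_lm], [0.75, 0.25]))
```

Shifting weight toward one component raises the probability of the words that component favours, which is the effect a scene-dependent reweighting would exploit.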


8

Kalantari, Shahram. "Improving spoken term detection using complementary information." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/90074/1/Shahram_Kalantari_Thesis.pdf.

Abstract:

This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. The use of visual information in the form of the speaker's lip movements in addition to audio, the use of the topic of the speech segments, and the expected frequency of words in the target speech domain are proposed. By using this complementary information, an improvement in STD performance has been achieved, which enables efficient search for key words in large collections of multimedia documents.

9

Deena, Salil Prashant. "Visual speech synthesis by learning joint probabilistic models of audio and video." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/visual-speech-synthesis-by-learning-joint-probabilistic-models-of-audio-and-video(bdd1a78b-4957-469e-8be4-34e83e676c79).html.

Abstract:

Visual speech synthesis deals with synthesising facial animation from an audio representation of speech. In the last decade or so, data-driven approaches have gained prominence with the development of Machine Learning techniques that can learn an audio-visual mapping. Many of these Machine Learning approaches learn a generative model of speech production using the framework of probabilistic graphical models, through which efficient inference algorithms can be developed for synthesis. In this work, the audio and visual parameters are assumed to be generated from an underlying latent space that captures the shared information between the two modalities. These latent points evolve through time according to a dynamical mapping and there are mappings from the latent points to the audio and visual spaces respectively. The mappings are modelled using Gaussian processes, which are non-parametric models that can represent a distribution over non-linear functions. The result is a non-linear state-space model. It turns out that the state-space model is not a very accurate generative model of speech production because it assumes a single dynamical model, whereas it is well known that speech involves multiple dynamics (e.g. different syllables) that are generally non-linear. In order to cater for this, the state-space model can be augmented with switching states to represent the multiple dynamics, thus giving a switching state-space model. A key problem is how to infer the switching states so as to model the multiple non-linear dynamics of speech, which we address by learning a variable-order Markov model on a discrete representation of audio speech. Various synthesis methods for predicting visual from audio speech are proposed for both the state-space and switching state-space models.
Quantitative evaluation, involving the use of error and correlation metrics between ground truth and synthetic features, is used to evaluate our proposed method in comparison to other probabilistic models previously applied to the problem. Furthermore, qualitative evaluation with human participants has been conducted to evaluate the realism, perceptual characteristics and intelligibility of the synthesised animations. The results are encouraging and demonstrate that by having a joint probabilistic model of audio and visual speech that caters for the non-linearities in audio-visual mapping, realistic visual speech can be synthesised from audio speech.

10

Ahmad, Nasir. "A motion based approach for audio-visual automatic speech recognition." Thesis, Loughborough University, 2011. https://dspace.lboro.ac.uk/2134/8564.

Abstract:

The research work presented in this thesis introduces novel approaches for both visual region of interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement that occurs during speech is used to isolate the mouth region in video sequences, and motion-based features obtained from this region are used to provide new visual features for audio-visual automatic speech recognition. The mouth region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest, namely the difference in luminance between successive images, block matching based motion vectors and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance when compared with the commonly-used appearance feature-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or discrete wavelet transform representations of the mouth region of the speaker. In this work, the image transform is explored from a new viewpoint of data discrimination, in contrast to the more conventional data preservation viewpoint. The main findings of this work are that audio-visual automatic speech recognition systems using the new features extracted from the frequency bands selected according to their discriminatory abilities generally outperform those using features designed for data preservation. To establish the noise robustness of the new features proposed in this work, their performance has been studied in the presence of a range of different types of noise and at various signal-to-noise ratios.
In these experiments, the audio-visual automatic speech recognition systems based on the new approaches were found to give superior performance both to audio-visual systems using appearance based features and to audio-only speech recognition systems.
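
Of the three motion representations named in this abstract, the difference in luminance between successive images is the simplest to show. The sketch below is an illustration under stated assumptions, not the thesis's method: the function name is hypothetical, frames are plain nested lists of luminance values, and the absolute difference is used although the abstract does not say whether the difference is signed.

```python
def luminance_difference(prev_frame, next_frame):
    """Per-pixel absolute luminance difference between two equally sized frames."""
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same height")
    diff = []
    for row_a, row_b in zip(prev_frame, next_frame):
        if len(row_a) != len(row_b):
            raise ValueError("frames must have the same width")
        diff.append([abs(b - a) for a, b in zip(row_a, row_b)])
    return diff


# Two made-up 2x2 luminance frames; only two pixels change between them.
frame_t0 = [[10, 20], [30, 40]]
frame_t1 = [[12, 20], [25, 40]]
print(luminance_difference(frame_t0, frame_t1))
```

Pixels that do not change between frames yield zero, so static parts of the face drop out and only moving regions such as the mouth contribute.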



Books on the topic "Visual speech model"

1

Stork, David G., and Marcus E. Hennecke, eds. Speechreading by humans and machines: Models, systems, and applications. Berlin: Springer, 1996.

2

Hidden Markov Models for Visual Speech Synthesis in Limited Data Environments. Storming Media, 2001.


3

Stork, David G., and Marcus E. Hennecke. Speechreading by Humans and Machines: Models, Systems, and Applications. Springer, 2010.

4

Stork, David G., and Marcus E. Hennecke. Speechreading by Humans and Machines: Models, Systems, and Applications. Springer London, Limited, 2013.

5

Antrobus, John S. How Does the Waking and Sleeping Brain Produce Spontaneous Thought and Imagery, and Why ? Sous la direction de Kalina Christoff et Kieran C. R. Fox. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780190464745.013.36.

Abstract:

Although mind-wandering and dreaming often appear as trivial or distracting cognitive processes, this chapter suggests that they may also contribute to the evaluation, sorting, and saving of representations of recent events of future value to an individual. But 50 years after spontaneous imagery—night dreaming—was first compared to concurrent cortical EEG, there is limited hard evidence on the neural processes that produce either visual dreaming imagery or the speech imagery of waking spontaneous thought. The authors propose here an outline of a neurocognitive model of such processes with suggestions for future research that may contribute to a better understanding of their utility.

6

Stork, David G., and Marcus E. Hennecke, eds. Speechreading by Humans and Machines: Models, Systems, and Applications (NATO ASI Series / Computer and Systems Sciences). Springer, 1996.

7

Raine, Michael. A New Form of Silent Cinema. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780190254971.003.0007.

Abstract:

Ozu Yasujiro wanted to make a “new form” of silent cinema before it disappeared, something sophisticated in a fragile medium that was forced to do obvious things. His goal was to create, for the first and only time in Japanese cinema, films in which audible dialogue was displaced in favor of the intertitle as a form of “visual repartee.” After Western cinema switched to the talkie and while Japan was in the process of converting, Ozu took advantage of the transition from benshi-dialogue to actor-dialogue cinema to invent something like Hollywood silent film: a visual mode of narration with musical accompaniment and speech carried as intertitles. Ozu used the “sound version” to shut the benshi up, allowing emotion in An Inn in Tokyo to “float” as the unspoken disappointment behind banal dialogue, heard synaesthetically in the rhythm of alternating titles and images in a lyrical mise en scène.

8

Tobin, Claudia. Modernism and Still Life. Edinburgh University Press, 2020. http://dx.doi.org/10.3366/edinburgh/9781474455138.001.0001.

Abstract:

The late nineteenth and early twentieth centuries have been characterised as the ‘age of speed’ but they also witnessed a reanimation of still life across different art forms. This book takes an original approach to still life in modern literature and the visual arts by examining the potential for movement and transformation in the idea of stillness and the ordinary. It proposes that still life can be understood not only as a genre of visual art but also as a mode of attentiveness and a way of being in the world. It ranges widely in its material, taking Cézanne and literary responses to his still life painting as its point of departure. It investigates constellations of writers, visual artists and dancers including D. H. Lawrence, Virginia Woolf, David Jones, Winifred Nicholson, Wallace Stevens, and lesser-known figures including Charles Mauron and Margaret Morris. Modernism and Still Life reveals that at the heart of modern art were forms of stillness that were intimately bound up with movement. The still life emerges charged with animation, vibration and rhythm, an unstable medium, unexpectedly vital and well suited to the expression of modern concerns.

9

Titus, Barbara. Hearing Maskanda. Bloomsbury Publishing Inc, 2022. http://dx.doi.org/10.5040/9781501377792.

Abstract:

Hearing Maskanda outlines how people make sense of their world through practicing and hearing maskanda music in South Africa. Having emerged in response to the experience of forced labour migration in the early 20th century, maskanda continues to straddle a wide range of cultural and musical universes. Maskanda musicians reground ideas, (hi)stories, norms, speech and beliefs that have been uprooted in centuries of colonial and apartheid rule by using specific musical textures, vocalities and idioms. With an autoethnographic approach of how she came to understand and participate in maskanda, Titus indicates some instances where her acts of knowledge formation confronted, bridged or invaded those of other maskanda participants. Thus, the book not only aims to demonstrate the epistemic importance of music and aurality but also the performative and creative dimension of academic epistemic approaches such as ethnography, historiography and music analysis, that aim towards conceptualization and (visual) representation. In doing so, the book unearths the colonialist potential of knowledge formation at large and disrupts modes of thinking and (academic) research that are globally normative.

10

Berressem, Hanjo. Félix Guattari's Schizoanalytic Ecology. Edinburgh University Press, 2020. http://dx.doi.org/10.3366/edinburgh/9781474450751.001.0001.

Abstract:

Félix Guattari’s Schizoanalytic Ecology argues that Guattari’s ecosophy, which it regards as a ‘schizoanalytic ecology’ or ‘schizoecology’ for short, is the most consistent conceptual spine of Guattari’s oeuvre. Engaging with the whole spectrum and range of Guattari’s, as well as Guattari and Deleuze’s works, it maintains that underneath Guattari’s staccato style, his hectic speeds and his conceptual acrobatics, lie a number of insistent questions and demands. How to make life on this planet better, more liveable, more in tune with and adequate to the planet’s functioning? How to do this without false romanticism or nostalgia? At the conceptual centre of the book lies the first comprehensive and in-depth analysis and explication of the diagrammatic meta-model that Guattari develops in his book Schizoanalytic Cartographies, his magnum opus and conceptual legacy. It is here that Guattari develops, in an extremely formalized manner, the schizoecological complementarity of what he calls ‘the given’ (the world) and of ‘the giving’ (the world’s creatures). After considering the implications of schizoecology for the fields of literature, the visual arts, architecture, and research, this book, which is the companion volume to Gilles Deleuze’s Luminous Philosophy, culminates in readings of Guattari’s explicitly ecological texts The Three Ecologies and Chaosmosis.

Book chapters on the topic "Visual speech model"

1

Grant, Ken W., and Joshua G. W. Bernstein. "Toward a Model of Auditory-Visual Speech Intelligibility." In Multisensory Processes, 33–57. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-10461-0_3.

2

MeiXia, Zheng, and Jia XiBin. "Joint LBP and DCT Model for Visual Speech." In Advances in Intelligent and Soft Computing, 101–7. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27866-2_13.

3

Akinpelu, Samson, and Serestina Viriri. "A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification." In Advances in Visual Computing, 419–30. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20716-7_33.

4

Deena, Salil, and Aphrodite Galata. "Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model." In Advances in Visual Computing, 89–100. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-10331-5_9.

5

Bothe, Hans H. "A visual speech model based on fuzzy-neuro methods." In Image Analysis and Processing, 152–58. Berlin, Heidelberg: Springer Berlin Heidelberg, 1995. http://dx.doi.org/10.1007/3-540-60298-4_251.

6

Baayen, R. Harald, Robert Schreuder, and Richard Sproat. "Morphology in the Mental Lexicon: A Computational Model for Visual Word Recognition." In Text, Speech and Language Technology, 267–93. Dordrecht: Springer Netherlands, 2000. http://dx.doi.org/10.1007/978-94-010-9458-0_9.

7

Seong, Thum Wei, Mohd Zamri Ibrahim, Nurul Wahidah Binti Arshad, and D. J. Mulvaney. "A Comparison of Model Validation Techniques for Audio-Visual Speech Recognition." In IT Convergence and Security 2017, 112–19. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6451-7_14.

8

Sun, Zhongbo, Yannan Wang, and Li Cao. "An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement." In MultiMedia Modeling, 722–28. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_60.

9

Xie, Lei, and Zhi-Qiang Liu. "Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition." In Advances in Machine Learning and Cybernetics, 994–1004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11739685_104.

10

Matthews, Iain, J. Andrew Bangham, Richard Harvey, and Stephen Cox. "A comparison of active shape model and scale decomposition based features for visual speech recognition." In Lecture Notes in Computer Science, 514–28. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054762.

Conference proceedings on the topic "Visual speech model"

1

Wang, Wupeng, Chao Xing, Dong Wang, Xiao Chen, and Fengyu Sun. "A Robust Audio-Visual Speech Enhancement Model." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9053033.

2

Tiippana, Kaisa, Ilmari Kurki, and Tarja Peromaa. "Applying the summation model in audiovisual speech perception." In The 14th International Conference on Auditory-Visual Speech Processing. ISCA, 2017. http://dx.doi.org/10.21437/avsp.2017-28.

3

Fan, Heng, Jinhai Xiang, Guoliang Li, and Fuchuan Ni. "Robust visual tracking via deep discriminative model." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. http://dx.doi.org/10.1109/icassp.2017.7952492.

4

Liu, Li, Gang Feng, and Denis Beautemps. "Inner Lips Parameter Estimation based on Adaptive Ellipse Model." In The 14th International Conference on Auditory-Visual Speech Processing. ISCA, 2017. http://dx.doi.org/10.21437/avsp.2017-15.

5

Israel Santos, Timothy, Andrew Abel, Nick Wilson, and Yan Xu. "Speaker-Independent Visual Speech Recognition with the Inception V3 Model." In 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2021. http://dx.doi.org/10.1109/slt48900.2021.9383540.

6

Edge, J. D., A. Hilton, and P. Jackson. "Model-based synthesis of visual speech movements from 3D video." In SIGGRAPH '09: Posters. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1599301.1599309.

7

Dinet, Eric, and Emmanuel Kubicki. "A selective attention model for predicting visual attractors." In ICASSP 2008 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2008. http://dx.doi.org/10.1109/icassp.2008.4517705.

8

Cosker, D. P. "Speaker-independent speech-driven facial animation using a hierarchical model." In International Conference on Visual Information Engineering (VIE 2003). Ideas, Applications, Experience. IEE, 2003. http://dx.doi.org/10.1049/cp:20030514.

9

Dai, Pingyang, Yanlong Luo, Weisheng Liu, Cuihua Li, and Yi Xie. "Robust visual tracking via part-based sparsity model." In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6637963.

10

Wu, Jinjian, Guangming Shi, Weisi Lin, and C. C. Jay Kuo. "Enhanced just noticeable difference model with visual regularity consideration." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. http://dx.doi.org/10.1109/icassp.2016.7471943.
