Academic literature on the topic 'Visual speech model'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual speech model.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Visual speech model"

1

Jia, Xi Bin, and Mei Xia Zheng. "Video Based Visual Speech Feature Model Construction." Applied Mechanics and Materials 182-183 (June 2012): 1367–71. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.1367.

Full text
Abstract:
This paper aims to give a solution for the construction of a Chinese visual speech feature model based on HMMs. We propose and discuss three kinds of representation models of visual speech: lip geometrical features, lip motion features, and lip texture features. The model that combines the advantages of local LBP and global DCT texture information shows better performance than either single feature. Equally, the model that combines the advantages of local LBP and geometrical information is better than either single feature. By computing the recognition rate of visemes with each model, the paper shows that an HMM describing the dynamics of speech, coupled with a combined feature describing the global and local texture, is the best model.
APA, Harvard, Vancouver, ISO, and other styles
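
The feature pipeline described in the abstract above (local LBP plus global DCT lip-texture cues fed to an HMM) can be prototyped with standard Python libraries. The snippet below is a minimal sketch, not the authors' implementation; the ROI extraction, feature dimensions, and the use of scikit-image, SciPy, and hmmlearn are assumptions.

```python
# Sketch: combined LBP + DCT lip-texture features fed to a Gaussian HMM,
# loosely following the idea of fusing local and global texture cues.
# Assumes `sequences` is a list of utterances, each a list of grayscale lip-ROI frames.
import numpy as np
from skimage.feature import local_binary_pattern
from scipy.fft import dctn
from hmmlearn import hmm

def frame_features(roi, lbp_points=8, lbp_radius=1, dct_coeffs=6):
    # Local texture: uniform-LBP histogram over the ROI.
    lbp = local_binary_pattern(roi, lbp_points, lbp_radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=lbp_points + 2,
                           range=(0, lbp_points + 2), density=True)
    # Global texture: low-frequency block of the 2-D DCT.
    dct = dctn(roi.astype(float), norm="ortho")[:dct_coeffs, :dct_coeffs].ravel()
    return np.concatenate([hist, dct])

def train_viseme_model(sequences, n_states=3):
    # One HMM per viseme class; a single class is shown here.
    X = np.vstack([np.vstack([frame_features(f) for f in seq]) for seq in sequences])
    lengths = [len(seq) for seq in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model
```

Classification would then score a test sequence under each per-viseme model and pick the highest log-likelihood.
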
2

Mishra, Saumya, Anup Kumar Gupta, and Puneet Gupta. "DARE: Deceiving Audio–Visual speech Recognition model." Knowledge-Based Systems 232 (November 2021): 107503. http://dx.doi.org/10.1016/j.knosys.2021.107503.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Brahme, Aparna, and Umesh Bhadade. "Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition." International Journal of Image and Graphics 20, no. 04 (October 2020): 2050029. http://dx.doi.org/10.1142/s0219467820500291.

Full text
Abstract:
In this paper, we describe our work in spoken language identification using Visual Speech Recognition (VSR) and analyze the effect of the visual speech units used to transcribe visual speech on language recognition. We propose a new approach of word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. Also, as opposed to the traditional visemic approach, we propose a holistic approach of using the signature of a whole word, referred to as a "Visual Word," as the visual speech unit for transcribing visual speech. The results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases, for an English and Marathi digit classification task. The proposed approach is also evaluated for continuous speech input. The results show that a spoken language identification rate of 50% is possible even though the WRR using visual speech recognition is below 10%, using only 1 s of speech. There is also an improvement of about 5% in language discrimination compared to traditional visemic approaches.
APA, Harvard, Vancouver, ISO, and other styles
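
The language-discrimination step described above (recognised word strings scored under per-language word-bigram models) can be illustrated with a toy sketch. The training corpora and smoothing constant below are placeholders, not the authors' data or configuration.

```python
# Sketch of the word-bigram language-discrimination step: each candidate
# language has a bigram model trained on word transcripts, and a recognised
# word sequence is assigned to the language with the highest log-probability.
from collections import defaultdict
import math

class BigramLM:
    def __init__(self, sentences, alpha=1.0):
        self.alpha = alpha
        self.vocab = set(w for s in sentences for w in s) | {"<s>", "</s>"}
        self.bigrams = defaultdict(int)
        self.unigrams = defaultdict(int)
        for s in sentences:
            words = ["<s>"] + list(s) + ["</s>"]
            for prev, cur in zip(words, words[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1

    def logprob(self, sentence):
        words = ["<s>"] + list(sentence) + ["</s>"]
        lp = 0.0
        for prev, cur in zip(words, words[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha        # add-one smoothing
            den = self.unigrams[prev] + self.alpha * len(self.vocab)
            lp += math.log(num / den)
        return lp

def identify_language(recognised_words, models):
    # `models` maps language name -> BigramLM trained on that language.
    return max(models, key=lambda lang: models[lang].logprob(recognised_words))

models = {"english": BigramLM([["one", "two", "three"], ["three", "two"]]),
          "marathi": BigramLM([["ek", "don", "teen"], ["teen", "don"]])}
print(identify_language(["two", "three"], models))   # -> "english"
```
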
4

Metzger, Brian A., John F. Magnotti, Elizabeth Nesbitt, Daniel Yoshor, and Michael S. Beauchamp. "Cross-modal suppression model of speech perception: Visual information drives suppressive interactions between visual and auditory speech in pSTG." Journal of Vision 20, no. 11 (October 20, 2020): 434. http://dx.doi.org/10.1167/jov.20.11.434.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Hazen, T. J. "Visual model structures and synchrony constraints for audio-visual speech recognition." IEEE Transactions on Audio, Speech and Language Processing 14, no. 3 (May 2006): 1082–89. http://dx.doi.org/10.1109/tsa.2005.857572.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Fagel, Sascha. "Merging methods of speech visualization." ZAS Papers in Linguistics 40 (January 1, 2005): 19–32. http://dx.doi.org/10.21248/zaspil.40.2005.255.

Full text
Abstract:
The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by both the data-based and the rule-based articulation model. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties. An example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems such as MIKETALK and VIDEO REWRITE concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads such as BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
APA, Harvard, Vancouver, ISO, and other styles
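
The displacement-vector scheme described in the abstract (each articulation parameter owns per-vertex displacement vectors, and a frame's mesh is the neutral mesh plus their weighted sum) reduces to a few lines of array arithmetic. The shapes, articulator names, and example weights below are assumptions for illustration only.

```python
# Sketch of the displacement-vector idea for the 3D synthetic head: a frame's
# mesh is the neutral mesh plus a linear combination of per-articulator
# displacement matrices. Shapes and names are placeholders.
import numpy as np

def deform_mesh(neutral_vertices, displacement_sets, articulation_weights):
    """neutral_vertices: (V, 3) neutral head mesh.
    displacement_sets: dict articulator -> (V, 3) displacement vectors.
    articulation_weights: dict articulator -> scalar control value for this frame."""
    vertices = neutral_vertices.copy()
    for articulator, weight in articulation_weights.items():
        vertices += weight * displacement_sets[articulator]
    return vertices

# Example: half-open jaw combined with slight lip rounding on a tiny dummy mesh.
V = 4
neutral = np.zeros((V, 3))
displacements = {"jaw_open": np.tile([0.0, -1.0, 0.0], (V, 1)),
                 "lip_round": np.tile([0.1, 0.0, 0.2], (V, 1))}
frame = deform_mesh(neutral, displacements, {"jaw_open": 0.5, "lip_round": 0.3})
```
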
7

Loh, Marco, Gabriele Schmid, Gustavo Deco, and Wolfram Ziegler. "Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model." Journal of Cognitive Neuroscience 22, no. 2 (February 2010): 240–47. http://dx.doi.org/10.1162/jocn.2009.21202.

Full text
Abstract:
Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult in the nonspeech domain as compared to the speech domain. We constructed a biophysically realistic neural network model simulating this experimental evidence. We propose that a stronger connection between modalities in speech underlies the behavioral difference between the speech and the nonspeech domain. This could be the result of more extensive experience with speech stimuli. Because the match-to-sample paradigm does not allow us to draw conclusions concerning the integration of auditory and visual information, we also simulated two further conditions based on the same paradigm, which tested the integration of auditory and visual information within a single stimulus. New experimental data for these two conditions support the simulation results and suggest that audiovisual integration of discordant stimuli is stronger in speech than in nonspeech stimuli. According to the simulations, the connection strength between auditory and visual information, on the one hand, determines how well auditory information can be assigned to visual information, and on the other hand, it influences the magnitude of multimodal integration.
APA, Harvard, Vancouver, ISO, and other styles
8

Yu, Wentao, Steffen Zeiler, and Dorothea Kolossa. "Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition." Sensors 22, no. 15 (July 23, 2022): 5501. http://dx.doi.org/10.3390/s22155501.

Full text
Abstract:
Audio-visual speech recognition (AVSR) can significantly improve performance over audio-only recognition for small or medium vocabularies. However, current AVSR, whether hybrid or end-to-end (E2E), still does not appear to make optimal use of this secondary information stream as the performance is still clearly diminished in noisy conditions for large-vocabulary systems. We, therefore, propose a new fusion architecture—the decision fusion net (DFN). A broad range of time-variant reliability measures are used as an auxiliary input to improve performance. The DFN is used in both hybrid and E2E models. Our experiments on two large-vocabulary datasets, the Lip Reading Sentences 2 and 3 (LRS2 and LRS3) corpora, show highly significant improvements in performance over previous AVSR systems for large-vocabulary datasets. The hybrid model with the proposed DFN integration component even outperforms oracle dynamic stream-weighting, which is considered to be the theoretical upper bound for conventional dynamic stream-weighting approaches. Compared to the hybrid audio-only model, the proposed DFN achieves a relative word-error-rate reduction of 51% on average, while the E2E-DFN model, with its more competitive audio-only baseline system, achieves a relative word error rate reduction of 43%, both showing the efficacy of our proposed fusion architecture.
APA, Harvard, Vancouver, ISO, and other styles
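
The fusion idea summarised above (per-stream outputs combined with time-variant reliability measures) can be sketched as a small feed-forward net. This is only an illustrative layer in the spirit of reliability-aware decision fusion; the layer sizes, inputs, and class name are assumptions and do not reproduce the published DFN architecture.

```python
# Sketch of reliability-aware decision fusion: per-stream state posteriors are
# concatenated with reliability measures and mapped to fused posteriors by a
# small feed-forward net. Sizes and inputs are assumptions.
import torch
import torch.nn as nn

class DecisionFusionNet(nn.Module):
    def __init__(self, n_states, n_reliability, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_states + n_reliability, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_states),
        )

    def forward(self, audio_logp, video_logp, reliability):
        # Each input has shape (batch, time, dim); fusion is applied frame by frame.
        x = torch.cat([audio_logp, video_logp, reliability], dim=-1)
        return torch.log_softmax(self.net(x), dim=-1)

# Example with random tensors standing in for real stream outputs.
dfn = DecisionFusionNet(n_states=500, n_reliability=8)
fused = dfn(torch.randn(2, 100, 500), torch.randn(2, 100, 500), torch.randn(2, 100, 8))
```
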
9

How, Chun Kit, Ismail Mohd Khairuddin, Mohd Azraai Mohd Razman, Anwar P. P. Abdul Majeed, and Wan Hasbullah Mohd Isa. "Development of Audio-Visual Speech Recognition using Deep-Learning Technique." MEKATRONIKA 4, no. 1 (June 27, 2022): 88–95. http://dx.doi.org/10.15282/mekatronika.v4i1.8625.

Full text
Abstract:
Deep learning is an artificial intelligence (AI) technique that simulates humans' learning behavior. Audio-visual speech recognition is important for the listener to truly understand the emotions behind the spoken words. In this thesis, two different deep learning models, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN), were developed to recognize the emotion of speech from the dataset. The PyTorch framework with the torchaudio library was used. Both models were given the same training, validation, testing, and augmented datasets. Training was stopped when the training loop reached ten epochs or when the validation loss did not improve for five epochs. The highest accuracy and lowest loss of the CNN model on the training dataset were 76.50% and 0.006029 respectively, while the DNN model achieved 75.42% and 0.086643 respectively. Both models were evaluated using a confusion matrix. In conclusion, the CNN model performs better than the DNN model, but further improvement is needed, as accuracy on the testing dataset is low and the loss is high.
APA, Harvard, Vancouver, ISO, and other styles
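
The stopping rule stated in the abstract (train for at most ten epochs, stop early if validation loss has not improved for five epochs) corresponds to a standard early-stopping loop. The sketch below assumes a generic PyTorch `model` and dataloaders; it is not the authors' training code.

```python
# Sketch of the described stopping rule: at most ten epochs, early stop after
# five epochs without validation-loss improvement. `model`, `train_loader`
# and `val_loader` are assumed to exist.
import torch

def train(model, train_loader, val_loader, max_epochs=10, patience=5, lr=1e-3):
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    best_val, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for features, labels in train_loader:
            optimiser.zero_grad()
            loss = criterion(model(features), labels)
            loss.backward()
            optimiser.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(f), l).item()
                           for f, l in val_loader) / len(val_loader)
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # validation loss stalled for `patience` epochs
    return best_val
```
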
10

Holubenko, Nataliia. "Cognitive and Intersemiotic Model of the Visual and Verbal Modes in a Screen Adaptation to Literary Texts." World Journal of English Language 12, no. 6 (July 18, 2022): 129. http://dx.doi.org/10.5430/wjel.v12n6p129.

Full text
Abstract:
The aim of the study is to examine screen adaptations from the perspective of cognitive and intersemiotic models of the visual and verbal modes. The purpose of the study is to express the specificity of a screen text, which is defined as a combination of three media: speech, image, and music. The scope is to demonstrate the general framework of an intersemiotic translation from a new point of view, as a transliteration. The method of the research refers to semiotic and stylistic analyses: methods of transformation from one sign system into another from prose works with regard to their cognitive as well as narrative and stylistic features (Zhong, Chen, & Xuan, 2021). Thus, the study analyses such specific relations between the verbal and visual modes in film adaptations of prose literature as a more detailed description of event episodes, events' temporal structure, and the presentation of the author's thoughts and characters' thoughts, that is, their mental activity formulated in indirect speech and inner speech that is shown only by the actor's intonation. The results of the study made it possible to show the types of inner speech in the adaptations: the author's thoughts and characters' thoughts, which are presented only by the verbal mode, and the visual mode's inner speeches, which combine the modes of the character's voice and image. One can conclude that, taking into account intersemiotic relations between the visual and verbal spaces, it is possible to explain, for instance, how the words of characters are replaced by their facial expressions, gestures, or intonations.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Visual speech model"

1

Somasundaram, Arunachalam. "A facial animation model for expressive audio-visual speech." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1148973645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Van Wassenhove, Virginie. "Cortical dynamics of auditory-visual speech: a forward model of multisensory integration." College Park, Md.: University of Maryland, 2004. http://hdl.handle.net/1903/1871.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2004.
Thesis research directed by: Neuroscience and Cognitive Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
3

Cosker, Darren. "Animation of a hierarchical image based facial model and perceptual analysis of visual speech." Thesis, Cardiff University, 2005. http://orca.cf.ac.uk/56003/.

Full text
Abstract:
In this thesis a hierarchical image-based 2D talking head model is presented, together with robust automatic and semi-automatic animation techniques, and a novel perceptual method for evaluating visual speech based on the McGurk effect. The novelty of the hierarchical facial model stems from the fact that sub-facial areas are modelled individually. To produce a facial animation, animations for a set of chosen facial areas are first produced, either by key-framing sub-facial parameter values or by using a continuous input speech signal, and then combined into a full facial output. Modelling hierarchically has several attractive qualities. It isolates variation in sub-facial regions from the rest of the face, and therefore provides a high degree of control over different facial parts along with meaningful image-based animation parameters. The automatic synthesis of animations may be achieved using speech not originally included in the training set. The model is also able to automatically animate pauses, hesitations and non-verbal (or non-speech-related) sounds and actions. To automatically produce visual speech, two novel analysis and synthesis methods are proposed. The first method utilises a Speech-Appearance Model (SAM), and the second uses a Hidden Markov Coarticulation Model (HMCM), based on a Hidden Markov Model (HMM). To evaluate synthesised animations (irrespective of whether they are rendered semi-automatically or using speech), a new perceptual analysis approach based on the McGurk effect is proposed. This measure provides both an unbiased and quantitative method for evaluating talking-head visual speech quality and overall perceptual realism. A combination of this new approach and other objective and perceptual evaluation techniques is employed for a thorough evaluation of hierarchical model animations.
APA, Harvard, Vancouver, ISO, and other styles
4

Theobald, Barry-John. "Visual speech synthesis using shape and appearance models." Thesis, University of East Anglia, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396720.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Full text
Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer-interfaces involve the recognition of the speech events themselves, and the recognition of speaker's identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted within to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
APA, Harvard, Vancouver, ISO, and other styles
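
A minimal illustration of the stream-weighting idea that synchronous audio-visual models build on is shown below: per-state log-likelihoods from the acoustic and visual classifiers are combined with a weight that can track acoustic reliability. This is a toy sketch of the general concept, not the thesis's SHMM training or decoding procedure; all arrays and weights are placeholders.

```python
# Stream-weighted combination of per-state acoustic and visual log-likelihoods,
# with the audio weight lowered when the acoustic signal is unreliable.
import numpy as np

def fused_state_loglik(audio_loglik, video_loglik, audio_weight):
    """audio_loglik, video_loglik: (time, n_states) arrays; audio_weight in [0, 1]."""
    return audio_weight * audio_loglik + (1.0 - audio_weight) * video_loglik

t, s = 50, 10
audio = np.log(np.random.dirichlet(np.ones(s), size=t))   # stand-in scores
video = np.log(np.random.dirichlet(np.ones(s), size=t))
clean = fused_state_loglik(audio, video, audio_weight=0.8)   # trust audio in quiet
noisy = fused_state_loglik(audio, video, audio_weight=0.3)   # lean on the lips in noise
```
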
6

Mukherjee, Niloy 1978. "Spontaneous speech recognition using visual context-aware language models." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/62380.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003.
Includes bibliographical references (p. 83-88).
The thesis presents a novel situationally-aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. An experimental task was created in which people were asked to refer, using speech alone, to objects arranged on a table top. During training, Fuse acquires a grammar and vocabulary from a "show-and-tell" procedure in which visual scenes are paired with verbal descriptions of individual objects. Fuse determines a set of visually salient words and phrases and associates them with a set of visual features. Given a new scene, Fuse uses the acquired knowledge to generate class-based language models conditioned on the objects present in the scene as well as a spatial language model that predicts the occurrences of spatial terms conditioned on target and landmark objects. The speech recognizer in Fuse uses a weighted mixture of these language models to search for more likely interpretations of user speech in the context of the current scene. During decoding, the weights are updated using a visual attention model which redistributes attention over objects based on partially decoded utterances. The dynamic situationally-aware language models enable Fuse to jointly infer spoken language utterances underlying speech signals as well as the identities of target objects they refer to. In an evaluation of the system, visual situationally-aware language modeling shows a significant (more than 30%) decrease in speech recognition and understanding error rates. The underlying ideas of situation-aware speech understanding that have been developed in Fuse may be applied in numerous areas including assistive and mobile human-machine interfaces.
by Niloy Mukherjee.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
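
The weighted mixture of object-conditioned language models described in this abstract can be illustrated with a toy sketch: word probabilities from per-object models are blended with attention weights over the objects in the scene. The object models and weights below are hypothetical placeholders, not Fuse's actual components.

```python
# Sketch of a visually conditioned language-model mixture: word probabilities
# from per-object class-based models are blended with attention weights over
# the objects currently in the scene.
def mixture_word_prob(word, object_models, attention):
    """object_models: dict object -> {word: P(word | object)};
    attention: dict object -> weight, summing to 1 over objects in the scene."""
    return sum(attention[obj] * object_models[obj].get(word, 1e-6)
               for obj in attention)

object_models = {
    "red_cup":   {"red": 0.30, "cup": 0.40, "mug": 0.10},
    "blue_ball": {"blue": 0.35, "ball": 0.45},
}
attention = {"red_cup": 0.7, "blue_ball": 0.3}   # updated as the utterance is decoded
print(mixture_word_prob("cup", object_models, attention))
```
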
7

Kalantari, Shahram. "Improving spoken term detection using complementary information." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/90074/1/Shahram_Kalantari_Thesis.pdf.

Full text
Abstract:
This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. The use of visual information in the form of the speaker's lip movements in addition to audio, the use of the topic of the speech segments, and the expected frequency of words in the target speech domain are proposed. By using this complementary information, an improvement in the performance of STD has been achieved, which enables efficient search for keywords in large collections of multimedia documents.
APA, Harvard, Vancouver, ISO, and other styles
8

Deena, Salil Prashant. "Visual speech synthesis by learning joint probabilistic models of audio and video." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/visual-speech-synthesis-by-learning-joint-probabilistic-models-of-audio-and-video(bdd1a78b-4957-469e-8be4-34e83e676c79).html.

Full text
Abstract:
Visual speech synthesis deals with synthesising facial animation from an audio representation of speech. In the last decade or so, data-driven approaches have gained prominence with the development of Machine Learning techniques that can learn an audio-visual mapping. Many of these Machine Learning approaches learn a generative model of speech production using the framework of probabilistic graphical models, through which efficient inference algorithms can be developed for synthesis. In this work, the audio and visual parameters are assumed to be generated from an underlying latent space that captures the shared information between the two modalities. These latent points evolve through time according to a dynamical mapping and there are mappings from the latent points to the audio and visual spaces respectively. The mappings are modelled using Gaussian processes, which are non-parametric models that can represent a distribution over non-linear functions. The result is a non-linear state-space model. It turns out that the state-space model is not a very accurate generative model of speech production because it assumes a single dynamical model, whereas it is well known that speech involves multiple dynamics (e.g., different syllables) that are generally non-linear. In order to cater for this, the state-space model can be augmented with switching states to represent the multiple dynamics, thus giving a switching state-space model. A key problem is how to infer the switching states so as to model the multiple non-linear dynamics of speech, which we address by learning a variable-order Markov model on a discrete representation of audio speech. Various synthesis methods for predicting visual from audio speech are proposed for both the state-space and switching state-space models. Quantitative evaluation, involving the use of error and correlation metrics between ground truth and synthetic features, is used to evaluate our proposed method in comparison to other probabilistic models previously applied to the problem. Furthermore, qualitative evaluation with human participants has been conducted to evaluate the realism, perceptual characteristics and intelligibility of the synthesised animations. The results are encouraging and demonstrate that by having a joint probabilistic model of audio and visual speech that caters for the non-linearities in audio-visual mapping, realistic visual speech can be synthesised from audio speech.
APA, Harvard, Vancouver, ISO, and other styles
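
The shared-latent-space idea behind this thesis can be conveyed with a deliberately crude linear stand-in: stack audio and visual features, learn a common latent space, then map new audio into that space and out to visual parameters. The thesis itself uses Gaussian-process (switching) state-space models; the PCA-plus-ridge pipeline and all dimensions below are illustrative assumptions only.

```python
# Crude linear stand-in for the shared-latent-space idea: a joint latent space
# is learned from stacked audio and visual features, and synthesis maps new
# audio through that latent space to visual parameters.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

n_frames, d_audio, d_visual, d_latent = 500, 13, 20, 8
audio = np.random.randn(n_frames, d_audio)     # stand-in for MFCC frames
visual = np.random.randn(n_frames, d_visual)   # stand-in for lip/appearance params

latent = PCA(n_components=d_latent).fit_transform(np.hstack([audio, visual]))
audio_to_latent = Ridge(alpha=1.0).fit(audio, latent)
latent_to_visual = Ridge(alpha=1.0).fit(latent, visual)

def synthesise_visual(new_audio):
    return latent_to_visual.predict(audio_to_latent.predict(new_audio))

predicted_visual = synthesise_visual(np.random.randn(10, d_audio))
```
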
9

Ahmad, Nasir. "A motion based approach for audio-visual automatic speech recognition." Thesis, Loughborough University, 2011. https://dspace.lboro.ac.uk/2134/8564.

Full text
Abstract:
The research work presented in this thesis introduces novel approaches for both visual region of interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement that occurs during speech is used to isolate the mouth region in video sequences and motion-based features obtained from this region are used to provide new visual features for audio-visual automatic speech recognition. The mouth region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest, namely the difference in luminance between successive images, block matching based motion vectors and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance when compared with the commonly-used appearance feature-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or discrete wavelet transform representations of the mouth region of the speaker. In this work, the image transform is explored from a new viewpoint of data discrimination; in contrast to the more conventional data preservation viewpoint. The main findings of this work are that audio-visual automatic speech recognition systems using the new features extracted from the frequency bands selected according to their discriminatory abilities generally outperform those using features designed for data preservation. To establish the noise robustness of the new features proposed in this work, their performance has been studied in presence of a range of different types of noise and at various signal-to-noise ratios. In these experiments, the audio-visual automatic speech recognition systems based on the new approaches were found to give superior performance both to audio-visual systems using appearance based features and to audio-only speech recognition systems.
APA, Harvard, Vancouver, ISO, and other styles
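
Two of the motion representations named in this abstract (luminance differences between successive frames and optical flow over a mouth region of interest) can be prototyped with OpenCV. The sketch below is illustrative only; the ROI coordinates and the pooled summary statistics are assumptions, and block-matching motion vectors are omitted.

```python
# Sketch of motion features over a fixed mouth ROI: frame differencing and
# dense (Farneback) optical flow, pooled to a small per-frame feature vector.
import cv2
import numpy as np

def motion_features(frames, roi=(slice(120, 180), slice(100, 200))):
    feats = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)[roi]
    for frame in frames[1:]:
        cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)[roi]
        diff = cv2.absdiff(cur, prev)                        # luminance difference
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        feats.append(np.concatenate([[diff.mean()],
                                     flow.reshape(-1, 2).mean(axis=0)]))
        prev = cur
    return np.array(feats)   # (n_frames - 1, 3): mean diff, mean flow x, mean flow y
```
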

Books on the topic "Visual speech model"

1

Stork, David G., and Marcus E. Hennecke, eds. Speechreading by humans and machines: Models, systems, and applications. Berlin: Springer, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hidden Markov Models for Visual Speech Synthesis in Limited Data Environments. Storming Media, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Stork, David G., and Marcus E. Hennecke. Speechreading by Humans and Machines: Models, Systems, and Applications. Springer, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Stork, David G., and Marcus E. Hennecke. Speechreading by Humans and Machines: Models, Systems, and Applications. Springer London, Limited, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Antrobus, John S. How Does the Waking and Sleeping Brain Produce Spontaneous Thought and Imagery, and Why? Edited by Kalina Christoff and Kieran C. R. Fox. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780190464745.013.36.

Full text
Abstract:
Although mind-wandering and dreaming often appear as trivial or distracting cognitive processes, this chapter suggests that they may also contribute to the evaluation, sorting, and saving of representations of recent events of future value to an individual. But 50 years after spontaneous imagery—night dreaming—was first compared to concurrent cortical EEG, there is limited hard evidence on the neural processes that produce either visual dreaming imagery or the speech imagery of waking spontaneous thought. The authors propose here an outline of a neurocognitive model of such processes with suggestions for future research that may contribute to a better understanding of their utility.
APA, Harvard, Vancouver, ISO, and other styles
6

Stork, David G., and Marcus E. Hennecke, eds. Speechreading by Humans and Machines: Models, Systems, and Applications (NATO ASI Series / Computer and Systems Sciences). Springer, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Raine, Michael. A New Form of Silent Cinema. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780190254971.003.0007.

Full text
Abstract:
Ozu Yasujiro wanted to make a “new form” of silent cinema before it disappeared, something sophisticated in a fragile medium that was forced to do obvious things. His goal was to create, for the first and only time in Japanese cinema, films in which audible dialogue was displaced in favor of the intertitle as a form of “visual repartee.” After Western cinema switched to the talkie and while Japan was in the process of converting, Ozu took advantage of the transition from benshi-dialogue to actor-dialogue cinema to invent something like Hollywood silent film: a visual mode of narration with musical accompaniment and speech carried as intertitles. Ozu used the “sound version” to shut the benshi up, allowing emotion in An Inn in Tokyo to “float” as the unspoken disappointment behind banal dialogue, heard synaesthetically in the rhythm of alternating titles and images in a lyrical mise en scène.
APA, Harvard, Vancouver, ISO, and other styles
8

Tobin, Claudia. Modernism and Still Life. Edinburgh University Press, 2020. http://dx.doi.org/10.3366/edinburgh/9781474455138.001.0001.

Full text
Abstract:
The late nineteenth and early twentieth centuries have been characterised as the ‘age of speed’ but they also witnessed a reanimation of still life across different art forms. This book takes an original approach to still life in modern literature and the visual arts by examining the potential for movement and transformation in the idea of stillness and the ordinary. It proposes that still life can be understood not only as a genre of visual art but also as a mode of attentiveness and a way of being in the world. It ranges widely in its material, taking Cézanne and literary responses to his still life painting as its point of departure. It investigates constellations of writers, visual artists and dancers including D. H. Lawrence, Virginia Woolf, David Jones, Winifred Nicholson, Wallace Stevens, and lesser-known figures including Charles Mauron and Margaret Morris. Modernism and Still Life reveals that at the heart of modern art were forms of stillness that were intimately bound up with movement. The still life emerges charged with animation, vibration and rhythm, an unstable medium, unexpectedly vital and well suited to the expression of modern concerns.
APA, Harvard, Vancouver, ISO, and other styles
9

Titus, Barbara. Hearing Maskanda. Bloomsbury Publishing Inc, 2022. http://dx.doi.org/10.5040/9781501377792.

Full text
Abstract:
Hearing Maskanda outlines how people make sense of their world through practicing and hearing maskanda music in South Africa. Having emerged in response to the experience of forced labour migration in the early 20th century, maskanda continues to straddle a wide range of cultural and musical universes. Maskanda musicians reground ideas, (hi)stories, norms, speech and beliefs that have been uprooted in centuries of colonial and apartheid rule by using specific musical textures, vocalities and idioms. With an autoethnographic approach of how she came to understand and participate in maskanda, Titus indicates some instances where her acts of knowledge formation confronted, bridged or invaded those of other maskanda participants. Thus, the book not only aims to demonstrate the epistemic importance of music and aurality but also the performative and creative dimension of academic epistemic approaches such as ethnography, historiography and music analysis, that aim towards conceptualization and (visual) representation. In doing so, the book unearths the colonialist potential of knowledge formation at large and disrupts modes of thinking and (academic) research that are globally normative.
APA, Harvard, Vancouver, ISO, and other styles
10

Berressem, Hanjo. Felix Guattari's Schizoanalytic Ecology. Edinburgh University Press, 2020. http://dx.doi.org/10.3366/edinburgh/9781474450751.001.0001.

Full text
Abstract:
Félix Guattari’s Schizoanalytic Ecology argues that Guattari’s ecosophy, which it regards as a ‘schizoanalytic ecology’ or ‘schizoecology’ for short, is the most consistent conceptual spine of Guattari’s oeuvre. Engaging with the whole spectrum and range of Guattari’s, as well as Guattari and Deleuze’s works, it maintains that underneath Guattari’s staccato style, his hectic speeds and his conceptual acrobatics, lie a number of insistent questions and demands. How to make life on this planet better, more liveable, more in tune with and adequate to the planet’s functioning? How to do this without false romanticism or nostalgia? At the conceptual centre of the book lies the first comprehensive and in-depth analysis and explication of the diagrammatic meta-model that Guattari develops in his book Schizoanalytic Cartographies, his magnum opus and conceptual legacy. It is here that Guattari develops, in an extremely formalized manner, the schizoecological complementarity of what he calls ‘the given’ (the world) and of ‘the giving’ (the world’s creatures). After considering the implications of schizoecology for the fields of literature, the visual arts, architecture, and research, this book, which is the companion volume to Gilles Deleuze’s Luminous Philosophy, culminates in readings of Guattari’s explicitly ecological texts The Three Ecologies and Chaosmosis.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Visual speech model"

1

Grant, Ken W., and Joshua G. W. Bernstein. "Toward a Model of Auditory-Visual Speech Intelligibility." In Multisensory Processes, 33–57. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-10461-0_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

MeiXia, Zheng, and Jia XiBin. "Joint LBP and DCT Model for Visual Speech." In Advances in Intelligent and Soft Computing, 101–7. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27866-2_13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Akinpelu, Samson, and Serestina Viriri. "A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification." In Advances in Visual Computing, 419–30. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20716-7_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Deena, Salil, and Aphrodite Galata. "Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model." In Advances in Visual Computing, 89–100. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-10331-5_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bothe, Hans H. "A visual speech model based on fuzzy-neuro methods." In Image Analysis and Processing, 152–58. Berlin, Heidelberg: Springer Berlin Heidelberg, 1995. http://dx.doi.org/10.1007/3-540-60298-4_251.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Baayen, R. Harald, Robert Schreuder, and Richard Sproat. "Morphology in the Mental Lexicon: A Computational Model for Visual Word Recognition." In Text, Speech and Language Technology, 267–93. Dordrecht: Springer Netherlands, 2000. http://dx.doi.org/10.1007/978-94-010-9458-0_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Seong, Thum Wei, Mohd Zamri Ibrahim, Nurul Wahidah Binti Arshad, and D. J. Mulvaney. "A Comparison of Model Validation Techniques for Audio-Visual Speech Recognition." In IT Convergence and Security 2017, 112–19. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6451-7_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Sun, Zhongbo, Yannan Wang, and Li Cao. "An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement." In MultiMedia Modeling, 722–28. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_60.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Xie, Lei, and Zhi-Qiang Liu. "Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition." In Advances in Machine Learning and Cybernetics, 994–1004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11739685_104.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Matthews, Iain, J. Andrew Bangham, Richard Harvey, and Stephen Cox. "A comparison of active shape model and scale decomposition based features for visual speech recognition." In Lecture Notes in Computer Science, 514–28. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054762.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Visual speech model"

1

Wang, Wupeng, Chao Xing, Dong Wang, Xiao Chen, and Fengyu Sun. "A Robust Audio-Visual Speech Enhancement Model." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9053033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Tiippana, Kaisa, Ilmari Kurki, and Tarja Peromaa. "Applying the summation model in audiovisual speech perception." In The 14th International Conference on Auditory-Visual Speech Processing. ISCA: ISCA, 2017. http://dx.doi.org/10.21437/avsp.2017-28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Fan, Heng, Jinhai Xiang, Guoliang Li, and Fuchuan Ni. "Robust visual tracking via deep discriminative model." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. http://dx.doi.org/10.1109/icassp.2017.7952492.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Li, Gang Feng, and Denis Beautemps. "Inner Lips Parameter Estimation based on Adaptive Ellipse Model." In The 14th International Conference on Auditory-Visual Speech Processing. ISCA: ISCA, 2017. http://dx.doi.org/10.21437/avsp.2017-15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Israel Santos, Timothy, Andrew Abel, Nick Wilson, and Yan Xu. "Speaker-Independent Visual Speech Recognition with the Inception V3 Model." In 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2021. http://dx.doi.org/10.1109/slt48900.2021.9383540.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Edge, J. D., A. Hilton, and P. Jackson. "Model-based synthesis of visual speech movements from 3D video." In SIGGRAPH '09: Posters. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1599301.1599309.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Dinet, Eric, and Emmanuel Kubicki. "A selective attention model for predicting visual attractors." In ICASSP 2008 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2008. http://dx.doi.org/10.1109/icassp.2008.4517705.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cosker, D. P. "Speaker-independent speech-driven facial animation using a hierarchical model." In International Conference on Visual Information Engineering (VIE 2003). Ideas, Applications, Experience. IEE, 2003. http://dx.doi.org/10.1049/cp:20030514.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Dai, Pingyang, Yanlong Luo, Weisheng Liu, Cuihua Li, and Yi Xie. "Robust visual tracking via part-based sparsity model." In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6637963.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wu, Jinjian, Guangming Shi, Weisi Lin, and C. C. Jay Kuo. "Enhanced just noticeable difference model with visual regularity consideration." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. http://dx.doi.org/10.1109/icassp.2016.7471943.

Full text
APA, Harvard, Vancouver, ISO, and other styles