Dissertations / Theses on the topic 'Visual speech model'
Consult the top dissertations / theses for your research on the topic 'Visual speech model.'
Somasundaram, Arunachalam. "A facial animation model for expressive audio-visual speech." Columbus, Ohio: Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1148973645.
Van Wassenhove, Virginie. "Cortical dynamics of auditory-visual speech: a forward model of multisensory integration." College Park, Md.: University of Maryland, 2004. http://hdl.handle.net/1903/1871.
Thesis research directed by Neuroscience and Cognitive Science. Title from the title page of the PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Cosker, Darren. "Animation of a hierarchical image based facial model and perceptual analysis of visual speech." Thesis, Cardiff University, 2005. http://orca.cf.ac.uk/56003/.
Theobald, Barry-John. "Visual speech synthesis using shape and appearance models." Thesis, University of East Anglia, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396720.
Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.
Mukherjee, Niloy. "Spontaneous speech recognition using visual context-aware language models." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/62380.
Full textIncludes bibliographical references (p. 83-88).
The thesis presents a novel situationally aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. An experimental task was created in which people were asked to refer, using speech alone, to objects arranged on a table top. During training, Fuse acquires a grammar and vocabulary from a "show-and-tell" procedure in which visual scenes are paired with verbal descriptions of individual objects. Fuse determines a set of visually salient words and phrases and associates them with a set of visual features. Given a new scene, Fuse uses the acquired knowledge to generate class-based language models conditioned on the objects present in the scene, as well as a spatial language model that predicts the occurrence of spatial terms conditioned on target and landmark objects. The speech recognizer in Fuse uses a weighted mixture of these language models to search for more likely interpretations of user speech in the context of the current scene. During decoding, the weights are updated using a visual attention model that redistributes attention over objects based on partially decoded utterances. The dynamic situationally aware language models enable Fuse to jointly infer the spoken utterances underlying speech signals and the identities of the target objects they refer to. In an evaluation of the system, visual situationally aware language modeling yields a significant decrease, more than 30%, in speech recognition and understanding error rates. The underlying ideas of situation-aware speech understanding developed in Fuse may be applied in numerous areas, including assistive and mobile human-machine interfaces.
by Niloy Mukherjee.
S.M.
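The Fuse abstract above describes a concrete mechanism: a weighted mixture of class-based language models, one per visible object, whose weights track a visual attention model that responds to partially decoded utterances. The Python sketch below is a hypothetical illustration of that idea, not the thesis code; the per-object unigram models, the 1e-6 smoothing floor, and the attention update rule are illustrative assumptions.

```python
# Hypothetical sketch of Fuse-style situationally aware language modeling.
# The object vocabularies, smoothing floor, and attention update below are
# illustrative assumptions, not the thesis implementation.

def mixture_prob(word, object_lms, attention):
    """P(word | scene): attention-weighted mixture of per-object language models."""
    return sum(attention[obj] * lm.get(word, 1e-6)
               for obj, lm in object_lms.items())

def update_attention(partial_utterance, object_lms, attention):
    """Redistribute attention over objects given a partially decoded utterance."""
    scores = {}
    for obj, lm in object_lms.items():
        score = attention[obj]  # start from the current attention weight
        for word in partial_utterance:
            score *= lm.get(word, 1e-6)  # how well this object's LM explains the words
        scores[obj] = score
    total = sum(scores.values())
    return {obj: s / total for obj, s in scores.items()}

# Toy scene: two objects, each with a small class-based unigram model.
object_lms = {
    "red_ball": {"red": 0.4, "ball": 0.4, "the": 0.2},
    "blue_cup": {"blue": 0.4, "cup": 0.4, "the": 0.2},
}
attention = {"red_ball": 0.5, "blue_cup": 0.5}  # uniform prior over objects

attention = update_attention(["the", "red"], object_lms, attention)
print(attention)                                    # attention shifts to red_ball
print(mixture_prob("ball", object_lms, attention))  # "ball" is now the likely noun
```

In the toy run, hearing "the red" shifts attention toward red_ball, which in turn raises the mixture probability of "ball": the scene-conditioned rescoring effect the abstract describes.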
Kalantari, Shahram. "Improving spoken term detection using complementary information." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/90074/1/Shahram_Kalantari_Thesis.pdf.
Deena, Salil Prashant. "Visual speech synthesis by learning joint probabilistic models of audio and video." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/visual-speech-synthesis-by-learning-joint-probabilistic-models-of-audio-and-video(bdd1a78b-4957-469e-8be4-34e83e676c79).html.
Ahmad, Nasir. "A motion based approach for audio-visual automatic speech recognition." Thesis, Loughborough University, 2011. https://dspace.lboro.ac.uk/2134/8564.
Full textRoxburgh, Zoe. "Visualising articulation : real-time ultrasound visual biofeedback and visual articulatory models and their use in treating speech sound disorders associated with submucous cleft palate." Thesis, Queen Margaret University, 2018. https://eresearch.qmu.ac.uk/handle/20.500.12289/8899.
Navarathna, Rajitha Dharshana Bandara. "Robust recognition of human behaviour in challenging environments." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/66235/1/Rajitha%20Dharshana%20Bandara_Navarathna_Thesis.pdf.
Fernández López, Adriana. "Learning of meaningful visual representations for continuous lip-reading." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/671206.
Full textEn les darreres dècades, hi ha hagut un interès creixent en la descodificació de la parla utilitzant exclusivament senyals visuals, es a dir, imitant la capacitat humana de llegir els llavis, donant lloc a sistemes de lectura automàtica de llavis (ALR). No obstant això, se sap que l’accès a la parla a través del canal visual està subjecte a moltes limitacions en comparació amb el senyal acústic, es a dir, s’ha argumentat que els humans poden llegir al voltant del 30% de la informació dels llavis, i la resta es completa fent servir el context. Així, un dels principals reptes de l’ALR resideix en les ambigüitats visuals que sorgeixen a escala de paraula, destacant que no tots els sons que escoltem es poden distingir fàcilment observant els llavis. A la literatura, els primers sistemes ALR van abordar tasques de reconeixement senzilles, com ara el reconeixement de l’alfabet o els dígits, però progressivament van passar a entorns mes complexos i realistes que han conduït a diversos sistemes recents dirigits a la lectura continua dels llavis. En gran manera, aquests avenços han estat possibles gracies a la construcció de sistemes potents basats en arquitectures d’aprenentatge profund que han començat a substituir ràpidament els sistemes tradicionals. Tot i que les taxes de reconeixement de la lectura continua dels llavis poden semblar modestes en comparació amb les assolides pels sistemes basats en audio, és evident que el camp ha fet un pas endavant. Curiosament, es pot observar un efecte anàleg quan els humans intenten descodificar la parla: donats senyals sense soroll, la majoria de la gent pot descodificar el canal d’àudio sense esforç¸, però tindria dificultats per llegir els llavis, ja que l’ambigüitat dels senyals visuals fa necessari l’ús de context addicional per descodificar el missatge. En aquesta tesi explorem el modelatge adequat de representacions visuals amb l’objectiu de millorar la lectura contínua dels llavis. Amb aquest objectiu, presentem diferents mecanismes basats en dades per fer front als principals reptes de la lectura de llavis relacionats amb les ambigüitats o la dependència dels parlants dels senyals visuals. Els nostres resultats destaquen els avantatges d’una correcta codificació del canal visual, per a la qual les característiques més útils són aquelles que codifiquen les posicions corresponents dels llavis d’una manera similar, independentment de l’orador. Aquest fet obre la porta a i) la lectura de llavis en molts idiomes diferents sense necessitat de conjunts de dades a gran escala, i ii) a l’augment de la contribució del canal visual en sistemes de parla audiovisuals.´ D’altra banda, els nostres experiments identifiquen una tendència a centrar-se en iii la modelització del context temporal com la clau per avançar en el camp, on hi ha la necessitat de models d’ALR que s’entrenin en conjunts de dades que incloguin una gran variabilitat de la parla a diversos nivells de context. En aquesta tesi, demostrem que tant el modelatge adequat de les representacions visuals com la capacitat de retenir el context a diversos nivells són condicions necessàries per construir sistemes de lectura de llavis amb èxit.
Chilakapati, Praveen. "Driving Simulator Validation and Rear-End Crash Risk Analysis at a Signalised Intersection." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2925.
M.S.
Department of Civil and Environmental Engineering
Engineering and Computer Science
Civil Engineering
Yau, Wai Chee. "Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-Reading." RMIT University, Electrical and Computer Engineering, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20081209.162504.
Full textJalkebo, Charlotte. "Placement of Controls in Construction Equipment Using Operators´Sitting Postures : Process and Recommendations." Thesis, Linköpings universitet, Maskinkonstruktion, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-108980.
Full textLEONE, GIUSEPPE RICCARDO. "Comunicazione bimodale nel web per mezzo di facce parlanti 3D." Doctoral thesis, 2014. http://hdl.handle.net/2158/874631.
Rajaram, Siddharth. "Selective attention and speech processing in the cortex." Thesis, 2014. https://hdl.handle.net/2144/13312.