Dissertations / Theses on the topic 'Embodied vision'

Consult the top 18 dissertations / theses for your research on the topic 'Embodied vision.'

1

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations.
Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
Embodied Visual Object Recognition
FaceTrack
2

Saygın, Ayşe Pınar. "Embodied perception : neuropsychological and neuroimaging studies of language, vision, and attention." Diss., University of California, San Diego, 2005. http://wwwlib.umi.com/cr/ucsd/fullcit?p3181787.

Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2005.
Title from first page of PDF file (viewed October 21, 2005). Available online via ProQuest Digital Dissertations. Vita. Includes bibliographical references.
3

Philippou, Styliane. "Vision and language : the modern Greek world embodied in architectural form." Thesis, University of Edinburgh, 1996. http://hdl.handle.net/1842/21464.

Abstract:
This thesis is concerned with architecture as a creative process which is distinct with respect to the physical appearance of its end products and the manual operation exclusively proper to the architect, yet it can be contextualised within the wider circle of human making with respect to the mental image to which all artists work - when their interest focuses on an inner world of reality - and to the noetic and imaginative operations proper to all makers. First, it embarks on a theoretical inquiry into the nature of architecture as a creative activity or process whereby man is brought into dwelling commensurate with human nature. The purpose of this inquiry is to illuminate the meaning of architecture and the formal principle that finds expression in its products, the kinship between architecture and poetry, and the pivotal role and function of language in the significant act of architectural creation. This theoretical inquiry establishes the perspective within which the architectural making process is examined in the modern Greek socio-cultural context, the distinct historical milieu of Greece after Independence. Viewing architecture as a human poetic projection, as a realisation of the unity of being with word, vision with language, this examination aims at delineating this long poetic journey that through stages of loss and recollection brought about the embodiment of the inner reality of the Greek world in architectural form, made by the hand of Dimitris Pikionis. The stages of this process are traced and paralleled to those of modern Greek poetry, a contemporaneous art process directed towards making intelligible the same reality, and one with a privileged position in the cultural life of modern Greece. Subsequently, the thesis focuses on the making process as a personal creative experience. An account of Pikionis' personal poetic journey is followed by a close reading of his most accomplished work on the Attic hills. 
This work is viewed as the built product of his self-knowing and world-knowing process, the embodiment of his vision of "the mythical reality of the world", the same vision of the eternal and sacred aspect of visible things that The Axion Esti of Pikionis' contemporary poet, Odysseus Elytis, seeks to evoke. A comparison is ventured between Pikionis' architectural work and The Axion Esti of Elytis, two art-acts which are not simply contemporaneous but also in the same spirit of loyalty - loyalty without servility - to the values and principles of the cultural order in which the two individual creators found themselves embedded and which, for them, conforms to the order of the natural world which they inhabit. Finally, the suggestion is put forward that the architectural act, and the art-act in general, the begetting of a significant form which 'speaks' about and of the created world-order, is essentially a 'world-redeeming' act, an act directed towards a recreation of the world as it was in the beginning.
4

Shaw, Rachel. "The ties that bind : an investigation into the effect of action restriction on motor simulations." Thesis, University of Plymouth, 2014. http://hdl.handle.net/10026.1/3206.

Abstract:
This thesis examines the relationship between physical capabilities and the mental simulation of actions. Behavioural research suggests that the ability to understand an action is directly related to the ability to perform it, an idea consistent with the Embodied theory of Cognition. The present work aims to further explore the relationship between the body and cognition and investigate whether the restriction of an action or movement disrupts the simulation of movements during motor imagery tasks, which have been shown to elicit motor activations upon performance. This theory was investigated in a series of seven motor simulation experiments during which participants’ movements were restrained. Studies 1-3 investigated simulations that occur unconsciously through the observation of manipulatable objects. Studies 4-6 investigated simulations that occur during performance of mental transformations of manipulatable objects and body part stimuli. The results of these studies showed no significant difference in performance when movement was restricted compared to when participants were free to move. Study 7 investigated simulations that occur consciously through the observation of actions performed by another individual and found a significant effect of restriction on performance. The findings of these studies indicate that the ability to perform a movement is required for the accurate simulation of actions when an action is being observed but not when a simulated action is required on a stationary object, which suggests a variable relationship between the body and cognitive processes. This thesis offers an interesting contribution to the Embodied Cognition debate and provides a further insight into the relationship between the motor and visual systems.
5

Wallenberg, Marcus. "Components of Embodied Visual Object Recognition : Object Perception and Learning on a Robotic Platform." Licentiate thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93812.

Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, and the implementation of the system itself. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. Finally, in order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. 
All of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
Embodied Visual Object Recognition
6

Büscher, Monika. "Ideas in the making : talk, vision, objects and embodied action in multi media art and landscape architecture." Thesis, Lancaster University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.289004.

7

Silverman, David. "The sensorimotor theory of perceptual experience." Thesis, University of St Andrews, 2014. http://hdl.handle.net/10023/5544.

Abstract:
The sensorimotor theory is an influential, non-mainstream account of perception and perceptual consciousness intended to improve in various ways on orthodox theories. It is often taken to be a variety of enactivism, and in common with enactivist cognitive science more generally, it de-emphasises the theoretical role played by internal representation and other purely neural processes, giving theoretical pride of place instead to interactive engagements between the brain, non-neural body and outside environment. In addition to offering a distinctive account of the processing that underlies perceptual consciousness, the sensorimotor theory aims to offer a new and improved account of the logical and phenomenological character of perceptual experience, and the relation between physical and phenomenal states. Since its inception in a 2001 paper by O'Regan and Noë, the theory has prompted a good deal of increasingly prominent theoretical and practical work in cognitive science, as well as a large body of secondary literature in philosophy of cognitive science and philosophy of perception. In spite of its influential character, many of the theory's most basic tenets are incompletely or ambiguously defined, and it has attracted a number of prominent objections. This thesis aims to clarify the conceptual foundations of the sensorimotor theory, including the key theoretical concepts of sensorimotor contingency, sensorimotor mastery, and presence-as-access, and defends a particular understanding of the respective theoretical roles of internal representation and behavioural capacities. In so doing, the thesis aims to highlight the sensorimotor theory's virtues and defend it from some leading criticisms, with particular attention to a response by Clark which claims that perception and perceptual experience plausibly depend on the activation of representations which are not intimately involved in bodily engagements between the agent and environment.
A final part of the thesis offers a sensorimotor account of the experience of temporally extended events, and shows how with reference to this we can better understand object experience.
8

Jouen, Anne-Lise. "Au-delà des mots et des images, bases neurophysiologiques d'un système sémantique commun à la compréhension des phrases et des scènes visuelles." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10322.

Abstract:
Certaines théories du fonctionnement cognitif postulent l'existence d'un système cérébral impliqué dans la compréhension sémantique indépendamment de la modalité d'entrée des stimuli. L'objectif de ce travail de thèse était d'étudier le fonctionnement d'un tel réseau, impliqué à la fois dans la compréhension de phrases et de scènes visuelles, en lien avec la théorie de la cognition incarnée. Dans la littérature, un ensemble d'aires fronto-temporo-pariétales sensorimotrices et associatives sont décrites comme intervenant dans ces processus sémantiques, mais il existe un manque de consensus concernant la nature amodale de ce système et la plupart des travaux existants se sont concentrés sur l'identification de réseaux corticaux impliqués dans les représentations sémantiques, séparément pour l'une ou l'autre des modalités. De plus, les stimuli utilisés dans les protocoles expérimentaux sont généralement moins complexes que les situations interactives auxquelles nous sommes confrontés dans la vie de tous les jours. Une part importante de l'activité mentale humaine réside dans notre capacité à construire des représentations internes riches : ces modèles mentaux, impliqués dans une grande variété de processus cognitifs, nous permettent d'explorer certains souvenirs du passé, de planifier le futur ou encore de comprendre et de s'adapter à une situation en temps réel. Bien que les progrès des techniques d'Imagerie du Tenseur de Diffusion aient rendu possible la visualisation in vivo de fibres de matière blanche dans le cerveau humain, la connectivité du système sémantique amodal a très peu été étudiée jusque-là. Dans ce travail, nous avons utilisé différentes techniques (principalement de neuro-imagerie IRMf, DTI, EEG) pour mettre en évidence les bases neurophysiologiques d'un système sémantique commun impliqué dans la représentation et la compréhension de stimuli complexes verbaux et non-verbaux.
Avec notre premier protocole combinant IRMf et DTI, nous nous sommes intéressés aux activations et à la connectivité cérébrales chez 19 sujets sains en train de lire des phrases ou d'observer des images représentant des événements quotidiens. Une analyse de l'activité cérébrale conjointe associée à la compréhension de ces deux types de stimuli a révélé un réseau fronto-temporo-pariétal commun, impliquant le gyrus frontal inférieur, le gyrus précentral, le cortex rétrosplénial, le gyrus temporal moyen avec une activité s'étendant jusqu'à la jonction temporo-pariétale (TPJ) et au lobe pariétal inférieur. La tractographie DTI a révélé une architecture spécifique de fibres de matière blanche, soutenant ce réseau sémantique et qui fait appel principalement aux faisceaux décrits comme la voie ventrale sémantique (IFOF, UF, ILF, MdLF). Notre seconde expérience (protocole comportemental) nous a permis d'étudier les différences interindividuelles dans la capacité à se représenter des phrases présentées visuellement ou auditivement. Nous avons démontré que les individus ne sont pas égaux quant à cette capacité de représentation et que ces différences se reflètent dans des marqueurs comportementaux tels que la facilité de représentation (évaluée par le COR, coefficient de représentabilité) et la vitesse de réponse (TR) ; mais aussi que ces différences interindividuelles trouvent une correspondance avec le nombre de fibres qui composent le MdLF, laissant supposer une implication de ce faisceau dans ces capacités de représentation. 
Les résultats de ce protocole comportemental, ainsi que ceux de notre troisième protocole en EEG, ont permis de mettre en évidence un effet contextuel particulièrement important pour la création d'une représentation dans les deux modalités : le contexte induit par la présentation d'un premier stimulus (phrase ou image) influence la représentation d'un second stimulus selon que celui-ci est sémantiquement cohérent ou non avec le premier stimulus présenté... [etc]
Certain theories of cognitive function postulate a neural system for processing meaning, independent of the stimulus input modality. The objective of this thesis work, in line with the embodied cognition domain, was to study the functioning of such a network, which is involved in both sentence and visual scene comprehension. In the literature, a wide network of fronto-temporo-parietal sensorimotor and associative areas is described as being involved in this process, and while there is a lack of consensus on the amodal nature of this system, extensive research has focused on identifying distributed cortical systems that participate in meaning representations separately in the visual and language modalities. Moreover, the stimuli used are generally less complex than the everyday situations we encounter. However, a significant portion of human mental life is built upon the construction of perceptually and socially rich internal scene representations, and these mental models are involved in a large variety of processes for exploring specific memories of the past, planning the future, or understanding current situations. Although techniques based on diffusion tensor imaging make it feasible to visualize white matter tracts in the human brain, the connectivity of the semantic network has been little studied. Through different experimental protocols involving mainly neuroimaging techniques (fMRI, DTI, EEG), we were able to reveal the neurophysiological basis of this common semantic network involved in the representation and comprehension of rich verbal and non-verbal stimuli. With our first experiment, we examined brain activation and connectivity in 19 subjects who read sentences and viewed pictures corresponding to everyday events, in a combined fMRI and DTI study.
Conjunction of activity in understanding sentences and pictures revealed a common fronto-temporo-parietal network that included the inferior frontal gyrus, the precentral gyrus, the retrosplenial complex, and the middle temporal gyrus extending into the temporo-parietal junction (TPJ) and inferior parietal lobe. DTI tractography revealed a specific architecture of white matter fibers supporting this network, which principally involves the pathways described as the ventral semantic route (IFOF, UF, ILF, MdLF). Our second experiment, a behavioral protocol, explored interindividual differences in the ability to represent sentences presented in the auditory or visual modality. We demonstrated that individuals are not equal in this capacity to represent sentences; these differences were reflected in behavioral markers including scores of ease of representation (COR) and speed of response (TR). They are also related to the number of fibers of the MdLF, which suggests a role for this fasciculus in representational capacity. Both the results of this behavioral protocol and the results from our third experiment, in EEG, also showed that the contextual effect was significant: the context induced by the presentation of a first stimulus influences the representation of a second stimulus depending on whether the second is semantically consistent with the first. Our EEG results (ERPs) revealed components influenced by the available semantic information: early attentional effects which could be modality-specific and a later semantic integration process common to verbal and non-verbal stimuli... [etc]
9

Puhakka, Frejvall Nina. "Digital archaeology : The embodied visitor experience." Thesis, Stockholms universitet, Institutionen för arkeologi och antikens kultur, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-164860.

Abstract:
Archaeology is a field which has been impacted greatly by digital technology; the new technological instruments are developing both academic research and public mediation. Digital archaeology has been available at the museum for some time, but immersive technologies are recent introductions, which offer new experiences for museum visitors. Even though digital archaeology and virtual heritage have been studied for their technological virtues, the learning opportunities presented to the museum visitor have not yet been examined from a visitor’s perspective. In this dissertation, the visitor experience is the basis of analysis for determining how we can critically assess digital exhibitions using immersive technologies. This study examines if and how critical museology can be successfully applied to immersive digital displays; a detailed analysis of two case studies using VR (high immersion) and AR (low immersion) shows that digital experiences are fully capable of communicating cultural content and that these multi-sensory technologies can successfully engage users in the creation of knowledge. The extent of sensory stimuli affecting the visitor is not accounted for in current critical museology; therefore, the analysis in this study offers a number of suggestions for future designs of digital displays using immersive technologies.
10

Follin, Frances Marie. "Embodied visions : the op art work of Bridget Riley, 1961-65." Thesis, Birkbeck (University of London), 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.397025.

11

Samanci, Ozge. "Embodying comics : reinventing comics and animation for a digital performance." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29630.

Abstract:
Thesis (Ph.D)--Literature, Communication, and Culture, Georgia Institute of Technology, 2010.
Committee Chair: Mazalek, Alexandra; Committee Member: Bolter, Jay; Committee Member: Knospel, Kenneth; Committee Member: Murray, Janet; Committee Member: Winegarden, Claudia Rebola. Part of the SMARTech Electronic Thesis and Dissertation Collection.
12

Dixon, Thomas Oliver. "An electrophysiological examination of visuomotor activity elicited by visual object affordances." Thesis, University of Plymouth, 2016. http://hdl.handle.net/10026.1/6758.

Abstract:
A wide literature of predominantly behavioural experiments that use Stimulus Response Compatibility (SRC) has suggested that visual action information such as object affordance yields rapid and concurrent activation of visual and motor brain areas, but has rarely provided direct evidence for this proposition. This thesis examines some of the key claims from the affordance literature by applying electrophysiological measures to well-established SRC procedures to test the behavioural claims of rapid and automatic visuomotor activation evoked by viewing affording objects. The temporal sensitivity offered by the Lateralised Readiness Potential and by the visual evoked potentials P1 and N1 made them ideal candidates for assessing these claims by examining the timecourse of neural activation elicited by viewing affording objects under various conditions. The experimental work in this thesis broadly confirms the claims of the behavioural literature; however, it also found a series of novel results that are not predicted by the behavioural literature due to limitations in reaction time measures. For example, while different classes of affordance have been shown to exert the same behavioural facilitation, electrophysiological measures reveal very different patterns of cortical activation for grip-type and lateralised affordances. These novel findings question the applicability of the label ‘visuomotor’ to grip-type affordance processing and suggest considerable revision to models of affordance. This thesis also offers a series of novel and surprising insights into the ability to dissociate afforded motor activity from behavioural output, into the relationship between affordance and early visual evoked potentials, and into affordance in the absence of the intention to act.
Overall, this thesis provides detailed suggestions for considerable changes to current models of the neural activity underpinning object affordance.
13

Kenklies, Kai Malte. "Instructing workers through a head-worn Augmented Reality display and through a stationary screen on manual industrial assembly tasks : A comparison study." Thesis, Umeå universitet, Institutionen för informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172888.

Abstract:
This study analyzed whether instructions on a head-worn Augmented Reality display (AR-HWD) are better for manual industrial assembly tasks than instructions on a stationary screen. A prototype was built which consisted of virtual instruction screens for two example assembly tasks. In a comparison study, participants performed the tasks with instructions through an AR-HWD and alternatively through a stationary screen. Questionnaires, interviews and observation notes were used to evaluate task performance and the user experience. The study revealed that the users were excited and enjoyed trying the technology. Opinions on the usefulness of the technology in its current state were mixed, but the users saw a huge potential in AR-HWDs for the future. The task accuracy with instructions on the AR-HWD was as good as with instructions on the screen. AR-HWDs are found to be a better approach than a stationary screen, but technological limitations need to be overcome and workers need to train using the new technology to make its application efficient.
14

Eufrasio, Espinosa Rafael Mauricio. "A visio-spatial life cycle energy model of building materials within a bioregional context : mapping the embodied energy of fired clay bricks in Cuitzeo, Mexico." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/13164/.

Abstract:
Despite the general acceptance of Life Cycle Assessments (LCA) to tackle environmental problems associated with the built environment, the literature shows that this complex assessment system has limitations as a communication tool for decision-making processes, given that results are difficult to interpret. In an effort to reduce the complexity of following multiple variables in LCA, a simplified and more straightforward process emerged that accounts only for energy use: Life Cycle Energy Analysis (LCEA). However, LCEA has also inherited problems associated with LCA. Thus, discrepancies in calculation procedures, the lack of geographical considerations, and ecological attitudes and assumptions are criticized in both approaches. In this thesis, a Visio-Spatial Life Cycle Energy Model based on Geographical Information Systems (GIS) was developed in order to bridge the gap of LCEA as a communication tool by displaying embodied energy intensities in thematic maps, taking bioregional principles into consideration in its analysis. A new dynamic Input-Output (IO) model, which efficiently simplifies the extraction of energy paths from IO tables, enabled the integration of hybrid energy coefficients to account for economic establishments dedicated to producing goods and services in the construction sector, as illustrated in a bioregional case study area in Mexico. The full capability of the Visio-Spatial energy model was then applied to a specific study of fired clay brick production within the bioregion. The results obtained by process analysis methods (PA) had a variation of 33.6% with respect to IO procedures, which can be considered acceptable in hybrid methods. Embodied energy figures expressed in thematic maps helped to reduce geographical assumptions and expand the sense of place in LCEA by visualizing patterns in manufacturing processes within the case study area.
15

Hicks, Andrew Patrick. "Embodied vision : sublimity and mystery in the fiction of Flannery O'Connor." 2008. http://etd.utk.edu/August2008MastersTheses/HicksAndrewPatrick.pdf.

16

Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." Phd thesis, 2018. http://hdl.handle.net/1885/164018.

Abstract:
Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also require this ability. In this thesis, we focus on the joint modelling of visual and linguistic information using deep neural networks. We begin by considering the challenging problem of automatically describing the content of an image in natural language, i.e., image captioning. Although there is considerable interest in this task, progress is hindered by the difficulty of evaluating the generated captions. Our first contribution is a new automatic image caption evaluation metric that measures the quality of generated captions by analysing their semantic content. Extensive evaluations across a range of models and datasets indicate that our metric, dubbed SPICE, shows high correlation with human judgements. Armed with a more effective evaluation metric, we address the challenge of image captioning. Visual attention mechanisms have been widely adopted in image captioning and visual question answering (VQA) architectures to facilitate fine-grained visual processing. We extend existing approaches by proposing a bottom-up and top-down attention mechanism that enables attention to be focused at the level of objects and other salient image regions, which is the natural basis for attention to be considered. Applying this approach to image captioning we achieve state of the art results on the COCO test server. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge. 
Despite these advances, recurrent neural network (RNN) image captioning models typically do not generalise well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real applications. To address this problem, we propose constrained beam search, an approximate search algorithm that enforces constraints over RNN output sequences. Using this approach, we show that existing RNN captioning architectures can take advantage of side information such as object detector outputs and ground-truth image annotations at test time, without retraining. Our results significantly outperform previous approaches that incorporate the same information into the learning algorithm, achieving state-of-the-art results for out-of-domain captioning on COCO. Last, to enable and encourage the application of vision and language methods to problems involving embodied agents, we present the Matterport3D Simulator, a large-scale interactive reinforcement learning environment constructed from densely-sampled panoramic RGB-D images of 90 real buildings. Using this simulator, which can in future support a range of embodied vision and language tasks, we collect the first benchmark dataset for visually-grounded natural language navigation in real buildings. We investigate the difficulty of this task, and particularly the difficulty of operating in unseen environments, using several baselines and a sequence-to-sequence model based on methods successfully applied to other vision and language tasks.
17

Marjanovic, Matthew J. "Teaching an Old Robot New Tricks: Learning Novel Tasks via Interaction with People and Things." 2003. http://hdl.handle.net/1721.1/7108.

Abstract:
As AI has begun to reach out beyond its symbolic, objectivist roots into the embodied, experientialist realm, many projects are exploring different aspects of creating machines which interact with and respond to the world as humans do. Techniques for visual processing, object recognition, emotional response, gesture production and recognition, etc., are necessary components of a complete humanoid robot. However, most projects invariably concentrate on developing a few of these individual components, neglecting the issue of how all of these pieces would eventually fit together. The focus of the work in this dissertation is on creating a framework into which such specific competencies can be embedded, in a way that they can interact with each other and build layers of new functionality. To be of any practical value, such a framework must satisfy the real-world constraints of functioning in real-time with noisy sensors and actuators. The humanoid robot Cog provides an unapologetically adequate platform from which to take on such a challenge. This work makes three contributions to embodied AI. First, it offers a general-purpose architecture for developing behavior-based systems distributed over networks of PCs. Second, it provides a motor-control system that simulates several biological features which impact the development of motor behavior. Third, it develops a framework for a system which enables a robot to learn new behaviors via interacting with itself and the outside world. A few basic functional modules are built into this framework, enough to demonstrate the robot learning some very simple behaviors taught by a human trainer. A primary motivation for this project is the notion that it is practically impossible to build an "intelligent" machine unless it is designed partly to build itself. This work is a proof-of-concept of such an approach to integrating multiple perceptual and motor systems into a complete learning agent.
18

Bullen, Leah Louise. "Virtual Nature: A practice-led enquiry into the relationship between painting and vernacular photography through the process of the painted monotype." PhD thesis, 2018. http://hdl.handle.net/1885/157027.

Abstract:
My practice-led research explores the relationship between painting and vernacular photography through the process of painted monotypes. This project has developed from an ongoing fascination with the visual qualities of photography and what happens when you translate photographs into other material forms, such as painting. The aim of this project is to develop images that interrogate how painted monotypes provide a distinctive interpretation of embodied experience through their visual, material and sensory qualities. Today, like no other time in history, photography is embedded in our daily lives through hand-held devices and the interface of the digital screen. My research examines how this embedded experience of the photographic relates to the processes and visual qualities of the painted monotype. The project is focused on three primary locations as subject matter: the aquarium, the botanical glasshouse and the habitat diorama. Through my research I explore how these sites function in optically and conceptually similar ways to the world of images, through shared notions of virtuality and indexicality. This research is informed by the work of Édouard Vuillard, Mamma Andersson, Peter Doig, David Hockney and the landscapes of Gustav Klimt. These painters interrogate the territory between painting and lens-based images in very specific ways, relating to visual perception, embodied vision, figure and ground relationships, and visual textures. In a theoretical context, my examination of the relationship between painting and photography has been motivated by the writings of Elizabeth Wynne Easton, Aaron Scharf, John Berger and Russell Ferguson; while Anne Friedberg, Rob Shields, Nicholas Mirzoeff, Geoffrey Batchen, Kris Paulsen and Johanna Love have been instrumental in determining a connection to the virtual and the index in my research.