Dissertations / Theses on the topic 'Visual and semantic embedding'


Consult the top 50 dissertations / theses for your research on the topic 'Visual and semantic embedding.'


1

Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Full text
Abstract:
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a marked improvement in visual and textual representation models. This thesis addresses the problem of learning multimodal embeddings to jointly represent visual and semantic data. This is a central question in the current context of AI and deep learning, and one with particularly strong potential for model interpretability. In this thesis, we explore joint visual and semantic representation spaces. We propose two new models for building such spaces, and demonstrate their ability to localize semantic concepts in the visual domain. We also introduce a new method for learning a differentiable approximation of ranking-based evaluation metrics.
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and textual representation and existing multimodal approaches; we propose novel architectures that further improve the retrieval capability of VSE; and we extend VSE models to novel applications, leveraging embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of ranking-based metrics.
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Qian. "Zero-shot visual recognition via latent embedding learning." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/zeroshot-visual-recognition-via-latent-embedding-learning(bec510af-6a53-4114-9407-75212e1a08e1).html.

Full text
Abstract:
Traditional supervised visual recognition methods require a great number of annotated examples for each class of interest. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations where the test instances come from novel classes for which no training examples are available at training time. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from the visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space while preserving both visual and semantic relatedness, hence bridging the semantic gap. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. We then extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short-term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e., the visual model and the semantic model) to bridge the semantic gap. The two component embedding models are trained alternately to optimize ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework.
Finally, we propose alternative semantic representations for human actions, narrowing the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data is investigated to enhance the semantic representations of human actions. The novel semantic representations are shown to benefit zero-shot human action recognition significantly compared to traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition that narrow and bridge the semantic gap, and achieve state-of-the-art performance in different settings on multiple benchmarks.
3

Ficapal, Vila Joan. "Anemone: a Visual Semantic Graph." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252810.

Full text
Abstract:
Semantic graphs have been used for optimizing various natural language processing tasks as well as augmenting search and information retrieval tasks. In most cases these semantic graphs have been constructed through supervised machine learning methodologies that depend on manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we explore in the first part the possibility of automatically populating a semantic graph from an ad hoc data set of 50 000 newspaper articles in a completely unsupervised manner. The utility of the visual representation of the resulting graph is tested on 14 human subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that, for entity finding and document similarity, our feature engineering is viable and the visual map produced by our artifact is useful. In the second part, we explore the possibility of identifying entity relationships in an unsupervised fashion by employing abstractive deep learning methods for sentence reformulation. The reformulated sentence structures are qualitatively assessed with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We evaluate the outcomes of this second part negatively, as they were not good enough to support any definitive conclusion, but they have instead opened new doors to explore.
Semantic graphs have been used to optimize various natural language processing tasks and to improve search and information retrieval tasks. In most cases, such semantic graphs have been constructed through supervised machine learning methods that rely on manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we first examine the possibility of automatically generating a semantic graph from an ad hoc dataset of 50 000 newspaper articles in a completely unsupervised manner. The usefulness of the visual representation of the resulting graph is tested on 14 subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that our feature engineering is viable for entity finding and document similarity, and that the visual map produced by our artifact is visually useful. In the second part, we explore the possibility of identifying entity relationships in an unsupervised fashion by employing abstractive deep learning methods for sentence reformulation. The reformulated sentences are assessed qualitatively with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We evaluate the outcomes of this second part negatively, as they were not good enough to reach any definitive conclusion, but they have instead opened new doors to explore.
4

Jakeš, Jan. "Visipedia - Embedding-driven Visual Feature Extraction and Learning." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236120.

Full text
Abstract:
Multidimensional indexing is an effective tool for capturing similarities between objects without the need for their explicit categorization. In recent years, this method has been widely used for object annotation and has formed a significant part of the publications associated with the Visipedia project. This thesis analyzes the possibilities of machine learning from multidimensionally indexed images based on their visual features, and presents methods for predicting multidimensional coordinates for previously unseen images. The work studies the relevant feature extraction algorithms, analyzes applicable machine learning methods, and describes the entire development process of such a system. The resulting system is then tested on two different datasets, and the experiments carried out present the first results for a task of this kind.
5

Gao, Jizhou. "VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS." UKnowledge, 2013. http://uknowledge.uky.edu/cs_etds/14.

Full text
Abstract:
This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that can record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interest in the fields of computer vision and data mining. Structural semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, location and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance, using a suite of polynomial-time algorithms designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of an image pattern are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations.
New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models.
6

Liu, Jingen. "Learning Semantic Features for Visual Recognition." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3358.

Full text
Abstract:
Visual recognition (e.g., object, scene and action recognition) is an active area of research in computer vision due to its increasing number of real-world applications such as video (image) indexing and search, intelligent surveillance, human-machine interaction, robot navigation, etc. Effective modeling of objects, scenes and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which image patches or video cuboids are quantized into visual words (i.e., mid-level features) based on their appearance similarity using clustering, has been widely and successfully explored. The advantages of this representation are that no explicit detection or tracking of objects or object parts is required, the representation is somewhat tolerant to within-class deformations, and it is efficient for matching. However, the performance of BoVW is sensitive to the size of the visual vocabulary, so computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation is partially due to the fact that the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, in this thesis we present a principled approach to learn a semantic vocabulary (i.e., high-level features) from a large number of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we have developed an algorithm to discover a compact yet discriminative semantic vocabulary. This vocabulary is obtained by grouping the visual words, based on their distribution in videos (images), into visual-word clusters. The mutual information (MI) between the clusters and the videos (images) depicts the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary.
We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters by finding a good tradeoff between compactness and discriminative power. We tested our proposed approach on the state-of-the-art KTH dataset, and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering, because only the visual words are clustered, regardless of which videos they appear in. In order to leverage the co-occurrence of visual words and images, we developed a co-clustering algorithm to simultaneously group the visual words and images. We tested our approach on the publicly available fifteen-scene dataset and obtained about a 4% increase in average accuracy compared to the one-sided clustering approaches. Second, instead of grouping the mid-level features directly, we first embed the features into a low-dimensional semantic space by manifold learning, and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding is able to preserve the explicitly defined diffusion distance, which reflects the semantic similarity between any two features. Furthermore, DM provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. The experiments on the KTH dataset show that DM can perform much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. In order to combine multiple heterogeneous features for visual recognition, we further propose the Fiedler Embedding to capture the complicated semantic relationships between all entities (i.e., videos, images, heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. We tested our approach on the Weizmann dataset, and achieved about 17% to 21% improvement in average accuracy.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
7

Nguyen, Duc Minh Chau. "Affordance learning for visual-semantic perception." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2443.

Full text
Abstract:
Affordance Learning is linked to the study of interactions between robots and objects, including how robots perceive objects through scene understanding. This area has long been popular in Psychology, which has recently come to influence Computer Vision. In this way, Computer Vision has borrowed the concept of affordance from Psychology in order to develop Visual-Semantic recognition systems and, in particular, to develop the capabilities of robots to interact with objects. However, existing Affordance Learning systems are still limited to detecting and segmenting object affordances, which is called Affordance Segmentation. Further, these systems are not designed to develop specific abilities to reason about affordances. For example, a Visual-Semantic system for captioning a scene can extract information from an image, such as "a person holds a chocolate bar and eats it", but does not highlight the affordances "hold" and "eat". Indeed, these and other affordances commonly appear within all aspects of life, since affordances usually connect to actions (from a linguistic view, affordances are generally known as verbs in sentences). Due to the above-mentioned limitations, this thesis aims to develop systems of Affordance Learning for Visual-Semantic Perception. These systems can be built using Deep Learning, which has been empirically shown to be efficient for performing Computer Vision tasks. There are two goals of the thesis: (1) to study the key factors that contribute to the performance of Affordance Segmentation, and (2) to reason about affordances (Affordance Reasoning) based on parts of objects for Visual-Semantic Perception. In terms of the first goal, the thesis mainly investigates the feature extraction module, as this is one of the earliest steps in learning to segment affordances. The thesis finds that the quality of feature extraction from images plays a vital role in the improved performance of Affordance Segmentation.
With regard to the second goal, the thesis infers affordances from object parts to reason about part-affordance relationships. Based on this approach, the thesis devises an Object Affordance Reasoning Network that can learn to construct relationships between affordances and object parts. As a result, reasoning about affordances becomes achievable in the generation of scene graphs of affordances and object parts. Empirical results, obtained from extensive experiments, show the potential of the developed system towards Affordance Reasoning from Scene Graph Generation.
8

Chen, Yifu. "Deep learning for visual semantic segmentation." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.

Full text
Abstract:
In this thesis, we are interested in visual semantic segmentation, one of the high-level tasks that paves the way towards complete scene understanding. More precisely, it requires semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are tackled using deep architectures. In the first part, we focus on constructing a more appropriate loss function for semantic segmentation. In particular, we define a new loss function based on a semantic edge detection neural network. This loss requires pixel-level predictions to be consistent with the ground-truth semantic edge information, and thus leads to better delimited segmentation results. In the second part, we address another important issue, namely learning segmentation models with little annotated data. To this end, we propose a new attribution method that identifies the most important regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation setting. Semantic segmentation models are thus trained with image-level labeled data only, which is easy to collect in large quantities. All models proposed in this thesis are thoroughly evaluated experimentally on several datasets, and the results are competitive with the literature.
In this thesis, we are interested in Visual Semantic Segmentation, one of the high-level tasks that paves the way towards complete scene understanding. Specifically, it requires semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss constrains pixel-level predictions to be consistent with the ground-truth semantic edge information, and thus leads to better-shaped segmentation results. In the second part, we address another important issue, namely alleviating the need to train segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly evaluated experimentally on multiple datasets, and the results are competitive with the literature.
9

Fan, Wei. "Image super-resolution using neighbor embedding over visual primitive manifolds /." View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?CSED%202007%20FAN.

Full text
10

Hanwell, David. "Weakly supervised learning of visual semantic attributes." Thesis, University of Bristol, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687063.

Full text
Abstract:
There are at present many billions of images on the internet, only a fraction of which are labelled according to their semantic content. To automatically provide labels for the rest, models of visual semantic concepts must be created. Such models are traditionally trained using images which have been manually acquired, segmented, and labelled. In this thesis, we submit that such models can be learned automatically using those few images which have already been labelled, either directly by their creators, or indirectly by their associated text. Such imagery can be acquired easily, cheaply, and in large quantities, using web image searches. Though there has been some work towards learning from such weakly labelled data, all methods yet proposed require more than a minimum of human effort. In this thesis we put forth a number of methods for reliably learning models of visual semantic attributes using only the raw, unadulterated results of web image searches. The proposed methods do not require any human input beyond specifying the names of the attributes to be learned. We also present means of identifying and localising learned attributes in challenging, real-world images. Our methods are of a probabilistic nature, and make extensive use of multivariate Gaussian mixture models to represent both data and learned models. The contributions of this thesis also include several tools for acquiring and comparing these distributions, including a novel clustering algorithm. We apply our weakly supervised learning methods to the training of models of a variety of visual semantic attributes, including colour and pattern terms. Detection and localisation of the learned attributes in unseen real-world images is demonstrated, and both quantitative and qualitative results are presented. We compare against other work, including both general methods of weakly supervised learning and more attribute-specific methods.
We apply our learning methods to the training sets of previous works, and assess their performance on the test sets used by other authors. Our results show that our methods give better results than the current state of the art.
11

Rabovsky, Milena. "Semantic richness effects in visual word processing." Doctoral thesis, Humboldt-Universität zu Berlin, Lebenswissenschaftliche Fakultät, 2014. http://dx.doi.org/10.18452/17073.

Full text
Abstract:
Reading aims to extract meaning from written text. Interestingly, words differ considerably in the amount of meaning associated with them, and it has recently been shown that high semantic richness facilitates lexical and semantic tasks. This dissertation combines event-related potentials (ERPs) and connectionist modeling to address several open questions concerning the role of semantic richness in word processing. ERPs were used to determine the time course of independent influences of the number of semantic features and associates during word reading, and to investigate influences of semantic richness on implicit word learning. To better understand the underlying mechanisms, the results were then simulated using a semantic network model. There were no influences of the number of associates, but there was fast activation of semantic features, which influenced the ERP from 190 ms onwards - only 20 to 30 ms after, and temporally overlapping with, the activation of orthographic representations as indicated by N1 lexicality effects. Later on, a high number of features was accompanied by larger N400 amplitudes. In addition, semantic features enhanced repetition effects on lexical decision accuracy and N400 amplitudes, providing initial evidence for influences of semantic richness on implicit word learning. These results are consistent with feature-based semantic network models. Simulations suggest that semantic activation facilitates lexical decisions, while network error corresponds closely to N400 amplitudes. Since network error is interpreted psychologically as implicit prediction error, these results suggest that N400 amplitudes reflect implicit prediction error in the semantic system.
Language ultimately aims to convey meaning. Importantly, the amount of associated semantic information varies considerably between words. Recent evidence suggests that the richness of semantic representations facilitates performance in lexical and semantic tasks, but much remains to be learned about semantic richness effects. The present dissertation combined event-related brain potentials (ERPs) and connectionist modeling to address several unresolved issues concerning the role of semantic richness in word processing. Specifically, ERPs were employed to investigate the time course of independent influences of the number of semantic features and associates during word reading (study 1) and influences of semantic richness on implicit word learning (study 2). Aiming at advancing a mechanistic understanding of the obtained results, both studies were subsequently simulated using a network model of semantic cognition (study 3). Results showed no influences of the number of associates, but fast access to semantic features, with influences of feature-based semantic richness starting at about 190 ms - a mere 20 to 30 ms after and temporally overlapping with the activation of orthographic representations as reflected by N1 lexicality effects. Later on, a high number of semantic features induced larger N400 amplitudes. Furthermore, the number of semantic features enhanced repetition priming effects on lexical decision accuracy and N400 amplitudes, providing initial evidence for influences of semantic richness on implicit word learning. These results are in line with feature-based network models of semantic cognition. Simulations with such a model suggest that semantic activation can facilitate lexical decisions, while network error closely corresponds to N400 amplitudes. In psychological terms, network error has been conceptualized as implicit prediction error. Thus, these results are taken to suggest that N400 amplitudes reflect implicit prediction error in semantic memory.
12

MALIK, WAQAS. "Visual Semantic Web. Ontology based E-learning management system." Thesis, Blekinge Tekniska Högskola, Avdelningen för interaktion och systemdesign, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4326.

Full text
Abstract:
E-Learning is a process in which we use the electronic medium to access a defined set of applications and processes. With its increasing identification and recognition in the academic and corporate worlds, a unique model or framework is required. E-Learning is a critical support mechanism for educational institutions to improve the performance of their students and teachers, and is also useful for organizations to enhance the performance of their employees. The Semantic Web represents a potential technology for realizing e-Learning requirements. Research in the field of e-Learning spans a wide range of applications, ranging from virtual classrooms to remote courses and distance learning. However, studies show that it still demands a more effective approach. An ontology is a specification of a conceptualization: the objects, processes, and other entities involved in building the framework for E-learning. This thesis presents an ontology for the E-learning process, covering course syllabus, teaching methods, learning activities and learning styles.
E-Learning is a process in which we use electronic means to access specified applications and processes. With its increasing identification and recognition in the academic and corporate worlds, a unique model or framework is needed. E-Learning is an important mechanism for educational institutions to improve the performance of their students and teachers, and it is also useful for organizations to increase the performance of their employees. The Semantic Web represents a potential technology for realizing e-Learning requirements. Research within e-Learning is represented by a broad spectrum of applications, ranging from virtual classrooms to remote courses and distance education. However, studies show that it still demands a more effective approach. An ontology is a specification of a conceptualization: the object and the other entities involved in the creation of a framework for e-Learning. This thesis presents an ontology for e-Learning, covering course syllabus, teaching, learning and learning styles.
APA, Harvard, Vancouver, ISO, and other styles
13

Schroff, Florian. "Semantic Image Segmentation and Web-Supervised Visual Learning." Thesis, University of Oxford, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.504578.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Telling, Anna L. "Semantic and phonological context effects in visual search." Thesis, University of Birmingham, 2008. http://etheses.bham.ac.uk//id/eprint/182/.

Full text
Abstract:
Visual search requires participants to search for a pre-specified target amongst a number of distractors. According to theories of visual search, attention is directed towards the target through a combination of stimulus-driven (bottom-up) and goal-driven (top-down) means. For example, when searching for a red car, top-down attention can prepare the visual system to prioritise items with matching visual properties to the target, e.g., red objects. Theories of visual search support guidance according to visual properties, including the Guided Search model (Wolfe, 1994) and Attentional Engagement Theory (AET: Duncan & Humphreys, 1989). However, whether or not attention can be guided according to non-visual properties of the stimulus, such as semantic and name information, remains controversial (Wolfe & Horowitz, 1994). This thesis studied search for a target (e.g., baseball-bat) in the presence of semantically related (e.g., racquet), phonologically identical (homophones, e.g., animal-bat) and phonologically related distractors (e.g., bag). Participants’ reaction times (RTs), error rates, eye movements and event-related potentials (ERPs) were monitored, and performance compared between young, older adult and brain-damaged individuals. Chapters 2 to 4 report semantic interference for all participant groups; Chapter 5 reports homophone interference in young adults and Chapter 6 reports no interference of phonologically related distractors in search for the target by young adults. The results support search being guided according to semantic and whole-name information about the target only. The mechanisms involved in this interference and contributions of these findings to the theories of visual search will be discussed.
APA, Harvard, Vancouver, ISO, and other styles
15

Ye, Meng. "VISUAL AND SEMANTIC KNOWLEDGE TRANSFER FOR NOVEL TASKS." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/583037.

Full text
Abstract:
Computer and Information Science
Ph.D.
Data is a critical component in a supervised machine learning system. Many successful applications of learning systems on various tasks are based on a large amount of labeled data. For example, deep convolutional neural networks have surpassed human performance on ImageNet classification, which consists of millions of labeled images. However, one challenge in conventional supervised learning systems is their generalization ability. Once a model is trained on a specific dataset, it can only perform the task on the seen classes and cannot be used for novel unseen classes. In order to make the model work on new classes, one has to collect and label new data and then re-train the model. However, collecting and labeling data is labor-intensive and costly; in some cases it is even impossible. Also, there is an enormous number of different tasks in the real world, and it is not practical to create a dataset for each of them. These problems raise the need for Transfer Learning, which aims at using data from a source domain to improve the performance of a model on a target domain, where the two domains have different data or different tasks. One specific case of transfer learning is Zero-Shot Learning (ZSL). It deals with the situation where the source domain and target domain have the same data distribution but do not share the same set of classes. For example, a model is given animal images of 'cat' and 'dog' for training and will be tested on classifying 'tiger' and 'wolf' images, which it has never seen. Unlike conventional supervised learning, Zero-Shot Learning does not require training data in the target domain to perform classification. This property gives ZSL the potential to be broadly applied in various applications where a system is expected to tackle unexpected situations.
In this dissertation, we develop algorithms that help a model effectively transfer visual and semantic knowledge learned from a source task to a target task. More specifically, we first develop a model that learns a uniform visual representation of semantic attributes, which helps alleviate the domain-shift problem in Zero-Shot Learning. Second, we develop an ensemble network architecture with a progressive training scheme, which transfers source-domain knowledge to the target domain in an end-to-end manner. Lastly, we move a step beyond ZSL and explore Label-less Classification, which transfers knowledge from pre-trained object detectors into scene classification tasks. Our label-less classification takes advantage of word embeddings trained from unorganized online text, thus eliminating the need for expert-defined semantic attributes for each class. Through comprehensive experiments, we show that the proposed methods can effectively transfer visual and semantic knowledge between tasks, and achieve state-of-the-art performance on standard datasets.
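The label-less classification step described in the abstract can be sketched with toy vectors: score each scene class by the detector-confidence-weighted similarity between the detected objects' word embeddings and the scene name's embedding. All vectors, names, and scores below are illustrative assumptions, not the thesis's actual data or model:

```python
import numpy as np

# Toy word vectors (hypothetical; in practice these would come from
# embeddings trained on unorganized online text, as the abstract describes).
emb = {
    "oven": np.array([0.9, 0.1, 0.0]),
    "sink": np.array([0.8, 0.2, 0.1]),
    "sofa": np.array([0.1, 0.9, 0.2]),
    "kitchen": np.array([0.85, 0.15, 0.05]),
    "living_room": np.array([0.15, 0.85, 0.25]),
}

def cos(a, b):
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def labelless_scene(detections, scene_classes):
    """Score each scene by detector-confidence-weighted similarity between
    detected object embeddings and the scene-name embedding; pick the best."""
    scores = {
        scene: sum(conf * cos(emb[obj], emb[scene]) for obj, conf in detections)
        for scene in scene_classes
    }
    return max(scores, key=scores.get)

dets = [("oven", 0.92), ("sink", 0.71)]   # assumed pretrained-detector output
print(labelless_scene(dets, ["kitchen", "living_room"]))  # -> kitchen
```

Because the class is scored through its name's embedding, no expert-defined attribute list is needed for new scene classes, which is the point the abstract makes.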
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
16

Kaewtrakulpong, Pakorn. "Adaptive probabilistic models for learning semantic patterns." Thesis, Brunel University, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.269084.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Wang, Run Fen. "Semantic Text Matching Using Convolutional Neural Networks." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-362134.

Full text
Abstract:
Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods using term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF is unable to capture semantic relations between closely related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performance on many tasks. Recurrent Neural Networks (RNNs) have been tested on text classification and text matching, but did not gain any remarkable results, since RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNNs) are applied to match texts in a semantic aspect. The model uses word embedding representations of two texts as inputs to the CNN to extract the semantic features between the two texts, and gives a score as output expressing how certain the CNN model is that they match. The results show that, after some tuning of the parameters, the CNN model could produce accuracy, precision, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, better pre-processing of the data, generating larger and more feature-rich data sets, and further tuning of the parameters.
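A minimal sketch of the matching scheme the abstract describes — word-embedding inputs, a convolution with max-over-time pooling per text, and a sigmoid match score — written in plain NumPy with made-up filter sizes and random weights (the thesis's actual architecture and hyperparameters are not given here):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_maxpool(E, W):
    """Slide filters W (n_filters, k, d) over token embeddings E (n, d),
    apply ReLU, then max-over-time pooling -> one (n_filters,) text vector."""
    n_f, k, d = W.shape
    n = E.shape[0]
    feats = np.empty((n_f, n - k + 1))
    for i in range(n - k + 1):
        window = E[i:i + k].ravel()            # (k*d,) flattened n-gram
        feats[:, i] = W.reshape(n_f, -1) @ window
    return np.maximum(feats, 0).max(axis=1)

def match_score(E1, E2, W, w_out):
    """Linear scoring head over the two pooled text vectors,
    squashed to (0, 1) as a match probability."""
    v1, v2 = conv1d_maxpool(E1, W), conv1d_maxpool(E2, W)
    z = w_out @ np.concatenate([v1, v2])
    return 1.0 / (1.0 + np.exp(-z))            # sigmoid

W = rng.standard_normal((8, 3, 50)) * 0.1      # 8 filters, width 3, 50-d embeddings
w_out = rng.standard_normal(16) * 0.1
E1 = rng.standard_normal((12, 50))             # a 12-token text (random stand-in)
E2 = rng.standard_normal((15, 50))             # a 15-token text
print(round(match_score(E1, E2, W, w_out), 3))
```

In a real system the embeddings would come from pretrained word vectors and the weights would be trained on labeled matching pairs; the sketch only shows the data flow.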
APA, Harvard, Vancouver, ISO, and other styles
18

Binford, Adam Quarles. "A Bidirectional Pipeline for Semantic Interaction in Visual Analytics." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/72981.

Full text
Abstract:
Semantic interaction in visual data analytics allows users to indirectly adjust model parameters by directly manipulating the output of the models. This is accomplished using an underlying bidirectional pipeline that first uses statistical models to visualize the raw data. When a user interacts with the visualization, the interaction is interpreted into updates in the model parameters automatically, giving the users immediate feedback on each interaction. These interpreted interactions eliminate the need for a deep understanding of the underlying statistical models. However, the development of such tools is necessarily complex due to their interactive nature. Furthermore, each tool defines its own unique pipeline to suit its needs, which leads to difficulty experimenting with different types of data, models, interaction techniques, and visual encodings. To address this issue, we present a flexible multi-model bidirectional pipeline for prototyping visual analytics tools that rely on semantic interaction. The pipeline has plug-and-play functionality, enabling quick alterations to the type of data being visualized, how models transform the data, and interaction methods. In so doing, the pipeline enforces a separation between the data pipeline and the visualization, preventing the two from becoming codependent. To show the flexibility of the pipeline, we demonstrate a new visual analytics tool and several distinct variations, each of which were quickly and easily implemented with slight changes to the pipeline or client.
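One way such an interaction-to-parameter update can look (a generic sketch, not this thesis's specific update rule): when the user drags two documents together in the spatial layout, the pipeline upweights the keyword dimensions the two documents share in the layout's distance metric, so the model now also places them closer:

```python
import numpy as np

def update_weights(w, doc_a, doc_b, lr=0.1):
    """Hypothetical semantic-interaction update: boost the weight of
    features both documents contain, then renormalize."""
    shared = np.minimum(doc_a, doc_b) > 0      # features present in both docs
    w = w + lr * shared
    return w / w.sum()

def weighted_dist(w, a, b):
    """Weighted Euclidean distance used by the (assumed) layout model."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

w = np.full(4, 0.25)                 # uniform weights over 4 keyword features
a = np.array([1.0, 1.0, 0.0, 0.0])   # doc A keyword counts
b = np.array([1.0, 0.0, 1.0, 0.0])   # doc B keyword counts

before = weighted_dist(w, a, b)
w = update_weights(w, a, b)          # user drags A and B together
after = weighted_dist(w, a, b)
print(after < before)                # the model now sees them as closer
```

The plug-and-play pipeline the abstract describes would let this update rule, the distance model, and the visual encoding each be swapped independently.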
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
19

Ma, Chao. "Visual analytic technique and system of spatiotemporal-semantic events." Kent State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=kent1594852646584126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Choudhary, Rishabh R. "Construction and Visualization of Semantic Spaces for Domain-Specific Text Corpora." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627666092811419.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Durak, Nurcan. "Semantic Video Modeling And Retrieval With Visual, Auditory, Textual Sources." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605438/index.pdf.

Full text
Abstract:
Studies on content-based video indexing and retrieval aim at accessing video content from different aspects more efficiently and effectively. Most studies have concentrated on the visual component of video content in modeling and retrieving it. Besides the visual component, much valuable information is also carried in other media components, such as superimposed text, closed captions, audio, and speech that accompany the pictorial component. In this study, the semantic content of video is modeled using visual, auditory, and textual components. In the visual domain, visual events, visual objects, and spatial characteristics of visual objects are extracted. In the auditory domain, auditory events and auditory objects are extracted. In the textual domain, speech transcripts and visible texts are considered. With our proposed model, users can access video content from different aspects and get the desired information more quickly. Besides multimodality, our model is built on semantic hierarchies that enable querying the video content at different semantic levels: sequence-scene hierarchies in the visual domain, background-foreground hierarchies in the auditory domain, and subject hierarchies in the speech domain. The presented model has been implemented, and multimodal content queries, hierarchical queries, fuzzy spatial queries, fuzzy regional queries, fuzzy spatio-temporal queries, and temporal queries have been applied on video content successfully.
APA, Harvard, Vancouver, ISO, and other styles
22

Barb, Adrian S. "Knowledge representation and exchange of visual patterns using semantic abstractions." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/6674.

Full text
Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on July 21, 2009). Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
23

Zinovieff, Fiona M. "Interaction of lexical-semantic and imagery representations." Thesis, Bangor University, 2000. https://research.bangor.ac.uk/portal/en/theses/interaction-of-lexicalsemantic-and-imagery-representations(75423ae6-238f-4577-a935-e08dc4219c9c).html.

Full text
Abstract:
We report a series of experiments using a new methodology to investigate the relationships between visual and verbal representations and the process of acquiring new semantic associations. Transfer of associative information between stimulus modalities was investigated by training paired associations between novel pictures and novel words. Our results showed that the transfer of associations is a symbolic process, occurring only when participants are aware of the correspondence between the visual and the verbal items afforded by the name relations. We also obtained evidence to suggest that symbolic associations develop more readily from picture associations than from word associations. We argue that this is evidence that semantic knowledge is grounded in perceptual experience. Our most striking result, replicated across experiments, is that transfer of associations between modalities only occurs when subjects have specific conscious awareness about the relationships among associations. This should have implications for cognitive theories of symbolic representation. The methods we developed to expose this phenomenon can be extended to examine those implications more thoroughly. We discuss some of these implications in terms of competing and complementary cognitive and behavioural theories relating representation to perception and symbols. Dual coding models fit our modality-transfer results more readily than single semantic store models, but neither is well suited for interpreting our awareness results, or for discussing perceptual grounding of representation.
From these considerations, we argue that implicit associations underpin symbolic associations, but that semantic knowledge is conscious knowledge about the patterns of association which link representations.
APA, Harvard, Vancouver, ISO, and other styles
24

Bukva, Emir. "From the Wall to the Web: A Microformat for Visual Art." [Kent, Ohio] : Kent State University, 2009. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1259115325.

Full text
Abstract:
Thesis (M.F.A.)--Kent State University, 2009.
Title from PDF t.p. (viewed April 22, 2010). Advisor: Sanda Katila. Keywords: microformats; semantic web; labels. Includes bibliographical references (p. 52-53).
APA, Harvard, Vancouver, ISO, and other styles
25

Mojica, Andrew Joseph. "Can Semantic Activation Affect Figure Assignment?" Diss., The University of Arizona, 2014. http://hdl.handle.net/10150/321450.

Full text
Abstract:
Figure assignment entails competition between object properties on opposite sides of borders. The figure is perceived on the side of the border that wins the competition. Ample evidence indicates that configural familiarity is among the competing object properties. We investigated whether priming the semantics of a familiar object suggested along one side of a border can increase its likelihood of winning the competition. To prime the semantics, we presented brief masked exposures of object names before brief masked exposures of displays where a portion of a familiar object was suggested on one side of a central border separating two equal-area, black-and-white regions. Participants reported whether the figure lay on the left or right side of the central border and were unaware of the presence of the word prime. These experimental primes named either the Same Object (SO) or a Different Object (DO) as the familiar object suggested in the display. In the DO condition, the word named an object either in the Same Category (DO-SC) or a Different Category (DO-DC) as the familiar object suggested in the display, where superordinate category was defined as natural versus artificial objects. We also used non-words as control primes. We hypothesized that, if semantic activation influences figure assignment, participants in the SO and DO-SC conditions should be more likely than participants in the DO-DC condition to perceive the figure on the side where the familiar object lies following experimental primes than control primes. We did not observe differences between experimental and control prime in any condition. However, we did obtain a Prime Context Effect, in that participants were more likely to perceive the figure on the familiar side of the border in the SO and DO-SC conditions than in the DO-DC condition. 
The Prime Context Effect shows that participants discerned the relationship between the masked word prime and the semantics of the familiar object suggested in the display, and this led them to change their strategy on both experimental and control trials. We also found that behavior changed over the course of the experiment: Participants in the DO-DC condition perceived the figure on the familiar side of the border more often in the second half of the experiment, on both experimental and control trials. This pattern suggests that over the course of the experiment, they learned to rely more on information from the display than from the prime, perhaps by restricting their attention to the time when the figure-ground display appeared. Participants in the DO-SC condition perceived the figure on the familiar side of the border more often on experimental trials in the second half of the experiment, whereas their performance on control trials did not differ in the first and second half. We hypothesize that participants in the DO-SC condition learned to match the superordinate semantics of the experimental prime and the display, leading to semantic priming. Taken together, these results show that (1) participants can quickly learn the relationship between experimental primes and target displays and can change their strategy accordingly, and (2) semantic activation can affect figure assignment.
APA, Harvard, Vancouver, ISO, and other styles
26

Stigeborn, Olivia. "Text ranking based on semantic meaning of sentences." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300442.

Full text
Abstract:
Finding a suitable candidate-to-client match is an important part of a consultant company's work. It takes a lot of time and effort for the recruiters at the company to read possibly hundreds of resumes to find a suitable candidate. Natural language processing is capable of performing a ranking task where the goal is to rank the resumes with the most suitable candidates ranked highest. This ensures that the recruiters only need to look at the top-ranked resumes and can quickly get candidates out in the field. Former research has used methods that count specific keywords in resumes and can decide whether a candidate has a given experience or not. The main goal of this thesis is to use the semantic meaning of the text in the resumes to get a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes by creating a summary of each resume. The run time, memory usage and the rank the wanted candidate achieved were documented and used to analyze the results. When the candidate who was used to generate the job description was ranked in the top 10, the classification was considered correct. The accuracy was calculated using this method, and an accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes. The algorithm is able to rank both Swedish and English resumes, with an accuracy of 67.7% for Swedish resumes and 74.7% for English. The run time was fast enough at an average of 578 ms, but the memory usage was too large to make it possible to use the algorithm on-device.
In conclusion the semantic meaning of resumes can be used to rank resumes and possible future work would be to combine this method with a method that counts keywords to research if the accuracy would increase.
Finding a suitable candidate-to-client match is an important part of a consulting company's work. It takes much time and effort for recruiters at the company to read possibly hundreds of resumes to find a suitable candidate. There are natural language processing methods for ranking resumes with the most suitable candidates ranked highest. This ensures that recruiters only need to look at the top-ranked resumes and can quickly get candidates out in the field. Previous research has used methods that count specific keywords in a resume and can determine whether a candidate has specific experience. The main goal of this thesis is to use the semantic meaning of the text in resumes to gain a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on mobile devices and whether the algorithm can rank resumes regardless of whether they are in Swedish or English. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes by creating a summary of each resume. The run time, memory usage and the rank achieved by the desired candidate were documented and used to analyze the results. When the candidate used to generate the job description was ranked in the top 10, the classification was considered correct. The accuracy was calculated with this method, and an accuracy of 68.3% was achieved. The results show that the algorithm can rank resumes. The algorithm can rank both Swedish and English resumes, with an accuracy of 67.7% for Swedish and 74.7% for English. The run time averaged 578 ms, which would allow the algorithm to run on mobile devices, but the memory usage was too large.
In summary, the semantic meaning of resumes can be used to rank them, and possible future work is to combine this method with a method that counts keywords to investigate how the accuracy would be affected.
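The core ranking step — embed the job description and each resume, rank by cosine similarity, and check whether the wanted candidate lands in the top 10 — can be sketched as below. The vectors here are random stand-ins; in the thesis they would come from DistilRoBERTa sentence embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_resumes(job_vec, resume_vecs):
    """Return resume indices sorted by cosine similarity to the job description."""
    R = np.asarray(resume_vecs, dtype=float)
    q = np.asarray(job_vec, dtype=float)
    sims = (R @ q) / (np.linalg.norm(R, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)               # best match first

# Hypothetical 768-d embeddings for 100 resumes; one planted near-duplicate
# plays the role of the candidate the job description was generated from.
resumes = rng.standard_normal((100, 768))
job = resumes[42] + 0.1 * rng.standard_normal(768)   # noisy copy of resume 42

order = rank_resumes(job, resumes)
print(int(order[0]))                       # the planted candidate ranks first
print(42 in order[:10])                    # the thesis's top-10 criterion
```

Note that cosine ranking itself is language-agnostic, which is consistent with the abstract's finding that Swedish and English resumes can be ranked by the same algorithm.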
APA, Harvard, Vancouver, ISO, and other styles
27

Bradel, Lauren C. "Multi-Model Semantic Interaction for Scalable Text Analytics." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/52785.

Full text
Abstract:
Learning from text data often involves a loop of tasks that iterate between foraging for information and synthesizing it in incremental hypotheses. Past research has shown the advantages of using spatial workspaces as a means for synthesizing information through externalizing hypotheses and creating spatial schemas. However, spatializing the entirety of datasets becomes prohibitive as the number of documents available to the analysts grows, particularly when only a small subset are relevant to the tasks at hand. To address this issue, we developed the multi-model semantic interaction (MSI) technique, which leverages user interactions to aid in the display layout (as was seen in previous semantic interaction work), forage for new, relevant documents as implied by the interactions, and then place them in context of the user's existing spatial layout. This results in the ability for the user to conduct both implicit queries and traditional explicit searches. A comparative user study of StarSPIRE discovered that while adding implicit querying did not impact the quality of the foraging, it enabled users to 1) synthesize more information than users with only explicit querying, 2) externalize more hypotheses, 3) complete more synthesis-related semantic interactions. Also, 18% of relevant documents were found by implicitly generated queries when given the option. StarSPIRE has also been integrated with web-based search engines, allowing users to work across vastly different levels of data scale to complete exploratory data analysis tasks (e.g. literature review, investigative journalism). The core contribution of this work is multi-model semantic interaction (MSI) for usable big data analytics. This work has expanded the understanding of how user interactions can be interpreted and mapped to underlying models to steer multiple algorithms simultaneously and at varying levels of data scale. This is represented in an extendable multi-model semantic interaction pipeline. 
The lessons learned from this dissertation work can be applied to other visual analytics systems, promoting direct manipulation of the data in context of the visualization rather than tweaking algorithmic parameters and creating usable and intuitive interfaces for big data analytics.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
28

Zhang, Qianni. "Multi-feature Space Optimisation and Semantic Inference for Visual Information Retrieval." Thesis, Queen Mary, University of London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.498157.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Endert, Alex. "Semantic Interaction for Visual Analytics: Inferring Analytical Reasoning for Model Steering." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28265.

Full text
Abstract:
User interaction in visual analytic systems is critical to enabling visual data exploration. Through interacting with visualizations, users engage in sensemaking, a process of developing and understanding relationships within datasets through foraging and synthesis. For example, two-dimensional layouts of high-dimensional data can be generated by dimension reduction models, and provide users with an overview of the relationships between information. However, exploring such spatializations can require expertise with the internal mechanisms and parameters of these models. The core contribution of this work is semantic interaction, capable of steering such models without requiring expertise in dimension reduction models, but instead leveraging the domain expertise of the user. Semantic interaction infers the analytical reasoning of the user with model updates, steering the dimension reduction model for visual data exploration. As such, it is an approach to user interaction that leverages interactions designed for synthesis, and couples them with the underlying mathematical model to provide computational support for foraging. As a result, semantic interaction performs incremental model learning to enable synergy between the user's insights and the mathematical model. The contributions of this work are organized by providing a description of the principles of semantic interaction, providing design guidelines through the development of a visual analytic prototype, ForceSPIRE, and the evaluation of the impact of semantic interaction on the analytic process. The positive results of semantic interaction open a fundamentally new design space for designing user interactions in visual analytic systems. This research was funded in part by the National Science Foundation, CCF-0937071 and CCF-0937133, the Institute for Critical Technology and Applied Science at Virginia Tech, and the National Geospatial-Intelligence Agency contract #HMI1582-05-1-2001.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
30

Wenskovitch, Jr John Edward. "Dimension Reduction and Clustering for Interactive Visual Analytics." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/96599.

Full text
Abstract:
When exploring large, high-dimensional datasets, analysts often utilize two techniques for reducing the data to make exploration more tractable. The first technique, dimension reduction, reduces the high-dimensional dataset into a low-dimensional space while preserving high-dimensional structures. The second, clustering, groups similar observations while simultaneously separating dissimilar observations. Existing work presents a number of systems and approaches that utilize these techniques; however, these techniques can cooperate or conflict in unexpected ways. The core contribution of this work is the systematic examination of the design space at the intersection of dimension reduction and clustering when building intelligent, interactive tools in visual analytics. I survey existing techniques for dimension reduction and clustering algorithms in visual analytics tools, and I explore the design space for creating projections and interactions that include dimension reduction and clustering algorithms in the same visual interface. Further, I implement and evaluate three prototype tools that implement specific points within this design space. Finally, I run a cognitive study to understand how analysts perform dimension reduction (spatialization) and clustering (grouping) operations. Contributions of this work include surveys of existing techniques, three interactive tools and usage cases demonstrating their utility, design decisions for implementing future tools, and a presentation of complex human organizational behaviors.
Doctor of Philosophy
When an analyst is exploring a dataset, they seek to gain insight from the data. With data sets growing larger, analysts require techniques to help them reduce the size of the data while still maintaining its meaning. Two commonly-utilized techniques are dimension reduction and clustering. Dimension reduction seeks to eliminate unnecessary features from the data, reducing the number of columns to a smaller number. Clustering seeks to group similar objects together, reducing the number of rows to a smaller number. The contribution of this work is to explore how dimension reduction and clustering are currently being used in interactive visual analytics systems, as well as to explore how they could be used to address challenges faced by analysts in the future. To do so, I survey existing techniques and explore the design space for creating visualizations that incorporate both types of computations. I look at methods by which an analyst could interact with those projections in other to communicate their interests to the system, thereby producing visualizations that better match the needs of the analyst. I develop and evaluate three tools that incorporate both dimension reduction and clustering in separate computational pipelines. Finally, I conduct a cognitive study to better understand how users think about these operations, in order to create guidelines for better systems in the future.
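The two reductions the abstract pairs — dimension reduction to lay the data out in 2-D, and clustering to group similar observations — can be sketched as one small pipeline (PCA via SVD plus a tiny k-means; the synthetic data and all parameters are illustrative, not from the dissertation's tools):

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_2d(X):
    """Project observations to 2-D screen coordinates (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                       # (n, 2) layout positions

def kmeans(X, k, iters=50):
    """Tiny k-means with deterministic, evenly spaced initial centers."""
    centers = X[:: len(X) // k][:k].copy()
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

# Synthetic data: three well-separated groups in 20 dimensions.
X = np.concatenate([rng.normal(c, 0.3, (50, 20)) for c in (0.0, 5.0, 10.0)])
coords = pca_2d(X)             # dimension reduction: columns -> 2 axes
labels = kmeans(coords, 3)     # clustering: rows -> 3 groups
print(coords.shape, len(set(labels.tolist())))
```

Running clustering on the projected coordinates, as here, is one point in the design space the dissertation examines; clustering in the original high-dimensional space and then projecting is another, and the two can disagree.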
APA, Harvard, Vancouver, ISO, and other styles
31

Castronovo, Julie. "Numbers in the dark : early visual deprivation and the semantic numerical representation." Université catholique de Louvain, 2007. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-04022007-210758/.

Full text
Abstract:
A study of the impact of early visual deprivation, and the subsequent experience with numbers and numerosities, on the elaboration of a semantic numerical representation with the same properties as those postulated in sighted people.
APA, Harvard, Vancouver, ISO, and other styles
32

Gulen, Elvan. "Fusing Semantic Information Extracted From Visual, Auditory And Textual Data Of Videos." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614582/index.pdf.

Full text
Abstract:
In recent years, owing to the increasing use of videos, manual information extraction has become insufficient for users, and extracting semantic information automatically has become a serious requirement. Today, some systems extract semantic information automatically by using visual, auditory and textual data separately, but the number of studies that use more than one data source is very limited. As previous studies on this topic have shown, using multimodal video data for automatic information extraction yields better results by increasing the accuracy of the semantic information retrieved from visual, auditory and textual sources. In this thesis, a complete system that fuses the semantic information obtained from visual, auditory and textual video data is introduced. The fusion system carries out the following procedures:
analyzing and uniting the semantic information extracted from multimodal data by utilizing concept interactions, and consequently generating a semantic dataset that is ready to be stored in a database. In addition, experiments are conducted to compare the results of the proposed multimodal fusion operation with those obtained from semantic information extraction from a single modality and from other fusion methods. The results indicate that fusing all available information along with concept relations yields better results overall than any unimodal approach or other traditional fusion methods.
APA, Harvard, Vancouver, ISO, and other styles
33

Ishikawa, Erina Schaffer. "Semantic Interpretation of Eye Movements Using Author-designed Structure of Visual Content." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/217199.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Hussam, Ali. "Semantic highlighting : an approach to communicating information and knowledge through visual metadata." Thesis, University of Ulster, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.369988.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Heath, Derrall L. "Using Perceptually Grounded Semantic Models to Autonomously Convey Meaning Through Visual Art." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6095.

Full text
Abstract:
Developing advanced semantic models is important in building computational systems that can not only understand language but also convey ideas and concepts to others. Semantic models can allow a creative image-producing agent to autonomously produce artifacts that communicate an intended meaning. This notion of communicating meaning through art is often considered a necessary part of eliciting an aesthetic experience in the viewer and can thus enhance the (perceived) creativity of the agent. Computational creativity, a subfield of artificial intelligence, deals with designing computational systems and algorithms that either automatically create original and functional products, or that augment the ability of humans to do so. We present work on DARCI (Digital ARtist Communicating Intention), a system designed to autonomously produce original images that convey meaning. In order for DARCI to automatically express meaning through the art it creates, it must have its own semantic model that is perceptually grounded with visual capabilities. The work presented here focuses on designing, building, and incorporating advanced semantic and perceptual models into the DARCI system. These semantic models give DARCI a better understanding of the world and enable it to be more autonomous, to better evaluate its own artifacts, and to create artifacts with intention. Through designing, implementing, and studying DARCI, we have developed evaluation methods, models, frameworks, and theories related to the creative process that can be generalized to other domains outside of visual art. Our work on DARCI has even influenced the visual art community through several collaborative efforts, art galleries, and exhibits. We show that the DARCI system is successful at autonomously producing original art that is meaningful to human viewers. We also discuss insights that our efforts have contributed to the field of computational creativity.
APA, Harvard, Vancouver, ISO, and other styles
36

Siddiqui, Rafid. "On Fundamental Elements of Visual Navigation Systems." Doctoral thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00601.

Full text
Abstract:
Visual navigation is a ubiquitous yet complex task which is performed by many species for the purpose of survival. Although visual navigation is actively being studied within the robotics community, determining the elemental constituents of a robust visual navigation system remains a challenge. Motion estimation is mistakenly considered the sole ingredient of a robust autonomous visual navigation system, and efforts are therefore concentrated on improving the accuracy of motion estimates. On the contrary, there are other factors which are as important as motion and whose absence could result in an inability to perform seamless visual navigation of the kind exhibited by humans. A general model of a visual navigation system is therefore needed, one that describes it in terms of a set of elemental units. In this regard, a set of visual navigation elements (i.e. spatial memory, motion memory, scene geometry, context and scene semantics) is suggested as the building blocks of a visual navigation system in this thesis. A set of methods is proposed to investigate the existence and role of these elements in a visual navigation system, and a quantitative research methodology in the form of a series of systematic experiments is conducted on them. The thesis formulates, implements and analyzes the proposed methods in the context of the visual navigation elements, which are arranged into three major groupings: a) spatial memory, b) motion memory, c) Manhattan structure, context and scene semantics. The investigations are carried out on multiple image datasets obtained by robot-mounted cameras (2D/3D) moving in different environments. Spatial memory is investigated through the evaluation of proposed place recognition methods. The recognized places and inter-place associations are then used to represent a visited set of places in the form of a topological map. Such a representation of places and their spatial associations models the concept of spatial memory.
It resembles humans’ ability to represent and map places in large environments (e.g. cities). Motion memory in a visual navigation system is analyzed through a thorough investigation of various motion estimation methods. This leads to proposals of direct motion estimation methods which compute accurate motion estimates by basing the estimation process on dominant surfaces. In the everyday world, planar surfaces, especially ground planes, are ubiquitous; the motion models are therefore built upon this constraint. Manhattan structure provides geometrical cues which are helpful in solving navigation problems. There are some unique geometric primitives (e.g. planes) which make up an indoor environment, and a plane detection method is therefore proposed as a result of investigations performed on scene structure. The method uses supervised learning to successfully classify the segmented clusters in 3D point-cloud datasets. In addition to geometry, the context of a scene also plays an important role in the robustness of a visual navigation system. The context in which navigation is being performed imposes a set of constraints on objects and sections of the scene, and enforcing these constraints enables the observer to robustly segment the scene and classify various objects in it. A contextually aware scene segmentation method is proposed which classifies the image of a scene into a set of geometric classes. The geometric classes are sufficient for most navigation tasks; however, in order to facilitate the cognitive visual decision-making process, the scene ought to be semantically segmented. The semantics of indoor scenes and of outdoor scenes are dealt with separately, and separate methods are proposed for visual mapping of environments of each type. An indoor scene consists of a corridor structure which is modeled as a cubic space in order to build a map of the environment.
A “flash-n-extend” strategy is proposed which is responsible for controlling the map update frequency. The semantics of the outdoor scenes is also investigated and a scene classification method is proposed. The method employs a Markov Random Field (MRF) based classification framework which generates a set of semantic maps.
APA, Harvard, Vancouver, ISO, and other styles
37

Berndl, Emanuel [Verfasser], and Harald [Akademischer Betreuer] Kosch. "Embedding a Multimedia Metadata Model into a Workflow-driven Environment Using Idiomatic Semantic Web Technologies / Emanuel Berndl ; Betreuer: Harald Kosch." Passau : Universität Passau, 2019. http://d-nb.info/1192512022/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.

Full text
Abstract:
L’objectif de cette thèse est de proposer des algorithmes pour la tâche d’édition d’images basée sur le texte (TIE), qui consiste à éditer des images numériques selon une instruction formulée en langage naturel. Par exemple, étant donné une image d’un chien et la requête "Changez le chien en un chat", nous voulons produire une nouvelle image où le chien a été remplacé par un chat, en gardant tous les autres aspects de l’image inchangés (couleur et pose de l’animal, arrière-plan). L’objectif de l’étoile du nord est de permettre à tout un chacun de modifier ses images en utilisant uniquement des requêtes en langage naturel. Une des spécificités de l’édition d’images basée sur du texte est qu’il n’y a pratiquement pas de données d’entraînement pour former un algorithme supervisé. Dans cette thèse, nous proposons différentes solutions pour l’édition d’images, basées sur l’adaptation de grands modèles multimodaux entraînés sur d’énormes ensembles de données. Nous étudions tout d’abord une configuration d’édition simplifiée, appelée édition d’image basée sur la recherche, qui ne nécessite pas de modifier directement l’image d’entrée. Au lieu de cela, étant donné l’image et la requête de modification, nous recherchons dans une grande base de données une image qui correspond à la modification demandée. Nous nous appuyons sur des modèles multimodaux d’alignement image/texte entraînés sur des ensembles de données à l’échelle du web (comme CLIP) pour effectuer de telles transformations sans aucun exemple. Nous proposons également le cadre SIMAT pour évaluer l’édition d’images basée sur la recherche. Nous étudions ensuite comment modifier directement l’image d’entrée. Nous proposons FlexIT, une méthode qui modifie itérativement l’image d’entrée jusqu’à ce qu’elle satisfasse un "objectif d’édition" abstrait défini dans un espace d’intégration multimodal. Nous introduisons des termes de régularisation pour imposer des transformations réalistes.
Ensuite, nous nous concentrons sur les modèles de diffusion, qui sont des modèles génératifs puissants capables de synthétiser de nouvelles images conditionnées par une grande variété d’invites textuelles. Nous démontrons leur polyvalence en proposant DiffEdit, un algorithme qui adapte les modèles de diffusion pour l’édition d’images sans réglage fin. Nous proposons une stratégie "zero-shot" pour trouver automatiquement où l’image initiale doit être modifiée pour satisfaire la requête de transformation de texte
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog, and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named Retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and modification query, we search in a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without finetuning.
We propose a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image given, as a constraint, a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, consisting in adapting diffusion models to solve the task without any examples. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
APA, Harvard, Vancouver, ISO, and other styles
39

Ganis, Giorgio. "An electrophysiological analysis of semantic context effects on object identification /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 1998. http://wwwlib.umi.com/cr/ucsd/fullcit?p9820876.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Naci, Lorina. "Mechanisms for the semantic representation of everyday objects in the human ventral stream." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609004.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Al, Halah Ziad [Verfasser], and R. [Akademischer Betreuer] Stiefelhagen. "Semantic Attributes for Transfer Learning in Visual Recognition / Ziad Al Halah ; Betreuer: R. Stiefelhagen." Karlsruhe : KIT-Bibliothek, 2018. http://d-nb.info/1164081020/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Rowley, Katherine Elizabeth. "Visual word recognition in deaf readers : the interplay between orthographic, semantic and phonological information." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10048381/.

Full text
Abstract:
Poor literacy is prevalent in the deaf population. This thesis assesses levels of literacy in the deaf population by investigating visual word recognition in deaf readers. For hearing readers, several studies have demonstrated that good visual word recognition skills are crucial for successful literacy attainment, and poor readers are likely to have poor word recognition skills. In particular, phonology is known to play an important role in visual word recognition in hearing individuals. The role of phonology in deaf readers has also been addressed extensively. However, these studies have generated mixed results, which may be partly due to different methodological approaches and a lack of control for the reading level of participants. Studies reported in this thesis explore the role of orthography, semantics and phonology in deaf skilled readers during visual word recognition and sentence reading, using various methodologies and controlling carefully for reading level. The methodologies used include: lexical decision, masked priming, the visual world paradigm and the invisible boundary paradigm. The results from the various tasks described in this thesis show that there are similarities in the way deaf skilled and hearing readers process semantic and orthographic information. However, I found differences in how they process phonological information: deaf and hearing readers show similar effects of phonology in tasks that do not require semantic activation; however, deaf readers do not show phonological activation in tasks that require semantics, while hearing readers do. This suggests qualitative differences in reading strategies for the two populations. These differences do not account for differences in literacy attainment across deaf and hearing groups (as our participants were matched for reading level).
Implications for theories of visual word recognition are discussed and in the final chapter, I introduce a proposed model of visual word recognition for deaf readers based on findings reported in this thesis.
APA, Harvard, Vancouver, ISO, and other styles
43

Stadtlander, Leann M. "Orthographic, phonological and semantic coding in visual word recognition : an examination with familiar pseudowords /." The Ohio State University, 1993. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487844105976238.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Helo, Andrea. "Scene exploration during development : influence of perceptual features and semantic context on visual attention." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB205/document.

Full text
Abstract:
L'objectif de cette thèse est d'étudier le développement des mécanismes sous-tendant l'exploration d'une scène visuelle. Les résultats montrent que les stratégies d'attention ambiante et focale émergent vers l'âge de 12 mois. La saillance détermine davantage les mouvements oculaires chez les enfants de moins de 6 ans que chez les plus âgés. En outre, les objets sémantiquement inconsistants avec le contexte visuel attirent similairement le regard des jeunes enfants et des adultes. Cependant, seuls les enfants sont plus rapidement attirés par les objets à saillance élevée que par ceux à saillance réduite. L'effet du niveau de vocabulaire sur l'analyse des scènes visuelles se trouve uniquement dans la condition objets consistants. Toutefois, la latence et la topographie des PEs sont modulées selon le niveau de vocabulaire. Nos résultats suggèrent que le contrôle des mouvements oculaires liés à l'analyse d'une scène visuelle se développe de la petite enfance jusqu'à l'enfance. Bien que les modes ambiant et focal soient présents durant la petite enfance, l'exploration d'une scène visuelle est surtout influencée par le mode focal. De plus, les jeunes enfants utilisent - semblablement aux adultes - le contexte visuel pour orienter leur attention visuelle. Aux stades précoces du développement, la saillance paraît un facteur déterminant pour l'allocation du regard. L'attention visuelle est également influencée par les compétences linguistiques des jeunes enfants
This dissertation investigated the developmental mechanisms underlying scene exploration. The results showed that ambient and focal attention strategies emerge by 12 months of age. Saliency guided eye movements more in children younger than 6 years than in older children. Additionally, objects that were semantically inconsistent with the scene context attracted the gaze equally in young children and adults. Children were, however, attracted faster to high-salient than to low-salient objects. High-producers looked longer at consistent objects than low-producers, while both groups looked equally long at inconsistent objects. The N400 ERP component was more pronounced for inconsistent than for consistent scene-word pairs. Low-producers exhibited a later N400 effect over the right frontal recording sites, while in high-producers the N400 effect was observed earlier over the left frontal sites. Our results suggest that eye movement control during scene viewing matures from infancy to childhood. Even though ambient and focal modes are present in early infancy, scene exploration is dominated by the focal mode. Additionally, young children use scene context, similarly to adults, in the guidance of their visual attention. However, during early stages of development, saliency has a stronger effect on gaze allocation than in adults. Visual attention was also influenced by linguistic skills in young children
APA, Harvard, Vancouver, ISO, and other styles
45

Scaltritti, Michele. "Retrospective Prime Reliance: A Flexible Retrospective Mechanism for Semantic Priming in Visual Word Recognition." Doctoral thesis, Università degli studi di Padova, 2013. http://hdl.handle.net/11577/3426649.

Full text
Abstract:
Recent evidence (Balota et al., 2008; Thomas et al., 2012) suggests that the cognitive system can retrospectively (i.e., after target presentation) increase its reliance on prime information when target-word recognition is made more difficult by experimental manipulations such as visual degradation. In fact, response time (RT) distributional analyses have shown that for clearly visible target words, the priming effect has the same size in all portions of the RT distribution. In contrast, for degraded target words, priming effects increase across the RT distribution, consistent with the idea of an increased reliance on prime information for degraded targets, which would be particularly beneficial for the most difficult responses (i.e., the slowest ones). The first study (with English-speaking participants) investigated the idea of retrospective prime reliance in the context of an important empirical conundrum within the word recognition literature, produced by the joint effects of stimulus visual quality (SQ), semantic priming and word frequency. The manipulation of these variables has traditionally produced constraining results for models of priming (e.g., McNamara, 2005), as well as for visual word recognition models (e.g., Reynolds & Besner, 2004). In Experiment 1, all three variables were manipulated within a single speeded pronunciation task, in which words and nonwords appeared randomly as targets. The results indicated that the joint effect of SQ and word frequency on RTs was dependent upon prime relatedness. More specifically, additive effects of SQ and frequency were observed after related primes, while an overadditive interaction was observed after unrelated primes. Distributional analyses showed that this three-way interaction was mediated by the slowest RTs, and it was hypothesized that the pattern of effects reflects reliance on prime information.
To test this hypothesis, in Experiment 2 related primes were eliminated from the list, to produce a context in which there was no reason to rely on prime information. The interactive effects of SQ and frequency found following unrelated primes in Experiment 1 reverted, in Experiment 2, to additive effects for the same unrelated-prime conditions. Note that, in English, additive effects of SQ and frequency are found in standard speeded pronunciation tasks (i.e., with no primes), provided that words and nonwords are randomly intermixed in the target set (as was the case in Experiment 2). In a second study, the same experiments were tested within a different priming paradigm, namely zero-lag repetition priming (e.g., Ferguson et al., 2009), and within a different language (Italian). Although distributional analyses provided preliminary evidence that retrospective prime reliance is operative even in this context (Experiment 3), cross-linguistic differences were nonetheless observed. More specifically, in English, SQ and frequency produce additive effects in a speeded pronunciation task, provided that nonword targets are intermixed with real words (O’Malley & Besner, 2008) and that primes (if present) are all unrelated (Experiment 2). This finding was not replicated in Italian, where the two variables still produced, in Experiment 4, an overadditive interaction despite the presence of nonwords in the target set and despite the fact that only unrelated primes were presented (exactly as in Experiment 2). It was hypothesized that the discrepancy might stem from the fact that, while in English the system needs to place a functional threshold at an earlier processing level in order to overcome the detrimental effect of visual degradation before lexical representations are activated (thus avoiding lexicalization errors), in a transparent language this might not be the case.
It was thus argued that in Italian it is sufficient to increase the reliance on sublexical output, without qualitatively altering the activation dynamics of the system. The third study explored the possibility that retrospective prime reliance entails episodic retrieval. In a first experiment, English-speaking participants performed a lexical decision task in which SQ and semantic priming were manipulated. After completing the lexical decision task and a brief distracter task, they also performed a recognition memory task on the primes presented during lexical decision. Results showed a trend towards better recognition of those primes that had preceded degraded targets, as opposed to clearly visible ones. This result is coherent with the hypothesis that, for primes preceding degraded targets, episodic retrieval takes place even in lexical decision, thereby facilitating the recognition of these items in a subsequent memory task. In a second experiment (with Italian participants), the effect of SQ in the memory task was not replicated, probably due to specific features of the materials used in the experiment. On the other hand, a strong lexicality effect was found in memory performance: primes that had preceded real words were recognized much better than those that had preceded nonwords in the previous experimental phase. These results suggest that the interplay between primes and targets, and the cognitive operations required to process them in lexical decision, may be reflected in the memory traces left by these stimuli. In conclusion, retrospective prime reliance proved to be a useful theoretical tool for understanding the joint effects of semantic priming, SQ, and frequency, thereby offering a new perspective on this issue.
Moreover, preliminary evidence suggests that a retrospective component might be involved even in a zero-lag repetition priming paradigm, and that the mechanism behind retrospective reliance might entail the episodic retrieval of the prime’s representation. Most importantly, the results highlight the flexibility of the reading system and its sensitivity to the context (i.e., the experimental task and the characteristics of the stimuli).
Evidenze recenti (Balota et al., 2008; Thomas et al., 2012) suggeriscono che, qualora il riconoscimento delle parole-target sia reso più difficile da manipolazioni sperimentali quali la degradazione visiva, il sistema cognitivo possa incrementare in modo retrospettivo (i.e., dopo la presentazione della parole target) la misura in cui utilizza le informazioni convogliate dal prime semantico. Infatti, analisi della distribuzione dei tempi di reazione (TR) hanno mostrato che, per parole-target chiaramente visibili, l’effetto di priming semantico ha la stessa dimensione in tutte le porzioni della distribuzione dei TR. Diversamente, per parole-target visivamente degradate, l’effetto di priming semantico aumenta drasticamente nei TR più lenti, in accordo con l’ipotesi che il sistema si affidi in misura maggiore all’informazione convogliata dal prime per i targets visivamente degradati e che ciò sia di particolare beneficio per le risposte più difficili (i.e., le più lente). Nel primo studio (condotto con partecipanti di madrelingua Inglese), l’idea di un meccanismo retrospettivo e compensativo all’interno dell’effetto di priming semantico è stata indagata nel contesto degli effetti congiunti di qualità visiva (QV) dei target, frequenza di parole e priming semantico. In letteratura, la manipolazione di queste variabili ha prodotto, infatti, risultati molto rilevanti per i modelli di priming (e.g., McNamara, 2005) e per i modelli di riconoscimento visivo di parole singole (e.g., Reynolds & Besner, 2004). Nell’Esperimento 1, tutte e tre le variabili sono state congiuntamente manipolate all’interno di un singolo compito di lettura ad alta voce, in cui parole e non-parole comparivano in alternanza casuale come targets. I risultati hanno mostrato come gli effetti congiunti di QV e frequenza dipendano dalla relazione semantica tra prime e target. 
In particular, the two variables produce additive effects when prime and target are semantically related, whereas they produce an overadditive interaction when prime and target are unrelated. Analysis of the RT distribution showed that the three-way interaction described above is mediated mainly by the slowest RTs, and it was consequently hypothesized that the effects reflect a retrospective increase in the extent to which the system relies on the information conveyed by the prime. To test this hypothesis, in Experiment 2 the semantically related primes were removed, so as to create a context in which the system had no reason to rely on the information conveyed by the prime. The same stimuli (unrelated prime–target pairs) that had produced an interaction in Experiment 1 produced additive effects in Experiment 2. Note that, in English, additive effects of stimulus quality (SQ) and frequency are found in standard reading tasks (without primes) when words and nonwords appear as targets in random alternation (as in Experiment 2). In a second study, the two experiments described above were replicated using a different experimental paradigm, namely repetition priming (e.g., Ferguson et al., 2009), with native speakers of Italian. Although the distributional analyses suggest the presence of a retrospective component in this second context as well (Experiment 3), the results also showed important differences. In English, SQ and frequency produce additive effects in reading tasks when both words and nonwords are presented as targets (O'Malley & Besner, 2008) and the primes (if present) are all unrelated (Experiment 2). 
In Italian, the two variables produce overadditive effects (Experiment 4) despite the simultaneous presence of words and nonwords, and despite the fact that targets were preceded only by unrelated primes (exactly as in Experiment 2). It was hypothesized that this discrepancy is due to cross-linguistic differences (English vs. Italian). In English, the system needs to modify its functional architecture, adopting a serial mode of operation that confines the effect of visual degradation to the early stages of processing, in order to prevent the activation of lexical representations from producing lexicalization errors. In Italian (a transparent language) the situation may be different: it may be sufficient to rely more heavily on the output of the sublexical route, without any qualitative modification of the functional architecture. The third study explored the possibility that the retrospective component of the semantic priming effect rests on episodic retrieval of the prime representation. In Experiment 5, participants (native English speakers) performed, in the first phase of the experiment, a lexical decision task in which SQ and semantic priming were manipulated. At the end of the first phase, after a brief distractor task, participants performed a recognition memory test on the primes previously presented in the lexical decision task. The results showed a trend toward better recognition of primes that, in the lexical decision task, had preceded visually degraded targets compared with those that had preceded clearly visible targets. 
This result is consistent with the idea that primes presented before visually degraded targets undergo episodic retrieval already during the lexical decision phase, and that this facilitates performance in the memory task. In Experiment 6, analogous to the previous one but conducted with native Italian speakers, the attempt to replicate the SQ effect in the memory task was unsuccessful, probably because of specific characteristics of the selected stimuli. However, a strong lexicality effect emerged in the memory task: participants were better at recognizing primes that, in the lexical decision task, had preceded real words rather than nonwords. These results suggest that the cognitive operations carried out in a lexical decision task, and in particular the interaction between prime and target, modulate the memory traces left by the stimuli themselves. In conclusion, the retrospective, compensatory component described within the semantic priming mechanism has proved to be a useful theoretical tool for understanding the joint effects of semantic priming, SQ, and frequency, thereby offering a new perspective from which to investigate the topic. Moreover, preliminary evidence suggests that the retrospective component also operates in a repetition priming paradigm, and that the mechanism underlying the retrospective process may include episodic retrieval of the prime representation. Finally, the results underscore the flexibility of the reading system and its sensitivity to the experimental context (i.e., the task at hand and the characteristics of the stimuli).
APA, Harvard, Vancouver, ISO, and other styles
46

Rautiainen, M. (Mika). "Content-based search and browsing in semantic multimedia retrieval." Doctoral thesis, University of Oulu, 2006. http://urn.fi/urn:isbn:9514283007.

Full text
Abstract:
Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this thesis are composed of: an evaluation of two perceptually oriented colour spaces, with details on the applicability of the HSV and CIE Lab spaces for low-level feature extraction; the development and evaluation of low-level visual features in example-based retrieval for image and video databases; the development and evaluation of a generic model for simple and efficient concept detection from video sequences, with good detection performance on large video corpora; the development of combination techniques for multi-modal visual, concept and lexical retrieval; and the development of a cluster-temporal browsing model as a data navigation tool, together with its evaluation in several large and heterogeneous collections containing an assortment of video from educational and historical recordings to contemporary broadcast news, commercials and a multilingual television broadcast. The methods introduced here have been found to facilitate semantic queries for novice users without laborious manual annotation. 
Cluster-temporal browsing was found to outperform the conventional approach, which consists of sequential queries and relevance feedback, in semantic video retrieval by a statistically significant margin.
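The low-level colour features this thesis evaluates are built on perceptually oriented spaces such as HSV. Purely as an illustrative sketch of the idea, and not the author's actual implementation, a coarse hue histogram of the kind used as a global colour descriptor can be computed with the standard library alone:

```python
import colorsys

def hue_histogram(pixels, bins=8):
    """Quantise hue into `bins` buckets and normalise: a crude colour descriptor."""
    hist = [0] * bins
    for r, g, b in pixels:  # RGB components in [0, 1]
        h, _, _ = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist)
    return [count / total for count in hist]

# A pure-red and a pure-green pixel land in different hue buckets.
print(hue_histogram([(1, 0, 0), (0, 1, 0)]))
# → [0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Real retrieval systems quantise all three channels and extract such features per region or keyframe, but the bucketing idea is the same.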
APA, Harvard, Vancouver, ISO, and other styles
47

Siddiqui, Abujawad Rafid. "On Fundamental Elements of Visual Navigation Systems." Doctoral thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-46484.

Full text
Abstract:
Visual navigation is a ubiquitous yet complex task which is performed by many species for the purpose of survival. Although visual navigation is actively being studied within the robotics community, the determination of the elemental constituents of a robust visual navigation system remains a challenge. Motion estimation is mistakenly considered the sole ingredient of a robust autonomous visual navigation system, and efforts are therefore concentrated on improving the accuracy of motion estimates. On the contrary, there are other factors which are as important as motion and whose absence could result in an inability to perform seamless visual navigation such as that exhibited by humans. A general model of a visual navigation system is therefore needed, one that describes it in terms of a set of elemental units. In this regard, a set of visual navigation elements (i.e. spatial memory, motion memory, scene geometry, context and scene semantics) is suggested as the building blocks of a visual navigation system in this thesis. A set of methods is proposed which investigates the existence and role of these elements in a visual navigation system. A quantitative research methodology, in the form of a series of systematic experiments, is applied to these methods. The thesis formulates, implements and analyzes the proposed methods in the context of visual navigation elements, which are arranged into three major groupings: a) spatial memory, b) motion memory, c) Manhattan structure, context and scene semantics. The investigations are carried out on multiple image datasets obtained by robot-mounted cameras (2D/3D) moving in different environments. Spatial memory is investigated through the evaluation of proposed place recognition methods. The recognized places and inter-place associations are then used to represent a visited set of places in the form of a topological map. Such a representation of places and their spatial associations models the concept of spatial memory. 
It resembles the human ability to represent and map places in large environments (e.g. cities). Motion memory in a visual navigation system is analyzed through a thorough investigation of various motion estimation methods. This leads to proposals of direct motion estimation methods which compute accurate motion estimates by basing the estimation process on dominant surfaces. In the everyday world, planar surfaces, especially ground planes, are ubiquitous; the motion models are therefore built upon this constraint. Manhattan structure provides geometrical cues which are helpful in solving navigation problems. There are some unique geometric primitives (e.g. planes) which make up an indoor environment. Therefore, a plane detection method is proposed as a result of investigations performed on scene structure. The method uses supervised learning to successfully classify the segmented clusters in 3D point-cloud datasets. In addition to geometry, the context of a scene also plays an important role in the robustness of a visual navigation system. The context in which navigation is being performed imposes a set of constraints on objects and sections of the scene. The enforcement of such constraints enables the observer to robustly segment the scene and to classify the various objects in it. A contextually aware scene segmentation method is proposed which classifies the image of a scene into a set of geometric classes. The geometric classes are sufficient for most navigation tasks. However, in order to facilitate the cognitive visual decision-making process, the scene ought to be semantically segmented. The semantics of indoor scenes and of outdoor scenes are dealt with separately, and separate methods are proposed for the visual mapping of each type of environment. An indoor scene consists of a corridor structure which is modeled as a cubic space in order to build a map of the environment. 
A “flash-n-extend” strategy is proposed which is responsible for controlling the map update frequency. The semantics of the outdoor scenes is also investigated and a scene classification method is proposed. The method employs a Markov Random Field (MRF) based classification framework which generates a set of semantic maps.
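The thesis detects planes in 3D point clouds with supervised learning over segmented clusters. As a minimal hedged illustration of the geometric primitive involved, and not the proposed method itself, a least-squares plane fit via SVD of the centred cloud can be sketched as:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns (unit normal, centroid) of an (N, 3) cloud."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    return vt[-1], centroid

# Noisy samples from the plane z = 0: the recovered normal is close to ±z.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, (200, 2)),
                       rng.normal(0.0, 0.01, 200)])
normal, _ = fit_plane(pts)
print(abs(normal[2]))  # close to 1.0
```

A detection pipeline of the kind the abstract describes would fit such primitives per segmented cluster and then classify the clusters, rather than assume a single global plane.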
APA, Harvard, Vancouver, ISO, and other styles
48

Zoccoli, Sandra L. "Object features and object recognition Semantic memory abilities during the normal aging process /." Ann Arbor, Mich. : ProQuest, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3288933.

Full text
Abstract:
Thesis (Ph.D. in Psychology)--S.M.U., 2007.
Title from PDF title page (viewed Nov. 19, 2009). Source: Dissertation Abstracts International, Volume: 68-11, Section: B, page: 7695. Adviser: Alan S. Brown. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
49

Sand, Anders. "Subliminal or not? : An appraisal of semantic processing in the near absence of visual awareness." Doctoral thesis, Stockholms universitet, Perception och psykofysik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-132211.

Full text
Abstract:
Stimuli that cannot be perceived (i.e., that are subliminal) can still elicit neural responses in an observer, but can such stimuli influence behavior and higher-order cognition? Empirical evidence for such effects has periodically been accepted and rejected over the last six decades. Today, many psychologists seem to consider such effects well-established and recent studies have extended the power of subliminal processing to new limits. In this thesis, I examine whether this shift in zeitgeist is matched by a shift in evidential strength for the phenomenon. This thesis consists of three empirical studies involving more than 250 participants, a simulation study, and a quantitative review. The conclusion based on these efforts is that several methodological, statistical, and theoretical issues remain in studies of subliminal processing. These issues mean that claimed subliminal effects might be caused by occasional or weak percepts (given the experimenters’ own definitions of perception) and that it is still unclear what evidence there is for the cognitive processing of subliminal stimuli. New data are presented suggesting that even in conditions traditionally claimed as “subliminal”, occasional or weak percepts may in fact influence cognitive processing more strongly than do the physical stimuli, possibly leading to reversed priming effects. I also summarize and provide methodological, statistical, and theoretical recommendations that could benefit future research aspiring to provide solid evidence for subliminal cognitive processing.

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 1: Manuscript. Paper 4: Manuscript.
APA, Harvard, Vancouver, ISO, and other styles
50

Erol, Tugra. "The Visual Perception Of Automobile Seat Comfort." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/2/12607768/index.pdf.

Full text
Abstract:
The visual domain of design constitutes the designer's general basis for communicating messages about product attributes. In the design of an automobile seat, where the accommodating functions mainly remain constant, a seat's "style" affords the ability to convey certain meanings with affective connotations. Treating style aesthetics as a source of information, the communication of "comfort" can be provided via forms and other attributes. The literature provides strong evidence that comfort is related to the aesthetics of any object in use, especially by creating expectations towards the product. The "aesthetics of comfort" can be explained as a variable-intensity "feeling" or "attitude" regarding an entity of factors or characteristics of a multidimensional construct. Implemented through different layouts and cues, these should assist the consumer in understanding the qualities of an automobile seat, such as comfort. As a result of the field study conducted, a significant difference was found to exist between the perceptions of visual comfort of three production seat designs. A positive attitude about the comfort of an automobile seat was found to positively affect the perception of seated comfort.
APA, Harvard, Vancouver, ISO, and other styles