Selected scientific literature on the topic "Visual and semantic embedding"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles


Consult the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Visual and semantic embedding".

Next to each source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic citation of the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, when it is present in the metadata.

Journal articles on the topic "Visual and semantic embedding"

1

Zhang, Yuanpeng, Jingye Guan, Haobo Wang, Kaiming Li, Ying Luo, and Qun Zhang. "Generalized Zero-Shot Space Target Recognition Based on Global-Local Visual Feature Embedding Network". Remote Sensing 15, no. 21 (October 28, 2023): 5156. http://dx.doi.org/10.3390/rs15215156.

Abstract:
Existing deep learning-based space target recognition methods rely on abundantly labeled samples and are not capable of recognizing samples from unseen classes without training. In this article, based on generalized zero-shot learning (GZSL), we propose a space target recognition framework to simultaneously recognize space targets from both seen and unseen classes. First, we defined semantic attributes to describe the characteristics of different categories of space targets. Second, we constructed a dual-branch neural network, termed the global-local visual feature embedding network (GLVFENet), which jointly learns global and local visual features to obtain discriminative feature representations, thereby achieving GZSL for space targets with higher accuracy. Specifically, the global visual feature embedding subnetwork (GVFE-Subnet) calculates the compatibility score by measuring the cosine similarity between the projection of global visual features in the semantic space and various semantic vectors, thereby obtaining global visual embeddings. The local visual feature embedding subnetwork (LVFE-Subnet) introduces soft space attention, and an encoder discovers the semantic-guided local regions in the image to then generate local visual embeddings. Finally, the visual embeddings from both branches were combined and matched with semantics. The calibrated stacking method is introduced to achieve GZSL recognition of space targets. Extensive experiments were conducted on an electromagnetic simulation dataset of nine categories of space targets, and the effectiveness of our GLVFENet is confirmed.
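
The scoring and calibration steps described here are generic GZSL machinery and can be sketched compactly. Below is a minimal NumPy sketch, assuming the visual feature has already been projected into the semantic space; the function names and the seen-class penalty `gamma` are illustrative choices, not the authors' code.

```python
import numpy as np

def compatibility_scores(projected_visual, semantic_vectors):
    """Cosine similarity between one projected visual feature and every
    class semantic vector (the compatibility score described above)."""
    v = projected_visual / np.linalg.norm(projected_visual)
    s = semantic_vectors / np.linalg.norm(semantic_vectors, axis=1, keepdims=True)
    return s @ v  # shape: (num_classes,)

def predict_with_calibrated_stacking(scores, seen_mask, gamma=0.3):
    """Calibrated stacking: subtract a constant from seen-class scores so
    that unseen classes are not systematically outscored in GZSL."""
    return int(np.argmax(scores - gamma * seen_mask.astype(float)))

# Toy usage: five classes (three seen, two unseen) with 4-d semantics.
semantics = np.random.randn(5, 4)
feature = np.random.randn(4)
seen_mask = np.array([1, 1, 1, 0, 0])
label = predict_with_calibrated_stacking(compatibility_scores(feature, semantics), seen_mask)
```
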
2

Yeh, Mei-Chen, and Yi-Nan Li. "Multilabel Deep Visual-Semantic Embedding". IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 6 (June 1, 2020): 1530–36. http://dx.doi.org/10.1109/tpami.2019.2911065.

3

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge". Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark datasets: Microsoft Common Objects in Context (MSCOCO) and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity (STS) benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence level semantics. Importantly, this result shows that we do not need prior knowledge of lexical level semantics in order to model sentence level semantics. These findings demonstrate the importance of visual information in semantics.
4

Zhou, Mo, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, and Gang Hua. "Ladder Loss for Coherent Visual-Semantic Embedding". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 13050–57. http://dx.doi.org/10.1609/aaai.v34i07.7006.

Abstract:
For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way – relevant or irrelevant, and all “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their various proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce a continuous variable to model the relevance degree between queries and multiple candidates, and propose to learn a coherent embedding space, where candidates with higher relevance degrees are mapped closer to the query than those with lower relevance degrees. In particular, the new ladder loss is proposed by extending the triplet loss inequality to a more general inequality chain, which implements variable push-away margins according to respective relevance degrees. In addition, a proper Coherent Score metric is proposed to better measure the ranking results including those “irrelevant” candidates. Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvement over existing state-of-the-art methods.
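
The inequality-chain idea is concrete enough to sketch. Below is a minimal PyTorch version of a ladder-style loss for a single query, assuming each candidate carries an integer relevance degree; the linearly growing margin schedule is an illustrative choice, not the paper's exact formulation.

```python
import torch

def ladder_loss(sim, rel, base_margin=0.2):
    """sim: (N,) similarities between one query and N candidates.
    rel: (N,) integer relevance degrees (higher = more relevant).
    Generalizes the triplet inequality to a chain: each lower relevance
    level must trail all higher levels by a progressively larger margin."""
    levels = sorted(set(rel.tolist()), reverse=True)
    loss = sim.new_zeros(())
    for i in range(1, len(levels)):
        pos = sim[rel >= levels[i - 1]]   # candidates above the current rung
        neg = sim[rel <= levels[i]]       # candidates on or below it
        margin = base_margin * i          # push-away margin grows down the ladder
        violations = margin - pos.unsqueeze(1) + neg.unsqueeze(0)
        loss = loss + torch.clamp(violations, min=0).mean()
    return loss

# Toy usage: 6 candidates with three relevance degrees (2 > 1 > 0).
loss = ladder_loss(torch.rand(6), torch.tensor([2, 2, 1, 1, 0, 0]))
```
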
5

Ge, Jiannan, Hongtao Xie, Shaobo Min, and Yongdong Zhang. "Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1406–14. http://dx.doi.org/10.1609/aaai.v35i2.16230.

Abstract:
Generalized zero-shot Learning (GZSL) aims to recognize images from either seen or unseen domain, mainly by learning a joint embedding space to associate image features with the corresponding category descriptions. Recent methods have proved that localizing important object regions can effectively bridge the semantic-visual gap. However, these are all based on one-off visual localizers, lacking of interpretability and flexibility. In this paper, we propose a novel Semantic-guided Reinforced Region Embedding (SR2E) network that can localize important objects in the long-term interests to construct semantic-visual embedding space. SR2E consists of Reinforced Region Module (R2M) and Semantic Alignment Module (SAM). First, without the annotated bounding box as supervision, R2M encodes the semantic category guidance into the reward and punishment criteria to teach the localizer serialized region searching. Besides, R2M explores different action spaces during the serialized searching path to avoid local optimal localization, which thereby generates discriminative visual features with less redundancy. Second, SAM preserves the semantic relationship into visual features via semantic-visual alignment and designs a domain detector to alleviate the domain confusion. Experiments on four public benchmarks demonstrate that the proposed SR2E is an effective GZSL method with reinforced embedding space, which obtains averaged 6.1% improvements.
6

Nguyen, Huy Manh, Tomo Miyazaki, Yoshihiro Sugaya, and Shinichiro Omachi. "Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence". Applied Sciences 11, no. 7 (April 3, 2021): 3214. http://dx.doi.org/10.3390/app11073214.

Abstract:
Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing similarities measured in each embedding space using a weighted sum strategy. We determine the weights according to a sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive to state-of-the-art methods. These experimental results demonstrated the effectiveness of the proposed multiple embedding approach compared to existing methods.
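
The weighted-sum fusion step is straightforward to sketch. Here is a minimal PyTorch module, assuming the per-space similarities are already computed; the module name and the softmax weighting head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimilarityFusion(nn.Module):
    """Fuses similarities from K embedding spaces with weights predicted
    from the query-sentence embedding, so the sentence decides which
    embedding space to emphasize."""
    def __init__(self, sent_dim, num_spaces):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Linear(sent_dim, num_spaces),
            nn.Softmax(dim=-1),
        )

    def forward(self, sims, sent_emb):
        # sims: (K, num_videos) query-to-video similarity in each space
        # sent_emb: (sent_dim,) embedding of the query sentence
        w = self.weight_head(sent_emb)             # (K,) weights summing to 1
        return (w.unsqueeze(1) * sims).sum(dim=0)  # (num_videos,) fused score

fusion = SimilarityFusion(sent_dim=512, num_spaces=3)
fused = fusion(torch.randn(3, 100), torch.randn(512))
```
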
7

Matsubara, Takashi. "Target-Oriented Deformation of Visual-Semantic Embedding Space". IEICE Transactions on Information and Systems E104.D, no. 1 (January 1, 2021): 24–33. http://dx.doi.org/10.1587/transinf.2020mup0003.

8

Tang, Qi, Yao Zhao, Meiqin Liu, Jian Jin, and Chao Yao. "Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5154–61. http://dx.doi.org/10.1609/aaai.v38i6.28321.

Abstract:
As a critical clue of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is a challenging task due to the intricate motion interweaving in the video. In response to this issue, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. Those semantics assist the Pixel Enhancer in understanding the recovered contents and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatial-temporal contexts related to each instance. Furthermore, we devise a Semantics-Powered Attention Cross-Embedding (SPACE) block to bridge the pixel-level features with semantic knowledge, composed of a Global Perspective Shifter (GPS) and an Instance-Specific Semantic Embedding Encoder (ISEE). Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on global semantics. After that the ISEE module harnesses the attention mechanism to align the adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to alleviate the difficulty of model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
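
The GPS module, as described, amounts to FiLM-style affine modulation of pixel features by global semantics. A minimal PyTorch sketch follows; the layer shapes and names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GlobalPerspectiveShifter(nn.Module):
    """Scale-and-shift (affine) modulation of pixel-level features
    conditioned on a global semantic vector."""
    def __init__(self, sem_dim, channels):
        super().__init__()
        self.to_scale = nn.Linear(sem_dim, channels)
        self.to_shift = nn.Linear(sem_dim, channels)

    def forward(self, feat, sem):
        # feat: (B, C, H, W) pixel features; sem: (B, sem_dim) global semantics
        scale = self.to_scale(sem)[:, :, None, None]  # broadcast over H, W
        shift = self.to_shift(sem)[:, :, None, None]
        return scale * feat + shift

gps = GlobalPerspectiveShifter(sem_dim=256, channels=64)
out = gps(torch.randn(2, 64, 32, 32), torch.randn(2, 256))
```
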
9

Keller, Patrick, Abdoul Kader Kaboré, Laura Plein, Jacques Klein, Yves Le Traon, and Tegawendé F. Bissyandé. "What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning". ACM Transactions on Software Engineering and Methodology 31, no. 2 (April 30, 2022): 1–34. http://dx.doi.org/10.1145/3485135.

Abstract:
Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM ("What You See Is What It Means") approach, where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem), and code classification (a multi-classification problem). We show with experiments on the BigCloneBench (Java) and Open Judge (C) datasets that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.
10

He, Hai, and Haibo Yang. "Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization". Mathematical Problems in Engineering 2021 (May 28, 2021): 1–8. http://dx.doi.org/10.1155/2021/6654071.

Abstract:
Language and vision are the two most essential parts of human intelligence for interpreting the real world around us. How to make connections between language and vision is the key point in current research. Multimodality methods like visual semantic embedding, which unify images and corresponding texts into the same feature space, have been widely studied recently. Inspired by the recent development of text data augmentation and a simple but powerful technique called EDA (easy data augmentation), we can expand the information in the given data using EDA to improve the performance of models. In this paper, we take advantage of the text data augmentation technique and word embedding initialization for multimodality retrieval. We utilize EDA for text data augmentation and word embedding initialization for a text encoder based on recurrent neural networks, and minimize the gap between the two spaces with a triplet ranking loss with hard negative mining. On two Flickr-based datasets, we achieve the same recall with only 60% of the training data as normal training with all available data. Experimental results show the improvement of our proposed model: on all datasets in this paper (Flickr8k, Flickr30k, and MS-COCO), our model performs better on image annotation and image retrieval tasks. The experiments also demonstrate that text data augmentation is more suitable for smaller datasets, while word embedding initialization is suitable for larger ones.
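
The training objective named here, a triplet ranking loss with hard negative mining, follows the widely used max-violation formulation. Below is a compact PyTorch sketch with in-batch negatives; the margin value is illustrative.

```python
import torch
import torch.nn.functional as F

def ranking_loss_hard_negatives(img_emb, txt_emb, margin=0.2):
    """Triplet ranking loss taking only the hardest in-batch negative per
    query; matched image-text pairs sit on the diagonal of the score matrix."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    scores = img @ txt.t()                     # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_i2t = (margin + scores - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + scores - pos.t()).clamp(min=0).masked_fill(mask, 0)
    # Hardest caption negative per image, hardest image negative per caption.
    return cost_i2t.max(dim=1).values.mean() + cost_t2i.max(dim=0).values.mean()
```
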

Theses / dissertations on the topic "Visual and semantic embedding"

1

Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Abstract:
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "deep learning", has led to significant improvements in visual and textual representation models. This thesis addresses the learning of multimodal embeddings that jointly represent visual and semantic data, a central problem in current AI and deep learning research with strong potential for model interpretability. Revolving around Visual Semantic Embedding (VSE) approaches, we explore several directions: we present the relevant background on image and text representation and on existing multimodal approaches; we propose novel architectures that further improve the retrieval capability of VSE; and we extend VSE models to novel applications, leveraging embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of ranking-based evaluation metrics.
2

Wang, Qian. "Zero-shot visual recognition via latent embedding learning". Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/zeroshot-visual-recognition-via-latent-embedding-learning(bec510af-6a53-4114-9407-75212e1a08e1).html.

Abstract:
Traditional supervised visual recognition methods require a great number of annotated examples for each concerned class. The collection and annotation of visual data (e.g., images and videos) could be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are such situations where the test instances are from novel classes for which training examples are unavailable in the training stage. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space with the preserving of both visual and semantic relatedness, hence the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. Then we extend the ZSL to the multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e. the visual model and the semantic model) to bridge the semantic gap. The two component embedding models are trained alternately to optimize the ranking based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework. Finally, we propose alternative semantic representations for human actions towards narrowing the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data has been investigated to enhance the semantic representations for human actions. The novel semantic representations are proved to benefit the zero-shot human action recognition significantly compared to the traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition towards narrowing and bridging the semantic gap, and achieve state-of-the-art performance in different settings on multiple benchmarks.
3

Ficapal, Vila Joan. "Anemone: a Visual Semantic Graph". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252810.

Abstract:
Semantic graphs have been used for optimizing various natural language processing tasks as well as augmenting search and information retrieval tasks. In most cases these semantic graphs have been constructed through supervised machine learning methodologies that depend on manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we explore in the first part the possibility to automatically populate a semantic graph from an ad hoc data set of 50 000 newspaper articles in a completely unsupervised manner. The utility of the visual representation of the resulting graph is tested on 14 human subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that, for entity finding and document similarity our feature engineering is viable and the visual map produced by our artifact is visually useful. In the second part, we explore the possibility to identify entity relationships in an unsupervised fashion by employing abstractive deep learning methods for sentence reformulation. The reformulated sentence structures are qualitatively assessed with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We negatively evaluate the outcomes of this second part as they have not been good enough to acquire any definitive conclusion but have instead opened new doors to explore.
4

Jakeš, Jan. "Visipedia - Embedding-driven Visual Feature Extraction and Learning". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236120.

Abstract:
Multidimensional indexing is an effective tool for capturing similarities between objects without the need for their explicit categorization. In recent years, this method has been widely used for object annotation and formed a significant part of the publications connected with the Visipedia project. This thesis analyzes the possibilities of machine learning from multidimensionally indexed images based on their visual features, and presents methods for predicting the multidimensional coordinates of previously unseen images. It studies the relevant feature-extraction algorithms, analyzes suitable machine learning methods, and describes the entire process of developing such a system. The resulting system is then tested on two different datasets, and the experiments present the first results for a task of this kind.
5

Gao, Jizhou. "VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS". UKnowledge, 2013. http://uknowledge.uky.edu/cs_etds/14.

Abstract:
This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interests in the fields of computer vision and data mining. Structural Semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, locations and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations. New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models.
6

Liu, Jingen. "Learning Semantic Features for Visual Recognition". Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3358.

Abstract:
Visual recognition (e.g., object, scene and action recognition) is an active area of research in computer vision due to its increasing number of real-world applications such as video (image) indexing and search, intelligent surveillance, human-machine interaction, robot navigation, etc. Effective modeling of the objects, scenes and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which image patches or video cuboids are quantized into visual words (i.e., mid-level features) based on their appearance similarity using clustering, has been widely and successfully explored. The advantages of this representation are: no explicit detection of objects or object parts and their tracking are required; the representation is somewhat tolerant to within-class deformations; and it is efficient for matching. However, the performance of the BoVW is sensitive to the size of the visual vocabulary, so computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation is partially due to the fact that the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, in this thesis we present a principled approach to learn a semantic vocabulary (i.e., high-level features) from a large amount of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we have developed an algorithm to discover a compact yet discriminative semantic vocabulary. This vocabulary is obtained by grouping the visual words, based on their distribution in videos (images), into visual-word clusters. The mutual information (MI) between the clusters and the videos (images) depicts the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary. We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters by finding a good tradeoff between compactness and discriminative power. We tested our proposed approach on the state-of-the-art KTH dataset and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering, because only visual words are clustered regardless of which video they appear in. In order to leverage the co-occurrence of visual words and images, we have developed a co-clustering algorithm to simultaneously group the visual words and images. We tested our approach on the publicly available fifteen-scene dataset and obtained about a 4% increase in average accuracy compared to the one-sided clustering approaches. Second, instead of grouping the mid-level features, we first embed the features into a low-dimensional semantic space by manifold learning, and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding is able to preserve the explicitly defined diffusion distance, which reflects the semantic similarity between any two features. Furthermore, the DM provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. The experiments on the KTH dataset show that DM can perform much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. In order to combine multiple heterogeneous features for visual recognition, we further propose the Fiedler Embedding to capture the complicated semantic relationships between all entities (i.e., videos, images, heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. We tested our approach on the Weizmann dataset and achieved about 17%–21% improvements in average accuracy.
7

Nguyen, Duc Minh Chau. "Affordance learning for visual-semantic perception". Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2443.

Abstract:
Affordance Learning is linked to the study of interactions between robots and objects, including how robots perceive objects through scene understanding. The area has long been popular in Psychology and has recently come to influence Computer Vision. In this way, Computer Vision has borrowed the concept of affordance from Psychology in order to develop Visual-Semantic recognition systems, and in particular to develop the capabilities of robots to interact with objects. However, existing systems of Affordance Learning are still limited to detecting and segmenting object affordances, which is called Affordance Segmentation. Further, these systems are not designed to develop specific abilities to reason about affordances. For example, a Visual-Semantic system for captioning a scene can extract information from an image, such as "a person holds a chocolate bar and eats it", but does not highlight the affordances: "hold" and "eat". Indeed, these affordances and others commonly appear within all aspects of life, since affordances usually connect to actions (from a linguistic view, affordances are generally known as verbs in sentences). Due to the above-mentioned limitations, this thesis aims to develop systems of Affordance Learning for Visual-Semantic Perception. These systems can be built using Deep Learning, which has been empirically shown to be efficient for performing Computer Vision tasks. There are two goals of the thesis: (1) study the key factors that contribute to the performance of Affordance Segmentation, and (2) reason about affordances (Affordance Reasoning) based on parts of objects for Visual-Semantic Perception. In terms of the first goal, the thesis mainly investigates the feature extraction module, as this is one of the earliest steps in learning to segment affordances. The thesis finds that the quality of feature extraction from images plays a vital role in improving the performance of Affordance Segmentation. With regard to the second goal, the thesis infers affordances from object parts to reason about part-affordance relationships. Based on this approach, the thesis devises an Object Affordance Reasoning Network that can learn to construct relationships between affordances and object parts. As a result, reasoning about affordances becomes achievable in the generation of scene graphs of affordances and object parts. Empirical results, obtained from extensive experiments, show the potential of the developed system for Affordance Reasoning from Scene Graph Generation.
8

Chen, Yifu. "Deep learning for visual semantic segmentation". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.

Abstract:
In this thesis, we are interested in Visual Semantic Segmentation, one of the high-level tasks that pave the way towards complete scene understanding. Specifically, it requires a semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss imposes pixel-level predictions to be consistent with the ground-truth semantic edge information, and thus leads to better-shaped segmentation results. In the second part, we address another important issue, namely, alleviating the need for training segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly evaluated experimentally on multiple datasets, and the results are competitive with the literature.
9

Fan, Wei. "Image super-resolution using neighbor embedding over visual primitive manifolds /". View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?CSED%202007%20FAN.

10

Hanwell, David. "Weakly supervised learning of visual semantic attributes". Thesis, University of Bristol, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687063.

Abstract:
There are at present many billions of images on the internet, only a fraction of which are labelled according to their semantic content. To automatically provide labels for the rest, models of visual semantic concepts must be created. Such models are traditionally trained using images which have been manually acquired, segmented, and labelled. In this thesis, we submit that such models can be learned automatically using those few images which have already been labelled, either directly by their creators, or indirectly by their associated text. Such imagery can be acquired easily, cheaply, and in large quantities, using web image searches. Though there has been some work towards learning from such weakly labelled data, all methods yet proposed require more than a minimum of human effort. In this thesis we put forth a number of methods for reliably learning models of visual semantic attributes using only the raw, unadulterated results of web image searches. The proposed methods do not require any human input beyond specifying the names of the attributes to be learned. We also present means of identifying and localising learned attributes in challenging, real-world images. Our methods are of a probabilistic nature, and make extensive use of multivariate Gaussian mixture models to represent both data and learned models. The contributions of this thesis also include several tools for acquiring and comparing these distributions, including a novel clustering algorithm. We apply our weakly supervised learning methods to the training of models of a variety of visual semantic attributes including colour and pattern terms. Detection and localization of the learned attributes in unseen realworld images is demonstrated, and both quantitative and qualitative results are presented. We compare against other work, including both general methods of weakly supervised learning, and more attribute specific methods. We apply our learning methods to the training sets of previous works, and assess their performance on the test sets used by other authors. Our results show that our methods give better results than the current state of the art.

Books on the topic "Visual and semantic embedding"

1

Endert, Alex. Semantic Interaction for Visual Analytics. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-031-02603-4.

2

Paquette, Gilbert. Visual knowledge modeling for semantic web technologies: Models and ontologies. Hershey, PA: Information Science Reference, 2010.

3

Hussam, Ali. Semantic highlighting: An approach to communicating information and knowledge through visual metadata. [s.l: The Author], 1999.

4

Valkola, Jarmo. Perceiving the visual in cinema: Semantic approaches to film form and meaning. Jyväskylä: Jyväskylän Yliopisto, 1993.

5

Chen, Chaomei. Effects of spatial-semantic interfaces in visual information retrieval: Three experimental studies. [Great Britain]: Resource, 2002.

6

K, Kokula Krishna Hari, ed. Multi-secret Semantic Visual Cryptographic Protocol for Securing Image Communications: ICCS 2014. Bangkok, Thailand: Association of Scientists, Developers and Faculties, 2014.

7

Bratko, Aleksandr. Artificial intelligence, legal system and state functions. INFRA-M Academic Publishing LLC, 2020. http://dx.doi.org/10.12737/1064996.

Abstract:
The monograph deals with methodological problems of embedding artificial intelligence in the legal system while taking into account the laws of society. It describes the properties of the rule of law as a microsystem within the subsystems of law, and the methods of fixing it in the system of law and in the logic of legal norms. The author proposes and substantiates the idea of creating, specifically for artificial intelligence, a separate, distinct and unambiguous normative system, parallel to the principal branches of law and built on the logic of the four-part structure of legal norms. The book briefly discusses elements of the theory of law as a methodological instrument for modelling the legal system and its semantic codes so that an artificial intelligence can function properly, as well as ways of applying artificial intelligence in the functioning of the state. For students, teachers and all those interested in artificial intelligence from the point of view of law.
8

Video segmentation and its applications. New York: Springer, 2011.

9

Stoenescu, Livia. The Pictorial Art of El Greco. Amsterdam: Amsterdam University Press, 2019. http://dx.doi.org/10.5117/9789462989009.

Abstract:
The Pictorial Art of El Greco: Transmaterialities, Temporalities, and Media investigates El Greco’s pictorial art as foundational to the globalising trends manifested in the visual culture of early modernity. It also exposes the figurative, semantic, and allegorical senses that El Greco created to challenge an Italian Renaissance-centered discourse. Even though he was guided by the unprecedented burgeoning of devotional art in the post-Tridentine decades and by the expressive possibilities of earlier religious artifacts, especially those inherited from the apostolic past, the author demonstrates that El Greco forged his own independent trajectory. While his paintings have been studied in relation to the Italian and Spanish school traditions, his pictorial art in a global Mediterranean context continues to receive scant attention. Taking a global perspective as its focus, the book sheds new light on El Greco’s highly original contribution to early Mediterranean and multi-institutional configurations of the Christian faith in Byzantium, Venice, Rome, Toledo, and Madrid.
10

Zhang, Yu-jin. Semantic-Based Visual Information Retrieval. IRM Press, 2006.


Book chapters on the topic "Visual and semantic embedding"

1

Wang, Haoran, Ying Zhang, Zhong Ji, Yanwei Pang, and Lin Ma. "Consensus-Aware Visual-Semantic Embedding for Image-Text Matching". In Computer Vision – ECCV 2020, 18–34. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58586-0_2.

2

Yang, Zhanbo, Li Li, Jun He, Zixi Wei, Li Liu, and Jun Liao. "Multimodal Learning with Triplet Ranking Loss for Visual Semantic Embedding Learning". In Knowledge Science, Engineering and Management, 763–73. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-29551-6_67.

3

Jiang, Zhukai, and Zhichao Lian. "Self-supervised Visual-Semantic Embedding Network Based on Local Label Optimization". In Machine Learning for Cyber Security, 400–412. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-20102-8_31.

4

Filntisis, Panagiotis Paraskevas, Niki Efthymiou, Gerasimos Potamianos, and Petros Maragos. "Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss". In Computer Vision – ECCV 2020 Workshops, 747–55. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-66415-2_52.

5

Valério, Rodrigo, and João Magalhães. "Learning Semantic-Visual Embeddings with a Priority Queue". In Pattern Recognition and Image Analysis, 67–81. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-36616-1_6.

6

Syed, Arsal, and Brendan Tran Morris. "CNN, Segmentation or Semantic Embeddings: Evaluating Scene Context for Trajectory Prediction". In Advances in Visual Computing, 706–17. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-64559-5_56.

7

Schall, Konstantin, Nico Hezel, Klaus Jung, and Kai Uwe Barthel. "Vibro: Video Browsing with Semantic and Visual Image Embeddings". In MultiMedia Modeling, 665–70. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-27077-2_56.

8

Chen, Yanbei, and Loris Bazzani. "Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval". In Computer Vision – ECCV 2020, 136–52. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58542-6_9.

9

Theodoridou, Christina, Andreas Kargakos, Ioannis Kostavelis, Dimitrios Giakoumis, and Dimitrios Tzovaras. "Spatially-Constrained Semantic Segmentation with Topological Maps and Visual Embeddings". In Lecture Notes in Computer Science, 117–29. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87156-7_10.

10

Thoma, Steffen, Achim Rettinger, and Fabian Both. "Towards Holistic Concept Representations: Embedding Relational Knowledge, Visual Attributes, and Distributional Word Semantics". In Lecture Notes in Computer Science, 694–710. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68288-4_41.


Conference papers on the topic "Visual and semantic embedding"

1

Li, Zheng, Caili Guo, Zerun Feng, Jenq-Neng Hwang, and Xijun Xue. "Multi-View Visual Semantic Embedding". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/158.

Abstract:
Visual Semantic Embedding (VSE) is a dominant method for cross-modal vision-language retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in a position close to the corresponding text description. However, there are large intra-class variations in the vision-language data. For example, multiple texts describing the same image may be described from different views, and the descriptions of different views are often dissimilar. The mainstream VSE method embeds samples from the same class in similar positions, which will suppress intra-class variations and lead to inferior generalization performance. This paper proposes a Multi-View Visual Semantic Embedding (MV-VSE) framework, which learns multiple embeddings for one visual data and explicitly models intra-class variations. To optimize MV-VSE, a multi-view upper bound loss is proposed, and the multi-view embeddings are jointly optimized while retaining intra-class variations. MV-VSE is plug-and-play and can be applied to various VSE models and loss functions without excessively increasing model complexity. Experimental results on the Flickr30K and MS-COCO datasets demonstrate the superior performance of our framework.
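
A sketch of the core multi-view idea: keep several embeddings per image, one per view, and score a caption against the best-matching view so that intra-class variations survive. The max aggregation below is one plausible reading of the framework, not its exact loss.

```python
import torch
import torch.nn.functional as F

def multi_view_similarity(view_embs, txt_emb):
    """view_embs: (K, D) embeddings of one image, one per view.
    txt_emb: (D,) embedding of one caption.
    The caption is matched to its closest view instead of a single
    averaged embedding, preserving intra-class variation."""
    v = F.normalize(view_embs, dim=-1)
    t = F.normalize(txt_emb, dim=-1)
    return (v @ t).max()
```
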
2

Ren, Zhou, Hailin Jin, Zhe Lin, Chen Fang, and Alan Yuille. "Multiple Instance Visual-Semantic Embedding". In British Machine Vision Conference 2017. British Machine Vision Association, 2017. http://dx.doi.org/10.5244/c.31.89.

3

Wehrmann, Jônatas, and Rodrigo C. Barros. "Language-Agnostic Visual-Semantic Embeddings". In Concurso de Teses e Dissertações da SBC. Sociedade Brasileira de Computação, 2021. http://dx.doi.org/10.5753/ctd.2021.15751.

Abstract:
We propose a framework for training language-invariant cross-modal retrieval models. We introduce four novel text encoding approaches, as well as a character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Contributions include new methods for building character-level word embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. Moreover, we introduce a module called ADAPT, which is responsible for providing query-aware visual representations that generate large improvements in terms of recall for four widely used large-scale image-text datasets. We show that our models outperform the current state of the art in all scenarios. This thesis can serve as a new path for retrieval research, now allowing for the effective use of captions in multiple-language scenarios.
4

Li, Binglin, and Yang Wang. "Visual Relationship Detection Using Joint Visual-Semantic Embedding". In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018. http://dx.doi.org/10.1109/icpr.2018.8546097.

5

Ji, Rongrong, Hongxun Yao, Xiaoshuai Sun, Bineng Zhong, and Wen Gao. "Towards semantic embedding in visual vocabulary". In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2010. http://dx.doi.org/10.1109/cvpr.2010.5540118.

6

Hong, Ziming, Shiming Chen, Guo-Sen Xie, Wenhan Yang, Jian Zhao, Yuanjie Shao, Qinmu Peng, and Xinge You. "Semantic Compression Embedding for Generative Zero-Shot Learning". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/134.

Abstract:
Generative methods have been successfully applied in zero-shot learning (ZSL) by learning an implicit mapping to alleviate the visual-semantic domain gaps and synthesizing unseen samples to handle the data imbalance between seen and unseen classes. However, existing generative methods simply use visual features extracted by the pre-trained CNN backbone. These visual features lack attribute-level semantic information. Consequently, seen classes are indistinguishable, and the knowledge transfer from seen to unseen classes is limited. To tackle this issue, we propose a novel Semantic Compression Embedding Guided Generation (SC-EGG) model, which cascades a semantic compression embedding network (SCEN) and an embedding guided generative network (EGGN). The SCEN extracts a group of attribute-level local features for each sample and further compresses them into the new low-dimension visual feature. Thus, a dense-semantic visual space is obtained. The EGGN learns a mapping from the class-level semantic space to the dense-semantic visual space, thus improving the discriminability of the synthesized dense-semantic unseen visual features. Extensive experiments on three benchmark datasets, i.e., CUB, SUN and AWA2, demonstrate the significant performance gains of SC-EGG over current state-of-the-art methods and its baselines.
7

Perez-Martin, Jesus, Jorge Perez, and Benjamin Bustos. "Visual-Syntactic Embedding for Video Captioning". In LatinX in AI at Computer Vision and Pattern Recognition Conference 2021. Journal of LatinX in AI Research, 2021. http://dx.doi.org/10.52591/lxai202106259.

Abstract:
Video captioning is the task of predicting a semantic and syntactically correct sequence of words given some context video. The most successful methods for video captioning have a strong dependency on the effectiveness of semantic representations learned from visual models, but often produce syntactically incorrect sentences which harms their performance on standard datasets. We address this limitation by considering syntactic representation learning as an essential component of video captioning. We construct a visual-syntactic embedding by mapping into a common vector space a visual representation, that depends only on the video, with a syntactic representation that depends only on Part-of-Speech (POS) tagging structures of the video description. We integrate this joint representation into an encoder-decoder architecture that we call Visual-Semantic-Syntactic Aligned Network (SemSynAN), which guides the decoder (text generation stage) by aligning temporal compositions of visual, semantic, and syntactic representations. We tested our proposed architecture obtaining state-of-the-art results on two widely used video captioning datasets. This is a short version of a paper recently published at a Computer Vision Conference. The complete reference has been redacted to fulfill the double-blind restriction.
8

Zeng, Zhixian, Jianjun Cao, Nianfeng Weng, Guoquan Jiang, Yizhuo Rao e Yuxin Xu. "Softmax Pooling for Super Visual Semantic Embedding". In 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2021. http://dx.doi.org/10.1109/iemcon53756.2021.9623131.

9

Zhang, Licheng, Xianzhi Wang, Lina Yao, Lin Wu e Feng Zheng. "Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/126.

Abstract:
Zero-shot object detection (ZSD) has received considerable attention from the computer vision community in recent years. It aims to simultaneously locate and categorize previously unseen objects during inference. One crucial problem in ZSD is how to accurately predict the label of each object proposal, i.e. categorize object proposals, for unseen categories. Previous ZSD models generally relied on learning an embedding from the visual space to the semantic space, or a joint embedding between semantic descriptions and visual representations. However, features in the learned semantic space or the joint projected space tend to suffer from the hubness problem: feature vectors are likely to be embedded near areas with incorrect labels, which lowers detection precision. In this paper, we instead propose to learn a deep embedding from the semantic space to the visual space, which alleviates the hubness problem because the distribution in the visual space has smaller variance than in the semantic or joint embedding space. After learning the embedding model, we perform k-nearest-neighbor search in the visual space of unseen categories to determine the category of each semantic description. Extensive experiments on two public datasets show that our approach significantly outperforms existing methods.
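The recipe in this abstract (regress class semantics onto the visual space, then match proposals to the nearest projected prototype) can be sketched as follows; the MLP shape, training loop, and placeholder data are assumptions for illustration:

```python
# Hedged sketch: semantic -> visual embedding plus 1-NN labeling. All tensors
# below are random placeholders standing in for word embeddings, per-class
# visual features, and region-proposal features.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

sem_dim, vis_dim, n_seen, n_unseen = 300, 512, 40, 10
seen_sem = torch.randn(n_seen, sem_dim)          # class word embeddings (seen)
seen_vis = torch.randn(n_seen, vis_dim)          # e.g. mean visual feature per class

# deep embedding f: semantic space -> visual space, fit on seen classes
f = nn.Sequential(nn.Linear(sem_dim, 1024), nn.ReLU(), nn.Linear(1024, vis_dim))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(200):
    loss = ((f(seen_sem) - seen_vis) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# project unseen-class semantics into the visual space, then assign each
# region proposal the label of its nearest projected class prototype
unseen_sem = torch.randn(n_unseen, sem_dim)
with torch.no_grad():
    prototypes = f(unseen_sem).numpy()
proposals = np.random.randn(5, vis_dim).astype(np.float32)
index = NearestNeighbors(n_neighbors=1).fit(prototypes)
_, labels = index.kneighbors(proposals)
print(labels.ravel())                            # predicted unseen class per proposal
```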
10

Song, Yale, e Mohammad Soleymani. "Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval". In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.00208.


Organization reports on the topic "Visual and semantic embedding"

1

Kud, A. A. Figures and Tables. Reprinted from "Comprehensive classification of virtual assets", A. A. Kud, 2021, International Journal of Education and Science, 4(1), 52–75. KRPOCH, 2021. http://dx.doi.org/10.26697/reprint.ijes.2021.1.6.a.kud.

Abstract:
Figures: Distributed Ledger Token Accounting System; Subjects of Social Relations Based on the Decentralized Information Platform; Derivativeness of a Digital Asset; Semantic Features of the Concept of a "Digital Asset" in Economic and Legal Aspects; Derivativeness of Polyassets and Monoassets; Types of Tokenized Assets Derived from Property; Visual Representation of the Methods of Financial and Management Accounting of Property Using Various Types of Tokenized Assets; Visual Representation of the Classification of Virtual Assets Based on the Complexity of Their Nature. Tables: Comparison of Properties of Various Types of Virtual Assets of the Distributed Ledger Derivative of the Original Asset; Main Properties and Parameters of Types of Tokenized Assets; Classification of Virtual Assets as Tools for Implementing the Methods of Financial and Management Accounting of Property.
2

Tabinskyy, Yaroslav. VISUAL CONCEPTS OF PHOTO IN THE MEDIA (ON THE EXAMPLE OF «UKRAINER» AND «REPORTERS»). Ivan Franko National University of Lviv, March 2021. http://dx.doi.org/10.30970/vjo.2021.50.11099.

Abstract:
The article analyzes the main forms of visualization in the media related to photography. Thematic visual concepts are described in accordance with the content of electronic media, considering the impact of modern technologies on the development of the media space. Research by Ukrainian and foreign educational institutions on the main features of modern photography is classified, and modifications and new visual forms in the media are singled out. The main objective of the article is to study the visual concepts of modern photography and to identify ideological and thematic priorities in photo projects. To achieve this objective, a specific methodology was used: a historical-theoretical description substantiated the study of visual concepts, and the conceptual-system method was applied to the subject matter of media photo projects. The main results of the research are a definition of the visual concepts of photography, illustrated by electronic media, and an identification of the main thematic features in the visual filling of the media space. Based on the study, we can conclude that today's information field needs quality visual content. To successfully create visual concepts, it is necessary to single out the thematic features of modern photography and to classify them by ideological and semantic features. Given the rapid development of digital technologies, the topic of this article is relevant for scientists, journalists, media researchers, visual journalism experts and photojournalists. The modern space is filled with a large number of pictorial materials which, in most cases, form specific images, patterns or stereotypes in the mind of the reader (viewer). Also important is the classification of photographs used in journalistic publications. That is why there is a need to explore the content and principles of distribution of the ideological priorities of photography in the media. Scholars' arguments about the important place of photography in the modern media space, and about the future development of visual technologies that already use artificial intelligence, underline the relevance of this work.
3

Mbani, Benson, Timm Schoening e Jens Greinert. Automated and Integrated Seafloor Classification Workflow (AI-SCW). GEOMAR, May 2023. http://dx.doi.org/10.3289/sw_2_2023.

Abstract:
The Automated and Integrated Seafloor Classification Workflow (AI-SCW) is a semi-automated underwater image processing pipeline customized for classifying the seafloor into semantic habitat categories. The current implementation has been tested against a sequence of underwater images collected by the Ocean Floor Observation System (OFOS) in the Clarion-Clipperton Zone of the Pacific Ocean. Nevertheless, the workflow could also be applied to images acquired by other platforms, such as an Autonomous Underwater Vehicle (AUV) or a Remotely Operated Vehicle (ROV). The modules in AI-SCW are implemented in the Python programming language, using libraries such as scikit-image for image processing, scikit-learn for machine learning and dimensionality reduction, keras for computer vision with deep learning, and matplotlib for generating visualizations. AI-SCW's modularized implementation allows users to accomplish a variety of underwater computer vision tasks, including: detecting laser points in the underwater images for use in scale determination; performing contrast enhancement and color normalization to improve the visual quality of the images; semi-automated generation of annotations to be used downstream during supervised classification; training a convolutional neural network (Inception v3) on the generated annotations to semantically classify each image into one of the pre-defined seafloor habitat categories; evaluating sampling strategies for generating balanced training images used to fit an unsupervised k-means classifier; and visualizing classification results both in feature-space view and in map view with geospatial coordinates. The workflow is thus useful for quick but objective generation of image-based seafloor habitat maps to support monitoring of remote benthic ecosystems.
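Since the abstract names the libraries it builds on, two of the described steps can be sketched directly; the CLAHE-based enhancement and the Inception v3 classifier below are minimal stand-ins, with the habitat-class count and all data handling assumed:

```python
# Hedged sketch of two AI-SCW-style steps: contrast enhancement and CNN-based
# habitat classification. Function structure, class count, and the random
# placeholder batch are assumptions; the actual workflow modules differ.
import numpy as np
from skimage import exposure, io
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

def enhance(path):
    """Contrast enhancement via CLAHE, one of the visual-quality steps."""
    img = io.imread(path)
    return exposure.equalize_adapthist(img)      # float image in [0, 1]

n_classes = 6                                    # assumed number of habitat categories
model = InceptionV3(weights=None, classes=n_classes, input_shape=(299, 299, 3))

batch = np.random.rand(1, 299, 299, 3).astype("float32") * 255.0  # placeholder batch
probs = model.predict(preprocess_input(batch))   # would follow training on annotations
print(probs.argmax(axis=1))                      # predicted habitat class per image
```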
4

Yatsymirska, Mariya. SOCIAL EXPRESSION IN MULTIMEDIA TEXTS. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11072.

Abstract:
The article investigates functional techniques of extralinguistic expression in multimedia texts; it shows the effectiveness of figurative expressions as reactions to current events in Ukraine and their influence on the formation of public opinion. Publications by journalists, broadcasts by media commentators, experts, public figures, politicians and readers are analyzed. The language of the media plays a key role in shaping the worldview of the young political elite in the first place. The essence of each statement is a focused thought reacting to events in the world or in one's own country. The most popular platform for mass information and social interaction is, first of all, network journalism, which is characterized by mobility and unlimited time and space. Authors have complete freedom to express their views in direct language, including their own word formation. Phonetic, lexical, phraseological and stylistic means of speech create the expression of a text. A figurative word, a good aphorism or proverb, a paraphrased expression and the like enhance the effectiveness of a multimedia text. This is especially important for headlines, which simultaneously inform and influence the views of millions of readers. Given the wide range of issues raised by the Internet as a medium, research in this area is interdisciplinary. The science of information, combining language and social communication, is at the forefront of global interactions. The Internet is an effective source of knowledge and a forum for free thought. Nonlinear texts (hypertexts), i.e. "branching texts or texts that perform actions on request", and multimedia texts change the principles of collecting, storing and disseminating information, involving billions of readers in the discussion of global issues. Mastering the word is not an easy task if the author of a publication is not well-read, is not deep in the topic, or does not know the psychology of the audience for which he writes. Therefore, the study of media broadcasting is an important component of the professional training of future journalists. The functions of media language require authors to make correct statements and convincing arguments in the text. Journalism education covers not only imperative and dispositive norms but also apodictic ones. In practice, this means that media creativity has rules based on logical necessity. Apodicticity is the first sign of impressive language on the platform of print or electronic media. Social expression is a combination of creative abilities and linguistic competencies that a journalist realizes in his activity. Creative self-expression in the media rests on many important factors: the choice of topic, convincing arguments, the logical presentation of ideas and a deep philological education. Linguistic art, in contrast to painting, music and sculpture, accumulates all visual, auditory, tactile and empathic sensations in a universal sign: the word. The choice of a word to reproduce sensory and semantic meanings, and its competent use in the appropriate context, distinguishes the journalist-intellectual from other participants in forums, round tables, and analytical or entertainment programs. Expressive speech in the media is a product of the intellect (the ability to think) of all those who write on socio-political or economic topics. On the same plane lies intelligence (awareness, prudence), whose first sign, according to Ivan Ogienko, is a good knowledge of the language.
Intellectual language is an important means of organizing a journalistic text. On the one hand, it logically conveys the author's thoughts; on the other, it encourages the reader to reflect on and comprehend what is read. The richness of language is accumulated through continuous self-education and interesting communication. Studies of social expression as an important factor in the formation of public consciousness should open up new facets of rational and emotional media broadcasting and trace physical and psychological reactions to communicative mimicry in the media. Speech mimicry, as a method of disguise, is increasingly becoming a dangerous factor in media manipulation. Mimicry is an unprincipled adaptation to surrounding social conditions; one of the best-known examples of an animal characterized by mimicry (change of protective color and shape) is the chameleon. In a figurative sense, adaptive journalists are called chameleons. Observations show that mimicry in politics is, to some extent, a kind of game that, like every game, is always conditional and artificial.