Academic literature on the topic 'Visual and semantic embedding'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual and semantic embedding.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Visual and semantic embedding"

1

Zhang, Yuanpeng, Jingye Guan, Haobo Wang, Kaiming Li, Ying Luo, and Qun Zhang. "Generalized Zero-Shot Space Target Recognition Based on Global-Local Visual Feature Embedding Network." Remote Sensing 15, no. 21 (October 28, 2023): 5156. http://dx.doi.org/10.3390/rs15215156.

Full text
Abstract:
Existing deep learning-based space target recognition methods rely on abundantly labeled samples and are not capable of recognizing samples from unseen classes without training. In this article, based on generalized zero-shot learning (GZSL), we propose a space target recognition framework to simultaneously recognize space targets from both seen and unseen classes. First, we defined semantic attributes to describe the characteristics of different categories of space targets. Second, we constructed a dual-branch neural network, termed the global-local visual feature embedding network (GLVFENet), which jointly learns global and local visual features to obtain discriminative feature representations, thereby achieving GZSL for space targets with higher accuracy. Specifically, the global visual feature embedding subnetwork (GVFE-Subnet) calculates the compatibility score by measuring the cosine similarity between the projection of global visual features in the semantic space and various semantic vectors, thereby obtaining global visual embeddings. The local visual feature embedding subnetwork (LVFE-Subnet) introduces soft space attention, and an encoder discovers the semantic-guided local regions in the image to then generate local visual embeddings. Finally, the visual embeddings from both branches were combined and matched with semantics. The calibrated stacking method is introduced to achieve GZSL recognition of space targets. Extensive experiments were conducted on an electromagnetic simulation dataset of nine categories of space targets, and the effectiveness of our GLVFENet is confirmed.
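For orientation, the compatibility score described above is, at its core, a cosine similarity between a projected visual feature and each class's semantic vector. The sketch below illustrates only that step; the linear projection and the dimensions (2048-d visual features, 85-d attributes, 9 classes) are assumptions for illustration, not details from the paper, and calibrated stacking is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompatibilityScorer(nn.Module):
    """Score visual features against class semantic vectors (illustrative)."""

    def __init__(self, visual_dim=2048, semantic_dim=85):
        super().__init__()
        self.project = nn.Linear(visual_dim, semantic_dim)

    def forward(self, visual_feats, class_semantics):
        # visual_feats:    (batch, visual_dim)
        # class_semantics: (num_classes, semantic_dim), one vector per class
        v = F.normalize(self.project(visual_feats), dim=-1)
        s = F.normalize(class_semantics, dim=-1)
        return v @ s.t()  # (batch, num_classes) cosine compatibility scores

# usage: scores = CompatibilityScorer()(torch.randn(8, 2048), torch.randn(9, 85))
```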
APA, Harvard, Vancouver, ISO, and other styles
2

Yeh, Mei-Chen, and Yi-Nan Li. "Multilabel Deep Visual-Semantic Embedding." IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 6 (June 1, 2020): 1530–36. http://dx.doi.org/10.1109/tpami.2019.2911065.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Full text
Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark datasets: Microsoft Common Objects in Context (MSCOCO) and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity (STS) benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence level semantics. Importantly, this result shows that we do not need prior knowledge of lexical level semantics in order to model sentence level semantics. These findings demonstrate the importance of visual information in semantics.
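The retrieval evaluation mentioned above reduces to ranking captions by similarity in the shared embedding space. Below is a simplified recall@k sketch; it assumes one matching caption per image (real benchmarks such as MSCOCO provide five per image), so it is illustrative rather than a faithful re-implementation of the paper's protocol.

```python
import torch

def recall_at_k(img_emb, txt_emb, k=5):
    """Image-to-caption retrieval recall@k for a batch of matched pairs.

    img_emb, txt_emb: (n, dim) L2-normalized embeddings; row i of each is
    assumed to be a matching image-caption pair (a simplifying assumption).
    """
    sims = img_emb @ txt_emb.t()                 # (n, n) similarity matrix
    topk = sims.topk(k, dim=1).indices           # best k captions per image
    targets = torch.arange(sims.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()

# usage: r5 = recall_at_k(torch.randn(32, 256), torch.randn(32, 256), k=5)
```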
APA, Harvard, Vancouver, ISO, and other styles
4

Zhou, Mo, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, and Gang Hua. "Ladder Loss for Coherent Visual-Semantic Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 13050–57. http://dx.doi.org/10.1609/aaai.v34i07.7006.

Full text
Abstract:
For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way – relevant or irrelevant, and all “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their various proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce a continuous variable to model the relevance degree between queries and multiple candidates, and propose to learn a coherent embedding space, where candidates with higher relevance degrees are mapped closer to the query than those with lower relevance degrees. In particular, the new ladder loss is proposed by extending the triplet loss inequality to a more general inequality chain, which implements variable push-away margins according to respective relevance degrees. In addition, a proper Coherent Score metric is proposed to better measure the ranking results including those “irrelevant” candidates. Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvement over existing state-of-the-art methods.
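To make the ladder idea concrete, here is a minimal PyTorch-style sketch that pushes lower-relevance candidates farther from the query than higher-relevance ones, with a separate margin per relevance rung. The two-level grouping, margin values, and function signature are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ladder_loss(query, candidates, relevance, margins=(0.2, 0.4)):
    """Toy two-rung ladder loss: candidates with higher relevance must lie
    closer to the query than lower-relevance ones, by rung-specific margins.

    query:      (d,) embedding
    candidates: (n, d) embeddings
    relevance:  (n,) integer degrees, higher = more relevant (e.g., 0, 1, 2)
    """
    sims = F.cosine_similarity(query.unsqueeze(0), candidates, dim=1)
    loss = query.new_zeros(())
    # For each pair of adjacent relevance levels, push the lower level
    # away from the query by at least the corresponding margin.
    for level, margin in zip((2, 1), margins):
        pos = sims[relevance >= level]
        neg = sims[relevance < level]
        if len(pos) == 0 or len(neg) == 0:
            continue
        # hinge over all (pos, neg) pairs for this rung of the ladder
        loss = loss + F.relu(margin - pos.unsqueeze(1) + neg.unsqueeze(0)).mean()
    return loss

# usage: ladder_loss(torch.randn(128), torch.randn(5, 128), torch.tensor([2, 2, 1, 0, 0]))
```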
APA, Harvard, Vancouver, ISO, and other styles
5

Ge, Jiannan, Hongtao Xie, Shaobo Min, and Yongdong Zhang. "Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1406–14. http://dx.doi.org/10.1609/aaai.v35i2.16230.

Full text
Abstract:
Generalized zero-shot learning (GZSL) aims to recognize images from either the seen or the unseen domain, mainly by learning a joint embedding space to associate image features with the corresponding category descriptions. Recent methods have proved that localizing important object regions can effectively bridge the semantic-visual gap. However, these are all based on one-off visual localizers, lacking interpretability and flexibility. In this paper, we propose a novel Semantic-guided Reinforced Region Embedding (SR2E) network that can localize important objects with long-term interest to construct the semantic-visual embedding space. SR2E consists of a Reinforced Region Module (R2M) and a Semantic Alignment Module (SAM). First, without annotated bounding boxes as supervision, R2M encodes the semantic category guidance into reward and punishment criteria to teach the localizer serialized region searching. In addition, R2M explores different action spaces along the serialized searching path to avoid locally optimal localization, thereby generating discriminative visual features with less redundancy. Second, SAM preserves the semantic relationship in the visual features via semantic-visual alignment and designs a domain detector to alleviate domain confusion. Experiments on four public benchmarks demonstrate that the proposed SR2E is an effective GZSL method with a reinforced embedding space, obtaining an average improvement of 6.1%.
APA, Harvard, Vancouver, ISO, and other styles
6

Nguyen, Huy Manh, Tomo Miyazaki, Yoshihiro Sugaya, and Shinichiro Omachi. "Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence." Applied Sciences 11, no. 7 (April 3, 2021): 3214. http://dx.doi.org/10.3390/app11073214.

Full text
Abstract:
Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing similarities measured in each embedding space using a weighted sum strategy. We determine the weights according to a sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive to state-of-the-art methods. These experimental results demonstrated the effectiveness of the proposed multiple embedding approach compared to existing methods.
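A small sketch of the fusion strategy described above: similarities measured in several embedding spaces are combined with weights predicted from the sentence. The module sizes, the softmax weighting head, and the assumption that per-space similarities are precomputed are illustrative simplifications, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiSpaceFusion(nn.Module):
    """Fuse per-space video-sentence similarities with weights predicted
    from the sentence embedding (illustrative shapes, not the paper's)."""

    def __init__(self, sent_dim=512, num_spaces=3):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Linear(sent_dim, num_spaces),
            nn.Softmax(dim=-1),
        )

    def forward(self, sent_emb, per_space_sims):
        # sent_emb:       (batch, sent_dim)
        # per_space_sims: (batch, num_spaces) similarity of the sentence
        #                 to a candidate video measured in each space
        weights = self.weight_net(sent_emb)        # (batch, num_spaces)
        return (weights * per_space_sims).sum(-1)  # fused similarity

# usage: MultiSpaceFusion()(torch.randn(4, 512), torch.randn(4, 3))
```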
APA, Harvard, Vancouver, ISO, and other styles
7

MATSUBARA, Takashi. "Target-Oriented Deformation of Visual-Semantic Embedding Space." IEICE Transactions on Information and Systems E104.D, no. 1 (January 1, 2021): 24–33. http://dx.doi.org/10.1587/transinf.2020mup0003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tang, Qi, Yao Zhao, Meiqin Liu, Jian Jin, and Chao Yao. "Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5154–61. http://dx.doi.org/10.1609/aaai.v38i6.28321.

Full text
Abstract:
As a critical clue of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is a challenging task due to the intricate motion interweaving in the video. In response to this issue, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. Those semantics assist the Pixel Enhancer in understanding the recovered contents and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatial-temporal contexts related to each instance. Furthermore, we devise a Semantics-Powered Attention Cross-Embedding (SPACE) block to bridge the pixel-level features with semantic knowledge, composed of a Global Perspective Shifter (GPS) and an Instance-Specific Semantic Embedding Encoder (ISEE). Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on global semantics. After that the ISEE module harnesses the attention mechanism to align the adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to alleviate the difficulty of model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
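The GPS module described above amounts to feature-wise affine modulation conditioned on a global semantic vector, in the spirit of FiLM. The sketch below shows that pattern only; the layer sizes and feature shapes are assumptions made for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class GlobalPerspectiveShifter(nn.Module):
    """FiLM-style modulation: predict per-channel scale and shift from a
    global semantic vector and apply them to pixel-level features
    (an illustrative stand-in for the paper's GPS module)."""

    def __init__(self, semantic_dim=512, channels=64):
        super().__init__()
        self.to_gamma = nn.Linear(semantic_dim, channels)
        self.to_beta = nn.Linear(semantic_dim, channels)

    def forward(self, feats, semantics):
        # feats: (batch, channels, H, W); semantics: (batch, semantic_dim)
        gamma = self.to_gamma(semantics).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(semantics).unsqueeze(-1).unsqueeze(-1)
        return gamma * feats + beta

# usage: GlobalPerspectiveShifter()(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
```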
APA, Harvard, Vancouver, ISO, and other styles
9

Keller, Patrick, Abdoul Kader Kaboré, Laura Plein, Jacques Klein, Yves Le Traon, and Tegawendé F. Bissyandé. "What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning." ACM Transactions on Software Engineering and Methodology 31, no. 2 (April 30, 2022): 1–34. http://dx.doi.org/10.1145/3485135.

Full text
Abstract:
Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM ("What You See Is What It Means") approach, where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem) and code classification (a multi-classification problem). We show with experiments on BigCloneBench (Java) and Open Judge (C) that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.
APA, Harvard, Vancouver, ISO, and other styles
10

He, Hai, and Haibo Yang. "Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization." Mathematical Problems in Engineering 2021 (May 28, 2021): 1–8. http://dx.doi.org/10.1155/2021/6654071.

Full text
Abstract:
Language and vision are the two most essential parts of human intelligence for interpreting the real world around us. How to make connections between language and vision is the key point in current research. Multimodality methods like visual semantic embedding, which unify images and corresponding texts into the same feature space, have been widely studied recently. Inspired by the recent development of text data augmentation and a simple but powerful technique called EDA (easy data augmentation), we can expand the information in the given data using EDA to improve the performance of models. In this paper, we take advantage of the text data augmentation technique and word embedding initialization for multimodality retrieval. We utilize EDA for text data augmentation, word embedding initialization for a text encoder based on recurrent neural networks, and minimize the gap between the two spaces by a triplet ranking loss with hard negative mining. On two Flickr-based datasets, we achieve the same recall with only 60% of the training dataset as normal training with the full available data. Experimental results show the improvement of our proposed model: on all datasets in this paper (Flickr8k, Flickr30k, and MS-COCO), our model performs better on image annotation and image retrieval tasks. The experiments also demonstrate that text data augmentation is more suitable for smaller datasets, while word embedding initialization is suitable for larger ones.
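The "triplet ranking loss with hard negative mining" mentioned above is commonly implemented as a max-of-hinges objective over in-batch negatives. A hedged sketch follows; the margin value and the assumption that row i of each matrix forms a matching image-caption pair are illustrative choices, not necessarily the authors' exact setup.

```python
import torch

def triplet_loss_hard_negative(img_emb, txt_emb, margin=0.2):
    """Bidirectional triplet ranking loss with in-batch hard negatives.

    img_emb, txt_emb: (batch, dim) L2-normalized embeddings where row i of
    each matrix is a matching image-caption pair.
    """
    scores = img_emb @ txt_emb.t()           # (batch, batch) similarities
    pos = scores.diag().view(-1, 1)          # matching-pair scores

    cost_txt = (margin + scores - pos).clamp(min=0)      # image -> caption
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # caption -> image

    # mask out the positives on the diagonal
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)

    # hard negative mining: keep only the hardest negative per query
    return cost_txt.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()
```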
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Visual and semantic embedding"

1

Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Full text
Abstract:
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has brought a marked improvement in visual and textual representation models. This thesis addresses the question of learning multimodal embeddings to jointly represent visual and semantic data. It is a central problem in the current context of AI and deep learning, and one with particularly strong potential for model interpretability. In this thesis we explore joint visual and semantic representation spaces. We propose two new models for building such spaces and demonstrate their ability to localize semantic concepts in the visual domain. Finally, we introduce a new method for learning a differentiable approximation of rank-based evaluation functions.
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and textual representation and existing multimodal approaches; we propose novel architectures further improving the retrieval capability of VSE; and we extend VSE models to novel applications and leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process and, in particular, the loss function, by learning a differentiable approximation of ranking-based metrics.
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Qian. "Zero-shot visual recognition via latent embedding learning." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/zeroshot-visual-recognition-via-latent-embedding-learning(bec510af-6a53-4114-9407-75212e1a08e1).html.

Full text
Abstract:
Traditional supervised visual recognition methods require a great number of annotated examples for each concerned class. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations where the test instances are from novel classes for which training examples are unavailable in the training stage. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space while preserving both visual and semantic relatedness, hence the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. Then we extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e. the visual model and the semantic model) to bridge the semantic gap. The two component embedding models are trained alternately to optimize the ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework. Finally, we propose alternative semantic representations for human actions towards narrowing the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data has been investigated to enhance the semantic representations for human actions. The novel semantic representations are proved to benefit zero-shot human action recognition significantly compared to the traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition towards narrowing and bridging the semantic gap, and achieve state-of-the-art performance in different settings on multiple benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
3

Ficapal, Vila Joan. "Anemone: a Visual Semantic Graph." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252810.

Full text
Abstract:
Semantic graphs have been used for optimizing various natural language processing tasks as well as augmenting search and information retrieval tasks. In most cases these semantic graphs have been constructed through supervised machine learning methodologies that depend on manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we explore in the first part the possibility of automatically populating a semantic graph from an ad hoc data set of 50,000 newspaper articles in a completely unsupervised manner. The utility of the visual representation of the resulting graph is tested on 14 human subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that, for entity finding and document similarity, our feature engineering is viable and the visual map produced by our artifact is visually useful. In the second part, we explore the possibility of identifying entity relationships in an unsupervised fashion by employing abstractive deep learning methods for sentence reformulation. The reformulated sentence structures are qualitatively assessed with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We evaluate the outcomes of this second part negatively, as they have not been good enough to reach any definitive conclusion, but they have instead opened new doors to explore.
Semantic graphs have been used to optimize various natural language processing tasks as well as to improve search and information retrieval tasks. In most cases such semantic graphs have been constructed through supervised machine learning methods that presuppose manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we first explore the possibility of automatically generating a semantic graph from an ad hoc dataset of 50,000 newspaper articles in a completely unsupervised manner. The usefulness of the visual representation of the resulting graph is tested on 14 subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that our feature engineering is viable for entity finding and document similarity, and that the visual map produced by our artifact is visually useful. In the second part, we explore the possibility of identifying entity relationships in an unsupervised manner by using abstractive deep learning methods for sentence reformulation. The reformulated sentences are evaluated qualitatively with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We evaluate the results of this second part negatively, as they have not been good enough to reach any definitive conclusion, but they have instead opened new doors to explore.
APA, Harvard, Vancouver, ISO, and other styles
4

Jakeš, Jan. "Visipedia - Embedding-driven Visual Feature Extraction and Learning." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236120.

Full text
Abstract:
Multidimensional indexing is an effective tool for capturing similarities between objects without the need for their explicit categorization. In recent years, this method has been widely used for object annotation and formed a significant part of the publications associated with the Visipedia project. This thesis analyzes the possibilities of machine learning from multidimensionally indexed images based on their visual features and presents methods for predicting multidimensional coordinates for previously unseen images. It studies the relevant feature extraction algorithms, analyzes applicable machine learning methods, and describes the entire process of developing such a system. The resulting system is then tested on two different datasets, and the experiments present the first results for a task of this kind.
APA, Harvard, Vancouver, ISO, and other styles
5

Gao, Jizhou. "VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS." UKnowledge, 2013. http://uknowledge.uky.edu/cs_etds/14.

Full text
Abstract:
This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interests in the fields of computer vision and data mining. Structural Semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, locations and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations. New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models.
APA, Harvard, Vancouver, ISO, and other styles
6

Liu, Jingen. "Learning Semantic Features for Visual Recognition." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3358.

Full text
Abstract:
Visual recognition (e.g., object, scene and action recognition) is an active area of research in computer vision due to its increasing number of real-world applications such as video (image) indexing and search, intelligent surveillance, human-machine interaction, robot navigation, etc. Effective modeling of the objects, scenes and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which the image patches or video cuboids are quantized into visual words (i.e., mid-level features) based on their appearance similarity using clustering, has been widely and successfully explored. The advantages of this representation are: no explicit detection of objects or object parts and their tracking are required; the representation is somewhat tolerant to within-class deformations; and it is efficient for matching. However, the performance of the BoVW is sensitive to the size of the visual vocabulary. Therefore, computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation is partially due to the fact that the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, in this thesis we present a principled approach to learn a semantic vocabulary (i.e., high-level features) from a large amount of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we have developed an algorithm to discover a compact yet discriminative semantic vocabulary. This vocabulary is obtained by grouping the visual words, based on their distribution in videos (images), into visual-word clusters. The mutual information (MI) between the clusters and the videos (images) depicts the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary. We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters by finding a good tradeoff between compactness and discriminative power. We tested our proposed approach on the state-of-the-art KTH dataset, and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering, because only visual words are clustered regardless of which video they appear in. In order to leverage the co-occurrence of visual words and images, we have developed a co-clustering algorithm to simultaneously group the visual words and images. We tested our approach on the publicly available fifteen scene dataset and obtained about a 4% increase in average accuracy compared to the one-sided clustering approaches. Second, instead of grouping the mid-level features, we first embed the features into a low-dimensional semantic space by manifold learning, and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding is able to preserve the explicitly defined diffusion distance, which reflects the semantic similarity between any two features. Furthermore, the DM provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. The experiments on the KTH dataset show that DM can perform much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. In order to combine multiple heterogeneous features for visual recognition, we further propose the Fielder Embedding to capture the complicated semantic relationships between all entities (i.e., videos, images, heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. We tested our approach on the Weizmann dataset, and achieved about 17%–21% improvements in the average accuracy.
APA, Harvard, Vancouver, ISO, and other styles
7

Nguyen, Duc Minh Chau. "Affordance learning for visual-semantic perception." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2443.

Full text
Abstract:
Affordance Learning is linked to the study of interactions between robots and objects, including how robots perceive objects through scene understanding. This area has been popular in Psychology, which has recently come to influence Computer Vision. In this way, Computer Vision has borrowed the concept of affordance from Psychology in order to develop Visual-Semantic recognition systems, and to develop the capabilities of robots to interact with objects, in particular. However, existing systems of Affordance Learning are still limited to detecting and segmenting object affordances, which is called Affordance Segmentation. Further, these systems are not designed to develop specific abilities to reason about affordances. For example, a Visual-Semantic system, for captioning a scene, can extract information from an image, such as "a person holds a chocolate bar and eats it", but does not highlight the affordances: "hold" and "eat". Indeed, these affordances and others commonly appear within all aspects of life, since affordances usually connect to actions (from a linguistic view, affordances are generally known as verbs in sentences). Due to the above-mentioned limitations, this thesis aims to develop systems of Affordance Learning for Visual-Semantic Perception. These systems can be built using Deep Learning, which has been empirically shown to be efficient for performing Computer Vision tasks. There are two goals of the thesis: (1) study what are the key factors that contribute to the performance of Affordance Segmentation and (2) reason about affordances (Affordance Reasoning) based on parts of objects for Visual-Semantic Perception. In terms of the first goal, the thesis mainly investigates the feature extraction module as this is one of the earliest steps in learning to segment affordances. The thesis finds that the quality of feature extraction from images plays a vital role in improved performance of Affordance Segmentation. With regard to the second goal, the thesis infers affordances from object parts to reason about part-affordance relationships. Based on this approach, the thesis devises an Object Affordance Reasoning Network that can learn to construct relationships between affordances and object parts. As a result, reasoning about affordance becomes achievable in the generation of scene graphs of affordances and object parts. Empirical results, obtained from extensive experiments, show the potential of the system (that the thesis developed) towards Affordance Reasoning from Scene Graph Generation.
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Yifu. "Deep learning for visual semantic segmentation." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.

Full text
Abstract:
In this thesis, we are interested in visual semantic segmentation, one of the high-level tasks that paves the way toward complete scene understanding. More precisely, it requires semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are tackled using deep architectures. In the first part, we focus on constructing a loss function better suited to semantic segmentation. In particular, we define a new loss function based on a semantic edge detection neural network. This loss forces pixel-level predictions to be consistent with the ground-truth semantic edge information, and thus leads to better-delineated segmentation results. In the second part, we address another important question, namely learning segmentation models with little annotated data. To this end, we propose a new attribution method that identifies the regions of an image that classification networks consider most important. We then integrate our attribution method into a weakly supervised segmentation setting. Semantic segmentation models can thus be trained with only image-level labeled data, which is easy to collect in large quantities. All the models proposed in this thesis are thoroughly evaluated experimentally on several datasets, and the results are competitive with those in the literature.
In this thesis, we are interested in Visual Semantic Segmentation, one of the high-level tasks that paves the way towards complete scene understanding. Specifically, it requires a semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss imposes pixel-level predictions to be consistent with the ground truth semantic edge information, and thus leads to better-shaped segmentation results. In the second part, we address another important issue, namely, alleviating the need for training segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly evaluated experimentally on multiple datasets, and the results are competitive with the literature.
APA, Harvard, Vancouver, ISO, and other styles
9

Fan, Wei. "Image super-resolution using neighbor embedding over visual primitive manifolds /." View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?CSED%202007%20FAN.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Hanwell, David. "Weakly supervised learning of visual semantic attributes." Thesis, University of Bristol, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687063.

Full text
Abstract:
There are at present many billions of images on the internet, only a fraction of which are labelled according to their semantic content. To automatically provide labels for the rest, models of visual semantic concepts must be created. Such models are traditionally trained using images which have been manually acquired, segmented, and labelled. In this thesis, we submit that such models can be learned automatically using those few images which have already been labelled, either directly by their creators, or indirectly by their associated text. Such imagery can be acquired easily, cheaply, and in large quantities, using web image searches. Though there has been some work towards learning from such weakly labelled data, all methods yet proposed require more than a minimum of human effort. In this thesis we put forth a number of methods for reliably learning models of visual semantic attributes using only the raw, unadulterated results of web image searches. The proposed methods do not require any human input beyond specifying the names of the attributes to be learned. We also present means of identifying and localising learned attributes in challenging, real-world images. Our methods are of a probabilistic nature, and make extensive use of multivariate Gaussian mixture models to represent both data and learned models. The contributions of this thesis also include several tools for acquiring and comparing these distributions, including a novel clustering algorithm. We apply our weakly supervised learning methods to the training of models of a variety of visual semantic attributes including colour and pattern terms. Detection and localisation of the learned attributes in unseen, real-world images is demonstrated, and both quantitative and qualitative results are presented. We compare against other work, including both general methods of weakly supervised learning, and more attribute-specific methods. We apply our learning methods to the training sets of previous works, and assess their performance on the test sets used by other authors. Our results show that our methods give better results than the current state of the art.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Visual and semantic embedding"

1

Endert, Alex. Semantic Interaction for Visual Analytics. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-031-02603-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Paquette, Gilbert. Visual knowledge modeling for semantic web technologies: Models and ontologies. Hershey, PA: Information Science Reference, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hussam, Ali. Semantic highlighting: An approach to communicating information and knowledge through visual metadata. [s.l: The Author], 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Valkola, Jarmo. Perceiving the visual in cinema: Semantic approaches to film form and meaning. Jyväskylä: Jyväskylän Yliopisto, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Chaomei. Effects of spatial-semantic interfaces in visual information retrieval: Three experimental studies. [Great Britain]: Resource, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

K, kokula Krishna Hari, ed. Multi-secret Semantic Visual Cryptographic Protocol for Securing Image Communications: ICCS 2014. Bangkok, Thailand: Association of Scientists, Developers and Faculties, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bratko, Aleksandr. Artificial intelligence, legal system and state functions. ru: INFRA-M Academic Publishing LLC., 2020. http://dx.doi.org/10.12737/1064996.

Full text
Abstract:
The monograph deals with methodological problems of embedding artificial intelligence in the legal system while taking into account the laws of society. It describes the properties of the rule of law as a microsystem within the subsystems of law, and methods for fixing it in the system of law and in the logic of legal norms. The author proposes and substantiates the idea of creating, specifically for artificial intelligence, a separate, distinct, and unambiguous normative system, parallel to the principal branches of law and built on the logic of the four-part structure of legal norms. The book briefly discusses elements of the theory of law as a methodological instrument for modelling the legal system and its semantic codes so that an artificial intelligence can function properly, as well as ways of applying artificial intelligence to the functioning of the state. Intended for students, teachers, and all those interested in issues of artificial intelligence from the point of view of law.
APA, Harvard, Vancouver, ISO, and other styles
8

Video segmentation and its applications. New York: Springer, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Stoenescu, Livia. The Pictorial Art of El Greco. NL Amsterdam: Amsterdam University Press, 2019. http://dx.doi.org/10.5117/9789462989009.

Full text
Abstract:
The Pictorial Art of El Greco: Transmaterialities, Temporalities, and Media investigates El Greco’s pictorial art as foundational to the globalising trends manifested in the visual culture of early modernity. It also exposes the figurative, semantic, and allegorical senses that El Greco created to challenge an Italian Renaissance-centered discourse. Even though he was guided by the unprecedented burgeoning of devotional art in the post-Tridentine decades and by the expressive possibilities of earlier religious artifacts, especially those inherited from the apostolic past, the author demonstrates that El Greco forged his own independent trajectory. While his paintings have been studied in relation to the Italian and Spanish school traditions, his pictorial art in a global Mediterranean context continues to receive scant attention. Taking a global perspective as its focus, the book sheds new light on El Greco’s highly original contribution to early Mediterranean and multi-institutional configurations of the Christian faith in Byzantium, Venice, Rome, Toledo, and Madrid.
APA, Harvard, Vancouver, ISO, and other styles
10

Zhang, Yu-jin. Semantic-Based Visual Information Retrieval. IRM Press, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Visual and semantic embedding"

1

Wang, Haoran, Ying Zhang, Zhong Ji, Yanwei Pang, and Lin Ma. "Consensus-Aware Visual-Semantic Embedding for Image-Text Matching." In Computer Vision – ECCV 2020, 18–34. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58586-0_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Yang, Zhanbo, Li Li, Jun He, Zixi Wei, Li Liu, and Jun Liao. "Multimodal Learning with Triplet Ranking Loss for Visual Semantic Embedding Learning." In Knowledge Science, Engineering and Management, 763–73. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-29551-6_67.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Jiang, Zhukai, and Zhichao Lian. "Self-supervised Visual-Semantic Embedding Network Based on Local Label Optimization." In Machine Learning for Cyber Security, 400–412. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-20102-8_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Filntisis, Panagiotis Paraskevas, Niki Efthymiou, Gerasimos Potamianos, and Petros Maragos. "Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss." In Computer Vision – ECCV 2020 Workshops, 747–55. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-66415-2_52.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Valério, Rodrigo, and João Magalhães. "Learning Semantic-Visual Embeddings with a Priority Queue." In Pattern Recognition and Image Analysis, 67–81. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-36616-1_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Syed, Arsal, and Brendan Tran Morris. "CNN, Segmentation or Semantic Embeddings: Evaluating Scene Context for Trajectory Prediction." In Advances in Visual Computing, 706–17. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-64559-5_56.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Schall, Konstantin, Nico Hezel, Klaus Jung, and Kai Uwe Barthel. "Vibro: Video Browsing with Semantic and Visual Image Embeddings." In MultiMedia Modeling, 665–70. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-27077-2_56.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Yanbei, and Loris Bazzani. "Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval." In Computer Vision – ECCV 2020, 136–52. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58542-6_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Theodoridou, Christina, Andreas Kargakos, Ioannis Kostavelis, Dimitrios Giakoumis, and Dimitrios Tzovaras. "Spatially-Constrained Semantic Segmentation with Topological Maps and Visual Embeddings." In Lecture Notes in Computer Science, 117–29. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87156-7_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Thoma, Steffen, Achim Rettinger, and Fabian Both. "Towards Holistic Concept Representations: Embedding Relational Knowledge, Visual Attributes, and Distributional Word Semantics." In Lecture Notes in Computer Science, 694–710. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68288-4_41.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Visual and semantic embedding"

1

Li, Zheng, Caili Guo, Zerun Feng, Jenq-Neng Hwang, and Xijun Xue. "Multi-View Visual Semantic Embedding." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/158.

Full text
Abstract:
Visual Semantic Embedding (VSE) is a dominant method for cross-modal vision-language retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in a position close to the corresponding text description. However, there are large intra-class variations in the vision-language data. For example, multiple texts describing the same image may be described from different views, and the descriptions of different views are often dissimilar. The mainstream VSE method embeds samples from the same class in similar positions, which will suppress intra-class variations and lead to inferior generalization performance. This paper proposes a Multi-View Visual Semantic Embedding (MV-VSE) framework, which learns multiple embeddings for one visual data and explicitly models intra-class variations. To optimize MV-VSE, a multi-view upper bound loss is proposed, and the multi-view embeddings are jointly optimized while retaining intra-class variations. MV-VSE is plug-and-play and can be applied to various VSE models and loss functions without excessively increasing model complexity. Experimental results on the Flickr30K and MS-COCO datasets demonstrate the superior performance of our framework.
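To illustrate the multi-view idea, the sketch below gives each image several embeddings through independent projection heads and scores a caption against the best-matching view. The max-over-views fusion and the head design are simplifications assumed for illustration; they are not MV-VSE's exact formulation or its multi-view upper bound loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewImageEncoder(nn.Module):
    """Produce several embeddings per image via independent projection heads."""

    def __init__(self, feat_dim=2048, embed_dim=1024, num_views=3):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, embed_dim) for _ in range(num_views)]
        )

    def forward(self, img_feats):
        # img_feats: (batch, feat_dim) -> (batch, num_views, embed_dim)
        views = torch.stack([h(img_feats) for h in self.heads], dim=1)
        return F.normalize(views, dim=-1)

def multi_view_similarity(img_views, txt_emb):
    # img_views: (batch, num_views, d); txt_emb: (batch, d), L2-normalized
    sims = torch.einsum('bvd,bd->bv', img_views, txt_emb)
    return sims.max(dim=1).values  # score against the best-matching view

# usage: multi_view_similarity(MultiViewImageEncoder()(torch.randn(4, 2048)),
#                              F.normalize(torch.randn(4, 1024), dim=-1))
```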
APA, Harvard, Vancouver, ISO, and other styles
2

Ren, Zhou, Hailin Jin, Zhe Lin, Chen Fang, and Alan Yuille. "Multiple Instance Visual-Semantic Embedding." In British Machine Vision Conference 2017. British Machine Vision Association, 2017. http://dx.doi.org/10.5244/c.31.89.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wehrmann, Jônatas, and Rodrigo C. Barros. "Language-Agnostic Visual-Semantic Embeddings." In Concurso de Teses e Dissertações da SBC. Sociedade Brasileira de Computação, 2021. http://dx.doi.org/10.5753/ctd.2021.15751.

Full text
Abstract:
We propose a framework for training language-invariant cross-modal retrieval models. We introduce four novel text encoding approaches, as well as a character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Contributions include new methods for building character-level-based word-embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. Moreover, we introduce a module called ADAPT, which is responsible for providing query-aware visual representations that generate large improvements in terms of recall for four widely-used large-scale image-text datasets. We show that our models outperform the current state of the art in all scenarios. This thesis can serve as a new path in retrieval research, now allowing for the effective use of captions in multiple-language scenarios.
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Binglin, and Yang Wang. "Visual Relationship Detection Using Joint Visual-Semantic Embedding." In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018. http://dx.doi.org/10.1109/icpr.2018.8546097.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Ji, Rongrong, Hongxun Yao, Xiaoshuai Sun, Bineng Zhong, and Wen Gao. "Towards semantic embedding in visual vocabulary." In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2010. http://dx.doi.org/10.1109/cvpr.2010.5540118.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hong, Ziming, Shiming Chen, Guo-Sen Xie, Wenhan Yang, Jian Zhao, Yuanjie Shao, Qinmu Peng, and Xinge You. "Semantic Compression Embedding for Generative Zero-Shot Learning." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/134.

Full text
Abstract:
Generative methods have been successfully applied in zero-shot learning (ZSL) by learning an implicit mapping to alleviate the visual-semantic domain gaps and synthesizing unseen samples to handle the data imbalance between seen and unseen classes. However, existing generative methods simply use visual features extracted by the pre-trained CNN backbone. These visual features lack attribute-level semantic information. Consequently, seen classes are indistinguishable, and the knowledge transfer from seen to unseen classes is limited. To tackle this issue, we propose a novel Semantic Compression Embedding Guided Generation (SC-EGG) model, which cascades a semantic compression embedding network (SCEN) and an embedding guided generative network (EGGN). The SCEN extracts a group of attribute-level local features for each sample and further compresses them into the new low-dimension visual feature. Thus, a dense-semantic visual space is obtained. The EGGN learns a mapping from the class-level semantic space to the dense-semantic visual space, thus improving the discriminability of the synthesized dense-semantic unseen visual features. Extensive experiments on three benchmark datasets, i.e., CUB, SUN and AWA2, demonstrate the significant performance gains of SC-EGG over current state-of-the-art methods and its baselines.
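Generative zero-shot learning of the kind described above typically synthesizes visual features for unseen classes from their semantic vectors plus noise, then trains an ordinary classifier on real seen and synthetic unseen features. The generator below is a generic stand-in assumed for illustration; it is not the SC-EGG architecture, and its dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ConditionalFeatureGenerator(nn.Module):
    """Generic generator used in generative ZSL: synthesize visual features
    for a class from its semantic vector plus noise (a simplified stand-in
    for the EGGN described above, not its actual architecture)."""

    def __init__(self, semantic_dim=312, noise_dim=64, visual_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(semantic_dim + noise_dim, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, visual_dim), nn.ReLU(),
        )

    def forward(self, semantics, noise):
        return self.net(torch.cat([semantics, noise], dim=-1))

# Synthesize 100 fake features for an unseen class, then train any
# classifier on the union of real seen and synthetic unseen features.
# gen = ConditionalFeatureGenerator()
# fake = gen(torch.randn(100, 312), torch.randn(100, 64))
```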
APA, Harvard, Vancouver, ISO, and other styles
7

Perez-Martin, Jesus, Jorge Perez, and Benjamin Bustos. "Visual-Syntactic Embedding for Video Captioning." In LatinX in AI at Computer Vision and Pattern Recognition Conference 2021. Journal of LatinX in AI Research, 2021. http://dx.doi.org/10.52591/lxai202106259.

Full text
Abstract:
Video captioning is the task of predicting a semantic and syntactically correct sequence of words given some context video. The most successful methods for video captioning have a strong dependency on the effectiveness of semantic representations learned from visual models, but often produce syntactically incorrect sentences which harms their performance on standard datasets. We address this limitation by considering syntactic representation learning as an essential component of video captioning. We construct a visual-syntactic embedding by mapping into a common vector space a visual representation, that depends only on the video, with a syntactic representation that depends only on Part-of-Speech (POS) tagging structures of the video description. We integrate this joint representation into an encoder-decoder architecture that we call Visual-Semantic-Syntactic Aligned Network (SemSynAN), which guides the decoder (text generation stage) by aligning temporal compositions of visual, semantic, and syntactic representations. We tested our proposed architecture obtaining state-of-the-art results on two widely used video captioning datasets. This is a short version of a paper recently published at a Computer Vision Conference. The complete reference has been redacted to fulfill the double-blind restriction.
APA, Harvard, Vancouver, ISO, and other styles
8

Zeng, Zhixian, Jianjun Cao, Nianfeng Weng, Guoquan Jiang, Yizhuo Rao, and Yuxin Xu. "Softmax Pooling for Super Visual Semantic Embedding." In 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2021. http://dx.doi.org/10.1109/iemcon53756.2021.9623131.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhang, Licheng, Xianzhi Wang, Lina Yao, Lin Wu, and Feng Zheng. "Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/126.

Full text
Abstract:
Zero-shot object detection (ZSD) has received considerable attention from the computer vision community in recent years. It aims to simultaneously locate and categorize previously unseen objects during inference. One crucial problem in ZSD is how to accurately predict the label of each object proposal, i.e., how to categorize object proposals for unseen categories. Previous ZSD models generally relied on learning an embedding from visual space to semantic space, or on learning a joint embedding between semantic descriptions and visual representations. The features in the learned semantic space or the joint projected space tend to suffer from the hubness problem, namely that feature vectors are likely to be embedded in an area of incorrect labels, which leads to lower detection precision. In this paper, we instead propose to learn a deep embedding from the semantic space to the visual space, which alleviates the hubness problem because the distribution in visual space has smaller variance than that in the semantic or joint embedding space. After learning the deep embedding model, we perform k-nearest-neighbor search in the visual space of unseen categories to determine the category of each semantic description. Extensive experiments on two public datasets show that our approach significantly outperforms existing methods.
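A minimal sketch of the semantic-to-visual direction, with nearest-neighbor label assignment in visual space, might look as follows; the class counts, feature dimensions, and regression objective are illustrative assumptions, not the paper's model.

```python
# Sketch: regress class semantics onto seen-class visual prototypes, then label
# proposal features by their nearest projected unseen-class anchor.
import numpy as np
import torch
import torch.nn as nn

sem_dim, vis_dim, n_seen = 300, 2048, 40   # assumed: semantic vectors, CNN features

# Deep embedding from semantic space to visual space.
f = nn.Sequential(nn.Linear(sem_dim, 1024), nn.ReLU(), nn.Linear(1024, vis_dim))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

sem_seen = torch.randn(n_seen, sem_dim)    # class semantic vectors (seen classes)
vis_proto = torch.randn(n_seen, vis_dim)   # mean visual feature per seen class

for _ in range(100):                       # fit the semantic -> visual mapping
    opt.zero_grad()
    loss = nn.functional.mse_loss(f(sem_seen), vis_proto)
    loss.backward()
    opt.step()

# Inference: project unseen-class semantics into visual space, then assign each
# proposal's visual feature to the nearest projected class anchor.
sem_unseen = torch.randn(10, sem_dim)
unseen_anchors = f(sem_unseen).detach().numpy()
proposal_feat = np.random.randn(5, vis_dim).astype(np.float32)
dists = ((proposal_feat[:, None, :] - unseen_anchors[None, :, :]) ** 2).sum(-1)
pred_labels = dists.argmin(axis=1)         # nearest-neighbor class per proposal
print(pred_labels)
```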
APA, Harvard, Vancouver, ISO, and other styles
10

Song, Yale, and Mohammad Soleymani. "Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.00208.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Visual and semantic embedding"

1

Kud, A. A. Figures and Tables. Reprinted from “Comprehensive classification of virtual assets”, A. A. Kud, 2021, International Journal of Education and Science, 4(1), 52–75. KRPOCH, 2021. http://dx.doi.org/10.26697/reprint.ijes.2021.1.6.a.kud.

Full text
Abstract:
Figure. Distributed Ledger Token Accounting System. Figure. Subjects of Social Relations Based on the Decentralized Information Platform. Figure. Derivativeness of a Digital Asset. Figure. Semantic Features of the Concept of a “Digital Asset” in Economic and Legal Aspects. Figure. Derivativeness of Polyassets and Monoassets. Figure. Types of Tokenized Assets Derived from Property. Figure. Visual Representation of the Methods of Financial and Management Accounting of Property Using Various Types of Tokenized Assets. Figure. Visual Representation of the Classification of Virtual Assets Based on the Complexity of Their Nature. Table. Comparison of Properties of Various Types of Virtual Assets of the Distributed Ledger Derivative of the Original Asset. Table. Main Properties and Parameters of Types of Tokenized Assets. Table. Classification of Virtual Assets as Tools for Implementing the Methods of Financial and Management Accounting of Property.
APA, Harvard, Vancouver, ISO, and other styles
2

Tabinskyy, Yaroslav. VISUAL CONCEPTS OF PHOTO IN THE MEDIA (ON THE EXAMPLE OF «UKRAINER» AND «REPORTERS»). Ivan Franko National University of Lviv, March 2021. http://dx.doi.org/10.30970/vjo.2021.50.11099.

Full text
Abstract:
The article is devoted to the analysis of the main forms of visualization in the media related to photography. Thematic visual concepts are described in accordance with the content of electronic media, reflecting the impact of modern technologies on the development of the media space. Research by Ukrainian and foreign educational institutions concerning the main features of modern photography is classified, and modifications and new visual forms in the media are singled out. The main objective of the article is to study the visual concepts of modern photography and to identify ideological and thematic priorities in photo projects. To achieve this objective, a specific methodology was used: the historical-theoretical description made it possible to substantiate the study of visual concepts, while the conceptual-system method was used to study the subject matter of media photo projects. The main results of the research are the definition of visual concepts of photography, using electronic media as examples, and the identification of the main thematic features in the process of visually filling the media space. Based on the study, we can conclude that today the information field needs quality visual content. To successfully create visual concepts, it is necessary to single out the thematic features of modern photography and to classify them by ideological and semantic features. Given the rapid development of digital technologies, the topic of this article is relevant for scientists, journalists, media researchers, visual journalism experts and photojournalists. The modern media space is filled with a large number of pictorial materials, which in most cases form specific images, patterns or stereotypes in the mind of the reader (viewer). Also important is the classification of photographs used in journalistic publications. That is why there is a need to explore the content and the principles by which the ideological priorities of photography are distributed in the media. Scholars' arguments about the important place of photography in the modern media space, and the future development of visual technologies that already use artificial intelligence, underscore this relevance.
APA, Harvard, Vancouver, ISO, and other styles
3

Mbani, Benson, Timm Schoening, and Jens Greinert. Automated and Integrated Seafloor Classification Workflow (AI-SCW). GEOMAR, May 2023. http://dx.doi.org/10.3289/sw_2_2023.

Full text
Abstract:
The Automated and Integrated Seafloor Classification Workflow (AI-SCW) is a semi-automated underwater image processing pipeline that has been customized for classifying the seafloor into semantic habitat categories. The current implementation has been tested against a sequence of underwater images collected by the Ocean Floor Observation System (OFOS) in the Clarion-Clipperton Zone of the Pacific Ocean. Nevertheless, the workflow could also be applied to images acquired by other platforms such as an Autonomous Underwater Vehicle (AUV) or a Remotely Operated Vehicle (ROV). The modules in AI-SCW have been implemented in the Python programming language, specifically using libraries such as scikit-image for image processing, scikit-learn for machine learning and dimensionality reduction, keras for computer vision with deep learning, and matplotlib for generating visualizations. This modular implementation allows users to accomplish a variety of underwater computer vision tasks, which include: detecting laser points in the underwater images for use in scale determination; performing contrast enhancement and color normalization to improve the visual quality of the images; semi-automated generation of annotations to be used downstream during supervised classification; training a convolutional neural network (Inception v3) on the generated annotations to semantically classify each image into one of several pre-defined seafloor habitat categories; evaluating sampling strategies for generating balanced training images to be used for fitting an unsupervised k-means classifier; and visualizing classification results both in a feature-space view and in a map view with geospatial coordinates. Thus, the workflow is useful for a quick but objective generation of image-based seafloor habitat maps to support monitoring of remote benthic ecosystems.
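Two of the steps mentioned above (contrast enhancement and unsupervised k-means clustering) could be sketched in Python roughly as below; the function names and parameters are assumptions for illustration and are not AI-SCW's actual API.

```python
# Illustrative sketch of two AI-SCW-style steps using the libraries named in
# the abstract: CLAHE contrast enhancement and k-means habitat clustering.
import numpy as np
from skimage import exposure
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans


def enhance(img):
    """Contrast enhancement via CLAHE, as a stand-in for the workflow's
    contrast-enhancement / color-normalization step."""
    return exposure.equalize_adapthist(img, clip_limit=0.02)


def cluster_tiles(images, n_habitats=4):
    """Flatten image tiles, reduce dimensionality, and cluster them into
    a small number of habitat classes."""
    feats = np.stack([im.reshape(-1) for im in images])
    feats = PCA(n_components=min(16, len(images) - 1)).fit_transform(feats)
    return KMeans(n_clusters=n_habitats, n_init=10, random_state=0).fit_predict(feats)


# Usage with synthetic grayscale tiles (replace with OFOS image tiles in practice).
tiles = [enhance(np.random.rand(64, 64)) for _ in range(20)]
labels = cluster_tiles(tiles, n_habitats=4)
print(labels)
```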
APA, Harvard, Vancouver, ISO, and other styles
4

Yatsymirska, Mariya. SOCIAL EXPRESSION IN MULTIMEDIA TEXTS. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11072.

Full text
Abstract:
The article investigates functional techniques of extralinguistic expression in multimedia texts; it shows the effectiveness of figurative expressions as a reaction to current events in Ukraine and their influence on the formation of public opinion. Publications by journalists and broadcasts by media commentators, experts, public figures, politicians and readers are analyzed. The language of the media plays a key role in shaping, first of all, the worldview of the young political elite. The essence of each statement is a focused thought that reacts to events in the world or in one's own country. The most popular platform for mass information and social interaction is, above all, network journalism, which is characterized by mobility and by being unlimited in time and space. Authors have complete freedom to express their views in direct language, including their own word formation. Phonetic, lexical, phraseological and stylistic means of speech create the expression of a text. A figurative word, a good aphorism or proverb, a paraphrased expression, etc. enhance the effectiveness of a multimedia text. This is especially important for headlines, which simultaneously inform and influence the views of millions of readers. Given the wide range of issues raised by the Internet as a medium, research in this area is interdisciplinary. The science of information, combining language and social communication, is at the forefront of global interactions. The Internet is an effective source of knowledge and a forum for free thought. Nonlinear texts (hypertexts), that is, "branching texts or texts that perform actions on request", and multimedia texts change the principles of information collection, storage and dissemination, involving billions of readers in the discussion of global issues. Mastering the word is not an easy task if the author of a publication is not well-read, is not deep in the topic, and does not know the psychology of the audience for which he writes. Therefore, the study of media broadcasting is an important component of the professional training of future journalists. The functions of the language of the media require authors to make correct statements and convincing arguments in the text. Journalism education involves not only knowledge of imperative and dispositive norms, but also apodictic ones. In practice, this means that there are rules in media creativity that are based on logical necessity. Apodicticity is the first sign of impressive language on the platform of print or electronic media. Social expression is the combination of creative abilities and linguistic competencies that a journalist realizes in his or her activity. Creative self-expression is realized through a set of many important factors in the media: the choice of topic, convincing arguments, logical presentation of ideas and deep philological education. Linguistic art, in contrast to painting, music and sculpture, accumulates all visual, auditory, tactile and empathic sensations in a universal sign: the word. The choice of a word for the reproduction of sensory and semantic meanings, and its competent use in the appropriate context, distinguishes the journalist-intellectual from other participants in forums, round tables, and analytical or entertainment programs. Expressive speech in the media is a product of the intellect (the ability to think) of all those who write on socio-political or economic topics. On the same plane lies intelligence (awareness, prudence), the first sign of which (according to Ivan Ogienko) is a good knowledge of the language.
Intellectual language is an important means of organizing a journalistic text. On the one hand, it logically conveys the author's thoughts; on the other, it encourages the reader to reflect on and comprehend what is read. The richness of language is accumulated through continuous self-education and interesting communication. Studies of social expression as an important factor influencing the formation of public consciousness should open up new facets of rational and emotional media broadcasting and trace physical and psychological reactions to communicative mimicry in the media. Speech mimicry, as one of the methods of disguise, is increasingly becoming a dangerous factor in manipulating the media. Mimicry is an unprincipled adaptation to the surrounding social conditions; one of the most famous examples of an animal characterized by mimicry (a change of protective color and shape) is the chameleon. In a figurative sense, adaptive journalists are called chameleons. Observations show that mimicry in politics is, to some extent, a kind of game that, like every game, is always conditional and artificial.
APA, Harvard, Vancouver, ISO, and other styles
