To view the other types of publications on this topic, follow the link: Hierarchical representations of images.

Journal articles on the topic 'Hierarchical representations of images'


Consult the top 50 journal articles for your research on the topic 'Hierarchical representations of images'.

Next to every work in the bibliography there is an 'Add to bibliography' option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scientific publication in PDF format and read an online annotation of the work, if the relevant parameters are provided in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Abdelhack, Mohamed, and Yukiyasu Kamitani. "Sharpening of Hierarchical Visual Feature Representations of Blurred Images." eNeuro 5, no. 3 (May 2018): ENEURO.0443-17.2018. http://dx.doi.org/10.1523/eneuro.0443-17.2018.

2

Gao, Hongchao, Yujia Li, Jiao Dai, Xi Wang, Jizhong Han, and Ruixuan Li. "Multi-granularity Deep Local Representations for Irregular Scene Text Recognition." ACM/IMS Transactions on Data Science 2, no. 2 (April 2, 2021): 1–18. http://dx.doi.org/10.1145/3446971.

Annotation:
Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem and most networks capture the sequence only from a single-granularity visual representation, which to some extent limits the performance of recognition. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, and each block contains a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. The extensive experiments show that our proposed network achieves the state-of-the-art performance on several benchmark datasets including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR datasets under shorter training time.
3

Ramos Lima, Gustavo, Thiago Oliveira Santos, Patrick Marques Ciarelli, and Filipe Mutz. "Comparação de Técnicas para Representação Vetorial de Imagens com Redes Neurais para Aplicações de Recuperação de Produtos do Varejo." Anais do Computer on the Beach 14 (May 3, 2023): 355–62. http://dx.doi.org/10.14210/cotb.v14.p355-362.

Annotation:
Product retrieval from images has multiple applications ranging from providing information and recommendations for customers in supermarkets to automatic invoice generation in smart stores. However, this task presents important challenges such as the large number of products, the scarcity of images of items, differences between real and iconic images of the products, and the constant changes in the portfolio due to the addition or removal of products. Hence, this work investigates ways of generating vector representations of images using deep neural networks such that these representations can be used for product retrieval even in the face of these challenges. Experimental analysis evaluated the effect that network architecture, data augmentation techniques and objective functions used during training have on representation quality. The best configuration was achieved by fine-tuning a VGG-16 model in the task of classifying products using a mix of RandAugment and AugMix data augmentations and a hierarchical triplet loss as a regularization function. The representations built using this model led to a top-1 accuracy of 80.38% and top-5 accuracy of 92.62% in the Grocery Products dataset.
4

Ferreira, João Elias Vidueira, and Gwendolyn Angela Lawrie. "Profiling the combinations of multiple representations used in large-class teaching: pathways to inclusive practices." Chemistry Education Research and Practice 20, no. 4 (2019): 902–23. http://dx.doi.org/10.1039/c9rp00001a.

Annotation:
Teachers select multiple representations and adopt multiple visualization approaches in supporting their students to make meaning of chemical phenomena. Representational competence underpins students' construction of their mental models of concepts, so it is important that teachers consider this while developing instructional resources. In tertiary chemistry, teachers typically use PowerPoint slides to guide lectures. This instructional resource is transferred between different teachers each semester and, while the sequence of topics is likely to be discussed and agreed upon, the content of the slides can evolve organically in this shared resource over time. The aim of this study was to analyse a teacher-generated resource in the form of a consensus set of course slides to characterise the combination and diversity in representations that students had encountered. This study was set in a unique context since the semester's lecture slides represented a distillation of consensus representations used by multiple chemistry lecturers for at least a decade. The representations included: those created by the lecturers; textbook images (from several texts); photographs and images sourced from the internet. Individual representations in each PowerPoint slide were coded in terms of the level of representation, mode and potential function in supporting deeper understanding of chemistry concepts. Three representational organizing frameworks (functional taxonomy of multiple representations, modes of representation and the chemistry triplet levels of thinking) were integrated to categorise the representations. This qualitative data was subjected to hierarchical cluster analysis and several relationships between the categories and topics taught were identified. Additional qualitative data in the form of student reflections on the perceived utility of specific representations were collected at the end of the semester. The findings from this study inform the design and choice of instructional resources for general chemistry particularly in combining representations to support deeper learning of concepts. A broader goal and application of the findings of this study is to identify opportunities for translation of representations into alternative modalities to widen access and participation in learning chemistry for all students. An example of a strategy for translating representations into tactile modes for teaching the topic of phase change is shared.
5

Liu, Hao, Bin Wang, Zhimin Bao, Mobai Xue, Sheng Kang, Deqiang Jiang, Yinsong Liu, and Bo Ren. "Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1702–10. http://dx.doi.org/10.1609/aaai.v36i2.20062.

Annotation:
We introduce Perceiving Stroke-Semantic Context (PerSec), a new approach to self-supervised representation learning tailored to the Scene Text Recognition (STR) task. Considering that scene text images carry both visual and semantic properties, we equip our PerSec with dual context perceivers which can contrast and learn latent representations from low-level stroke and high-level semantic contextual spaces simultaneously via hierarchical contrastive learning on unlabeled text image data. Experiments in un- and semi-supervised learning settings on STR benchmarks demonstrate that our proposed framework can yield a more robust representation for both CTC-based and attention-based decoders than other contrastive learning methods. To fully investigate the potential of our method, we also collect a dataset of 100 million unlabeled text images, named UTI-100M, covering 5 scenes and 4 languages. By leveraging hundred-million-level unlabeled data, our PerSec shows significant performance improvement when fine-tuning the learned representation on the labeled data. Furthermore, we observe that the representation learned by PerSec generalizes well, especially in scenarios with little labeled data.
6

Gazagnes, Simon, and Michael H. F. Wilkinson. "Distributed Component Forests in 2-D: Hierarchical Image Representations Suitable for Tera-Scale Images." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 11 (October 2019): 1940012. http://dx.doi.org/10.1142/s0218001419400123.

Annotation:
The standard representations known as component trees, used in morphological connected attribute filtering and multi-scale analysis, are unsuitable for cases in which either the image itself or the tree does not fit in the memory of a single compute node. Recently, a new structure has been developed which consists of a collection of modified component trees, one for each image tile. It has to date only been applied to fairly simple image filtering based on area. In this paper, we explore other applications of these distributed component forests, in particular to multi-scale analysis such as pattern spectra, and to morphological attribute profiles and multi-scale leveling segmentations.
7

Pajarola, Renato, Miguel Sainz, and Yu Meng. "DMesh: Fast Depth-Image Meshing and Warping." International Journal of Image and Graphics 4, no. 4 (October 2004): 653–81. http://dx.doi.org/10.1142/s0219467804001580.

Annotation:
In this paper we present a novel and efficient depth-image representation and warping technique called DMesh which is based on a piece-wise linear approximation of the depth-image as a textured and simplified triangle mesh. We describe the application of a hierarchical multiresolution triangulation method to generate adaptively triangulated depth-meshes efficiently from reference depth-images, discuss depth-mesh segmentation methods to avoid occlusion artifacts and propose a new hardware accelerated depth-image rendering technique that supports per-pixel weighted blending of multiple depth-images in real-time. Applications of our technique include image-based object representations and the use of depth-images in large scale walk-through visualization systems.
8

Bai, Jie, Huiyan Jiang, Siqi Li, and Xiaoqi Ma. "NHL Pathological Image Classification Based on Hierarchical Local Information and GoogLeNet-Based Representations." BioMed Research International 2019 (March 21, 2019): 1–13. http://dx.doi.org/10.1155/2019/1065652.

Annotation:
Background. Accurate classification for different non-Hodgkin lymphomas (NHL) is one of the main challenges in clinical pathological diagnosis due to its intrinsic complexity. Therefore, this paper proposes an effective classification model for three types of NHL pathological images, including mantle cell lymphoma (MCL), follicular lymphoma (FL), and chronic lymphocytic leukemia (CLL). Methods. There are three main parts with respect to our model. First, NHL pathological images stained by hematoxylin and eosin (H&E) are transferred into blue ratio (BR) and Lab spaces, respectively. Then specific patch-level textural and statistical features are extracted from BR images and color features are obtained from Lab images, both in a hierarchical way, yielding a set of hand-crafted representations corresponding to different image spaces. A random forest classifier is subsequently trained for patch-level classification. Second, H&E images are cropped and fed into a pretrained Google Inception net (GoogLeNet) for learning high-level representations, and a softmax classifier is used for patch-level classification. Finally, three image-level classification strategies based on patch-level results are discussed, including a novel method for calculating the weighted sum of patch results. Different classification results are fused at both the feature and image levels to obtain a more satisfactory result. Results. The proposed model is evaluated on the public IICBU Malignant Lymphoma Dataset and achieves an improved overall accuracy of 0.991 and an area under the receiver operating characteristic curve of 0.998. Conclusion. The experiments demonstrate the significantly increased classification performance of the proposed model, indicating that it is a suitable classification approach for NHL pathological images.
9

Pham, Hai X., Ricardo Guerrero, Vladimir Pavlovic, and Jiatong Li. "CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2423–30. http://dx.doi.org/10.1609/aaai.v35i3.16343.

Annotation:
Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks. This model allows one to discover complex functional and hierarchical relationships between images and text, and among textual parts of a recipe including title, ingredients and cooking instructions. Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are not only able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision, but we can also learn more meaningful feature representations of food recipes, appropriate for challenging cross-modal retrieval and recipe adaption tasks.
10

Qiu, Zexuan, Jiahong Liu, Yankai Chen, and Irwin King. "HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4614–22. http://dx.doi.org/10.1609/aaai.v38i5.28261.

Annotation:
Existing unsupervised deep product quantization methods primarily aim for the increased similarity between different views of the identical image, whereas the delicate multi-level semantic similarities preserved between images are overlooked. Moreover, these methods predominantly focus on the Euclidean space for computational convenience, compromising their ability to map the multi-level semantic relationships between images effectively. To mitigate these shortcomings, we propose a novel unsupervised product quantization method dubbed Hierarchical Hyperbolic Product Quantization (HiHPQ), which learns quantized representations by incorporating hierarchical semantic similarity within hyperbolic geometry. Specifically, we propose a hyperbolic product quantizer, where the hyperbolic codebook attention mechanism and the quantized contrastive learning on the hyperbolic product manifold are introduced to expedite quantization. Furthermore, we propose a hierarchical semantics learning module, designed to enhance the distinction between similar and non-matching images for a query by utilizing the extracted hierarchical semantics as an additional training supervision. Experiments on benchmark image datasets show that our proposed method outperforms state-of-the-art baselines.
11

Ye, Jian, Jiangqun Ni, and Yang Yi. "Deep Learning Hierarchical Representations for Image Steganalysis." IEEE Transactions on Information Forensics and Security 12, no. 11 (November 2017): 2545–57. http://dx.doi.org/10.1109/tifs.2017.2710946.

12

Chen, Wei-Bang, and Chengcui Zhang. "An Innovative Multiple-Object Image Retrieval Framework Using Hierarchical Region Tree." International Journal of Multimedia Data Engineering and Management 4, no. 3 (July 2013): 1–23. http://dx.doi.org/10.4018/jmdem.2013070101.

Annotation:
Inaccurate image segmentation often has a negative impact on object-based image retrieval. Researchers have attempted to alleviate this problem by using hierarchical image representation. However, these attempts suffer from the inefficiency in building the hierarchical image representation and the high computational complexity in matching two hierarchically represented images. This paper presents an innovative multiple-object retrieval framework named Multiple-Object Image Retrieval (MOIR) on the basis of hierarchical image representation. This framework concurrently performs image segmentation and hierarchical tree construction, producing a hierarchical region tree to represent the image. In addition, an efficient hierarchical region tree matching algorithm is designed for multiple-object retrieval with a reasonably low time complexity. The experimental results demonstrate the efficacy and efficiency of the proposed approach.
13

Mylona, Eleftheria, Vassiliki Daskalopoulou, Olga Sykioti, Konstantinos Koutroumbas, and Athanasios Rontogiannis. "Classification of Sentinel-2 Images Utilizing Abundance Representation." Proceedings 2, no. 7 (March 22, 2018): 328. http://dx.doi.org/10.3390/ecrs-2-05141.

Annotation:
This paper deals with (both supervised and unsupervised) classification of multispectral Sentinel-2 images, utilizing the abundance representation of the pixels of interest. The latter pixel representation uncovers the hidden structured regions that are not often available in the reference maps. Additionally, it encourages class distinctions and bolsters accuracy. The adopted methodology, which has been successfully applied to hyperspectral data, involves two main stages: (I) the determination of the pixel's abundance representation; and (II) the employment of a classification algorithm applied to the abundance representations. More specifically, stage (I) incorporates two key processes, namely (a) endmember extraction, utilizing spectrally homogeneous regions of interest (ROIs); and (b) spectral unmixing, which hinges upon the endmember selection. The adopted spectral unmixing process assumes the linear mixing model (LMM), where each pixel is expressed as a linear combination of the endmembers. The pixel's abundance vector is estimated via a variational Bayes algorithm that is based on a suitably defined hierarchical Bayesian model. The resulting abundance vectors are then fed to stage (II), where two off-the-shelf supervised classification approaches (namely nearest neighbor (NN) classification and support vector machines (SVM)), as well as an unsupervised classification process (namely the online adaptive possibilistic c-means (OAPCM) clustering algorithm), are adopted. Experiments are performed on a Sentinel-2 image acquired for a specific region of the Northern Pindos National Park in north-western Greece containing water, vegetation and bare soil areas. The experimental results demonstrate that the ad-hoc classification approaches utilizing abundance representations of the pixels outperform those utilizing the spectral signatures of the pixels in terms of accuracy.
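
Background note: the linear mixing model (LMM) referenced in this abstract has a standard textbook form, stated here as general background rather than as a detail taken from the paper. Each observed pixel spectrum y is modeled as a non-negative, sum-to-one combination of the endmember spectra (the columns of E) plus noise:

    \mathbf{y} = \mathbf{E}\mathbf{a} + \mathbf{n}, \qquad \mathbf{a} \ge \mathbf{0}, \qquad \mathbf{1}^{\top}\mathbf{a} = 1

The abundance vector a is what stage (I) estimates per pixel (in this paper, via a variational Bayes algorithm) and what stage (II) feeds to the classifiers.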
14

Zhang, Yinghua, and Wei Hou. "Vision Transformer with hierarchical structure and windows shifting for person re-identification." PLOS ONE 18, no. 6 (June 30, 2023): e0287979. http://dx.doi.org/10.1371/journal.pone.0287979.

Annotation:
Extracting rich feature representations is a key challenge in person re-identification (Re-ID) tasks. However, traditional Convolutional Neural Network (CNN) based methods can ignore part of the information when processing local regions of person images, which leads to incomplete feature extraction. To this end, this paper proposes a person Re-ID method based on a vision Transformer with hierarchical structure and window shifting. When extracting person image features, the hierarchical Transformer model is constructed by introducing the hierarchical construction method commonly used in CNNs. Then, considering the importance of local information of person images for complete feature extraction, the self-attention calculation is performed by shifting within the window region. Finally, experiments on three standard datasets demonstrate the effectiveness and superiority of the proposed method.
15

Liu, Yuting, Hongyu Yang, and Qijun Zhao. "Hierarchical Feature Aggregation from Body Parts for Misalignment Robust Person Re-Identification." Applied Sciences 9, no. 11 (May 31, 2019): 2255. http://dx.doi.org/10.3390/app9112255.

Annotation:
In this work, we focus on the misalignment problem in person re-identification. Human body parts commonly contain discriminative local representations relevant with identity recognition. However, the representations are easily affected by misalignment that is due to varying poses or poorly detected bounding boxes. We thus present a two-branch Deep Joint Learning (DJL) network, where the local branch generates misalignment robust representations by pooling the features around the body parts, while the global branch generates representations from a holistic view. A Hierarchical Feature Aggregation mechanism is proposed to aggregate different levels of visual patterns within body part regions. Instead of aggregating each pooled body part features from multi-layers with equal weight, we assign each with the learned optimal weight. This strategy also mitigates the scale differences among multi-layers. By optimizing the global and local features jointly, the DJL network further enhances the discriminative capability of the learned hybrid feature. Experimental results on Market-1501 and CUHK03 datasets show that our method could effectively handle the misalignment induced intra-class variations and yield competitive accuracy particularly on poorly aligned pedestrian images.
16

Wu, Hanbo, Xin Ma, and Yibin Li. "Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks." International Journal of Advanced Robotic Systems 16, no. 1 (January 1, 2019): 172988141882509. http://dx.doi.org/10.1177/1729881418825093.

Annotation:
Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate the spatial and temporal information of actions simultaneously at different temporal scales. We firstly project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we construct a hierarchical dynamic depth projected difference images–based action recognition framework where hierarchical dynamic depth projected difference images in three views are fed into three identical pretrained convolutional neural networks independently for fine-tuning. We design three classification schemes in the framework, and different schemes utilize different convolutional neural network layers to compare their effects on action recognition. Three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.
17

Chen, Yuhao, Alexander Wong, Yuan Fang, Yifan Wu, and Linlin Xu. "Deep Residual Transform for Multi-scale Image Decomposition." Journal of Computational Vision and Imaging Systems 6, no. 1 (January 15, 2021): 1–5. http://dx.doi.org/10.15353/jcvis.v6i1.3537.

Annotation:
Multi-scale image decomposition (MID) is a fundamental task in computer vision and image processing that involves the transformation of an image into a hierarchical representation comprising different levels of visual granularity from coarse structures to fine details. A well-engineered MID disentangles the image signal into meaningful components which can be used in a variety of applications such as image denoising, image compression, and object classification. Traditional MID approaches such as wavelet transforms tackle the problem through carefully designed basis functions under rigid decomposition structure assumptions. However, as the information distribution varies from one type of image content to another, rigid decomposition assumptions lead to inefficient representations, i.e., some scales can contain little to no information. To address this issue, we present the Deep Residual Transform (DRT), a data-driven MID strategy where the input signal is transformed into a hierarchy of non-linear representations at different scales, with each representation being independently learned as the representational residual of previous scales at a user-controlled detail level. As such, the proposed DRT progressively disentangles scale information from the original signal by sequentially learning residual representations. The decomposition flexibility of this approach allows for highly tailored representations that cater to specific types of image content, and results in greater representational efficiency and compactness. In this study, we realize the proposed transform by leveraging a hierarchy of sequentially trained autoencoders. To explore the efficacy of the proposed DRT, we leverage two datasets comprising very different types of image content: 1) CelebFaces and 2) Cityscapes. Experimental results show that the proposed DRT achieved highly efficient information decomposition on both datasets despite their very different visual granularity characteristics.
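
Background note: the sequential residual learning described in this abstract can be sketched compactly. The following is a minimal illustration assuming PyTorch; the tiny architecture, scale count, and training schedule are placeholder choices for exposition, not the authors' configuration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_autoencoder(channels: int, hidden: int = 16) -> nn.Module:
        # Deliberately small stand-in; the paper does not specify this architecture.
        return nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def train_residual_hierarchy(x: torch.Tensor, num_scales: int = 3, steps: int = 200):
        """Fit one autoencoder per scale, each trained only on the residual
        left unexplained by the earlier scales."""
        residual = x.clone()
        models = []
        for _ in range(num_scales):
            ae = make_autoencoder(x.shape[1])
            opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
            for _ in range(steps):
                opt.zero_grad()
                loss = F.mse_loss(ae(residual), residual)
                loss.backward()
                opt.step()
            with torch.no_grad():
                residual = residual - ae(residual)  # pass the unexplained detail down
            models.append(ae)
        return models

Each learned model reconstructs one scale, and the sum of the per-scale reconstructions (plus the final residual) recovers the input, which is the sense in which the hierarchy is a decomposition.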
18

Xiao, Degui, Qilei Chen, and Shanshan Li. "A Multi-Scale Cascaded Hierarchical Model for Image Labeling." International Journal of Pattern Recognition and Artificial Intelligence 30, no. 9 (November 2016): 1660005. http://dx.doi.org/10.1142/s0218001416600053.

Annotation:
Image labeling is an important and challenging task in the area of graphics and visual computing, where datasets with high quality labeling are critically needed. In this paper, based on the commonly accepted observation that the same semantic object in images with different resolutions may have different representations, we propose a novel multi-scale cascaded hierarchical model (MCHM) to enhance general image labeling methods. Our proposed approach first creates multi-resolution images from the original one to form an image pyramid and labels each image at its own scale individually. Next, it constructs a cascaded hierarchical model and a feedback circle between the image pyramid and the labeling methods. The original image labeling result is used to adjust the labeling parameters of the scaled images. Labeling results from the scaled images are then fed back to enhance the original image labeling results. These naturally form a global optimization problem under a scale-space condition. We further propose a desirable iterative algorithm in order to run the model. The global convergence of the algorithm is proven through iterative approximation with latent optimization constraints. We have conducted extensive experiments with five widely used labeling methods on five popular image datasets. Experimental results indicate that MCHM improves the labeling accuracy of state-of-the-art image labeling approaches impressively.
19

Liu, Xueping, Yibo Li, and Qingjun Wang. "Multi-View Hierarchical Bidirectional Recurrent Neural Network for Depth Video Sequence Based Action Recognition." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 10 (June 20, 2018): 1850033. http://dx.doi.org/10.1142/s0218001418500337.

Annotation:
Human action recognition based on depth video sequences is an important research direction in the field of computer vision. The present study proposes a hierarchical multi-view classification framework to resolve depth video sequence-based action recognition. Herein, considering the distinguishing features of 3D human action space, we project the 3D human action image onto three coordinate planes, so that the 3D depth image is converted to three 2D images, and then feed them to three subnets, respectively. With the increase of the number of layers, the representations of the subnets are hierarchically fused to be the inputs of the next layers. The final representations of the depth video sequence are fed into a single-layer perceptron, and the final result is decided by accumulating the output of the perceptron over time. We compare with other methods on two publicly available datasets, and we also verify the proposed method on a human action database acquired with our Kinect system. Our experimental results demonstrate that our model has high computational efficiency and achieves the performance of state-of-the-art methods.
20

Fu, Qian, Linlin Liu, Fei Hou, and Ying He. "Hierarchical vectorization for facial images." Computational Visual Media 10, no. 1 (February 2023): 97–118. http://dx.doi.org/10.1007/s41095-022-0314-4.

Annotation:
The explosive growth of social media means portrait editing and retouching are in high demand. While portraits are commonly captured and stored as raster images, editing raster images is non-trivial and requires the user to be highly skilled. Aiming at developing intuitive and easy-to-use portrait editing tools, we propose a novel vectorization method that can automatically convert raster images into a 3-tier hierarchical representation. The base layer consists of a set of sparse diffusion curves (DCs) which characterize salient geometric features and low-frequency colors, providing a means for semantic color transfer and facial expression editing. The middle level encodes specular highlights and shadows as large, editable Poisson regions (PRs) and allows the user to directly adjust illumination by tuning the strength and changing the shapes of PRs. The top level contains two types of pixel-sized PRs for high-frequency residuals and fine details such as pimples and pigmentation. We train a deep generative model that can produce high-frequency residuals automatically. Thanks to the inherent meaning in vector primitives, editing portraits becomes easy and intuitive. In particular, our method supports color transfer, facial expression editing, highlight and shadow editing, and automatic retouching. To quantitatively evaluate the results, we extend the commonly used FLIP metric (which measures color and feature differences between two images) to consider illumination. The new metric, illumination-sensitive FLIP, can effectively capture salient changes in color transfer results, and is more consistent with human perception than FLIP and other quality measures for portrait images. We evaluate our method on the FFHQR dataset and show it to be effective for common portrait editing tasks, such as retouching, light editing, color transfer, and expression editing.
21

Lee, Chang Woo, Hyun Kang, Hang Joon Kim, and Keechul Jung. "Font Classification Using NMF with Hierarchical Clustering." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 6 (September 2005): 755–73. http://dx.doi.org/10.1142/s0218001405004307.

Annotation:
The current paper proposes a font classification method for document images that uses non-negative matrix factorization (NMF), which is able to learn part-based representations of objects. The basic idea of the proposed method is based on the fact that the characteristics of each font are derived from parts of individual characters in each font rather than from holistic textures. Spatial localities, the parts composing font images, are automatically extracted using NMF and then used as features representing each font. Using a hierarchical clustering algorithm, these feature sets are generalized for font classification, resulting in the construction of prototype templates. In both the prototype construction and font classification, earth mover's distance (EMD) is used as the distance metric, which is more suitable for the NMF feature space than Cosine or Euclidean distance. In the experimental results, the distribution of features and the appropriateness of the features specifying each font are investigated, and the results are compared with a related algorithm: principal component analysis (PCA). The proposed method is expected to improve the performance of optical character recognition (OCR), document indexing and retrieval systems, when such systems adopt a font classifier as a preprocessor.
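
Background note: as an illustration of the part-based factorization this abstract builds on, the sketch below runs NMF on stand-in data with scikit-learn. The data, component count, and image size are hypothetical, and the paper's earth mover's distance metric and hierarchical clustering step are not reproduced here:

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.random((200, 32 * 32))   # one flattened, non-negative font image per row

    model = NMF(n_components=40, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(X)       # per-image activations of the learned parts
    H = model.components_            # each row is a spatially local part (reshape to 32x32 to view)

Because both factors are constrained to be non-negative, each image row of X is approximated as an additive combination of the parts in H, which is what makes the learned features part-based rather than holistic.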
22

Jiang, Yuming, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, and Ziwei Liu. "Text2Human." ACM Transactions on Graphics 41, no. 4 (July 2022): 1–11. http://dx.doi.org/10.1145/3528223.3530104.

Annotation:
Generating high-quality and diverse human images is an important yet challenging task in vision and graphics. However, existing generative models often fall short under the high diversity of clothing shapes and textures. Furthermore, the generation process is even desired to be intuitively controllable for layman users. In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose with two dedicated steps. 1) With some texts describing the shapes of clothes, the given human pose is first translated to a human parsing map. 2) The final human image is then generated by providing the system with more attributes about the textures of clothes. Specifically, to model the diversity of clothing textures, we build a hierarchical texture-aware codebook that stores multi-scale neural representations for each type of texture. The codebook at the coarse level includes the structural representations of textures, while the codebook at the fine level focuses on the details of textures. To make use of the learned hierarchical codebook to synthesize desired images, a diffusion-based transformer sampler with mixture of experts is first employed to sample indices from the coarsest level of the codebook, which are then used to predict the indices of the codebook at finer levels. The predicted indices at different levels are translated to human images by the decoder learned alongside the hierarchical codebooks. The use of mixture-of-experts allows the generated image to be conditioned on the fine-grained text input. The prediction of finer-level indices refines the quality of clothing textures. Extensive quantitative and qualitative evaluations demonstrate that our proposed Text2Human framework can generate more diverse and realistic human images compared to state-of-the-art methods. Our project page is https://yumingj.github.io/projects/Text2Human.html. Code and pretrained models are available at https://github.com/yumingj/Text2Human.
23

Zhou, Xiaomao, Tao Bai, Yanbin Gao, and Yuntao Han. "Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning." Sensors 19, no. 7 (April 1, 2019): 1576. http://dx.doi.org/10.3390/s19071576.

Annotation:
Extensive studies have shown that many animals’ capability of forming spatial representations for self-localization, path planning, and navigation relies on the functionalities of place and head-direction (HD) cells in the hippocampus. Although there are numerous hippocampal modeling approaches, only a few span the wide functionalities ranging from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that involves generating place and HD cells through learning from visual images, building topological maps based on learned cell representations and performing navigation using hierarchical reinforcement learning. First, place and HD cells are trained from sequences of visual stimuli in an unsupervised learning fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed to learn different cell types in an intentional way by restricting their learning to separate phases of the spatial exploration. Then, to extract the encoded metric information from these unsupervised learning representations, a self-organized learning algorithm is adopted to learn over the emerged cell activities and to generate topological maps that reveal the topology of the environment and information about a robot’s head direction, respectively. This enables the robot to perform self-localization and orientation detection based on the generated maps. Finally, goal-directed navigation is performed using reinforcement learning in continuous state spaces which are represented by the population activities of place cells. In particular, considering that the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy to accelerate learning. The HRL works on different spatial scales, where a high-level policy learns to select subgoals and a low-level policy learns over primitive actions to specialize on the selected subgoals. Experimental results demonstrate that our system is able to navigate a robot to the desired position effectively, and the HRL shows a much better learning performance than the standard RL in solving our navigation tasks.
24

Zhou, Yuxiao, Menglei Chai, Alessandro Pepe, Markus Gross, and Thabo Beeler. "GroomGen: A High-Quality Generative Hair Model Using Hierarchical Latent Representations." ACM Transactions on Graphics 42, no. 6 (December 5, 2023): 1–16. http://dx.doi.org/10.1145/3618309.

Annotation:
Despite recent successes in hair acquisition that fit a high-dimensional hair model to a specific input subject, generative hair models, which establish general embedding spaces for encoding, editing, and sampling diverse hairstyles, are far less explored. In this paper, we present GroomGen, the first generative model designed for hair geometry composed of highly-detailed dense strands. Our approach is motivated by two key ideas. First, we construct hair latent spaces covering both individual strands and hairstyles. The latent spaces are compact, expressive, and well-constrained for high-quality and diverse sampling. Second, we adopt a hierarchical hair representation that parameterizes a complete hair model at three levels: single strands, sparse guide hairs, and complete dense hairs. This representation is critical to the compactness of the latent spaces, the robustness of training, and the efficiency of inference. Based on this hierarchical latent representation, our proposed pipeline consists of a strand-VAE and a hairstyle-VAE that encode an individual strand and a set of guide hairs to their respective latent spaces, and a hybrid densification step that populates sparse guide hairs to a dense hair model. GroomGen not only enables novel hairstyle sampling and plausible hairstyle interpolation, but also supports interactive editing of complex hairstyles, and can serve as a strong data-driven prior for hairstyle reconstruction from images. We demonstrate the superiority of our approach with qualitative examples of diverse sampled hairstyles and quantitative evaluation of generation quality regarding every single component and the entire pipeline.
25

Son, Dong-Min, and Sung-Hak Lee. "Enhancing Surveillance Vision with Multi-Layer Deep Learning Representation." Mathematics 12, no. 9 (April 25, 2024): 1313. http://dx.doi.org/10.3390/math12091313.

Annotation:
This paper aimed to develop a method for generating sand–dust removal and dehazed images utilizing CycleGAN, facilitating object identification on roads under adverse weather conditions such as heavy dust or haze, which severely impair visibility. Initially, the study addressed the scarcity of paired image sets for training by employing unpaired CycleGAN training. The CycleGAN training module incorporates hierarchical single-scale Retinex (SSR) images with varying sigma sizes, facilitating multiple-scaled trainings. Refining the training data into detailed hierarchical layers for virtual paired training enhances the performance of CycleGAN training. Conventional sand–dust removal or dehazing algorithms, alongside deep learning methods, encounter challenges in simultaneously addressing sand–dust removal and dehazing with a singular algorithm. Such algorithms often necessitate resetting hyperparameters to process images from both scenarios. To overcome this limitation, we proposed a unified approach for removing sand–dust and haze phenomena using a single model, leveraging images processed hierarchically with SSR. The image quality and image sharpness metrics of the proposed method were BRISQUE, PIQE, CEIQ, MCMA, LPC-SI, and S3. In sand–dust environments, the proposed method achieved the highest scores, with an average of 21.52 in BRISQUE, 0.724 in MCMA, and 0.968 in LPC-SI compared to conventional methods. For haze images, the proposed method outperformed conventional methods with an average of 3.458 in CEIQ, 0.967 in LPC-SI, and 0.243 in S3. The images generated via this proposed method demonstrated superior performance in image quality and sharpness evaluation compared to conventional algorithms. The outcomes of this study hold particular relevance for camera images utilized in automobiles, especially in the context of self-driving cars or CCTV surveillance systems.
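
Background note: the single-scale Retinex (SSR) layers mentioned in this abstract follow a standard log-ratio form. A minimal sketch assuming OpenCV is given below; the sigma values are illustrative placeholders, not the authors' settings:

    import cv2
    import numpy as np

    def single_scale_retinex(image: np.ndarray, sigma: float) -> np.ndarray:
        """Standard SSR: log of the image minus log of its Gaussian-blurred surround."""
        img = image.astype(np.float64) + 1.0              # offset avoids log(0)
        surround = cv2.GaussianBlur(img, (0, 0), sigma)   # kernel size derived from sigma
        return np.log(img) - np.log(surround)

    # Hierarchical layers with varying sigma, as the abstract describes:
    # layers = [single_scale_retinex(frame, s) for s in (15.0, 80.0, 250.0)]

A small sigma preserves fine detail while a large sigma captures global illumination, which is why stacking several sigmas yields hierarchical layers suitable for multiple-scale training.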
26

Behnke, Sven. "Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid." International Journal of Computational Intelligence and Applications 1, no. 4 (December 2001): 427–38. http://dx.doi.org/10.1142/s1469026801000342.

Annotation:
Successful image reconstruction requires the recognition of a scene and the generation of a clean image of that scene. We propose to use recurrent neural networks for both analysis and synthesis. The networks have a hierarchical architecture that represents images in multiple scales with different degrees of abstraction. The mapping between these representations is mediated by a local connection structure. We supply the networks with degraded images and train them to reconstruct the originals iteratively. This iterative reconstruction makes it possible to use partial results as context information to resolve ambiguities. We demonstrate the power of the approach using three examples: superresolution, fill-in of occluded parts, and noise removal/contrast enhancement. We also reconstruct images from sequences of degraded images.
27

Mao, Xiaoyang, Tosiyasu Kunii, Issei Fujishiro, and Tsukasa Noma. "Hierarchical Representations of 2D/3D Gray-Scale Images and Their 2D/3D Two-Way Conversion." IEEE Computer Graphics and Applications 7, no. 12 (December 1987): 37–44. http://dx.doi.org/10.1109/mcg.1987.276937.

28

Park, Hisup, Mark R. Cutkosky, Andrew B. Conru, and Soo-Hong Lee. "An agent-based approach to concurrent cable harness design." Artificial Intelligence for Engineering Design, Analysis and Manufacturing 8, no. 1 (1994): 45–61. http://dx.doi.org/10.1017/s0890060400000457.

Annotation:
An approach to providing computational support for concurrent design is discussed in the context of an industrial cable harness design problem. Key issues include the development of an architecture that supports collaboration among specialists, the development of hierarchical representations that capture different characteristics of the design, and the decomposition of tasks to achieve a trade-off between efficiency and robustness. An architecture is presented in which the main design tasks are supported by agents – asynchronous and semiautonomous modules that automate routine design tasks and provide specialized interfaces for working on particular aspects of the design. The agent communication and coordination mechanisms permit members of an engineering team to work concurrently, at different levels of detail and on different versions of the design. The design is represented hierarchically, with detailed models maintained by the participating agents. Abstractions of the detailed models, called "agent model images," are shared with other agents. In conjunction with the architecture and design representations, issues pertaining to the exchange of information among different views of the design, management of dependencies and constraints, and propagation of design changes are discussed.
29

Li, Rui, Zhenyu Liu, and Jianrong Tan. "Reassessing Hierarchical Representation for Action Recognition in Still Images." IEEE Access 6 (2018): 61386–400. http://dx.doi.org/10.1109/access.2018.2872798.

30

Kaur, Gagandeep, Vedant Pinjarkar, Rutuja Rajendra, Latika Pinjarkar, Jaspal Bagga, and Poorva Agrawal. "Deep Learning Model for Retrieving Color Logo Images in Content Based Image Retrieval." Journal of Electrical Systems 20, no. 2s (March 31, 2024): 1325–33. http://dx.doi.org/10.52783/jes.1773.

Annotation:
Content-Based Image Retrieval (CBIR) has gained a great deal of attention because of the explosive growth of digital image data. The advancement of deep learning has enabled Convolutional Neural Networks to become an influential technique for the extraction of discriminative image features. In recent years, convolutional neural networks (CNNs) have proven extremely effective at extracting unique information from images. In contrast to text-based image retrieval, CBIR gathers comparable images based primarily on their visual content. The use of deep learning, especially CNNs, for feature extraction and image processing has been shown to perform better than other techniques. In the proposed study, we investigate CNNs for CBIR, focusing on how well they extract discriminative visual features and facilitate accurate image retrieval. In addition, Principal Component Analysis and Linear Discriminant Analysis are combined for feature optimization, boosting the retrieval results. Using hierarchical representations learned by CNNs, we aim to improve retrieval accuracy and efficiency. In comparison with conventional retrieval techniques, our proposed CBIR system shows superior performance on a benchmark dataset.
31

Santos, Jéssica de Castro, Cristina Arreguy-Sena, Paulo Ferreira Pinto, Elenir de Paiva Pereira, Marcelo da Silva Alves, and Fabiano Bolpato Loures. "Social representation of elderly people on falls: structural analysis and in the light of Neuman." Revista Brasileira de Enfermagem 71, suppl. 2 (2018): 851–59. http://dx.doi.org/10.1590/0034-7167-2017-0258.

Annotation:
ABSTRACT Objective: To understand the symbolic elements and the hierarchical system of representations of elderly people on falls, according to Abric’s structural analysis and Neuman’s theory. Method: Abric structural approach developed at the home of primary care users in a city of Minas Gerais. A free evocation technique of images triggered by images was performed in 2016 with elderly individuals (≥65 years old). Data treated by dictionary of equivalent terms; processed in Evoc 2000 software converging, analytically, according to Neuman. Ethical/legal criteria were met. Results: 195 people participated, 78.5% were women, and 45.1% were aged ≥75 years. Summarized 897 words; 155 different ones. Central nucleus containing cognates: dizziness-vertigo-labyrinthitis and slipper-shoes (behavioral and objective dimension). The word disease integrated the area of contrast. Environmental and personal stressors were identified according to Neuman. Final considerations: Objects and risk behaviors for falls integrated the representations, although environmental and personal stressors indicate the need for preventive interventions in the environment and in the intrapersonal dimension.
32

Wang, Bo, Jichang Guo, Yan Zhang, and Chongyi Li. "Hierarchical feature concatenation-based kernel sparse representations for image categorization." Visual Computer 33, no. 5 (March 2, 2016): 647–63. http://dx.doi.org/10.1007/s00371-016-1215-2.

33

Perret, B., S. Lefevre, C. Collet, and Éric Slezak. "Hyperconnections and Hierarchical Representations for Grayscale and Multiband Image Processing." IEEE Transactions on Image Processing 21, no. 1 (January 2012): 14–27. http://dx.doi.org/10.1109/tip.2011.2161322.

34

Mahmood, Ammar, Ana Giraldo Ospina, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid, Renae Hovey, Robert B. Fisher, and Gary A. Kendrick. "Automatic Hierarchical Classification of Kelps Using Deep Residual Features." Sensors 20, no. 2 (January 13, 2020): 447. http://dx.doi.org/10.3390/s20020447.

Annotation:
Across the globe, remote image data is rapidly being collected for the assessment of benthic communities, from shallow waters to extremely deep waters on continental slopes and in the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation and machine learning algorithms to accelerate both classification and assessment of marine benthic biota. One major issue lies with organisms that move with swell and currents, such as kelps. This paper presents an automatic hierarchical classification method (local binary classification, as opposed to the conventional flat classification) to classify kelps in images collected by autonomous underwater vehicles. The proposed kelp classification approach exploits learned feature representations extracted from deep residual networks. We show that these generic features outperform the traditional off-the-shelf CNN features and the conventional hand-crafted features. Experiments also demonstrate that the hierarchical classification method outperforms the traditional parallel multi-class classifications by a significant margin (90.0% vs. 57.6% and 77.2% vs. 59.0%) on the Benthoz15 and Rottnest datasets, respectively. Furthermore, we compare different hierarchical classification approaches and experimentally show that the sibling hierarchical training approach outperforms the inclusive hierarchical approach by a significant margin. We also report an application of our proposed method to study the change in kelp cover over time for annually repeated AUV surveys.
35

Yu, Huai, Tianheng Yan, Wen Yang, and Hong Zheng. "An Integrative Object-Based Image Analysis Workflow for UAV Images." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B1 (June 7, 2016): 1085–91. http://dx.doi.org/10.5194/isprsarchives-xli-b1-1085-2016.

Annotation:
In this work, we propose an integrative framework to process UAV images. The overall process can be viewed as a pipeline consisting of geometric and radiometric corrections, subsequent panoramic mosaicking and hierarchical image segmentation for later Object Based Image Analysis (OBIA). More precisely, we first introduce an efficient image stitching algorithm after the geometric calibration and radiometric correction, which employs fast feature extraction and matching by combining the local difference binary descriptor and locality-sensitive hashing. We then use a Binary Partition Tree (BPT) representation for the large mosaicked panoramic image, which starts from the definition of an initial partition obtained by an over-segmentation algorithm, i.e., simple linear iterative clustering (SLIC). Finally, we build an object-based hierarchical structure by fully considering the spectral and spatial information of the super-pixels and their topological relationships. Moreover, an optimal segmentation is obtained by filtering the complex hierarchies into simpler ones according to criteria such as uniform homogeneity and semantic consistency. Experimental results on processing the post-seismic UAV images of the 2013 Ya'an earthquake demonstrate the effectiveness and efficiency of our proposed method.
37

Zhang, Jiahang, Lilang Lin and Jiaying Liu. „Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations“. Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (26.06.2023): 3427–35. http://dx.doi.org/10.1609/aaai.v37i3.25451.

Abstract:
Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods utilize carefully designed augmentations to generate different movement patterns of skeletons for the same semantics. However, applying strong augmentations, which distort the skeletons' structure and cause semantic loss, remains an open issue because of the unstable training they induce. In this paper, we investigate the potential of strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy that generates multiple ordered positive pairs, guiding the model toward consistency of the representations learned from different views. Then, an asymmetric loss enforces hierarchical consistency via a directional clustering operation in the feature space, pulling representations from strongly augmented views closer to those from weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD. Our project is publicly available at: https://jhang2020.github.io/Projects/HiCLR/HiCLR.html.
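The directional part of the asymmetric loss, as described, pulls strongly augmented views toward stop-gradient targets from weakly augmented views. A minimal sketch of that idea (not the exact HiCLR loss) might look like this:

```python
# Sketch: asymmetric consistency between weak and strong augmented views.
# The strong view chases a stop-gradient target from the weak view, never
# the reverse -- the "directional" part of the hierarchical consistency idea.
import torch
import torch.nn.functional as F

def directional_loss(z_weak: torch.Tensor, z_strong: torch.Tensor) -> torch.Tensor:
    target = F.normalize(z_weak.detach(), dim=-1)      # stop-gradient on weak view
    pred = F.normalize(z_strong, dim=-1)
    return 2 - 2 * (pred * target).sum(dim=-1).mean()  # == MSE of unit vectors

# Usage with an ordered chain of growing augmentations (weak -> stronger):
# each stronger view is pulled toward its weaker predecessor.
views = [torch.randn(8, 128, requires_grad=True) for _ in range(3)]
loss = sum(directional_loss(views[i], views[i + 1]) for i in range(2))
loss.backward()
```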
38

Liu, Xinlong, Chu He, Dehui Xiong and Mingsheng Liao. „Pattern Statistics Network for Classification of High-Resolution SAR Images“. Remote Sensing 11, no. 16 (20.08.2019): 1942. http://dx.doi.org/10.3390/rs11161942.

Abstract:
The classification of synthetic aperture radar (SAR) images is of great importance for rapid scene understanding. Recently, convolutional neural networks (CNNs) have been applied to the classification of single-polarized SAR images, but the task remains difficult due to the random and complex spatial patterns in SAR images, especially with finite training data. In this paper, a pattern statistics network (PSNet) is proposed to address this problem. PSNet borrows ideas from statistics and probability theory and explicitly embeds the random nature of SAR images in representation learning. In the PSNet, both fluctuation and pattern representations are extracted for SAR images. More specifically, the fluctuation representation does not consider the rigorous relationships between local pixels and only describes their average fluctuation. By contrast, the pattern representation is devoted to hierarchically capturing the interactions between local pixels, namely, the spatial patterns of SAR images. The proposed PSNet is evaluated on three real SAR datasets, including spaceborne and airborne data. The experimental results indicate that the fluctuation representation is useful and that PSNet achieves superior performance in comparison with related CNN-based and texture-based methods.
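On one reading of the fluctuation representation, which only describes the average fluctuation of local pixels, a minimal stand-in is a sliding-window variance computed with average pooling. The sketch below is our illustration, not the PSNet code:

```python
# Sketch: a local-fluctuation map for a single-channel SAR image, computed as
# sliding-window variance E[x^2] - (E[x])^2 -- order-free local statistics.
import torch
import torch.nn.functional as F

def fluctuation_map(x: torch.Tensor, k: int = 5) -> torch.Tensor:
    """x: (N,1,H,W) image tensor; returns (N,1,H,W) local variance."""
    mean = F.avg_pool2d(x, k, stride=1, padding=k // 2)
    mean_sq = F.avg_pool2d(x * x, k, stride=1, padding=k // 2)
    return (mean_sq - mean * mean).clamp_min(0)   # clamp for numerical safety

sar = torch.rand(1, 1, 256, 256)                  # stand-in SAR amplitude image
print(fluctuation_map(sar).shape)                 # torch.Size([1, 1, 256, 256])
```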
39

Swan, Elaine. „Iconographies of the everyday: Mediated whiteness and food hospitality activism“. European Journal of Cultural Studies 24, no. 6 (25.11.2021): 1319–39. http://dx.doi.org/10.1177/13675494211055737.

Abstract:
The category of the ‘everyday’ has been relatively un-theorised in studies of digital food culture. Drawing on theories of the cultural production of whiteness, and of the everyday as not just a backdrop but a site through which race, class and gender are constituted, I analyse digital photographs from the Welcome Dinner Project’s webpages and social media. The Welcome Dinner Project is an Australian food hospitality activism charity, which organises and facilitates one-off dinners to bring ‘newly arrived’ and ‘established Australians’ together over potluck hospitality to address isolation and racism. My overall argument is that Welcome Dinner Project representations, and media representations of the project, are underscored by conflicting representations of race, diversity and privilege. Despite the good intentions of the Welcome Dinner Project, the formal images it disseminates work to service the status quo by enacting and reinforcing dominant notions of middle-class whiteness in Australia, moderating the transgressive potential of its activism. However, these processes are subverted by less formal and unruly images depicting people outside, in mess, in non-hierarchical groups and in migrant hosting. Such imagery can be understood as a form of visual activism which challenges the iconographies of whiteness in digital food culture and normative ideals of race-neutral domesticity and everydayness.
40

Kunii, Tosiyasu L., Issei Fujishiro and Xiaoyang Mao. „G-quadtree: A hierarchical representation of gray-scale digital images“. Visual Computer 2, no. 4 (August 1986): 219–26. http://dx.doi.org/10.1007/bf01900345.

41

Hovhannisyan, Mariam, Alex Clarke, Benjamin R. Geib, Rosalie Cicchinelli, Zachary Monge, Tory Worth, Amanda Szymanski, Roberto Cabeza and Simon W. Davis. „The visual and semantic features that predict object memory: Concept property norms for 1,000 object images“. Memory & Cognition 49, no. 4 (19.01.2021): 712–31. http://dx.doi.org/10.3758/s13421-020-01130-5.

Abstract:
Humans have a remarkable fidelity for visual long-term memory, and yet the composition of these memories is a longstanding debate in cognitive psychology. While much of the work on long-term memory has focused on processes associated with successful encoding and retrieval, more recent work on visual object recognition has developed a focus on the memorability of specific visual stimuli. Such work is engendering a view of object representation as a hierarchical movement from low-level visual representations to higher level categorical organization of conceptual representations. However, studies on object recognition often fail to account for how these high- and low-level features interact to promote distinct forms of memory. Here, we use both visual and semantic factors to investigate their relative contributions to two different forms of memory of everyday objects. We first collected normative visual and semantic feature information on 1,000 object images. We then conducted a memory study where we presented these same images during encoding (picture target) on Day 1, and then either a Lexical (lexical cue) or Visual (picture cue) memory test on Day 2. Our findings indicate that: (1) higher level visual factors (via DNNs) and semantic factors (via feature-based statistics) make independent contributions to object memory, (2) semantic information contributes to both true and false memory performance, and (3) factors that predict object memory depend on the type of memory being tested. These findings help to provide a more complete picture of what factors influence object memorability. These data are available online upon publication as a public resource.
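The "independent contributions" claim is typically established with nested regression comparisons. The sketch below shows the general recipe on hypothetical stand-in arrays (the names visual, semantic, and memorability are assumptions, not the study's data):

```python
# Sketch: testing whether DNN visual features and semantic feature norms make
# independent contributions to per-item memory (synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_items = 1000
visual = rng.normal(size=(n_items, 50))       # e.g. PCA of DNN activations
semantic = rng.normal(size=(n_items, 20))     # e.g. feature-norm statistics
memorability = visual[:, 0] + semantic[:, 0] + rng.normal(size=n_items)

def r2(X):
    """Cross-validated R^2 of a linear model predicting memorability."""
    return cross_val_score(LinearRegression(), X, memorability, cv=5).mean()

both = np.hstack([visual, semantic])
print("visual only   :", r2(visual))
print("semantic only :", r2(semantic))
print("combined      :", r2(both))
# If the combined model beats each single-modality model, the two feature
# families carry non-redundant information about object memory.
```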
42

Cheng, Xi, Xiang Li and Jian Yang. „Triple-Attention Mixed-Link Network for Single-Image Super-Resolution“. Applied Sciences 9, no. 15 (25.07.2019): 2992. http://dx.doi.org/10.3390/app9152992.

Abstract:
Single-image super-resolution is an important low-level computer-vision task. Recent approaches with deep convolutional neural networks have achieved impressive performance, but existing architectures are limited by less sophisticated structures and weaker representational power. In this work, to significantly enhance the feature representation, we propose the triple-attention mixed-link network (TAN), which consists of (1) attention mechanisms over three different aspects (kernel, spatial, and channel) and (2) a fusion of powerful residual and dense connections (i.e., mixed link). Specifically, the multi-kernel network learns multi-hierarchical representations under different receptive fields. The features are recalibrated by the kernel and channel attention, which filters the information and enables the network to learn more powerful representations. The features finally pass through the spatial attention in the reconstruction network, which fuses local and global information, lets the network restore more details, and improves reconstruction quality. The proposed structure reduces the parameter growth rate by 50% compared with previous approaches, and the three attention mechanisms provide gains of 0.49 dB, 0.58 dB, and 0.32 dB when evaluated on Set5, Set14, and BSD100. Thanks to the diverse feature recalibrations and the advanced information-flow topology, our model performs strongly against state-of-the-art methods on the benchmark evaluations.
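Of the three attention mechanisms, channel attention is the easiest to illustrate in isolation. The following is a generic squeeze-and-excitation-style recalibration in the same spirit, not the TAN module itself:

```python
# Sketch: channel attention that recalibrates feature maps, in the spirit of
# the paper's channel attention (a generic squeeze-excitation variant).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global statistics
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weights in (0,1)
        )

    def forward(self, x):
        return x * self.gate(x)                            # recalibrate channels

feats = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feats).shape)                   # torch.Size([2, 64, 32, 32])
```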
43

Yamanakkanavar, Nagaraj, Jae Young Choi and Bumshik Lee. „Multiscale and Hierarchical Feature-Aggregation Network for Segmenting Medical Images“. Sensors 22, no. 9 (30.04.2022): 3440. http://dx.doi.org/10.3390/s22093440.

Abstract:
We propose an encoder–decoder architecture using wide and deep convolutional layers combined with different aggregation modules for the segmentation of medical images. Initially, we obtain a rich representation of features that span from low to high levels and from small to large scales by stacking multiple k × k kernels, where each k × k kernel operation is split into k × 1 and 1 × k convolutions. In addition, we introduce two feature-aggregation modules, multiscale feature aggregation (MFA) and hierarchical feature aggregation (HFA), to better fuse information across end-to-end network layers. The MFA module progressively aggregates features and enriches the feature representation, whereas the HFA module merges features iteratively and hierarchically to learn richer combinations of the feature hierarchy. Furthermore, because residual connections are advantageous for assembling very deep networks, we employ MFA-based long residual connections to avoid vanishing gradients along the aggregation paths. In addition, a guided block with multilevel convolution provides effective attention to the features copied from the encoder to the decoder to recover spatial information. The proposed method, using feature-aggregation modules combined with a guided skip connection, thus improves segmentation accuracy, achieving a high similarity index against ground-truth segmentation maps. Experimental results indicate that the proposed model achieves superior segmentation performance to conventional methods for skin-lesion segmentation, with an average accuracy score of 0.97 on the ISIC-2018, PH2, and UFBA-UESC datasets.
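The k × k kernels split into k × 1 and 1 × k convolutions can be shown directly. The block below is a minimal sketch of that factorization (layer sizes are illustrative assumptions):

```python
# Sketch: the k x k kernel split into k x 1 followed by 1 x k convolutions,
# which keeps the receptive field at lower parameter cost.
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        self.vertical = nn.Conv2d(c_in, c_out, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(c_out, c_out, (1, k), padding=(0, k // 2))

    def forward(self, x):
        return self.horizontal(self.vertical(x))

x = torch.randn(1, 3, 64, 64)
print(FactorizedConv(3, 16)(x).shape)   # torch.Size([1, 16, 64, 64])
# Parameter count scales as k*c_in*c_out + k*c_out*c_out, versus
# k*k*c_in*c_out for the unfactorized k x k kernel.
```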
44

Rubin, Stuart, Roumen Kountchev, Mariofanna Milanova and Roumiana Kountcheva. „Multispectral Image Compression, Intelligent Analysis, and Hierarchical Search in Image Databases“. International Journal of Multimedia Data Engineering and Management 3, no. 4 (October 2012): 1–30. http://dx.doi.org/10.4018/jmdem.2012100101.

Abstract:
In this paper, a new approach is offered for the efficient processing and analysis of groups of multispectral images of the same objects. It comprises several tools: the Modified Inverse Pyramid Decomposition; an invariant object representation based on the Modified Mellin-Fourier transform; and hierarchical search in image databases using this invariant representation. The approach permits the definition of a large number of parameters for object analysis and evaluation. When combined with the KASER expert system, it yields a flexible tool for the analysis of multispectral images of the same object.
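The paper's Modified Mellin-Fourier transform is not specified in this abstract, but the classical Fourier-Mellin idea behind invariant object representation is easy to sketch: the FFT magnitude removes translation, log-polar resampling turns rotation and scale into shifts, and a second FFT magnitude removes those shifts. The code below is only the textbook construction, not the paper's modified version:

```python
# Sketch: a classical Fourier-Mellin-style invariant descriptor.
import numpy as np
from scipy.ndimage import map_coordinates

def fourier_mellin_descriptor(img: np.ndarray, bins: int = 64) -> np.ndarray:
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))       # translation-invariant
    cy, cx = np.array(spec.shape) / 2.0
    r_max = min(cy, cx) - 1
    # log-polar sampling grid over the magnitude spectrum
    theta = np.linspace(0, 2 * np.pi, bins, endpoint=False)
    log_r = np.linspace(0, np.log(r_max), bins)
    rr, tt = np.meshgrid(np.exp(log_r), theta)
    coords = np.stack([cy + rr * np.sin(tt), cx + rr * np.cos(tt)])
    logpolar = map_coordinates(spec, coords, order=1)       # rotation/scale -> shifts
    return np.abs(np.fft.fft2(logpolar))                    # shift-invariant magnitude

a = np.random.rand(128, 128)
b = np.rot90(a)                                             # rotated copy
da, db = fourier_mellin_descriptor(a), fourier_mellin_descriptor(b)
print(np.corrcoef(da.ravel(), db.ravel())[0, 1])            # close to 1.0
```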
45

Choi, Seongho, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee and Byoung-Tak Zhang. „DramaQA: Character-Centered Video Story Understanding with Hierarchical QA“. Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (18.05.2021): 1166–74. http://dx.doi.org/10.1609/aaai.v35i2.16203.

Abstract:
Despite recent progress in computer vision and natural language processing, developing a machine that can understand video stories remains difficult because of their intrinsic complexity. Moreover, research on how to evaluate the degree of video understanding based on human cognitive processes has made little progress. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for comprehensive understanding of video stories. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model the local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 17,983 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of main characters, as well as coreference-resolved scripts. Additionally, we propose a Multi-level Context Matching model that hierarchically understands character-centered representations of video to answer questions. We release our dataset and model publicly for research purposes, and we expect our work to provide a new perspective on video story understanding research.
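Since each QA pair carries one of four difficulty levels, the hierarchical evaluation reduces to reporting accuracy per level. A trivial sketch (with hypothetical record fields) follows:

```python
# Sketch: difficulty-stratified scoring in the spirit of DramaQA's
# hierarchical evaluation. The record fields here are hypothetical.
from collections import defaultdict

qa_results = [
    {"level": 1, "correct": True},
    {"level": 1, "correct": False},
    {"level": 3, "correct": True},
    {"level": 4, "correct": True},
]

per_level = defaultdict(lambda: [0, 0])          # level -> [n_correct, n_total]
for r in qa_results:
    per_level[r["level"]][0] += r["correct"]
    per_level[r["level"]][1] += 1

for level in sorted(per_level):
    hit, n = per_level[level]
    print(f"difficulty {level}: {hit / n:.2%} ({n} questions)")
```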
46

Zhang, Litian, Xiaoming Zhang and Junshu Pan. „Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization“. Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (28.06.2022): 11676–84. http://dx.doi.org/10.1609/aaai.v36i10.21422.

Abstract:
Multimodal summarization with multimodal output (MSMO) generates a summary with both textual and visual content. Multimodal news reports contain heterogeneous content, which makes MSMO nontrivial; moreover, the different modalities of data in a news report correlate hierarchically. Traditional MSMO methods handle the modalities indiscriminately, learning a single representation for all of the data, which does not adapt well to heterogeneous content or hierarchical correlation. In this paper, we propose a hierarchical cross-modality semantic correlation learning model (HCSCL) to learn the intra- and inter-modal correlations in multimodal data. HCSCL adopts a graph network to encode the intra-modal correlation. Then, a hierarchical fusion framework is proposed to learn the hierarchical correlation between text and images. Furthermore, we construct a new dataset with relevant image annotations and image object label information to provide supervision for the learning procedure. Extensive experiments on the dataset show that HCSCL significantly outperforms the baseline methods in automatic summarization metrics and fine-grained diversity tests.
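The hierarchical text-image correlation idea can be caricatured with two levels of cross-attention, first words-to-objects and then a pooled sentence query over the fused tokens. This schematic sketch is not the HCSCL architecture:

```python
# Sketch: two-level cross-modal fusion, schematically in the spirit of the
# paper's hierarchical correlation learning (not the HCSCL model itself).
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.word_obj = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.sent_img = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, words, objects):
        # level 1: word tokens attend to detected image objects
        w, _ = self.word_obj(words, objects, objects)
        # level 2: a pooled sentence vector attends to the fused tokens
        sent = w.mean(dim=1, keepdim=True)
        fused, _ = self.sent_img(sent, w, w)
        return fused.squeeze(1)                  # joint summary representation

words = torch.randn(2, 30, 256)                  # token features (assumed dims)
objects = torch.randn(2, 10, 256)                # object-detector features
print(CrossModalFusion()(words, objects).shape)  # torch.Size([2, 256])
```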
47

Valitova, Z. Kh., and A. B. Yessimova. „Territorial images of Kazakhstan in the perception of the student youth“. RUDN Journal of Sociology 21, no. 3 (17.09.2021): 543–56. http://dx.doi.org/10.22363/2313-2272-2021-21-3-543-556.

Abstract:
Since gaining independence, Kazakhstan has undergone significant changes in its territorial structure, which have affected social representations of its regions. The authors reconstruct the dominant territorial images held by the younger generations that grew up in independent Kazakhstan. The article is based on the results of the mental-maps method applied to reveal images of the country. The authors studied representations of the country's territories from two geographical positions - the center (Karaganda) and the south (Shymkent). According to the research procedure, the informants drew their version of the country's map with the most important territorial objects and proposed associations for the features of certain territories. 80 first- and second-year students were questioned in higher educational institutions of Shymkent and Karaganda. In the first part of the article, the authors examine the images presented on the mental maps; in the second part, the associations for the regions of the country. Thus, the authors identify three circles of territorial vision: core, semi-periphery and periphery. The core consists of the place of residence and the cities of republican significance - Almaty and Nur-Sultan (the so-called southern and northern capitals). The dominant images of the core are political, cultural, toponymical and resource-related. The semi-periphery consists of regional centers with natural-resource and climatic images; the periphery, of cities far from the students' place of residence and of the voids - territories not indicated on the map. The images of the periphery mainly reflect the climatic features of territories. The authors argue that the recognizability of territories in the perception of the student youth reflects a hierarchical spatial structure in which the status cities dominate.
48

Ding, Youli, Xianwei Zheng, Yan Zhou, Hanjiang Xiong and Jianya Gong. „Low-Cost and Efficient Indoor 3D Reconstruction Through Annotated Hierarchical Structure-from-Motion“. Remote Sensing 11, no. 1 (29.12.2018): 58. http://dx.doi.org/10.3390/rs11010058.

Abstract:
With the widespread application of location-based services, the appropriate representation of indoor spaces and efficient indoor 3D reconstruction have become essential tasks. Due to the complexity and enclosed nature of indoor spaces, it is difficult to develop a versatile solution for large-scale indoor 3D scene reconstruction. In this paper, an annotated hierarchical Structure-from-Motion (SfM) method is proposed for low-cost and efficient indoor 3D reconstruction using unordered images collected with widely available smartphone or consumer-level cameras. Although the reconstruction of indoor models is often compromised by indoor complexity, we make use of the available complex semantic objects to classify the scenes and construct a hierarchical scene tree to recover the indoor space. Starting with the semantic annotation of the images, images that share the same object are detected and classified using visual words and the support vector machine (SVM) algorithm. The SfM method is then applied to hierarchically recover the atomic 3D point-cloud model of each object, with the semantic information from the images attached. Finally, an improved random sample consensus (RANSAC) generalized Procrustes analysis (RGPA) method is employed to register and optimize the partial models into a complete indoor scene. The proposed approach incorporates image classification into the hierarchical SfM-based indoor reconstruction task, exploring semantic propagation from images to points, and it reduces the computational complexity of traditional SfM by avoiding exhaustive pair-wise image matching. Its applicability and accuracy were verified on two different image datasets collected with smartphone and consumer cameras. The results demonstrate that the proposed method efficiently and robustly produces semantically and geometrically correct indoor 3D point models.
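The image-classification step, visual words plus an SVM, is a standard bag-of-visual-words pipeline. Below is a sketch with ORB descriptors standing in for whatever local features the authors used, run on synthetic data:

```python
# Sketch: "visual words + SVM" image classification. ORB and the synthetic
# images are stand-in assumptions, not the paper's pipeline.
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import SVC

orb = cv2.ORB_create(nfeatures=300)

def descriptors(img: np.ndarray) -> np.ndarray:
    _, des = orb.detectAndCompute(img, None)
    return des if des is not None else np.zeros((1, 32), np.uint8)

# Hypothetical grayscale training images and object-class labels.
images = [np.random.randint(0, 255, (240, 320), np.uint8) for _ in range(20)]
labels = np.arange(20) % 2

codebook = KMeans(n_clusters=50, n_init=10, random_state=0)
codebook.fit(np.vstack([descriptors(im) for im in images]).astype(np.float32))

def bow_histogram(img: np.ndarray) -> np.ndarray:
    words = codebook.predict(descriptors(img).astype(np.float32))
    return np.bincount(words, minlength=50) / max(len(words), 1)

X = np.array([bow_histogram(im) for im in images])
clf = SVC(kernel="rbf").fit(X, labels)       # groups images sharing an object
print(clf.predict(X[:4]))
```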
49

Suppes, Patrick, Marcos Perreau-Guimaraes and Dik Kin Wong. „Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language“. Neural Computation 21, no. 11 (November 2009): 3228–69. http://dx.doi.org/10.1162/neco.2009.04-08-764.

Abstract:
The idea of a hierarchical structure of language constituents (phonemes, syllables, words, and sentences) is robust and widely accepted. Empirical similarity differences at every level of this hierarchy have been analyzed in the form of confusion matrices for many years. By normalizing such data so that differences are represented by conditional probabilities, semiorders of similarity differences can be constructed. The intersection of two such orderings is a partial ordering invariant with respect to the two given orders. These invariant partial orderings, especially between perceptual and brain representations, but also for comparing brain images of words generated by auditory or visual presentations, are the focus of this letter. Data from four experiments are analyzed, with some success in finding conceptually significant invariants.
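The construction is concrete enough to sketch: normalize each confusion matrix to conditional probabilities, order pairs of items by dissimilarity with a threshold margin (in the spirit of a semiorder), and intersect the two relations. The matrices below are made up for illustration:

```python
# Sketch of the letter's construction: confusion matrices -> conditional
# probabilities -> thresholded dissimilarity orderings -> intersection as
# the invariant partial order. Matrices here are made-up toy data.
import numpy as np

def dissimilarity(conf: np.ndarray) -> np.ndarray:
    p = conf / conf.sum(axis=1, keepdims=True)   # P(response j | stimulus i)
    sym = (p + p.T) / 2                          # symmetrized confusability
    return 1 - sym                               # high = rarely confused

def order_pairs(d: np.ndarray, eps: float = 0.05) -> set:
    """Pairs of item-pairs ((i,j),(k,l)) where d[i,j] is smaller by margin eps."""
    idx = [(i, j) for i in range(len(d)) for j in range(i + 1, len(d))]
    return {(a, b) for a in idx for b in idx
            if d[a] < d[b] - eps}                # threshold-style ordering

brain = np.array([[80, 15, 5], [10, 70, 20], [5, 25, 70]], float)
percept = np.array([[90, 8, 2], [12, 75, 13], [3, 17, 80]], float)

invariant = order_pairs(dissimilarity(brain)) & order_pairs(dissimilarity(percept))
print(sorted(invariant))   # orderings shared by both representations
```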
50

Jin, Ge, Xu Chen and Long Ying. „Deep Multi-Task Learning for an Autoencoder-Regularized Semantic Segmentation of Fundus Retina Images“. Mathematics 10, no. 24 (16.12.2022): 4798. http://dx.doi.org/10.3390/math10244798.

Abstract:
Automated segmentation of retinal blood vessels is necessary for the diagnosis, monitoring, and treatment planning of retinal disease. Although current U-shaped models achieve outstanding performance, several challenges remain, arising from the nature of the problem and from mainstream models. (1) There is no effective framework for obtaining and incorporating features with different spatial and semantic information at multiple levels. (2) Fundus retina images with high-quality blood vessel segmentations are relatively scarce. (3) The information in edge regions, the most difficult parts to segment, has not received adequate attention. In this work, we propose a novel encoder–decoder architecture based on the multi-task learning paradigm to tackle these challenges. The shared image encoder is regularized by a reconstruction task in a VQ-VAE (Vector Quantized Variational AutoEncoder) branch to improve generalization. Meanwhile, hierarchical representations are generated and integrated to complement the input image. An edge attention module is designed to make the model capture edge-focused feature representations via deep supervision, concentrating on the target edge regions that are hardest to recognize. Extensive evaluations on three publicly accessible datasets demonstrate that the proposed model outperforms current state-of-the-art methods.
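The multi-task objective, segmentation plus autoencoder regularization plus edge-focused deep supervision, can be sketched as a single loss function. The weights and the edge-map derivation below are illustrative assumptions, not the paper's values:

```python
# Sketch: the multi-task objective only -- segmentation loss plus an
# autoencoder reconstruction regularizer plus edge-focused supervision.
import torch
import torch.nn.functional as F

def edge_map(mask: torch.Tensor) -> torch.Tensor:
    """Boundary pixels of a binary vessel mask via max-pool dilation/erosion."""
    dilated = F.max_pool2d(mask, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-mask, 3, stride=1, padding=1)
    return dilated - eroded                      # 1 on boundaries, 0 elsewhere

def multitask_loss(seg_logits, recon, image, mask,
                   w_rec: float = 0.5, w_edge: float = 2.0):
    seg = F.binary_cross_entropy_with_logits(seg_logits, mask)
    rec = F.mse_loss(recon, image)               # stand-in for the VQ-VAE branch
    # deep supervision focused on the hard-to-segment edge regions
    per_pixel = F.binary_cross_entropy_with_logits(seg_logits, mask,
                                                   reduction="none")
    edges = edge_map(mask)
    edge = (per_pixel * edges).sum() / edges.sum().clamp_min(1)
    return seg + w_rec * rec + w_edge * edge

img = torch.rand(1, 1, 64, 64)
msk = (img > 0.5).float()
print(multitask_loss(torch.randn(1, 1, 64, 64), img * 0.9, img, msk))
```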