Journal articles on the topic 'Image-level Supervision'

Author: Grafiati

Published: 6 September 2023

Consult the top 50 journal articles for your research on the topic 'Image-level Supervision.'

1

Ge, Ce, Jingyu Wang, Qi Qi, Haifeng Sun, Tong Xu, and Jianxin Liao. "Scene-Level Sketch-Based Image Retrieval with Minimal Pairwise Supervision." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 650–57. http://dx.doi.org/10.1609/aaai.v37i1.25141.

Abstract:

The sketch-based image retrieval (SBIR) task has long been researched at the instance level, where both query sketches and candidate images are assumed to contain only one dominant object. This strong assumption constrains its application, especially with the increasingly popular intelligent terminals and human-computer interaction technology. In this work, a more general scene-level SBIR task is explored, where sketches and images can both contain multiple object instances. The new general task is extremely challenging due to several factors: (i) scene-level SBIR inherently shares sketch-specific difficulties with instance-level SBIR (e.g., sparsity, abstractness, and diversity), (ii) the cross-modal similarity is measured between two partially aligned domains (i.e., not all objects in images are drawn in scene sketches), and (iii) besides instance-level visual similarity, a more complex multi-dimensional scene-level feature matching problem is imposed (including appearance, semantics, layout, etc.). Addressing these challenges, a novel Conditional Graph Autoencoder model is proposed to deal with scene-level sketch-images retrieval. More importantly, the model can be trained with only pairwise supervision, which distinguishes our study from others in that elaborate instance-level annotations (for example, bounding boxes) are no longer required. Extensive experiments confirm the ability of our model to robustly retrieve multiple related objects at the scene level and exhibit superior performance beyond strong competitors.

2

Zhou, Hongming, Kang Song, Xianglei Zhang, Wenyong Gui, and Qiusuo Qian. "WAILS: Watershed Algorithm With Image-Level Supervision for Weakly Supervised Semantic Segmentation." IEEE Access 7 (2019): 42745–56. http://dx.doi.org/10.1109/access.2019.2908216.

3

Zhang, Xiu. "Superresolution Reconstruction of Remote Sensing Image Based on Middle-Level Supervised Convolutional Neural Network." Journal of Sensors 2022 (January 4, 2022): 1–14. http://dx.doi.org/10.1155/2022/2603939.

Abstract:

Image has become one of the important carriers of visual information because of its large amount of information, easy to spread and store, and strong sense of sense. At the same time, the quality of image is also related to the completeness and accuracy of information transmission. This research mainly discusses the superresolution reconstruction of remote sensing images based on the middle layer supervised convolutional neural network. This paper designs a convolutional neural network with middle layer supervision. There are 16 layers in total, and the seventh layer is designed as an intermediate supervision layer. At present, there are many researches on traditional superresolution reconstruction algorithms and convolutional neural networks, but there are few researches that combine the two together. Convolutional neural network can obtain the high-frequency features of the image and strengthen the detailed information; so, it is necessary to study its application in image reconstruction. This article will separately describe the current research status of image superresolution reconstruction and convolutional neural networks. The middle supervision layer defines the error function of the supervision layer, which is used to optimize the error back propagation mechanism of the convolutional neural network to improve the disappearance of the gradient of the deep convolutional neural network. The algorithm training is mainly divided into four stages: the original remote sensing image preprocessing, the remote sensing image temporal feature extraction stage, the remote sensing image spatial feature extraction stage, and the remote sensing image reconstruction output layer. The last layer of the network draws on the single-frame remote sensing image SRCNN algorithm. The output layer overlaps and adds the remote sensing images of the previous layer, averages the overlapped blocks, eliminates the block effect, and finally obtains high-resolution remote sensing images, which is also equivalent to filter operation. In order to allow users to compare the superresolution effect of remote sensing images more clearly, this paper uses the Qt5 interface library to implement the user interface of the remote sensing image superresolution software platform and uses the intermediate layer convolutional neural network and the remote sensing image superresolution reconstruction algorithm proposed in this paper. When the training epoch reaches 35 times, the network has converged. At this time, the loss function converges to 0.017, and the cumulative time is about 8 hours. This research helps to improve the visual effects of remote sensing images.
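
The overlap-and-average step described for the output layer can be illustrated with a small sketch. It is not the paper's implementation: it works on 1-D signals for brevity, and the names `blend_overlapping_blocks`, `stride`, and `length` are hypothetical.

```python
def blend_overlapping_blocks(blocks, stride, length):
    """Place equally sized blocks at multiples of `stride` and average overlaps.

    Assumption: block i starts at index i * stride and fits inside `length`.
    """
    total = [0.0] * length   # running sum of contributions per position
    count = [0] * length     # number of blocks covering each position
    for i, block in enumerate(blocks):
        start = i * stride
        for j, value in enumerate(block):
            total[start + j] += value
            count[start + j] += 1
    # Positions not covered by any block are left at 0.0.
    return [t / c if c else 0.0 for t, c in zip(total, count)]


# Two blocks of length 4 with stride 2 overlap on positions 2 and 3.
blocks = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
blended = blend_overlapping_blocks(blocks, stride=2, length=6)
print(blended)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Averaging the overlapped positions smooths the seam between neighbouring blocks, which is why the abstract likens the step to a filter operation.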

4

Cha, Junuk, Muhammad Saqlain, Changhwa Lee, Seongyeong Lee, Seungeun Lee, Donguk Kim, Won-Hee Park, and Seungryul Baek. "Towards Single 2D Image-Level Self-Supervision for 3D Human Pose and Shape Estimation." Applied Sciences 11, no. 20 (October 18, 2021): 9724. http://dx.doi.org/10.3390/app11209724.

Abstract:

Three-dimensional human pose and shape estimation is an important problem in the computer vision community, with numerous applications such as augmented reality, virtual reality, human computer interaction, and so on. However, training accurate 3D human pose and shape estimators based on deep learning approaches requires a large number of images and corresponding 3D ground-truth pose pairs, which are costly to collect. To relieve this constraint, various types of weakly or self-supervised pose estimation approaches have been proposed. Nevertheless, these methods still involve supervision signals, which require effort to collect, such as unpaired large-scale 3D ground truth data, a small subset of 3D labeled data, video priors, and so on. Often, they require installing equipment such as a calibrated multi-camera system to acquire strong multi-view priors. In this paper, we propose a self-supervised learning framework for 3D human pose and shape estimation that does not require other forms of supervision signals while using only single 2D images. Our framework inputs single 2D images, estimates human 3D meshes in the intermediate layers, and is trained to solve four types of self-supervision tasks (i.e., three image manipulation tasks and one neural rendering task) whose ground-truths are all based on the single 2D images themselves. Through experiments, we demonstrate the effectiveness of our approach on 3D human pose benchmark datasets (i.e., Human3.6M, 3DPW, and LSP), where we present the new state-of-the-art among weakly/self-supervised methods.

5

Han, Sujy, Tae Bok Lee, and Yong Seok Heo. "Deep Image Prior for Super Resolution of Noisy Image." Electronics 10, no. 16 (August 20, 2021): 2014. http://dx.doi.org/10.3390/electronics10162014.

Abstract:

Single image super-resolution task aims to reconstruct a high-resolution image from a low-resolution image. Recently, it has been shown that by using deep image prior (DIP), a single neural network is sufficient to capture low-level image statistics using only a single image without data-driven training such that it can be used for various image restoration problems. However, super-resolution tasks are difficult to perform with DIP when the target image is noisy. The super-resolved image becomes noisy because the reconstruction loss of DIP does not consider the noise in the target image. Furthermore, when the target image contains noise, the optimization process of DIP becomes unstable and sensitive to noise. In this paper, we propose a noise-robust and stable framework based on DIP. To this end, we propose a noise-estimation method using the generative adversarial network (GAN) and self-supervision loss (SSL). We show that a generator of DIP can learn the distribution of noise in the target image with the proposed framework. Moreover, we argue that the optimization process of DIP is stabilized when the proposed self-supervision loss is incorporated. The experiments show that the proposed method quantitatively and qualitatively outperforms existing single image super-resolution methods for noisy images.

6

Qin, Jie, Jie Wu, Xuefeng Xiao, Lujun Li, and Xingang Wang. "Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2117–25. http://dx.doi.org/10.1609/aaai.v36i2.20108.

Abstract:

Image-level weakly supervised semantic segmentation (WSSS) is a fundamental yet challenging computer vision task facilitating scene understanding and automatic driving. Most existing methods resort to classification-based Class Activation Maps (CAMs) to serve as the initial pseudo labels, which tend to focus on the discriminative image regions and lack customized characteristics for the segmentation task. To alleviate this issue, we propose a novel activation modulation and recalibration (AMR) scheme, which leverages a spotlight branch and a compensation branch to obtain weighted CAMs that can provide recalibration supervision and task-specific concepts. Specifically, an attention modulation module (AMM) is employed to rearrange the distribution of feature importance from the channel-spatial sequential perspective, which helps to explicitly model channel-wise interdependencies and spatial encodings to adaptively modulate segmentation-oriented activation responses. Furthermore, we introduce a cross pseudo supervision for dual branches, which can be regarded as a semantic similar regularization to mutually refine two branches. Extensive experiments show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with image-level supervision but also some methods relying on stronger supervision, such as saliency labels. Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance. Our code is available at: https://github.com/jieqin-ai/AMR.
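
For readers unfamiliar with the CAMs this abstract builds on, the following sketch shows the standard formulation: a class activation map is the sum of the final feature maps, each weighted by the classifier weight of the class in question. It illustrates plain CAMs only, not the AMR scheme, and the function names and toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class.

    feature_maps: list of K nested lists; class_weights: list of K floats.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam):
    """Clip negative responses to zero and rescale the map to [0, 1]."""
    clipped = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in clipped)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in clipped]


# Toy example: two 2x2 feature maps and the classifier weights of one class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = normalize(class_activation_map(feature_maps, [0.5, 1.0]))
print(cam)  # [[0.25, 0.0], [0.0, 1.0]]
```

Thresholding such a map gives the initial pseudo labels; because the classifier only needs the most discriminative regions, the map tends to under-cover the object, which is the limitation the abstract describes.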

7

Bonfiglio, Basilio. "Ricostruzione della storia del paziente. Supervisione psicoanalitica in psichiatria." PSICOBIETTIVO, no. 3 (October 2009): 77–89. http://dx.doi.org/10.3280/psob2008-003007.

Abstract:

This paper considers one of the functions of psychoanalytic group clinical thinking within Mental Health Services: the collection and reconstruction of the patient's history. This work, especially in rehabilitation centres where the patient stays for long periods, allows the problems and tensions arising from dealing with the patient to be relocated within a context that permits their comprehension on a deep emotional and thinking level. From that stems an ongoing redefinition of the patient's image and his/her better individualization in the mind of those who look after them. This in turn fosters a consolidation of their own identity as Mental Health professionals. A transcript of two supervision meetings with the staff of a exemplifies some aspects of this work.

Key Words: Supervision, Patient's History, Psychosis, Identity, Projective Identification, Rehabilitation, Residential Therapeutic Centre.

Parole chiave: supervisione, storia del paziente, psicosi, identità, identificazione proiettiva, riabilitazione, strutture intermedie.

8

Li, Yanyan, Weilong Peng, Keke Tang, and Meie Fang. "Spatio-Frequency Decoupled Weak-Supervision for Face Reconstruction." Computational Intelligence and Neuroscience 2022 (September 22, 2022): 1–12. http://dx.doi.org/10.1155/2022/5903514.

Abstract:

3D face reconstruction has witnessed considerable progress in recovering 3D face shapes and textures from in-the-wild images. However, due to a lack of texture detail information, the reconstructed shape and texture based on deep learning could not be used to re-render a photorealistic facial image since it does not work in harmony with weak supervision only from the spatial domain. In the paper, we propose a method of spatio-frequency decoupled weak-supervision for face reconstruction, which applies the losses from not only the spatial domain but also the frequency domain to learn the reconstruction process that approaches photorealistic effect based on the output shape and texture. In detail, the spatial domain losses cover image-level and perceptual-level supervision. Moreover, the frequency domain information is separated from the input and rendered images, respectively, and is then used to build the frequency-based loss. In particular, we devise a spectrum-wise weighted Wing loss to implement balanced attention on different spectrums. Through the spatio-frequency decoupled weak-supervision, the reconstruction process can be learned in harmony and generate detailed texture and high-quality shape only with labels of landmarks. The experiments on several benchmarks show that our method can generate high-quality results and outperform state-of-the-art methods in qualitative and quantitative comparisons.

9

Gupta, Arjun, Zengming Shen, and Thomas Huang. "Text Embedding Bank for Detailed Image Paragraph Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15791–92. http://dx.doi.org/10.1609/aaai.v35i18.17892.

Abstract:

Existing deep learning-based models for image captioning typically consist of an image encoder to extract visual features and a language model decoder, an architecture that has shown promising results in single high-level sentence generation. However, only the word-level guiding signal is available when the image encoder is optimized to extract visual features. The inconsistency between the parallel extraction of visual features and sequential text supervision limits its success when the length of the generated text is long (more than 50 words). We propose a new module, called the Text Embedding Bank (TEB), to address this problem for image paragraph captioning. This module uses the paragraph vector model to learn fixed-length feature representations from a variable-length paragraph. We refer to the fixed-length feature as the TEB. This TEB module plays two roles to benefit paragraph captioning performance. First, it acts as a form of global and coherent deep supervision to regularize visual feature extraction in the image encoder. Second, it acts as a distributed memory to provide features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to two existing state-of-the-art methods achieves a new state-of-the-art result on the paragraph captioning Stanford Visual Genome dataset.

10

Xiao, Shuomin, Aiping Qu, Penghui He, and Han Hong. "CA-Net: Context Aggregation Network for Nuclei Classification in Histopathology Image." Journal of Physics: Conference Series 2504, no. 1 (May 1, 2023): 012031. http://dx.doi.org/10.1088/1742-6596/2504/1/012031.

Abstract:

Accurately classifying nuclei in histopathology images is essential for cancer diagnosis and prognosis. However, due to the touching nuclei, nucleus shape variation, background complexity, and image artifacts, end-to-end nucleus classification is still difficult and challenging. In this manuscript, we propose a context aggregation network (CA-Net) for nuclei classification by fusing global contextual information which is critical for classifying nuclei in histopathology images. Specifically, we propose a multi-level semantic supervision (MSS) module focusing on extracting multi-scale context information by varying three different kernel sizes, and dynamically aggregating the context information from high to low level. Furthermore, we employ the GPG and SAPF modules in encoder and decoder networks to extract and aggregate global context information. Finally, the proposed network is verified on a mainstream nuclei classification image dataset (PanNuke) and achieves an improved global accuracy of 0.816. Our proposed MSS module can be easily transferred into any UNet-like architecture as a deep supervision mechanism.

11

Wang, Chaoqun, Xuejin Chen, Shaobo Min, Xiaoyan Sun, and Houqiang Li. "Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2710–18. http://dx.doi.org/10.1609/aaai.v35i3.16375.

Abstract:

Generalized Zero-Shot Learning (GZSL) targets recognizing new categories by learning transferable image representations. Existing methods find that, by aligning image representations with corresponding semantic labels, the semantic-aligned representations can be transferred to unseen categories. However, supervised by only seen category labels, the learned semantic knowledge is highly task-specific, which makes image representations biased towards seen categories. In this paper, we propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge via semantic alignment and instance discrimination. First, DCEN leverages task labels to cluster representations of the same semantic category by cross-modal contrastive learning and exploring semantic-visual complementarity. Besides task-specific knowledge, DCEN then introduces task-independent knowledge by attracting representations of different views of the same image and repelling representations of different images. Compared to high-level seen category supervision, this instance discrimination supervision encourages DCEN to capture low-level visual knowledge, which is less biased toward seen categories and alleviates the representation bias. Consequently, the task-specific and task-independent knowledge jointly make for transferable representations of DCEN, which obtains averaged 4.1% improvement on four public benchmarks.
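
The instance discrimination idea in this abstract (attract two views of the same image, repel other images) is commonly written as an InfoNCE-style contrastive loss. The sketch below assumes that formulation, cosine similarity, and a temperature of 0.1; the abstract does not state DCEN's exact loss, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_discrimination_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is close to its positive view
    and far from the representations of other images (assumed form)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Aligned case: the second view matches the anchor, the other image does not.
aligned = instance_discrimination_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Misaligned case: the roles of the positive and the negative are swapped.
misaligned = instance_discrimination_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```

Because the loss depends only on which representations come from the same image, it needs no category labels, which is why the abstract calls this knowledge task-independent.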

12

Wei, Zhenfeng, and Xiaohua Zhang. "Feature Extraction and Retrieval of Ecommerce Product Images Based on Image Processing." Traitement du Signal 38, no. 1 (February 28, 2021): 181–90. http://dx.doi.org/10.18280/ts.380119.

Abstract:

The new retail is an industry featured by online ecommerce. One of the key techniques of the industry is the product identification based on image processing. This technique has an important business application value, because it is capable of improving the retrieval efficiency of products and the level of information supervision. To acquire high-level semantics of images and enhance the retrieval effect of products, this paper explores the feature extraction and retrieval of ecommerce product images based on image processing. The improved Fourier descriptor was innovatively introduced into a metric learning-based product image feature extraction network, and the attention mechanism was introduced to realize accurate retrieval of product images. Firstly, the authors detailed how to acquire the product contour and the axis with minimum moment of inertia, and then extracted the shape feature of products. Next, a feature extraction network was established based on the metric learning supervision, which is capable of obtaining distinctive features, and thus realized the extraction of distinctive and classification features of products. Finally, the authors expounded on the product image retrieval method based on cluster attention neural network. The effectiveness of our method was confirmed through experiments. The research results provide a reference for feature extraction and retrieval in other fields of image processing.

13

Cui, Dejing, Weilong Ren, and Wenjin He. "An intelligent monitoring system based on computer vision." Journal of Physics: Conference Series 2290, no. 1 (June 1, 2022): 012062. http://dx.doi.org/10.1088/1742-6596/2290/1/012062.

Abstract:

Aiming at the customs supervision area of the comprehensive bonded area, this paper proposes an intelligent monitoring system based on computer vision through the research of computer vision, photoelectric sensors, digital image processing, data transmission network, automatic control and intelligent video analysis. We have realized the comprehensive supervision of people, vehicles, and things in the park, uploading all collected information to the centralized storage server, realizing uninterrupted data tracking, analysis, alarming, centralized management, unified scheduling, and command of front-end equipment. We have comprehensively improved the level of supervision of the comprehensive bonded area, provided the most intuitive information for emergency response, and performed the best response to emergencies according to the emergency plan to build a more intelligent comprehensive bonded area.

14

Zhang, Daoan, Chenming Li, Haoquan Li, Wenjian Huang, Lingyun Huang, and Jianguo Zhang. "Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 11192–200. http://dx.doi.org/10.1609/aaai.v37i9.26325.

Abstract:

Unsupervised image segmentation aims to match low-level visual features with semantic-level representations without outer supervision. In this paper, we address the critical properties from the view of feature alignments and feature uniformity for UISS models. We also make a comparison between UISS and image-wise representation learning. Based on the analysis, we argue that the existing MI-based methods in UISS suffer from representation collapse. Motivated by this, we propose a robust network called Semantic Attention Network (SAN), in which a new module, Semantic Attention (SEAT), is proposed to generate pixel-wise and semantic features dynamically. Experimental results on multiple semantic segmentation benchmarks show that our unsupervised segmentation framework specializes in catching semantic representations, which outperforms all the unpretrained and even several pretrained methods.

15

Zhang, Liqun, Ke Chen, Lin Han, Yan Zhuang, Zhan Hua, Cheng Li, and Jiangli Lin. "Recognition of calcifications in thyroid nodules based on attention-gated collaborative supervision network of ultrasound images." Journal of X-Ray Science and Technology 28, no. 6 (December 5, 2020): 1123–39. http://dx.doi.org/10.3233/xst-200740.

Abstract:

BACKGROUND: Calcification is an important criterion for classification between benign and malignant thyroid nodules. Deep learning provides an important means for automatic calcification recognition, but it is tedious to annotate pixel-level labels for calcifications with various morphologies. OBJECTIVE: This study aims to improve accuracy of calcification recognition and prediction of its location, as well as to reduce the number of pixel-level labels in model training. METHODS: We proposed a collaborative supervision network based on attention gating (CS-AGnet), which was composed of two branches: a segmentation network and a classification network. The reorganized two-stage collaborative semi-supervised model was trained under the supervision of all image-level labels and few pixel-level labels. RESULTS: The results show that although our semi-supervised network used only 30% (289 cases) of pixel-level labels for training, the accuracy of calcification recognition reaches 92.1%, which is very close to 92.9% of deep supervision with 100% (966 cases) pixel-level labels. The CS-AGnet focuses the model’s attention on calcification objects. Thus, it achieves higher accuracy than other deep learning methods. CONCLUSIONS: Our collaborative semi-supervised model has a preferable performance in calcification recognition, and it reduces the number of manual annotations of pixel-level labels. Moreover, it may be of great reference for the object recognition of medical datasets with few labels.

16

Chen, Jie, Fen He, Yi Zhang, Geng Sun, and Min Deng. "SPMF-Net: Weakly Supervised Building Segmentation by Combining Superpixel Pooling and Multi-Scale Feature Fusion." Remote Sensing 12, no. 6 (March 24, 2020): 1049. http://dx.doi.org/10.3390/rs12061049.

Abstract:

The lack of pixel-level labeling limits the practicality of deep learning-based building semantic segmentation. Weakly supervised semantic segmentation based on image-level labeling results in incomplete object regions and missing boundary information. This paper proposes a weakly supervised semantic segmentation method for building detection. The proposed method takes the image-level label as supervision information in a classification network that combines superpixel pooling and multi-scale feature fusion structures. The main advantage of the proposed strategy is its ability to improve the intactness and boundary accuracy of a detected building. Our method achieves impressive results on two 2D semantic labeling datasets, which outperform some competing weakly supervised methods and are close to the result of the fully supervised method.

17

Fenneteau, Alexandre, Pascal Bourdon, David Helbert, Christine Fernandez-Maloigne, Christophe Habas, and Remy Guillevin. "Learning a CNN on multiple sclerosis lesion segmentation with self-supervision." Electronic Imaging 2020, no. 17 (January 26, 2020): 3–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.17.3dmp-002.

Abstract:

Multiple Sclerosis (MS) is a chronic, often disabling, autoimmune disease affecting the central nervous system and characterized by demyelination and neuropathic alterations. Magnetic Resonance (MR) images play a pivotal role in the diagnosis and the screening of MS. MR images identify and localize demyelinating lesions (or plaques) and possible associated atrophic lesions whose MR aspect is in relation with the evolution of the disease. We propose a novel MS lesion segmentation method for MR images, based on Convolutional Neural Networks (CNNs) and partial self-supervision, and study the pros and cons of using self-supervision for the current segmentation task. Investigating the transferability by freezing the first convolutional layers, we discovered that improvements are obtained when the CNN is retrained from the first layers. We believe such results suggest that MRI segmentation is a singular task needing high level analysis from the very first stages of the vision process, as opposed to vision tasks aimed at day-to-day life such as face recognition or traffic sign classification. The evaluation of segmentation quality has been performed on full image size binary maps assembled from predictions on image patches from an unseen database.

18

Yao, Zong Guo, Min Qin, Guan Wang, and Jin Ping Li. "Detection of Video Shot Changes Based on Image Matching." Applied Mechanics and Materials 263-266 (December 2012): 2064–69. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.2064.

Abstract:

In order to improve the supervision of TV commercials and reduce the illegal ones, it is necessary to develop a system to detect TV commercials in real time. One of the key points is to detect video shot changes. We put forward a novel method of image matching for shot change detection: firstly, a feature coding is introduced for describing image local gray level distribution; secondly, image matching is performed according to the feature coding; thirdly, shot changes can be detected. A preliminary system of video shot change detection is developed. Experiments show that the algorithm is better than traditional algorithms and can well meet the need of TV Video shot cut detection.

19

Zhang, Jun, Yue Liu, Pengfei Wu, Zhenwei Shi, and Bin Pan. "Mining Cross-Domain Structure Affinity for Refined Building Segmentation in Weakly Supervised Constraints." Remote Sensing 14, no. 5 (March 2, 2022): 1227. http://dx.doi.org/10.3390/rs14051227.

Abstract:

Building segmentation for remote sensing images usually requires pixel-level labels, which are difficult to collect when the images are in low resolution and quality. Recently, weakly supervised semantic segmentation methods have achieved promising performance, which only rely on image-level labels for each image. However, buildings in remote sensing images tend to present regular structures. The lack of supervision information may result in ambiguous boundaries. In this paper, we propose a new weakly supervised network for refined building segmentation by mining the cross-domain structure affinity (CDSA) from multi-source remote sensing images. CDSA integrates the ideas of weak supervision and domain adaptation, where a pixel-level labeled source domain and an image-level labeled target domain are required. The target of CDSA is to learn a powerful segmentation network on the target domain with the guidance of source domain data. CDSA mainly consists of two branches, the structure affinity module (SAM) and the spatial structure adaptation (SSA). In brief, SAM is developed to learn the structure affinity of the buildings from source domain, and SSA infuses the structure affinity to the target domain via a domain adaptation approach. Moreover, we design an end-to-end network structure to simultaneously optimize the SAM and SSA. In this case, SAM can receive pseudosupervised information from SSA, and in turn provide a more accurate affinity matrix for SSA. In the experiments, our model can achieve an IoU score at 57.87% and 79.57% for the WHU and Vaihingen data sets. We compare CDSA with several state-of-the-art weakly supervised and domain adaptation methods, and the results indicate that our method presents advantages on two public data sets.
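
The IoU score reported in this abstract is the ratio of the overlap between predicted and reference building pixels to their union. A minimal sketch for binary masks follows; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def intersection_over_union(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
score = intersection_over_union(pred, target)
print(round(score, 4))  # 0.3333 (one shared pixel out of three marked pixels)
```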

20

Liu, Dongfang, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, and Yingjie Chen. "DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 7 (May 18, 2021): 6101–9. http://dx.doi.org/10.1609/aaai.v35i7.16760.

Abstract:

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs. We use a weakly supervised triplet ranking loss to learn discriminative features and encourage keypoint feature repeatability for image representation. Finally, our method is computationally efficient as our architecture has shared features and parameters during forward propagation. Our method is flexible and can be crafted on a lightweight backbone architecture to achieve appealing efficiency with a small penalty on accuracy. Extensive experiment results indicate that our method sets a new state-of-the-art on four challenging large-scale localization benchmarks and three image retrieval benchmarks with the same level of supervision. The code is available at https://github.com/goodproj13/DenserNet
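
The abstract above mentions a weakly supervised triplet ranking loss but does not state its formula. As a rough illustration only, the sketch below uses the common hinge form over squared Euclidean distances; the function names, the margin value, and the plain-list feature vectors are assumptions made for this example and are not taken from the DenserNet code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(query, positive, negatives, margin=0.1):
    """Hinge-style ranking loss (assumed form, not the paper's exact definition).

    Each negative is penalised when it is not at least `margin` farther
    from the query than the positive is.
    """
    d_pos = squared_distance(query, positive)
    return sum(
        max(0.0, d_pos + margin - squared_distance(query, neg))
        for neg in negatives
    )


# Usage: one negative violates the margin, the other is already far enough.
loss = triplet_ranking_loss(
    [0.0, 0.0], [1.0, 0.0], [[1.0, 0.0], [3.0, 0.0]], margin=0.5
)
```

In a GPS-tagged setting, positives and negatives would typically be chosen by geographic proximity to the query image, which is what makes the supervision weak.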

21

Fan, Wan-Cyuan, Cheng-Fu Yang, Chiao-An Yang, and Yu-Chiang Frank Wang. "Target-Free Text-Guided Image Manipulation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 588–96. http://dx.doi.org/10.1609/aaai.v37i1.25134.

Abstract:

We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training. To address this challenging task, we propose a Cyclic-Manipulation GAN (cManiGAN) in this paper, which is able to realize where and how to edit the image regions of interest. Specifically, the image editor in cManiGAN learns to identify and complete the input image, while cross-modal interpreter and reasoner are deployed to verify the semantic correctness of the output image based on the input instruction. While the former utilizes factual/counterfactual description learning for authenticating the image semantics, the latter predicts the "undo" instruction and provides pixel-level supervision for the training of cManiGAN. With the above operational cycle-consistency, our cManiGAN can be trained in the above weakly supervised setting. We conduct extensive experiments on the CLEVR and COCO datasets, and the effectiveness and generalizability of our proposed method can be successfully verified. Project page: sites.google.com/view/wancyuanfan/projects/cmanigan.

22

Yao, Huifeng, Xiaowei Hu, and Xiaomeng Li. "Enhancing Pseudo Label Quality for Semi-supervised Domain-Generalized Medical Image Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3099–107. http://dx.doi.org/10.1609/aaai.v36i3.20217.

Abstract:

Generalizing the medical image segmentation algorithms to unseen domains is an important research topic for computer-aided diagnosis and surgery. Most existing methods require a fully labeled dataset in each source domain. Although some researchers developed a semi-supervised domain generalized method, it still requires the domain labels. This paper presents a novel confidence-aware cross pseudo supervision algorithm for semi-supervised domain generalized medical image segmentation. The main goal is to enhance the pseudo label quality for unlabeled images from unknown distributions. To achieve this, we perform the Fourier transformation to learn low-level statistical information across domains and augment the images to incorporate cross-domain information. With these augmentations as perturbations, we feed the input to a confidence-aware cross pseudo supervision network to measure the variance of pseudo labels and regularize the network to learn with more confident pseudo labels. Our method sets new records on public datasets, i.e., M&Ms and SCGM. Notably, without using domain labels, our method surpasses the prior art that even uses domain labels by 11.67% on Dice on M&Ms dataset with 2% labeled data. Code is available at https://github.com/XMed-Lab/EPL_SemiDG.

23

Xiu, Supu, Yuanqiao Wen, Haiwen Yuan, Changshi Xiao, Wenqiang Zhan, Xiong Zou, Chunhui Zhou, and Sayed Chhattan Shah. "A Multi-Feature and Multi-Level Matching Algorithm Using Aerial Image and AIS for Vessel Identification." Sensors 19, no. 6 (March 15, 2019): 1317. http://dx.doi.org/10.3390/s19061317.

Abstract:

In order to monitor and manage vessels in channels effectively, identification and tracking are necessary. This work developed a maritime unmanned aerial vehicle (Mar-UAV) system equipped with a high-resolution camera and an Automatic Identification System (AIS). A multi-feature and multi-level matching algorithm using the spatiotemporal characteristics of aerial images and AIS information was proposed to detect and identify field vessels. Specifically, multi-feature information, including position, scale, heading, speed, etc., is used to match real-time images with AIS messages. Additionally, the matching algorithm is divided into two levels, point matching and trajectory matching, for the accurate identification of surface vessels. Through such a matching algorithm, the Mar-UAV system is able to automatically identify the vessel’s vision, which improves the autonomy of the UAV in maritime tasks. The multi-feature and multi-level matching algorithm has been employed in the developed Mar-UAV system, and field experiments have been carried out on the Yangzi River. The results indicated that the proposed matching algorithm and the Mar-UAV system are very significant for achieving autonomous maritime supervision.

24

Li, Jiaxuan, Yiyang Liu, and Hao Wang. "Safety Supervision Method of Power Work Site Based on Computer Machine Learning and Image Recognition." Journal of Physics: Conference Series 2074, no. 1 (November 1, 2021): 012021. http://dx.doi.org/10.1088/1742-6596/2074/1/012021.

Abstract:

China’s traditional power system has been unable to meet the needs of society and the development of the times. Against the background of intelligent technology, it is necessary to reform the power industry and increase the application of mobile application technology in the power system, so as to realize the precise management of the power system. The application of mobile application technology in the field operation of electric power construction, based on computer machine learning and image recognition, helps to realize the sustainable development of electric power enterprises, improve their service level, and promote on-site safety supervision.

25

Wang, Sherrie, William Chen, Sang Michael Xie, George Azzari, and David B. Lobell. "Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery." Remote Sensing 12, no. 2 (January 7, 2020): 207. http://dx.doi.org/10.3390/rs12020207.

Abstract:

Accurate automated segmentation of remote sensing data could benefit applications from land cover mapping and agricultural monitoring to urban development surveyal and disaster damage assessment. While convolutional neural networks (CNNs) achieve state-of-the-art accuracy when segmenting natural images with huge labeled datasets, their successful translation to remote sensing tasks has been limited by low quantities of ground truth labels, especially fully segmented ones, in the remote sensing domain. In this work, we perform cropland segmentation using two types of labels commonly found in remote sensing datasets that can be considered sources of “weak supervision”: (1) labels comprised of single geotagged points and (2) image-level labels. We demonstrate that (1) a U-Net trained on a single labeled pixel per image and (2) a U-Net image classifier transferred to segmentation can outperform pixel-level algorithms such as logistic regression, support vector machine, and random forest. While the high performance of neural networks is well-established for large datasets, our experiments indicate that U-Nets trained on weak labels outperform baseline methods with as few as 100 labels. Neural networks, therefore, can combine superior classification performance with efficient label usage, and allow pixel-level labels to be obtained from image labels.

26

Aberra, Tsige GebreMeskel, and Mogamat Noor Davids. "Open Distance and e-Learning: Ethiopian Doctoral Students’ Satisfaction with Support Services." International Review of Research in Open and Distributed Learning 23, no. 4 (November 1, 2022): 147–69. http://dx.doi.org/10.19173/irrodl.v23i4.6193.

Abstract:

This study assessed students’ level of satisfaction with the quality of student support services provided by an open distance e-learning (ODeL) university in Ethiopia. The target population was doctoral students who had been registered at the ODeL university for more than a year. To conduct a quantitative investigation, data were collected by means of a 34-item six-dimensional standardized questionnaire. Data analysis methods included linear as well as stepwise regressions. Using the gaps model as the theoretical framework, findings showed that the doctoral students were dissatisfied with four aspects of the student support services, namely supervision support, infrastructure, administrative support, and academic facilitation. In contrast, students were satisfied with the corporate image (reputation) of the ODeL university. For this ODeL university to play an effective role that coheres with the country’s socio-economic development plan, more attention should be given to the provision of supervision support, as there was strong dissatisfaction with this. The university could also build on or leverage aspects of their corporate image, for which there was strong satisfaction. Doing so will help the university make ongoing contributions and strengthen its commitment to the field of higher education and human capacity development in Ethiopia.

27

Sun, Ruizhou, Yukun Su, and Qingyao Wu. "DENet: Disentangled Embedding Network for Visible Watermark Removal." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 2411–19. http://dx.doi.org/10.1609/aaai.v37i2.25337.

Abstract:

Adding a visible watermark to an image is a common copyright protection method for media. Meanwhile, public research on watermark removal can be utilized as an adversarial technology to help the further development of watermarking. Existing watermark removal methods mainly adopt multi-task learning networks, which locate the watermark and restore the background simultaneously. However, these approaches view the task as an image-to-image reconstruction problem, where they only impose supervision after the final output, making the high-level semantic features shared between different tasks. To this end, inspired by the two-stage coarse-refinement network, we propose a novel contrastive learning mechanism to disentangle the high-level embedding semantic information of the images and watermarks, driving each network branch to be more task-oriented. Specifically, the proposed mechanism is leveraged for watermark image decomposition, which aims to decouple the clean image and watermark hints in the high-level embedding space. This ensures that the learned representation of the restored image carries more task-specific cues. In addition, we introduce a self-attention-based enhancement module, which promotes the network's ability to capture semantic information among different regions, leading to further improvement on the contrastive learning mechanism. To validate the effectiveness of our proposed method, extensive experiments are conducted on different challenging benchmarks. Experimental evaluations show that our approach can achieve state-of-the-art performance and yield high-quality images. The code is available at: https://github.com/lianchengmingjue/DENet.

28

Pan, Tao, Jiaqin Jiang, Jian Yao, Bin Wang, and Bin Tan. "A Novel Multi-Focus Image Fusion Network with U-Shape Structure." Sensors 20, no. 14 (July 13, 2020): 3901. http://dx.doi.org/10.3390/s20143901.

Abstract:

Multi-focus image fusion has become a very practical image processing task. It uses multiple images focused on various depth planes to create an all-in-focus image. Although extensive studies have been produced, the performance of existing methods is still limited by the inaccurate detection of the focus regions for fusion. Therefore, in this paper, we propose a novel U-shape network which can generate an accurate decision map for multi-focus image fusion. The Siamese encoder of our U-shape network can preserve the low-level cues with rich spatial details and high-level semantic information from the source images separately. Moreover, we introduce ResBlocks to expand the receptive field, which can enhance the ability of our network to distinguish between focus and defocus regions. In addition, in the bridge stage between the encoder and decoder, spatial pyramid pooling is adopted as a global perception fusion module to capture sufficient context information for the learning of the decision map. Finally, we use a hybrid loss that combines the binary cross-entropy loss and the structural similarity loss for supervision. Extensive experiments have demonstrated that the proposed method can achieve state-of-the-art performance.
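
The hybrid loss named in the abstract above combines binary cross-entropy with a structural similarity term, but the abstract gives neither the weighting nor the SSIM window. The sketch below is a simplified illustration under stated assumptions: the decision map is flattened into a list of probabilities, SSIM is computed once over the whole map rather than over local windows, and `ssim_weight` is a hypothetical parameter.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole map (a simplification of windowed SSIM)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def hybrid_loss(pred, target, ssim_weight=1.0):
    """BCE plus a weighted (1 - SSIM) term; the weighting is an assumption."""
    return binary_cross_entropy(pred, target) + ssim_weight * (
        1.0 - global_ssim(pred, target)
    )
```

The cross-entropy term scores each pixel independently, while the SSIM term rewards agreement in overall structure, which is the usual motivation for pairing the two.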

29

Han, Yanling, Lihua Huang, Zhonghua Hong, Shouqi Cao, Yun Zhang, and Jing Wang. "Deep Supervised Residual Dense Network for Underwater Image Enhancement." Sensors 21, no. 9 (May 10, 2021): 3289. http://dx.doi.org/10.3390/s21093289.

Abstract:

Underwater images are important carriers and forms of underwater information, playing a vital role in exploring and utilizing marine resources. However, underwater images have characteristics of low contrast and blurred details because of the absorption and scattering of light. In recent years, deep learning has been widely used in underwater image enhancement and restoration because of its powerful feature learning capabilities, but there are still shortcomings in detail enhancement. To address the problem, this paper proposes a deep supervised residual dense network (DS_RD_Net), which is used to better learn the mapping relationship between clear in-air images and synthetic underwater degraded images. DS_RD_Net first uses residual dense blocks to extract features to enhance feature utilization; then, it adds residual path blocks between the encoder and decoder to reduce the semantic differences between the low-level features and high-level features; finally, it employs a deep supervision mechanism to guide network training to improve gradient propagation. Experimental results (PSNR was 36.2, SSIM was 96.5%, and UCIQE was 0.53) demonstrated that the proposed method can fully retain the local details of the image while performing color restoration and defogging compared with other image enhancement methods, achieving good qualitative and quantitative effects.

30

Yan, Senbo, Xiaowen Song, and Guocong Liu. "Deeper and Mixed Supervision for Salient Object Detection in Automated Surface Inspection." Mathematical Problems in Engineering 2020 (February 25, 2020): 1–12. http://dx.doi.org/10.1155/2020/3751053.

Abstract:

In recent years, salient object detection has been widely applied in many industrial visual inspection tasks. Automated surface inspection (ASI) can be regarded as one of the most challenging tasks in computer vision because of its high cost of data acquisition, serious imbalance of test samples, and high real-time requirement. Inspired by the requirements of industrial ASI and the methods of salient object detection (SOD), a task mode of defect type classification plus defect area segmentation and a novel deeper and mixed supervision network (DMS) architecture are proposed. The backbone network ResNeXt-101 was pretrained on ImageNet. Firstly, we extract five multiscale feature maps from the backbone and concatenate them layer by layer. In addition, to obtain the classification prediction and saliency maps in one stage, the image-level and pixel-level ground truths are trained in the same side output network. A supervision signal is imposed on each side layer to realize deeper and mixed training for the network. Furthermore, the DMS network is equipped with a residual refinement mechanism to refine the saliency maps of input images. We evaluate the DMS network on 4 open access ASI datasets and compare it with 20 other methods, which indicates that mixed supervision can significantly improve the accuracy of saliency segmentation. Experimental results show that the proposed method can achieve state-of-the-art performance.

31

Rudianto, Rudianto, and Eko Budi Setiawan. "Sistem Pengawasan Aktifitas Penggunaan Smartphone Android." Jurnal ULTIMA InfoSys 9, no. 1 (July 6, 2018): 24–31. http://dx.doi.org/10.31937/si.v9i1.839.

Abstract:

The availability of the Application Programming Interface (API) for third-party applications on Android devices provides an opportunity for Android devices to monitor one another. This is used to create an application that helps parents supervise their children through the Android devices they own. In this study, features are added to classify image content on Android devices with respect to negative content, using the Clarifai API. The result of this research is a system that reports the image files contained on the target smartphone and can delete them, receives browser history reports whose entries can be visited directly in the application, and receives reports of the child's location, with the child directly contactable via the application. This application works well on Android Lollipop (API Level 22). Index Terms: Application Programming Interface (API), Monitoring, Negative Content, Children, Parent.

32

Chen, Feng, Qinghua Xing, Bing Sun, Xuehu Yan, and Huan Lu. "A Novel RDA-Based Network to Conceal Image Data and Prevent Information Leakage." Mathematics 10, no. 19 (September 26, 2022): 3501. http://dx.doi.org/10.3390/math10193501.

Abstract:

Image data play an important role in our daily lives, and scholars have recently leveraged deep learning to design steganography networks to conceal and protect image data. However, the complexity of computation and the running speed have been neglected in their model designs, and steganography security still has much room for improvement. For this purpose, this paper proposes an RDA-based network, which can achieve higher security with lower computation complexity and faster running speed. To improve the hidden image’s quality and ensure that the hidden image and cover image are as similar as possible, a residual dense attention (RDA) module was designed to extract significant information from the cover image, thus assisting in reconstructing the salient target of the hidden image. In addition, we propose an activation removal strategy (ARS) to avoid undermining the fidelity of low-level features and to preserve more of the raw information from the input cover image and the secret image, which significantly boosts the concealing and revealing performance. Furthermore, to enable comprehensive supervision for the concealing and revealing processes, a mixed loss function was designed, which effectively improved the hidden image’s visual quality and enhanced the imperceptibility of secret content. Extensive experiments were conducted to verify the effectiveness and superiority of the proposed approach.

33

Chen, Suting, Chaoqun Wu, Mithun Mukherjee, and Yujie Zheng. "HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Segmentation." ISPRS International Journal of Geo-Information 10, no. 10 (October 4, 2021): 672. http://dx.doi.org/10.3390/ijgi10100672.

Abstract:

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the richer spatial information in the RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data with 2D appearance is beneficial to distinguish the pixels’ category. However, most of them require height maps as additional inputs, which severely limits their applications. To alleviate the above issues, we propose a height aware-multi path parallel network (HA-MPPNet). Our proposed MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path for preserving detailed image information. Afterward, gated high-low level feature fusion is utilized to complement the lack of low-level semantics. Then, we designed the height feature decode branch to learn the height features under the supervision of digital surface model (DSM) images and used the learned embeddings to improve semantic context by height feature guide propagation. Note that our module does not need a DSM image as additional input after training and is end-to-end. Our method outperformed other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.

34

Gimenez, Alba. "Distance and Implication." Film Studies 22, no. 1 (May 1, 2020): 112–28. http://dx.doi.org/10.7227/fs.22.0008.

Abstract:

Harun Farocki’s Eye/Machine (2003) is a video installation which analyses how what Farocki calls ‘the operational image’ reconfigures our visual regimes. The ‘operational image’ allows machines to operate ever more autonomously and to perform their tasks with no need for human supervision. Farocki links the birth of such operational images to the missiles with integrated cameras used during the Gulf War (1991) and therefore to military purposes. Eye/Machine poses a paradox: operational images generate a process of abstraction in which the image depicted (in the case of the war, the battlefield) gets detached from its indexical dimension, appearing as abstract and unreal. However, such detachment can be reversed when these images are recontextualised and reframed within an exhibition space, since that places them within a human experiential framework. Images, and our perception of them, are part of what Judith Butler calls the ‘extended materiality of war’. Thus, war is not only fought in the battlefield, but also at the level of the senses.

35

Yao, Linli, Weiying Wang, and Qin Jin. "Image Difference Captioning with Pre-training and Contrastive Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3108–16. http://dx.doi.org/10.1609/aaai.v36i3.20218.

Abstract:

The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences that require learning stronger vision and language association and 2) high-cost of manual annotations that leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy to utilize extra cross-task supervision information, such as data for fine-grained image classification, to alleviate the limitation of available supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The codes and models will be released at https://github.com/yaolinli/IDC.

36

Zhang, Bingfeng, Jimin Xiao, Yunchao Wei, Mingjie Sun, and Kaizhu Huang. "Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12765–72. http://dx.doi.org/10.1609/aaai.v34i07.6971.

Abstract:

Weakly supervised semantic segmentation is a challenging task as it only takes image-level information as supervision for training but produces pixel-level predictions for testing. To address such a challenging task, most recent state-of-the-art approaches propose to adopt two-step solutions, i.e. 1) learn to generate pseudo pixel-level masks, and 2) engage FCNs to train the semantic segmentation networks with the pseudo masks. However, the two-step solutions usually employ many bells and whistles in producing high-quality pseudo masks, making such methods complicated and inelegant. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network to learn to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. Such reliable regions then directly serve as ground-truth labels for the parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with two-step state-of-the-art methods. By extending our one-step method to two-step, we get a new state-of-the-art performance on the Pascal VOC (val: 66.3, test: 66.5).
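
The pruning step described above, which turns class activation maps into small but confident object and background regions, can be illustrated with a simple two-threshold rule. This is a hedged sketch only: the thresholds `high` and `low`, the `ignore` label value, and the assumption that the map is already normalized to [0, 1] are choices made for this example, and the abstract does not say how the paper actually selects reliable regions.

```python
def select_reliable_regions(cam, high=0.7, low=0.1, ignore=255):
    """Convert a normalized class activation map into sparse pseudo labels.

    Pixels with activation >= `high` are marked as object (1), pixels with
    activation <= `low` as background (0), and everything in between is
    marked with `ignore` so that it contributes no supervision.
    """
    labels = []
    for row in cam:
        labels.append(
            [1 if v >= high else 0 if v <= low else ignore for v in row]
        )
    return labels


# Usage: only the clearly high and clearly low activations receive labels.
pseudo = select_reliable_regions([[0.9, 0.5], [0.05, 0.7]])
```

Leaving uncertain pixels unlabeled trades coverage for label precision, which is why a loss that can propagate information to unlabeled pixels is needed alongside it.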

37

Cui, Yuanhao, Fang Liu, Xu Liu, Lingling Li, and Xiaoxue Qian. "TCSPANet: Two-Staged Contrastive Learning and Sub-Patch Attention Based Network for PolSAR Image Classification." Remote Sensing 14, no. 10 (May 20, 2022): 2451. http://dx.doi.org/10.3390/rs14102451.

Abstract:

Polarimetric synthetic aperture radar (PolSAR) image classification has achieved great progress, but there still exist some obstacles. On the one hand, a large amount of PolSAR data is captured, yet most of it is not labeled with land cover categories and therefore cannot be fully utilized. On the other hand, annotating PolSAR images relies more on domain knowledge and manpower, which makes pixel-level annotation harder. To alleviate the above problems, by integrating contrastive learning and transformer, we propose a novel patch-level PolSAR image classification method, i.e., two-staged contrastive learning and sub-patch attention based network (TCSPANet). Firstly, the two-staged contrastive learning based network (TCNet) is designed for learning the representation information of PolSAR images without supervision, and obtaining the discrimination and comparability for actual land covers. Then, resorting to transformer, we construct the sub-patch attention encoder (SPAE) for modelling the context within patch samples. For training the TCSPANet, two patch-level datasets are built up based on unsupervised and semi-supervised methods. When predicting, the classification algorithm, classifying or splitting, is put forward to realise non-overlapping and coarse-to-fine patch-level classification. The classification results of multi-PolSAR images with one trained model suggest that our proposed model is superior to the compared methods.

38

Mohammadi, M., F. Tabib Mahmoudi, and M. Hedayatifard. "AUTOMATIC VEHICLE RECOGNITION FOR URBAN TRAFFIC MANAGEMENT." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W18 (October 18, 2019): 741–44. http://dx.doi.org/10.5194/isprs-archives-xlii-4-w18-741-2019.

Abstract:

Automatic vehicle recognition plays an important role in many applications such as supervision, traffic management and rescue tasks. Online supervision of the distribution of vehicles in urban environments helps prevent traffic congestion, which in turn reduces air pollution and noise. However, this is extremely challenging due to the small size of vehicles, their different types and orientations, and the visual similarity to some other objects in very high resolution images. In this paper, an automatic vehicle recognition algorithm is proposed based on very high spatial resolution aerial images. In the first step of the proposed method, by generating the image pyramid, the candidate regions of the vehicles are recognized. Then, by traversing the pyramid in reverse, decision-level fusion of the vehicle candidates and the land use/cover classification results at the original image resolution is performed in order to refine the recognized vehicle regions. For evaluating the performance of the proposed method in this study, Ultracam aerial imagery with a spatial resolution of 11 cm and 3 spectral bands has been used. Comparing the vehicle recognition results obtained from the proposed decision fusion algorithm with some manually selected vehicle regions confirms an accuracy of about 80%. Moreover, the overall accuracy and Kappa coefficient of the land use/cover classification map obtained from the decision fusion algorithm are 78.87% and 0.71, respectively.

39

Su, Yukun, Guosheng Lin, Yun Hao, Yiwen Cao, Wenjun Wang, and Qingyao Wu. "Self-Supervised Object Localization with Joint Graph Partition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2289–97. http://dx.doi.org/10.1609/aaai.v36i2.20127.

Abstract:

Object localization aims to generate a tight bounding box for the target object, which is a challenging problem that has been deeply studied in recent years. Since collecting bounding-box labels is time-consuming and laborious, many researchers focus on weakly supervised object localization (WSOL). As the recent appealing self-supervised learning technique shows its powerful function in visual tasks, in this paper, we make an early attempt to explore unsupervised object localization by self-supervision. Specifically, we apply different geometric transformations to images and utilize their parameters as pseudo labels for self-supervised learning. Then, the class-agnostic activation map (CAAM) is used to highlight the potential regions of the target object. However, such attention maps merely focus on the most discriminative part of the objects, which will affect the quality of the predicted bounding box. Based on the motivation that the activation maps of different transformations of the same image should be equivariant, we further design a siamese network that encodes the paired images and propose a joint graph cluster partition mechanism in an unsupervised manner to enhance the object co-occurrent regions. To validate the effectiveness of the proposed method, extensive experiments are conducted on CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets. Experimental results show that our method outperforms state-of-the-art methods using the same level of supervision, and even outperforms some weakly-supervised methods.
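
The abstract above uses the parameters of geometric transformations as pseudo labels, without listing which transformations are applied. As one assumed instance, the sketch below generates the four 90-degree rotations of an image and labels each with its rotation index; the function names and the choice of rotation as the transformation are illustrative and not taken from the paper.

```python
def rotate90_clockwise(image):
    """Rotate a 2D list (rows of pixel values) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_samples(image):
    """Return (rotated_image, k) pairs, where k counts clockwise quarter turns.

    The index k plays the role of a free pseudo label: a network trained to
    predict k from the rotated image needs no human annotation.
    """
    samples = []
    current = [list(row) for row in image]
    for k in range(4):
        samples.append((current, k))
        current = rotate90_clockwise(current)
    return samples


# Usage: four training samples produced from one unlabeled image.
samples = make_rotation_samples([[1, 2], [3, 4]])
```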

40

Liu, Sheng, Kevin Lin, Lijuan Wang, Junsong Yuan, and Zicheng Liu. "OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1773–81. http://dx.doi.org/10.1609/aaai.v36i2.20070.

Abstract:

We introduce the task of open-vocabulary visual instance search (OVIS). Given an arbitrary textual search query, Open-vocabulary Visual Instance Search (OVIS) aims to return a ranked list of visual instances, i.e., image patches, that satisfy the search intent from an image database. The term "open vocabulary" means that there are neither restrictions to the visual instance to be searched nor restrictions to the word that can be used to compose the textual search query. We propose to address such a search challenge via visual-semantic aligned representation learning (ViSA). ViSA leverages massive image-caption pairs as weak image-level (not instance-level) supervision to learn a rich cross-modal semantic space where the representations of visual instances (not images) and those of textual queries are aligned, thus allowing us to measure the similarities between any visual instance and an arbitrary textual query. To evaluate the performance of ViSA, we build two datasets named OVIS40 and OVIS1600 and also introduce a pipeline for error analysis. Through extensive experiments on the two datasets, we demonstrate ViSA's ability to search for visual instances in images not available during training given a wide range of textual queries including those composed of uncommon words. Experimental results show that ViSA achieves an mAP@50 of 27.8% on OVIS40 and achieves a recall@30 of 21.3% on OVIS1400 dataset under the most challenging settings.

41

Zhang, Zhuang, and Wenjie Luo. "Hierarchical volumetric transformer with comprehensive attention for medical image segmentation." Mathematical Biosciences and Engineering 20, no. 2 (2022): 3177–90. http://dx.doi.org/10.3934/mbe.2023149.

Abstract:

Transformer is widely used in medical image segmentation tasks due to its powerful ability to model global dependencies. However, most of the existing transformer-based methods are two-dimensional networks, which are only suitable for processing two-dimensional slices and ignore the linguistic association between different slices of the original volume image blocks. To solve this problem, we propose a novel segmentation framework by deeply exploring the respective characteristic of convolution, comprehensive attention mechanism, and transformer, and assembling them hierarchically to fully exploit their complementary advantages. Specifically, we first propose a novel volumetric transformer block to help extract features serially in the encoder and restore the feature map resolution to the original level in parallel in the decoder. It can not only obtain the information of the plane, but also make full use of the correlation information between different slices. Then the local multi-channel attention block is proposed to adaptively enhance the effective features of the encoder branch at the channel level, while suppressing the invalid features. Finally, the global multi-scale attention block with deep supervision is introduced to adaptively extract valid information at different scale levels while filtering out useless information. Extensive experiments demonstrate that our proposed method achieves promising performance on multi-organ CT and cardiac MR image segmentation.

42

Wu, Meiyu. "Analysis of the Application Effect of the Nursing Quality Control System in Hospital Nursing." Journal of Nursing 4, no. 2 (June 29, 2015): 24. http://dx.doi.org/10.18686/jn.v4i2.7.

Abstract:

Nursing quality is one of the important parts of the hospital service level as high-quality nursing can not only improve the hospital service level, but also promote the hospital image and perfect the relationship between doctors and patients. Nursing quality control system is a set of standard system which is set up to guarantee the nursing quality because the scientific nursing control system can shorten managing practice, improve the effect of nursing, and lower the operating costs for hospitals as well. Aiming at the problems existing in the modern nursing control system in China, the paper makes an analysis and a summary so as to perfect the nursing quality control system in our country and improve the nursing level by way of standardizing managing, setting up a scientific supervision system and perfecting the nursing team.

43

Ochoa, Joan, Emilio García, Eduardo Quiles, and Antonio Correcher. "Redundant Fault Diagnosis for Photovoltaic Systems Based on an IRT Low-Cost Sensor." Sensors 23, no. 3 (January 24, 2023): 1314. http://dx.doi.org/10.3390/s23031314.

Abstract:

In large solar farms, supervision is an exhaustive task, often carried out manually by field technicians. Over time, automated or semi-automated fault detection and prevention methods in large photovoltaic plants are becoming increasingly common. The same does not apply when talking about small or medium-sized installations, where the cost of supervision at such level would mean total economic infeasibility. Although there are prevention protocols by suppliers, periodic inspections of the facilities by technicians do not ensure that faults such as the appearance of hot-spots are detected in time. That is why, nowadays, the only way of continuous supervision of a small or medium installation is often carried out by unqualified people and in a purely visual way. In this work, the development of a low-cost system prototype is proposed for the supervision of a medium or small photovoltaic installation based on the acquisition and treatment of thermographic images, with the aim of investigating the feasibility of an actual implementation. The work focuses on the system’s ability to detect hot-spots in supervised panels and successfully report detected faults. To achieve this goal, a low-cost thermal imaging camera is used for development, applying common image processing techniques, operating with OpenCV and MATLAB R2021b libraries. In this way, it is possible to demonstrate that it is achievable to successfully detect the hottest points of a photovoltaic (PV) installation with a much cheaper camera than the cameras used in today’s thermographic inspections, opening up the possibilities of creating a fully developed low-cost thermographic surveillance system.

44

Feng, Jiangfan, Dini Wang, and Li Zhang. "Crowd Anomaly Detection via Spatial Constraints and Meaningful Perturbation." ISPRS International Journal of Geo-Information 11, no. 3 (March 18, 2022): 205. http://dx.doi.org/10.3390/ijgi11030205.

Abstract:

Crowd anomaly detection is a practical and challenging problem to computer vision and VideoGIS due to abnormal events’ rare and diverse nature. Consequently, traditional methods rely on low-level reconstruction in a single image space, easily affected by unimportant pixels or sudden variations. In addition, real-time detection for crowd anomaly detection is challenging, and localization of anomalies requires other supervision. We present a new detection approach to learn spatiotemporal features with the spatial constraints of a still dynamic image. First, a lightweight spatiotemporal autoencoder has been proposed, capable of real-time image reconstruction. Second, we offer a dynamic network to obtain a compact representation of video frames in motion, reducing false-positive anomaly alerts by spatial constraints. In addition, we adopt the perturbation visual interpretation method for anomaly visualization and localization to improve the credibility of the results. In experiments, our results provide competitive performance across various scenarios. Besides, our approach can process 52.9–63.4 fps in anomaly detection, making it practical for crowd anomaly detection in video surveillance.

45

Zhang, Sujie, Ming Deng, and Xiaoyuan Xie. "Real-time recognition of weld defects based on visible spectral image and machine learning." MATEC Web of Conferences 355 (2022): 03014. http://dx.doi.org/10.1051/matecconf/202235503014.

Abstract:

The quality of Tungsten Inert Gas welding depends on human supervision, which is not suitable for automation. This study designed a model for assessing tungsten inert gas welding quality with the potential for real-time application. The model used the K-Nearest Neighbor (KNN) algorithm, paired with images in the visible spectrum captured by a high dynamic range camera. First, the images of weld defects in the training set were projected into a two-dimensional space using multidimensional scaling (MDS), so that similar weld defects were aggregated into blocks and distributed in hash, with some overlap among different weld defects. Second, classification models including KNN, CNN, SVM, CART and NB were established to classify and recognize the weld defect images. The results show that the KNN model is the best, with a recognition accuracy of 98% and an average time of 33 ms to recognize a single image, and that it is suitable for common hardware devices. It can be applied to the image recognition system of an automatic welding robot to improve the robot's level of intelligence.

46

Dandi, Yatin, Homanga Bharadhwaj, Abhishek Kumar, and Piyush Rai. "Generalized Adversarially Learned Inference." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7185–92. http://dx.doi.org/10.1609/aaai.v35i8.16883.

Abstract:

Allowing effective inference of latent vectors while training GANs can greatly increase their applicability in various downstream tasks. Recent approaches, such as ALI and BiGAN frameworks, develop methods of inference of latent variables in GANs by adversarially training an image generator along with an encoder to match two joint distributions of image and latent vector pairs. We generalize these approaches to incorporate multiple layers of feedback on reconstructions, self-supervision, and other forms of supervision based on prior or learned knowledge about the desired solutions. We achieve this by modifying the discriminator's objective to correctly identify more than two joint distributions of tuples of an arbitrary number of random variables consisting of images, latent vectors, and other variables generated through auxiliary tasks, such as reconstruction and inpainting or as outputs of suitable pre-trained models. We design a non-saturating maximization objective for the generator-encoder pair and prove that the resulting adversarial game corresponds to a global optimum that simultaneously matches all the distributions. Within our proposed framework, we introduce a novel set of techniques for providing self-supervised feedback to the model based on properties, such as patch-level correspondence and cycle consistency of reconstructions. Through comprehensive experiments, we demonstrate the efficacy, scalability, and flexibility of the proposed approach for a variety of tasks. The appendix of the paper can be found at the following link: https://drive.google.com/file/d/1i99e682CqYWMEDXlnqkqrctGLVA9viiz/view?usp=sharing

47

Wang, Chong, Zheng-Jun Zha, Dong Liu, and Hongtao Xie. "Robust Deep Co-Saliency Detection with Group Semantic." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8917–24. http://dx.doi.org/10.1609/aaai.v33i01.33018917.

Abstract:

High-level semantic knowledge in addition to low-level visual cues is essentially crucial for co-saliency detection. This paper proposes a novel end-to-end deep learning approach for robust co-saliency detection by simultaneously learning high-level group-wise semantic representation as well as deep visual features of a given image group. The inter-image interaction at semantic-level as well as the complementarity between group semantics and visual features are exploited to boost the inferring of co-salient regions. Specifically, the proposed approach consists of a co-category learning branch and a co-saliency detection branch. While the former is proposed to learn group-wise semantic vector using co-category association of an image group as supervision, the latter is to infer precise co-salient maps based on the ensemble of group semantic knowledge and deep visual cues. The group semantic vector is broadcasted to each spatial location of multi-scale visual feature maps and is used as a top-down semantic guidance for boosting the bottom-up inferring of co-saliency. The co-category learning and co-saliency detection branches are jointly optimized in a multi-task learning manner, further improving the robustness of the approach. Moreover, we construct a new large-scale co-saliency dataset COCO-SEG to facilitate research of co-saliency detection. Extensive experimental results on COCO-SEG and a widely used benchmark Cosal2015 have demonstrated the superiority of the proposed approach as compared to the state-of-the-art methods.

48

Shin, Jisu, Seunghyun Shin, and Hae-Gon Jeon. "Task-Specific Scene Structure Representations." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 2272–81. http://dx.doi.org/10.1609/aaai.v37i2.25322.

Abstract:

Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to obtain a concrete visual definition of the informative structures because influences of visual features are task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes. To do this, we first analyze traditional spectral clustering methods, which compute a set of eigenvectors to model a segmented graph forming small compact structures on image domains. We then unfold the traditional graph-partitioning problem into a learnable network, named Scene Structure Guidance Network (SSGNet), to represent the task-specific informative structures. The SSGNet yields a set of coefficients of eigenvectors that produces explicit feature representations of image structures. In addition, our SSGNet is light-weight (56K parameters), and can be used as a plug-and-play module for off-the-shelf architectures. We optimize the SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications including joint upsampling and image denoising. We also demonstrate that our SSGNet generalizes well on unseen datasets, compared to existing methods which use structural embedding frameworks. Our source codes are available at https://github.com/jsshin98/SSGNet.

49

Koval, I. "Medical and psychological paradigm supervision somatic patients with comorbid mental disorders." Fundamental and applied researches in practice of leading scientific schools 27, no. 3 (June 29, 2018): 46–56. http://dx.doi.org/10.33531/farplss.2018.3.06.

Abstract:

The thesis is devoted to the research of the medical and psychological paradigm of supervision of patients. The studies were performed on clinical material of patients with a wide range of somatic disorders. This reveals the conceptual foundations of medical activities as a complex medical and psycho-pedagogical process, which not only improves the efficiency of the professional activities of doctors, but also supports the formation of an individual image of the world in patients who achieve a level of personal development where disease and related restrictions do not interfere with their self-development based on the existing system of values and meanings. The paper provides a detailed analysis of the interaction of the three components of medical and psychological care for patients and their families: psychodiagnosis, psychoeducation and correction (psychotherapy). Presents a program of psychological diagnosis of somatic patients, including a study of the characteristics of emotional state and distress intrapsychological and behavioral patterns, strategies for overcoming stress behavior and family functioning determine the influence of psychic correction targets formulated the task of medical and psychological assistance and the amount psychoeducation classes. Concepts and practical application of medical and psychological training of general practitioners and medical internist at the stages of pre-tested and post-graduate training in preparation of interns, and the pre-cycles thematic improvement of doctors. Determined, medical, psychological and educational determinants doctor's practice, positive motivation of their activities; preparedness to common therapeutic activities; focus on subject-subject interaction; mastering the techniques of effective communication and more.

50

Wei, Ruifang, and Yukun Wu. "Image Inpainting via Context Discriminator and U-Net." Mathematical Problems in Engineering 2022 (May 6, 2022): 1–12. http://dx.doi.org/10.1155/2022/7328045.

Abstract:

Image inpainting is one of the research hotspots in the field of computer vision and image processing. Image inpainting methods based on deep learning models have made some achievements, but it is difficult to achieve ideal results when dealing with images in which global and local attributes are related. In particular, when repairing a large area of image defects, the semantic rationality, structural coherence, and detail accuracy of the results need to be improved. In view of these shortcomings, this study proposes an improved image inpainting model based on a fully convolutional neural network and a generative adversarial network. A novel image inpainting algorithm via network is proposed as a generator to repair the defect image, and structural similarity is introduced as the reconstruction loss of image inpainting to supervise and guide model learning from the perspective of the human visual system to improve the effect of image inpainting. The improved global and local context discriminator networks are used as context discriminators to judge the authenticity of the repair results. At the same time, combined with the adversarial loss, a joint loss is proposed for training the supervision model, which makes the content of the repaired area real and natural and consistent in attributes with the whole image. To verify the effectiveness of the proposed image inpainting model, its inpainting results are compared with those of current mainstream image inpainting algorithms on the CelebA-HQ dataset based on subjective and objective indicators. The experimental results show that the proposed method makes progress in semantic rationality, structural coherence, and detail accuracy. The proposed model has a better understanding of the high-level semantics of images and a more accurate grasp of context and detailed information.
