Academic literature on the topic 'Graph and Multi-view Memory Attention'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Graph and Multi-view Memory Attention.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Graph and Multi-view Memory Attention"

1. Ai, Bing, Yibing Wang, Liang Ji, Jia Yi, Ting Wang, Wentao Liu, and Hui Zhou. "A graph neural network fused with multi-head attention for text classification." Journal of Physics: Conference Series 2132, no. 1 (December 1, 2021): 012032. http://dx.doi.org/10.1088/1742-6596/2132/1/012032.

Abstract:
Because graph neural networks (GNNs) handle intricate structure and the fusion of global information well, research has explored GNN technology for text classification. However, earlier models that fixed the entire corpus as a single graph faced problems such as high memory consumption and the inability to modify the graph construction. We propose an improved GNN-based model to solve these problems. Instead of fixing the entire corpus as one graph, the model constructs a separate graph for each text. This method reduces memory consumption while still retaining global information. We conduct experiments on the R8, R52, and 20newsgroups data sets, using accuracy as the evaluation metric. The experiments show that, even while consuming less memory, our model achieves higher accuracy than existing models on multiple text classification data sets.
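To make the per-document graph idea concrete, here is a minimal PyTorch sketch; the sliding-window co-occurrence rule, dimensions, and single-layer mean read-out are illustrative assumptions, not the authors' implementation.

```python
# Sketch: build a word co-occurrence graph for ONE document and run a single
# GCN-style aggregation step. Window size and layer sizes are assumptions.
import torch

def build_doc_graph(token_ids, window=3):
    """Adjacency over the tokens of one document: sliding-window co-occurrence."""
    n = len(token_ids)
    adj = torch.eye(n)                           # self-loops
    for i in range(n):
        for j in range(i + 1, min(i + window, n)):
            adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(dim=1)                         # symmetric normalization D^-1/2 A D^-1/2
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

class DocGCNClassifier(torch.nn.Module):
    def __init__(self, vocab_size=100, embed_dim=64, num_classes=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, embed_dim)
        self.gcn = torch.nn.Linear(embed_dim, embed_dim)
        self.out = torch.nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, adj):
        h = self.embed(token_ids)                # (n_tokens, embed_dim)
        h = torch.relu(adj @ self.gcn(h))        # one graph-convolution step
        return self.out(h.mean(dim=0))           # mean read-out -> class logits

tokens = torch.tensor([4, 17, 9, 4, 23])         # toy token ids for one document
logits = DocGCNClassifier()(tokens, build_doc_graph(tokens.tolist()))
```
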
2. Liu, Di, Hui Xu, Jianzhong Wang, Yinghua Lu, Jun Kong, and Miao Qi. "Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition." Sensors 21, no. 20 (October 12, 2021): 6761. http://dx.doi.org/10.3390/s21206761.

Abstract:
Graph Convolutional Networks (GCNs) have attracted a lot of attention and shown remarkable performance for action recognition in recent years. For improving recognition accuracy, how to build the graph structure adaptively, select key frames, and extract discriminative features are the key problems of this kind of method. In this work, we propose novel Adaptive Attention Memory Graph Convolutional Networks (AAM-GCN) for human action recognition using skeleton data. We adopt GCN to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing the temporal features. With the memory module, our model can not only remember what happened in the past but also exploit future information using multi-bidirectional GRU layers. Furthermore, to extract discriminative temporal features, an attention mechanism is employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D and HDM05 datasets show that the proposed network achieves better performance than some state-of-the-art methods.
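The attention-enhanced recurrent memory described above can be illustrated with a small PyTorch sketch; the layer sizes and the single-layer attention scorer are assumptions for illustration, not the AAM-GCN configuration.

```python
# Sketch: a bidirectional GRU summarizes skeleton frames and an attention
# layer softly selects key frames before classification.
import torch

class AttentiveGRUMemory(torch.nn.Module):
    def __init__(self, feat_dim=75, hidden=128, num_classes=60):
        super().__init__()
        self.gru = torch.nn.GRU(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
        self.attn = torch.nn.Linear(2 * hidden, 1)      # frame-level scores
        self.cls = torch.nn.Linear(2 * hidden, num_classes)

    def forward(self, frames):                           # (B, T, feat_dim)
        h, _ = self.gru(frames)                          # (B, T, 2*hidden)
        alpha = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # (B, T)
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)   # weighted "memory" read
        return self.cls(context), alpha                  # logits + key-frame weights

x = torch.randn(2, 40, 75)        # 2 clips, 40 frames, 25 joints x 3 coords
logits, frame_weights = AttentiveGRUMemory()(x)
```
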
3. Feng, Aosong, Irene Li, Yuang Jiang, and Rex Ying. "Diffuser: Efficient Transformers with Multi-Hop Attention Diffusion for Long Sequences." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 12772–80. http://dx.doi.org/10.1609/aaai.v37i11.26502.

Abstract:
Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving the efficiency of Transformers by restricting self-attention to locations specified by the predefined sparse patterns. However, leveraging sparsity may sacrifice expressiveness compared to full-attention, when important token correlations are multiple hops away. To combine advantages of both the efficiency of sparse transformer and the expressiveness of full-attention Transformer, we propose Diffuser, a new state-of-the-art efficient Transformer. Diffuser incorporates all token interactions within one attention layer while maintaining low computation and memory costs. The key idea is to expand the receptive field of sparse attention using Attention Diffusion, which computes multi-hop token correlations based on all paths between corresponding disconnected tokens, besides attention among neighboring tokens. Theoretically, we show the expressiveness of Diffuser as a universal sequence approximator for sequence-to-sequence modeling, and investigate its ability to approximate full-attention by analyzing the graph expander property from the spectral perspective. Experimentally, we investigate the effectiveness of Diffuser with extensive evaluations, including language modeling, image modeling, and Long Range Arena (LRA). Evaluation results show that Diffuser achieves improvements by an average of 0.94% on text classification tasks and 2.30% on LRA, with 1.67x memory savings compared to state-of-the-art benchmarks, which demonstrates superior performance of Diffuser in both expressiveness and efficiency aspects.
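The multi-hop idea can be illustrated by diffusing a one-hop attention matrix over weighted matrix powers, a truncated personalized-PageRank-style series; the hop count and decay value below are assumptions for illustration, not Diffuser's exact formulation.

```python
# Sketch: expand a one-hop attention matrix into a multi-hop one,
# A_diff = sum_k alpha * (1 - alpha)^k * A^k, then attend with it.
import torch

def attention_diffusion(attn, num_hops=4, alpha=0.15):
    """attn: (T, T) row-stochastic one-hop attention -> multi-hop attention."""
    diffused = torch.zeros_like(attn)
    hop = torch.eye(attn.size(0))            # A^0
    weight_sum = 0.0
    for k in range(num_hops + 1):
        w = alpha * (1.0 - alpha) ** k
        diffused = diffused + w * hop
        weight_sum += w
        hop = hop @ attn                      # next power A^(k+1)
    return diffused / weight_sum              # renormalize the truncated series

one_hop = torch.softmax(torch.randn(6, 6), dim=-1)   # toy attention over 6 tokens
multi_hop = attention_diffusion(one_hop)
values = torch.randn(6, 32)
output = multi_hop @ values                            # attend with diffused weights
```
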
4. Li, Mingxiao, and Marie-Francine Moens. "Dynamic Key-Value Memory Enhanced Multi-Step Graph Reasoning for Knowledge-Based Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10983–92. http://dx.doi.org/10.1609/aaai.v36i10.21346.

Abstract:
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not present in the given image. It is not only a more challenging task than regular VQA but also a vital step towards building a general VQA system. Most existing knowledge-based VQA systems process knowledge and image information similarly and ignore the fact that the knowledge base (KB) contains complete information about a triplet, while the extracted image information might be incomplete because the relations between two objects may be missing or wrongly detected. In this paper, we propose a novel model named dynamic knowledge memory enhanced multi-step graph reasoning (DMMGR), which performs explicit and implicit reasoning over a key-value knowledge memory module and a spatial-aware image graph, respectively. Specifically, the memory module learns a dynamic knowledge representation and generates a knowledge-aware question representation at each reasoning step. Then, this representation is used to guide a graph attention operator over the spatial-aware image graph. Our model achieves new state-of-the-art accuracy on the KRVQR and FVQA datasets. We also conduct ablation experiments to prove the effectiveness of each component of the proposed model.
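A single key-value memory read of the kind described above can be sketched in a few lines of PyTorch; the slot count and dimensions are illustrative assumptions.

```python
# Sketch: the question representation attends over memory keys and returns a
# weighted sum of the values (one reasoning step).
import torch

def key_value_memory_read(question, keys, values):
    """question: (d,), keys/values: (num_slots, d) -> knowledge-aware vector (d,)."""
    scores = keys @ question                  # (num_slots,)
    alpha = torch.softmax(scores, dim=0)      # address the memory
    return alpha @ values                     # weighted read over the values

d, slots = 128, 20
question = torch.randn(d)
keys, values = torch.randn(slots, d), torch.randn(slots, d)
knowledge = key_value_memory_read(question, keys, values)
# In a multi-step reasoner, this read would update the question state and then
# guide an attention operator over the image graph.
```
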
5. Jung, Tae-Won, Chi-Seo Jeong, In-Seon Kim, Min-Su Yu, Soon-Chul Kwon, and Kye-Dong Jung. "Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud." Sensors 22, no. 21 (October 25, 2022): 8166. http://dx.doi.org/10.3390/s22218166.

Abstract:
Graph Neural Networks (GNNs) are neural networks that learn representations of nodes and of the edges that connect them to other nodes while maintaining the graph representation. Graph Convolutional Neural Networks (GCNs), as a representative method in GNNs, in the context of computer vision, utilize conventional Convolutional Neural Networks (CNNs) to process data supported by graphs. This paper proposes a one-stage GCN approach for 3D object detection and pose estimation by structuring non-linearly distributed points into a graph. Our network provides the required details to analyze, generate, and estimate bounding boxes by spatially structuring the input data into graphs. Our method proposes a keypoint attention mechanism that aggregates the relative features between points to estimate the category and pose of the object to which the vertices of the graph belong, and also performs nine-degrees-of-freedom multi-object pose estimation. In addition, to avoid gimbal lock in 3D space, we use quaternion rotation instead of Euler angles. Experimental results showed that memory usage and efficiency could be improved by aggregating point features from the point cloud and their neighbors in a graph structure. Overall, the system achieved comparable performance against state-of-the-art systems.
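As an aside on the quaternion choice, the sketch below (illustrative only, not the paper's code) shows how a unit quaternion converts directly to a rotation matrix, avoiding the chained Euler rotations whose middle axis can collapse into gimbal lock.

```python
# Sketch: unit quaternion (w, x, y, z) -> 3x3 rotation matrix.
import torch

def quaternion_to_rotation_matrix(q):
    """q: (4,) quaternion (w, x, y, z); returns a (3, 3) rotation matrix."""
    w, x, y, z = (q / q.norm()).tolist()      # normalize to a unit quaternion
    return torch.tensor([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

q = torch.tensor([0.92388, 0.0, 0.38268, 0.0])    # roughly 45 degrees about y
R = quaternion_to_rotation_matrix(q)
rotated = R @ torch.tensor([1.0, 0.0, 0.0])       # rotate a point on the x-axis
```
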
6. Cui, Wei, Fei Wang, Xin He, Dongyou Zhang, Xuxiang Xu, Meng Yao, Ziwei Wang, and Jiejun Huang. "Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model." Remote Sensing 11, no. 9 (May 2, 2019): 1044. http://dx.doi.org/10.3390/rs11091044.

Abstract:
A comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning has the advantages of high accuracy and strong generalizability regarding scene classification and semantic segmentation. However, it is difficult to simultaneously recognize remote sensing objects and their spatial relationship end-to-end relying only on present deep learning networks. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The architecture of the MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). The MSRIN recognizes remote sensing objects and their spatial relationship through three processes. First, the MSRIN defines a multi-scale remote sensing image caption strategy and simultaneously segments the same image using the FCN and U-Net on different spatial scales so that a two-scale hierarchy is formed. The outputs of the FCN and U-Net are masked to obtain the locations and boundaries of remote sensing objects. Second, an attention-based LSTM generates remote sensing image captions that include the remote sensing objects (nouns) and their spatial relationships described in natural language. Finally, we designed a remote sensing object recognition and correction mechanism that builds the relationship between nouns in captions and object mask graphs using an attention weight matrix to transfer the spatial relationship from captions to object mask graphs. In other words, the MSRIN simultaneously realizes the semantic segmentation of the remote sensing objects and the identification of their spatial relationships end-to-end. Experimental results demonstrated that the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points compared to before correction. The proposed MSRIN has achieved remarkable results.

7. Hou, Miaomiao, Xiaofeng Hu, Jitao Cai, Xinge Han, and Shuaiqi Yuan. "An Integrated Graph Model for Spatial–Temporal Urban Crime Prediction Based on Attention Mechanism." ISPRS International Journal of Geo-Information 11, no. 5 (April 30, 2022): 294. http://dx.doi.org/10.3390/ijgi11050294.

Abstract:
Crime issues have been attracting widespread attention from citizens and city managers due to their unexpected and massive consequences. As an effective technique for preventing and controlling urban crime, data-driven spatial–temporal crime prediction can provide reasonable estimations of crime hotspots. It thus contributes to the decision making of relevant departments under limited resources and promotes civilized urban development. However, performance in daily spatial–temporal crime prediction at the urban district scale, which plays a critical role in police resource allocation, still needs to be improved. To establish a practical and effective daily crime prediction framework at the urban police-district scale, an "online" integrated graph model is proposed. A residual neural network (ResNet), graph convolutional network (GCN), and long short-term memory (LSTM) are integrated with an attention mechanism in the proposed model to extract and fuse the spatial–temporal features, topological graphs, and external features. Then, the "online" integrated graph model is validated by daily theft and assault data within 22 police districts in the city of Chicago, US from 1 January 2015 to 7 January 2020. Additionally, several widely used baseline models, including autoregressive integrated moving average (ARIMA), ridge regression, support vector regression (SVR), random forest, extreme gradient boosting (XGBoost), LSTM, convolutional neural network (CNN), and Conv-LSTM models, are compared with the proposed model from a quantitative point of view using the same dataset. The results show that the spatial–temporal patterns predicted by the proposed model are close to the observations. Moreover, the integrated graph model performs more accurately since it has lower average values of the mean absolute error (MAE) and root mean square error (RMSE) than the other eight models. Therefore, the proposed model has great potential in supporting police decision making in the fields of patrolling and investigation, as well as resource allocation.
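The general pattern of combining graph convolution over districts with an LSTM over days can be sketched as follows; the graph size, feature dimension, toy adjacency, and single-step prediction head are assumptions, not the paper's integrated architecture.

```python
# Sketch: one graph-convolution per time step over the district graph, then an
# LSTM over the resulting sequence for each district.
import torch

class GCNLSTMForecaster(torch.nn.Module):
    def __init__(self, in_dim=4, hidden=32):
        super().__init__()
        self.gcn = torch.nn.Linear(in_dim, hidden)
        self.lstm = torch.nn.LSTM(hidden, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)       # next-day value per node

    def forward(self, x, adj):                        # x: (T, N, in_dim), adj: (N, N)
        spatial = torch.relu(adj @ self.gcn(x))       # graph conv at every time step
        spatial = spatial.permute(1, 0, 2)            # (N, T, hidden): one sequence per node
        temporal, _ = self.lstm(spatial)
        return self.head(temporal[:, -1, :])          # (N, 1) prediction

T, N, F = 14, 22, 4                                   # 14 days, 22 districts, 4 features
x = torch.randn(T, N, F)
adj = torch.softmax(torch.randn(N, N), dim=-1)        # toy normalized adjacency
pred = GCNLSTMForecaster()(x, adj)
```
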
8. Mi, Chunlei, Shifen Cheng, and Feng Lu. "Predicting Taxi-Calling Demands Using Multi-Feature and Residual Attention Graph Convolutional Long Short-Term Memory Networks." ISPRS International Journal of Geo-Information 11, no. 3 (March 9, 2022): 185. http://dx.doi.org/10.3390/ijgi11030185.

Abstract:
Predicting taxi-calling demands at the urban area level is vital to coordinating the supply–demand balance of the urban taxi system. Differing travel patterns, the impact of external data, and the expression of dynamic spatiotemporal demand dependence pose challenges to predicting demand. Here, a framework using residual attention graph convolutional long short-term memory networks (RAGCN-LSTMs) is proposed to predict taxi-calling demands. It consists of a spatial dependence (SD) extractor, which extracts SD features; an external dependence extractor, which extracts traffic environment-related features; a pattern dependence (PD) extractor, which extracts the PD of demands for different zones; and a temporal dependence extractor and predictor, which feeds the abovementioned features into an LSTM model to extract temporal dependence and predict demands. Experiments were conducted on taxi-calling records of Shanghai City. The results showed that the prediction accuracies of the RAGCN-LSTMs model were a mean absolute error of 0.8664, a root mean square error of 1.4965, and a symmetric mean absolute percentage error of 43.11%. It outperformed both classical time-series prediction methods and other deep learning models. Further, to illustrate the advantages of the proposed model, we investigated its prediction performance under various demand densities in multiple urban areas, demonstrating its robustness and superiority.

9. Karimanzira, Divas, Linda Ritzau, and Katharina Emde. "Catchment Area Multi-Streamflow Multiple Hours Ahead Forecast Based on Deep Learning." Transactions on Machine Learning and Artificial Intelligence 10, no. 5 (September 29, 2022): 15–29. http://dx.doi.org/10.14738/tmlai.105.13049.

Abstract:
Rainfall-runoff modeling is critical for flood prediction and decision making in disaster management. Deep learning methods have proven to be very useful in hydrological prediction. To increase their acceptance in the hydrological community, they must be physics-informed and show some interpretability. There are several ways this can be achieved, e.g., by learning from a fully trained hydrological model (which assumes such a model is available) or by using physics-informed data. In this work we developed a Graph Attention Network (GAT) with a learnable adjacency matrix coupled with a bi-directional gated temporal convolutional neural network (2DGAT-BiLSTM). Physics-informed data with spatial information from a Digital Elevation Model and geographical data are used to train it. Besides precipitation, evapotranspiration, and discharge, the model utilizes catchment area characteristics such as instantaneous slope, soil type, and drainage area. The method is compared to two recent deep learning architectures for streamflow prediction that also utilize all the spatial and temporal information in an integrated way. One, Graph Neural Rainfall-Runoff Models (GNRRM), uses time-series prediction on each node and a Graph Neural Network (GNN) to route the information to the target node; the other, STA-LSTM, is based on a spatial and temporal attention mechanism and Long Short-Term Memory (LSTM) for prediction. The methods were compared on their performance in predicting the flow at several points of a pilot catchment area. With average prediction NSE and KGE values of 0.995 and 0.981, respectively, for 2DGAT-BiLSTM, it could be shown that a graph attention mechanism and a learnable adjacency matrix for spatial information can boost model performance and robustness, bring interpretability, and, with the inclusion of domain knowledge, improve the acceptance of the models.

10. Wang, Changhai, Jiaxi Ren, and Hui Liang. "MSGraph: Modeling multi-scale K-line sequences with graph attention network for profitable indices recommendation." Electronic Research Archive 31, no. 5 (2023): 2626–50. http://dx.doi.org/10.3934/era.2023133.

Abstract:
Indices recommendation is a long-standing topic in stock market investment. Predicting the future trends of indices and ranking them based on the prediction results is the main scheme for indices recommendation. How to improve the forecasting performance is the central issue of this study. Inspired by the widely used trend-following investing strategy, indices' future trends are related not only to recent transaction data but also to long-term historical data. This article proposes MSGraph, which tries to improve the index ranking performance by modeling the correlations of short- and long-term historical embeddings with a graph attention network. The original minute-level transaction data is first synthesized into a series of K-line sequences with varying time scales. Each K-line sequence is input into a long short-term memory network (LSTM) to get the sequence embedding. Then, the embeddings for all indices with the same scale are fed into a graph convolutional network to achieve index aggregation. All the aggregated embeddings for the same index are input into a graph attention network to fuse the scale interactions. Finally, a fully connected network produces the index return ratio for the next day, and the recommended indices are obtained through ranking. In total, 60 indices in the Chinese stock market are selected as experimental data. The mean reciprocal rank, precision, accuracy and investment return ratio are used as evaluation metrics. The comparison results show that our method achieves state-of-the-art results in all evaluation metrics, and the ablation study also demonstrates that the combination of multiple scale K-lines facilitates the indices recommendation.
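A minimal single-head graph-attention layer, the kind of building block used for such scale and peer fusion, can be sketched as follows; the sizes and the toy adjacency are assumptions, not MSGraph's configuration.

```python
# Sketch: GAT-style attention restricted to connected nodes via adjacency masking.
import torch

class SimpleGraphAttention(torch.nn.Module):
    def __init__(self, in_dim=64, out_dim=64):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim, bias=False)
        self.attn_src = torch.nn.Linear(out_dim, 1, bias=False)
        self.attn_dst = torch.nn.Linear(out_dim, 1, bias=False)

    def forward(self, h, adj):                           # h: (N, in_dim), adj: (N, N) in {0, 1}
        z = self.proj(h)                                  # (N, out_dim)
        scores = self.attn_src(z) + self.attn_dst(z).T    # (N, N) pairwise logits
        scores = torch.nn.functional.leaky_relu(scores, 0.2)
        scores = scores.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(scores, dim=1)              # attention over neighbors
        return torch.relu(alpha @ z)                      # aggregated node embeddings

h = torch.randn(60, 64)                                   # e.g. 60 index embeddings
adj = ((torch.rand(60, 60) > 0.7).float() + torch.eye(60) > 0).float()  # toy graph, self-loops
fused = SimpleGraphAttention()(h, adj)
```
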

Book chapters on the topic "Graph and Multi-view Memory Attention"

1. Vijaikumar, M., Shirish Shevade, and M. Narasimha Murty. "GAMMA: A Graph and Multi-view Memory Attention Mechanism for Top-N Heterogeneous Recommendation." In Advances in Knowledge Discovery and Data Mining, 28–40. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-47426-3_3.

2. Chen, Junxin, Kuijie Lin, Xiang Chen, Xijun Wang, and Terng-Yin Hsu. "Location Recommendations Based on Multi-view Learning and Attention-Enhanced Graph Networks." In Big Data and Social Computing, 83–95. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-3925-1_5.

3. Song, Jie, Zhe Xue, Junping Du, Feifei Kou, Meiyu Liang, and Mingying Xu. "Multi-view Relevance Matching Model of Scientific Papers Based on Graph Convolutional Network and Attention Mechanism." In Artificial Intelligence, 724–34. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-93046-2_61.


Conference papers on the topic "Graph and Multi-view Memory Attention"

1. Han, Qilong, Dan Lu, and Rui Chen. "Fine-Grained Air Quality Inference via Multi-Channel Attention Model." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/346.

Abstract:
In this paper, we study the problem of fine-grained air quality inference that predicts the air quality level of any location from air quality readings of nearby monitoring stations. We point out the importance of explicitly modeling both static and dynamic spatial correlations, and consequently propose a novel multi-channel attention model (MCAM) that models static and dynamic spatial correlations as separate channels. The static channel combines the beauty of attention mechanisms and graph-based spatial modeling via an adapted bilateral filtering technique, which considers not only locations' Euclidean distances but also their similarity of geo-context features. The dynamic channel learns stations' time-dependent spatial influence on a target location at each time step via long short-term memory (LSTM) networks and attention mechanisms. In addition, we introduce two novel ideas, atmospheric dispersion theories and the hysteretic nature of air pollutant dispersion, to better model the dynamic spatial correlation. We also devise a multi-channel graph convolutional fusion network to effectively fuse the graph outputs, along with other features, from both channels. Our extensive experiments on real-world benchmark datasets demonstrate that MCAM significantly outperforms the state-of-the-art solutions.
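Fusing a static and a dynamic channel with learned attention weights, the general idea behind such multi-channel fusion, can be sketched as follows; the per-channel scoring layer and dimensions are assumptions, not MCAM's fusion network.

```python
# Sketch: attention-weighted fusion of two channel outputs.
import torch

class TwoChannelFusion(torch.nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, static_feat, dynamic_feat):                  # each (B, dim)
        stacked = torch.stack([static_feat, dynamic_feat], dim=1)  # (B, 2, dim)
        weights = torch.softmax(self.score(stacked), dim=1)        # (B, 2, 1) channel weights
        return (weights * stacked).sum(dim=1)                      # (B, dim) fused features

static_c = torch.randn(8, 32)    # e.g. distance/geo-context channel output
dynamic_c = torch.randn(8, 32)   # e.g. LSTM-based time-dependent channel output
fused = TwoChannelFusion()(static_c, dynamic_c)
```
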
2. Zhao, Mingxia, and Adele Lu Jia. "Multi-View Heterogeneous Graph Attention Network." In 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2023. http://dx.doi.org/10.1109/cscwd57460.2023.10152688.

3. Chen, Dianying, Xiumei Wei, and Xuesong Jiang. "Multi-view clustering method based on graph attention autoencoder." In 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta). IEEE, 2022. http://dx.doi.org/10.1109/smartworld-uic-atc-scalcom-digitaltwin-pricomp-metaverse56740.2022.00213.

4. Fu, You, Siyu Fang, Rui Wang, Xiulong Yi, Jianzhi Yu, and Rong Hua. "Multi-view Attention with Memory Assistant for Image Captioning." In 2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). IEEE, 2022. http://dx.doi.org/10.1109/iaeac54830.2022.9929571.

5. Chen, Dongyue, Ruonan Liu, Wenlong Yu, Kai Zhang, Yusheng Pu, and Di Cao. "Fault Diagnosis of Industrial Control System With Graph Attention Network on Multi-view Graph." In 2021 5th Asian Conference on Artificial Intelligence Technology (ACAIT). IEEE, 2021. http://dx.doi.org/10.1109/acait53529.2021.9731197.

6. Cui, Nan, Chunqi Chen, Beijun Shen, and Yuting Chen. "Learning to Match Workers and Tasks via a Multi-View Graph Attention Network." In 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2021. http://dx.doi.org/10.1109/compsac51774.2021.00035.

7. Cheng, Jiafeng, Qianqian Wang, Zhiqiang Tao, Deyan Xie, and Quanxue Gao. "Multi-View Attribute Graph Convolution Networks for Clustering." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/411.

Abstract:
Graph neural networks (GNNs) have made considerable achievements in processing graph-structured data. However, existing methods cannot allocate learnable weights to different nodes in the neighborhood and lack robustness because they neglect both node attributes and graph reconstruction. Moreover, most multi-view GNNs mainly focus on the case of multiple graphs, while designing GNNs for graph-structured data with multi-view attributes is still under-explored. In this paper, we propose a novel Multi-View Attribute Graph Convolution Networks (MAGCN) model for the clustering task. MAGCN is designed with two-pathway encoders that map graph embedding features and learn the view-consistency information. Specifically, the first pathway develops multi-view attribute graph attention networks to reduce the noise/redundancy and learn the graph embedding features for each view of the graph data. The second pathway develops consistent embedding encoders to capture the geometric relationship and probability distribution consistency among different views, which adaptively finds a consistent clustering embedding space for multi-view attributes. Experiments on three benchmark graph datasets show the superiority of our method compared with several state-of-the-art algorithms.

8. Cui, Chenhang, Yazhou Ren, Jingyu Pu, Xiaorong Pu, and Lifang He. "Deep Multi-view Subspace Clustering with Anchor Graph." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/398.

Abstract:
Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). To be specific, DMCAG firstly learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct an anchor graph with small size for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy to keep the consistency of the labels based on contrastive learning to enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
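The anchor-graph trick for scalability can be sketched in a few lines; the random anchor selection and Gaussian affinity below are simplifying assumptions, not DMCAG's procedure.

```python
# Sketch: similarities are computed only between n samples and m << n anchors,
# giving an (n, m) bipartite graph instead of an (n, n) one.
import torch

def build_anchor_graph(features, num_anchors=50, sigma=1.0):
    """features: (n, d) -> row-normalized (n, num_anchors) anchor affinity."""
    idx = torch.randperm(features.size(0))[:num_anchors]
    anchors = features[idx]                                # (m, d)
    dists = torch.cdist(features, anchors)                 # (n, m) pairwise distances
    affinity = torch.exp(-dists.pow(2) / (2 * sigma ** 2))
    return affinity / affinity.sum(dim=1, keepdim=True)    # row-stochastic Z

x = torch.randn(1000, 16)      # one view's embedded features
z = build_anchor_graph(x)      # (1000, 50)
# A full n x n similarity can then be approximated through Z and its column sums,
# which is what makes spectral clustering tractable at scale.
```
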
9. Zhang, Mingyang, Tong Li, Yong Li, and Pan Hui. "Multi-View Joint Graph Representation Learning for Urban Region Embedding." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/611.

Abstract:
The increasing amount of urban data enables us to investigate urban dynamics, assist urban planning, and, eventually, make our cities more livable and sustainable. In this paper, we focus on learning an embedding space for urban regions from urban data. For the first time, we propose a multi-view joint learning model to learn comprehensive and representative urban region embeddings. We first model different types of region correlations based on both human mobility and inherent region properties. Then, we apply a graph attention mechanism to learn region representations from each view of the built correlations. Moreover, we introduce a joint learning module that boosts the region embedding learning by sharing cross-view information and fuses multi-view embeddings by learning adaptive weights. Finally, we exploit the learned embeddings in the downstream applications of land usage classification and crime prediction in urban areas with real-world data. Extensive experimental results demonstrate that, by exploiting our proposed joint learning model, the performance is improved by a large margin on both tasks compared with the state-of-the-art methods.
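Adaptive-weight fusion of per-view embeddings can be sketched as a softmax over learnable view scores; the number of views and dimensions below are illustrative assumptions, not the paper's joint learning module.

```python
# Sketch: fuse per-view region embeddings with learned adaptive view weights.
import torch

class AdaptiveViewFusion(torch.nn.Module):
    def __init__(self, num_views=3):
        super().__init__()
        self.view_logits = torch.nn.Parameter(torch.zeros(num_views))

    def forward(self, view_embeddings):               # (num_views, N, dim)
        w = torch.softmax(self.view_logits, dim=0)    # adaptive per-view weights
        return torch.einsum('v,vnd->nd', w, view_embeddings)

views = torch.randn(3, 180, 64)                       # e.g. 3 correlation views, 180 regions
region_embedding = AdaptiveViewFusion()(views)        # (180, 64) fused embeddings
```
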
10. Chen, Weitao, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, and Xuansong Xie. "CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/67.

Abstract:
The core of Multi-view Stereo (MVS) is the matching process between reference and source pixels. Cost aggregation plays a significant role in this process, and previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs, which fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to introduce Transformers into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity of Transformers, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on the cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, the Residual Regression Transformer (RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.