Journal articles on the topic 'Cross-modal person re-identification'

Consult the top 50 journal articles for your research on the topic 'Cross-modal person re-identification'.

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Hafner, Frank M., Amran Bhuyian, Julian F. P. Kooij, and Eric Granger. "Cross-modal distillation for RGB-depth person re-identification." Computer Vision and Image Understanding 216 (February 2022): 103352. http://dx.doi.org/10.1016/j.cviu.2021.103352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Minghui, Yafei Zhang, and Huafeng Li. "Survey of Cross-Modal Person Re-Identification from a Mathematical Perspective." Mathematics 11, no. 3 (January 28, 2023): 654. http://dx.doi.org/10.3390/math11030654.

Full text
Abstract:
Person re-identification (Re-ID) aims to retrieve a particular pedestrian’s identification from a surveillance system consisting of non-overlapping cameras. In recent years, researchers have begun to focus on open-world person Re-ID tasks based on non-ideal situations. One of the most representative of these is cross-modal person Re-ID, which aims to match probe data with target data from different modalities. According to the modalities of probe and target data, we divided cross-modal person Re-ID into visible–infrared, visible–depth, visible–sketch, and visible–text person Re-ID. In cross-modal person Re-ID, the most challenging problem is the modal gap. According to the different methods of narrowing the modal gap, we classified the existing works into picture-based style conversion methods, feature-based modality-invariant embedding mapping methods, and modality-unrelated auxiliary information mining methods. In addition, by generalizing the aforementioned works, we find that although deep-learning-based models perform well, the black-box-like learning process makes these models less interpretable and generalized. Therefore, we attempted to interpret different cross-modal person Re-ID models from a mathematical perspective. Through the above work, we attempt to compensate for the lack of mathematical interpretation of models in previous person Re-ID reviews and hope that our work will bring new inspiration to researchers.
APA, Harvard, Vancouver, ISO, and other styles
3

Xie, Zhongwei, Lin Li, Xian Zhong, Luo Zhong, and Jianwen Xiang. "Image-to-video person re-identification with cross-modal embeddings." Pattern Recognition Letters 133 (May 2020): 70–76. http://dx.doi.org/10.1016/j.patrec.2019.03.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Diangang, Xing Wei, Xiaopeng Hong, and Yihong Gong. "Infrared-Visible Cross-Modal Person Re-Identification with an X Modality." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4610–17. http://dx.doi.org/10.1609/aaai.v34i04.5891.

Full text
Abstract:
This paper focuses on the emerging Infrared-Visible cross-modal person re-identification task (IV-ReID), which takes infrared images as input and matches them with visible color images. IV-ReID is important yet challenging, as there is a significant gap between the visible and infrared images. To reduce this ‘gap’, we introduce an auxiliary X modality as an assistant and reformulate infrared-visible dual-mode cross-modal learning as an X-Infrared-Visible three-mode learning problem. The X modality recasts the RGB channels into a format with which cross-modal learning can be easily performed. With this idea, we propose an X-Infrared-Visible (XIV) ReID cross-modal learning framework. Firstly, the X modality is generated by a lightweight network, which is learnt in a self-supervised manner with the labels inherited from visible images. Secondly, under the XIV framework, cross-modal learning is guided by a carefully designed modality gap constraint, with information exchanged across the visible, X, and infrared modalities. Extensive experiments are performed on two challenging datasets, SYSU-MM01 and RegDB, to evaluate the proposed XIV-ReID approach. Experimental results show that our method achieves an absolute gain of over 7% in terms of rank-1 and mAP even compared with the latest state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
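The auxiliary X modality above is produced by a lightweight network trained with labels inherited from the visible images. As a rough, hypothetical sketch of such a generator (not the authors' exact architecture), a pair of 1×1 convolutions that only re-mix colour channels per pixel is already enough to express a learned intermediate image:

```python
import torch
import torch.nn as nn

class XModalityGenerator(nn.Module):
    """Hypothetical lightweight generator: per-pixel channel re-mixing of an
    RGB image into an auxiliary 'X' image shared by both learning branches."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return self.net(rgb)

# Example: x_images = XModalityGenerator()(torch.rand(8, 3, 256, 128))
```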
5

Lin, Ronghui, Rong Wang, Wenjing Zhang, Ao Wu, and Yihan Bi. "Joint Modal Alignment and Feature Enhancement for Visible-Infrared Person Re-Identification." Sensors 23, no. 11 (May 23, 2023): 4988. http://dx.doi.org/10.3390/s23114988.

Full text
Abstract:
Visible-infrared person re-identification aims to solve the matching problem between cross-camera and cross-modal person images. Existing methods strive to perform better cross-modal alignment, but often neglect the critical importance of feature enhancement for achieving better performance. Therefore, we proposed an effective method that combines both modal alignment and feature enhancement. Specifically, we introduced Visible-Infrared Modal Data Augmentation (VIMDA) for visible images to improve modal alignment. Margin MMD-ID Loss was also used to further enhance modal alignment and optimize model convergence. Then, we proposed Multi-Grain Feature Extraction (MGFE) Structure for feature enhancement to further improve recognition performance. Extensive experiments have been carried out on SYSU-MM01 and RegDB. The results indicate that our method outperforms the current state-of-the-art methods for visible-infrared person re-identification. Ablation experiments verified the effectiveness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
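Augmenting the visible branch so that it looks statistically closer to the single-channel infrared data is a common way to ease modal alignment. Below is a generic random channel-selection augmentation, given only as an illustrative stand-in for the VIMDA augmentation named above (the function name and probabilities are assumptions):

```python
import random
import torch

def random_channel_augment(rgb: torch.Tensor) -> torch.Tensor:
    """Randomly keep the RGB image, copy one channel to all three channels,
    or replace the image with a naive grayscale version (illustrative)."""
    choice = random.randint(0, 2)
    if choice == 0:
        return rgb
    if choice == 1:
        c = random.randint(0, 2)
        return rgb[:, c:c + 1, :, :].repeat(1, 3, 1, 1)
    return rgb.mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
```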
6

Syed, Muhammad Adnan, Yongsheng Ou, Tao Li, and Guolai Jiang. "Lightweight Multimodal Domain Generic Person Reidentification Metric for Person-Following Robots." Sensors 23, no. 2 (January 10, 2023): 813. http://dx.doi.org/10.3390/s23020813.

Full text
Abstract:
Recently, person-following robots have been increasingly used in many real-world applications, and they require robust and accurate person identification for tracking. Recent works proposed to use re-identification metrics for identification of the target person; however, these metrics suffer from poor generalization and from impostors in the nonlinear multi-modal world. This work learns a domain-generic person re-identification metric to resolve real-world challenges and to identify the target person undergoing appearance changes when moving across different indoor and outdoor environments or domains. Our generic metric takes advantage of a novel attention mechanism to learn deep cross-representations to address pose, viewpoint, and illumination variations, as well as jointly tackling impostors and style variations the target person randomly undergoes in various indoor and outdoor domains; thus, our generic metric attains higher recognition accuracy of target person identification in the complex multi-modal open-set world, and attains 80.73% and 64.44% Rank-1 identification in the multi-modal closed-set PRID and VIPeR domains, respectively.
APA, Harvard, Vancouver, ISO, and other styles
7

Farooq, Ammarah, Muhammad Awais, Josef Kittler, and Syed Safwan Khalid. "AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4477–85. http://dx.doi.org/10.1609/aaai.v36i4.20370.

Full text
Abstract:
Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modality representations conforming to semantic information present for a person and ignore background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-Block, is a unified multi-layer network that dynamically exploits the multi-scale knowledge from both modalities and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to manipulate long-term dependencies. Moreover, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework is novel in its ability to implicitly learn aligned semantics between modalities during the feature learning stage. The unified feature learning effectively utilizes textual data as a super-annotation signal for visual representation learning and automatically rejects irrelevant information. The entire AXM-Net is trained end-to-end on CUHK-PEDES data. We report results on two tasks, person search and cross-modal Re-ID. The AXM-Net outperforms the current state-of-the-art (SOTA) methods and achieves 64.44% Rank@1 on the CUHK-PEDES test set. It also outperforms by >10% for cross-viewpoint text-to-image Re-ID scenarios on CrossRe-ID and CUHK-SYSU datasets.
APA, Harvard, Vancouver, ISO, and other styles
8

Zheng, Aihua, Zi Wang, Zihan Chen, Chenglong Li, and Jin Tang. "Robust Multi-Modality Person Re-identification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3529–37. http://dx.doi.org/10.1609/aaai.v35i4.16467.

Full text
Abstract:
To avoid the illumination limitation in visible person re-identification (Re-ID) and the heterogeneous issue in cross-modality Re-ID, we propose to utilize the complementary advantages of multiple modalities, including visible (RGB), near infrared (NI) and thermal infrared (TI) ones, for robust person Re-ID. A novel progressive fusion network is designed to learn effective multi-modal features from single to multiple modalities and from local to global views. Our method works well in diversely challenging scenarios even in the presence of missing modalities. Moreover, we contribute a comprehensive benchmark dataset, RGBNT201, including 201 identities captured under various challenging conditions, to facilitate the research of RGB-NI-TI multi-modality person Re-ID. Comprehensive experiments on the RGBNT201 dataset against state-of-the-art methods demonstrate the contribution of multi-modality person Re-ID and the effectiveness of the proposed approach, launching a new benchmark and a new baseline for multi-modality person Re-ID.
APA, Harvard, Vancouver, ISO, and other styles
9

Shi, Shuo, Changwei Huo, Yingchun Guo, Stephen Lean, Gang Yan, and Ming Yu. "Truncated attention mechanism and cascade loss for cross-modal person re-identification." Journal of Intelligent & Fuzzy Systems 41, no. 6 (December 16, 2021): 6575–87. http://dx.doi.org/10.3233/jifs-210382.

Full text
Abstract:
Person re-identification with natural language description is a process of retrieving the corresponding person’s image from an image dataset according to a text description of the person. The key challenge in this cross-modal task is to extract visual and text features and construct loss functions to achieve cross-modal matching between text and image. Firstly, we designed a two-branch network framework for person re-identification with natural language description. In this framework we include the following: a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used to extract text features and a truncated attention mechanism is proposed to select the principal component of the text features; a MobileNet is used to extract image features. Secondly, we proposed a Cascade Loss Function (CLF), which includes cross-modal matching loss and single modal classification loss, both with relative entropy function, to fully exploit the identity-level information. The experimental results on the CUHK-PEDES dataset demonstrate that our method achieves better results in Top-5 and Top-10 than 10 other current state-of-the-art algorithms.
APA, Harvard, Vancouver, ISO, and other styles
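The text branch described above is built on a Bi-LSTM whose outputs are reduced to the most informative components by an attention mechanism. A generic Bi-LSTM encoder with soft attention pooling is sketched below; the vocabulary size and dimensions are placeholders, and the truncated attention itself is specific to the paper and not reproduced:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Generic Bi-LSTM sentence encoder with attention pooling (illustrative)."""
    def __init__(self, vocab_size: int = 10000, embed_dim: int = 300, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:   # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))                   # (batch, seq_len, 2*hidden)
        weights = F.softmax(self.attn(h), dim=1)                # attention over time steps
        return (weights * h).sum(dim=1)                         # (batch, 2*hidden) text feature
```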
10

Yan, Shiyang, Jianan Zhao, and Lin Xu. "Adaptive multi-task learning for cross domain and modal person re-identification." Neurocomputing 486 (May 2022): 123–34. http://dx.doi.org/10.1016/j.neucom.2021.11.016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Huang, Bochun, Fan Li, and Shujuan Wang. "Cross-Classification Sketch Person Re-Identification under Cross-Modal Identity Inconsistency" [in Chinese]. Laser & Optoelectronics Progress 60, no. 4 (2023): 0410006. http://dx.doi.org/10.3788/lop212820.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Wu, Jingjing, Jianguo Jiang, Meibin Qi, Cuiqun Chen, and Jingjing Zhang. "An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 4 (November 30, 2022): 1–22. http://dx.doi.org/10.1145/3506708.

Full text
Abstract:
The RGB-D cross-modal person re-identification (re-id) task aims to identify the person of interest across the RGB and depth image modes. The tremendous discrepancy between these two modalities makes this task difficult to tackle. Few researchers pay attention to this task, and the deep networks of existing methods still cannot be trained in an end-to-end manner. Therefore, this article proposes an end-to-end module for RGB-D cross-modal person re-id. This network introduces a cross-modal relational branch to narrow the gaps between two heterogeneous images. It models the abundant correlations between any cross-modal sample pairs, which are constrained by heterogeneous interactive learning. The proposed network also exploits a dual-modal local branch, which aims to capture the common spatial contexts in two modalities. This branch adopts shared attentive pooling and mutual contextual graph networks to extract the spatial attention within each local region and the spatial relations between distinct local parts, respectively. Experimental results on two public benchmark datasets, that is, the BIWI and RobotPKU datasets, demonstrate that our method is superior to the state-of-the-art. In addition, we perform thorough experiments to prove the effectiveness of each component in the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
13

Zhao, Qianqian, Hanxiao Wu, and Jianqing Zhu. "Margin-Based Modal Adaptive Learning for Visible-Infrared Person Re-Identification." Sensors 23, no. 3 (January 27, 2023): 1426. http://dx.doi.org/10.3390/s23031426.

Full text
Abstract:
Visible-infrared person re-identification (VIPR) has great potential for intelligent transportation systems for constructing smart cities, but it is challenging to utilize due to the huge modal discrepancy between visible and infrared images. Although visible and infrared data can appear to be two domains, VIPR is not identical to domain adaptation, since domain adaptation would massively eliminate modal discrepancies. Because VIPR has complete identity information on both visible and infrared modalities, once the domain adaptation is overemphasized, the discriminative appearance information in the visible and infrared domains would be drained. For that, we propose a novel margin-based modal adaptive learning (MMAL) method for VIPR in this paper. On each domain, we apply triplet and label smoothing cross-entropy functions to learn appearance-discriminative features. Between the two domains, we design a simple yet effective marginal maximum mean discrepancy (M3D) loss function to avoid an excessive suppression of modal discrepancies to protect the features’ discriminative ability on each domain. As a result, our MMAL method could learn modal-invariant yet appearance-discriminative features for improving VIPR. The experimental results show that our MMAL method acquires state-of-the-art VIPR performance, e.g., on the RegDB dataset in the visible-to-infrared retrieval mode, the rank-1 accuracy is 93.24% and the mean average precision is 83.77%.
APA, Harvard, Vancouver, ISO, and other styles
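The marginal MMD idea above only penalizes the modal discrepancy beyond a margin, so per-modality discriminative information is not over-suppressed. A minimal sketch, assuming L2-normalized feature batches and a simple linear-kernel MMD (the paper's exact kernel and margin value are not reproduced):

```python
import torch
import torch.nn.functional as F

def linear_mmd(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared distance between the mean embeddings of two feature batches."""
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

def margin_mmd_loss(feat_visible: torch.Tensor, feat_infrared: torch.Tensor,
                    margin: float = 0.1) -> torch.Tensor:
    """Penalize the modality discrepancy only when it exceeds a margin."""
    return F.relu(linear_mmd(feat_visible, feat_infrared) - margin)
```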
14

Jiang, Jianguo, Kaiyuan Jin, Meibin Qi, Qian Wang, Jingjing Wu, and Cuiqun Chen. "A Cross-Modal Multi-granularity Attention Network for RGB-IR Person Re-identification." Neurocomputing 406 (September 2020): 59–67. http://dx.doi.org/10.1016/j.neucom.2020.03.109.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Chan, Sixian, Feng Du, Yanjing Lei, Zhounian Lai, Jiafa Mao, and Chao Li. "Learning Identity-Consistent Feature for Cross-Modality Person Re-Identification via Pixel and Feature Alignment." Mobile Information Systems 2022 (October 11, 2022): 1–9. http://dx.doi.org/10.1155/2022/4131322.

Full text
Abstract:
RGB-IR cross-modality person re-identification (ReID) can be seen as a multicamera retrieval problem that aims to match pedestrian images captured by visible and infrared cameras. Most of the existing methods focus on reducing modality differences through feature representation learning. However, they ignore the huge difference in pixel space between the two modalities. Unlike these methods, we utilize the pixel and feature alignment network (PFANet) to reduce modal differences in pixel space while aligning features in feature space in this paper. Our model contains three components, including a feature extractor, a generator, and a joint discriminator. Like previous methods, the generator and the joint discriminator are used to generate high-quality cross-modality images; however, we make substantial improvements to the feature extraction module. Firstly, we fuse batch normalization and global attention (BNG) which can pay attention to channel information while conducting information interaction between channels and spaces. Secondly, to alleviate the modal difference in feature space, we propose the modal mitigation module (MMM). Then, by jointly training the entire model, our model is able to not only mitigate the cross-modality and intramodality variations but also learn identity-consistent features. Finally, extensive experimental results show that our model outperforms other methods. On the SYSU-MM01 dataset, our model achieves a rank-1 accuracy of 40.83% and an mAP of 39.84%.
APA, Harvard, Vancouver, ISO, and other styles
16

Huo, Dongdong, and Haishun Du. "Cross-Modal Person Re-Identification Based on Channel Recombination and Attention Mechanism" [in Chinese]. Laser & Optoelectronics Progress 60, no. 14 (2023): 1410007. http://dx.doi.org/10.3788/lop221850.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Wang, Zi, Chenglong Li, Aihua Zheng, Ran He, and Jin Tang. "Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2633–41. http://dx.doi.org/10.1609/aaai.v36i3.20165.

Full text
Abstract:
Multi-modal person Re-ID introduces more complementary information to assist the traditional Re-ID task. Existing multi-modal methods ignore the importance of modality-specific information in the feature fusion stage. To this end, we propose a novel method to boost modality-specific representations for multi-modal person Re-ID: Interact, Embed, and EnlargE (IEEE). First, we propose a cross-modal interacting module to exchange useful information between different modalities in the feature extraction phase. Second, we propose a relation-based embedding module to enhance the richness of feature descriptors by embedding the global feature into the fine-grained local information. Finally, we propose multi-modal margin loss to force the network to learn modality-specific information for each modality by enlarging the intra-class discrepancy. Superior performance on multi-modal Re-ID dataset RGBNT201 and three constructed Re-ID datasets validate the effectiveness of the proposed method compared with the state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, and other styles
18

Zhu, Xiangping, Xiatian Zhu, Minxian Li, Pietro Morerio, Vittorio Murino, and Shaogang Gong. "Intra-Camera Supervised Person Re-Identification." International Journal of Computer Vision 129, no. 5 (February 26, 2021): 1580–95. http://dx.doi.org/10.1007/s11263-021-01440-4.

Full text
Abstract:
Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand unsupervised re-id methods do not need identity label information, but they usually suffer from much inferior and insufficient model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation efforts. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the cost-effectiveness superiority of our method over the alternative approaches on three large person re-id datasets. For example, MATE yields 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
APA, Harvard, Vancouver, ISO, and other styles
19

Ma, Li, Zhibin Guan, Xinguan Dai, Hangbiao Gao, and Yuanmeng Lu. "A Cross-Modality Person Re-Identification Method Based on Joint Middle Modality and Representation Learning." Electronics 12, no. 12 (June 15, 2023): 2687. http://dx.doi.org/10.3390/electronics12122687.

Full text
Abstract:
Modality differences and intra-class differences have been hot research problems in the field of cross-modality person re-identification currently. In this paper, we propose a cross-modality person re-identification method based on joint middle modality and representation learning. To reduce the modality differences, a middle modal generator is used to map different modal images to a unified feature space to generate middle modality images. A two-stream network with parameter sharing is used to extract the combined features of the original image and the middle modality image. In addition, a multi-granularity pooling strategy combining global features and local features is used to improve the representation learning capability of the model and further reduce the modality differences. To reduce the intra-class differences, the model is further optimized by combining distribution consistency loss, label smoothing cross-entropy loss, and hetero-center triplet loss to reduce the intra-class distance and accelerate the model convergence. In this paper, we use the publicly available datasets RegDB and SYSU-MM01 for validation. The results show that the proposed approach in this paper reaches 68.11% mAP in All Search mode for the SYSU-MM01 dataset and 86.54% mAP in VtI mode for the RegDB dataset, with a performance improvement of 3.29% and 3.29%, respectively, which demonstrate the effectiveness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
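The hetero-center triplet loss mentioned above is computed over per-identity centers rather than individual samples. A simplified center-to-center sketch is shown below, assuming every identity in the visible batch also appears in the infrared batch (an illustration, not the authors' exact formulation):

```python
import torch
import torch.nn.functional as F

def hetero_center_triplet_loss(feat_v, labels_v, feat_i, labels_i, margin: float = 0.3):
    """Triplet margin over cross-modality class centers (illustrative sketch)."""
    classes = labels_v.unique()
    centers_v = torch.stack([feat_v[labels_v == c].mean(dim=0) for c in classes])
    centers_i = torch.stack([feat_i[labels_i == c].mean(dim=0) for c in classes])
    dist = torch.cdist(centers_v, centers_i)           # (C, C) center-to-center distances
    pos = dist.diag()                                   # same identity, other modality
    eye = torch.eye(len(classes), dtype=torch.bool, device=dist.device)
    neg = dist.masked_fill(eye, float("inf")).min(dim=1).values  # hardest wrong identity
    return F.relu(pos - neg + margin).mean()
```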
20

Yin, Qingze, Guan’an Wang, Jinlin Wu, Haonan Luo, and Zhenmin Tang. "Dynamic Re-Weighting and Cross-Camera Learning for Unsupervised Person Re-Identification." Mathematics 10, no. 10 (May 12, 2022): 1654. http://dx.doi.org/10.3390/math10101654.

Full text
Abstract:
Person Re-Identification (ReID) has witnessed tremendous improvements with the help of deep convolutional neural networks (CNN). Nevertheless, because different fields have their own characteristics, most existing methods encounter the problem of poor generalization ability to unseen people. To address this problem, based on the relationship between the temporal and camera position, we propose a robust and effective training strategy named temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC). It uses robust and effective algorithms to transfer valuable knowledge of existing labeled source domains to unlabeled target domains. Particularly, to improve the discernibility of CNN models in the source domain, generally shared person attributes and margin-based softmax loss are adopted to train the source model. In the target domain training stage, TSDRC iteratively clusters the samples into several centers and dynamically re-weights unlabeled samples from each center with a temporal smoothing score. Then, cross-camera triplet loss is proposed to fine-tune the source domain model. Comprehensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method vastly improves the performance of unsupervised domain adaptation.
APA, Harvard, Vancouver, ISO, and other styles
21

Wang, Xiaoqi, Xi Yang, and Dong Yang. "A Novel SRTSR Model for Cross-Resolution Person Re-Identification." IEEE Access 9 (2021): 32106–14. http://dx.doi.org/10.1109/access.2021.3060927.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Wang, Yuanyuan, Xiang Li, Mingxin Jiang, Haiyan Zhang, and E. Tang. "Cross-view pedestrian clustering via graph convolution network for unsupervised person re-identification." Journal of Intelligent & Fuzzy Systems 39, no. 3 (October 7, 2020): 4453–62. http://dx.doi.org/10.3233/jifs-200435.

Full text
Abstract:
At present, supervised person re-identification methods achieve high identification performance. However, there is a large amount of unlabeled cross-camera data in actual application scenarios. The high cost of labeling data greatly reduces the effect of transferring a supervised learning model to other scene domains. Therefore, unsupervised learning of person re-identification becomes more attractive in the real world. In addition, due to changes in camera angle, illumination and posture, the extracted person image representations generally differ even within the same camera view, but existing algorithms ignore the differences among cross-camera images under different camera parameters and environments. In order to overcome the above problems, we propose an unsupervised person re-identification metric learning method. The model learns a shared space to reduce the discrepancy under different cameras. A graph convolution network is further employed to cluster the cross-view image features extracted from the shared space. Our model improves the scalability of pedestrian re-identification in practical application scenarios. Extensive experiments on four large-scale person re-identification public datasets have been conducted to demonstrate the effectiveness of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles
23

Cao, Dongjiang, Ruofeng Liu, Hao Li, Shuai Wang, Wenchao Jiang, and Chris Xiaoxuan Lu. "Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, no. 3 (September 6, 2022): 1–25. http://dx.doi.org/10.1145/3550325.

Full text
Abstract:
Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics, etc. This work studies the problem of cross-modal human re-identification (ReID), in response to the regular human movements across camera-allowed regions (e.g., streets) and camera-restricted regions (e.g., offices) deployed with heterogeneous sensors. By leveraging the emerging low-cost RGB-D cameras and mmWave radars, we propose the first-of-its-kind vision-RF system for cross-modal multi-person ReID at the same time. Firstly, to address the fundamental inter-modality discrepancy, we propose a novel signature synthesis algorithm based on the observed specular reflection model of a human body. Secondly, an effective cross-modal deep metric learning model is introduced to deal with interference caused by unsynchronized data across radars and cameras. Through extensive experiments in both indoor and outdoor environments, we demonstrate that our proposed system is able to achieve ~ 92.5% top-1 accuracy and ~ 97.5% top-5 accuracy out of 56 volunteers. We also show that our proposed system is able to robustly reidentify subjects even when multiple subjects are present in the sensors' field of view.
APA, Harvard, Vancouver, ISO, and other styles
24

Wu, Shaojun, and Ling Gao. "Cross-Camera Erased Feature Learning for Unsupervised Person Re-Identification." Algorithms 13, no. 8 (August 10, 2020): 193. http://dx.doi.org/10.3390/a13080193.

Full text
Abstract:
Most supervised person re-identification methods show excellent performance, but using labeled datasets is very expensive, which limits their application in practical scenarios. To solve the scalability problem, we propose a Cross-camera Erased Feature Learning (CEFL) framework for unsupervised person re-identification that learns discriminative features from image appearances without manual annotations, where both the cross-camera global image appearance and the local details are explored. Specifically, for the global appearance, in order to bridge the gap between images with the same identities under different cameras, we generate style-transferred images. The network is trained to classify the original images, the style-transferred images and the negative samples. To learn the partial details of the images, we generate erased images and train the network to pull similar erased images together and push dissimilar ones away. In addition, we jointly learn the discriminative global and local information to learn a more robust model. Global and erased features are used together in feature learning, in conjunction with BFENet. A large number of experiments show the superiority of CEFL in unsupervised pedestrian re-identification.
APA, Harvard, Vancouver, ISO, and other styles
25

Yang, Fengxiang, Ke Li, Zhun Zhong, Zhiming Luo, Xing Sun, Hao Cheng, Xiaowei Guo, Feiyue Huang, Rongrong Ji, and Shaozi Li. "Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12597–604. http://dx.doi.org/10.1609/aaai.v34i07.6950.

Full text
Abstract:
Person re-identification (re-ID), is a challenging task due to the high variance within identity samples and imaging conditions. Although recent advances in deep learning have achieved remarkable accuracy in settled scenes, i.e., source domain, few works can generalize well on the unseen target domain. One popular solution is assigning unlabeled target images with pseudo labels by clustering, and then retraining the model. However, clustering methods tend to introduce noisy labels and discard low confidence samples as outliers, which may hinder the retraining process and thus limit the generalization ability. In this study, we argue that by explicitly adding a sample filtering procedure after the clustering, the mined examples can be much more efficiently used. To this end, we design an asymmetric co-teaching framework, which resists noisy labels by cooperating two models to select data with possibly clean labels for each other. Meanwhile, one of the models receives samples as pure as possible, while the other takes in samples as diverse as possible. This procedure encourages that the selected training samples can be both clean and miscellaneous, and that the two models can promote each other iteratively. Extensive experiments show that the proposed framework can consistently benefit most clustering based methods, and boost the state-of-the-art adaptation accuracy. Our code is available at https://github.com/FlyingRoastDuck/ACT_AAAI20.
APA, Harvard, Vancouver, ISO, and other styles
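The co-teaching scheme above has each network pick possibly clean (small-loss) samples for its peer. A generic small-loss selection step is sketched below; the models, optimizers and `keep_ratio` are assumptions, and the asymmetry between the 'pure' and 'diverse' sample streams described in the paper is not reproduced:

```python
import torch
import torch.nn.functional as F

def co_teaching_step(model_a, model_b, images, pseudo_labels,
                     optim_a, optim_b, keep_ratio: float = 0.8):
    """Each model selects its small-loss samples; the peer trains on them."""
    with torch.no_grad():
        loss_a = F.cross_entropy(model_a(images), pseudo_labels, reduction="none")
        loss_b = F.cross_entropy(model_b(images), pseudo_labels, reduction="none")
    k = max(1, int(keep_ratio * len(images)))
    idx_for_b = loss_a.topk(k, largest=False).indices   # samples model A trusts
    idx_for_a = loss_b.topk(k, largest=False).indices   # samples model B trusts

    optim_a.zero_grad()
    F.cross_entropy(model_a(images[idx_for_a]), pseudo_labels[idx_for_a]).backward()
    optim_a.step()

    optim_b.zero_grad()
    F.cross_entropy(model_b(images[idx_for_b]), pseudo_labels[idx_for_b]).backward()
    optim_b.step()
```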
26

Huang, Yangru, Peixi Peng, Yi Jin, Yidong Li, and Junliang Xing. "Domain Adaptive Attention Learning for Unsupervised Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11069–76. http://dx.doi.org/10.1609/aaai.v34i07.6762.

Full text
Abstract:
Person re-identification (Re-ID) across multiple datasets is a challenging task due to two main reasons: the presence of large cross-dataset distinctions and the absence of annotated target instances. To address these two issues, this paper proposes a domain adaptive attention learning approach to reliably transfer discriminative representation from the labeled source domain to the unlabeled target domain. In this approach, a domain adaptive attention model is learned to separate the feature map into domain-shared part and domain-specific part. In this manner, the domain-shared part is used to capture transferable cues that can compensate cross-dataset distinctions and give positive contributions to the target task, while the domain-specific part aims to model the noisy information to avoid the negative transfer caused by domain diversity. A soft label loss is further employed to take full use of unlabeled target data by estimating pseudo labels. Extensive experiments on the Market-1501, DukeMTMC-reID and MSMT17 benchmarks demonstrate the proposed approach outperforms the state-of-the-arts.
APA, Harvard, Vancouver, ISO, and other styles
27

Kniaz, V. V., V. A. Knyaz, and P. V. Moshkantsev. "MULTIMODAL PERSON RE-IDENTIFICATION IN AERIAL IMAGERY BASED ON CONDITIONAL ADVERSARIAL NETWORKS." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-2/W3-2023 (May 12, 2023): 121–28. http://dx.doi.org/10.5194/isprs-archives-xlviii-2-w3-2023-121-2023.

Full text
Abstract:
Person Re-Identification (Re-ID) is the task of matching the same person in multiple images captured by different cameras. Recently deep learning-based Re-ID algorithms demonstrated exciting progress for terrestrial-based cameras; still, person Re-ID in aerial images poses multiple challenges including occlusion of human feature parts, image distortion, and dynamic camera location. In this paper, we propose a new Person Aerial Re-ID framework Robust to Occlusion and Thermal imagery (ParrotGAN). Our model is focused on cross-modality person Re-ID in aerial images. Furthermore, we collected a new large-scale synthetic multimodal AerialReID dataset with 30k images and 137 person IDs. Our ParrotGAN model leverages two strategies to achieve robust performance in the task of person Re-ID in thermal and visible range. Firstly, we use a latent space of the StyleGAN2 model to estimate the distance between two images of a person. Specifically, we project each real image into the latent space with a correspondent latent vector z. We use the distance between latent vectors to provide a Re-ID similarity metric. Secondly, we use a generative-adversarial network to translate a color image to a synthetic thermal image. We use the synthetic image for cross-modality Re-ID. We evaluate our ParrotGAN model and baselines on our AerialReID and PRAI-1581 datasets. The results of the evaluation are encouraging and demonstrate that our ParrotGAN model competes with baselines in visible range aerial person Re-ID and outperforms them in the cross-modality setting. We made our code and dataset publicly available.
APA, Harvard, Vancouver, ISO, and other styles
28

Yan, Xiai, Shengkai Ding, Wei Zhou, Weiqi Shi, and Hua Tian. "Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer." Electronics 11, no. 19 (September 27, 2022): 3082. http://dx.doi.org/10.3390/electronics11193082.

Full text
Abstract:
Person re-identification (ReID) is the problem of cross-camera target retrieval. The extraction of robust and discriminant features is the key factor in realizing the correct correlation of targets. A model based on convolutional neural networks (CNNs) can extract robust image features, but it moves from local information to global information only by continuously stacking convolution layers. In contrast, a vision transformer (ViT) captures global information from the beginning to extract more powerful features. This paper proposes an unsupervised domain adaptive person re-identification model (ViTReID) based on the vision transformer, taking the ViT model trained on ImageNet as the pre-training weights and a transformer encoder as the feature extraction network, which makes up for some defects of the CNN model. At the same time, the combined loss function of cross-entropy and triplet loss combined with the center loss function is used to optimize the network; the person’s head is evaluated and trained as a local feature combined with the global feature of the whole body, focusing on the head, to enhance the head feature information. The experimental results show that ViTReID exceeds the baseline method (SSG) by 14% (Market1501 → MSMT17) in mean average precision (mAP). In MSMT17 → Market1501, ViTReID is 1.2% higher in rank-1 (R1) accuracy than a state-of-the-art method (SPCL); in PersonX → MSMT17, the mAP is 3.1% higher than that of the MMT-dbscan method, and in PersonX → Market1501, the mAP is 1.5% higher than that of the MMT-dbscan method.
APA, Harvard, Vancouver, ISO, and other styles
29

Fan, Xing, Wei Jiang, Hao Luo, Weijie Mao, and Hongyan Yu. "Instance Hard Triplet Loss for In-video Person Re-identification." Applied Sciences 10, no. 6 (March 24, 2020): 2198. http://dx.doi.org/10.3390/app10062198.

Full text
Abstract:
Traditional Person Re-identification (ReID) methods mainly focus on cross-camera scenarios, while identifying a person in the same video/camera from adjacent subsequent frames is also an important question, for example, in human tracking and pose tracking. We try to address this unexplored in-video ReID problem with a new large-scale video-based ReID dataset called PoseTrack-ReID with full images available and a new network structure called ReID-Head, which can extract multi-person features efficiently in real time and can be integrated with both one-stage and two-stage human or pose detectors. A new loss function is also required to solve this new in-video problem. Hence, a triplet-based loss function with an online hard example mining designed to distinguish persons in the same video/group is proposed, called instance hard triplet loss, which can be applied in both cross-camera ReID and in-video ReID. Compared with the widely-used batch hard triplet loss, our proposed loss achieves competitive performance and saves more than 30% of the training time. We also propose an automatic reciprocal identity association method, so we can train our model in an unsupervised way, which further extends the potential applications of in-video ReID. The PoseTrack-ReID dataset and code will be publicly released.
APA, Harvard, Vancouver, ISO, and other styles
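For comparison, the widely used batch-hard triplet loss that the abstract measures against takes the hardest positive and the hardest negative for every anchor inside a batch. A standard sketch (this is the baseline formulation, not the proposed instance hard triplet loss):

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(features: torch.Tensor, labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Standard batch-hard triplet loss over a mini-batch of embeddings."""
    dist = torch.cdist(features, features)               # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # same-identity mask
    hardest_pos = dist.masked_fill(~same, 0.0).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```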
30

Ling, Yongguo, Zhun Zhong, Zhiming Luo, Fengxiang Yang, Donglin Cao, Yaojin Lin, Shaozi Li, and Nicu Sebe. "Cross-Modality Earth Mover’s Distance for Visible Thermal Person Re-identification." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 1631–39. http://dx.doi.org/10.1609/aaai.v37i2.25250.

Full text
Abstract:
Visible thermal person re-identification (VT-ReID) suffers from inter-modality discrepancy and intra-identity variations. Distribution alignment is a popular solution for VT-ReID, however, it is usually restricted to the influence of the intra-identity variations. In this paper, we propose the Cross-Modality Earth Mover's Distance (CM-EMD) that can alleviate the impact of the intra-identity variations during modality alignment. CM-EMD selects an optimal transport strategy and assigns high weights to pairs that have a smaller intra-identity variation. In this manner, the model will focus on reducing the inter-modality discrepancy while paying less attention to intra-identity variations, leading to a more effective modality alignment. Moreover, we introduce two techniques to improve the advantage of CM-EMD. First, Cross-Modality Discrimination Learning (CM-DL) is designed to overcome the discrimination degradation problem caused by modality alignment. By reducing the ratio between intra-identity and inter-identity variances, CM-DL leads the model to learn more discriminative representations. Second, we construct the Multi-Granularity Structure (MGS), enabling us to align modalities from both coarse- and fine-grained levels with the proposed CM-EMD. Extensive experiments show the benefits of the proposed CM-EMD and its auxiliary techniques (CM-DL and MGS). Our method achieves state-of-the-art performance on two VT-ReID benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
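The Earth Mover's Distance used above is an optimal-transport cost between the visible and infrared feature sets. A generic entropy-regularized (Sinkhorn) approximation with uniform marginals is sketched below; the identity-aware weighting of CM-EMD is not reproduced, and the regularization strength and iteration count are assumptions:

```python
import torch
import torch.nn.functional as F

def sinkhorn_emd(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1, iters: int = 50):
    """Entropy-regularized optimal-transport cost between two feature sets
    with uniform marginals (illustrative Sinkhorn sketch)."""
    x, y = F.normalize(x, dim=1), F.normalize(y, dim=1)   # keep the ground cost bounded
    cost = torch.cdist(x, y)                              # (n, m) ground cost
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)
    nu = torch.full((m,), 1.0 / m, device=x.device)
    K = torch.exp(-cost / eps)                            # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(iters):                                # Sinkhorn fixed-point updates
        u = mu / (K @ (nu / (K.t() @ u)))
    v = nu / (K.t() @ u)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)            # approximate transport plan
    return (plan * cost).sum()
```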
31

Wang, Guan-An, Tianzhu Zhang, Yang Yang, Jian Cheng, Jianlong Chang, Xu Liang, and Zeng-Guang Hou. "Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12144–51. http://dx.doi.org/10.1609/aaai.v34i07.6894.

Full text
Abstract:
RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. However, this set-level alignment may lead to misalignment of some instances, which limits the performance for RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features and the modality variation can be better reduced. Second, given cross-modality unpaired-images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing distances of every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. Especially, on the SYSU-MM01 dataset, our model can achieve a gain of 9.2% and 7.7% in terms of Rank-1 and mAP. Code is available at https://github.com/wangguanan/JSIA-ReID.
APA, Harvard, Vancouver, ISO, and other styles
32

Chen, Yun-Chun, Yu-Jhe Li, Xiaofei Du, and Yu-Chiang Frank Wang. "Learning Resolution-Invariant Deep Representations for Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8215–22. http://dx.doi.org/10.1609/aaai.v33i01.33018215.

Full text
Abstract:
Person re-identification (re-ID) solves the task of matching images across cameras and is among the research topics in vision community. Since query images in real-world scenarios might suffer from resolution loss, how to solve the resolution mismatch problem during person re-ID becomes a practical problem. Instead of applying separate image super-resolution models, we propose a novel network architecture of Resolution Adaptation and re-Identification Network (RAIN) to solve cross-resolution person re-ID. Advancing the strategy of adversarial learning, we aim at extracting resolution-invariant representations for re-ID, while the proposed model is learned in an end-to-end training fashion. Our experiments confirm that the use of our model can recognize low-resolution query images, even if the resolution is not seen during training. Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.
APA, Harvard, Vancouver, ISO, and other styles
33

Zhou, Jianglin, Qing Dong, Zhong Zhang, Shuang Liu, and Tariq S. Durrani. "Cross-Modality Person Re-Identification via Local Paired Graph Attention Network." Sensors 23, no. 8 (April 15, 2023): 4011. http://dx.doi.org/10.3390/s23084011.

Full text
Abstract:
Cross-modality person re-identification (ReID) aims at searching a pedestrian image of RGB modality from infrared (IR) pedestrian images and vice versa. Recently, some approaches have constructed a graph to learn the relevance of pedestrian images of distinct modalities to narrow the gap between IR modality and RGB modality, but they omit the correlation between IR image and RGB image pairs. In this paper, we propose a novel graph model called Local Paired Graph Attention Network (LPGAT). It uses the paired local features of pedestrian images from different modalities to build the nodes of the graph. For accurate propagation of information among the nodes of the graph, we propose a contextual attention coefficient that leverages distance information to regulate the process of updating the nodes of the graph. Furthermore, we put forward Cross-Center Contrastive Learning (C3L) to constrain how far local features are from their heterogeneous centers, which is beneficial for learning the completed distance metric. We conduct experiments on the RegDB and SYSU-MM01 datasets to validate the feasibility of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
34

Shi, Wei, Hong Liu, and Mengyuan Liu. "Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning." Pattern Recognition 122 (February 2022): 108314. http://dx.doi.org/10.1016/j.patcog.2021.108314.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Ren, Danping, Tingting He, and Huisheng Dong. "Joint Cross-Consistency Learning and Multi-Feature Fusion for Person Re-Identification." Sensors 22, no. 23 (December 1, 2022): 9387. http://dx.doi.org/10.3390/s22239387.

Full text
Abstract:
To solve the problem of inadequate feature extraction by the model due to factors such as occlusion and illumination in person re-identification tasks, this paper proposed a model with a joint cross-consistency learning and multi-feature fusion person re-identification. The attention mechanism and the mixed pooling module were first embedded in the residual network so that the model adaptively focuses on the more valid information in the person images. Secondly, the dataset was randomly divided into two categories according to the camera perspective, and a feature classifier was trained for the two types of datasets respectively. Then, two classifiers with specific knowledge were used to guide the model to extract features unrelated to the camera perspective for the two types of datasets so that the obtained image features were endowed with domain invariance by the model, and the differences in the perspective, attitude, background, and other related information of different images were alleviated. Then, the multi-level features were fused through the feature pyramid to concern the more critical information of the image. Finally, a combination of Cosine Softmax loss, triplet loss, and cluster center loss was proposed to train the model to address the differences of multiple losses in the optimization space. The first accuracy of the proposed model reached 95.9% and 89.7% on the datasets Market-1501 and DukeMTMC-reID, respectively. The results indicated that the proposed model has good feature extraction capability.
APA, Harvard, Vancouver, ISO, and other styles
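A cluster/center-style loss, as combined above with Cosine Softmax and triplet losses, pulls every feature toward a learnable center of its identity. The classic center loss is a minimal sketch of this idea (it may differ from the paper's exact cluster center loss):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Classic center loss: mean squared distance between each feature and the
    learnable center of its identity (illustrative sketch)."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        return (features - self.centers[labels]).pow(2).sum(dim=1).mean()
```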
36

Li, Jiayue, and Yan Piao. "Cross-Camera Multi-Object Tracking based on Person Re-Identification and Spatial-Temporal Constraints." Journal of Physics: Conference Series 2492, no. 1 (May 1, 2023): 012032. http://dx.doi.org/10.1088/1742-6596/2492/1/012032.

Full text
Abstract:
In order to reduce the influence of occlusion on the overall feature representation of tracks and improve the accuracy of track correlation between cameras, this paper proposes a cross-camera multi-target tracking method based on person appearance and spatial-temporal constraints. First, a new cross-camera multi-object tracking framework is constructed. Then, a person spatial-temporal probability model is established. Finally, the spatial-temporal probability model and the person appearance similarity are jointly measured, and the person trajectory correlation across cameras is completed by using data correlation. Comparative experiments on the dataset prove that the method is effective.
APA, Harvard, Vancouver, ISO, and other styles
37

Hao, Yi, Nannan Wang, Jie Li, and Xinbo Gao. "HSME: Hypersphere Manifold Embedding for Visible Thermal Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8385–92. http://dx.doi.org/10.1609/aaai.v33i01.33018385.

Full text
Abstract:
Person Re-identification (re-ID) has great potential to contribute to video surveillance that automatically searches and identifies people across different cameras. Heterogeneous person re-identification between thermal (infrared) and visible images is essentially a cross-modality problem and important for night-time surveillance applications. Current methods usually train a model by combining classification and metric learning algorithms to obtain discriminative and robust feature representations. However, the combined loss function ignored the correlation between the classification subspace and the feature embedding subspace. In this paper, we use Sphere Softmax to learn a hypersphere manifold embedding and constrain the intra-modality variations and cross-modality variations on this hypersphere. We propose an end-to-end dual-stream hypersphere manifold embedding network (HSMEnet) with both classification and identification constraints. Meanwhile, we design a two-stage training scheme to acquire decorrelated features, and we refer to the HSME with decorrelation as D-HSME. We conduct experiments on two cross-modality person re-identification datasets. Experimental results demonstrate that our method outperforms the state-of-the-art methods on two datasets. On the RegDB dataset, rank-1 accuracy is improved from 33.47% to 50.85%, and mAP is improved from 31.83% to 47.00%.
APA, Harvard, Vancouver, ISO, and other styles
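Sphere Softmax, used above to place embeddings on a hypersphere, can be viewed as a cosine classifier: features and class weights are L2-normalized and their scaled cosine similarities feed the usual cross-entropy. A minimal sketch (the scale value is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereSoftmaxHead(nn.Module):
    """Cosine (sphere) softmax head: scaled cosine similarities as logits."""
    def __init__(self, feat_dim: int, num_classes: int, scale: float = 14.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.scale * F.linear(F.normalize(features, dim=1),
                                       F.normalize(self.weight, dim=1))
        return F.cross_entropy(logits, labels)
```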
38

Wu, Guile, Xiatian Zhu, and Shaogang Gong. "Tracklet Self-Supervised Learning for Unsupervised Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12362–69. http://dx.doi.org/10.1609/aaai.v34i07.6921.

Full text
Abstract:
Existing unsupervised person re-identification (re-id) methods mainly focus on cross-domain adaptation or one-shot learning. Although they are more scalable than the supervised learning counterparts, relying on a relevant labelled source domain or one labelled tracklet per person initialisation still restricts their scalability in real-world deployments. To alleviate these problems, some recent studies develop unsupervised tracklet association and bottom-up image clustering methods, but they still rely on explicit camera annotation or merely utilise suboptimal global clustering. In this work, we formulate a novel tracklet self-supervised learning (TSSL) method, which is capable of capitalising directly from abundant unlabelled tracklet data, to optimise a feature embedding space for both video and image unsupervised re-id. This is achieved by designing a comprehensive unsupervised learning objective that accounts for tracklet frame coherence, tracklet neighbourhood compactness, and tracklet cluster structure in a unified formulation. As a pure unsupervised learning re-id model, TSSL is end-to-end trainable at the absence of source data annotation, person identity labels, and camera prior knowledge. Extensive experiments demonstrate the superiority of TSSL over a wide variety of the state-of-the-art alternative methods on four large-scale person re-id benchmarks, including Market-1501, DukeMTMC-ReID, MARS and DukeMTMC-VideoReID.
APA, Harvard, Vancouver, ISO, and other styles
39

Zhang, Rumeng, Mengyao Li, Xueshuai Lv, and Ling Gao. "Single Camera Person Re-identification with Self-paced Joint Learning." Journal of Physics: Conference Series 2504, no. 1 (May 1, 2023): 012045. http://dx.doi.org/10.1088/1742-6596/2504/1/012045.

Full text
Abstract:
Existing re-identification (re-ID) methods rely on a large number of cross-camera identity tags for training, and the data annotation process is tedious and time-consuming, resulting in a difficult deployment of real-world re-ID applications. To overcome this problem, we focus on the single camera training (SCT) re-ID setting, where each identity is annotated in a single camera. Since there is no annotation across cameras, it takes much less time in data acquisition, and enables fast deployment in new environments. To address SCT re-ID, we proposed a joint comparison learning framework and split the training data into three parts, single-camera labeled data, pseudo labeled data, and unlabeled instances. In this framework, we iteratively (1) train the network and dynamically update the memory to store the three types of data, (2) assign pseudo-labels to the unlabeled images using a clustering algorithm. In the model training phase, we jointly train the three types of data to update the CNN model, and this joint training method can continuously take advantage of labeled, pseudo labeled and unlabeled images. Extensive experiments are conducted on three widely adopted datasets, including Market1501-SCT and MSMT17-SCT, and show the superiority of our method in SCT. Specifically, the mAP of our method significantly outperforms state-of-the-art SCT methods by 42.6% and 30.1%, respectively.
APA, Harvard, Vancouver, ISO, and other styles
40

Wang, Chuandong, Chi Zhang, Yujian Feng, Yimu Ji, and Jianyu Ding. "Learning Visible Thermal Person Re-Identification via Spatial Dependence and Dual-Constraint Loss." Entropy 24, no. 4 (March 23, 2022): 443. http://dx.doi.org/10.3390/e24040443.

Full text
Abstract:
Visible thermal person re-identification (VT Re-ID) is the task of matching pedestrian images collected by thermal and visible light cameras. The two main challenges presented by VT Re-ID are the intra-class variation between pedestrian images and the cross-modality difference between visible and thermal images. Existing works have principally focused on local representation through cross-modality feature distribution, but ignore the internal connection of the local features of pedestrian body parts. Therefore, this paper proposes a dual-path attention network model to establish the spatial dependency relationship between the local features of the pedestrian feature map and to effectively enhance the feature extraction. Meanwhile, we propose cross-modality dual-constraint loss, which adds the center and boundary constraints for each class distribution in the embedding space to promote compactness within the class and enhance the separability between classes. Our experimental results show that our proposed approach has advantages over the state-of-the-art methods on the two public datasets SYSU-MM01 and RegDB. The result for the SYSU-MM01 is Rank-1/mAP 57.74%/54.35%, and the result for the RegDB is Rank-1/mAP 76.07%/69.43%.
APA, Harvard, Vancouver, ISO, and other styles
41

Bayoumi, Randa Mohamed, Elsayed E. Hemayed, Mohammad Ehab Ragab, and Magda B. Fayek. "Person Re-Identification via Pyramid Multipart Features and Multi-Attention Framework." Big Data and Cognitive Computing 6, no. 1 (February 9, 2022): 20. http://dx.doi.org/10.3390/bdcc6010020.

Full text
Abstract:
Video-based person re-identification has become quite attractive due to its importance in many vision surveillance problems. It is a challenging topic due to the inter/intra changes, occlusion, and pose variations involved. In this paper, we propose a pyramid-attentive framework that relies on multi-part features and multiple attention to aggregate features of multi-levels and learns attention-based representations of persons through various aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains and hence capture robust global information. We propose the use of part-relation attention between different multi-granularities of features’ representation to focus on learning appropriate local features. Temporal attention is used to aggregate temporal features. We integrate the most robust features in the global and multi-level views to build an effective convolution neural network (CNN) model. The proposed model outperforms the previous state-of-the art models on three datasets. Notably, using the proposed model enables the achievement of 98.9% (a relative improvement of 2.7% on the GRL) top1 accuracy and 99.3% mAP on the PRID2011, and 92.8% (a relative improvement of 2.4% relative to GRL) top1 accuracy on iLIDS-vid. We also explore the generalization ability of our model on a cross dataset.
APA, Harvard, Vancouver, ISO, and other styles
42

Lu, Yong, and Ming Zhe Jin. "Dual-Branch Network Fused With Two-Level Attention Mechanism for Clothes-Changing Person Re-Identification." International Journal of Web Services Research 20, no. 1 (April 20, 2023): 1–14. http://dx.doi.org/10.4018/ijwsr.322021.

Full text
Abstract:
Clothes-changing person re-identification is currently a popular topic in academic research. Most existing methods assume that a person's clothes do not change within a short period of time, so they are not applicable when people change clothes. To address this situation, this paper proposes a dual-branch network for clothes-changing person re-identification that integrates a two-level attention mechanism: it captures and aggregates fine-grained person semantic information in the channel and spatial dimensions through the two-level attention mechanism, and it suppresses the network's sensitivity to clothing features by training a clothing classification branch. The method does not use auxiliary cues such as human skeletons, and the complexity of the model is greatly reduced compared with most methods. This paper conducts experiments on the popular clothes-changing person re-identification dataset PRCC and on a very large-scale cross-spatial-temporal dataset (LaST). The experimental results show that the proposed method is more advanced than existing methods.
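A channel-then-spatial attention block in the style of CBAM is one common way to realize the two-level (channel and spatial) attention the abstract refers to; the sketch below is only an illustration under that assumption, not the paper's exact module.

```python
# Hedged sketch of a channel-then-spatial attention block (CBAM-style).
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(), nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                      # channel-level attention
        avg_map = x.mean(dim=1, keepdim=True)            # spatial statistics
        max_map, _ = x.max(dim=1, keepdim=True)
        return x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))

feat = TwoLevelAttention(256)(torch.randn(4, 256, 24, 12))   # same shape as input
```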
APA, Harvard, Vancouver, ISO, and other styles
43

Chen, Shengbo, Hongchang Zhang, and Zhou Lei. "Person Re-Identification Based on Attention Mechanism and Context Information Fusion." Future Internet 13, no. 3 (March 13, 2021): 72. http://dx.doi.org/10.3390/fi13030072.

Full text
Abstract:
Person re-identification (ReID) plays a significant role in video surveillance analysis. In the real world, due to illumination, occlusion, and deformation, pedestrian feature extraction is the key to person ReID. Considering the shortcomings of existing methods in pedestrian feature extraction, a method based on an attention mechanism and context information fusion is proposed. A lightweight attention module with a small number of parameters is introduced into the ResNet50 backbone network, which enhances the salient characteristics of persons and suppresses irrelevant information. To address the loss of person context information caused by excessive network depth, a context information fusion module is designed that resamples the shallow pedestrian feature map and cascades it with the high-level feature map. To improve robustness, the model is trained by combining the margin sample mining loss with the cross-entropy loss. Experiments are carried out on the Market1501 and DukeMTMC-reID datasets; our method achieves rank-1 accuracy of 95.9% on Market1501 and 90.1% on DukeMTMC-reID, outperforming current mainstream methods when using only global features.
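The context-fusion idea, resampling a shallow feature map and cascading it with a high-level one, can be sketched in a few lines of PyTorch; the channel sizes and the use of adaptive average pooling for the resampling step are assumptions, not the paper's exact design.

```python
# Illustrative sketch of shallow/deep feature-map fusion (dimensions are placeholders).
import torch
import torch.nn.functional as F

def fuse_context(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    # shallow: (B, C1, H1, W1) from an early stage; deep: (B, C2, H2, W2) from a late stage.
    shallow_resized = F.adaptive_avg_pool2d(shallow, output_size=deep.shape[-2:])
    return torch.cat([shallow_resized, deep], dim=1)   # (B, C1 + C2, H2, W2)

fused = fuse_context(torch.randn(2, 256, 64, 32), torch.randn(2, 2048, 16, 8))
print(fused.shape)   # torch.Size([2, 2304, 16, 8])
```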
APA, Harvard, Vancouver, ISO, and other styles
44

Zhang, Yaqing, Xi Li, and Zhongfei Zhang. "Learning a Key-Value Memory Co-Attention Matching Network for Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9235–42. http://dx.doi.org/10.1609/aaai.v33i01.33019235.

Full text
Abstract:
Person re-identification (Re-ID) is typically cast as the problem of semantic representation and alignment, which requires precisely discovering and modeling the inherent spatial structure information on person images. Motivated by this observation, we propose a Key-Value Memory Matching Network (KVM-MN) model that consists of key-value memory representation and key-value co-attention matching. The proposed KVM-MN model is capable of building an effective local-position-aware person representation that encodes the spatial feature information in the form of multi-head key-value memory. Furthermore, the proposed KVM-MN model makes use of multi-head co-attention to automatically learn a number of cross-person-matching patterns, resulting in more robust and interpretable matching results. Finally, we build a setwise learning mechanism that implements a more generalized query-to-gallery-image-set learning procedure. Experimental results demonstrate the effectiveness of the proposed model against the state-of-the-art.
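In the spirit of the co-attention matching described above, the hedged sketch below lets a query image's local features attend over a gallery image's features with standard multi-head attention and scores the pair by the residual distance; it is not the authors' exact KVM-MN formulation, and all dimensions are assumptions.

```python
# Hedged sketch of cross-attention matching between query and gallery part features.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

query_parts = torch.randn(1, 8, 256)     # 8 local "memory slots" from the query image
gallery_parts = torch.randn(1, 8, 256)   # 8 local slots from a gallery image

# The query attends over the gallery's key-value slots; the output is an aligned
# representation that can be compared with the query features by a distance.
aligned, attn_weights = attn(query_parts, gallery_parts, gallery_parts)
score = -torch.norm(query_parts - aligned, dim=-1).mean()   # higher = better match
```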
APA, Harvard, Vancouver, ISO, and other styles
45

Zhang, Minying, Kai Liu, Yidong Li, Shihui Guo, Hongtao Duan, Yimin Long, and Yi Jin. "Unsupervised Domain Adaptation for Person Re-identification via Heterogeneous Graph Alignment." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3360–68. http://dx.doi.org/10.1609/aaai.v35i4.16448.

Full text
Abstract:
Unsupervised person re-identification (re-ID) is becoming increasingly popular due to its power in real-world systems such as public security and intelligent transportation systems. However, the person re-ID task is challenged by the problems of data distribution discrepancy across cameras and lack of label information. In this paper, we propose a coarse-to-fine heterogeneous graph alignment (HGA) method to find cross-camera person matches by characterizing the unlabeled data as a heterogeneous graph for each camera. In the coarse-alignment stage, we assign a projection to each camera and utilize an adversarial-learning-based method to align coarse-grained node groups from different cameras into a shared space, which consequently alleviates the distribution discrepancy between cameras. In the fine-alignment stage, we exploit potential fine-grained node groups in the shared space and introduce conservative alignment loss functions to constrain the graph alignment process, resulting in reliable pseudo-labels as learning guidance. The proposed domain adaptation framework not only improves model generalization on the target domain, but also facilitates mining and integrating potential discriminative information across different cameras. Extensive experiments on benchmark datasets demonstrate that the proposed approach outperforms the state-of-the-art methods.
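The adversarial ingredient of the coarse-alignment stage can be illustrated, in a much simplified form, with a gradient-reversal layer and a camera classifier; the paper's graph-based alignment is considerably more elaborate, and all dimensions below are placeholders.

```python
# Minimal, hedged sketch of adversarial camera-invariant feature alignment.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output            # reverse gradients flowing to the feature extractor

camera_classifier = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 6))
features = torch.randn(32, 256, requires_grad=True)     # stand-in for extracted features
camera_ids = torch.randint(0, 6, (32,))

logits = camera_classifier(GradReverse.apply(features))
adv_loss = nn.functional.cross_entropy(logits, camera_ids)
adv_loss.backward()   # classifier learns cameras; features are pushed to be camera-invariant
```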
APA, Harvard, Vancouver, ISO, and other styles
46

Jin, Keying, Jiahao Zhai, and Yunyuan Gao. "TwinsReID: Person re-identification based on twins transformer's multi-level features." Mathematical Biosciences and Engineering 20, no. 2 (2022): 2110–30. http://dx.doi.org/10.3934/mbe.2023098.

Full text
Abstract:
In the traditional person re-identification model, a CNN network is usually used for feature extraction, and a large number of convolution operations are used to reduce the size of the feature map when converting it into a feature vector. In a CNN, since the receptive field of a layer is obtained by convolving the feature map of the previous layer, the size of this local receptive field is limited and the computational cost is large. To address these problems, and exploiting the self-attention property of the Transformer, this article designs an end-to-end person re-identification model (TwinsReID) that integrates feature information between levels. In a Transformer, the output of each layer encodes the correlations between the elements of its previous layer; this is equivalent to a global receptive field, because each element computes its correlation with all other elements, and the computation is simple, so its cost is small. From these perspectives, the Transformer has certain advantages over the CNN's convolution operation. This paper uses the Twins-SVT Transformer to replace the CNN network, combines the features extracted from two different stages, and divides them into two branches. The first branch convolves the feature map to obtain a fine-grained feature map, while the second branch applies global adaptive average pooling to obtain a feature vector; the feature map is then divided into two sections and global adaptive average pooling is applied to each. The three resulting feature vectors are each sent to the triplet loss, and after passing through the fully connected layer, the outputs are fed to the cross-entropy loss and the center loss. The model is verified on the Market-1501 dataset in the experiments. The mAP/rank-1 reaches 85.4%/93.7%, and 93.6%/94.9% after re-ranking. Parameter statistics show that the model has fewer parameters than traditional CNN models.
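A hedged sketch of the pooling-and-loss recipe outlined above: global and part-level adaptive average pooling, triplet loss on the pooled vectors, and cross-entropy on classifier logits. The tensors stand in for backbone output; this is not the Twins-SVT model itself, and the toy triplet construction is only to show the call pattern.

```python
# Illustrative multi-branch pooling with triplet + cross-entropy losses (placeholders only).
import torch
import torch.nn as nn

feat_map = torch.randn(16, 512, 16, 8)                  # stand-in for backbone output
labels = torch.randint(0, 751, (16,))

global_vec = nn.functional.adaptive_avg_pool2d(feat_map, 1).flatten(1)          # (16, 512)
upper, lower = feat_map.chunk(2, dim=2)                                         # split height-wise
part_vecs = [nn.functional.adaptive_avg_pool2d(p, 1).flatten(1) for p in (upper, lower)]

classifier = nn.Linear(512, 751)
triplet = nn.TripletMarginLoss(margin=0.3)
ce = nn.CrossEntropyLoss()

# Toy triplet: roll the batch to fake positives/negatives just to show the call shape.
loss = ce(classifier(global_vec), labels)
for v in [global_vec] + part_vecs:
    loss = loss + triplet(v, v.roll(1, 0), v.roll(2, 0))
```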
APA, Harvard, Vancouver, ISO, and other styles
47

Duncanson, Kayne A., Simon Thwaites, David Booth, Gary Hanly, William S. P. Robertson, Ehsan Abbasnejad, and Dominic Thewlis. "Deep Metric Learning for Scalable Gait-Based Person Re-Identification Using Force Platform Data." Sensors 23, no. 7 (March 23, 2023): 3392. http://dx.doi.org/10.3390/s23073392.

Full text
Abstract:
Walking gait data acquired with force platforms may be used for person re-identification (re-ID) in various authentication, surveillance, and forensics applications. Current force platform-based re-ID systems classify a fixed set of identities (IDs), which presents a problem when IDs are added or removed from the database. We formulated force platform-based re-ID as a deep metric learning (DML) task, whereby a deep neural network learns a feature representation that can be compared between inputs using a distance metric. The force platform dataset used in this study is one of the largest and the most comprehensive of its kind, containing 193 IDs with significant variations in clothing, footwear, walking speed, and time between trials. Several DML model architectures were evaluated in a challenging setting where none of the IDs were seen during training (i.e., zero-shot re-ID) and there was only one prior sample per ID to compare with each query sample. The best architecture was 85% accurate in this setting, though an analysis of changes in walking speed and footwear between measurement instances revealed that accuracy was 28% higher on same-speed, same-footwear comparisons, compared to cross-speed, cross-footwear comparisons. These results demonstrate the potential of DML algorithms for zero-shot re-ID using force platform data, and highlight challenging cases.
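The zero-shot matching step in a metric-learning re-ID system can be sketched as follows: embed the query and the single enrolled sample per identity, then match by the smallest embedding distance. The embedding network and input size here are placeholders, not the paper's architecture.

```python
# Hedged sketch of one-sample-per-ID matching with a learned embedding (placeholders only).
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(600, 128))   # toy embedding of a force signal

gallery = torch.randn(193, 1, 600)       # one enrolled trial per identity
query = torch.randn(1, 1, 600)           # unseen trial to identify

with torch.no_grad():
    g = nn.functional.normalize(embed(gallery), dim=1)
    q = nn.functional.normalize(embed(query), dim=1)
    distances = torch.cdist(q, g)                    # (1, 193) Euclidean distances
    predicted_id = distances.argmin(dim=1).item()    # nearest enrolled identity
```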
APA, Harvard, Vancouver, ISO, and other styles
48

N V, Suchetha, Anusha K S, Deekshitha P, Deepika P S, and Pavitra Gopal Naik. "Survey on Face and Fingerprint based Person Identification System." Journal of Computer Science Engineering and Software Testing 8, no. 2 (August 8, 2022): 51–56. http://dx.doi.org/10.46610/jocses.2022.v08i02.005.

Full text
Abstract:
Biometric technologies are commonly used to improve system security by enabling people to be recognised. This survey report covers a multi-modal biometric system based on face and fingerprint biometric attributes. A camera is used to capture pictures of faces in the system. Once the input face is recognised, the fingerprint obtained from the fingerprint dataset is cross-checked to authenticate the identification. The characteristics of the fingerprint and face are extracted using the Haar transformation approach. Score-level fusion with real-time datasets determines the final decision.
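Score-level fusion of the face and fingerprint matchers might look like the following sketch: min-max normalize each matcher's scores, combine them with a weighted sum, and threshold the best fused score. The weights and the threshold are illustrative assumptions, not values from the survey.

```python
# Hedged sketch of score-level fusion for a face + fingerprint system.
import numpy as np

def min_max(scores: np.ndarray) -> np.ndarray:
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)

face_scores = np.array([0.62, 0.91, 0.40])       # similarity to each enrolled subject
finger_scores = np.array([0.55, 0.88, 0.35])

fused = 0.5 * min_max(face_scores) + 0.5 * min_max(finger_scores)   # equal weights (assumed)
best = int(np.argmax(fused))
accepted = fused[best] > 0.7                     # decision threshold (assumed)
```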
APA, Harvard, Vancouver, ISO, and other styles
49

Tran, Nha, Toan Nguyen, Minh Nguyen, Khiet Luong, and Tai Lam. "Global-local attention with triplet loss and label smoothed crossentropy for person re-identification." IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 4 (December 1, 2023): 1883. http://dx.doi.org/10.11591/ijai.v12.i4.pp1883-1891.

Full text
Abstract:
Person re-identification (Person Re-ID) is a research direction on tracking and identifying people in surveillance camera systems with non-overlapping camera views. Despite much research on this topic, some practical problems remain unsolved: in real scenes, people can easily be obscured by obstructions such as other people, trees, luggage, umbrellas, signs, cars, and motorbikes. In this paper, we propose a multi-branch deep learning network architecture in which one branch represents global features and two branches represent local features. Dividing the input image into small parts and varying the number of parts between the two branches helps the model represent features better. In addition, we add an attention module to the ResNet50 backbone that enhances important human characteristics and eliminates irrelevant information. To improve robustness, the model is trained by combining triplet loss and label-smoothed cross-entropy loss (LSCE). Experiments are carried out on the Market1501 and Duke multi-target multi-camera (DukeMTMC) datasets; our method achieves 96.04% rank-1 and 88.11% mean average precision (mAP) on Market1501, and 88.78% rank-1 and 78.6% mAP on DukeMTMC. This performance is better than that of several state-of-the-art methods.
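The combined objective named in the abstract, triplet loss plus label-smoothed cross-entropy, can be written directly with built-in PyTorch losses; the random tensors below only show the call pattern, not the paper's training pipeline, and the margin and smoothing values are assumptions.

```python
# Minimal sketch of the triplet + label-smoothed cross-entropy objective.
import torch
import torch.nn as nn

embeddings = torch.randn(32, 2048)               # stand-in for pooled features
logits = torch.randn(32, 751)                    # stand-in for classifier outputs
labels = torch.randint(0, 751, (32,))

lsce = nn.CrossEntropyLoss(label_smoothing=0.1)  # label-smoothed cross-entropy
triplet = nn.TripletMarginLoss(margin=0.3)

anchor, positive, negative = embeddings[:10], embeddings[10:20], embeddings[20:30]
total_loss = lsce(logits, labels) + triplet(anchor, positive, negative)
```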
APA, Harvard, Vancouver, ISO, and other styles
50

Li, Siyuan, Li Sun, and Qingli Li. "CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 1405–13. http://dx.doi.org/10.1609/aaai.v37i1.25225.

Full text
Abstract:
Pre-trained vision-language models like CLIP have recently shown superior performances on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes, lacking concrete text descriptions. Therefore, it remains to be determined how such models could be applied to these tasks. This paper first finds out that simply fine-tuning the visual model initialized by the image encoder in CLIP, has already obtained competitive performances in various ReID tasks. Then we propose a two-stage strategy to facilitate a better visual representation. The key idea is to fully exploit the cross-modal description ability in CLIP through a set of learnable text tokens for each ID and give them to the text encoder to form ambiguous descriptions. In the first training stage, image and text encoders from CLIP keep fixed, and only the text tokens are optimized from scratch by the contrastive loss computed within a batch. In the second stage, the ID-specific text tokens and their encoder become static, providing constraints for fine-tuning the image encoder. With the help of the designed loss in the downstream task, the image encoder is able to represent data as vectors in the feature embedding accurately. The effectiveness of the proposed strategy is validated on several datasets for the person or vehicle ReID tasks. Code is available at https://github.com/Syliz517/CLIP-ReID.
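A much-simplified, hedged sketch of the first-stage idea: keep both encoders frozen and optimize only a small set of learnable per-identity text tokens with a symmetric image-text contrastive loss. The encoders below are stand-ins, not the real CLIP modules, and all dimensions and the temperature are assumptions.

```python
# Hedged sketch of stage 1: learnable per-ID text tokens, frozen encoders, contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_ids, n_ctx, token_dim, embed_dim = 751, 4, 512, 512
id_tokens = nn.Parameter(torch.randn(num_ids, n_ctx, token_dim) * 0.02)  # learnable per-ID tokens

text_encoder = nn.Sequential(nn.Flatten(), nn.Linear(n_ctx * token_dim, embed_dim))  # stand-in
for p in text_encoder.parameters():
    p.requires_grad_(False)                       # encoders stay frozen in stage 1

image_features = F.normalize(torch.randn(32, embed_dim), dim=1)   # frozen image-encoder output
labels = torch.randint(0, num_ids, (32,))

text_features = F.normalize(text_encoder(id_tokens[labels]), dim=1)
logits = image_features @ text_features.t() / 0.07                 # temperature-scaled similarities
targets = torch.arange(32)
contrastive = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
contrastive.backward()   # in this toy setup, gradients flow only into id_tokens
```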
APA, Harvard, Vancouver, ISO, and other styles