
Journal articles on the topic "Discriminative Pose Robust Descriptors"


Below are the top 50 scholarly journal articles on the topic "Discriminative Pose Robust Descriptors".


1

Kniaz, V. V., V. V. Fedorenko, and N. A. Fomin. "DEEP LEARNING FOR LOW TEXTURED IMAGE MATCHING". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2 (May 30, 2018): 513–18. http://dx.doi.org/10.5194/isprs-archives-xlii-2-513-2018.

Abstract:
Low-textured objects pose challenges for automatic 3D model reconstruction. Such objects are common in archeological applications of photogrammetry. Most common feature point descriptors fail to match local patches in featureless regions of an object. Hence, automatic documentation of the archeological process using Structure from Motion (SfM) methods is challenging. Nevertheless, such documentation is possible with the aid of a human operator. Deep learning-based descriptors have recently outperformed most common feature point descriptors. This paper focuses on the development of a new Wide Image Zone Adaptive Robust feature Descriptor (WIZARD) based on deep learning. We use a convolutional auto-encoder to compress discriminative features of a local patch into a descriptor code. We build a codebook to perform point matching on multiple images. The matching is performed using nearest neighbor search and a modified voting algorithm. We present a new “Multi-view Amphora” (Amphora) dataset for evaluation of point matching algorithms. The dataset includes images of an Ancient Greek vase found on the Taman Peninsula in Southern Russia. The dataset provides color images, a ground truth 3D model, and ground truth optical flow. We evaluated the WIZARD descriptor on the “Amphora” dataset and show that it outperforms the SIFT and SURF descriptors on complex patch pairs.
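The codebook matching step described above reduces to nearest-neighbor search over descriptor codes. A minimal sketch of that general idea (not the WIZARD implementation; the ratio test and the toy vectors are illustrative assumptions):

```python
import numpy as np

def match_descriptors(query, codebook, ratio=0.8):
    """Nearest-neighbor descriptor matching with a ratio test.

    query    : (n, d) array of query descriptor codes
    codebook : (m, d) array of reference codes
    Returns a list of (query_idx, codebook_idx) pairs.
    """
    matches = []
    for i, q in enumerate(query):
        d = np.linalg.norm(codebook - q, axis=1)  # L2 distance to every code
        nn = np.argsort(d)[:2]                    # two nearest neighbors
        # accept only if the best match is clearly better than the second best
        if d[nn[0]] < ratio * d[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
query = np.array([[0.1, 0.1],   # unambiguous: close to code 0
                  [5.0, 5.0]])  # equidistant from all codes: rejected
```

Ambiguous queries (equidistant from several codes) fail the ratio test and are discarded, which is the usual guard against mismatches before any voting stage.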
2

Rabba, Salah, Matthew Kyan, Lei Gao, Azhar Quddus, Ali Shahidi Zandi, and Ling Guan. "Discriminative Robust Head-Pose and Gaze Estimation Using Kernel-DMCCA Features Fusion". International Journal of Semantic Computing 14, no. 01 (March 2020): 107–35. http://dx.doi.org/10.1142/s1793351x20500014.

Abstract:
There remain outstanding challenges in improving the accuracy of multi-feature information fusion for head-pose and gaze estimation. The proposed framework employs discriminative analysis for head-pose and gaze estimation using kernel discriminative multiple canonical correlation analysis (K-DMCCA). The feature extraction component of the framework includes spatial indexing and statistical and geometrical elements. Head-pose and gaze estimation is performed by aggregating features and transforming them into a higher-dimensional space using K-DMCCA for accurate estimation. The two main contributions are: enhancing fusion performance through the use of kernel-based DMCCA, and introducing an improved iris region descriptor based on a quadtree. The overall approach also includes statistical and geometrical indexing that is calibration-free (it does not require any subsequent adjustment). We validate the robustness of the proposed framework across a wide variety of datasets, which differ in modality (RGB and depth), constraints (a wide range of head-poses, not only frontal), quality (accurately labelled for validation), occlusion (due to glasses, hair bangs, facial hair) and illumination. Our method achieved head-pose and gaze estimation accuracies of 4.8° on Cave, 4.6° on MPII, 5.1° on ACS, 5.9° on EYEDIAP, 4.3° on OSLO and 4.6° on UULM.
3

Kawulok, Michal, Jakub Nalepa, Jolanta Kawulok, and Bogdan Smolka. "Dynamics of facial actions for assessing smile genuineness". PLOS ONE 16, no. 1 (January 5, 2021): e0244647. http://dx.doi.org/10.1371/journal.pone.0244647.

Abstract:
Applying computer vision techniques to distinguish between spontaneous and posed smiles is an active research topic in affective computing. Although many works addressing this problem have been published and a couple of excellent benchmark databases created, the existing state-of-the-art approaches do not exploit the action units defined within the Facial Action Coding System, which has become a standard in facial expression analysis. In this work, we explore the possibilities of extracting discriminative features directly from the dynamics of facial action units to differentiate between genuine and posed smiles. We report the results of our experimental study, which shows that the proposed features offer performance competitive with those based on facial landmark analysis and on textural descriptors extracted from spatial-temporal blocks. We make these features publicly available for the UvA-NEMO and BBC databases, which will allow other researchers to further improve the classification scores, while preserving the interpretation capabilities attributed to the use of facial action units. Moreover, we have developed a new technique for identifying the smile phases, which is robust against noise and allows for continuous analysis of facial videos.
4

Sanyal, Soubhik, Sivaram Prasad Mudunuri, and Soma Biswas. "Discriminative pose-free descriptors for face and object matching". Pattern Recognition 67 (July 2017): 353–65. http://dx.doi.org/10.1016/j.patcog.2017.02.016.

5

Singh, Geetika, and Indu Chhabra. "Discriminative Moment Feature Descriptors for Face Recognition". International Journal of Computer Vision and Image Processing 5, no. 2 (July 2015): 81–97. http://dx.doi.org/10.4018/ijcvip.2015070105.

Abstract:
The Zernike Moment (ZM) is a promising technique for extracting invariant features for face recognition. Previous studies modified it into the Discriminative ZM (DZM), which selects the most discriminative features for recognition and shows improved results. The present paper proposes a modification of DZM, named Modified DZM (MDZM), which selects coefficients based on their discriminative ability by considering the extent of variability between their class-averages. This reduces within-class variations while maintaining between-class differences. The study also investigates this idea of feature selection on the recently introduced Polar Complex Exponential Transform (PCET) (named discriminative PCET, or DPCET). The performance of the techniques is evaluated on the ORL, Yale and FERET databases against pose, illumination, expression and noise variations. Accuracy improves by up to 3.1% with MDZM at reduced dimensions over ZM and DZM. DPCET shows a further 1.9% improvement at lower computational complexity. Performance is also tested on the LFW database and compared with many other state-of-the-art approaches.
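The selection criterion described — keeping coefficients whose class-averages vary most relative to within-class variation — can be sketched as a per-dimension Fisher-like ratio. This is an illustrative reading of the idea, not the paper's exact MDZM procedure; the toy data and helper name are assumptions:

```python
import numpy as np

def select_discriminative(features, labels, k):
    """Rank feature dimensions by the spread of their class-averages
    relative to within-class variance, and keep the top-k dimensions."""
    classes = np.unique(labels)
    class_means = np.array([features[labels == c].mean(axis=0) for c in classes])
    between = class_means.var(axis=0)  # variability between class-averages
    within = np.array([features[labels == c].var(axis=0)
                       for c in classes]).mean(axis=0)
    score = between / (within + 1e-12)  # Fisher-like ratio per dimension
    return np.argsort(score)[::-1][:k]

# toy data: dimension 0 separates the classes, dimension 1 is pure noise
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [0.1, 1.0], (20, 2)),
               rng.normal([5, 0], [0.1, 1.0], (20, 2))])
y = np.array([0] * 20 + [1] * 20)
top = select_discriminative(X, y, 1)
```

Keeping only high-scoring dimensions shrinks the descriptor while preserving class separation, which matches the reduced-dimension accuracy gains the abstract reports.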
6

Hajraoui, Abdellatif, and Mohamed Sabri. "Generic and Robust Method for Head Pose Estimation". Indonesian Journal of Electrical Engineering and Computer Science 4, no. 2 (November 1, 2016): 439. http://dx.doi.org/10.11591/ijeecs.v4.i2.pp439-446.

Abstract:
Head pose estimation has fascinated the research community due to its applications in facial motion capture, human-computer interaction and video conferencing. It is a prerequisite to gaze tracking, face recognition, and facial expression analysis. In this paper, we present a generic and robust method for model-based global 2D head pose estimation from a single RGB image. In our approach, we use Gabor filters to design a pose descriptor that is robust to illumination and facial expression variations and targets the pose information. We then classify these descriptors using an SVM classifier. The approach has proven effective, as shown by the rate of correct pose estimations obtained.
7

Lin, Guojun, Meng Yang, Linlin Shen, Mingzhong Yang, and Mei Xie. "Robust and discriminative dictionary learning for face recognition". International Journal of Wavelets, Multiresolution and Information Processing 16, no. 02 (March 2018): 1840004. http://dx.doi.org/10.1142/s0219691318400040.

Abstract:
For face recognition, conventional dictionary learning (DL) methods have some disadvantages. First, face images of the same person vary with facial expression, pose, illumination and disguise, so it is hard to obtain a robust dictionary for face recognition. Second, they do not cover important components (e.g., particularity and disturbance) completely, which limits their performance. In this paper, we propose a novel robust and discriminative DL (RDDL) model. The proposed model uses sample diversities of the same face image to learn a robust dictionary, which includes class-specific dictionary atoms and disturbance dictionary atoms. These atoms can well represent the data from different classes. Discriminative regularizations on the dictionary and the representation coefficients are used to exploit discriminative information, which effectively improves the classification capability of the dictionary. The proposed RDDL is extensively evaluated on benchmark face image databases, and it shows superior performance to many state-of-the-art dictionary learning methods for face recognition.
8

Singh, Geetika, and Indu Chhabra. "Integrating Global Zernike and Local Discriminative HOG Features for Face Recognition". International Journal of Image and Graphics 16, no. 04 (October 2016): 1650021. http://dx.doi.org/10.1142/s0219467816500212.

Abstract:
Extraction of the global face appearance and local interior differences is essential for any face recognition application. This paper presents a novel framework for face recognition that combines two effective descriptors, namely Zernike moments (ZM) and histograms of oriented gradients (HOG). ZMs are global descriptors that are invariant to image rotation, noise and scale. HOGs capture local details and are robust to illumination changes. Fusion of these two descriptors combines the merits of both local and global approaches and is effective against the diverse variations present in face images. Further, as the processing time of HOG features is high owing to their large dimensionality, the study proposes to improve performance by selecting only the most discriminative HOG features (named discriminative HOG (DHOG)) for recognition. The efficacy of the proposed methods (DHOG, [Formula: see text] and [Formula: see text]) is tested on the ORL, Yale and FERET databases. DHOG provides an improvement of 3% to 5% over the existing HOG approach. Recognition results achieved by [Formula: see text] and [Formula: see text] are up to 15% and 18% higher, respectively, than those obtained with these descriptors individually. Performance is also analyzed on the LFW face database and compared with recent state-of-the-art methods.
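At its simplest, fusing a global descriptor (such as ZMs) with a local one (such as HOG) amounts to normalizing each vector and concatenating. A minimal sketch under that assumption — not the authors' pipeline, and the stand-in vectors are illustrative:

```python
import numpy as np

def fuse(global_desc, local_desc):
    """Fuse a global descriptor (e.g., Zernike moments) with a local one
    (e.g., HOG) by L2-normalizing each part and concatenating, so neither
    dominates the joint representation by sheer magnitude."""
    g = global_desc / (np.linalg.norm(global_desc) + 1e-12)
    l = local_desc / (np.linalg.norm(local_desc) + 1e-12)
    return np.concatenate([g, l])

zm = np.array([3.0, 4.0])         # stand-in for Zernike moment features
hog = np.array([1.0, 0.0, 0.0])   # stand-in for HOG features
fused = fuse(zm, hog)
```

Per-part normalization is the usual safeguard when concatenating descriptors with very different value ranges; each half contributes unit energy to the fused vector.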
9

SCHWARTZ, WILLIAM ROBSON, and HELIO PEDRINI. "IMPROVED FRACTAL IMAGE COMPRESSION BASED ON ROBUST FEATURE DESCRIPTORS". International Journal of Image and Graphics 11, no. 04 (October 2011): 571–87. http://dx.doi.org/10.1142/s0219467811004251.

Abstract:
Fractal image compression is one of the most promising techniques for image compression due to advantages such as resolution independence and fast decompression. It exploits the fact that natural scenes present self-similarity to remove redundancy and obtain high compression rates with smaller quality degradation compared to traditional compression methods. The main drawback of fractal compression is its computationally intensive encoding process, due to the need to search for regions with high similarity in the image. Several approaches have been developed to reduce the computational cost of locating similar regions. In this work, we propose a method based on robust feature descriptors to speed up the encoding time. The use of robust features provides more discriminative and representative information for regions of the image. When the regions are better represented, the search for similar parts of the image can be narrowed to focus only on the most likely matching candidates, which leads to a reduction in computational time. Our experimental results show that the use of robust feature descriptors reduces the encoding time while keeping high compression rates and reconstruction quality.
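The speed-up described rests on one operation: ranking domain blocks by descriptor similarity to the current range block and searching only the most likely candidates. A hedged sketch of that pruning step (the descriptor contents here are toy values, not the paper's features):

```python
import numpy as np

def prune_domain_blocks(range_desc, domain_descs, k):
    """Rank domain blocks by descriptor distance to the range block and
    keep only the k most likely matches, shrinking the fractal search
    from all blocks down to k candidates."""
    d = np.linalg.norm(domain_descs - range_desc, axis=1)
    return np.argsort(d)[:k]

# toy descriptors for four domain blocks; the range block resembles blocks 0 and 3
domain = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [0.2, 0.1]])
candidates = prune_domain_blocks(np.array([0.0, 0.0]), domain, 2)
```

The expensive affine-fit comparison of fractal encoding then runs only over `candidates`, which is where the reported reduction in encoding time comes from.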
10

Chen, Si, Dong Yan, and Yan Yan. "Directional Correlation Filter Bank for Robust Head Pose Estimation and Face Recognition". Mathematical Problems in Engineering 2018 (October 21, 2018): 1–10. http://dx.doi.org/10.1155/2018/1923063.

Abstract:
During the past few decades, face recognition has been an active research area in pattern recognition and computer vision due to its wide range of applications. However, one of the most challenging problems encountered by face recognition is the difficulty of handling large head pose variations. Therefore, the efficient and effective head pose estimation is a critical step of face recognition. In this paper, a novel feature extraction framework, called Directional Correlation Filter Bank (DCFB), is presented for head pose estimation. Specifically, in the proposed framework, the 1-Dimensional Optimal Tradeoff Filters (1D-OTF) corresponding to different head poses are simultaneously and jointly designed in the low-dimensional linear subspace. Different from the traditional methods that heavily rely on the precise localization of the key facial feature points, our proposed framework exploits the frequency domain of the face images, which effectively captures the high-order statistics of faces. As a result, the obtained features are compact and discriminative. Experimental results on public face databases with large head pose variations show the superior performance obtained by the proposed framework on the tasks of both head pose estimation and face recognition.
11

de Souza, Gustavo Botelho, Daniel Felipe da Silva Santos, Rafael Gonçalves Pires, Aparecido Nilceu Marana, and João Paulo Papa. "Deep Features Extraction for Robust Fingerprint Spoofing Attack Detection". Journal of Artificial Intelligence and Soft Computing Research 9, no. 1 (January 1, 2019): 41–49. http://dx.doi.org/10.2478/jaiscr-2018-0023.

Abstract:
Biometric systems have been widely considered a synonym of security. However, in recent years, malicious actors have been violating them by presenting forged traits, such as gelatin fingers, to fool their capture sensors (spoofing attacks). To detect such frauds, methods based on traditional image descriptors have been developed, aiming at liveness detection from the input data. However, due to their handcrafted nature, most of them present low accuracy rates in challenging scenarios. In this work, we propose a novel method for fingerprint spoofing detection using Deep Boltzmann Machines (DBM) for the extraction of high-level features from the images. Such deep features are very discriminative, thus complicating the task of forgery for attackers. Experiments show that the proposed method outperforms other state-of-the-art techniques, presenting high accuracy regarding attack detection.
12

Bao, Junqi, Xiaochen Yuan, Guoheng Huang, and Chan-Tong Lam. "Point Cloud Plane Segmentation-Based Robust Image Matching for Camera Pose Estimation". Remote Sensing 15, no. 2 (January 13, 2023): 497. http://dx.doi.org/10.3390/rs15020497.

Abstract:
The mainstream image matching method for recovering the motion of the camera is based on local feature matching, which faces the challenges of rotation, illumination, and the presence of dynamic objects. In addition, local feature matching relies on the distance between descriptors, which easily leads to many mismatches. In this paper, we propose a new robust image matching method for camera pose estimation, called IM_CPE. It is a novel descriptor matching method combined with 3-D point clouds for image matching. Specifically, we propose to extract feature points based on a pair of matched point cloud planes, which are generated and segmented based on depth images. Then, the feature points are matched based on the distance between their corresponding 3-D points on the point cloud planes and the distance between their descriptors. Moreover, the robustness of the matching can be guaranteed by the centroid distance of the matched point cloud planes. We evaluate the performance of IM_CPE using four well-known key point extraction algorithms, namely Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), and Oriented FAST and Rotated BRIEF (ORB), on four sequences from the TUM RGB-D dataset. According to the experimental results, compared to the original SIFT, SURF, FAST, and ORB algorithms, the NN_mAP performance of the four key point algorithms is improved by 11.25%, 13.98%, 16.63%, and 10.53% on average, respectively, and the M.Score is also improved by 25.15%, 23.05%, 22.28%, and 11.05% on average, respectively. The results show that IM_CPE can be combined with existing key point extraction algorithms and can significantly improve their performance.
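The core matching rule above — combining the distance between corresponding 3-D points with the distance between descriptors — can be sketched as a weighted cost. This is an illustrative reduction of the idea, not the IM_CPE algorithm; the weight `w` and the acceptance threshold are assumptions:

```python
import numpy as np

def combined_match(pts_a, desc_a, pts_b, desc_b, w=0.5, thresh=1.0):
    """Match feature points by a weighted sum of 3-D point distance and
    descriptor distance, accepting a match only under a cost threshold."""
    matches = []
    for i in range(len(pts_a)):
        geo = np.linalg.norm(pts_b - pts_a[i], axis=1)    # 3-D point distance
        app = np.linalg.norm(desc_b - desc_a[i], axis=1)  # descriptor distance
        cost = w * geo + (1 - w) * app
        j = int(np.argmin(cost))
        if cost[j] < thresh:
            matches.append((i, j))
    return matches

# toy example: two points matched against themselves
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
descs = np.array([[0.0], [1.0]])
m = combined_match(pts, descs, pts, descs)
```

Blending a geometric term into the cost is what lets the method reject descriptor pairs that look alike but lie on the wrong plane, reducing the mismatches the abstract mentions.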
13

Rabidas, Rinku, Abhishek Midya, Jayasree Chakraborty, and Wasim Arif. "Multi-Resolution Analysis of Edge-Texture Features for Mammographic Mass Classification". Journal of Circuits, Systems and Computers 29, no. 10 (December 9, 2019): 2050156. http://dx.doi.org/10.1142/s021812662050156x.

Abstract:
In this paper, multi-resolution analysis of two edge-texture based descriptors, Discriminative Robust Local Binary Pattern (DRlbp) and Discriminative Robust Local Ternary Pattern (DRltp), is proposed for classifying mammographic masses as benign or malignant. As extensions of the Local Binary Pattern (LBP) and Local Ternary Pattern (LTP), DRlbp- and DRltp-based features overcome the drawbacks of LBP and LTP by preserving edge information along with texture. The hypothesis is that multi-resolution analysis of these features over regions related to mammographic masses, using the wavelet transform, will capture more discriminating patterns and thus help in characterizing masses. In order to evaluate the efficiency of the proposed approach, several experiments are carried out using the mini-MIAS database, where 5-fold cross-validation is incorporated with a Support Vector Machine (SVM) on the optimal set of features obtained via a stepwise logistic regression method. An area under the receiver operating characteristic (ROC) curve ([Formula: see text] value) of 0.96 is achieved with DRlbp attributes as the best performance. The superiority of the proposed scheme is established by comparing the obtained results with recently developed competing schemes.
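DRlbp and DRltp build on the plain Local Binary Pattern, which thresholds a pixel's 8 neighbors at the center value and packs the results into one integer code. A minimal LBP sketch for reference (the discriminative-robust variants add edge-magnitude weighting, which is omitted here):

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 Local Binary Pattern: threshold the 8 neighbors at the
    center value and read the resulting bits as one integer code."""
    c = patch[1, 1]
    # neighbors read clockwise starting from the top-left corner
    neigh = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
             patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= c else 0 for n in neigh]
    return sum(b << i for i, b in enumerate(bits))

# bright top row, dark elsewhere: only the first three bits fire
patch = np.array([[9, 9, 9],
                  [1, 5, 1],
                  [1, 1, 1]])
code = lbp_code(patch)
```

Histogramming such codes over a region yields the texture feature; the multi-resolution variant in the paper computes these histograms on wavelet subbands.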
14

Dubé, Renaud, Andrei Cramariuc, Daniel Dugas, Hannes Sommer, Marcin Dymczyk, Juan Nieto, Roland Siegwart, and Cesar Cadena. "SegMap: Segment-based mapping and localization using data-driven descriptors". International Journal of Robotics Research 39, no. 2-3 (July 10, 2019): 339–55. http://dx.doi.org/10.1177/0278364919863090.

Abstract:
Precisely estimating a robot’s pose in a prior, global map is a fundamental capability for mobile robotics, e.g., autonomous driving or exploration in disaster zones. This task, however, remains challenging in unstructured, dynamic environments, where local features are not discriminative enough and global scene descriptors only provide coarse information. We therefore present SegMap: a map representation solution for localization and mapping based on the extraction of segments in 3D point clouds. Working at the level of segments offers increased invariance to view-point and local structural changes, and facilitates real-time processing of large-scale 3D data. SegMap exploits a single compact data-driven descriptor for performing multiple tasks: global localization, 3D dense map reconstruction, and semantic information extraction. The performance of SegMap is evaluated in multiple urban driving and search-and-rescue experiments. We show that the learned SegMap descriptor has superior segment retrieval capabilities compared with state-of-the-art handcrafted descriptors. As a consequence, we achieve a higher localization accuracy and a 6% increase in recall over state-of-the-art handcrafted descriptors. These segment-based localizations allow us to reduce the open-loop odometry drift by up to 50%. SegMap is available open-source, along with easy-to-run demonstrations.
15

Deli, Yan, Tuo Wenkun, Wang Weiming, and Li Shaohua. "Illumination Robust Loop Closure Detection with the Constraint of Pose". Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) 13, no. 7 (November 4, 2020): 1097–106. http://dx.doi.org/10.2174/2352096513999200422141150.

Abstract:
Background: Loop closure detection is a crucial part of robot navigation and simultaneous localization and mapping (SLAM). Appearance-based loop closure detection still faces many challenges, such as illumination changes, perceptual aliasing and increasing computational complexity. Methods: In this paper, we propose a visual loop closure detection algorithm that combines the illumination-robust descriptor DIRD with odometry information. In this algorithm, a new distance function is built by fusing the Euclidean and Mahalanobis distance functions; it integrates the pose uncertainty of the body and can dynamically adjust the threshold for potential loop closure locations. Then, potential locations are verified by calculating the similarity of DIRD descriptors. Results: The proposed algorithm is evaluated on the KITTI and EuRoC datasets, and is compared with the SeqSLAM algorithm, which is one of the state-of-the-art loop closure detection algorithms. The results show that the proposed algorithm effectively reduces computing time and achieves better performance on the precision-recall curve. Conclusion: The new loop closure detection method makes full use of odometry information and image appearance information. The application of the new distance function effectively reduces missed detections caused by odometry error accumulation. The algorithm does not require extracting image features or a learning stage, and can perform real-time detection on platforms with limited computational power.
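The fused distance function described — Euclidean plus Mahalanobis, the latter weighing the pose difference by its uncertainty — can be sketched directly. The blend weight `alpha` is an assumption, not the paper's formulation:

```python
import numpy as np

def fused_distance(x, y, cov, alpha=0.5):
    """Blend the Euclidean distance between two poses with the Mahalanobis
    distance, where `cov` is the pose-uncertainty covariance: the more
    uncertain a direction, the less a difference along it counts."""
    diff = x - y
    eucl = float(np.linalg.norm(diff))
    maha = float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
    return alpha * eucl + (1 - alpha) * maha

# with an identity covariance, both terms agree and the blend is a plain L2 distance
d = fused_distance(np.array([1.0, 0.0]), np.array([0.0, 0.0]), np.eye(2))
```

As odometry error accumulates, the covariance grows, the Mahalanobis term shrinks, and the effective loop-closure search radius widens, which is the dynamic-threshold behavior the abstract describes.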
16

Li, Fuqiang, Tongzhuang Zhang, Yong Liu, and Feiqi Long. "Deep Residual Vector Encoding for Vein Recognition". Electronics 11, no. 20 (October 13, 2022): 3300. http://dx.doi.org/10.3390/electronics11203300.

Abstract:
Vein recognition has been drawing more attention recently because it is highly secure and reliable for practical biometric applications. However, underlying issues such as uneven illumination, low contrast, and sparse patterns with high inter-class similarities make the traditional vein recognition systems based on hand-engineered features unreliable. Recent successes of convolutional neural networks (CNNs) for large-scale image recognition tasks motivate us to replace the traditional hand-engineered features with the superior CNN to design a robust and discriminative vein recognition system. To address the difficulty of direct training or fine-tuning of a CNN with existing small-scale vein databases, a new knowledge transfer approach is formulated using pre-trained CNN models together with a training dataset (e.g., ImageNet) as a robust descriptor generation machine. With the generated deep residual descriptors, a very discriminative model, namely deep residual vector encoding (DRVE), is proposed by a hierarchical design of dictionary learning, coding, and classifier training procedures. Rigorous experiments are conducted with a high-quality hand-dorsa vein database, and superior recognition results compared with state-of-the-art models fully demonstrate the effectiveness of the proposed models. An additional experiment with the PolyU multispectral palmprint database is designed to illustrate the generalization ability.
17

Zhang, Lei, Yanjie Wang, Honghai Sun, Zhijun Yao, and Shuwen He. "Robust Visual Correlation Tracking". Mathematical Problems in Engineering 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/238971.

Abstract:
Recent years have seen greater interest in tracking-by-detection methods for visual object tracking because of their excellent tracking performance. However, most existing methods fix the scale, which makes the trackers unreliable when handling large scale variations in complex scenes. In this paper, we decompose tracking into target translation and scale prediction. We adopt a scale estimation approach based on the tracking-by-detection framework, develop a new model update scheme, and present a robust correlation tracking algorithm with discriminative correlation filters. The approach works by learning translation and scale correlation filters. We obtain the target translation and scale by finding the maximum output response of the learned correlation filters and then update the target models online. Extensive experimental results on 12 challenging benchmark sequences show that the proposed tracking approach reduces the average center location error (CLE) by 6.8 pixels, significantly improves performance by 17.5% in average success rate (SR) and by 5.4% in average distance precision (DP) compared to the second best of five other excellent existing tracking algorithms, and is robust to appearance variations introduced by scale variations, pose variations, illumination changes, partial occlusion, fast motion, rotation, and background clutter.
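Discriminative correlation filters of the kind used here are typically learned in the Fourier domain with a MOSSE-style closed form, and the peak of the filter response gives the target translation. A minimal single-channel sketch (a general illustration, not the paper's translation-plus-scale tracker):

```python
import numpy as np

def train_filter(x, y, lam=0.01):
    """MOSSE-style closed-form correlation filter in the Fourier domain:
    H = (Yhat * conj(Xhat)) / (Xhat * conj(Xhat) + lambda),
    where x is the training image and y the desired response."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def response(h, z):
    """Correlation response of filter h on a new image z; the location of
    the maximum gives the estimated target translation."""
    return np.real(np.fft.ifft2(h * np.fft.fft2(z)))

# toy training pair: a point target, and a desired response peaking at it
x = np.zeros((8, 8)); x[3, 4] = 1.0
y = np.zeros((8, 8)); y[3, 4] = 1.0
h = train_filter(x, y)
r = response(h, x)
peak = np.unravel_index(np.argmax(r), r.shape)
```

The regularizer `lam` prevents division by near-zero spectral energy; online model update, as in the abstract, amounts to running averages of the numerator and denominator.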
18

Alharithi, Fahd, Ahmed Almulihi, Sami Bourouis, Roobaea Alroobaea, and Nizar Bouguila. "Discriminative Learning Approach Based on Flexible Mixture Model for Medical Data Categorization and Recognition". Sensors 21, no. 7 (April 2, 2021): 2450. http://dx.doi.org/10.3390/s21072450.

Abstract:
In this paper, we propose a novel hybrid discriminative learning approach based on shifted-scaled Dirichlet mixture model (SSDMM) and Support Vector Machines (SVMs) to address some challenging problems of medical data categorization and recognition. The main goal is to capture accurately the intrinsic nature of biomedical images by considering the desirable properties of both generative and discriminative models. To achieve this objective, we propose to derive new data-based SVM kernels generated from the developed mixture model SSDMM. The proposed approach includes the following steps: the extraction of robust local descriptors, the learning of the developed mixture model via the expectation–maximization (EM) algorithm, and finally the building of three SVM kernels for data categorization and classification. The potential of the implemented framework is illustrated through two challenging problems that concern the categorization of retinal images into normal or diabetic cases and the recognition of lung diseases in chest X-rays (CXR) images. The obtained results demonstrate the merits of our hybrid approach as compared to other methods.
19

Bashiri, Fereshteh S., Reihaneh Rostami, Peggy Peissig, Roshan M. D’Souza, and Zeyun Yu. "An Application of Manifold Learning in Global Shape Descriptors". Algorithms 12, no. 8 (August 16, 2019): 171. http://dx.doi.org/10.3390/a12080171.

Abstract:
With the rapid expansion of applied 3D computational vision, shape descriptors have become increasingly important for a wide variety of applications and objects from molecules to planets. Appropriate shape descriptors are critical for accurate (and efficient) shape retrieval and 3D model classification. Several spectral-based shape descriptors have been introduced by solving various physical equations over a 3D surface model. In this paper, for the first time, we incorporate a specific manifold learning technique, introduced in statistics and machine learning, to develop a global, spectral-based shape descriptor in the computer graphics domain. The proposed descriptor utilizes the Laplacian Eigenmap technique in which the Laplacian eigenvalue problem is discretized using an exponential weighting scheme. As a result, our descriptor eliminates the limitations tied to the existing spectral descriptors, namely dependency on triangular mesh representation and high intra-class quality of 3D models. We also present a straightforward normalization method to obtain a scale-invariant and noise-resistant descriptor. The extensive experiments performed in this study using two standard 3D shape benchmarks—high-resolution TOSCA and McGill datasets—demonstrate that the present contribution provides a highly discriminative and robust shape descriptor under the presence of a high level of noise, random scale variations, and low sampling rate, in addition to the known isometric-invariance property of the Laplace–Beltrami operator. The proposed method significantly outperforms state-of-the-art spectral descriptors in shape retrieval and classification. The proposed descriptor is limited to closed manifolds due to its inherited inability to accurately handle manifolds with boundaries.
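The descriptor described derives a global signature from the spectrum of a graph Laplacian built with exponential (heat-kernel) edge weights. A toy sketch of that construction on a small point set — the paper operates on 3D surface models, and `t` and `k` are illustrative parameters:

```python
import numpy as np

def spectral_descriptor(points, t=1.0, k=3):
    """Global shape signature: the k smallest non-trivial eigenvalues of a
    graph Laplacian whose edges carry exponential (heat-kernel) weights."""
    # pairwise squared distances between all points
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / t)            # exponential weighting scheme
    np.fill_diagonal(W, 0.0)       # no self-loops
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    evals = np.linalg.eigvalsh(L)   # real, ascending for symmetric L
    return evals[1:k + 1]           # skip the trivial zero eigenvalue

# unit square: a minimal stand-in for a sampled shape
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sig = spectral_descriptor(pts)
```

Because the construction needs only pairwise distances, it avoids the dependency on triangular mesh representation that the abstract cites as a limitation of earlier spectral descriptors; normalizing the spectrum (e.g., by its first entry) would give the scale invariance the paper discusses.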
20

Xie, Shaorong, Chao Pan, Yaxin Peng, Ke Liu, and Shihui Ying. "Large-Scale Place Recognition Based on Camera-LiDAR Fused Descriptor". Sensors 20, no. 10 (May 19, 2020): 2870. http://dx.doi.org/10.3390/s20102870.

Abstract:
In the field of autonomous driving, carriers are equipped with a variety of sensors, including cameras and LiDARs. However, the camera suffers from problems of illumination and occlusion, and the LiDAR encounters motion distortion, degenerate environment and limited ranging distance. Therefore, fusing the information from these two sensors deserves to be explored. In this paper, we propose a fusion network which robustly captures both the image and point cloud descriptors to solve the place recognition problem. Our contribution can be summarized as: (1) applying the trimmed strategy in the point cloud global feature aggregation to improve the recognition performance, (2) building a compact fusion framework which captures both the robust representation of the image and 3D point cloud, and (3) learning a proper metric to describe the similarity of our fused global feature. The experiments on KITTI and KAIST datasets show that the proposed fused descriptor is more robust and discriminative than the single sensor descriptor.
APA, Harvard, Vancouver, ISO, and other styles
21

Ghimire, Pamir, Igor Jovančević and Jean-José Orteu. "Learning Local Descriptor for Comparing Renders with Real Images". Applied Sciences 11, no. 8 (7.04.2021): 3301. http://dx.doi.org/10.3390/app11083301.

Full text source
Abstract:
We present a method to train a deep-network-based feature descriptor to compute discriminative local descriptions from renders and corresponding real images with similar geometry. We are interested in using such descriptors for automatic industrial visual inspection, whereby the inspection camera has been coarsely localized with respect to a relatively large mechanical assembly and the presence of certain components needs to be checked against the reference computer-aided design (CAD) model. We aim to perform the task by comparing the real inspection image with the render of the textureless 3D CAD model using the learned descriptors. The descriptor was trained to capture geometric features while staying invariant to the image domain. Patch pairs for training the descriptor were extracted in a semisupervised manner from a small dataset of 100 pairs of real images and corresponding renders that were manually finely registered, starting from a relatively coarse localization of the inspection camera. Due to the small size of the training dataset, the descriptor network was initialized with weights from classification training on ImageNet. A two-step training is proposed to address the problem of domain adaptation. The first step, "bootstrapping", is a classification training that provides good initial weights for the second step, triplet-loss training, which yields discriminative features comparable using the l2 distance. The descriptor was tested for comparing renders and real images through two approaches: finding local correspondences between the images through nearest-neighbor matching, and transforming the images into Bag of Visual Words (BoVW) histograms. We observed that learning a robust cross-domain descriptor is feasible even with a small dataset, and such features may be of interest for CAD-based inspection of mechanical assemblies and related applications such as tracking or finely registered augmented reality. To the best of our knowledge, this is the first work that reports learning local descriptors for comparing renders with real inspection images.
APA, Harvard, Vancouver, ISO, and other styles
22

Thenmozhi, M., and P. Gnanaskanda Parthiban. "Robust Face Recognition from NIR Dataset via Sparse Representation". Applied Mechanics and Materials 573 (June 2014): 495–500. http://dx.doi.org/10.4028/www.scientific.net/amm.573.495.

Full text source
Abstract:
A biometric identification system is a computer application for automatically identifying or verifying an individual from a digital image or a frame from a video source. One way to do this is by comparing selected facial features from the image against a facial database. This paper proposes dynamic face recognition from near-infrared images using a sparse representation classifier. Most of the existing facial expression datasets are captured in the visible light spectrum. However, visible light (VIS) can change with time and place, causing significant variations in appearance and texture. The new framework was designed to achieve robustness to pose variation and occlusion, and to cope with uncontrolled environmental illumination, for reliable biometric identification. The paper presents a novel analysis of dynamic facial expression recognition using near-infrared (NIR) datasets and LBP (Local Binary Patterns) feature descriptors, and shows good, robust results against illumination variations by using an infrared imaging system.
APA, Harvard, Vancouver, ISO, and other styles
23

García-Olalla, Óscar, Laura Fernández-Robles, Enrique Alegre, Manuel Castejón-Limas and Eduardo Fidalgo. "Boosting Texture-Based Classification by Describing Statistical Information of Gray-Levels Differences". Sensors 19, no. 5 (1.03.2019): 1048. http://dx.doi.org/10.3390/s19051048.

Full text source
Abstract:
This paper presents a new texture descriptor booster, the Complete Local Oriented Statistical Information Booster (CLOSIB), based on statistical information of the image. Our proposal uses the statistical information of the texture provided by the image gray-level differences to increase the discriminative capability of Local Binary Patterns (LBP)-based and other texture descriptors. We demonstrate that the Half-CLOSIB (H-CLOSIB) and multi-scale (M-CLOSIB) versions are more efficient and precise than the general one: H-CLOSIB may eliminate redundant statistical information, and M-CLOSIB is more robust. We evaluated our method using four datasets: KTH-TIPS 2-a for material recognition, UIUC and USPTex for general texture recognition, and JAFFE for face recognition. The results show that when we combine CLOSIB with well-known LBP-based descriptors, the hit rate increases in all cases, introducing the idea that CLOSIB can be used to enhance the description of texture in a significant number of situations. Additionally, a comparison with recent algorithms demonstrates that a combination of LBP methods with CLOSIB variants obtains results comparable to the state of the art.
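As a rough illustration of boosting a descriptor with gray-level difference statistics (a loose sketch of the idea, not the published CLOSIB formulation; the offsets and the mean/std choice are assumptions):

```python
import numpy as np

def gray_diff_stats(img, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Mean and standard deviation of absolute gray-level differences at a
    set of pixel offsets, concatenated into a small booster vector that can
    be appended to an LBP-style descriptor (illustrative only)."""
    img = np.asarray(img, dtype=float)
    feats = []
    for dy, dx in offsets:
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        diff = np.abs(img - shifted)          # gray-level differences
        feats.extend([diff.mean(), diff.std()])
    return np.array(feats)
```

In use, the resulting vector would be concatenated with the base texture descriptor before classification.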
APA, Harvard, Vancouver, ISO, and other styles
24

Blanco, Jose-Luis, Javier González-Jiménez and Juan-Antonio Fernández-Madrigal. "A robust, multi-hypothesis approach to matching occupancy grid maps". Robotica 31, no. 5 (11.01.2013): 687–701. http://dx.doi.org/10.1017/s0263574712000732.

Full text source
Abstract:
SUMMARY: This paper presents a new approach to matching occupancy grid maps by finding correspondences between sets of sparse features detected in the maps. The problem is stated here as a special instance of generic image registration. To cope with the uncertainty and ambiguity that arise from matching grid maps, we introduce a modified RANSAC algorithm that searches for a dynamic number of internally consistent subsets of feature pairings from which to compute hypotheses about the translation and rotation between the maps. By providing a (possibly multi-modal) probability distribution of the relative pose of the maps, our method can be seamlessly integrated into large-scale mapping frameworks for mobile robots. This paper provides a benchmarking of different detectors and descriptors, along with extensive experimental results that illustrate the robustness of the algorithm, with a 97% success ratio in loop-closure detection for ~1700 matchings between local maps obtained from four publicly available datasets.
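The RANSAC core of such grid-map alignment, estimating a 2D rotation and translation from putative feature pairings, can be sketched as follows. This is plain single-hypothesis RANSAC; the paper's multi-hypothesis bookkeeping over several consistent subsets is omitted:

```python
import numpy as np

def rigid_from_pairs(a, b):
    """Least-squares 2D rotation + translation mapping points a onto b
    (Kabsch/Procrustes on centered points)."""
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    H = (a - ca).T @ (b - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def ransac_rigid(a, b, iters=200, tol=0.1, seed=0):
    """Estimate the rigid transform from putative pairings (a[i], b[i]),
    keeping the hypothesis with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(a), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(a), size=2, replace=False)  # minimal sample
        R, t = rigid_from_pairs(a[idx], b[idx])
        err = np.linalg.norm((a @ R.T + t) - b, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return rigid_from_pairs(a[best_inliers], b[best_inliers]), best_inliers
```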
APA, Harvard, Vancouver, ISO, and other styles
25

Patruno, Cosimo, Roberto Colella, Massimiliano Nitti, Vito Renò, Nicola Mosca and Ettore Stella. "A Vision-Based Odometer for Localization of Omnidirectional Indoor Robots". Sensors 20, no. 3 (6.02.2020): 875. http://dx.doi.org/10.3390/s20030875.

Full text source
Abstract:
In this paper, we tackle the problem of indoor robot localization using a vision-based approach. Specifically, we propose a visual odometer able to give the relative pose of an omnidirectional automatic guided vehicle (AGV) moving inside an indoor industrial environment. A monocular downward-looking camera, with its optical axis nearly perpendicular to the ground floor, is used to collect floor images. After a preliminary image analysis aimed at detecting robust point features (keypoints), descriptors associated with the keypoints enable matching the detected points across consecutive frames. A robust correspondence feature filter based on statistical and geometrical information is devised to reject incorrect matchings, thus delivering better pose estimations. A camera pose compensation is further introduced to ensure better positioning accuracy. The effectiveness of the proposed methodology has been proven through several experiments, in the laboratory as well as in an industrial setting, with both quantitative and qualitative evaluations. Outcomes show that the method provides a final positioning percentage error of 0.21% over an average distance of 17.2 m. A longer run in an industrial context provided comparable results (a percentage error of 0.94% after about 80 m). The average relative positioning error is about 3%, which is in good agreement with the current state of the art.
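A minimal statistical match filter in the spirit described above might reject matches whose displacement deviates from the dominant image motion. The MAD-style threshold `k` and the median-flow model are assumptions, not the paper's exact filter:

```python
import numpy as np

def filter_matches(p_prev, p_curr, k=2.5):
    """Keep keypoint matches whose displacement vector agrees with the
    dominant motion between frames; reject statistical outliers."""
    d = p_curr - p_prev                      # per-match displacement
    med = np.median(d, axis=0)               # dominant (median) flow
    dev = np.linalg.norm(d - med, axis=1)
    scale = 1.4826 * np.median(dev) + 1e-9   # robust MAD-style spread
    return dev < k * scale                   # boolean inlier mask
```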
APA, Harvard, Vancouver, ISO, and other styles
26

Peng, Yu Qing, Wei Liu, Cui Cui Zhao and Tie Jun Li. "Detection of Violent Video with Audio-Visual Features Based on MPEG-7". Applied Mechanics and Materials 411-414 (September 2013): 1002–7. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.1002.

Full text source
Abstract:
To address the lack of an effective way to detect violent video on the network, a new method for detecting violent video using MPEG-7 audio and visual features is put forward. In feature extraction, the method selects features targeting audio, color, space, time, and motion. Some MPEG-7 descriptors were added and improved: an instantaneous audio feature was added, the motion intensity descriptor was customized, and a new method to extract the dominant color of video was proposed. A BP neural network optimized by a genetic algorithm (GA) was used to fuse the features. Experiments show that the selected features are representative and discriminative and reduce data redundancy, that the neural network fusion model is more robust, and that fusing audio and visual features improves the recall and precision of video detection.
APA, Harvard, Vancouver, ISO, and other styles
27

Singh, Natesh, and Bruno O. Villoutreix. "A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein–Protein Interfaces". International Journal of Molecular Sciences 23, no. 22 (18.11.2022): 14364. http://dx.doi.org/10.3390/ijms232214364.

Full text source
Abstract:
The modulation of protein–protein interactions (PPIs) by small chemical compounds is challenging. PPIs play a critical role in most cellular processes and are involved in numerous disease pathways. As such, novel strategies that assist the design of PPI inhibitors are of major importance. We previously reported that the knowledge-based DLIGAND2 scoring tool was the best rescoring function for improving receptor-based virtual screening (VS) performed with the Surflex docking engine applied to several PPI targets with experimentally known active and inactive compounds. Here, we extend our investigation by assessing the VS potential of other types of scoring functions, with an emphasis on docking-pose-derived solvent accessible surface area (SASA) descriptors, with or without the use of machine learning (ML) classifiers. First, we explored rescoring strategies of Surflex-generated docking poses with five GOLD scoring functions (GoldScore, ChemScore, ASP, ChemPLP, ChemScore with Receptor Depth Scaling) and with consensus scoring. The top-ranked poses were post-processed to derive a set of protein and ligand SASA descriptors in the bound and unbound states, which were combined to derive descriptors of the docked protein–ligand complexes. Further, eight ML models (tree, bagged forest, random forest, Bayesian, support vector machine, logistic regression, neural network, and neural network with bagging) were trained using the derived SASA descriptors and validated on test sets. The results show that many SASA descriptors are better than the Surflex and GOLD scoring functions in terms of overall performance and early recovery success on the used dataset. The ML models were superior to all scoring functions and rescoring approaches for most targets, yielding up to a seven-fold increase in enrichment factors at 1% of the screened collections. In particular, the neural network and random forest-based ML models emerged as the best techniques for this PPI dataset, making them robust and attractive VS tools for hit-finding efforts. The presented results suggest that exploring further docking-pose-derived SASA descriptors could be valuable for structure-based virtual screening projects and, in the present case, to assist the rational design of small-molecule PPI inhibitors.
APA, Harvard, Vancouver, ISO, and other styles
28

Lei, Song Ze, and Qiang Zhu. "Human Ear Recognition Using Hybrid Filter and Supervised Locality Preserving Projection". Advanced Materials Research 529 (June 2012): 271–75. http://dx.doi.org/10.4028/www.scientific.net/amr.529.271.

Full text source
Abstract:
To solve the difficult problem of human ear recognition caused by variation in ear angle, a novel method combining a hybrid filter with supervised locality preserving projection (SLPP) is proposed. The ear image is first filtered by a Log-Gabor filter constructed with 5 scales and 8 orientations; the important parameters of the Log-Gabor filter are selected through experiments. To form an effective and discriminative feature, the many Log-Gabor coefficients are reduced by the discrete cosine transform. Finally, the feature is constructed by SLPP to uncover geometric structure. Experimental results show that, compared with traditional methods, the proposed method obtains a higher recognition rate and is robust to pose variations in ear recognition.
APA, Harvard, Vancouver, ISO, and other styles
29

Zhang, Baoqing, Zhichun Mu, Hui Zeng and Shuang Luo. "Robust Ear Recognition via Nonnegative Sparse Representation of Gabor Orientation Information". Scientific World Journal 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/131605.

Full text source
Abstract:
Orientation information is critical to the accuracy of ear recognition systems. In this paper, a new feature extraction approach for ear recognition is investigated, using the orientation information of Gabor wavelets. The proposed Gabor orientation feature not only avoids much of the redundancy in the conventional Gabor feature but also tends to extract more precise orientation information of the ear shape contours. Then, Gabor orientation feature based nonnegative sparse representation classification (Gabor orientation + NSRC) is proposed for ear recognition. Compared with SRC, in which the sparse coding coefficients can be negative, the nonnegativity of NSRC conforms to the intuitive notion of combining parts to form a whole and is therefore more consistent with the biological modeling of visual data. Additionally, the use of Gabor orientation features increases the discriminative power of NSRC. Extensive experimental results show that the proposed Gabor orientation feature based nonnegative sparse representation classification paradigm achieves much better recognition performance and is more robust to challenging problems such as pose changes, illumination variations, and partial ear occlusion in real-world applications.
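The residual-based NSRC decision rule can be sketched as below. The projected-gradient nonnegative solver is a generic stand-in for the sparse coding step, not the authors' optimizer, and all names are illustrative:

```python
import numpy as np

def nnls_pg(A, b, iters=500):
    """Projected-gradient solver for min ||Ax - b||^2 subject to x >= 0
    (a minimal stand-in for the nonnegative sparse coding step)."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A.T @ A, 2)   # 1 / Lipschitz constant
    for _ in range(iters):
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - b)))
    return x

def nsrc_classify(D, labels, y):
    """Assign y to the class whose atoms' nonnegative coefficients
    reconstruct it with the smallest residual (NSRC-style decision).
    D: dictionary with unit-norm columns; labels: class of each column."""
    x = nnls_pg(D, y)
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        xc = np.where(mask, x, 0.0)           # keep class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)
```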
APA, Harvard, Vancouver, ISO, and other styles
30

Bai, Yu-Fei, Hong-Bo Zhang, Qing Lei and Ji-Xiang Du. "Multistage Polymerization Network for Multiperson Pose Estimation". Journal of Sensors 2021 (29.12.2021): 1–10. http://dx.doi.org/10.1155/2021/1484218.

Full text source
Abstract:
Multiperson pose estimation is an important and complex problem in computer vision. In recent years it has been treated as a human skeleton joint detection problem and solved by joint heat map regression networks. The key to accurate pose estimation is learning robust and discriminative feature maps. Although current methods have made significant progress through interlayer fusion and intra-level fusion of feature maps, few works pay attention to combining the two. In this paper, we propose a multistage polymerization network (MPN) for multiperson pose estimation. The MPN continuously learns rich underlying spatial information by fusing features within layers. The MPN also adds hierarchical connections between feature maps at the same resolution for interlayer fusion, so as to reuse low-level spatial information and refine high-level semantic information to obtain accurate keypoint representations. In addition, we observe a lack of connection between the output low-level information and the high-level information. To solve this problem, an effective shuffled attention mechanism (SAM) is proposed. The shuffle aims to promote cross-channel information exchange between pyramid feature maps, while attention makes a trade-off between the low-level and high-level representations of the output features. As a result, the relationship between the spatial and channel dimensions of the feature map is further enhanced. Evaluation of the proposed method is carried out on public datasets, and experimental results show that our method performs better than current methods.
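If the "shuffle" in SAM follows the familiar grouped channel shuffle (an assumption; the abstract does not pin down the exact layout), it can be written as:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle promoting cross-channel information
    exchange between feature-map groups. x has shape (channels, H, W)."""
    c, h, w = x.shape
    assert c % groups == 0
    # split channels into groups, transpose the group axis, flatten back
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

Interleaving channels this way lets subsequent per-group operations see information from every original group.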
APA, Harvard, Vancouver, ISO, and other styles
31

Oad, Ammar, Karishma Kumari, Imtiaz Hussain, Feng Dong, Bacha Hammad and Rajkumari Oad. "Performance comparison of ORB, SURF and SIFT using Intracranial Haemorrhage CTScan Brain images". International Journal of Artificial Intelligence & Mathematical Sciences 1, no. 2 (31.01.2023): 26–34. http://dx.doi.org/10.58921/ijaims.v1i2.41.

Full text source
Abstract:
Medical images are crucial both for the doctor's accurate diagnosis and for the patient's subsequent therapy. Intelligent algorithms make it feasible to quickly identify lesions in medical images, and extracting information from images is crucial. Feature extraction is an important step in image classification, as it allows the content of images to be represented as faithfully as possible. The intention of this study is an overall performance assessment of feature detector and descriptor methods, especially as there are numerous combinations to compare. Three feature descriptor techniques were selected: ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform), and SURF (Speeded-Up Robust Features). Matching evaluation parameters were calculated, for example, the number of keypoints in the image, the execution time required by each algorithm, and the best match. The dataset was taken from Kaggle and contains 170 CT scan images of the brain with intracranial hemorrhage masks. Feature matching is achieved with the brute-force method. The performance analysis shows the discriminative power of various combinations of detector and descriptor methods; the SURF algorithm is the best and most robust in CT scan imaging for aiding medical diagnosis.
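The brute-force matching step for binary descriptors such as ORB's can be sketched without OpenCV as a Hamming-distance matcher with cross-check (descriptors as packed-bit uint8 arrays):

```python
import numpy as np

def bf_match_hamming(desc_a, desc_b):
    """Brute-force matcher for binary (ORB-like) descriptors with
    cross-check: keep only mutual nearest neighbours.
    desc_a, desc_b: uint8 arrays of shape (n, bytes_per_descriptor)."""
    # pairwise Hamming distance via XOR + bit counting
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)
    a2b = dist.argmin(axis=1)   # best b for each a
    b2a = dist.argmin(axis=0)   # best a for each b
    return [(i, j) for i, j in enumerate(a2b) if b2a[j] == i]
```

This mirrors what `cv2.BFMatcher` with `NORM_HAMMING` and `crossCheck=True` does for ORB descriptors; SIFT and SURF use float descriptors with L2 distance instead.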
APA, Harvard, Vancouver, ISO, and other styles
32

Ni, Peishuang, Yanyang Liu, Hao Pei, Haoze Du, Haolin Li and Gang Xu. "CLISAR-Net: A Deformation-Robust ISAR Image Classification Network Using Contrastive Learning". Remote Sensing 15, no. 1 (21.12.2022): 33. http://dx.doi.org/10.3390/rs15010033.

Full text source
Abstract:
The inherent unknown deformations of inverse synthetic aperture radar (ISAR) images, such as translation, scaling, and rotation, pose great challenges to space target classification. To achieve high-precision classification of ISAR images, a deformation-robust ISAR image classification network using contrastive learning (CL), CLISAR-Net, is proposed. Unlike traditional supervised learning methods, CLISAR-Net uses a two-phase training strategy with a new unsupervised pretraining phase. In the unsupervised pretraining phase, combined with data augmentation, positive and negative sample pairs are constructed from unlabeled ISAR images, and the encoder is trained to learn discriminative deep representations of deformed ISAR images by means of CL. In the fine-tuning phase, based on the deep representations obtained from pretraining, a classifier is fine-tuned using a small number of labeled ISAR images, and finally deformed ISAR image classification is realized. In the experimental analysis, CLISAR-Net achieves higher classification accuracy than supervised learning methods for unknown scaled, rotated, and combined deformations, implying that CLISAR-Net learns more robust deep features of deformed ISAR images through CL, which ensures the performance of the subsequent classification.
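A generic contrastive (InfoNCE-style) objective over two augmented views, of the kind such pretraining relies on, can be sketched as follows. This is an illustration, not CLISAR-Net's exact loss; the temperature value is an assumption:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss over two batches of embeddings from two augmented
    views: matching rows are positives, all other rows are negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # cross-entropy on positives
```

Minimizing this pulls each image toward its own augmented view and pushes it away from the rest of the batch.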
APA, Harvard, Vancouver, ISO, and other styles
33

Tavares, Gabriel, and Sylvio Barbon. "Matching business process behavior with encoding techniques via meta-learning: An anomaly detection study". Computer Science and Information Systems, no. 00 (2023): 5. http://dx.doi.org/10.2298/csis220110005t.

Full text source
Abstract:
Recording anomalous traces in business processes diminishes an event log's quality. The abnormalities may represent bad execution, security issues, or deviant behavior. To mitigate this phenomenon, organizations spend effort detecting anomalous traces in their business processes to save resources and improve process execution. However, in many real-world environments, reference models are unavailable, requiring expert assistance and increasing costs. The considerable number of techniques and the reduced availability of experts pose an additional challenge for particular scenarios. In this work, we combine the representational power of encoding with a meta-learning strategy to enhance the detection of anomalous traces in event logs, towards fitting the best discriminative capability between common and irregular traces. Our approach creates an event log profile and recommends the most suitable encoding technique to increase anomaly detection performance. We used eight encoding techniques from different families, 80 log descriptors, 168 event logs, and six anomaly types for the experiments. Results indicate that event log characteristics influence the representational capability of encodings. Moreover, we investigate the influence of process behavior on choosing the suitable encoding technique, demonstrating that traditional process mining analysis can be leveraged when matched with intelligent decision support approaches.
APA, Harvard, Vancouver, ISO, and other styles
34

Mahmud, Hasan, Md Kamrul Hasan, Abdullah-Al-Tariq, Md Hasanul Kabir and M. A. Mottalib. "Recognition of Symbolic Gestures Using Depth Information". Advances in Human-Computer Interaction 2018 (19.11.2018): 1–13. http://dx.doi.org/10.1155/2018/1069823.

Full text source
Abstract:
Symbolic gestures are hand postures with conventionalized meanings. They are static gestures that one can perform, without using voice, in very complex environments containing variations in rotation and scale. The gestures may be produced under different illumination conditions or against occluding backgrounds. Any hand gesture recognition system should find enough discriminative features, such as hand-finger contextual information. However, in existing approaches, the depth information of hand fingers that represents finger shapes is utilized only in a limited capacity to extract discriminative finger features. Nevertheless, if we consider finger bending information (i.e., a finger that overlaps the palm) extracted from the depth map and use it as local features, static gestures varying ever so slightly become distinguishable. Our work corroborates this idea: we generated depth silhouettes with variation in contrast to achieve more discriminative keypoints, which in turn improved the recognition accuracy up to 96.84%. We applied the Scale-Invariant Feature Transform (SIFT) algorithm, which takes the generated depth silhouettes as input and produces robust feature descriptors as output. These features (after conversion into unified-dimensional feature vectors) are fed into a multiclass Support Vector Machine (SVM) classifier to measure the accuracy. We tested our results on a standard dataset containing 10 symbolic gestures representing the 10 numeric symbols (0-9). We then verified and compared our results among depth images, binary images, and images consisting of the hand-finger edge information generated from the same dataset. Our results show higher accuracy when applying SIFT features to depth images. Accurately recognizing numeric symbols performed through hand gestures has a huge impact on different Human-Computer Interaction (HCI) applications, including augmented reality, virtual reality, and other fields.
APA, Harvard, Vancouver, ISO, and other styles
35

Fu, Yang, Xiaoyang Wang, Yunchao Wei and Thomas Huang. "STA: Spatial-Temporal Attention for Large-Scale Video-Based Person Re-Identification". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17.07.2019): 8287–94. http://dx.doi.org/10.1609/aaai.v33i01.33018287.

Full text source
Abstract:
In this work, we propose a novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. Unlike most existing methods, which simply compute representations of video clips using frame-level aggregation (e.g., average pooling), the proposed STA adopts a more effective way of producing robust clip-level feature representations. Concretely, our STA fully exploits the discriminative parts of one target person in both the spatial and temporal dimensions, producing a 2-D attention score matrix via inter-frame regularization that measures the importance of spatial parts across different frames. A more robust clip-level feature representation can then be generated by a weighted-sum operation guided by the mined 2-D attention score matrix. In this way, challenging cases for video-based person re-identification, such as pose variation and partial occlusion, can be well handled by STA. We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, significantly outperforming the state of the art by a large margin of more than 11.6%.
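The weighted-sum aggregation guided by a 2-D (frame × part) attention score matrix can be sketched as below. Normalizing each spatial part's scores across frames is an assumed detail of this sketch:

```python
import numpy as np

def sta_aggregate(parts, scores):
    """Clip-level feature from per-frame, per-part features via a 2-D
    attention score matrix.
    parts:  (T, P, D) features for T frames and P spatial parts
    scores: (T, P) raw attention scores"""
    # each spatial part competes across frames (softmax over T)
    w = np.exp(scores - scores.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)        # columns sum to 1
    return (w[:, :, None] * parts).sum(axis=0)  # (P, D) clip-level parts
```

High-scoring frames dominate each part's clip-level representation, which is how occluded or blurred frames get down-weighted.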
APA, Harvard, Vancouver, ISO, and other styles
36

Wang, Jingtao, Changcai Yang, Lifang Wei and Riqing Chen. "CSCE-Net: Channel-Spatial Contextual Enhancement Network for Robust Point Cloud Registration". Remote Sensing 14, no. 22 (14.11.2022): 5751. http://dx.doi.org/10.3390/rs14225751.

Full text source
Abstract:
Seeking reliable correspondences between two scenes is crucial for solving feature-based point cloud registration tasks. In this paper, we propose a novel outlier rejection network, called the Channel-Spatial Contextual Enhancement Network (CSCE-Net), to obtain rich contextual information on correspondences, which can effectively remove outliers and improve the accuracy of point cloud registration. Specifically, we design a novel channel-spatial contextual (CSC) block, mainly composed of the Channel-Spatial Attention (CSA) layer and the Nonlocal Channel-Spatial Attention (Nonlocal CSA) layer. The CSC block is able to obtain more reliable contextual information: the CSA layer can selectively aggregate the mutual information between the channel and spatial dimensions, the Nonlocal CSA layer can compute feature similarity and spatial consistency for each correspondence, and the two layers support each other. In addition, to improve the ability to distinguish inliers from outliers, we present an advanced seed selection mechanism to select more dependable initial correspondences. Extensive experiments demonstrate that CSCE-Net outperforms state-of-the-art methods on outlier rejection and pose estimation tasks on public datasets with varying 3D local descriptors. In addition, compared to the recent learning-based outlier rejection method PointDSC, the network parameters of CSCE-Net are reduced from 1.05M to 0.56M.
APA, Harvard, Vancouver, ISO, and other styles
37

Wei, Leyi, Chen Zhou, Ran Su and Quan Zou. "PEPred-Suite: improved and robust prediction of therapeutic peptides using adaptive feature representation learning". Bioinformatics 35, no. 21 (17.04.2019): 4272–80. http://dx.doi.org/10.1093/bioinformatics/btz246.

Full text source
Abstract:
Motivation: Prediction of therapeutic peptides is critical for the discovery of novel and efficient peptide-based therapeutics. Computational methods, especially machine learning based methods, have been developed for addressing this need. However, most existing methods are peptide-specific; currently, there is no generic predictor for multiple peptide types. Moreover, it is still challenging to extract informative feature representations from the perspective of primary sequences. Results: In this study, we have developed PEPred-Suite, a bioinformatics tool for the generic prediction of therapeutic peptides. In PEPred-Suite, we introduce an adaptive feature representation strategy that can learn the most representative features for different peptide types. To be specific, we train diverse sequence-based feature descriptors, integrate the learnt class information into our features, and utilize a two-step feature optimization strategy based on the area under the receiver operating characteristic curve to extract the most discriminative features. Using the learnt representative features, we trained eight random forest models for eight different types of functional peptides, respectively. Benchmarking results showed that, compared with existing predictors, PEPred-Suite achieves better and more robust performance for different peptides. As far as we know, PEPred-Suite is currently the first tool capable of predicting so many peptide types simultaneously. In addition, our work demonstrates that the learnt features can reliably predict different peptides. Availability and implementation: The user-friendly webserver implementing the proposed PEPred-Suite is freely accessible at http://server.malab.cn/PEPred-Suite. Supplementary information: Supplementary data are available at Bioinformatics online.
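A single-step sketch of AUC-driven feature selection follows (the tool's actual strategy is two-step; ranking columns by |AUC − 0.5| and the function names are assumptions of this sketch):

```python
import numpy as np

def auc_score(feature, y):
    """AUC of a single feature via the rank (Mann-Whitney U) statistic."""
    order = np.argsort(feature, kind="mergesort")
    ranks = np.empty(len(feature))
    ranks[order] = np.arange(1, len(feature) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def select_by_auc(X, y, k):
    """Keep the k features whose |AUC - 0.5| is largest, i.e. features
    discriminative in either direction."""
    aucs = np.array([auc_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-np.abs(aucs - 0.5), kind="mergesort")[:k]
```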
APA, Harvard, Vancouver, ISO, and other styles
38

Gu, Bo, Jianxun Liu, Huiyuan Xiong, Tongtong Li and Yuelong Pan. "ECPC-ICP: A 6D Vehicle Pose Estimation Method by Fusing the Roadside Lidar Point Cloud and Road Feature". Sensors 21, no. 10 (17.05.2021): 3489. http://dx.doi.org/10.3390/s21103489.

Full text source
Abstract:
In the vehicle pose estimation task based on roadside Lidar in cooperative perception, the measurement distance, angle, and laser resolution directly affect the quality of the target point cloud. For incomplete and sparse point clouds, current methods are either less accurate, with correspondences solved by local descriptors, or not robust enough due to the reduction of effective boundary points. In response to the above weakness, this paper proposes a registration algorithm, Environment Constraint Principal Component-Iterative Closest Point (ECPC-ICP), which integrates road information constraints. The road normal feature is extracted, and the principal component of the vehicle point cloud matrix under the road normal constraint is computed as the initial pose result. Then, an accurate 6D pose is obtained through point-to-point ICP registration. According to the measurement characteristics of roadside Lidars, this paper defines a point cloud sparseness description, and the existing algorithms were tested on point cloud data with different sparseness. The simulated experimental results showed that the positioning MAE of ECPC-ICP is about 0.5% of the vehicle scale, the orientation MAE is about 0.26°, and the average registration success rate is 95.5%, demonstrating an improvement in accuracy and robustness over current methods. In the real test environment, the positioning MAE is about 2.6% of the vehicle scale, and the average time cost is 53.19 ms, proving the accuracy and effectiveness of ECPC-ICP in practical applications.
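The road-normal-constrained principal-component initialization can be sketched as projecting the vehicle points onto the road plane and taking the dominant in-plane axis. This is an illustration of the idea only; function and parameter names are assumptions:

```python
import numpy as np

def initial_yaw(points, road_normal):
    """Initial vehicle heading estimate: project the point cloud onto the
    road plane (the normal constraint) and take the principal component as
    the dominant in-plane direction. Heading is ambiguous by 180 degrees."""
    n = road_normal / np.linalg.norm(road_normal)
    P = np.eye(3) - np.outer(n, n)             # projector onto road plane
    flat = (points - points.mean(axis=0)) @ P  # centered, in-plane points
    cov = flat.T @ flat
    _, vecs = np.linalg.eigh(cov)
    axis = vecs[:, -1]                         # largest-variance direction
    return np.arctan2(axis[1], axis[0]) % np.pi
```

ICP would then refine this coarse pose into the full 6D result.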
APA, Harvard, Vancouver, ISO, and other styles
39

Wu, Shaojun, and Ling Gao. "Multi-Level Joint Feature Learning for Person Re-Identification". Algorithms 13, no. 5 (29.04.2020): 111. http://dx.doi.org/10.3390/a13050111.

Full text source
Abstract:
In person re-identification, extracting image features is an important step when retrieving pedestrian images. Most current methods extract only global features or only local features of pedestrian images, so inconspicuous details are easily ignored during feature learning, which is neither efficient nor robust in scenarios with large appearance differences. In this paper, we propose a Multi-level Feature Fusion model that combines both global and local image features through deep learning networks to generate more discriminative pedestrian descriptors. Specifically, we extract local features from different depths of the network with the Part-based Multi-level Net to fuse low-to-high-level local features of pedestrian images, and use Global-Local Branches to extract the local and global features at the highest level. Experiments show that our deep learning model based on multi-level feature fusion works well in person re-identification, outperforming the state of the art by considerable margins on three widely used datasets. For instance, we achieve 96% Rank-1 accuracy on the Market-1501 dataset and 76.1% mAP on the DukeMTMC-reID dataset, outperforming existing works by a large margin (more than 6%).
APA, Harvard, Vancouver, ISO, etc. styles
40

Bayoumi, Randa Mohamed, Elsayed E. Hemayed, Mohammad Ehab Ragab, and Magda B. Fayek. "Person Re-Identification via Pyramid Multipart Features and Multi-Attention Framework". Big Data and Cognitive Computing 6, no. 1 (9.02.2022): 20. http://dx.doi.org/10.3390/bdcc6010020.

Full text source
Abstract:
Video-based person re-identification has become quite attractive due to its importance in many vision surveillance problems. It is a challenging topic due to the inter- and intra-class changes, occlusion, and pose variations involved. In this paper, we propose a pyramid-attentive framework that relies on multi-part features and multiple attention mechanisms to aggregate multi-level features and learn attention-based representations of persons from various aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains and hence capture robust global information. We propose the use of part-relation attention between feature representations of different granularities to focus on learning appropriate local features. Temporal attention is used to aggregate temporal features. We integrate the most robust features in the global and multi-level views to build an effective convolutional neural network (CNN) model. The proposed model outperforms previous state-of-the-art models on three datasets. Notably, it achieves 98.9% top-1 accuracy (a relative improvement of 2.7% over GRL) and 99.3% mAP on PRID2011, and 92.8% top-1 accuracy on iLIDS-VID (a relative improvement of 2.4% over GRL). We also explore the generalization ability of our model in a cross-dataset setting.
APA, Harvard, Vancouver, ISO, etc. styles
41

Liu, Jingyang, Yucheng Xu, Lu Zhou, and Lei Sun. "PCRMLP: A Two-Stage Network for Point Cloud Registration in Urban Scenes". Sensors 23, no. 12 (20.06.2023): 5758. http://dx.doi.org/10.3390/s23125758.

Full text source
Abstract:
Point cloud registration plays a crucial role in 3D mapping and localization. Urban scene point clouds pose significant challenges for registration due to their large data volume, similar scenarios, and dynamic objects. Estimating location from instances (buildings, traffic lights, etc.) in urban scenes is also closer to how humans localize. In this paper, we propose PCRMLP (point cloud registration MLP), a novel model for urban scene point cloud registration that achieves registration performance comparable to prior learning-based methods. In contrast to previous works that focus on extracting features and estimating correspondences, PCRMLP estimates the transformation implicitly from concrete instances. The key innovation lies in the instance-level urban scene representation, which leverages semantic segmentation and density-based spatial clustering of applications with noise (DBSCAN) to generate instance descriptors, enabling robust feature extraction, dynamic object filtering, and logical transformation estimation. A lightweight network consisting of Multilayer Perceptrons (MLPs) then obtains the transformation in an encoder–decoder manner. Experimental validation on the KITTI dataset demonstrates that PCRMLP achieves satisfactory coarse transformation estimates from instance descriptors within a remarkable time of 0.0028 s. With the incorporation of an ICP refinement module, our proposed method outperforms prior learning-based approaches, yielding a rotation error of 2.01° and a translation error of 1.58 m. The results highlight PCRMLP's potential for coarse registration of urban scene point clouds, paving the way for its application in instance-level semantic mapping and localization.
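The instance-descriptor step the abstract describes (DBSCAN clustering followed by per-instance descriptors) can be sketched as below. This is a simplified stand-in, assuming plain Euclidean DBSCAN and a centroid-plus-extent descriptor; it is not the PCRMLP implementation, and the function names are hypothetical.

```python
import numpy as np

def dbscan(points, eps=1.0, min_pts=5):
    """Tiny educational DBSCAN (label -1 = noise); O(n^2), for small clouds."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in d]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already claimed, or not a core point
        stack, labels[i] = [i], cluster
        while stack:  # grow the cluster over density-reachable points
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if len(neighbors[k]) >= min_pts:
                        stack.append(k)
        cluster += 1
    return labels

def instance_descriptors(points, labels):
    """Per-instance descriptor: centroid + axis-aligned extent, a crude
    stand-in for the paper's instance-level representation."""
    descs = []
    for c in range(labels.max() + 1):
        pts = points[labels == c]
        descs.append(np.concatenate([pts.mean(0), pts.max(0) - pts.min(0)]))
    return np.stack(descs)

# two compact "instances" plus one noise point
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0, 0, 0], 0.2, size=(40, 3)),
                 rng.normal([5, 0, 0], 0.2, size=(40, 3)),
                 [[50.0, 50.0, 50.0]]])
labels = dbscan(pts, eps=1.0, min_pts=5)
descs = instance_descriptors(pts, labels)
```

A downstream network like the paper's MLP encoder-decoder would consume a set of such descriptors from two scans and regress the relative transformation.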
APA, Harvard, Vancouver, ISO, etc. styles
42

Wang, Qi, ZhiHao Luo, JinCai Huang, YangHe Feng, and Zhong Liu. "A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM". Computational Intelligence and Neuroscience 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/1827016.

Full text source
Abstract:
Class imbalance exists ubiquitously in real life and has attracted much interest from various domains. Learning directly from an imbalanced dataset may yield unsatisfactory results, overfocusing on identification accuracy and deriving a suboptimal model. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive, and other hybrid approaches. However, the samples near the decision boundary, which contain more discriminative information, should be valued, and the skew of the boundary can be corrected by constructing synthetic samples. Motivated by this geometric intuition, we designed a new synthetic minority oversampling technique that incorporates borderline information. Moreover, ensemble models tend to capture more complicated and robust decision boundaries in practice. Taking these factors into consideration, a novel ensemble method, called Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), is proposed for imbalanced data learning (IDL) problems. Experiments on open-access datasets showed significantly superior performance of our model, and a persuasive and intuitive explanation behind the method is illustrated. To the best of our knowledge, this is the first model combining an ensemble of SVMs with borderline information for this setting.
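The borderline-oversampling idea (synthesizing minority samples only around points whose neighbourhoods are majority-dominated) can be sketched as follows. This is a simplified interpolation-only illustration of the Borderline-SMOTE family, not the paper's extrapolation variant or the full BEBS ensemble; the function name and thresholds are illustrative.

```python
import numpy as np

def borderline_synthetic(X_min, X_maj, k=5, n_per_point=3, seed=0):
    """Create synthetic minority samples near the decision boundary.
    A minority point is 'borderline' when at least half (but not all) of
    its k nearest neighbours over both classes are majority points; new
    samples interpolate between it and its minority neighbours."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.arange(len(X_all)) >= len(X_min)
    synthetic = []
    for x in X_min:
        d = np.linalg.norm(X_all - x, axis=1)
        nn = np.argsort(d, kind="stable")[1:k + 1]   # skip the point itself
        maj_frac = is_maj[nn].mean()
        if 0.5 <= maj_frac < 1.0:                    # borderline, not pure noise
            d_min = np.linalg.norm(X_min - x, axis=1)
            nbrs = X_min[np.argsort(d_min, kind="stable")[1:k + 1]]
            for _ in range(n_per_point):
                nb = nbrs[rng.integers(len(nbrs))]
                lam = rng.uniform(0.0, 1.0)
                synthetic.append(x + lam * (nb - x))
    return np.array(synthetic)

# tiny example on the x-axis: minority on the left, majority on the right
X_min = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0], [1.2, 0.0], [-3.0, 0.0]])
X_maj = np.array([[1.5, 0.0], [1.6, 0.0], [1.7, 0.0], [1.8, 0.0],
                  [1.9, 0.0], [2.0, 0.0], [2.1, 0.0], [2.2, 0.0]])
S = borderline_synthetic(X_min, X_maj)
```

BEBS then trains each bagged SVM on a bootstrap sample augmented with such synthetic points, so every base learner sees a less skewed boundary.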
APA, Harvard, Vancouver, ISO, etc. styles
43

Zhang, La, Haiyun Guo, Kuan Zhu, Honglin Qiao, Gaopan Huang, Sen Zhang, Huichen Zhang, Jian Sun, and Jinqiao Wang. "Hybrid Modality Metric Learning for Visible-Infrared Person Re-Identification". ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 1s (28.02.2022): 1–15. http://dx.doi.org/10.1145/3473341.

Full text source
Abstract:
Visible-infrared person re-identification (Re-ID) has received increasing research attention for its great practical value in night-time surveillance scenarios. Due to the large variations in person pose, viewpoint, and occlusion within the same modality, as well as the domain gap brought by heterogeneous modalities, this hybrid modality person matching task is quite challenging. Unlike metric learning methods for visible person re-ID, which impose similarity constraints only at the class level, an efficient metric learning approach for visible-infrared person Re-ID should take both class-level and modality-level similarity constraints into full consideration to learn sufficiently discriminative and robust features. In this article, the hybrid modality is divided into two types, within modality and cross modality. We first fully explore the variations that hinder the ranking results of visible-infrared person re-ID and roughly summarize them into three types: within-modality variation, cross-modality modality-related variation, and cross-modality modality-unrelated variation. Then, we propose a comprehensive metric learning framework based on four kinds of paired-based similarity constraints to address all the variations within and across modalities. This framework focuses on both class-level and modality-level similarity relationships between person images. Furthermore, we demonstrate the compatibility of our framework with any paired-based loss function by giving detailed implementations combining it with triplet loss and contrastive loss separately. Finally, extensive experiments on SYSU-MM01 and RegDB demonstrate the effectiveness and superiority of our proposed metric learning framework for visible-infrared person Re-ID.
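As one concrete paired-based constraint of the kind the framework accepts, a standard triplet loss can be written in a few lines. This is the generic formulation, not the paper's exact implementation; it applies unchanged when the positive comes from the other modality (e.g. a visible anchor with an infrared positive), which is how cross-modality constraints are typically instantiated.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull same-identity pairs together and push different-identity
    pairs apart by at least `margin` (hinge on the distance gap)."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    return np.maximum(0.0, d_ap - d_an + margin).mean()

# one anchor feature, a matching positive, and two candidate negatives
anchor         = np.array([[0.0, 0.0]])
positive       = np.array([[0.0, 0.0]])
well_separated = np.array([[1.0, 0.0]])   # far negative: no loss
hard_negative  = np.array([[0.1, 0.0]])   # too-close negative: penalized
```

In the paper's setting, four such paired constraints (class-level and modality-level, within and across modalities) would be summed into the training objective.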
APA, Harvard, Vancouver, ISO, etc. styles
44

Ullah, Asad, Jing Wang, M. Shahid Anwar, Usman Ahmad, Uzair Saeed, and Zesong Fei. "Facial Expression Recognition of Nonlinear Facial Variations Using Deep Locality De-Expression Residue Learning in the Wild". Electronics 8, no. 12 (6.12.2019): 1487. http://dx.doi.org/10.3390/electronics8121487.

Full text source
Abstract:
Automatic facial expression recognition is an emerging field, and interest has increased with the transition from laboratory-controlled conditions to in-the-wild scenarios. Most research has been done on nonoccluded faces under constrained environments, while automatic facial expression recognition under partial occlusion in real-world conditions remains less well understood. Beyond that, our research aims to tackle overfitting (caused by the shortage of adequate training data) and to alleviate expression-unrelated/intraclass/nonlinear facial variations, such as head pose, eye gaze, intensity, and microexpressions. We control the magnitude of each Action Unit (AU) and combine several Action Unit combinations to leverage learning from both generative and discriminative representations for automatic FER. We have also addressed the problem of the diversification of expressions from lab-controlled to real-world scenarios in a cross-database study, and propose a model that enhances the discriminative power of deep features while increasing the interclass scatter, by preserving locality closeness. Furthermore, a facial expression consists of an expressive component as well as a neutral component, so we propose a generative model capable of generating the neutral expression from an input image using a cGAN. The expressive component is filtered and passed to the intermediate layers, a process called De-expression Residue Learning; the residue in the intermediate layers is very important for learning from expressive components. Finally, we validate the effectiveness of our method (DLP-DeRL) through qualitative and quantitative experiments on four databases. Our method is more accurate and robust, and outperforms existing methods (hand-crafted features and deep learning) when dealing with images in the wild.
APA, Harvard, Vancouver, ISO, etc. styles
45

Sahloul, Hamdi, Shouhei Shirafuji, and Jun Ota. "3D Affine: An Embedding of Local Image Features for Viewpoint Invariance Using RGB-D Sensor Data". Sensors 19, no. 2 (12.01.2019): 291. http://dx.doi.org/10.3390/s19020291.

Full text source
Abstract:
Local image features are invariant to in-plane rotations and robust to minor viewpoint changes. However, the current detectors and descriptors for local image features fail to accommodate out-of-plane rotations larger than 25°–30°. Invariance to such viewpoint changes is essential for numerous applications, including wide baseline matching, 6D pose estimation, and object reconstruction. In this study, we present a general embedding that wraps a detector/descriptor pair in order to increase viewpoint invariance by exploiting input depth maps. The proposed embedding locates smooth surfaces within the input RGB-D images and projects them into a viewpoint invariant representation, enabling the detection and description of more viewpoint invariant features. Our embedding can be utilized with different combinations of descriptor/detector pairs, according to the desired application. Using synthetic and real-world objects, we evaluated the viewpoint invariance of various detectors and descriptors, for both standalone and embedded approaches. While standalone local image features fail to accommodate average viewpoint changes beyond 33.3°, our proposed embedding boosted the viewpoint invariance to different levels, depending on the scene geometry. Objects with distinct surface discontinuities were on average invariant up to 52.8°, and the overall average for all evaluated datasets was 45.4°. Similarly, out of a total of 140 combinations involving 20 local image features and various objects with distinct surface discontinuities, only a single standalone local image feature exceeded the goal of 60° viewpoint difference in just two combinations, as compared with 19 different local image features succeeding in 73 combinations when wrapped in the proposed embedding. Furthermore, the proposed approach operates robustly in the presence of input depth noise, even that of low-cost commodity depth sensors, and well beyond.
APA, Harvard, Vancouver, ISO, etc. styles
46

Bloesch, Michael, Michael Burri, Sammy Omari, Marco Hutter, and Roland Siegwart. "Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback". International Journal of Robotics Research 36, no. 10 (September 2017): 1053–72. http://dx.doi.org/10.1177/0278364917728574.

Full text source
Abstract:
This paper presents a visual-inertial odometry framework that tightly fuses inertial measurements with visual data from one or more cameras, by means of an iterated extended Kalman filter. By employing image patches as landmark descriptors, a photometric error is derived, which is directly integrated as an innovation term in the filter update step. Consequently, the data association is an inherent part of the estimation process and no additional feature extraction or matching processes are required. Furthermore, it enables the tracking of noncorner-shaped features, such as lines, and thereby increases the set of possible landmarks. The filter state is formulated in a fully robocentric fashion, which reduces errors related to nonlinearities. This also includes partitioning of a landmark’s location estimate into a bearing vector and distance and thereby allows an undelayed initialization of landmarks. Overall, this results in a compact approach, which exhibits a high level of robustness with respect to low scene texture and motion blur. Furthermore, there is no time-consuming initialization procedure and pose estimates are available starting at the second image frame. We test the filter on different real datasets and compare it with other state-of-the-art visual-inertial frameworks. Experimental results show that robust localization with high accuracy can be achieved with this filter-based framework.
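The photometric innovation the filter uses can be illustrated in miniature: compare a stored landmark patch against the current image at the predicted pixel location, and feed the stacked intensity residual to the update step. Below is a toy integer-pixel sketch; the real filter works with sub-pixel warps and also needs the residual's Jacobian with respect to the state, and the function name here is hypothetical.

```python
import numpy as np

def photometric_innovation(image, patch, center):
    """Stacked intensity residual between the current image and a stored
    landmark patch at the predicted location (integer-pixel toy version)."""
    h, w = patch.shape
    r0 = int(center[0]) - h // 2
    c0 = int(center[1]) - w // 2
    window = image[r0:r0 + h, c0:c0 + w]
    return (window - patch).ravel()   # innovation term for the filter update

image = np.zeros((10, 10))
patch = np.arange(9.0).reshape(3, 3)
image[4:7, 4:7] = patch               # landmark appears exactly where predicted
residual = photometric_innovation(image, patch, (5, 5))    # zero residual
shifted = photometric_innovation(image, patch, (6, 5))     # nonzero residual
```

Because the residual is computed directly from intensities, data association falls out of the estimation itself, which is the point the abstract makes about not needing a separate feature matching stage.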
APA, Harvard, Vancouver, ISO, etc. styles
47

Gómez, Juan, Olivier Aycard, and Junaid Baber. "Efficient Detection and Tracking of Human Using 3D LiDAR Sensor". Sensors 23, no. 10 (12.05.2023): 4720. http://dx.doi.org/10.3390/s23104720.

Full text source
Abstract:
Light Detection and Ranging (LiDAR) technology is now becoming the main tool in many applications such as autonomous driving and human–robot collaboration. Point-cloud-based 3D object detection is becoming popular and widely accepted in industry and everyday life due to its effectiveness in environments that are challenging for cameras. In this paper, we present a modular approach to detect, track, and classify persons using a 3D LiDAR sensor. It combines multiple principles: a robust implementation for object segmentation, a classifier with local geometric descriptors, and a tracking solution. Moreover, we achieve a real-time solution on a low-performance machine by reducing the number of points to be processed, obtaining and predicting regions of interest via movement detection and motion prediction without any prior knowledge of the environment. Furthermore, our prototype is able to detect and track persons consistently even in challenging cases caused by limitations of the sensor field of view or extreme pose changes such as crouching, jumping, and stretching. Lastly, the proposed solution is tested and evaluated on multiple real 3D LiDAR sensor recordings taken in an indoor environment. The results show great potential, with particularly high confidence in positive classifications of the human body as compared to state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, etc. styles
48

Le, Tuan-Tang, and Chyi-Yeu Lin. "Bin-Picking for Planar Objects Based on a Deep Learning Network: A Case Study of USB Packs". Sensors 19, no. 16 (19.08.2019): 3602. http://dx.doi.org/10.3390/s19163602.

Full text source
Abstract:
Random bin-picking is a prominent, useful, and challenging industrial robotics application. However, many industrial and real-world objects are planar, and their oriented surface points are not sufficiently compact and discriminative for methods that rely on geometry information, especially depth discontinuities. This study addresses these problems by proposing a novel and robust solution for random bin-picking of planar objects in a cluttered environment. Unlike other research that has mainly focused on 3D information, this study first applies an instance segmentation-based deep learning approach on 2D image data to classify and localize the target object while generating a mask for each instance. The presented approach, moreover, serves as a pioneering method to extract 3D point cloud data based on 2D pixel values for building the appropriate coordinate system on the planar object plane. The experimental results showed that the proposed method reached an accuracy rate of 100% for classifying two-sided objects in the unseen dataset, and appropriate 3D pose prediction was highly effective, with average translation and rotation errors of less than 0.23 cm and 2.26°, respectively. Finally, the system success rate for picking up objects was over 99% at an average processing time of 0.9 s per step, fast enough for continuous robotic operation without interruption. This represents a promising improvement in pickup success rate over previous approaches to random bin-picking. Successful implementation of the proposed approach for USB packs provides a solid basis for other planar objects in cluttered environments. With remarkable precision and efficiency, this study shows significant commercialization potential.
APA, Harvard, Vancouver, ISO, etc. styles
49

Thao, Nguyen Duc, Nguyen Viet Anh, Le Thanh Ha, and Ngo Thi Duyen. "Robustify Hand Tracking by Fusing Generative and Discriminative Methods". VNU Journal of Science: Computer Science and Communication Engineering 37, no. 1 (17.02.2021). http://dx.doi.org/10.25073/2588-1086/vnucsce.261.

Full text source
Abstract:
With the development of virtual reality (VR) technology and its applications in many fields, creating simulated hands in the virtual environment is an effective way to replace the controller as well as to enhance the user experience in interactive processes. The hand tracking problem is therefore gaining a lot of research attention, making an important contribution to recognizing hand postures and tracking hand motions for VR input or human-machine interaction applications. In order to create a markerless real-time hand tracking system suitable for natural human-machine interaction, we propose a new method that combines generative and discriminative methods to solve the hand tracking problem using a single RGBD camera. Our system removes the requirement for the user to wear a color wrist band and robustifies hand localization even in difficult tracking scenarios.
Keywords: Hand tracking, generative method, discriminative method, human performance capture.
References: [1] Malik, A. Elhayek, F. Nunnari, K. Varanasi, Tamaddon, A. Heloir, D. Stricker, DeepHPS: End-to-end estimation of 3D hand pose and shape by learning from synthetic depth, CoRR abs/1808.09208, 2018. URL http://arxiv.org/abs/1808.09208. [2] Glauser, S. Wu, D. Panozzo, O. Hilliges, Sorkine-Hornung, Interactive hand pose estimation using a stretch-sensing soft glove, ACM Trans. Graph. 38(4) (2019) 1-15. [3] Jiang, H. Xia, C. Guo, A model-based system for real-time articulated hand tracking using a simple data glove and a depth camera, Sensors 19 (2019) 4680. https://doi.org/10.3390/s19214680. [4] Cao, G. Hidalgo, T. Simon, S. Wei, Y. Sheikh, OpenPose: Realtime multi-person 2D pose estimation using part affinity fields, CoRR abs/1812.08008, 2018. [5] Tagliasacchi, M. Schroder, A. Tkach, S. Bouaziz, M. Botsch, M. Pauly, Robust articulated-ICP for real-time hand tracking, Computer Graphics Forum 34, 2015. [6] Qian, X. Sun, Y. Wei, X. Tang, J. Sun, Realtime and robust hand tracking from depth, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. [7] Tomasi, Petrov, Sastry, 3D tracking = classification + interpolation, in: Proceedings Ninth IEEE International Conference on Computer Vision 2 (2003) 1441-1448. [8] Sharp, C. Keskin, D. Robertson, J. Taylor, J. Shotton, D. Kim, C. Rhemann, I. Leichter, A. Vinnikov, Y. Wei, et al., Accurate, robust, and flexible real-time hand tracking, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 3633-3642. [9] Sridhar, F. Mueller, A. Oulasvirta, C. Theobalt, Fast and robust hand tracking using detection-guided optimization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. [10] Oikonomidis, N. Kyriazis, A.A. Argyros, Tracking the articulated motion of two strongly interacting hands, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1862-1869. [11] Melax, L. Keselman, S. Orsten, Dynamics based 3D skeletal hand tracking, CoRR abs/1705.07640, 2017. [12] Wang, S. Paris, J. Popovic, 6D hands: Markerless hand tracking for computer aided design, 2011, pp. 549-558. https://doi.org/10.1145/2047196.2047269. [13] Tang, T. Yu, T. Kim, Real-time articulated hand pose estimation using semi-supervised transductive regression forests, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 3224-3231. [14] Oberweger, P. Wohlhart, V. Lepetit, Generalized feedback loop for joint hand-object pose estimation, CoRR abs/1903.10883, 2019. URL http://arxiv.org/abs/1903.10883. [15] Malik, A. Elhayek, F. Nunnari, K. Varanasi, K. Tamaddon, A. Heloir, D. Stricker, DeepHPS: End-to-end estimation of 3D hand pose and shape by learning from synthetic depth, 2018, pp. 110-119. https://doi.org/10.1109/3DV.2018.00023. [16] A. Mohammed, J.L.M. Islam, A deep learning-based end-to-end composite system for hand detection and gesture recognition, Sensors 19 (2019) 5282. https://doi.org/10.3390/s19235282.
APA, Harvard, Vancouver, ISO, etc. styles
50

Vinoharan, Veerapathirapillai, and Amirthalingam Ramanan. "An Efficient BoF Representation for Object Classification". ELCVIA Electronic Letters on Computer Vision and Image Analysis 20, no. 2 (16.12.2021). http://dx.doi.org/10.5565/rev/elcvia.1403.

Full text source
Abstract:
The Bag-of-Features (BoF) approach has proved to yield good performance in patch-based object classification systems owing to its simplicity. However, the very large number of patch-based descriptors (such as scale-invariant feature transform and speeded-up robust features) extracted from images to create a BoF vector often leads to a huge computational cost and an increased storage requirement. This paper demonstrates a two-stage approach to creating a discriminative and compact BoF representation for object classification. As a preprocessing stage to codebook construction, ambiguous patch-based descriptors are eliminated using an entropy-based, one-pass feature selection approach, retaining only high-quality descriptors. As a post-processing stage, codewords that are not activated often enough in images are eliminated from the initially constructed codebook based on statistical measures. Finally, each patch-based descriptor of an image is assigned to the closest codeword to create a histogram representation, and a one-versus-all support vector machine is applied to classify it. The proposed methods are evaluated on benchmark image datasets; results show that they make the codebook more discriminative and compact in moderate-sized visual object classification tasks.
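The final assignment step (each patch descriptor mapped to its closest codeword to form a histogram) is straightforward to sketch. The two-word codebook below is a hypothetical example rather than one built by the paper's two-stage selection, and the function name is illustrative.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Assign each patch descriptor to the nearest codeword and return
    the L1-normalized histogram used as the image representation."""
    # pairwise distances: (n_descriptors, n_codewords)
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=-1)
    counts = np.bincount(d.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [10.0, 0.0]])
descriptors = np.array([[0.1, 0.0], [9.9, 0.2], [10.3, -0.1]])
hist = bof_histogram(descriptors, codebook)
```

The resulting fixed-length histogram is what the one-versus-all SVM consumes; the paper's contribution is shrinking and sharpening the codebook before this step, which this sketch takes as given.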
APA, Harvard, Vancouver, ISO, etc. styles