Academic literature on the topic "2D Encoding representation"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other citation styles.


Consult the thematic lists of journal articles, books, theses, conference reports, and other academic sources on the topic "2D Encoding representation".


Journal articles on the topic "2D Encoding representation"

1. He, Qingdong, Hao Zeng, Yi Zeng, and Yijun Liu. "SCIR-Net: Structured Color Image Representation Based 3D Object Detection Network from Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4486–94. http://dx.doi.org/10.1609/aaai.v36i4.20371.

Abstract:
3D object detection from point cloud data has become an indispensable part of autonomous driving. Previous works for processing point clouds rely on either projection or voxelization. However, projection-based methods suffer from information loss, while voxelization-based methods bring huge computation. In this paper, we propose to encode point clouds into a structured color image representation (SCIR) and utilize a 2D CNN to fulfill the 3D detection task. Specifically, we use the structured color image encoding module to convert the irregular 3D point clouds into a squared 2D tensor image, where each point corresponds to a spatial point in the 3D space. Furthermore, in order to fit the Euclidean structure, we apply feature normalization to parameterize the 2D tensor image onto a regular dense color image. Then, we conduct repeated multi-scale fusion with different levels so as to augment the initial features and learn scale-aware feature representations for box prediction. Extensive experiments on the KITTI benchmark, the Waymo Open Dataset, and the more challenging nuScenes dataset show that our proposed method yields decent results and demonstrates the effectiveness of such representations for point clouds.
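
The grid-projection idea at the heart of this encoding can be illustrated with a minimal Python sketch: scatter an unordered (N, 3) point cloud into a squared 2D tensor image whose channels store point coordinates, making it consumable by a 2D CNN. The image size, bounds, and collision handling below are illustrative assumptions, not the paper's exact SCIR module.

    import numpy as np

    def points_to_image(points, img_size=64, lo=-40.0, hi=40.0):
        """Scatter an (N, 3) point cloud into a squared 2D tensor image.
        Each written pixel stores the (x, y, z) of one point, loosely
        mirroring a 'structured color image'; colliding points keep the
        last write (a simplification)."""
        img = np.zeros((img_size, img_size, 3), dtype=np.float32)
        cols = ((points[:, 0] - lo) / (hi - lo) * (img_size - 1)).astype(int)
        rows = ((points[:, 1] - lo) / (hi - lo) * (img_size - 1)).astype(int)
        valid = (cols >= 0) & (cols < img_size) & (rows >= 0) & (rows < img_size)
        img[rows[valid], cols[valid]] = points[valid]
        return img

    cloud = np.random.uniform(-40, 40, size=(1000, 3)).astype(np.float32)
    image = points_to_image(cloud)   # (64, 64, 3), ready for a 2D CNN
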
2. Wu, Banghe, Chengzhong Xu, and Hui Kong. "LiDAR Road-Atlas: An Efficient Map Representation for General 3D Urban Environment." Field Robotics 3, no. 1 (January 10, 2023): 435–59. http://dx.doi.org/10.55417/fr.2023014.

Abstract:
In this work, we propose the LiDAR Road-Atlas, a compact and efficient 3D map representation for autonomous robot or vehicle navigation in general urban environments. The LiDAR Road-Atlas is generated by an online mapping framework which incrementally merges local 2D occupancy grid maps (2D-OGMs). Specifically, the contributions of our method are threefold. First, we solve the challenging problem of creating local 2D-OGMs in unstructured urban scenes based on a real-time delimitation of traversable and curb regions in a LiDAR point cloud. Second, we achieve accurate 3D mapping in multiple-layer urban road scenarios by a probabilistic fusion scheme. Third, we achieve a very efficient 3D map representation of a general environment thanks to the automatic local-OGM-induced traversable-region labeling and a sparse probabilistic local point-cloud encoding. Given the LiDAR Road-Atlas, one can achieve accurate vehicle localization, path planning, and other tasks. Our map representation is insensitive to dynamic objects, which can be filtered out of the resulting map by the probabilistic fusion. Empirically, we compare our map representation with a couple of popular map representations in the robotics community, and ours is more favorable in terms of efficiency, scalability, and compactness. Additionally, we evaluate localization performance given the LiDAR Road-Atlas representation on two public datasets. With a 16-channel LiDAR sensor, our method achieves an average global localization error of 0.26 m (translation) and 1.07 (rotation) on the Apollo dataset, and 0.89 m (translation) and 1.29 (rotation) on the MulRan dataset, respectively, at 10 Hz, which validates its promising performance. The code for this work is open-sourced at https://github.com/IMRL/Lidar-road-atlas.
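
The abstract does not spell out the probabilistic fusion scheme, but the textbook way to merge local 2D occupancy grid maps is a Bayesian log-odds update, sketched below in Python under that assumption (grid sizes and probabilities are invented for the demo).

    import numpy as np

    def fuse_local_ogm(global_logodds, local_prob, eps=1e-6):
        """Standard Bayesian occupancy update: convert the local map's cell
        probabilities to log-odds and add them to the global map."""
        return global_logodds + np.log((local_prob + eps) / (1.0 - local_prob + eps))

    global_map = np.zeros((100, 100))             # log-odds 0 means p = 0.5 (unknown)
    local_map = np.full((100, 100), 0.5)
    local_map[40:60, 40:60] = 0.9                 # an observed obstacle block
    global_map = fuse_local_ogm(global_map, local_map)
    prob_map = 1.0 / (1.0 + np.exp(-global_map))  # back to probabilities
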
3. Yuan, Hangjie, and Dong Ni. "Learning Visual Context for Group Activity Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3261–69. http://dx.doi.org/10.1609/aaai.v35i4.16437.

Abstract:
Group activity recognition aims to recognize an overall activity in a multi-person scene. Previous methods strive to reason on individual features. However, they under-explore person-specific contextual information, which is significant and informative in computer vision tasks. In this paper, we propose a new reasoning paradigm to incorporate global contextual information. Specifically, we propose two modules to bridge the gap between group activity and visual context. The first is the Transformer-based Context Encoding (TCE) module, which enhances individual representations by encoding global contextual information into individual features and refining the aggregated information. The second is the Spatial-Temporal Bilinear Pooling (STBiP) module. It first explores pairwise relationships among the context-encoded individual representations, then generates semantic representations via gated message passing on a constructed spatial-temporal graph. On this basis, we further design a two-branch model that integrates the designed modules into a pipeline. Systematic experiments demonstrate each module's effectiveness on either branch. Visualizations indicate that visual contextual cues can be aggregated globally by TCE. Moreover, our method achieves state-of-the-art results on two widely used benchmarks using only RGB images as input and 2D backbones.
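
The context-encoding step can be pictured as transformer-style self-attention over per-person feature vectors: each individual's representation is refined with information aggregated from everyone in the scene. The Python sketch below shows that mechanism only; dimensions and weights are placeholders, not the TCE module itself.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def context_encode(persons, Wq, Wk, Wv):
        """Scaled dot-product self-attention over per-person features,
        with a residual connection keeping the original representation."""
        Q, K, V = persons @ Wq, persons @ Wk, persons @ Wv
        attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        return persons + attn @ V

    rng = np.random.default_rng(0)
    persons = rng.normal(size=(12, 64))            # 12 people, 64-d features
    Wq, Wk, Wv = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
    refined = context_encode(persons, Wq, Wk, Wv)  # (12, 64)
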
4. Yang, Xiaobao, Shuai He, Junsheng Wu, Yang Yang, Zhiqiang Hou, and Sugang Ma. "Exploring Spatial-Based Position Encoding for Image Captioning." Mathematics 11, no. 21 (November 4, 2023): 4550. http://dx.doi.org/10.3390/math11214550.

Abstract:
Image captioning has become a hot topic in artificial intelligence research and sits at the intersection of computer vision and natural language processing. Most recent image captioning models adopt an "encoder + decoder" architecture, in which the encoder extracts visual features while the decoder generates the descriptive sentence word by word. However, the visual features must be flattened into sequence form before being forwarded to the decoder, which results in the loss of the 2D spatial position information of the image. This limitation is particularly pronounced in the Transformer architecture, since it is inherently not position-aware. Therefore, in this paper, we propose a simple coordinate-based spatial position encoding method (CSPE) to remedy this deficiency. CSPE first creates 2D position coordinates for each feature pixel, and then encodes them by row and by column separately via trainable or hard encoding, effectively strengthening the position representation of visual features and enriching the generated description sentences. In addition, in order to reduce the time cost, we also explore a diagonal-based spatial position encoding (DSPE) approach. Compared with CSPE, DSPE is slightly inferior in performance but has a faster calculation speed. Extensive experiments on the MS COCO 2014 dataset demonstrate that CSPE and DSPE significantly enhance the spatial position representation of visual features. CSPE, in particular, improves the BLEU-4 and CIDEr metrics by 1.6% and 5.7%, respectively, compared with a baseline model without sequence-based position encoding, and also outperforms current sequence-based position encoding approaches by a significant margin. In addition, the robustness and plug-and-play ability of the proposed method are validated on a medical caption generation model.
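
The row/column idea is concrete enough to sketch. Below is a fixed (non-trainable) variant in Python: each feature pixel gets a sinusoidal encoding of its row and of its column, concatenated into one position code. The paper also allows trainable embeddings; sizes here are assumptions.

    import numpy as np

    def sinusoid(positions, dim):
        """Standard 1D sinusoidal encoding of a vector of positions."""
        i = np.arange(dim // 2)
        angles = positions[:, None] / (10000.0 ** (2 * i / dim))
        return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

    def coordinate_position_encoding(h, w, dim):
        """Encode the 2D coordinates of an h x w feature map by row and by
        column separately, then concatenate the two halves."""
        rows = sinusoid(np.arange(h, dtype=float), dim // 2)   # (h, dim/2)
        cols = sinusoid(np.arange(w, dtype=float), dim // 2)   # (w, dim/2)
        row_part = np.repeat(rows[:, None, :], w, axis=1)
        col_part = np.repeat(cols[None, :, :], h, axis=0)
        return np.concatenate([row_part, col_part], axis=-1)   # (h, w, dim)

    pe = coordinate_position_encoding(7, 7, 256)
    features = np.random.randn(7, 7, 256) + pe  # strengthen visual features
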
5. Rebollo-Neira, Laura, and Aurelien Inacio. "Enhancing sparse representation of color images by cross channel transformation." PLOS ONE 18, no. 1 (January 26, 2023): e0279917. http://dx.doi.org/10.1371/journal.pone.0279917.

Abstract:
Transformations for enhancing sparsity in the approximation of color images by 2D atomic decomposition are discussed. Sparsity is first considered with respect to the most significant coefficients in the wavelet decomposition of the color image. The discrete cosine transform is singled out as an effective 3-point transformation for this purpose. The enhanced sparsity is further exploited by approximating the transformed arrays using an effective greedy strategy with a separable, highly redundant dictionary. The relevance of the achieved sparsity is illustrated by a simple encoding procedure. On typical test images, the compression at high-quality recovery is shown to significantly improve upon the JPEG and WebP formats.
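
The cross-channel step is a 3-point transform applied across the R, G, B values of each pixel. A minimal Python sketch with the orthonormal 3-point DCT-II matrix is shown below; the wavelet stage and the greedy atomic decomposition are not reproduced.

    import numpy as np

    def dct3_matrix():
        """Orthonormal 3-point DCT-II matrix (rows are basis vectors)."""
        n, k = np.meshgrid(np.arange(3), np.arange(3))
        C = np.cos(np.pi * (2 * n + 1) * k / 6.0)
        C[0, :] *= np.sqrt(1.0 / 3.0)
        C[1:, :] *= np.sqrt(2.0 / 3.0)
        return C

    C = dct3_matrix()                  # satisfies C @ C.T == identity
    rgb = np.random.rand(256, 256, 3)  # a color image in [0, 1]
    coeffs = rgb @ C.T                 # 3-point DCT across the channel axis
    restored = coeffs @ C              # exact inverse, by orthonormality
    assert np.allclose(restored, rgb)
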
6. Tripura Sundari, Yeluripati Bala, and K. Usha Mahalakshmi. "Enhancing Brain Tumor Diagnosis: A 3D Auto-Encoding Approach for Accurate Classification." International Journal of Scientific Methods in Engineering and Management 1, no. 9 (2023): 38–46. http://dx.doi.org/10.58599/ijsmem.2023.1905.

Abstract:
The brain's capacity to control and coordinate the body's other organs makes it an integral part of the nervous system. Brain tumours, which form when abnormal cells in the brain grow uncontrolled, may be deadly if not diagnosed and treated promptly, so image processing technology is essential for identifying malignancies in medical imaging. Because of the complex spatial structure of 3D shapes, only a small fraction of 3D shape instances is viable for feature learning. These issues have motivated solutions such as autoencoders that learn properties from 2D images and the translation of 3D shapes into 2D space. Using camera images and state-space structures, the proposed 3D-based Spatial Auto Encoder automatically learns a representation of the state. Autoencoders can be trained to reconstruct an image from the prototypes they generate, and the resulting learned coefficients can then be used for 3D shape matching and retrieval. The autoencoder's strong results in image retrieval are attributed, at least in part, to the ease with which it learns new features from existing ones.
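
The core mechanism, learning compact coefficients by training an autoencoder to reconstruct its input and then reusing those coefficients for matching and retrieval, can be reduced to a toy example. The Python sketch below trains a minimal linear autoencoder on random flattened 2D patches; all sizes and the training setup are illustrative.

    import numpy as np

    rng = np.random.default_rng(42)
    X = rng.random((200, 64))                      # 200 flattened 8x8 2D patches

    W_enc = rng.normal(scale=0.1, size=(64, 16))   # encode to 16-d codes
    W_dec = rng.normal(scale=0.1, size=(16, 64))   # decode back to 64-d
    lr = 0.05
    for _ in range(500):
        Z = X @ W_enc                              # latent coefficients
        err = Z @ W_dec - X                        # reconstruction error
        grad_dec = Z.T @ err / len(X)              # gradients of squared error
        grad_enc = X.T @ (err @ W_dec.T) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc

    codes = X @ W_enc   # compact representation usable for shape matching
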
7. Rybińska-Fryca, Anna, Anita Sosnowska, and Tomasz Puzyn. "Representation of the Structure—A Key Point of Building QSAR/QSPR Models for Ionic Liquids." Materials 13, no. 11 (May 30, 2020): 2500. http://dx.doi.org/10.3390/ma13112500.

Abstract:
The process of encoding the structure of chemicals as molecular descriptors is a crucial step in quantitative structure-activity/property relationship (QSAR/QSPR) modeling. Since ionic liquids (ILs) are disconnected structures, various ways of representing their structure are used in QSAR studies: models can be based on descriptors derived either for the individual ions or for the whole ionic pair. We have examined the influence of the type of IL representation (separate ions vs. ionic pairs) on the model's quality, the automated descriptor selection process, and the reliability of the applicability domain (AD) assessment. The results of the benchmark study showed that a less precise description of an ionic liquid, based on 2D descriptors calculated for ionic pairs, is sufficient to develop a reliable QSAR/QSPR model with the highest accuracy in terms of both calibration and validation. Moreover, descriptor selection is more effective when the number of candidate variables can be decreased at the beginning of model development. Additionally, 2D descriptors usually demand less effort in mechanistic interpretation and are more convenient for virtual screening studies.
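
The representation choice the study benchmarks can be shown with a toy feature layout in Python: either concatenate descriptor blocks computed for the cation and anion separately, or use a single block computed for the whole ionic pair. The descriptor names and values below are hypothetical.

    import numpy as np

    # Hypothetical 2D descriptors per fragment (values invented).
    cation = {"mol_weight": 139.2, "tpsa": 8.8, "n_rings": 1}
    anion = {"mol_weight": 144.0, "tpsa": 45.5, "n_rings": 0}
    ion_pair = {"mol_weight": 283.2, "tpsa": 54.3, "n_rings": 1}

    # Separate-ions representation: cation and anion blocks concatenated.
    x_separate = np.array(list(cation.values()) + list(anion.values()))  # 6 vars

    # Ionic-pair representation: one block for the whole pair, halving the
    # variable pool the automated descriptor selection has to search.
    x_pair = np.array(list(ion_pair.values()))                           # 3 vars
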
8. Cohen, Lear, Ehud Vinepinsky, Opher Donchin, and Ronen Segev. "Boundary vector cells in the goldfish central telencephalon encode spatial information." PLOS Biology 21, no. 4 (April 25, 2023): e3001747. http://dx.doi.org/10.1371/journal.pbio.3001747.

Abstract:
Navigation is one of the most fundamental cognitive skills for the survival of fish, the largest vertebrate class, and almost all other animal classes. Space encoding in single neurons is a critical component of the neural basis of navigation. To study this fundamental cognitive component in fish, we recorded the activity of neurons in the central area of the goldfish telencephalon while the fish were freely navigating in a quasi-2D water tank embedded in a 3D environment. We found spatially modulated neurons with firing patterns that gradually decreased with the distance of the fish from a boundary in each cell’s preferred direction, resembling the boundary vector cells found in the mammalian subiculum. Many of these cells exhibited beta rhythm oscillations. This type of spatial representation in fish brains is unique among space-encoding cells in vertebrates and provides insights into spatial cognition in this lineage.
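
A boundary vector cell of the kind described, with firing that decays with the animal's distance from a boundary along the cell's preferred direction, can be captured in a toy rate model. The tuning widths and peak rate below are invented for illustration.

    import numpy as np

    def bvc_rate(d_wall, wall_dir, pref_dir=np.pi / 2,
                 sigma_d=10.0, sigma_a=0.5, r_max=20.0):
        """Toy boundary-vector-cell rate: highest at the boundary, decaying
        with distance to it and with angular mismatch between the boundary
        direction and the cell's preferred direction."""
        ang = np.angle(np.exp(1j * (wall_dir - pref_dir)))  # wrapped difference
        rate = r_max * np.exp(-d_wall ** 2 / (2 * sigma_d ** 2))
        return rate * np.exp(-ang ** 2 / (2 * sigma_a ** 2))

    # Firing gradually decreases as the fish moves away from a wall due north.
    for d in (0.0, 5.0, 10.0, 20.0):
        print(f"distance {d:5.1f} cm -> rate {bvc_rate(d, np.pi / 2):5.2f} Hz")
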
9. Ciprian, David, and Vasile Gui. "2D Sensor Based Design of a Dynamic Hand Gesture Interpretation System." Advanced Engineering Forum 8-9 (June 2013): 553–62. http://dx.doi.org/10.4028/www.scientific.net/aef.8-9.553.

Abstract:
A complete 2D sensor based system for dynamic gesture interpretation is presented in this paper. A hand model composed of the palm area and the fingertips is devised for this purpose. Multiple cues are integrated in a feature space, and segmentation is carried out in this space to output the hand model. The robust technique of mean shift mode estimation is used to estimate the parameters of the hand model, making it adaptive and robust. The model is validated in experiments covering difficult situations such as occlusion, varying illumination, and camouflage; real-time requirements are also met. The interpretation approach addresses dynamic hand gestures: a sequence of fingertip locations is collected from the hand model, and a tensor voting approach is used to smooth and reconstruct the trajectory. The final output is an encoding sequence of local trajectory directions, obtained by mean shift mode detection on the trajectory representation in Radon space. This module was tested and proved highly accurate.
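
The output stage, an encoding sequence of local trajectory directions, is essentially a chain code. A minimal Python sketch quantizes successive fingertip displacements into eight direction symbols; the tensor-voting smoothing and Radon-space mode detection are omitted.

    import numpy as np

    def encode_directions(trajectory, n_bins=8):
        """Quantize successive displacements along a fingertip trajectory
        into n_bins direction symbols (a chain code)."""
        deltas = np.diff(np.asarray(trajectory, dtype=float), axis=0)
        angles = np.arctan2(deltas[:, 1], deltas[:, 0]) % (2 * np.pi)
        return (np.round(angles / (2 * np.pi / n_bins)) % n_bins).astype(int)

    # An L-shaped gesture: three steps right, then two steps up.
    traj = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
    print(encode_directions(traj))  # [0 0 0 2 2]: right, right, right, up, up
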
10. Huang, Yuhao, Sanping Zhou, Junjie Zhang, Jinpeng Dong, and Nanning Zheng. "Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2426–35. http://dx.doi.org/10.1609/aaai.v38i3.28018.

Abstract:
Efficient representation of point clouds is fundamental for LiDAR-based 3D object detection. While recent grid-based detectors often encode point clouds into either voxels or pillars, the distinctions between these approaches remain underexplored. In this paper, we quantify the differences between the current encoding paradigms and highlight the limited vertical learning within. To tackle these limitations, we propose a hybrid detection framework named Voxel-Pillar Fusion (VPF), which synergistically combines the unique strengths of both voxels and pillars. To be concrete, we first develop a sparse voxel-pillar encoder that encodes point clouds into voxel and pillar features through 3D and 2D sparse convolutions respectively, and then introduce the Sparse Fusion Layer (SFL), facilitating bidirectional interaction between sparse voxel and pillar features. Our computationally efficient, fully sparse method can be seamlessly integrated into both dense and sparse detectors. Leveraging this powerful yet straightforward representation, VPF delivers competitive performance, achieving real-time inference speeds on the nuScenes and Waymo Open Dataset.
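
The voxel/pillar distinction comes down to which axes are discretized: voxels bin points in x, y, and z, while pillars collapse the whole vertical column into one cell. A Python sketch of the two index computations (grid resolution and ranges assumed):

    import numpy as np

    def grid_indices(points, cell_size, use_z=True):
        """Assign each point a voxel index (x, y, z discretized) or a
        pillar index (x, y only; vertical structure is collapsed)."""
        dims = 3 if use_z else 2
        return np.floor(points[:, :dims] / cell_size[:dims]).astype(int)

    pts = np.random.uniform([-40, -40, -3], [40, 40, 1], size=(10000, 3))
    cell = np.array([0.1, 0.1, 0.2])
    voxels = grid_indices(pts, cell, use_z=True)    # (N, 3): keeps vertical detail
    pillars = grid_indices(pts, cell, use_z=False)  # (N, 2): loses vertical detail
    print(len(np.unique(voxels, axis=0)), len(np.unique(pillars, axis=0)))
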

Theses on the topic "2D Encoding representation"

1. Abidi, Azza. "Investigating Deep Learning and Image-Encoded Time Series Approaches for Multi-Scale Remote Sensing Analysis in the Context of Land Use/Land Cover Mapping." PhD thesis, Université de Montpellier, 2024. http://www.theses.fr/2024UMONS007.

Abstract:
In this thesis, the potential of machine learning (ML) in enhancing the mapping of complex Land Use and Land Cover (LULC) patterns using Earth Observation data is explored. Traditionally, mapping methods relied on manual and time-consuming classification and interpretation of satellite images, which are susceptible to human error. However, the application of ML, particularly through neural networks, has automated and improved the classification process, resulting in more objective and accurate results. Additionally, the integration of Satellite Image Time Series (SITS) data adds a temporal dimension to spatial information, offering a dynamic view of the Earth's surface over time. This temporal information is crucial for accurate classification and informed decision-making in various applications. The precise and current LULC information derived from SITS data is essential for guiding sustainable development initiatives, resource management, and mitigating environmental risks. The LULC mapping process using ML involves data collection, preprocessing, feature extraction, and classification using various ML algorithms. Two main classification strategies for SITS data have been proposed: pixel-level and object-based approaches. While both approaches have shown effectiveness, they also pose challenges, such as the inability to capture contextual information in pixel-based approaches and the complexity of segmentation in object-based approaches. To address these challenges, this thesis aims to implement a method based on multi-scale information to perform LULC classification, coupling spectral and temporal information through a combined pixel-object methodology and applying a methodological approach to efficiently represent multivariate SITS data with the aim of reusing the large amount of research advances proposed in the field of computer vision.
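
The thesis abstract does not name its exact image encoding, but a widely used way to turn a time series into an image so that computer-vision models can be reused is the Gramian Angular Summation Field, sketched here in Python as a hedged illustration (the vegetation-index profile is mock data).

    import numpy as np

    def gramian_angular_field(series):
        """Encode a 1D series as a 2D image: rescale to [-1, 1], map values
        to angles, then take pairwise cosine sums (a GASF)."""
        x = np.asarray(series, dtype=float)
        x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
        phi = np.arccos(np.clip(x, -1, 1))
        return np.cos(phi[:, None] + phi[None, :])   # (T, T) image

    ndvi = np.sin(np.linspace(0, 2 * np.pi, 36))  # mock 36-step NDVI profile
    img = gramian_angular_field(ndvi)             # feed to any 2D CNN backbone
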

Conference papers on the topic "2D Encoding representation"

1. Özkil, Ali Gürcan, and Thomas Howard. "Automatically Annotated Mapping for Indoor Mobile Robot Applications." In ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2012. http://dx.doi.org/10.1115/detc2012-71351.

Abstract:
This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation and topology maps to indicate the connectivity of the 'places-of-interest' in the environment. A novel use of 2D visual tags allows information to be encoded physically at places-of-interest. Moreover, the physical characteristics of the visual tags (i.e., paper size) are exploited to recover the relative poses of the tags in the environment using a simple camera. This method extends tag encoding to simultaneous localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It was developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing the globally consistent, automatically annotated hybrid metric-topological maps that mobile service robots need.
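
The hybrid map the paper describes pairs a metric layer (a 2D occupancy grid) with a topological layer (a graph of tag-annotated places-of-interest). A minimal Python data-structure sketch, with node names, poses, and tags invented for illustration:

    import numpy as np

    occupancy_grid = np.zeros((200, 200), dtype=np.int8)  # metric layer: 0 free, 1 occupied

    places = {  # topological layer: node -> (grid pose, visual-tag annotation)
        "ward_a": {"pose": (20, 35, 0.0), "tag": "WARD_A"},
        "elevator": {"pose": (90, 110, 1.57), "tag": "ELEVATOR_1"},
        "pharmacy": {"pose": (150, 40, 3.14), "tag": "PHARMACY"},
    }
    edges = [("ward_a", "elevator"), ("elevator", "pharmacy")]  # connectivity

    def neighbours(node):
        """Places reachable in one hop: the connectivity a robot plans over."""
        return [b for a, b in edges if a == node] + [a for a, b in edges if b == node]

    print(neighbours("elevator"))  # ['pharmacy', 'ward_a']
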
2. Song, Meishu, Emilia Parada-Cabaleiro, Zijiang Yang, Xin Jing, Kazumasa Togami, Kun Qian, Björn W. Schuller, and Yoshiharu Yamamoto. "Parallelising 2D-CNNs and Transformers: A Cognitive-based Approach for Automatic Recognition of Learners' English Proficiency." In Intelligent Human Systems Integration (IHSI 2022): Integrating People and Intelligent Systems. AHFE International, 2022. http://dx.doi.org/10.54941/ahfe1001000.

Abstract:
Learning English as a foreign language requires extensive use of cognitive capacity, memory, and motor skills in order to express one's thoughts orally in a clear manner. Current speech recognition intelligence focuses on recognising learners' oral proficiency from the perspectives of fluency, prosody, pronunciation, and grammar. However, the capacity to express an idea clearly and naturally is a high-level cognitive behaviour which can hardly be represented by these detailed, segmental dimensions, which indeed do not fulfil English learners' and teachers' requirements. This work aims to utilise state-of-the-art deep learning techniques to recognise English speaking proficiency at a cognitive level, i.e., a learner's ability to clearly organise their own thoughts when expressing an idea in English as a foreign language. For this, we collected the "Oral English for Japanese Learners" Dataset (OEJL-DB), a corpus of recordings by 82 students of a Japanese high school expressing their ideas in English on 5 different topics. Annotations concerning the clarity of learners' thoughts are given by 5 English teachers according to 2 classes: clear and unclear. In total, the dataset includes 7.6 hours of audio data, with an average length of 66 seconds per oral English presentation. As an initial cognitive-based method to identify learners' speaking proficiency, we propose an architecture based on the parallelisation of CNNs and Transformers. Leveraging the strength of CNNs in spatial feature representation and of the Transformer in sequence encoding, we achieve 89.4% accuracy and 87.6% Unweighted Average Recall (UAR), results which outperform those from ResNet architectures (89.2% accuracy and 86.3% UAR). Our promising outcomes reveal that speech intelligence can be efficiently applied to "grasp" high-level cognitive behaviours, a new area of research with great potential for further investigation.
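
At a block-diagram level, the proposed model runs a 2D-CNN branch and a Transformer branch in parallel over the same spectrogram and fuses their outputs for the clear/unclear decision. The Python sketch below shows only that data flow with random stand-in projections; the real branches are trained networks, and every dimension here is an assumption.

    import numpy as np

    rng = np.random.default_rng(1)
    spectrogram = rng.normal(size=(128, 400))       # mel bins x time frames

    # Random stand-ins for the two trained branches:
    cnn_proj = rng.normal(size=(128 * 400, 64)) * 0.01
    seq_proj = rng.normal(size=(128, 64)) * 0.01

    spatial = spectrogram.reshape(-1) @ cnn_proj    # 2D-CNN branch -> (64,)
    temporal = (spectrogram.T @ seq_proj).mean(0)   # Transformer branch -> (64,)

    fused = np.concatenate([spatial, temporal])     # joint representation (128,)
    logits = fused @ (rng.normal(size=(128, 2)) * 0.1)
    print("prediction:", ["clear", "unclear"][int(np.argmax(logits))])
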
3. Li, Shaohua, Xiuchao Sui, Xiangde Luo, Xinxing Xu, Yong Liu, and Rick Goh. "Medical Image Segmentation Using Squeeze-and-Expansion Transformers." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/112.

Abstract:
Medical image segmentation is important for computer-aided diagnosis. Good segmentation demands that the model see the big picture and fine details simultaneously, i.e., learn image features that incorporate large context while keeping high spatial resolution. To approach this goal, the most widely used methods, U-Net and its variants, extract and fuse multi-scale features. However, the fused features still have small "effective receptive fields" with a focus on local image cues, limiting their performance. In this work, we propose Segtran, an alternative segmentation framework based on transformers, which have unlimited "effective receptive fields" even at high feature resolutions. The core of Segtran is a novel Squeeze-and-Expansion transformer: a squeezed attention block regularizes the self-attention of transformers, and an expansion block learns diversified representations. Additionally, we propose a new positional encoding scheme for transformers, imposing a continuity inductive bias for images. Experiments were performed on 2D and 3D medical image segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Compared with representative existing methods, Segtran consistently achieved the highest segmentation accuracy and exhibited good cross-domain generalization capabilities.
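
A positional encoding with a continuity inductive bias assigns nearby pixels nearby codes. One continuous scheme that illustrates the property is random Fourier features over normalized image coordinates, sketched below in Python; this illustrates the bias only and is not Segtran's exact formulation.

    import numpy as np

    def continuous_pe(xy, dim=32, scale=10.0):
        """Map continuous 2D coordinates to codes via sin/cos of random
        linear projections (random Fourier features); the mapping is
        smooth, so nearby points receive nearby codes."""
        rng = np.random.default_rng(0)
        B = rng.normal(size=(2, dim // 2)) * scale
        proj = np.asarray(xy, dtype=float) @ B
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

    a = continuous_pe(np.array([[0.500, 0.500]]))
    b = continuous_pe(np.array([[0.501, 0.500]]))  # tiny spatial shift
    print(np.linalg.norm(a - b))                   # small: codes vary smoothly
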