
Academic literature on the topic "Cross-modal document classification"

Author: Grafiati

Published: September 7, 2023


Contents

Journal articles
Theses
Conference proceedings

Browse the topical lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Cross-modal document classification".


Journal articles on the topic "Cross-modal document classification"

1

Zeng, Dehong, Xiaosong Chen, Zhengxin Song, Yun Xue, and Qianhua Cai. "Multimodal Interaction and Fused Graph Convolution Network for Sentiment Classification of Online Reviews". Mathematics 11, no. 10 (May 17, 2023): 2335. http://dx.doi.org/10.3390/math11102335.

Abstract

An increasing number of people convey their opinions in different modalities. For the purpose of opinion mining, sentiment classification based on multimodal data has become a major focus. In this work, we propose a novel Multimodal Interactive and Fusion Graph Convolutional Network to deal with both texts and images on the task of document-level multimodal sentiment analysis (DLMSA). The image caption is introduced as an auxiliary input, which is aligned with the image to enhance the delivery of semantics. Then, a graph is constructed with the sentences and images as nodes. Through graph learning, long-distance dependencies can be captured while visual noise can be filtered. Specifically, a cross-modal graph convolutional network is built for multimodal information fusion. Extensive experiments are conducted on a multimodal dataset from Yelp. Experimental results show that our model obtains satisfactory performance on DLMSA tasks.


2

Bakkali, Souhail, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, and Oriol Ramos Terrades. "VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification". Pattern Recognition, February 2023, 109419. http://dx.doi.org/10.1016/j.patcog.2023.109419.


Theses on the topic "Cross-modal document classification"

1

Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning". Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS046.

Abstract

The frameworks developed in this thesis are the outcome of an iterative process of analysis and synthesis between existing theories and our own studies. More specifically, we study cross-modal learning for contextualized comprehension of document components across language and vision. The main idea is to leverage multimodal information from document images in a common semantic space. This thesis advances research on cross-modal learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly leverages visual and textual information in a common semantic representation space to automatically make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies to address the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.


2

Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations". Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.

Abstract

This thesis investigates the joint modeling of the visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A generally adopted solution is a common representation space, obtained for example by Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented on the common space yet very significant for a retrieval task. The second limitation is a separation between modalities on the common space, which leads to coarse cross-modal matching. To deal with the first limitation, concerning poorly represented data, we put forward a model that first identifies such information and then combines it with data that is relatively well represented on the joint space. Evaluations on text illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities on the joint space in order to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality, and then use that information to build a final bi-modal representation for a uni-modal document. Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval and for bi-modal and cross-modal classification.


Conference proceedings on the topic "Cross-modal document classification"

1

Bakkali, Souhail, Zuheng Ming, Mickael Coustaty, and Marcal Rusinol. "Cross-Modal Deep Networks For Document Image Classification". In 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. http://dx.doi.org/10.1109/icip40778.2020.9191268.
