Academic literature on the topic 'Cross-modal document classification'
The lists below bring together relevant journal articles, theses, and conference papers on the topic 'Cross-modal document classification.'
Journal articles on the topic "Cross-modal document classification"
Zeng, Dehong, Xiaosong Chen, Zhengxin Song, Yun Xue, and Qianhua Cai. "Multimodal Interaction and Fused Graph Convolution Network for Sentiment Classification of Online Reviews." Mathematics 11, no. 10 (May 17, 2023): 2335. http://dx.doi.org/10.3390/math11102335.
Bakkali, Souhail, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, and Oriol Ramos Terrades. "VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification." Pattern Recognition, February 2023, 109419. http://dx.doi.org/10.1016/j.patcog.2023.109419.
Dissertations / Theses on the topic "Cross-modal document classification"
Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS046.
The frameworks developed in this thesis were the outcome of an iterative process of analysis and synthesis between existing theories and our performed studies. More specifically, we wish to study cross-modality learning for contextualized comprehension of document components across language and vision. The main idea is to leverage multimodal information from document images into a common semantic space. This thesis focuses on advancing the research on cross-modality learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly leverages visual and textual information in a common semantic representation space to automatically make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies to address the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues, such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.
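To make the common-semantic-space idea in this abstract more concrete, here is a minimal sketch, not the architecture proposed in the thesis: two small projection heads map precomputed page-image and OCR-text features into a shared space and are trained with a symmetric contrastive objective. All dimensions, the temperature value, and the random features are illustrative assumptions.

```python
# Minimal sketch (assumed, not the thesis's model): project visual and textual
# document features into a common semantic space with a contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalProjector(nn.Module):
    def __init__(self, vis_dim=2048, txt_dim=768, joint_dim=256):
        super().__init__()
        self.vis_proj = nn.Sequential(nn.Linear(vis_dim, joint_dim), nn.ReLU(),
                                      nn.Linear(joint_dim, joint_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, joint_dim), nn.ReLU(),
                                      nn.Linear(joint_dim, joint_dim))

    def forward(self, vis_feats, txt_feats):
        # L2-normalise both projections so cosine similarity is a dot product.
        v = F.normalize(self.vis_proj(vis_feats), dim=-1)
        t = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return v, t

def contrastive_loss(v, t, temperature=0.07):
    # Matched image/text pairs lie on the diagonal of the similarity matrix.
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage: random tensors stand in for page-image and OCR-text embeddings.
model = CrossModalProjector()
v, t = model(torch.randn(8, 2048), torch.randn(8, 768))
loss = contrastive_loss(v, t)
loss.backward()
```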
Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.
This thesis investigates the joint modeling of visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained, e.g., by Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared, is a generally adopted solution. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented in the common space yet very significant for a retrieval task. The second limitation consists in a separation between modalities in the common space, which leads to coarse cross-modal matching. To deal with the first limitation concerning poorly represented data, we put forward a model which first identifies such information and then finds ways to combine it with data that is relatively well represented in the joint space. Evaluations on text-illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities in the joint space to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality and then use such information to build a final bi-modal representation for a uni-modal document. Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval and bi-modal and cross-modal classification.
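The joint-space idea described in this abstract can be sketched with plain linear CCA from scikit-learn rather than the Kernel CCA used in the thesis; the feature dimensions, sample count, and number of components below are illustrative assumptions. Paired image and text descriptors are projected onto shared canonical components, and cross-modal retrieval then ranks texts for an image query by cosine similarity in that space.

```python
# Illustrative sketch only: linear CCA as a stand-in for the Kernel CCA
# joint image-text space, followed by cosine-similarity retrieval.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(300, 64))   # stand-in for visual descriptors
txt_feats = rng.normal(size=(300, 80))   # stand-in for textual descriptors

# Learn a joint space from paired image/text examples.
cca = CCA(n_components=16, max_iter=1000)
img_joint, txt_joint = cca.fit_transform(img_feats, txt_feats)

# Cross-modal retrieval: rank all text vectors for a single image query
# by cosine similarity in the shared space.
query = img_joint[:1]
ranking = cosine_similarity(query, txt_joint)[0].argsort()[::-1]
print("Top-5 retrieved text indices:", ranking[:5])
```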
Conference papers on the topic "Cross-modal document classification"
Bakkali, Souhail, Zuheng Ming, Mickael Coustaty, and Marçal Rusiñol. "Cross-Modal Deep Networks For Document Image Classification." In 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. http://dx.doi.org/10.1109/icip40778.2020.9191268.