Selected scientific literature on the topic "Image Captioning (IC)"

Author: Grafiati

Published: July 7, 2024

Contents

Journal articles
Theses
Conference proceedings

Browse the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Image Captioning (IC)".

Journal articles on the topic "Image Captioning (IC)"

Li, Jingyu, Zhendong Mao, Hao Li, Weidong Chen, and Yongdong Zhang. "Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning". ACM Transactions on Multimedia Computing, Communications, and Applications, December 25, 2023. http://dx.doi.org/10.1145/3638558.

Abstract:

Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial aspect of IC is the accurate depiction of visual relations among image objects. Visual relations encompass two primary facets: content relations and structural relations. Content relations, which comprise geometric positions content (i.e., distances and sizes) and semantic interactions content (i.e., actions and possessives), unveil the mutual correlations between objects. In contrast, structural relations pertain to the topological connectivity of object regions. Existing Transformer-based methods typically resort to geometric positions to enhance the visual relations, yet only using the shallow geometric content is unable to precisely cover actional content correlations and structural connection relations. In this paper, we adopt a comprehensive perspective to examine the correlations between objects, incorporating both content relations (i.e., geometric and semantic relations) and structural relations, with the aim of generating plausible captions. To achieve this, firstly, we construct a geometric graph from bounding box features and a semantic graph from the scene graph parser to model the content relations. Innovatively, we construct a topology graph that amalgamates the sparsity characteristics of the geometric and semantic graphs, enabling the representation of image structural relations. Secondly, we propose a novel unified approach to enrich image relation representations by integrating semantic, geometric, and structural relations into self-attention. Finally, in the language decoding stage, we further leverage the semantic relation as prior knowledge to generate accurate words. Extensive experiments on MS-COCO dataset demonstrate the effectiveness of our model, with improvements of CIDEr from 128.6% to 136.6%. Codes have been released at https://github.com/CrossmodalGroup/ER-SAN/tree/main/VG-Cap.
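
Editorial illustration: the abstract mentions geometric content relations (distances and sizes) derived from bounding boxes. The sketch below shows one simple way such pairwise relations could be computed. It is not the authors' implementation; the function name `geometric_graph`, the `max_center_dist` threshold, and the choice of center distance and area ratio as edge features are assumptions made here for illustration only.

```python
import math


def geometric_graph(boxes, max_center_dist):
    """Toy geometric graph over bounding boxes (illustrative assumption).

    boxes: list of (x1, y1, x2, y2) tuples with positive area.
    Returns a dict mapping each ordered pair (i, j), i != j, whose box
    centers lie within max_center_dist, to (center_distance, area_ratio),
    where area_ratio = area of box j / area of box i.
    """
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]

    edges = {}
    for i in range(len(boxes)):
        for j in range(len(boxes)):
            if i == j:
                continue
            dist = math.dist(centers[i], centers[j])
            # Keep only nearby pairs so the resulting graph stays sparse.
            if dist <= max_center_dist:
                edges[(i, j)] = (dist, areas[j] / areas[i])
    return edges


# Hypothetical boxes: two close objects and one far away.
example_boxes = [(0, 0, 2, 2), (3, 0, 5, 2), (100, 100, 102, 104)]
example_edges = geometric_graph(example_boxes, max_center_dist=10)
```

With these example boxes, only the first two objects end up connected; the third is too far from both to receive an edge.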