Journal articles on the topic 'Scene Graph Generation'

To see the other types of publications on this topic, follow the link: Scene Graph Generation.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Scene Graph Generation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Khademi, Mahmoud, and Oliver Schulte. "Deep Generative Probabilistic Graph Neural Networks for Scene Graph Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11237–45. http://dx.doi.org/10.1609/aaai.v34i07.6783.

Abstract:
We propose a new algorithm, called Deep Generative Probabilistic Graph Neural Networks (DG-PGNN), to generate a scene graph for an image. The input to DG-PGNN is an image, together with a set of region-grounded captions and object bounding-box proposals for the image. To generate the scene graph, DG-PGNN constructs and updates a new model, called a Probabilistic Graph Network (PGN). A PGN can be thought of as a scene graph with uncertainty: it represents each node and each edge by a CNN feature vector and defines a probability mass function (PMF) for node-type (object category) of each node and edge-type (predicate class) of each edge. The DG-PGNN sequentially adds a new node to the current PGN by learning the optimal ordering in a Deep Q-learning framework, where states are partial PGNs, actions choose a new node, and rewards are defined based on the ground-truth. After adding a node, DG-PGNN uses message passing to update the feature vectors of the current PGN by leveraging contextual relationship information, object co-occurrences, and language priors from captions. The updated features are then used to fine-tune the PMFs. Our experiments show that the proposed algorithm significantly outperforms the state-of-the-art results on the Visual Genome dataset for scene graph generation. We also show that the scene graphs constructed by DG-PGNN improve performance on the visual question answering task, for questions that need reasoning about objects and their interactions in the scene context.
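To make the probabilistic-graph idea above concrete, here is a minimal, illustrative sketch of a scene graph with uncertainty: each node holds a feature vector and a PMF over object categories, each edge a PMF over predicate classes, and nodes are added sequentially. The class lists, feature dimension, and uniform initial PMFs are assumptions made here for clarity; this is not the authors' DG-PGNN implementation.

```python
import numpy as np

# Toy sketch of a Probabilistic Graph Network (PGN): each node carries a
# feature vector and a probability mass function (PMF) over object
# categories; each edge carries a PMF over predicate classes.
OBJECT_CLASSES = ["person", "horse", "helmet"]
PREDICATE_CLASSES = ["riding", "wearing", "next to"]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PGN:
    def __init__(self):
        self.node_feats = []   # one feature vector per node
        self.node_pmfs = []    # PMF over OBJECT_CLASSES per node
        self.edge_pmfs = {}    # (i, j) -> PMF over PREDICATE_CLASSES

    def add_node(self, feat):
        """Sequentially add a node (e.g. chosen by a learned policy)."""
        new_id = len(self.node_feats)
        self.node_feats.append(feat)
        # Uniform initial belief; message passing would refine this.
        self.node_pmfs.append(np.full(len(OBJECT_CLASSES), 1 / len(OBJECT_CLASSES)))
        for other in range(new_id):
            self.edge_pmfs[(other, new_id)] = np.full(
                len(PREDICATE_CLASSES), 1 / len(PREDICATE_CLASSES))
        return new_id

    def update_node_pmf(self, node_id, logits):
        """Fine-tune a node's PMF from updated features (stand-in for message passing)."""
        self.node_pmfs[node_id] = softmax(np.asarray(logits, dtype=float))

pgn = PGN()
a = pgn.add_node(np.random.randn(8))
b = pgn.add_node(np.random.randn(8))
pgn.update_node_pmf(a, [2.0, 0.1, -1.0])   # node a now most likely "person"
print(OBJECT_CLASSES[int(np.argmax(pgn.node_pmfs[a]))])
```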
2

Hua, Tianyu, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, and Tao Mei. "Exploiting Relationship for Complex-scene Image Generation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1584–92. http://dx.doi.org/10.1609/aaai.v35i2.16250.

Abstract:
The significant progress on Generative Adversarial Networks (GANs) has facilitated realistic single-object image generation based on language input. However, complex-scene generation (with various interactions among multiple objects) still suffers from messy layouts and object distortions, due to diverse configurations in layouts and appearances. Prior methods are mostly object-driven and ignore the inter-relations that play a significant role in complex-scene images. This work explores relationship-aware complex-scene image generation, where multiple objects are inter-related as a scene graph. With the help of relationships, we propose three major updates in the generation framework. First, reasonable spatial layouts are inferred by jointly considering the semantics and relationships among objects. Compared to standard location regression, we show relative scales and distances serve as a more reliable target. Second, since the relations between objects significantly influence an object's appearance, we design a relation-guided generator to generate objects reflecting their relationships. Third, a novel scene graph discriminator is proposed to guarantee the consistency between the generated image and the input scene graph. Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image. Experimental results on the Visual Genome and HICO-DET datasets show that our proposed method significantly outperforms prior art in terms of IS and FID metrics. Based on our user study and visual inspection, our method is more effective in generating logical layouts and appearances for complex scenes.
3

Wald, Johanna, Nassir Navab, and Federico Tombari. "Learning 3D Semantic Scene Graphs with Instance Embeddings." International Journal of Computer Vision 130, no. 3 (January 22, 2022): 630–51. http://dx.doi.org/10.1007/s11263-021-01546-9.

Abstract:
A 3D scene is more than the geometry and classes of the objects it comprises. An essential aspect beyond object-level perception is the scene context, described as a dense semantic network of interconnected nodes. Scene graphs have become a common representation to encode the semantic richness of images, where nodes in the graph are object entities connected by edges, so-called relationships. Such graphs have been shown to be useful in achieving state-of-the-art performance in image captioning, visual question answering and image generation or editing. While scene graph prediction methods have so far focused on images, we propose instead a novel neural network architecture for 3D data, where the aim is to learn to regress semantic graphs from a given 3D scene. With this work, we go beyond object-level perception, by exploring relations between object entities. Our method learns instance embeddings alongside a scene segmentation and is able to predict semantics for object nodes and edges. We leverage 3DSSG, a large-scale dataset based on 3RScan that features scene graphs of changing 3D scenes. Finally, we show the effectiveness of graphs as an intermediate representation on a retrieval task.
4

Bauer, Daniel. "Understanding Descriptions of Visual Scenes Using Graph Grammars." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1656–57. http://dx.doi.org/10.1609/aaai.v27i1.8498.

Abstract:
Automatic generation of 3D scenes from descriptions has applications in communication, education, and entertainment, but requires deep understanding of the input text. I propose thesis work on language understanding using graph-based meaning representations that can be decomposed into primitive spatial relations. The techniques used for analyzing text and transforming it into a scene representation are based on context-free graph grammars. The thesis develops methods for semantic parsing with graphs, acquisition of graph grammars, and satisfaction of spatial and world-knowledge constraints during parsing.
5

Shao, Tong, and Dapeng Oliver Wu. "Graph-LSTM with Global Attribute for Scene Graph Generation." Journal of Physics: Conference Series 2003, no. 1 (August 1, 2021): 012001. http://dx.doi.org/10.1088/1742-6596/2003/1/012001.

6

Lin, Bingqian, Yi Zhu, and Xiaodan Liang. "Atom correlation based graph propagation for scene graph generation." Pattern Recognition 122 (February 2022): 108300. http://dx.doi.org/10.1016/j.patcog.2021.108300.

7

Wang, Ruize, Zhongyu Wei, Piji Li, Qi Zhang, and Xuanjing Huang. "Storytelling from an Image Stream Using Scene Graphs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9185–92. http://dx.doi.org/10.1609/aaai.v34i05.6455.

Abstract:
Visual storytelling aims at generating a story from an image stream. Most existing methods tend to represent images directly with the extracted high-level features, which is not intuitive and difficult to interpret. We argue that translating each image into a graph-based semantic representation, i.e., a scene graph, which explicitly encodes the objects and relationships detected within the image, would benefit representing and describing images. To this end, we propose a novel graph-based architecture for visual storytelling by modeling the two-level relationships on scene graphs. In particular, on the within-image level, we employ a Graph Convolution Network (GCN) to enrich local fine-grained region representations of objects on scene graphs. To further model the interaction among images, on the cross-image level, a Temporal Convolution Network (TCN) is utilized to refine the region representations along the temporal dimension. Then the relation-aware representations are fed into the Gated Recurrent Unit (GRU) with an attention mechanism for story generation. Experiments are conducted on the public visual storytelling dataset. Automatic and human evaluation results indicate that our method achieves state-of-the-art performance.
8

Chen, Jin, Xiaofeng Ji, and Xinxiao Wu. "Adaptive Image-to-Video Scene Graph Generation via Knowledge Reasoning and Adversarial Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 276–84. http://dx.doi.org/10.1609/aaai.v36i1.19903.

Abstract:
A scene graph in a video conveys a wealth of information about objects and their relationships in the scene, thus benefiting many downstream tasks such as video captioning and visual question answering. Existing methods of scene graph generation require large-scale training videos annotated with objects and relationships in each frame to learn a powerful model. However, such comprehensive annotation is time-consuming and labor-intensive. On the other hand, it is much easier and less costly to annotate images with scene graphs, so we investigate leveraging annotated images to facilitate training a scene graph generation model for unannotated videos, namely image-to-video scene graph generation. This task presents two challenges: 1) inferring unseen dynamic relationships in videos from static relationships in images due to the absence of motion information in images; 2) adapting objects and static relationships from images to video frames due to the domain shift between them. To address the first challenge, we exploit external commonsense knowledge to infer the unseen dynamic relationship from the temporal evolution of static relationships. We tackle the second challenge by hierarchical adversarial learning to reduce the data distribution discrepancy between images and video frames. Extensive experimental results on two benchmark video datasets demonstrate the effectiveness of our method.
9

Jung, Gayoung, Jonghun Lee, and Incheol Kim. "Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation." Sensors 21, no. 9 (May 2, 2021): 3164. http://dx.doi.org/10.3390/s21093164.

Abstract:
Video scene graph generation (ViDSGG), the creation of video scene graphs that helps in deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed to perform this task. However, they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, as well as high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that adjusts the weight of sparse relationships to a higher level. This study demonstrates the positive effect and high performance of the proposed model through experiments using the benchmark datasets VidOR and VidVRD.
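The class-weighting technique mentioned above (up-weighting sparse relationship classes) is commonly realized as a weighted cross-entropy loss. A minimal PyTorch sketch with hypothetical class counts, not the VSGG-Net implementation, might look like this:

```python
import torch
import torch.nn as nn

# Hypothetical relationship-class frequencies from a training set:
# frequent classes get small weights, sparse classes get large ones.
class_counts = torch.tensor([50000., 12000., 300., 45.])   # e.g. "next_to", "hold", "chase", "feed"
weights = class_counts.sum() / (len(class_counts) * class_counts)  # inverse-frequency weighting

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)            # predicted relation scores for 8 object pairs
targets = torch.randint(0, 4, (8,))   # ground-truth relation labels
loss = criterion(logits, targets)     # sparse classes now contribute more to the loss
print(loss.item())
```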
10

Li, Shuohao, Min Tang, Jun Zhang, and Lincheng Jiang. "Attentive Gated Graph Neural Network for Image Scene Graph Generation." Symmetry 12, no. 4 (April 2, 2020): 511. http://dx.doi.org/10.3390/sym12040511.

Abstract:
An image scene graph is a semantic structural representation which can not only show what objects are in the image, but also infer the relationships and interactions among them. Despite the recent success in object detection using deep neural networks, automatically recognizing social relations of objects in images remains a challenging task due to the significant gap between the domains of visual content and social relation. In this work, we translate the scene graph into an Attentive Gated Graph Neural Network which can propagate a message by visual relationship embedding. More specifically, nodes in gated neural networks can represent objects in the image, and edges can be regarded as relationships among objects. In this network, an attention mechanism is applied to measure the strength of the relationship between objects. It can increase the accuracy of object classification and reduce the complexity of relationship classification. Extensive experiments on the widely adopted Visual Genome Dataset show the effectiveness of the proposed method.
11

Krahmer, Emiel, Sebastiaan van Erk, and André Verleg. "Graph-Based Generation of Referring Expressions." Computational Linguistics 29, no. 1 (March 2003): 53–72. http://dx.doi.org/10.1162/089120103321337430.

Abstract:
This article describes a new approach to the generation of referring expressions. We propose to formalize a scene (consisting of a set of objects with various properties and relations) as a labeled directed graph and describe content selection (which properties to include in a referring expression) as a subgraph construction problem. Cost functions are used to guide the search process and to give preference to some solutions over others. The current approach has four main advantages: (1) Graph structures have been studied extensively, and by moving to a graph perspective we get direct access to the many theories and algorithms for dealing with graphs; (2) many existing generation algorithms can be reformulated in terms of graphs, and this enhances comparison and integration of the various approaches; (3) the graph perspective allows us to solve a number of problems that have plagued earlier algorithms for the generation of referring expressions; and (4) the combined use of graphs and cost functions paves the way for an integration of rule-based generation techniques with more recent stochastic approaches.
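To illustrate cost-guided content selection in this spirit, the sketch below greedily adds the cheapest property that still rules out distractors until the target is uniquely identified. The scene, properties, and costs are invented for illustration, and the paper's algorithm operates on labeled directed graphs via subgraph construction rather than on flat attribute sets.

```python
# Toy content selection in the spirit of cost-guided subgraph construction:
# keep adding the cheapest property that still rules out distractors until
# the description uniquely identifies the target.
scene = {
    "d1": {"type:dog", "color:brown", "size:small"},
    "d2": {"type:dog", "color:black", "size:small"},
    "c1": {"type:cat", "color:brown", "size:large"},
}
cost = {"type:dog": 1, "type:cat": 1, "color:brown": 2,
        "color:black": 2, "size:small": 3, "size:large": 3}

def describe(target, scene, cost):
    description, distractors = set(), {k for k in scene if k != target}
    while distractors:
        # Candidate properties of the target that exclude at least one distractor.
        candidates = [p for p in scene[target] - description
                      if any(p not in scene[d] for d in distractors)]
        if not candidates:
            return None                      # no distinguishing description exists
        best = min(candidates, key=lambda p: cost[p])
        description.add(best)
        distractors = {d for d in distractors if best in scene[d]}
    return description

print(describe("d1", scene, cost))           # e.g. {'type:dog', 'color:brown'}
```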
12

Lin, Zhiyuan, Feng Zhu, Qun Wang, Yanzi Kong, Jianyu Wang, Liang Huang, and Yingming Hao. "RSSGG_CS: Remote Sensing Image Scene Graph Generation by Fusing Contextual Information and Statistical Knowledge." Remote Sensing 14, no. 13 (June 29, 2022): 3118. http://dx.doi.org/10.3390/rs14133118.

Abstract:
To semantically understand remote sensing images, it is not only necessary to detect the objects in them but also to recognize the semantic relationships between the instances. Scene graph generation aims to represent the image as a semantic structural graph, where objects and relationships between them are described as nodes and edges, respectively. Some existing methods rely only on visual features to sequentially predict the relationships between objects, ignoring contextual information and making it difficult to generate high-quality scene graphs, especially for remote sensing images. Therefore, we propose a novel model for remote sensing image scene graph generation by fusing contextual information and statistical knowledge, namely RSSGG_CS. To integrate contextual information and calculate attention among all objects, the RSSGG_CS model adopts a filter module (FiM) that is based on adjusted transformer architecture. Moreover, to reduce the blindness of the model when searching semantic space, statistical knowledge of relational predicates between objects from the training dataset and the cleaned Wikipedia text is used as supervision when training the model. Experiments show that fusing contextual information and statistical knowledge allows the model to generate more complete scene graphs of remote sensing images and facilitates the semantic understanding of remote sensing images.
13

Ashual, Oron, and Lior Wolf. "Interactive Scene Generation via Scene Graphs with Attributes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 09 (April 3, 2020): 13651–54. http://dx.doi.org/10.1609/aaai.v34i09.7112.

Abstract:
We introduce a simple yet expressive image generation method. On the one hand, it does not require the user to paint the masks or define a bounding box of the various objects, since the model does it by itself. On the other hand, it supports defining a coarse location and size of each object. Based on this, we offer a simple, interactive GUI, that allows a layman user to generate diverse images effortlessly. From a technical perspective, we introduce a dual embedding of layout and appearance. In this scheme, the location, size, and appearance of an object can change independently of each other. This way, the model is able to generate innumerable images per scene graph, to better express the intention of the user. In comparison to previous work, we also offer better quality and higher resolution outputs. This is due to a superior architecture, which is based on a novel set of discriminators. Those discriminators better constrain the shape of the generated mask, as well as capturing the appearance encoding in a counterfactual way. Our code is publicly available at https://www.github.com/ashual/scene_generation.
14

Chen, Chao, Yibing Zhan, Baosheng Yu, Liu Liu, Yong Luo, and Bo Du. "Resistance Training Using Prior Bias: Toward Unbiased Scene Graph Generation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 212–20. http://dx.doi.org/10.1609/aaai.v36i1.19896.

Abstract:
Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for scene graph generation. Specifically, RTPB uses a distribution-based prior bias to improve the model's ability to detect less frequent relationships during training, thus improving generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for unbiased scene graph generation. In particular, our RTPB achieves an improvement of over 10% in mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods by a large margin. Code is available at https://github.com/ChCh1999/RTPB.
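As a rough illustration of a frequency-derived prior bias on relation logits (a generic logit-adjustment-style sketch with invented relation counts, not the exact RTPB formulation):

```python
import numpy as np

# Illustrative "prior bias" from a long-tailed relation distribution:
# subtract a log-frequency term from the logits so that rare (tail)
# relations are not drowned out by head relations.
relation_counts = np.array([80000., 15000., 900., 60.])   # hypothetical counts
prior = np.log(relation_counts / relation_counts.sum())   # log prior over relations

def biased_logits(raw_logits, tau=1.0):
    """Penalize head classes in proportion to their prior, boosting tail classes."""
    return raw_logits - tau * prior

raw = np.array([2.0, 1.5, 1.4, 1.3])     # model scores for one subject-object pair
print(np.argmax(raw))                     # head class wins on raw scores
print(np.argmax(biased_logits(raw)))      # tail classes become more competitive
```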
15

Zhou, Hao, Yazhou Yang, Tingjin Luo, Jun Zhang, and Shuohao Li. "A unified deep sparse graph attention network for scene graph generation." Pattern Recognition 123 (March 2022): 108367. http://dx.doi.org/10.1016/j.patcog.2021.108367.

16

Lin, Xin, Jinquan Zeng, and Xingquan Li. "Divide and Conquer: Subset Matching for Scene Graph Generation in Complex Scenes." IEEE Access 10 (2022): 39069–79. http://dx.doi.org/10.1109/access.2022.3165617.

17

Li, Peng, Dezheng Zhang, Aziguli Wulamu, Xin Liu, and Peng Chen. "Semantic Relation Model and Dataset for Remote Sensing Scene Understanding." ISPRS International Journal of Geo-Information 10, no. 7 (July 17, 2021): 488. http://dx.doi.org/10.3390/ijgi10070488.

Abstract:
A deep understanding of our visual world is more than an isolated perception on a series of objects, and the relationships between them also contain rich semantic information. Especially for those satellite remote sensing images, the span is so large that the various objects are always of different sizes and complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthening the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attentional mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthen the cognitive ability of our model. Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote the research of scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
18

Shin, Donghyeop, and Incheol Kim. "Deep Image Understanding Using Multilayered Contexts." Mathematical Problems in Engineering 2018 (December 10, 2018): 1–11. http://dx.doi.org/10.1155/2018/5847460.

Abstract:
Generation of scene graphs and natural language captions from images for deep image understanding is an ongoing research problem. Scene graphs and natural language captions have a common characteristic in that they are generated by considering the objects in the images and the relationships between the objects. This study proposes a deep neural network model named the Context-based Captioning and Scene Graph Generation Network (C2SGNet), which simultaneously generates scene graphs and natural language captions from images. The proposed model generates results through communication of context information between these two tasks. For effective communication of context information, the two tasks are structured into three layers: the object detection, relationship detection, and caption generation layers. Each layer receives related context information from the lower layer. In this study, the proposed model was experimentally assessed using the Visual Genome benchmark data set. The performance improvement effect of the context information was verified through various experiments. Further, the high performance of the proposed model was confirmed through performance comparison with existing models.
19

Kim, Seongyong, Tae Hyeon Jeon, Ilsun Rhiu, Jinhyun Ahn, and Dong-Hyuk Im. "Semantic Scene Graph Generation Using RDF Model and Deep Learning." Applied Sciences 11, no. 2 (January 17, 2021): 826. http://dx.doi.org/10.3390/app11020826.

Abstract:
Over the last several years, in parallel with the general global advancement in mobile technology and a rise in social media network content consumption, multimedia content production and reproduction have increased exponentially. Therefore, enabled by the rapid recent advancements in deep learning technology, research on scene graph generation is being actively conducted to more efficiently search for and classify images desired by users within a large amount of content. This approach lets users accurately find images they are searching for by expressing meaningful information on image content as nodes and edges of a graph. In this study, we propose a scene graph generation method based on using the Resource Description Framework (RDF) model to clarify semantic relations. Furthermore, we also use convolutional neural network (CNN) and recurrent neural network (RNN) deep learning models to generate a scene graph expressed in a controlled vocabulary of the RDF model to understand the relations between image object tags. Finally, we experimentally demonstrate through testing that our proposed technique can express semantic content more effectively than existing approaches.
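A scene graph expressed as RDF triples can be sketched with rdflib as below; the namespace and relation terms are made-up examples rather than the controlled vocabulary used in the paper.

```python
from rdflib import Graph, Namespace

# Express a tiny scene graph as RDF triples: (object, relation, object).
EX = Namespace("http://example.org/scene/")   # illustrative namespace only
g = Graph()
g.bind("ex", EX)

g.add((EX.man, EX.rides, EX.horse))
g.add((EX.man, EX.wears, EX.helmet))
g.add((EX.horse, EX.standsOn, EX.grass))

# Serialize in Turtle and run a simple SPARQL query over the scene graph.
print(g.serialize(format="turtle"))
for row in g.query("SELECT ?o WHERE { ?s <http://example.org/scene/rides> ?o }"):
    print(row.o)
```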
20

Liu, Lijuan, Yin Yang, Yi Yuan, Tianjia Shao, He Wang, and Kun Zhou. "In-game Residential Home Planning via Visual Context-aware Global Relation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 1 (May 18, 2021): 336–43. http://dx.doi.org/10.1609/aaai.v35i1.16109.

Abstract:
In this paper, we propose an effective global relation learning algorithm to recommend an appropriate location of a building unit for in-game customization of a residential home complex. Given a construction layout, we propose a visual context-aware graph generation network that learns the implicit global relations among the scene components and infers the location of a new building unit. The proposed network takes as input the scene graph and the corresponding top-view depth image. It provides location recommendations for a newly added building unit by learning an auto-regressive edge distribution conditioned on existing scenes. We also introduce a global graph-image matching loss to enhance the awareness of essential geometry semantics of the site. Qualitative and quantitative experiments demonstrate that the recommended location well reflects the implicit spatial rules of components in the residential estates, and it is instructive and practical to locate the building units in the 3D scene of the complex construction.
21

Zakraoui, Jezia, Moutaz Saleh, Somaya Al-Maadeed, and Jihad Mohammed Jaam. "Improving text-to-image generation with object layout guidance." Multimedia Tools and Applications 80, no. 18 (May 20, 2021): 27423–43. http://dx.doi.org/10.1007/s11042-021-11038-0.

Abstract:
The automatic generation of realistic images directly from a story text is a very challenging problem, as it cannot be addressed using a single image generation approach due mainly to the semantic complexity of the story text constituents. In this work, we propose a new approach that decomposes the task of story visualization into three phases: semantic text understanding, object layout prediction, and image generation and refinement. We start by simplifying the text using a scene graph triple notation that encodes semantic relationships between the story objects. We then introduce an object layout module to capture the features of these objects from the corresponding scene graph. Specifically, the object layout module aggregates individual object features from the scene graph as well as averaged or likelihood object features generated by a graph convolutional neural network. All these features are concatenated to form semantic triples that are then provided to the image generation framework. For the image generation phase, we adopt a scene graph image generation framework as stage-I, which is refined using a StackGAN as stage-II conditioned on the object layout module and the generated output image from stage-I. Our approach renders object details in high-resolution images while keeping the image structure consistent with the input text. To evaluate the performance of our approach, we use the COCO dataset and compare it with three baseline approaches, namely, sg2im, StackGAN and AttnGAN, in terms of image quality and user evaluation. According to the obtained assessment results, our object layout guidance-based approach significantly outperforms the abovementioned baseline approaches in terms of the accuracy of semantic matching and realism of the generated images representing the story text sentences.
22

Zhang, Lizong, Haojun Yin, Bei Hui, Sijuan Liu, and Wei Zhang. "Knowledge-Based Scene Graph Generation with Visual Contextual Dependency." Mathematics 10, no. 14 (July 20, 2022): 2525. http://dx.doi.org/10.3390/math10142525.

Abstract:
Scene graph generation is the basis of various computer vision applications, including image retrieval, visual question answering, and image captioning. Previous studies have relied on visual features or incorporated auxiliary information to predict object relationships. However, the rich semantics of external knowledge have not yet been fully utilized, and the combination of visual and auxiliary information can lead to visual dependencies, which impacts relationship prediction among objects. Therefore, we propose a novel knowledge-based model with adjustable visual contextual dependency. Our model has three key components. The first module extracts the visual features and bounding boxes in the input image. The second module uses two encoders to fully integrate visual information and external knowledge. Finally, visual context loss and visual relationship loss are introduced to adjust the visual dependency of the model. The difference between the initial prediction results and the visual dependency results is calculated to generate the dependency-corrected results. The proposed model can obtain better global and contextual information for predicting object relationships, and the visual dependencies can be adjusted through the two loss functions. The results of extensive experiments show that our model outperforms most existing methods.
23

Jung, Ga Young, and Incheol Kim. "Dynamic 3D Scene Graph Generation for Robotic Manipulation Tasks." Journal of Institute of Control, Robotics and Systems 27, no. 12 (December 31, 2021): 953–63. http://dx.doi.org/10.5302/j.icros.2021.21.0140.

24

Sonogashira, Motoharu, Masaaki Iiyama, and Yasutomo Kawanishi. "Towards Open-Set Scene Graph Generation With Unknown Objects." IEEE Access 10 (2022): 11574–83. http://dx.doi.org/10.1109/access.2022.3145465.

25

Li, Ping, Zhou Yu, and Yibing Zhan. "Deep relational self-Attention networks for scene graph generation." Pattern Recognition Letters 153 (January 2022): 200–206. http://dx.doi.org/10.1016/j.patrec.2021.12.013.

26

Xu, Xiaogang, and Ning Xu. "Hierarchical Image Generation via Transformer-Based Sequential Patch Selection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2938–45. http://dx.doi.org/10.1609/aaai.v36i3.20199.

Abstract:
To synthesize images with preferred objects and interactions, a controllable way is to generate the image from a scene graph and a large pool of object crops, where the spatial arrangements of the objects in the image are defined by the scene graph while their appearances are determined by the retrieved crops from the pool. In this paper, we propose a novel framework with such a semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy where the crop selection for each object is determined by the contents and locations of all object crops that have been chosen previously. This process is implemented via a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy leverages hierarchical gated convolutions, which are employed to synthesize areas not covered by any image crops, and a patch-guided spatially adaptive normalization module, which is proposed to guarantee that the final generated images comply with the crop appearance and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of our proposed method over existing state-of-the-art methods.
27

Fu, Ze, Junhao Feng, Changmeng Zheng, and Yi Cai. "Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (June 28, 2022): 12947–48. http://dx.doi.org/10.1609/aaai.v36i11.21610.

Abstract:
Existing scene graph generation methods suffer limitations when the image lacks sufficient visual context. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual contexts with well-aligned textual knowledge. First, we represent the textual information as contextualized knowledge which is guided by the visual objects to enhance the contexts. Furthermore, we align the multimodal relation triplets with a co-attention module for better semantics fusion. The experimental results show the effectiveness of our method.
28

Feng, Bin, Qing Zhu, Mingwei Liu, Yun Li, Junxiao Zhang, Xiao Fu, Yan Zhou, Maosu Li, Huagui He, and Weijun Yang. "An Efficient Graph-Based Spatio-Temporal Indexing Method for Task-Oriented Multi-Modal Scene Data Organization." ISPRS International Journal of Geo-Information 7, no. 9 (September 8, 2018): 371. http://dx.doi.org/10.3390/ijgi7090371.

Abstract:
Task-oriented scene data in big data and cloud environments of a smart city that must be time-critically processed are dynamic and associated with increasing complexities and heterogeneities. Existing hybrid tree-based external indexing methods are input/output (I/O)-intensive, query schema-fixed, and difficult when representing the complex relationships of real-time multi-modal scene data; specifically, queries are limited to a certain spatio-temporal range or a small number of selected attributes. This paper proposes a new spatio-temporal indexing method for task-oriented multi-modal scene data organization. First, a hybrid spatio-temporal index architecture is proposed based on the analysis of the characteristics of scene data and the driving forces behind the scene tasks. Second, a graph-based spatio-temporal relation indexing approach, named the spatio-temporal relation graph (STR-graph), is constructed for this architecture. The global graph-based index, internal and external operation mechanisms, and optimization strategy of the STR-graph index are introduced in detail. Finally, index efficiency comparison experiments are conducted, and the results show that the STR-graph performs excellently in index generation and can efficiently address the diverse requirements of different visualization tasks for data scheduling; specifically, the STR-graph is more efficient when addressing complex and uncertain spatio-temporal relation queries.
29

Zheng, Zhenxing, Zhendong Li, Gaoyun An, and Songhe Feng. "Subgraph and object context-masked network for scene graph generation." IET Computer Vision 14, no. 7 (October 1, 2020): 546–53. http://dx.doi.org/10.1049/iet-cvi.2019.0896.

30

Luo, Jie, Jia Zhao, Bin Wen, and Yuhang Zhang. "Explaining the semantics capturing capability of scene graph generation models." Pattern Recognition 110 (February 2021): 107427. http://dx.doi.org/10.1016/j.patcog.2020.107427.

31

Sylvain, Tristan, Pengchuan Zhang, Yoshua Bengio, R. Devon Hjelm, and Shikhar Sharma. "Object-Centric Image Generation from Layouts." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2647–55. http://dx.doi.org/10.1609/aaai.v35i3.16368.

Abstract:
We begin with the hypothesis that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes with multiple objects well. Our layout-to-image-generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its object instance-awareness. Apart from improving image quality, our contributions mitigate two failure modes in previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of evaluation metrics used in previous works by introducing SceneFID -- an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
32

Bao, Yongtang, Pengfei Lin, Yao Li, Yue Qi, Zhihui Wang, Wenxiang Du, and Qing Fan. "Parallel Structure from Motion for Sparse Point Cloud Generation in Large-Scale Scenes." Sensors 21, no. 11 (June 7, 2021): 3939. http://dx.doi.org/10.3390/s21113939.

Abstract:
Scene reconstruction uses images or videos as input to reconstruct a 3D model of a real scene and has important applications in smart cities, surveying and mapping, military, and other fields. Structure from motion (SFM) is a key step in scene reconstruction, which recovers sparse point clouds from image sequences. However, large-scale scenes cannot be reconstructed using a single compute node. Image matching and geometric filtering take up a lot of time in the traditional SFM problem. In this paper, we propose a novel divide-and-conquer framework to solve the distributed SFM problem. First, we use the global navigation satellite system (GNSS) information from images to calculate the GNSS neighborhood. The number of images matched is greatly reduced by matching each image to only valid GNSS neighbors. This way, a robust matching relationship can be obtained. Second, the calculated matching relationship is used as the initial camera graph, which is divided into multiple subgraphs by the clustering algorithm. The local SFM is executed on several computing nodes to register the local cameras. Finally, all of the local camera poses are integrated and optimized to complete the global camera registration. Experiments show that our system can accurately and efficiently solve the structure from motion problem in large-scale scenes.
33

Kim, Incheol. "Visual Experience-Based Question Answering with Complex Multimodal Environments." Mathematical Problems in Engineering 2020 (November 19, 2020): 1–18. http://dx.doi.org/10.1155/2020/8567271.

Abstract:
This paper proposes a novel visual experience-based question answering problem (VEQA) and the corresponding dataset for embodied intelligence research that requires an agent to do actions, understand 3D scenes from successive partial input images, and answer natural language questions about its visual experiences in real time. Unlike the conventional visual question answering (VQA), the VEQA problem assumes both partial observability and dynamics of a complex multimodal environment. To address this VEQA problem, we propose a hybrid visual question answering system, VQAS, integrating a deep neural network-based scene graph generation model and a rule-based knowledge reasoning system. The proposed system can generate more accurate scene graphs for dynamic environments with some uncertainty. Moreover, it can answer complex questions through knowledge reasoning with rich background knowledge. Results of experiments using a photo-realistic 3D simulated environment, AI2-THOR, and the VEQA benchmark dataset prove the high performance of the proposed system.
34

Sun, Qian, and Ruizhen Hu. "Prediction and Generation of 3D Functional Scene Based on Relation Graph." Journal of Computer-Aided Design & Computer Graphics 34, no. 09 (September 1, 2022): 1351–61. http://dx.doi.org/10.3724/sp.j.1089.2022.19174.

35

Li, Shufei, Pai Zheng, Zuoxu Wang, Junming Fan, and Lihui Wang. "Dynamic Scene Graph for Mutual-Cognition Generation in Proactive Human-Robot Collaboration." Procedia CIRP 107 (2022): 943–48. http://dx.doi.org/10.1016/j.procir.2022.05.089.

36

Khan, Amjad Rehman, Hamza Mukhtar, Tanzila Saba, Omer Riaz, Muhammad Usman Ghani Khan, and Saeed Ali Bahaj. "Scene Graph Generation With Structured Aspect of Segmenting the Big Distributed Clusters." IEEE Access 10 (2022): 24264–72. http://dx.doi.org/10.1109/access.2022.3155652.

37

Kumar, Aiswarya S., and Jyothisha J. Nair. "Scene Graph Generation Using Depth, Spatial, and Visual Cues in 2D Images." IEEE Access 10 (2022): 1968–78. http://dx.doi.org/10.1109/access.2021.3139000.

38

Xue, Li Jun, and Li Li Wang. "Semantic-Based Three-Dimensional Modeling of Virtual Reality Scenes." Applied Mechanics and Materials 385-386 (August 2013): 1780–84. http://dx.doi.org/10.4028/www.scientific.net/amm.385-386.1780.

Abstract:
Three-dimensional modeling of virtual reality scenes is usually constructed on the basis of learning with virtual data, and the learning behavior is too dependent on that data. It is therefore difficult to quickly and accurately reflect the characteristics of 3D modeling in virtual reality, and the learning complexity is high. In this paper, a semantic-based method for three-dimensional modeling of virtual reality scenes is presented, based on an analysis of existing virtual scene modeling methods. The modeling system architecture using this method is divided into physical model libraries, a three-dimensional model semantic knowledge base, semantic-based visual modeling, and automatic scene graph generation modules. The experimental results show that the detection performance is better than that of three-dimensional modeling based on virtual data, and the system increases the flexibility and usability of three-dimensional modeling.
39

Boguslawski, P., L. Mahdjoubi, V. Zverovich, and F. Fadli. "TWO-GRAPH BUILDING INTERIOR REPRESENTATION FOR EMERGENCY RESPONSE APPLICATIONS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-2 (June 2, 2016): 9–14. http://dx.doi.org/10.5194/isprsannals-iii-2-9-2016.

Abstract:
Nowadays, in a rapidly developing urban environment with bigger and higher public buildings, disasters causing emergency situations and casualties are unavoidable. Preparedness and quick response are crucial in saving human lives. Available information about an emergency scene, such as a building structure, helps with decision making and organizing rescue operations. Models supporting decision-making should be available in real, or near-real, time. Thus, good quality models that allow implementation of automated methods are highly desirable. This paper presents details of the recently developed method for automated generation of variable density navigable networks in a 3D indoor environment, including a full 3D topological model, which may be used not only for standard navigation but also for finding safe routes and simulating hazards and phenomena associated with disasters such as fire spread and heat transfer.
40

Boguslawski, P., L. Mahdjoubi, V. Zverovich, and F. Fadli. "TWO-GRAPH BUILDING INTERIOR REPRESENTATION FOR EMERGENCY RESPONSE APPLICATIONS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-2 (June 2, 2016): 9–14. http://dx.doi.org/10.5194/isprs-annals-iii-2-9-2016.

Abstract:
Nowadays, in a rapidly developing urban environment with bigger and higher public buildings, disasters causing emergency situations and casualties are unavoidable. Preparedness and quick response are crucial in saving human lives. Available information about an emergency scene, such as a building structure, helps with decision making and organizing rescue operations. Models supporting decision-making should be available in real, or near-real, time. Thus, good quality models that allow implementation of automated methods are highly desirable. This paper presents details of the recently developed method for automated generation of variable density navigable networks in a 3D indoor environment, including a full 3D topological model, which may be used not only for standard navigation but also for finding safe routes and simulating hazards and phenomena associated with disasters such as fire spread and heat transfer.
41

Geng, Shijie, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, and Anoop Cherian. "Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1415–23. http://dx.doi.org/10.1609/aaai.v35i2.16231.

Abstract:
Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual content. This task thus poses a challenging multi-modal representation learning and reasoning scenario, advancements into which could influence several human-machine interaction applications. To solve this task, we introduce a semantics-controlled multi-modal shuffled Transformer reasoning framework, consisting of a sequence of Transformer modules, each taking a modality as input and producing representations conditioned on the input question. Our proposed Transformer variant uses a shuffling scheme on their multi-head outputs, demonstrating better regularization. To encode fine-grained visual information, we present a novel dynamic scene graph representation learning pipeline that consists of an intra-frame reasoning layer producing spatio-semantic graph representations for every frame, and an inter-frame aggregation module capturing temporal cues. Our entire pipeline is trained end-to-end. We present experiments on the benchmark AVSD dataset, both on answer generation and selection tasks. Our results demonstrate state-of-the-art performances on all evaluation metrics.
42

Gao, Jiahui, Yi Zhou, Philip L. H. Yu, Shafiq Joty, and Jiuxiang Gu. "UNISON: Unpaired Cross-Lingual Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10654–62. http://dx.doi.org/10.1609/aaai.v36i10.21310.

Abstract:
Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (1) a cross-lingual auto-encoding process, which utilizes a sentence-parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decodes sentences in the target language, and (2) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from image modality to language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach.
43

Zhang, Susu, Jiancheng Ni, Lijun Hou, Zili Zhou, Jie Hou, and Feng Gao. "Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation." Mathematical Foundations of Computing 4, no. 3 (2021): 145. http://dx.doi.org/10.3934/mfc.2021009.

Abstract:
The recent progress in learning image feature representations has opened the way for tasks such as label-to-image or text-to-image synthesis. However, one particular challenge widely observed in existing methods is the difficulty of synthesizing fine-grained textures and small-scale instances. In this paper, we propose a novel Global-Affine and Local-Specific Generative Adversarial Network (GALS-GAN) to explicitly construct global semantic layouts and learn distinct instance-level features. To achieve this, we adopt the graph convolutional network to calculate the instance locations and spatial relationships from scene graphs, which allows our model to obtain the high-fidelity semantic layouts. Also, a local-specific generator, where we introduce the feature filtering mechanism to separately learn semantic maps for different categories, is utilized to disentangle and generate specific visual features. Moreover, we especially apply a weight map predictor to better combine the global and local pathways considering the high complementarity between these two generation sub-networks. Extensive experiments on the COCO-Stuff and Visual Genome datasets demonstrate the superior generation performance of our model against previous methods; our approach is more capable of capturing photo-realistic local characteristics and rendering small-sized entities with more details.
44

He, Yufeng, Barbara Hofer, Yehua Sheng, and Yi Huang. "Dynamic Representations of Spatial Events – The Example of a Typhoon." AGILE: GIScience Series 2 (June 4, 2021): 1–7. http://dx.doi.org/10.5194/agile-giss-2-30-2021.

Abstract:
The Geographic scene is a conceptual model that provides a holistic representation of the environment. This model has been developed in order to overcome limitations of geographic information systems (GIS) concerning interactions between features and the representation of dynamics. This contribution translates the theoretical model into an implementation of a dynamic data model in the graph database Neo4j and applies it to GIS data representing the dynamic information of a typhoon. The specific focus of the contribution is on choices made in the process of generation of the implementation of the example and the potential queries it supports.
45

Kang, Donggu, Jiyeon Kim, and Jongjin Jung. "A Framework of Automatic Ontology Construction based on Scene Graph Generation Model for Analysis of Story Video Contents." Transactions of The Korean Institute of Electrical Engineers 71, no. 9 (September 30, 2022): 1286–92. http://dx.doi.org/10.5370/kiee.2022.71.9.1286.

46

Xu, Chunpu, Min Yang, Chengming Li, Ying Shen, Xiang Ao, and Ruifeng Xu. "Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3022–29. http://dx.doi.org/10.1609/aaai.v35i4.16410.

Abstract:
Visual storytelling is a task of creating a short story based on photo streams. Different from visual captions, stories contain not only factual descriptions, but also imaginary concepts that do not appear in the images. In this paper, we propose a novel imagine-reason-write generation framework (IRW) for visual storytelling, inspired by the logic of humans when they write the story. First, an imagine module is leveraged to learn the imaginative storyline explicitly, improving the coherence and reasonability of the generated story. Second, we employ a reason module to fully exploit the external knowledge (commonsense knowledge base) and task-specific knowledge (scene graph and event graph) with relational reasoning method based on the storyline. In this way, we can effectively capture the most informative commonsense and visual relationships among objects in images, which enhances the diversity and informativeness of the generated story. Finally, we integrate the imaginary concepts and relational knowledge to generate human-like story based on the original semantics of images. Extensive experiments on a benchmark dataset (i.e., VIST) demonstrate that the proposed IRW framework significantly outperforms the state-of-the-art methods across multiple evaluation metrics.
47

Zhang, Fang-Lue, Connelly Barnes, Hao-Tian Zhang, Junhong Zhao, and Gabriel Salas. "Coherent video generation for multiple hand-held cameras with dynamic foreground." Computational Visual Media 6, no. 3 (September 2020): 291–306. http://dx.doi.org/10.1007/s41095-020-0187-3.

Abstract:
For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the whole time of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.
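The transition-route selection step can be pictured as a shortest-path search over a graph of (camera, shot) states whose edge weights score transition quality. The graph and costs below are invented for illustration; the paper derives its costs from visual content.

```python
import networkx as nx

# Toy "transition route" selection: nodes are (camera, shot) states and edge
# weights are hypothetical transition costs (lower = smoother cut).
G = nx.DiGraph()
G.add_weighted_edges_from([
    (("cam1", 0), ("cam1", 1), 0.2),
    (("cam1", 0), ("cam2", 1), 0.9),
    (("cam1", 1), ("cam2", 2), 0.3),
    (("cam1", 1), ("cam1", 2), 0.8),
    (("cam2", 1), ("cam2", 2), 0.2),
    (("cam1", 2), ("cam3", 3), 0.4),
    (("cam2", 2), ("cam3", 3), 0.1),
])

route = nx.shortest_path(G, source=("cam1", 0), target=("cam3", 3), weight="weight")
print(route)   # cheapest sequence of cuts between cameras
```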
48

Al-Durgham, M., M. Downey, S. Gehrke, and B. T. Beshah. "A FRAMEWORK FOR AN AUTOMATIC SEAMLINE ENGINE." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B1 (June 3, 2016): 275–80. http://dx.doi.org/10.5194/isprsarchives-xli-b1-275-2016.

Abstract:
Seamline generation is a crucial last step in the ortho-image mosaicking process. In particular, it is required to convolute residual geometric and radiometric imperfections that stem from various sources. In particular, temporal differences in the acquired data will cause the scene content and illumination conditions to vary. These variations can be modelled successfully. However, one is left with micro-differences that do need to be considered in seamline generation. Another cause of discrepancies originates from the rectification surface as it will not model the actual terrain and especially human-made objects perfectly. Quality of the image orientation will also contribute to the overall differences between adjacent ortho-rectified images. Our approach takes into consideration the aforementioned differences in designing a seamline engine. We have identified the following essential behaviours of the seamline in our engine: 1) Seamlines must pass through the path of least resistance, i.e., overlap areas with low radiometric differences. 2) Seamlines must not intersect with breaklines as that will lead to visible geometric artefacts. And finally, 3) shorter seamlines are generally favourable; they also result in faster operator review and, where necessary, interactive editing cycles. The engine design also permits alteration of the above rules for special cases. Although our preliminary experiments are geared towards line imaging systems (i.e., the Leica ADS family), our seamline engine remains sensor agnostic. Hence, our design is capable of mosaicking images from various sources with minimal effort. The main idea behind this engine is using graph cuts which, in spirit, is based on the max-flow min-cut theory. The main advantage of using graph cuts theory is that the generated solution is global in the energy minimization sense. In addition, graph cuts allows for a highly scalable design where a set of rules contribute towards a cost function which, in turn, influences the path of minimum resistance for the seamlines. In this paper, the authors present an approach for achieving quality seamlines relatively quickly and with emphasis on generating truly seamless ortho-mosaics.
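The max-flow/min-cut step can be illustrated on a toy 1-D overlap strip where edge capacities stand in for radiometric differences; the minimum cut then falls on the path of least resistance. The values below are invented, and a real seamline engine operates on a 2-D cost image with many rules folded into the capacities.

```python
import networkx as nx

# Toy min-cut "seamline" over a 1-D strip of overlap pixels: consecutive
# pixels are linked by edges whose capacities stand in for radiometric
# differences, and the minimum cut between the two ends picks the gap with
# the smallest difference.
G = nx.DiGraph()
diffs = [5.0, 4.0, 0.5, 4.5, 6.0]          # hypothetical neighbour differences
for i, d in enumerate(diffs):
    G.add_edge(i, i + 1, capacity=d)

cut_value, (left_image, right_image) = nx.minimum_cut(G, 0, len(diffs))
print(cut_value)                                  # 0.5: the lowest-difference gap
print(sorted(left_image), sorted(right_image))    # pixels assigned to each ortho-image
```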
49

Al-Durgham, M., M. Downey, S. Gehrke, and B. T. Beshah. "A FRAMEWORK FOR AN AUTOMATIC SEAMLINE ENGINE." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B1 (June 3, 2016): 275–80. http://dx.doi.org/10.5194/isprs-archives-xli-b1-275-2016.

Abstract:
Seamline generation is a crucial last step in the ortho-image mosaicking process. In particular, it is required to convolute residual geometric and radiometric imperfections that stem from various sources. In particular, temporal differences in the acquired data will cause the scene content and illumination conditions to vary. These variations can be modelled successfully. However, one is left with micro-differences that do need to be considered in seamline generation. Another cause of discrepancies originates from the rectification surface as it will not model the actual terrain and especially human-made objects perfectly. Quality of the image orientation will also contribute to the overall differences between adjacent ortho-rectified images. Our approach takes into consideration the aforementioned differences in designing a seamline engine. We have identified the following essential behaviours of the seamline in our engine: 1) Seamlines must pass through the path of least resistance, i.e., overlap areas with low radiometric differences. 2) Seamlines must not intersect with breaklines as that will lead to visible geometric artefacts. And finally, 3) shorter seamlines are generally favourable; they also result in faster operator review and, where necessary, interactive editing cycles. The engine design also permits alteration of the above rules for special cases. Although our preliminary experiments are geared towards line imaging systems (i.e., the Leica ADS family), our seamline engine remains sensor agnostic. Hence, our design is capable of mosaicking images from various sources with minimal effort. The main idea behind this engine is using graph cuts which, in spirit, is based on the max-flow min-cut theory. The main advantage of using graph cuts theory is that the generated solution is global in the energy minimization sense. In addition, graph cuts allows for a highly scalable design where a set of rules contribute towards a cost function which, in turn, influences the path of minimum resistance for the seamlines. In this paper, the authors present an approach for achieving quality seamlines relatively quickly and with emphasis on generating truly seamless ortho-mosaics.
50

Xie, L., and R. Wang. "AUTOMATIC INDOOR BUILDING RECONSTRUCTION FROM MOBILE LASER SCANNING DATA." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W7 (September 12, 2017): 417–22. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w7-417-2017.

Abstract:
Indoor reconstruction from point clouds is a hot topic in photogrammetry, computer vision and computer graphics. Reconstructing indoor scenes from point clouds is challenging due to complex room floorplans and line-of-sight occlusions. Most existing methods deal with stationary terrestrial laser scanning point clouds or RGB-D point clouds. In this paper, we propose an automatic method for reconstructing indoor 3D building models from mobile laser scanning point clouds. The method includes 2D floorplan generation, 3D building modeling, door detection and room segmentation. The main idea behind our approach is to separate the wall structure into two different types, the inner wall and the outer wall, based on the observation of point distribution. Then we utilize a graph-cut-based optimization method to solve the labeling problem and generate the 2D floorplan based on the optimization result. Subsequently, we leverage an α-shape-based method to detect the doors on the 2D projected point clouds and utilize the floorplan to segment the individual rooms. The experiments show that this door detection method can achieve a recognition rate of 97% and the room segmentation method can attain correct segmentation results. We also evaluate the reconstruction accuracy on synthetic data, which indicates that the accuracy of our method is comparable to the state of the art.