Journal articles on the topic 'DEEP LEARNING, GENERATION, SEMANTIC SEGMENTATION, MACHINE LEARNING'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'DEEP LEARNING, GENERATION, SEMANTIC SEGMENTATION, MACHINE LEARNING.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Murtiyoso, A., F. Matrone, M. Martini, A. Lingua, P. Grussenmeyer, and R. Pierdicca. "AUTOMATIC TRAINING DATA GENERATION IN DEEP LEARNING-AIDED SEMANTIC SEGMENTATION OF HERITAGE BUILDINGS." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022 (May 17, 2022): 317–24. http://dx.doi.org/10.5194/isprs-annals-v-2-2022-317-2022.

Abstract:
In the geomatics domain, the use of deep learning, a subset of machine learning, is becoming more and more widespread. In this context, the 3D semantic segmentation of heritage point clouds presents an interesting and promising approach for modelling automation, in light of the heterogeneous nature of historical building styles and features. However, this heterogeneity also presents an obstacle to generating the training data for use in deep learning, a task hitherto performed largely manually. The generally low availability of labelled data is a further motivation to aid the process of training data generation. In this paper, we propose the use of approaches based on geometric rules to automate this task to a certain degree. One object class is discussed in this paper, namely the pillars class. Results show that the approach managed to extract pillars with satisfactory quality (98.5% of pillars correctly detected by the proposed algorithm). Tests were also performed to use the outputs in a deep learning segmentation setting, with a favourable outcome in terms of reducing the overall labelling time (−66.5%). Certain particularities were nevertheless observed, which also influence the result of the deep learning segmentation.
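To illustrate the kind of geometric rule the authors describe, here is a minimal sketch of a rule-based pillar filter. It is not the paper's algorithm: it assumes the point cloud is an (N, 3) NumPy array, clusters it with DBSCAN, and keeps clusters that are tall and have a small, roughly square footprint; all thresholds are illustrative placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_pillar_candidates(points, eps=0.15, min_samples=50,
                           min_height=2.0, max_footprint=1.0):
    """Keep clusters that look like pillars: tall, slender, and with a
    near-square horizontal bounding box (a crude circularity proxy)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    pillars = []
    for lab in set(labels) - {-1}:                 # -1 marks DBSCAN noise
        cluster = points[labels == lab]
        height = np.ptp(cluster[:, 2])             # vertical extent
        extents = np.ptp(cluster[:, :2], axis=0)   # footprint size in x, y
        if (height >= min_height and extents.max() <= max_footprint
                and extents.min() / (extents.max() + 1e-9) > 0.6):
            pillars.append(cluster)
    return pillars
```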
2

Oluwasammi, Ariyo, Muhammad Umar Aftab, Zhiguang Qin, Son Tung Ngo, Thang Van Doan, Son Ba Nguyen, Son Hoang Nguyen, and Giang Hoang Nguyen. "Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning." Complexity 2021 (March 18, 2021): 1–19. http://dx.doi.org/10.1155/2021/5538927.

Abstract:
With the emergence of deep learning, computer vision has witnessed extensive advancement and has seen immense applications in multiple domains. Specifically, image captioning has become an attractive focal direction for most machine learning experts, as it requires object identification, localization, and semantic understanding. In this paper, semantic segmentation and image captioning are comprehensively investigated based on traditional and state-of-the-art methodologies. In this survey, we deliberate on the use of deep learning techniques for the segmentation analysis of both 2D and 3D images using a fully convolutional network and other high-level hierarchical feature extraction methods. First, each domain’s preliminaries and concepts are described; then, semantic segmentation is discussed alongside its relevant features, available datasets, and evaluation criteria. Also, the semantic information capturing of objects and their attributes is presented in relation to their annotation generation. Finally, the existing methods, their contributions, and their relevance are analysed and highlighted, underlining the importance of these methods and pointing to possible future research on semantic image segmentation and image captioning approaches.
3

Pellis, E., A. Murtiyoso, A. Masiero, G. Tucci, M. Betti, and P. Grussenmeyer. "AN IMAGE-BASED DEEP LEARNING WORKFLOW FOR 3D HERITAGE POINT CLOUD SEMANTIC SEGMENTATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVI-2/W1-2022 (February 25, 2022): 429–34. http://dx.doi.org/10.5194/isprs-archives-xlvi-2-w1-2022-429-2022.

Abstract:
The interest in high-resolution semantic 3D models of historical buildings has continuously increased during the last decade, thanks to their utility in the protection, conservation and restoration of cultural heritage sites. The current generation of surveying tools allows the quick collection of large and detailed amounts of data: such data ensure accurate spatial representations of the buildings, but their employment in the creation of informative semantic 3D models is still a challenging task that currently requires time-consuming manual intervention by expert operators. Hence, increasing the level of automation, for instance by developing an automatic semantic segmentation procedure enabling machine scene understanding and comprehension, can represent a dramatic improvement in the overall processing procedure. In accordance with this observation, this paper presents a new workflow for the automatic semantic segmentation of 3D point clouds based on a multi-view approach. This workflow comprises two steps: first, neural network-based semantic segmentation is performed on building images; then, the image labelling is back-projected onto the 3D space, through the use of masked images, by exploiting photogrammetry and dense image matching principles. The obtained results are quite promising, with a good performance in the image segmentation and a remarkable potential in the 3D reconstruction procedure.
4

Rettenberger, Luca, Marcel Schilling, and Markus Reischl. "Annotation Efforts in Image Segmentation can be Reduced by Neural Network Bootstrapping." Current Directions in Biomedical Engineering 8, no. 2 (August 1, 2022): 329–32. http://dx.doi.org/10.1515/cdbme-2022-1084.

Abstract:
Modern medical technology offers potential for the automatic generation of datasets that can be fed into deep learning systems. However, even though raw data for supporting diagnostics can be obtained with manageable effort, generating annotations is burdensome and time-consuming. Since annotating images for semantic segmentation is particularly exhausting, methods that reduce the human effort are especially valuable. We propose a combined framework that utilizes unsupervised machine learning to automatically generate segmentation masks. Experiments on two biomedical datasets show that our approach generates noticeably better annotations than Otsu thresholding and k-means clustering without needing any additional manual effort. Using our framework, unannotated datasets can be amended with pre-annotations in a fully unsupervised manner, reducing the human effort to a minimum.
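The two classical baselines mentioned above, Otsu thresholding and k-means clustering, can be reproduced in a few lines; this generic scikit-image/scikit-learn sketch (not the proposed framework itself) shows the comparison point:

```python
import numpy as np
from skimage.filters import threshold_otsu
from sklearn.cluster import KMeans

def otsu_mask(gray):
    """Foreground/background split with Otsu's global threshold."""
    return gray > threshold_otsu(gray)

def kmeans_mask(gray, k=2):
    """Cluster pixel intensities; treat the brightest cluster as foreground."""
    km = KMeans(n_clusters=k, n_init=10).fit(gray.reshape(-1, 1))
    foreground = np.argmax(km.cluster_centers_.ravel())
    return km.labels_.reshape(gray.shape) == foreground
```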
5

Prakash, Nikhil, Andrea Manconi, and Simon Loew. "Mapping Landslides on EO Data: Performance of Deep Learning Models vs. Traditional Machine Learning Models." Remote Sensing 12, no. 3 (January 21, 2020): 346. http://dx.doi.org/10.3390/rs12030346.

Abstract:
Mapping landslides using automated methods is a challenging task which is still largely done using human effort. Today, the availability of high-resolution EO data products is increasing exponentially, and one of the targets is to exploit this data source for the rapid generation of landslide inventories. Conventional methods like pixel-based and object-based machine learning strategies have been studied extensively in the last decade. In addition, recent advances in CNNs (convolutional neural networks), a type of deep-learning method, have been widely successful in extracting information from images and have outperformed other conventional learning methods. In the last few years, there have been only a few attempts to adapt CNNs for landslide mapping. In this study, we introduce a modified U-Net model for the semantic segmentation of landslides at a regional scale from EO data, using ResNet34 blocks for feature extraction. We also compare this with conventional pixel-based and object-based methods. The experiment was carried out in Douglas County, a study area south of Portland, Oregon, USA, and the landslide inventory extracted from SLIDO (Statewide Landslide Information Database of Oregon) was taken as the ground truth. Landslide mapping is an imbalanced learning problem with very limited availability of training data. Our network was trained on a combination of focal Tversky loss and cross-entropy loss functions, using augmented image tiles sampled from a selected training area. The deep-learning method was observed to perform better than the conventional methods, with an MCC (Matthews correlation coefficient) score of 0.495 and a POD (probability of detection) rate of 0.72.
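The loss combination used here mixes a region-based and a pixel-based term. Below is a minimal PyTorch sketch of one common binary formulation, assuming logits and a 0/1 target mask; the exact variant, mixing weight, and hyperparameters (alpha, beta, gamma, w) in the paper may differ.

```python
import torch
import torch.nn.functional as F

def focal_tversky_loss(logits, target, alpha=0.7, beta=0.3,
                       gamma=0.75, eps=1e-6):
    """(1 - Tversky index)^gamma; alpha and beta trade off false
    negatives against false positives, helping imbalanced classes."""
    p = torch.sigmoid(logits).flatten()
    t = target.float().flatten()
    tp = (p * t).sum()
    fn = ((1 - p) * t).sum()
    fp = (p * (1 - t)).sum()
    ti = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - ti) ** gamma

def combined_loss(logits, target, w=0.5):
    """Weighted sum of cross-entropy and focal Tversky terms."""
    ce = F.binary_cross_entropy_with_logits(logits, target.float())
    return w * ce + (1 - w) * focal_tversky_loss(logits, target)
```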
6

Ravishankar, Rashmi, Elaf AlMahmoud, Abdulelah Habib, and Olivier L. de Weck. "Capacity Estimation of Solar Farms Using Deep Learning on High-Resolution Satellite Imagery." Remote Sensing 15, no. 1 (December 30, 2022): 210. http://dx.doi.org/10.3390/rs15010210.

Abstract:
Global solar photovoltaic capacity has consistently doubled every 18 months over the last two decades, going from 0.3 GW in 2000 to 643 GW in 2019, and is forecast to reach 4240 GW by 2040. However, these numbers are uncertain, and virtually all reporting on deployments lacks a unified source of either information or validation. In this paper, we propose, optimize, and validate a deep learning framework to detect and map solar farms using a state-of-the-art semantic segmentation convolutional neural network applied to satellite imagery. As a final step in the pipeline, we propose a model to estimate the energy generation capacity of the detected solar energy facilities. Objectively, the deep learning model achieved highly competitive performance indicators, including a mean accuracy of 96.87% and a Jaccard Index (intersection over union of classified pixels) score of 95.5%. Subjectively, it was found to detect spaces between panels, producing a segmentation output at a sub-farm level that was better than human labeling. Finally, the detected areas and predicted generation capacities were validated against publicly available data to within an average error of 4.5%. Deep learning applied specifically to the detection and mapping of solar farms is an active area of research, and this deep learning capacity evaluation pipeline is one of the first of its kind. We also share an original dataset of overhead solar farm satellite imagery comprising 23,000 images (256 × 256 pixels each), along with the corresponding labels upon which the machine learning model was trained.
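The final capacity-estimation step can be approximated by simple area arithmetic: count panel pixels, convert to area using the ground sampling distance, and multiply by an assumed capacity density. The sketch below is illustrative only; the 0.3 m GSD and 100 MW/km² figures are placeholder assumptions, not values from the paper.

```python
import numpy as np

def estimated_capacity_mw(mask, gsd_m=0.3, mw_per_km2=100.0):
    """Rough capacity estimate from a binary solar-panel mask.

    mask       : 2D boolean array of detected panel pixels
    gsd_m      : ground sampling distance in metres per pixel (assumed)
    mw_per_km2 : assumed installed-capacity density (assumed)
    """
    area_km2 = mask.sum() * gsd_m ** 2 / 1e6   # pixels -> m^2 -> km^2
    return area_km2 * mw_per_km2

mask = np.zeros((256, 256), dtype=bool)
mask[:, :102] = True                            # a tile that is ~40% panels
print(f"{estimated_capacity_mw(mask):.3f} MW")  # ~0.235 MW
```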
7

Mohtich, F. E., M. El-Ayachi, S. Bensiali, A. Idri, and I. Ait Hou. "DEEP LEARNING APPROACH APPLIED TO DRONE IMAGERY FOR REAL ESTATE TAX ASSESSMENT: CASE OF THE TAX ON UNBUILT LAND KENITRA-MOROCCO." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-4/W5-2022 (October 17, 2022): 121–27. http://dx.doi.org/10.5194/isprs-archives-xlviii-4-w5-2022-121-2022.

Abstract:
According to the Court of Audit, urban taxation is the main source of revenue for local authorities in almost all regions of the world. In Morocco in particular, the tax on unbuilt urban land accounts for 35% of the revenue from taxes managed directly by the municipality. The property tax assessment system currently adopted is not regularly updated and is not properly monitored. These difficulties do not allow for a significant expansion of the land base. Current efforts aim at accelerating the census of the urban heritage using innovative and automated approaches, which are intended to lead to the next generation of urban information services and the development of smart cities. In this context, we propose a methodology that consists of acquiring high-resolution UAV images and then training a deep learning algorithm for semantic segmentation of the images in order to extract the characteristics defining unbuilt land. U-Net, the deep convolutional neural network architecture that we parameterized to suit the nature of the phenomenon treated, the volume of data available, and the performance of the machine, offers a segmentation accuracy reaching 98.4%. Deep learning algorithms are seen as promising for overcoming the difficulties of extracting semantic features from complex scenes with large differences in the appearance of unbuilt urban land. The prediction results will be used to define urban areas where updates are made, with a view to tracking urban taxes.
8

Grimm, Florian, Florian Edl, Susanne R. Kerscher, Kay Nieselt, Isabel Gugel, and Martin U. Schuhmann. "Semantic segmentation of cerebrospinal fluid and brain volume with a convolutional neural network in pediatric hydrocephalus—transfer learning from existing algorithms." Acta Neurochirurgica 162, no. 10 (June 25, 2020): 2463–74. http://dx.doi.org/10.1007/s00701-020-04447-x.

Abstract:
Background: For the segmentation of medical imaging data, a multitude of precise but very specific algorithms exist. In previous studies, we investigated the possibility of segmenting MRI data to determine cerebrospinal fluid and brain volume using a classical machine learning algorithm. It demonstrated good clinical usability and a very accurate correlation of the volumes to the single area determination in a reproducible axial layer. This study aims to investigate whether these established segmentation algorithms can be transferred to new, more generalizable deep learning algorithms employing an extended transfer learning procedure, and whether medically meaningful segmentation is possible. Methods: Ninety-five routinely performed true FISP MRI sequences were retrospectively analyzed in 43 patients with pediatric hydrocephalus. Using a freely available and clinically established segmentation algorithm based on a hidden Markov random field model, four classes of segmentation (brain, cerebrospinal fluid (CSF), background, and tissue) were generated. Fifty-nine randomly selected data sets (10,432 slices) were used as a training data set. Images were augmented for contrast, brightness, and random left/right and X/Y translation. A convolutional neural network (CNN) for semantic image segmentation, composed of an encoder and corresponding decoder subnetwork, was set up. The network was pre-initialized with layers and weights from a pre-trained VGG 16 model and then trained with the labeled image data set. A validation data set of 18 scans (3289 slices) was used to monitor performance as the deep CNN trained. The classification results were tested on 18 randomly allocated labeled data sets (3319 slices) and on a T2-weighted BrainWeb data set with known ground truth. Results: The segmentation of clinical test data provided reliable results (global accuracy 0.90, Dice coefficient 0.86), while the CNN segmentation of data from the BrainWeb data set showed comparable results (global accuracy 0.89, Dice coefficient 0.84). The segmentation of the BrainWeb data set with the classical FAST algorithm produced consistent findings (global accuracy 0.90, Dice coefficient 0.87). Likewise, the area development of brain and CSF in the long-term clinical course of three patients was presented. Conclusion: Using the presented methods, we showed that conventional segmentation algorithms can be transferred to new advances in deep learning with comparable accuracy, generating a large number of training data sets with relatively little effort. A clinically meaningful segmentation possibility was demonstrated.
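The pre-initialisation step, reusing weights from a pre-trained VGG-16 as the encoder of an encoder-decoder segmentation network, can be sketched in PyTorch as follows. This is a generic stand-in (the study's actual pipeline and decoder differ), using torchvision's pretrained-weights API:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16EncoderDecoder(nn.Module):
    """Toy segmentation net: pretrained VGG-16 convolutions as the
    encoder, plus a deliberately small upsampling decoder."""
    def __init__(self, num_classes=4):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.encoder = vgg.features            # pretrained conv/pool stack
        self.decoder = nn.Sequential(          # 7x7x512 -> full resolution
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(512, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

logits = VGG16EncoderDecoder()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 4, 224, 224])
```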
9

Cira, Calimanut-Ionut, Ramón Alcarria, Miguel-Ángel Manso-Callejo, and Francisco Serradilla. "A Deep Learning-Based Solution for Large-Scale Extraction of the Secondary Road Network from High-Resolution Aerial Orthoimagery." Applied Sciences 10, no. 20 (October 17, 2020): 7272. http://dx.doi.org/10.3390/app10207272.

Abstract:
Secondary roads represent the largest part of the road network. However, due to the absence of clearly defined edges, the presence of occlusions, and differences in width, monitoring and mapping them represents a great effort for public administrations. We believe that recent advancements in machine vision allow the extraction of these types of roads from high-resolution remotely sensed imagery and can enable the automation of the mapping operation. In this work, we leverage these advances and propose a deep learning-based solution capable of efficiently extracting the surface area of secondary roads at a large scale. The solution is based on hybrid segmentation models trained with high-resolution remote sensing imagery divided into tiles of 256 × 256 pixels and their corresponding segmentation masks, resulting in increases in performance metrics of 2.7–3.5% when compared to the original architectures. The best-performing model achieved maximum Intersection over Union and F1 scores of 0.5790 and 0.7120, respectively, with a minimum loss of 0.4985, and was integrated on a web platform which handles the evaluation of large areas, the association of the semantic predictions with geographical coordinates, the conversion of the tiles’ format, and the generation of GeoTIFF results compatible with geospatial databases.
10

Briechle, S., P. Krzystek, and G. Vosselman. "SEMANTIC LABELING OF ALS POINT CLOUDS FOR TREE SPECIES MAPPING USING THE DEEP NEURAL NETWORK POINTNET++." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W13 (June 5, 2019): 951–55. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w13-951-2019.

Abstract:
Most methods for the mapping of tree species are based on the segmentation of single trees that are subsequently classified using a set of hand-crafted features and an appropriate classifier. The classification accuracy for coniferous and deciduous trees using only airborne laser scanning (ALS) data is around 90% when the geometric information of the point cloud is used. As deep neural networks (DNNs) have the ability to adaptively learn features from the underlying data, they have outperformed classic machine learning (ML) approaches on well-known benchmark datasets provided by the robotics, computer vision and remote sensing communities. However, tree species classification using deep learning (DL) procedures has been of minor research interest so far. Some studies have been conducted based on an extensive prior generation of images or voxels from the 3D raw data. Since innovative DNNs directly operate on irregular and unordered 3D point clouds at a large scale, the objective of this study is to use PointNet++ for the semantic labeling of ALS point clouds to map deciduous and coniferous trees. The dataset for our experiments consists of ALS data from the Bavarian Forest National Park (366 trees/ha), including only spruces (coniferous) and beeches (deciduous). First, the training data were generated automatically using a classic feature-based Random Forest (RF) approach classifying coniferous trees (precision = 93%, recall = 80%) and deciduous trees (precision = 82%, recall = 92%). Second, PointNet++ was trained and subsequently evaluated using 80 randomly chosen test batches of 400 m² each. The achieved per-point classification results after 163 training epochs for coniferous trees (precision = 90%, recall = 79%) and deciduous trees (precision = 81%, recall = 91%) are fairly high considering that only the geometry was included. Nevertheless, the classification results using PointNet++ are slightly lower than those of the baseline method using an RF classifier. Errors in the training data and edge effects prevented a better performance. Our first results demonstrate that the architecture of the 3D DNN PointNet++ can successfully be adapted to the semantic labeling of large ALS point clouds to map deciduous and coniferous trees. Future work will focus on the integration of additional features, such as the laser intensity, the surface normals and multispectral features, into the DNN, from which a further improvement in accuracy is expected. Furthermore, the classification of numerous individual tree species based on pre-segmented single trees should be investigated.
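The per-class precision and recall values quoted above follow directly from predicted versus reference point labels; a small NumPy helper for checking such figures:

```python
import numpy as np

def precision_recall(pred, ref, cls):
    """Precision and recall of one class from flat label arrays."""
    tp = np.sum((pred == cls) & (ref == cls))
    fp = np.sum((pred == cls) & (ref != cls))
    fn = np.sum((pred != cls) & (ref == cls))
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)

pred = np.array([0, 0, 1, 1, 1, 0])   # e.g. 0 = coniferous, 1 = deciduous
ref  = np.array([0, 1, 1, 1, 0, 0])
print(precision_recall(pred, ref, cls=0))   # (~0.67, ~0.67)
```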
11

Alokasi, Haneen, and Muhammad Bilal Ahmad. "Deep Learning-Based Frameworks for Semantic Segmentation of Road Scenes." Electronics 11, no. 12 (June 15, 2022): 1884. http://dx.doi.org/10.3390/electronics11121884.

Abstract:
Semantic segmentation using machine learning and computer vision techniques is one of the most popular topics in autonomous driving-related research. With the revolution of deep learning, the need for more efficient and accurate segmentation systems has increased. This paper presents a detailed review of deep learning-based frameworks used for semantic segmentation of road scenes, highlighting their architectures and tasks. It also discusses well-known standard datasets that evaluate semantic segmentation systems in addition to new datasets in the field. To overcome a lack of enough data required for the training process, data augmentation techniques and their experimental results are reviewed. Moreover, domain adaptation methods that have been deployed to transfer knowledge between different domains in order to reduce the domain gap are presented. Finally, this paper provides quantitative analysis and performance evaluation and discusses the results of different frameworks on the reviewed datasets and highlights future research directions in the field of semantic segmentation using deep learning.
12

Sahu, M., and A. Ohri. "VECTOR MAP GENERATION FROM AERIAL IMAGERY USING DEEP LEARNING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W5 (May 29, 2019): 157–62. http://dx.doi.org/10.5194/isprs-annals-iv-2-w5-157-2019.

Abstract:
We propose a simple yet efficient technique to leverage a semantic segmentation model to extract and separate individual buildings in densely compacted areas using medium-resolution satellite/UAV orthoimages. We adopted the standard U-Net architecture, additionally adding a batch normalization layer after every convolution, to label every pixel in the image. The result obtained is fed into the proposed post-processing pipeline, which separates connected binary blobs of buildings and converts them into a GIS layer for further analysis as well as for generating 3D buildings. The proposed algorithm extracts building footprints from aerial images, transforms the semantic map into an instance map, and converts it into GIS layers to generate 3D buildings. We integrated this method in Indshine’s cloud platform to speed up the process of digitization, generate automatic 3D models, and perform geospatial analysis. Our network achieved a Dice coefficient of ~70% for the segmentation process.
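The semantic-to-instance step (separating connected building blobs) is commonly done with a distance transform followed by a watershed; the sketch below shows that generic recipe, not Indshine's actual pipeline, and the marker threshold is an illustrative choice.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def split_touching_buildings(mask, marker_frac=0.5):
    """Label touching blobs in a binary building mask as separate instances.

    Seeds are blob 'cores' (pixels far from any boundary); a watershed on
    the inverted distance map then grows each seed back to full extent."""
    dist = ndimage.distance_transform_edt(mask)
    markers, _ = ndimage.label(dist > marker_frac * dist.max())
    return watershed(-dist, markers, mask=mask)   # 0 is background
```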
13

Belozerov, Ilya Andreevich, and Vladimir Anatolievich Sudakov. "Investigation of machine learning models for medical image segmentation." Keldysh Institute Preprints, no. 37 (2022): 1–15. http://dx.doi.org/10.20948/prepr-2022-37.

Abstract:
Using X-ray images of human lungs as an example, semantic segmentation models for computer vision are analysed and constructed. The paper explores various approaches to medical image processing, comparing and evaluating methods for implementing deep learning models. Five neural network models were developed to perform the segmentation task, implemented using the well-known TensorFlow and PyTorch libraries. The best-performing model can be used to build a system for the automatic segmentation of patient images and the calculation of the characteristics of their organs.
14

Knott, M., and R. Groenendijk. "TOWARDS MESH-BASED DEEP LEARNING FOR SEMANTIC SEGMENTATION IN PHOTOGRAMMETRY." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2021 (June 17, 2021): 59–66. http://dx.doi.org/10.5194/isprs-annals-v-2-2021-59-2021.

Abstract:
This research is the first to apply MeshCNN – a deep learning model that is specifically designed for 3D triangular meshes – in the photogrammetry domain. We highlight the challenges that arise when applying a mesh-based deep learning model to a photogrammetric mesh, especially with respect to data set properties. We provide solutions on how to prepare a remotely sensed mesh for a machine learning task. The most notable pre-processing step proposed is a novel application of the Breadth-First Search algorithm for chunking a large mesh into computable pieces. Furthermore, this work extends MeshCNN such that photometric features based on the mesh texture are considered in addition to the geometric information. Experiments show that including color information improves the predictive performance of the model by a large margin. Besides, experimental results indicate that segmentation performance could be advanced substantially with the introduction of a high-quality benchmark for semantic segmentation on meshes.
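The Breadth-First Search chunking idea can be illustrated on a face-adjacency graph: grow a chunk from an unvisited face until a size budget is reached, then start the next chunk. A minimal sketch, assuming the mesh is given as a dictionary from face id to neighbouring face ids (the paper's actual chunking criteria may differ):

```python
from collections import deque

def bfs_chunks(adjacency, max_faces=1000):
    """Partition mesh faces into connected chunks of at most max_faces."""
    visited, chunks = set(), []
    for seed in adjacency:
        if seed in visited:
            continue
        chunk, queue = [], deque([seed])
        visited.add(seed)
        while queue and len(chunk) < max_faces:
            face = queue.popleft()
            chunk.append(face)
            for nb in adjacency[face]:
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        for leftover in queue:       # budget hit: let these seed new chunks
            visited.discard(leftover)
        chunks.append(chunk)
    return chunks
```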
15

Matrone, Francesca, Eleonora Grilli, Massimo Martini, Marina Paolanti, Roberto Pierdicca, and Fabio Remondino. "Comparing Machine and Deep Learning Methods for Large 3D Heritage Semantic Segmentation." ISPRS International Journal of Geo-Information 9, no. 9 (September 7, 2020): 535. http://dx.doi.org/10.3390/ijgi9090535.

Abstract:
In recent years, the semantic segmentation of 3D point clouds has been a topic involving many different fields of application. Cultural heritage scenarios have become the subject of this study mainly thanks to the development of photogrammetry and laser scanning techniques. Classification algorithms based on machine and deep learning methods make it possible to process huge amounts of data such as 3D point clouds. In this context, the aim of this paper is to compare machine and deep learning methods for large 3D cultural heritage classification. Then, considering the best performances of both techniques, it proposes an architecture named DGCNN-Mod+3Dfeat that combines the positive aspects and advantages of these two methodologies for the semantic segmentation of cultural heritage point clouds. To demonstrate the validity of our idea, several experiments on the ArCH benchmark are reported and commented on.
16

Dallaqua, F. B. J. R., R. A. S. Rosa, B. Schultz, L. R. Faria, T. G. Rodrigues, C. G. Oliveira, M. E. J. Kieser, V. Malhotra, T. Dwyer, and D. S. Wolfe. "FOREST PLANTATION DETECTION THROUGH DEEP SEMANTIC SEGMENTATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2022 (May 30, 2022): 77–84. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2022-77-2022.

Abstract:
Forest plantations play an important role ecologically, contribute to carbon sequestration, and support billions of dollars of economic activity each year through sustainable forest management and forest sector value chains. As the global demand for forest products and services increases, the marketplace is seeking more reliable data on forest plantations. Remote sensing technologies allied with machine learning, and most recently deep learning techniques, provide valuable data for inventorying forest plantations and related valuation products. In this work, deep semantic segmentation with the U-Net architecture was used to detect forest plantation areas using Sentinel-2 and CBERS-4A images of different areas of Brazil. First, the U-Net models were built from an area in the Centre-East of Paraná State, and then the best models were tested in three new areas with different characteristics. The U-Net models built with Sentinel-2 images achieved promising results for areas similar to those used in the training set, with F1-scores ranging from 0.9171 to 0.9499 and Kappa values between 0.8712 and 0.9272, demonstrating the feasibility of deep semantic segmentation for detecting forest plantations.
17

Du, Zhenrong, Jianyu Yang, Cong Ou, and Tingting Zhang. "Smallholder Crop Area Mapped with a Semantic Segmentation Deep Learning Method." Remote Sensing 11, no. 7 (April 11, 2019): 888. http://dx.doi.org/10.3390/rs11070888.

Abstract:
The growing population in China has led to an increasing importance of crop area (CA) protection. A powerful tool for acquiring accurate and up-to-date CA maps is automatic mapping using information extracted from high spatial resolution remote sensing (RS) images. RS image information extraction includes feature classification, which is a long-standing research issue in the RS community. Emerging deep learning techniques, such as the deep semantic segmentation network technique, are effective methods to automatically discover relevant contextual features and get better image classification results. In this study, we exploited deep semantic segmentation networks to classify and extract CA from high-resolution RS images. WorldView-2 (WV-2) images with only Red-Green-Blue (RGB) bands were used to confirm the effectiveness of the proposed semantic classification framework for information extraction and the CA mapping task. Specifically, we used the deep learning framework TensorFlow to construct a platform for sampling, training, testing, and classifying to extract and map CA on the basis of DeepLabv3+. By leveraging per-pixel and random sample point accuracy evaluation methods, we conclude that the proposed approach can efficiently obtain acceptable accuracy (Overall Accuracy = 95%, Kappa = 0.90) of CA classification in the study area, and the approach performs better than other deep semantic segmentation networks (U-Net/PspNet/SegNet/DeepLabv2) and traditional machine learning methods, such as Maximum Likelihood (ML), Support Vector Machine (SVM), and RF (Random Forest). Furthermore, the proposed approach is highly scalable for the variety of crop types in a crop area. Overall, the proposed approach can train a precise and effective model that is capable of adequately describing the small, irregular fields of smallholder agriculture and handling the great level of details in RGB high spatial resolution images.
18

Yang, Su, Miaole Hou, and Songnian Li. "Three-Dimensional Point Cloud Semantic Segmentation for Cultural Heritage: A Comprehensive Review." Remote Sensing 15, no. 3 (January 17, 2023): 548. http://dx.doi.org/10.3390/rs15030548.

Abstract:
In the cultural heritage field, point clouds, as important raw data of geomatics, are not only three-dimensional (3D) spatial presentations of 3D objects but they also have the potential to gradually advance towards an intelligent data structure with scene understanding, autonomous cognition, and a decision-making ability. The approach of point cloud semantic segmentation as a preliminary stage can help to realize this advancement. With the demand for semantic comprehensibility of point cloud data and the widespread application of machine learning and deep learning approaches in point cloud semantic segmentation, there is a need for a comprehensive literature review covering the topics from the point cloud data acquisition to semantic segmentation algorithms with application strategies in cultural heritage. This paper first reviews the current trends of acquiring point cloud data of cultural heritage from a single platform with multiple sensors and multi-platform collaborative data fusion. Then, the point cloud semantic segmentation algorithms are discussed with their advantages, disadvantages, and specific applications in the cultural heritage field. These algorithms include region growing, model fitting, unsupervised clustering, supervised machine learning, and deep learning. In addition, we summarized the public benchmark point cloud datasets related to cultural heritage. Finally, the problems and constructive development trends of 3D point cloud semantic segmentation in the cultural heritage field are presented.
19

Zhang, H., A. Gruen, and M. Li. "DEEP LEARNING FOR SEMANTIC SEGMENTATION OF CORAL IMAGES IN UNDERWATER PHOTOGRAMMETRY." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022 (May 17, 2022): 343–50. http://dx.doi.org/10.5194/isprs-annals-v-2-2022-343-2022.

Abstract:
Regular monitoring activities are important for assessing the influence of unfavourable factors on corals and tracking subsequent recovery or decline. Deep learning-based underwater photogrammetry provides a comprehensive solution for automatic large-scale and precise monitoring. It can quickly acquire a large range of underwater coral reef images, and extract information from these images through advanced image processing technology and deep learning methods. This procedure has three major components: (a) generation of 3D models, (b) understanding of relevant corals in the images, and (c) tracking of those models over time and spatial change analysis. This paper focusses on issue (b): it applies five state-of-the-art neural networks to the semantic segmentation of coral images, compares their performance, and proposes a new coral semantic segmentation method. To quantitatively evaluate the performance of the neural networks in these experiments, the paper uses mean class-wise Intersection over Union (mIoU), the most commonly used accuracy measure in semantic segmentation, as the standard metric. In addition, considering that coral boundaries are very irregular and the IoU measure alone is not precise enough, a segmentation evaluation index based on boundary quality, Boundary IoU, is also used to evaluate the segmentation results. The proposed trained network can accurately distinguish living from dead corals, which can reflect the health of the corals in the area of interest. The classification results show that we achieve state-of-the-art performance compared to other methods tested on the underwater coral image dataset provided in this paper.
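mIoU, the headline metric named above, is the class-wise intersection-over-union averaged over classes. A compact NumPy version via the confusion matrix (Boundary IoU additionally restricts the computation to a thin band around mask boundaries and is not shown here):

```python
import numpy as np

def mean_iou(pred, ref, num_classes):
    """Mean class-wise IoU from two integer label maps of equal shape."""
    conf = np.bincount(num_classes * ref.ravel() + pred.ravel(),
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)   # conf[r, p] pixel counts
    inter = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - inter
    ious = inter / np.maximum(union, 1)
    return ious[union > 0].mean()                   # skip absent classes
```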
20

Bhatnagar, Saheba, Laurence Gill, and Bidisha Ghosh. "Drone Image Segmentation Using Machine and Deep Learning for Mapping Raised Bog Vegetation Communities." Remote Sensing 12, no. 16 (August 12, 2020): 2602. http://dx.doi.org/10.3390/rs12162602.

Abstract:
The application of drones has recently revolutionised the mapping of wetlands due to their high spatial resolution and the flexibility in capturing images. In this study, drone imagery was used to map key vegetation communities in an Irish wetland, Clara Bog, for the spring season. The mapping, carried out through image segmentation or semantic segmentation, was performed using machine learning (ML) and deep learning (DL) algorithms. With the aim of identifying the most appropriate, cost-efficient, and accurate segmentation method, multiple ML classifiers and DL models were compared. Random forest (RF) was identified as the best pixel-based ML classifier, which provided good accuracy (≈85%) when used in conjunction with the graph cut algorithm for image segmentation. Amongst the DL networks, a convolutional neural network (CNN) architecture in a transfer learning framework was utilised. A combination of the ResNet50 and SegNet architectures gave the best semantic segmentation results (≈90%). The high accuracy of the DL networks was accompanied by significantly larger labelled training datasets, computation times, and hardware requirements than the slightly less accurate ML classifiers. For specific applications such as wetland mapping, where networks must be trained for each different site, topography, season, and set of atmospheric conditions, ML classifiers proved to be the more pragmatic choice.
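The pixel-based Random Forest classifier that came out on top among the ML methods treats each pixel's band values as a feature vector. A minimal scikit-learn sketch of that idea (the band stack, training mask, and label map are assumed inputs; the study's feature set was richer):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_segment(image, train_mask, train_labels, n_trees=200):
    """Pixel-wise RF classification of an (H, W, B) band stack.

    train_mask   : (H, W) boolean array marking labelled pixels
    train_labels : (H, W) integer class map, read only where train_mask"""
    h, w, b = image.shape
    pixels = image.reshape(-1, b)
    keep = train_mask.ravel()
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
    rf.fit(pixels[keep], train_labels.ravel()[keep])
    return rf.predict(pixels).reshape(h, w)
```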
21

Neshov, Nikolay, Agata Manolova, Krasimir Tonchev, and Antoni Ivanov. "SUPPORTING BUSINESS MODEL INNOVATION BASED ON DEEP LEARNING SCENE SEMANTIC SEGMENTATION." Indian Journal of Computer Science and Engineering 11, no. 6 (December 20, 2020): 962–68. http://dx.doi.org/10.21817/indjcse/2020/v11i6/201106207.

Abstract:
The capacity to create innovative Business Models (BM) has become the foundation for numerous businesses. Business Model Innovation (BMI) grows more significant as digitalization influences our everyday lives and prompts the development of new approaches to working, communicating and collaborating in the computerized universe of Industry 4.0. In this paper, we present a conceptual architecture which can be applied in modern video-conference systems with the help of semantic segmentation. The scene represents an environment intended for the discussion of ideas in business modelling. Semantic segmentation allows each pixel of an image (or video) from the scene to be related or classified to a specific type of object. In this way it is possible for the machine to interpret the description of a scene. Thus, with the help of the proposed architecture, the processes taking place between objects and people in the surrounding environment can be analyzed for the purpose of digitization of BMI, by modelling human behavior and cognitive processes into logical expressions that can be digitized and automated. Semantic segmentation is considered a basic element in this type of interaction. We demonstrate the effectiveness of our algorithm with real data examples.
22

Shi, Lijuan, Guoying Wang, Lufeng Mo, Xiaomei Yi, Xiaoping Wu, and Peng Wu. "Automatic Segmentation of Standing Trees from Forest Images Based on Deep Learning." Sensors 22, no. 17 (September 3, 2022): 6663. http://dx.doi.org/10.3390/s22176663.

Abstract:
Semantic segmentation of standing trees is important for obtaining standing tree factors from images automatically and effectively. For the accurate segmentation of multiple standing trees in complex backgrounds, traditional methods have shortcomings such as low segmentation accuracy and the need for manual intervention. To achieve accurate segmentation of standing tree images effectively, SEMD, a lightweight network segmentation model based on deep learning, is proposed in this article. DeepLabV3+ is chosen as the base framework to perform multi-scale fusion of the convolutional features of the standing trees in images, so as to reduce the loss of image edge details during standing tree segmentation and reduce the loss of feature information. MobileNet, a lightweight network, is integrated into the backbone network to reduce the computational complexity. Furthermore, SENet, an attention mechanism, is added to obtain feature information efficiently and suppress the generation of useless feature information. Extensive experimental results show that with the SEMD model, the MIoU of the semantic segmentation of standing tree images of different varieties and categories reaches 91.78% under simple backgrounds and 86.90% under complex backgrounds. The lightweight network segmentation model SEMD proposed in this paper can thus solve the problem of segmenting multiple standing trees with high accuracy.
23

Sable, Piyush. "Designing Image Based Captcha using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 30, 2021): 3678–82. http://dx.doi.org/10.22214/ijraset.2021.35532.

Abstract:
Captchas, or Completely Automated Public Turing Tests to Tell Computers and Humans Apart, were created in response to programmers' ability to breach computer networks with automated attack programs and bots. Because of its ease of development and use, the text Captcha is the most well-known Captcha scheme. Hackers and programmers, however, have weakened the assumed security of Captchas, leaving websites vulnerable to attack. Text Captchas are still widely used because their attack speeds are assumed to be moderate, typically two to five seconds per image, and this is not considered a significant concern. This paper proposes Style Area Captcha (SACaptcha), a novel image-based Captcha that relies on semantic data comprehension, pixel-level segmentation, and deep learning approaches. The proposed SACaptcha highlights the creation of image-based Captchas using deep learning techniques to boost security, demonstrating that text Captchas are no longer secure.
24

Majidizadeh, A., H. Hasani, and M. Jafari. "SEMANTIC SEGMENTATION OF UAV IMAGES BASED ON U-NET IN URBAN AREA." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-4/W1-2022 (January 14, 2023): 451–57. http://dx.doi.org/10.5194/isprs-annals-x-4-w1-2022-451-2023.

Abstract:
Semantic segmentation of aerial data has been one of the leading research topics in photogrammetry, remote sensing, and computer vision in recent years. Many applications, including airborne mapping of urban scenes, object positioning in aerial images, and the automatic extraction of buildings from remote sensing or high-resolution aerial images, require accurate and efficient segmentation algorithms. Given the high potential of deep learning algorithms in the classification of complex scenes, this paper aims to train a deep learning model to evaluate the semantic segmentation accuracy of UAV-based images in urban areas. The proposed method implements a deep learning framework based on the U-Net encoder-decoder architecture, which extracts and classifies features through layers of convolution, max pooling, activation, and concatenation in an end-to-end process. The obtained results are compared with two traditional machine learning models, Random Forest (RF) and Multi-Layer Perceptron (MLP), which rely on two steps: feature extraction and image classification. In this study, the experiments are performed on the UAVid2020 semantic segmentation dataset from the ISPRS database. Results show the effectiveness of the proposed deep learning framework: the U-Net architecture achieved the best results with 75.15% overall accuracy, compared to the RF and MLP algorithms with 52.51% and 54.65% overall accuracy, respectively.
25

Boonpook, Wuttichai, Yumin Tan, Attawut Nardkulpat, Kritanai Torsri, Peerapong Torteeka, Patcharin Kamsing, Utane Sawangwit, Jose Pena, and Montri Jainaen. "Deep Learning Semantic Segmentation for Land Use and Land Cover Types Using Landsat 8 Imagery." ISPRS International Journal of Geo-Information 12, no. 1 (January 7, 2023): 14. http://dx.doi.org/10.3390/ijgi12010014.

Abstract:
Using deep learning semantic segmentation for land use extraction is the most challenging problem in medium spatial resolution imagery. This is because the deep convolution layers and multiple downsampling steps of the baseline networks can cause a degradation problem for small land use features. In this paper, a deep learning semantic segmentation algorithm comprising an adjusted network architecture (LoopNet) and a land use dataset is proposed for automatic land use classification using Landsat 8 imagery. The experimental results illustrate that deep learning semantic segmentation using the baseline networks (SegNet, U-Net) outperforms pixel-based machine learning algorithms (MLE, SVM, RF) for land use classification. Furthermore, the LoopNet network, which comprises a convolutional loop and convolutional blocks, is superior to other baseline networks (SegNet, U-Net, PSPNet) and improved networks (ResU-Net, DeeplabV3+, U-Net++), with 89.84% overall accuracy and good segmentation results. The evaluation of multispectral bands in the land use dataset demonstrates that Band 5 performs well in terms of extraction accuracy, with 83.91% overall accuracy. Furthermore, the combination of different spectral bands (Band 1–Band 7) achieved the highest accuracy (89.84%) compared to individual bands. These results indicate the effectiveness of LoopNet and multispectral bands for land use classification using Landsat 8 imagery.
26

Murtiyoso, A., C. Lhenry, T. Landes, P. Grussenmeyer, and E. Alby. "SEMANTIC SEGMENTATION FOR BUILDING FAÇADE 3D POINT CLOUD FROM 2D ORTHOPHOTO IMAGES USING TRANSFER LEARNING." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2021 (June 28, 2021): 201–6. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2021-201-2021.

Abstract:
The task of semantic segmentation is an important one in the context of 3D building modelling. Indeed, developments in 3D generation techniques have rendered the point cloud ubiquitous. However, pure data acquisition only captures geometric information, and semantic classification remains to be performed, often manually, in order to give a tangible sense to the 3D data. Recently, progress in computing power has also opened the way for the massive application of deep learning methods, including for semantic segmentation purposes. Although well established in the processing of 2D images, deep learning solutions remain an open question for 3D data. In this study, we aim to benefit from the vastly more developed field of 2D semantic segmentation by performing transfer learning on a photogrammetric orthoimage. The neural network was trained using labelled and rectified images of building façades. Another programme was then written to permit the passage from the 2D orthoimage to the 3D point cloud. Results show that the approach worked well and presents an alternative that helps automate point cloud semantic segmentation, at least in the case of photogrammetric data.
27

Najjaj, C., H. Rhinane, and A. Hilali. "DEEP LEARNING APPROACH FOR URBAN MAPPING." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVI-4/W3-2021 (January 11, 2022): 261–66. http://dx.doi.org/10.5194/isprs-archives-xlvi-4-w3-2021-261-2022.

Abstract:
Researchers in computer vision and machine learning are becoming increasingly interested in image semantic segmentation. Many methods based on convolutional neural networks (CNNs) have been proposed and have made considerable progress in the building extraction task; other methods, however, can result in suboptimal segmentation outcomes. To extract buildings with high precision, we propose a model which recognizes all the buildings and presents them as a mask, with buildings in white and all other classes in black. The developed network, which is based on U-Net, boosts the model's sensitivity. This paper provides a deep learning approach for building detection on satellite imagery, applied to the city of Casablanca. First, we describe the terminology of this field. Next, we present the main dataset used in this project, consisting of 1000 satellite images. Then, we train the U-Net model for 25 epochs on the training and validation datasets and test the pretrained model on unseen satellite images. Finally, the experimental results show that the proposed model performs well, producing a binary mask that extracts all the buildings in the region of Casablanca with high accuracy and completeness, achieving an average F1 score of 0.91 on the test data.
28

Laxmaiah, Dr Bagam. "Image Style Transfer Using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 1187–90. http://dx.doi.org/10.22214/ijraset.2022.43990.

Abstract:
Abstract: The principle of image style transfer is to define two distance functions, one that describes the content image and the other that describes the style image. Using the content and style images [5][6] as inputs, we obtain the desired output, which merges the content image with the style image while following the graphical structure of the content image. In summary, we take the base input image, a content image that we want to match, and a style image that we want to match, and pass them through a convolutional neural network [7]: the content image is used to compute a content loss and the style image a style loss; the style representation is obtained through a gram matrix, and the final image is then formed. Keywords: Content Image, Style Image, Content Loss, Style Loss, Convolutional Neural Networks, Gram Matrix, DeepLab Semantic Segmentation.
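The gram matrix at the heart of the style loss is small enough to show directly. Below is a PyTorch sketch of the usual Gatys-style formulation, offered as a generic illustration rather than this paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    """Channel-correlation matrix of an (N, C, H, W) feature map."""
    n, c, h, w = features.shape
    f = features.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (N, C, C), normalised

def style_loss(gen_feats, style_feats):
    """MSE between gram matrices of generated and style-image features."""
    return F.mse_loss(gram_matrix(gen_feats), gram_matrix(style_feats))
```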
29

Matrone, F., A. Lingua, R. Pierdicca, E. S. Malinverni, M. Paolanti, E. Grilli, F. Remondino, A. Murtiyoso, and T. Landes. "A BENCHMARK FOR LARGE-SCALE HERITAGE POINT CLOUD SEMANTIC SEGMENTATION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2020 (August 14, 2020): 1419–26. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2020-1419-2020.

Abstract:
The lack of benchmarking data for the semantic segmentation of digital heritage scenarios is hampering the development of automatic classification solutions in this field. Heritage 3D data feature complex structures and uncommon classes that prevent the simple deployment of available methods developed in other fields and for other types of data. The semantic classification of heritage 3D data would support the community in better understanding and analysing digital twins, facilitate restoration and conservation work, etc. In this paper, we present the first benchmark with millions of manually labelled 3D points belonging to heritage scenarios, realised to facilitate the development, training, testing and evaluation of machine and deep learning methods and algorithms in the heritage field. The proposed benchmark, available at http://archdataset.polito.it/, comprises datasets and classification results for better comparisons and insights into the strengths and weaknesses of different machine and deep learning approaches for heritage point cloud semantic segmentation, in addition to promoting a form of crowdsourcing to enrich the already annotated database.
30

Arsiwala-Scheppach, Lubaina T., Akhilanand Chaurasia, Anne Müller, Joachim Krois, and Falk Schwendicke. "Machine Learning in Dentistry: A Scoping Review." Journal of Clinical Medicine 12, no. 3 (January 25, 2023): 937. http://dx.doi.org/10.3390/jcm12030937.

Abstract:
Machine learning (ML) is being increasingly employed in dental research and application. We aimed to systematically compile studies using ML in dentistry and assess their methodological quality, including the risk of bias and reporting standards. We evaluated studies employing ML in dentistry published from 1 January 2015 to 31 May 2021 on MEDLINE, IEEE Xplore, and arXiv. We assessed publication trends and the distribution of ML tasks (classification, object detection, semantic segmentation, instance segmentation, and generation) in different clinical fields. We appraised the risk of bias and adherence to reporting standards, using the QUADAS-2 and TRIPOD checklists, respectively. Out of 183 identified studies, 168 were included, focusing on various ML tasks and employing a broad range of ML models, input data, data sources, strategies to generate reference tests, and performance metrics. Classification tasks were most common. Forty-two different metrics were used to evaluate model performances, with accuracy, sensitivity, precision, and intersection-over-union being the most common. We observed considerable risk of bias and moderate adherence to reporting standards which hampers replication of results. A minimum (core) set of outcome and outcome metrics is necessary to facilitate comparisons across studies.
31

Xu, Xinying, Guiqing Li, Gang Xie, Jinchang Ren, and Xinlin Xie. "Weakly Supervised Deep Semantic Segmentation Using CNN and ELM with Semantic Candidate Regions." Complexity 2019 (March 14, 2019): 1–12. http://dx.doi.org/10.1155/2019/9180391.

Abstract:
The task of semantic segmentation is to obtain strong pixel-level annotations for each pixel in the image. For fully supervised semantic segmentation, the task is achieved by a segmentation model trained using pixel-level annotations. However, the pixel-level annotation process is very expensive and time-consuming. To reduce the cost, the paper proposes a semantic candidate regions trained extreme learning machine (ELM) method with image-level labels to achieve pixel-level labels mapping. In this work, the paper casts the pixel mapping problem into a candidate region semantic inference problem. Specifically, after segmenting each image into a set of superpixels, superpixels are automatically combined to achieve segmentation of candidate region according to the number of image-level labels. Semantic inference of candidate regions is realized based on the relationship and neighborhood rough set associated with semantic labels. Finally, the paper trains the ELM using the candidate regions of the inferred labels to classify the test candidate regions. The experiment is verified on the MSRC dataset and PASCAL VOC 2012, which are popularly used in semantic segmentation. The experimental results show that the proposed method outperforms several state-of-the-art approaches for deep semantic segmentation.
32

Li, Dan, Chuda Xiao, Yang Liu, Zhuo Chen, Haseeb Hassan, Liyilei Su, Jun Liu, et al. "Deep Segmentation Networks for Segmenting Kidneys and Detecting Kidney Stones in Unenhanced Abdominal CT Images." Diagnostics 12, no. 8 (July 23, 2022): 1788. http://dx.doi.org/10.3390/diagnostics12081788.

Abstract:
Despite recent breakthroughs of deep learning algorithms in medical imaging, automated detection and segmentation techniques for the kidneys in abdominal computed tomography (CT) images have remained limited. Radiomics and machine learning analyses of renal diseases rely on the automatic segmentation of kidneys in CT images. Inspired by this, our primary aim is to utilize deep semantic segmentation learning models with a proposed training scheme to achieve precise and accurate segmentation outcomes. Moreover, this work aims to provide the community with an open-source, unenhanced abdominal CT dataset for training and testing deep learning segmentation networks to segment kidneys and detect kidney stones. Five variations of deep segmentation networks are trained and tested both dependently (based on the proposed training scheme) and independently. Upon comparison, the models trained with the proposed training scheme enable highly accurate 2D and 3D segmentation of kidneys and kidney stones. We believe this work is a fundamental step toward AI-driven diagnostic strategies, which can be an essential component of personalized patient care and improved decision-making in treating kidney diseases.
APA, Harvard, Vancouver, ISO, and other styles
33

Deng, Chunyuan, Zhenyun Peng, Zhencheng Chen, and Ruixing Chen. "Point Cloud Deep Learning Network Based on Balanced Sampling and Hybrid Pooling." Sensors 23, no. 2 (January 14, 2023): 981. http://dx.doi.org/10.3390/s23020981.

Full text
Abstract:
The automatic semantic segmentation of point cloud data is important for applications in machine vision, virtual reality, and smart cities. Segmentation methods that use PointNet++ as a baseline need improved processing capability for extremely imbalanced point cloud scenes. To address this problem, in this study, we designed a weighted sampling method based on farthest point sampling (FPS), which adjusts the sampling weights according to the loss value of the model to equalize the sampling process. We also introduced relational learning of the neighborhood space around each sampling center point in the feature encoding process, where feature importance is distinguished using a self-attention model. Finally, the global and local features were aggregated and transmitted using a hybrid pooling method. The results of a six-fold cross-validation experiment showed that, on the S3DIS semantic segmentation dataset, the proposed network achieved improvements of 9.5% and 11.6% in overall point-wise accuracy (OA) and mean class-wise intersection over union (mIoU), respectively, compared with the baseline. On the Vaihingen dataset, the proposed network achieved improvements of 4.2% and 3.9% in OA and mIoU, respectively, compared with the baseline. Compared with the segmentation results of other network models on public datasets, our algorithm achieves a good balance between OA and mIoU.
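A minimal NumPy sketch of farthest point sampling with an optional per-point weighting illustrates the idea of biasing sampling by a loss-derived weight; the exact weighting scheme in the paper may differ, and all names here are illustrative.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, weights=None, seed=0):
    """Greedy FPS over an (N, 3) array. The optional per-point
    weights scale the distance score, mimicking the idea of
    biasing sampling toward under-represented (high-loss) points;
    this weighting is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    if weights is None:
        weights = np.ones(n)
    selected = np.empty(n_samples, dtype=int)
    selected[0] = rng.integers(n)
    dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        # Update each point's distance to the nearest selected point.
        d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        selected[i] = np.argmax(dist * weights)  # weighted farthest point
    return selected

pts = np.random.rand(1024, 3)
idx = farthest_point_sampling(pts, 64)
print(idx.shape)  # (64,)
```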
APA, Harvard, Vancouver, ISO, and other styles
34

Yekeen, S. T., and A. L. Balogun. "AUTOMATED MARINE OIL SPILL DETECTION USING DEEP LEARNING INSTANCE SEGMENTATION MODEL." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2020 (August 21, 2020): 1271–76. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2020-1271-2020.

Full text
Abstract:
Abstract. This study developed a novel deep learning oil spill instance segmentation model using the Mask Region-based Convolutional Neural Network (Mask R-CNN), a state-of-the-art computer vision model. A total of 2882 images containing oil spills, look-alikes, ships, and land areas were acquired after various pre-processing steps. These images were subsequently split into 88% for training and 12% for testing, equating to 2530 and 352 images, respectively. The model was trained using transfer learning on a COCO-pretrained ResNet-101 backbone combined with a Feature Pyramid Network (FPN) architecture for feature extraction, for 30 epochs with a learning rate of 0.001. The model's performance was evaluated using precision, recall, and F1-measure, yielding values of 0.964, 0.969, and 0.968, respectively, higher than those of other existing models. The study concluded that the developed deep learning instance segmentation model (Mask R-CNN) performs better than conventional machine learning models and semantic segmentation deep learning models in the detection and segmentation of marine oil spills.
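A hedged fine-tuning sketch with torchvision's Mask R-CNN follows; note that torchvision ships a COCO-pretrained ResNet-50+FPN variant, whereas the study used ResNet-101, so this approximates the setup rather than reproducing it, and the class count matches the four foreground categories plus background.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# 5 classes: background, oil spill, look-alike, ship, land.
num_classes = 5

# COCO-pretrained ResNet-50+FPN backbone (the paper used ResNet-101).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box and mask heads for the new class count.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# train_loader is assumed to yield (images, targets) in torchvision format:
# model.train()
# for epoch in range(30):
#     for images, targets in train_loader:
#         losses = model(images, targets)
#         loss = sum(losses.values())
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
```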
APA, Harvard, Vancouver, ISO, and other styles
35

Li, Jun, Chengjie Niu, and Kai Xu. "Learning Part Generation and Assembly for Structure-Aware Shape Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11362–69. http://dx.doi.org/10.1609/aaai.v34i07.6798.

Full text
Abstract:
Learning powerful deep generative models for 3D shape synthesis is largely hindered by the difficulty in ensuring plausibility encompassing correct topology and reasonable geometry. Indeed, learning the distribution of plausible 3D shapes seems a daunting task for the holistic approaches, given the significant topological variations of 3D objects even within the same category. Enlightened by the fact that 3D shape structure is characterized as part composition and placement, we propose to model 3D shape variations with a part-aware deep generative network, coined as PAGENet. The network is composed of an array of per-part VAE-GANs, generating semantic parts composing a complete shape, followed by a part assembly module that estimates a transformation for each part to correlate and assemble them into a plausible structure. Through delegating the learning of part composition and part placement into separate networks, the difficulty of modeling structural variations of 3D shapes is greatly reduced. We demonstrate through both qualitative and quantitative evaluations that PAGENet generates 3D shapes with plausible, diverse and detailed structure, and show two applications, i.e., semantic shape segmentation and part-based shape editing.
APA, Harvard, Vancouver, ISO, and other styles
36

Wei, Yuhai, Wu Wei, and Yangbiao Zhang. "EfferDeepNet: An Efficient Semantic Segmentation Method for Outdoor Terrain." Machines 11, no. 2 (February 9, 2023): 256. http://dx.doi.org/10.3390/machines11020256.

Full text
Abstract:
The recognition of terrain in complex outdoor environments based on vision sensors is a key technology in practical robotics applications, and forms the basis of autonomous navigation and motion planning. While traditional machine learning methods can be applied to outdoor terrain recognition, their recognition accuracy is low. To improve accuracy, methods based on deep learning are widely used. However, the network structures of deep learning methods are very complex, with large numbers of parameters, and cannot meet the actual operating requirements of unmanned systems. Therefore, to address the poor real-time performance and low accuracy of deep learning algorithms for terrain recognition, this paper proposes the efficient EfferDeepNet network for pixel-level terrain recognition, realizing global perception of the outdoor environment. First, the method uses convolution kernels of different sizes in the depthwise separable convolution (DSC) stage to extract richer semantic feature information. Then, an attention mechanism is introduced to weight the acquired features, focusing on key local feature areas. Finally, to avoid redundancy from the large number of features and parameters in the model, the method uses a ghost module to make the network more lightweight. In addition, to mitigate the negative effect of pixel-level terrain recognition on image boundary segmentation, the proposed method integrates an enhanced feature extraction network. Experimental results show that the proposed EfferDeepNet network can quickly and accurately perform global recognition and semantic segmentation of terrain in complex environments.
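A small PyTorch sketch of a depthwise separable block with parallel kernel sizes and simple channel attention, loosely in the spirit of the abstract; the layer names, kernel sizes, and attention design are assumptions, not the published EfferDeepNet architecture.

```python
import torch
import torch.nn as nn

class MultiKernelDSC(nn.Module):
    """Depthwise separable convolution with parallel depthwise
    kernels of different sizes; an illustrative sketch only."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.depthwise = nn.ModuleList([
            nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
            for k in kernel_sizes
        ])
        # Pointwise conv fuses the concatenated multi-scale features.
        self.pointwise = nn.Conv2d(in_ch * len(kernel_sizes), out_ch, 1)
        # Simple channel attention (squeeze-and-excitation style).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([dw(x) for dw in self.depthwise], dim=1)
        y = self.pointwise(feats)
        return y * self.attn(y)  # reweight channels

x = torch.randn(1, 32, 64, 64)
print(MultiKernelDSC(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```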
APA, Harvard, Vancouver, ISO, and other styles
37

Wang, Yucheng, Jinya Su, Xiaojun Zhai, Fanlin Meng, and Cunjia Liu. "Snow Coverage Mapping by Learning from Sentinel-2 Satellite Multispectral Images via Machine Learning Algorithms." Remote Sensing 14, no. 3 (February 8, 2022): 782. http://dx.doi.org/10.3390/rs14030782.

Full text
Abstract:
Snow coverage mapping plays a vital role not only in studying hydrology and climatology, but also in investigating crop disease overwintering for smart agriculture management. This work investigates snow coverage mapping by learning from Sentinel-2 satellite multispectral images via machine-learning methods. To this end, the largest dataset for snow coverage mapping (to the best of our knowledge) with three typical classes (snow, cloud, and background) is first collected and labeled via the semi-automatic classification plugin in QGIS. Then, both random forest-based conventional machine learning and U-Net-based deep learning are applied to the semantic segmentation challenge in this work. The effects of various input band combinations are also investigated so that the most suitable one can be identified. Experimental results show that (1) both conventional machine-learning and advanced deep-learning methods significantly outperform the existing rule-based Sen2Cor product for snow mapping; (2) U-Net generally outperforms the random forest, since both spectral and spatial information is incorporated in U-Net via convolution operations; (3) the best spectral band combination for U-Net is B2, B11, B4, and B9. It is concluded that a U-Net-based deep-learning classifier with four informative spectral bands is suitable for snow coverage mapping.
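The conventional baseline can be sketched as pixel-wise random forest classification over spectral bands; the synthetic arrays and band count below are stand-ins for the real Sentinel-2 data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Pixel-wise classification of multispectral data with a random
# forest. The band order and synthetic arrays are illustrative.
n_pixels, n_bands = 10_000, 4   # e.g., B2, B11, B4, B9
X = np.random.rand(n_pixels, n_bands)       # reflectance per pixel
y = np.random.randint(0, 3, size=n_pixels)  # 0=snow, 1=cloud, 2=background

split = int(0.8 * n_pixels)
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
rf.fit(X[:split], y[:split])
print(classification_report(y[split:], rf.predict(X[split:])))
```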
APA, Harvard, Vancouver, ISO, and other styles
38

Salem, Danny, Yifeng Li, Pengcheng Xi, Hilary Phenix, Miroslava Cuperlovic-Culf, and Mads Kærn. "YeastNet: Deep-Learning-Enabled Accurate Segmentation of Budding Yeast Cells in Bright-Field Microscopy." Applied Sciences 11, no. 6 (March 17, 2021): 2692. http://dx.doi.org/10.3390/app11062692.

Full text
Abstract:
Accurate and efficient segmentation of live-cell images is critical for maximizing data extraction and knowledge generation from high-throughput biology experiments. Despite the recent development of deep-learning tools for biomedical imaging applications, there remains great demand for automated segmentation tools for high-resolution live-cell microscopy images to accelerate analysis. We have designed and trained a U-Net convolutional network (named YeastNet) to conduct semantic segmentation on bright-field microscopy images and generate segmentation masks for cell labeling and tracking. YeastNet dramatically improves on the performance of the non-trainable classic algorithm and performs considerably better than current state-of-the-art yeast-cell segmentation tools, enabling accurate automatic segmentation and tracking of yeast cells in biomedical applications. YeastNet is freely provided, with model weights, as a Python package on GitHub.
APA, Harvard, Vancouver, ISO, and other styles
39

Huang, H. S., S. J. Tang, W. X. Wang, X. M. Li, and R. Z. Guo. "FROM BIM TO POINTCLOUD: AUTOMATIC GENERATION OF LABELED INDOOR POINTCLOUD." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B5-2022 (June 2, 2022): 73–78. http://dx.doi.org/10.5194/isprs-archives-xliii-b5-2022-73-2022.

Full text
Abstract:
Abstract. With the development of deep learning technology, many indoor spatial applications, such as robotics and indoor navigation, have raised higher data requirements for indoor semantic models. However, creating deep learning classifiers requires large labeled datasets, and collecting such datasets involves a great deal of manual labeling, which is labor-intensive and time-consuming. In this paper, we propose a method to automatically create 3D point cloud datasets with indoor semantic labels based on parametric BIM models. First, an automatic BIM generation method is proposed that simulates the structure of interior spaces. Secondly, we use a viewpoint-guided labeled point cloud generation method to generate synthetic 3D point clouds with labels and color information. In particular, noise is also simulated with a Gaussian model. As shown in the experiments, labeled point cloud data can be quickly obtained from existing BIM models, which largely reduces the complexity of data labeling and improves efficiency. These simulated data can be used in the deep learning training process and improve semantic segmentation accuracy.
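A toy sketch of the simulation idea: sampling labeled points from planar BIM-like elements and adding Gaussian noise. The geometry, labels, and noise level are illustrative assumptions rather than the authors' pipeline.

```python
import numpy as np

def sample_plane(origin, size, n_points, label, sigma=0.01, rng=None):
    """Sample points on a horizontal planar element of a BIM-like
    model and perturb them with Gaussian noise, mimicking a
    simulated scan; geometry and sigma are assumptions."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(0, size[0], n_points)
    v = rng.uniform(0, size[1], n_points)
    pts = np.column_stack([origin[0] + u, origin[1] + v,
                           np.full(n_points, origin[2])])
    pts += rng.normal(scale=sigma, size=pts.shape)  # Gaussian sensor noise
    labels = np.full(n_points, label)
    return pts, labels

floor_pts, floor_lbl = sample_plane((0, 0, 0.0), (5, 4), 5000, label=0)
ceil_pts, ceil_lbl = sample_plane((0, 0, 2.8), (5, 4), 5000, label=1)
cloud = np.vstack([floor_pts, ceil_pts])
labels = np.concatenate([floor_lbl, ceil_lbl])
print(cloud.shape, np.bincount(labels))
```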
APA, Harvard, Vancouver, ISO, and other styles
40

Hsieh, Chia-Sheng, and Xiang-Jie Ruan. "Automated Semantic Segmentation of Indoor Point Clouds from Close-Range Images with Three-Dimensional Deep Learning." Buildings 13, no. 2 (February 9, 2023): 468. http://dx.doi.org/10.3390/buildings13020468.

Full text
Abstract:
The creation of building information models requires acquiring real building conditions. Generating a three-dimensional (3D) model from 3D point clouds involves classification, outline extraction, and boundary regularization for semantic segmentation. Point clouds generated from close-range images are sparser and tend to be unevenly distributed, which is not conducive to automated modeling. In this paper, we propose an efficient solution for the semantic segmentation of indoor point clouds from close-range images, based on a 3D deep learning framework that achieves better results. A dynamic graph convolutional neural network (DGCNN) is used to learn point cloud semantic features, and a more efficient feature extraction module is designed so that the problem of inadequate beam and column classification can be resolved. First, the DGCNN is applied to learn and classify the indoor point cloud into five categories: columns, beams, walls, floors, and ceilings. Then, the proposed semantic segmentation and modeling method is utilized to obtain the geometric parameters of each object for integration into building information modeling software. The experimental results show that the overall accuracy rates on the three experimental sections of Area_1 in the Stanford 3D semantic dataset are 86.9%, 97.4%, and 92.5%. The segmentation accuracy for corridor 2F in a civil engineering building is 94.2%. Comparing the extracted lengths with actual on-site measurements, the root mean square error is ±0.03 m. The proposed method is demonstrated to be capable of automatic semantic segmentation of 3D point clouds from indoor close-range images.
APA, Harvard, Vancouver, ISO, and other styles
41

Lin, Mengchen, Guidong Bao, Xiaoqian Sang, and Yunfeng Wu. "Recent Advanced Deep Learning Architectures for Retinal Fluid Segmentation on Optical Coherence Tomography Images." Sensors 22, no. 8 (April 15, 2022): 3055. http://dx.doi.org/10.3390/s22083055.

Full text
Abstract:
With its non-invasive and high-resolution properties, optical coherence tomography (OCT) has been widely used as a retinal imaging modality for the effective diagnosis of ophthalmic diseases. Retinal fluid is often segmented by medical experts as a pivotal biomarker to assist in the clinical diagnosis of age-related macular diseases, diabetic macular edema, and retinal vein occlusion. In recent years, advanced machine learning methods, such as deep learning paradigms, have attracted increasing attention from academia for retinal fluid segmentation applications. Automatic retinal fluid segmentation based on deep learning can improve the semantic segmentation accuracy and the efficiency of macular change analysis, which has potential clinical implications for ophthalmic pathology detection. This article summarizes several deep learning paradigms reported in the up-to-date literature for retinal fluid segmentation in OCT images. The deep learning architectures include backbones based on convolutional neural networks (CNN), fully convolutional networks (FCN), U-shape networks (U-Net), and other hybrid computational methods. The article also provides a survey of the prevailing OCT image datasets used in recent retinal segmentation investigations. Future perspectives and some potential retinal segmentation directions are discussed in the conclusion.
APA, Harvard, Vancouver, ISO, and other styles
42

Nisa, Mehrun, Saeed Ahmad Buzdar, Khalil Khan, and Muhammad Saeed Ahmad. "Deep Convolutional Neural Network Based Analysis of Liver Tissues Using Computed Tomography Images." Symmetry 14, no. 2 (February 15, 2022): 383. http://dx.doi.org/10.3390/sym14020383.

Full text
Abstract:
Liver disease is one of the most prominent causes of the increase in the death rate worldwide. These death rates can be reduced by early liver diagnosis. Computed tomography (CT) is a standard method for analyzing liver images in clinical practice. When analyzing large numbers of liver images, radiologists face problems that sometimes lead to misclassification of liver diseases, eventually resulting in severe conditions such as liver cancer. Thus, machine-learning-based methods are needed to classify such problems based on texture features. This paper suggests two different algorithms to address this challenging task of liver disease classification. The first method, based on conventional machine learning, combines automated texture analysis with supervised classification. For this purpose, 3000 clinically verified CT image samples were obtained from 71 patients. Image classes belonging to the same disease were trained to confirm abnormalities in liver tissues using supervised learning methods. The proposed method correctly quantified asymmetric patterns in CT images using machine learning. We evaluated the effectiveness of the feature vector with K-Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) classifiers. The second algorithm proposes a semantic segmentation model for liver disease identification. The model is based on semantic image segmentation (SIS) using a convolutional neural network (CNN) and encodes high-density maps through a specific guided attention method. The trained model classifies CT images into five categories of diseases. The compelling results obtained confirm the effectiveness of the proposed model. The study concludes that abnormalities in the human liver can be discriminated and diagnosed by texture analysis techniques, which may also assist radiologists and medical physicists in predicting the severity and proliferation of abnormalities in liver diseases.
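The first (texture-based) pipeline can be sketched with gray-level co-occurrence features and off-the-shelf classifiers; the synthetic patches, GLCM settings, and train/test split below are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def glcm_features(patch):
    """Texture features from a gray-level co-occurrence matrix.
    The chosen distances/angles/properties are illustrative."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Synthetic stand-in for CT liver patches (the study used 3000
# clinically verified samples from 71 patients).
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(200, 32, 32), dtype=np.uint8)
y = rng.integers(0, 2, size=200)
X = np.array([glcm_features(p) for p in patches])

for clf in (KNeighborsClassifier(), SVC(), RandomForestClassifier()):
    clf.fit(X[:150], y[:150])
    print(type(clf).__name__, clf.score(X[150:], y[150:]))
```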
APA, Harvard, Vancouver, ISO, and other styles
43

Niu, Zijie, Juntao Deng, Xu Zhang, Jun Zhang, Shijia Pan, and Haotian Mu. "Identifying the Branch of Kiwifruit Based on Unmanned Aerial Vehicle (UAV) Images Using Deep Learning Method." Sensors 21, no. 13 (June 29, 2021): 4442. http://dx.doi.org/10.3390/s21134442.

Full text
Abstract:
It is important to obtain accurate information about kiwifruit vines in order to monitor their physiological states and undertake precise orchard operations. However, because vines are small, cling to trellises, and have branches lying on the ground, numerous challenges exist in acquiring accurate data for kiwifruit vines. In this paper, a kiwifruit canopy distribution prediction model is proposed on the basis of low-altitude unmanned aerial vehicle (UAV) images and deep learning techniques. First, the locations of the kiwifruit plants and the vine distribution are extracted from high-precision images collected by UAV. Canopy gradient distribution maps with different noise reduction and distribution effects are generated by modifying the threshold and sampling size using a resampling normalization method. The results showed that the accuracies of vine segmentation using PSPNet, support vector machine, and random forest classification were 71.2%, 85.8%, and 75.26%, respectively. However, the segmentation image obtained using deep semantic segmentation had a higher signal-to-noise ratio and was closer to the real situation. The average intersection over union of the deep semantic segmentation was at least 80% in the distribution maps, whereas in traditional machine learning it was between 20% and 60%. This indicates that the proposed model can quickly extract the vine distribution and plant positions, and is thus able to perform dynamic monitoring of orchards to provide real-time operation guidance.
APA, Harvard, Vancouver, ISO, and other styles
44

Lin, Y., G. Vosselman, Y. Cao, and M. Y. Yang. "EFFICIENT TRAINING OF SEMANTIC POINT CLOUD SEGMENTATION VIA ACTIVE LEARNING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2020 (August 3, 2020): 243–50. http://dx.doi.org/10.5194/isprs-annals-v-2-2020-243-2020.

Full text
Abstract:
Abstract. With the development of LiDAR and photogrammetric techniques, more and more point clouds are available with high density and over large areas. Point cloud interpretation is an important step before many real applications, such as 3D city modelling. Many supervised machine learning techniques have been adapted to semantic point cloud segmentation, aiming to label point clouds automatically. Current deep learning methods have shown their potential to produce high accuracy in semantic point cloud segmentation tasks. However, these supervised methods require a large amount of labelled data for proper model performance and good generalization, and in practice the manual labelling of point clouds is very expensive and time-consuming. Active learning can iteratively select unlabelled samples for manual annotation based on the current statistical model and then update the labelled data pool for the next round of model training. To label point clouds effectively, we propose a segment-based active learning strategy to assess the informativeness of samples. The proposed strategy uses 40% of the whole training dataset to achieve a mean IoU of 75.2%, which is 99.1% of the mIoU obtained from the model trained on the full dataset, while a baseline method using the same amount of data only reaches 69.6% mIoU, corresponding to 90.9% of the full-dataset mIoU.
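A generic pool-based active-learning loop using predictive entropy as the informativeness score is sketched below; the paper scores whole segments with its own strategy, so the entropy criterion and the toy features here are stand-in assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def entropy_scores(proba):
    """Predictive entropy per sample; higher = more informative."""
    p = np.clip(proba, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

# Pool-based active learning on segment-level features; each row
# stands in for one segment of the point cloud.
rng = np.random.default_rng(0)
X_pool = rng.random((2000, 8))
y_pool = (X_pool[:, :2].sum(axis=1) > 1.0).astype(int)  # hidden oracle

labelled = list(rng.choice(len(X_pool), size=50, replace=False))
for round_ in range(5):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X_pool[labelled], y_pool[labelled])
    unlabelled = np.setdiff1d(np.arange(len(X_pool)), labelled)
    scores = entropy_scores(clf.predict_proba(X_pool[unlabelled]))
    # Query the most uncertain segments for manual annotation.
    query = unlabelled[np.argsort(scores)[-50:]]
    labelled.extend(query.tolist())
    print(f"round {round_}: {len(labelled)} labelled segments")
```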
APA, Harvard, Vancouver, ISO, and other styles
45

Roscher, R., M. Volpi, C. Mallet, L. Drees, and J. D. Wegner. "SEMCITY TOULOUSE: A BENCHMARK FOR BUILDING INSTANCE SEGMENTATION IN SATELLITE IMAGES." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-5-2020 (August 3, 2020): 109–16. http://dx.doi.org/10.5194/isprs-annals-v-5-2020-109-2020.

Full text
Abstract:
Abstract. In order to reach the goal of reliably solving Earth monitoring tasks, automated and efficient machine learning methods are necessary for large-scale scene analysis and interpretation. A typical bottleneck of supervised learning approaches is the availability of accurate (manually) labeled training data, which is particularly important for training state-of-the-art (deep) learning methods. We present SemCity Toulouse, a publicly available, very high resolution, multi-spectral benchmark dataset for training and evaluating sophisticated machine learning models. The benchmark acts as a test bed for single building instance segmentation, which has rarely been considered before in densely built urban areas. Additional information is provided in the form of a multi-class semantic segmentation annotation covering the same area plus an adjacent area three times larger. The dataset addresses interested researchers from various communities such as photogrammetry and remote sensing, but also computer vision and machine learning.
APA, Harvard, Vancouver, ISO, and other styles
46

Ibrahim, Yahya, Balázs Nagy, and Csaba Benedek. "Deep Learning-Based Masonry Wall Image Analysis." Remote Sensing 12, no. 23 (November 29, 2020): 3918. http://dx.doi.org/10.3390/rs12233918.

Full text
Abstract:
In this paper we introduce a novel machine learning-based, fully automatic approach for the semantic analysis and documentation of masonry wall images. The approach performs, in parallel, automatic detection and virtual completion of occluded or damaged wall regions, and brick segmentation leading to an accurate model of the wall structure. For this purpose, we propose a four-stage algorithm comprising three interacting deep neural networks and a watershed transform-based brick outline extraction step. First, a U-Net-based sub-network performs initial wall segmentation into brick, mortar, and occluded regions, which is followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, in itself providing valuable structural information for archaeological and architectural applications. The second adversarial network predicts the pixels' color values, yielding a realistic visual experience for the observer. Finally, using the neural network outputs as markers in a watershed-based segmentation process, we generate accurate contours of the individual bricks, both in the originally visible and in the artificially inpainted wall regions. Note that while the first three stages implement a sequential pipeline, they interact through dependencies of their loss functions, admitting the consideration of hidden feature dependencies between the different network components. A new dataset has been created for training and testing the network, and an extensive qualitative and quantitative evaluation against the state of the art is given. The experiments confirmed that the proposed method outperforms the reference techniques both in terms of wall structure estimation and the visual quality of the inpainting step; moreover, it can be used robustly for various masonry wall types.
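The final stage can be illustrated with a marker-controlled watershed; in the paper the markers come from the network outputs, whereas this sketch places two seeds by hand on a synthetic fused-brick mask.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

# Marker-controlled watershed to split a binary brick mask into
# individual bricks. Seeds are placed manually for illustration.
mask = np.zeros((40, 64), dtype=bool)
mask[10:30, 6:58] = True            # two "bricks" fused into one blob

distance = ndi.distance_transform_edt(mask)
markers = np.zeros(mask.shape, dtype=int)
markers[20, 18] = 1                 # seed inside brick A
markers[20, 45] = 2                 # seed inside brick B

labels = watershed(-distance, markers, mask=mask)
print(np.unique(labels))            # 0 = background, 1 and 2 = bricks
```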
APA, Harvard, Vancouver, ISO, and other styles
47

Si, Yifan, Dawei Gong, Yang Guo, Xinhua Zhu, Qiangsheng Huang, Julian Evans, Sailing He, and Yaoran Sun. "An Advanced Spectral–Spatial Classification Framework for Hyperspectral Imagery Based on DeepLab v3+." Applied Sciences 11, no. 12 (June 19, 2021): 5703. http://dx.doi.org/10.3390/app11125703.

Full text
Abstract:
The DeepLab v3+ neural network shows excellent performance in semantic segmentation. In this paper, we propose a segmentation framework based on the DeepLab v3+ neural network and apply it to the problem of hyperspectral imagery classification (HSIC). Dimensionality reduction of the hyperspectral image is performed using principal component analysis (PCA). DeepLab v3+ is used to extract spatial features, which are then fused with spectral features. A support vector machine (SVM) classifier is used for fitting and classification. Experimental results show that the proposed framework outperforms most traditional machine learning algorithms and deep learning algorithms in hyperspectral imagery classification tasks.
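A minimal scikit-learn sketch of the spectral side of the pipeline, PCA followed by an SVM, appears below; the paper first fuses DeepLab v3+ spatial features with the spectral ones, which is omitted here, and the synthetic cube is a stand-in for real hyperspectral data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# PCA reduces the hyperspectral bands; an SVM fits the (here
# purely spectral) features. Cube size and labels are illustrative.
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 200))            # H x W x bands
labels = rng.integers(0, 5, size=(64, 64))  # per-pixel classes

X = cube.reshape(-1, cube.shape[-1])
y = labels.ravel()
clf = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
clf.fit(X[:3000], y[:3000])
print("accuracy:", clf.score(X[3000:4000], y[3000:4000]))
```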
APA, Harvard, Vancouver, ISO, and other styles
48

Touzani, Samir, and Jessica Granderson. "Open Data and Deep Semantic Segmentation for Automated Extraction of Building Footprints." Remote Sensing 13, no. 13 (July 1, 2021): 2578. http://dx.doi.org/10.3390/rs13132578.

Full text
Abstract:
Advances in machine learning and computer vision, combined with increased access to unstructured data (e.g., images and text), have created an opportunity for automated extraction of building characteristics, cost-effectively, and at scale. These characteristics are relevant to a variety of urban and energy applications, yet are time consuming and costly to acquire with today’s manual methods. Several recent research studies have shown that in comparison to more traditional methods that are based on features engineering approach, an end-to-end learning approach based on deep learning algorithms significantly improved the accuracy of automatic building footprint extraction from remote sensing images. However, these studies used limited benchmark datasets that have been carefully curated and labeled. How the accuracy of these deep learning-based approach holds when using less curated training data has not received enough attention. The aim of this work is to leverage the openly available data to automatically generate a larger training dataset with more variability in term of regions and type of cities, which can be used to build more accurate deep learning models. In contrast to most benchmark datasets, the gathered data have not been manually curated. Thus, the training dataset is not perfectly clean in terms of remote sensing images exactly matching the ground truth building’s foot-print. A workflow that includes data pre-processing, deep learning semantic segmentation modeling, and results post-processing is introduced and applied to a dataset that include remote sensing images from 15 cities and five counties from various region of the USA, which include 8,607,677 buildings. The accuracy of the proposed approach was measured on an out of sample testing dataset corresponding to 364,000 buildings from three USA cities. The results favorably compared to those obtained from Microsoft’s recently released US building footprint dataset.
APA, Harvard, Vancouver, ISO, and other styles
49

Cha, Jun-Young, Hyung-In Yoon, In-Sung Yeo, Kyung-Hoe Huh, and Jung-Suk Han. "Panoptic Segmentation on Panoramic Radiographs: Deep Learning-Based Segmentation of Various Structures Including Maxillary Sinus and Mandibular Canal." Journal of Clinical Medicine 10, no. 12 (June 11, 2021): 2577. http://dx.doi.org/10.3390/jcm10122577.

Full text
Abstract:
Panoramic radiographs, also known as orthopantomograms, are routinely used in most dental clinics. However, it has been difficult to develop an automated method that detects the various structures present in these radiographs. One of the main reasons for this is that structures of various sizes and shapes are collectively shown in the image. In order to solve this problem, the recently proposed concept of panoptic segmentation, which integrates instance segmentation and semantic segmentation, was applied to panoramic radiographs. A state-of-the-art deep neural network model designed for panoptic segmentation was trained to segment the maxillary sinus, maxilla, mandible, mandibular canal, normal teeth, treated teeth, and dental implants on panoramic radiographs. Unlike conventional semantic segmentation, each object in the tooth and implant classes was individually classified. For evaluation, the panoptic quality, segmentation quality, recognition quality, intersection over union (IoU), and instance-level IoU were calculated. The evaluation and visualization results showed that the deep learning-based artificial intelligence model can perform panoptic segmentation of images, including those of the maxillary sinus and mandibular canal, on panoramic radiographs. This automatic machine learning method might assist dental practitioners in setting up treatment plans and diagnosing oral and maxillofacial diseases.
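The reported panoptic quality metric decomposes into segmentation quality and recognition quality; a small sketch of the standard computation for one class follows, with toy masks as assumptions.

```python
import numpy as np

def panoptic_quality(pred_masks, gt_masks, thr=0.5):
    """Panoptic quality for one class: segments match when
    IoU > 0.5 (which makes matches unique). pred_masks/gt_masks
    are lists of boolean arrays, one per segment instance."""
    matched, iou_sum, tp = set(), 0.0, 0
    for p in pred_masks:
        for j, g in enumerate(gt_masks):
            if j in matched:
                continue
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            iou = inter / union if union else 0.0
            if iou > thr:
                matched.add(j); iou_sum += iou; tp += 1
                break
    fp = len(pred_masks) - tp
    fn = len(gt_masks) - tp
    sq = iou_sum / tp if tp else 0.0                   # segmentation quality
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)               # recognition quality
    return sq * rq, sq, rq                             # PQ = SQ * RQ

g = np.zeros((8, 8), dtype=bool); g[2:6, 2:6] = True
p = np.zeros((8, 8), dtype=bool); p[2:6, 2:7] = True
print(panoptic_quality([p], [g]))  # (0.8, 0.8, 1.0)
```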
APA, Harvard, Vancouver, ISO, and other styles
50

Ji, Hyesung, Danial Hooshyar, Kuekyeng Kim, and Heuiseok Lim. "A semantic-based video scene segmentation using a deep neural network." Journal of Information Science 45, no. 6 (December 19, 2018): 833–44. http://dx.doi.org/10.1177/0165551518819964.

Full text
Abstract:
Video scene segmentation is an important research topic in the field of computer vision, because it enables efficient storage, indexing, and retrieval of videos. This kind of scene segmentation cannot be achieved by just calculating the similarity of the low-level features present in the video; high-level features should also be considered to achieve better performance. Even though much research has been conducted on video scene segmentation, most of these studies have failed to segment a video into scenes semantically. Thus, in this study, we propose a Deep-learning Semantic-based Scene-segmentation model (called DeepSSS) that uses image captioning to segment a video into scenes semantically. First, DeepSSS performs shot boundary detection by comparing colour histograms and then employs maximum-entropy-based keyframe extraction. Second, for semantic analysis, image captioning based on deep learning generates a semantic text description of the keyframes. Finally, by comparing and analysing the generated texts, it assembles the keyframes into scenes grouped under a semantic narrative. DeepSSS thus considers both low- and high-level features of videos to achieve a more meaningful scene segmentation. By applying DeepSSS to datasets from MS COCO for caption generation and evaluating its semantic scene-segmentation results on datasets from TRECVid 2016, we demonstrate quantitatively that DeepSSS outperforms other existing scene-segmentation methods using shot boundary detection and keyframes. Moreover, experiments compared scenes segmented by humans with scenes segmented by DeepSSS, and the results verified that DeepSSS's segmentation resembles that of humans. This is a new kind of result that was enabled by semantic analysis, which would be impossible using only the low-level features of videos.
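The first stage, shot boundary detection via colour histogram comparison, can be sketched in a few lines; the bin count, distance measure, and threshold below are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def shot_boundaries(frames, bins=32, thr=0.4):
    """Detect shot cuts by comparing normalized gray-level
    histograms of consecutive frames."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    cuts = []
    for i in range(1, len(hists)):
        # L1 distance in [0, 2]; large jumps suggest a cut.
        d = np.abs(hists[i] - hists[i - 1]).sum()
        if d > thr:
            cuts.append(i)
    return cuts

# Synthetic "video": a dark shot followed by a bright shot.
rng = np.random.default_rng(0)
shot1 = rng.integers(0, 80, size=(10, 48, 64))
shot2 = rng.integers(150, 256, size=(10, 48, 64))
print(shot_boundaries(list(shot1) + list(shot2)))  # -> [10]
```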
APA, Harvard, Vancouver, ISO, and other styles