Dissertations / Theses on the topic 'DEEP LEARNING, GENERATION, SEMANTIC SEGMENTATION, MACHINE LEARNING'


1

Di Mauro, Daniele. "Scene Understanding for Parking Spaces Management." Doctoral thesis, Università di Catania, 2019. http://hdl.handle.net/10761/4138.

Abstract:
Most of the world's population has moved to urban areas. This process has worsened many problems in major cities, e.g. air pollution, traffic, and security. The growing number of security cameras and the improvement of Computer Vision algorithms can offer a solution to many of these problems. The work in this thesis was started after a grant by Park Smart s.r.l., a company located in Catania, which believes that Computer Vision can be the answer to parking space management. The main problem the company faces is finding a fast way to deploy working solutions, keeping the labeling effort to a minimum, across different scenes, cities, and parking areas. During the three years of doctoral studies we tried to solve the problem through various methods such as Semi-Supervised Learning, Counting, and Scene Adaptation through Image Classification, Object Detection, and Semantic Segmentation. Semi-supervised classification was the first approach used to decrease labeling effort for fast deployment. Methods based on counting objects, such as cars and parking spots, were analyzed as a second solution. To gain full knowledge of the scene, we focused on Semantic Segmentation and the use of Generative Adversarial Networks in order to find a viable way to reach Scene Adaptation results comparable to state-of-the-art methods.
2

Nguyen, Duc Minh Chau. "Affordance learning for visual-semantic perception." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2443.

Abstract:
Affordance Learning is linked to the study of interactions between robots and objects, including how robots perceive objects through scene understanding. The area has long been popular in Psychology and has recently come to influence Computer Vision. Computer Vision has borrowed the concept of affordance from Psychology in order to develop Visual-Semantic recognition systems and, in particular, to develop the capability of robots to interact with objects. However, existing Affordance Learning systems are still limited to detecting and segmenting object affordances, a task called Affordance Segmentation. Further, these systems are not designed to reason about affordances. For example, a Visual-Semantic system for captioning a scene can extract information from an image, such as "a person holds a chocolate bar and eats it", but does not highlight the affordances "hold" and "eat". Indeed, these and other affordances appear throughout all aspects of life, since affordances usually connect to actions (from a linguistic view, affordances generally surface as verbs in sentences). Due to the above-mentioned limitations, this thesis aims to develop systems of Affordance Learning for Visual-Semantic Perception. Such systems can be built using Deep Learning, which has been empirically shown to be effective for Computer Vision tasks. The thesis has two goals: (1) to study which key factors contribute to the performance of Affordance Segmentation, and (2) to reason about affordances (Affordance Reasoning) based on object parts for Visual-Semantic Perception. For the first goal, the thesis mainly investigates the feature extraction module, as this is one of the earliest steps in learning to segment affordances. The thesis finds that the quality of feature extraction from images plays a vital role in improving the performance of Affordance Segmentation.
With regard to the second goal, the thesis infers affordances from object parts to reason about part-affordance relationships. Based on this approach, the thesis devises an Object Affordance Reasoning Network that can learn to construct relationships between affordances and object parts. As a result, reasoning about affordances becomes achievable through the generation of scene graphs of affordances and object parts. Empirical results, obtained from extensive experiments, show the potential of the developed system for Affordance Reasoning from Scene Graph Generation.
3

Espis, Andrea. "Object detection and semantic segmentation for assisted data labeling." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Abstract:
The automation of data labeling tasks is a solution to the errors and time costs of human labeling. In this thesis work, CenterNet, DeepLabV3, and K-Means applied to the RGB color space are deployed to build a pipeline for assisted data labeling: a semi-automatic process that iteratively improves the quality of the annotations. The proposed pipeline pointed out a total of 1,547 wrong or missing annotations when applied to a dataset originally containing 8,300 annotations. Moreover, the quality of each annotation was drastically improved, and at the same time more than 600 hours of work were saved. The same models were also used to address a real-time tire inspection task concerning the detection of markers on the surface of tires. According to the experiments, the combination of DeepLabV3 output and post-processing based on the area and shape of the predicted blobs achieves a maximum mean Precision of 0.992 with mean Recall 0.982, and a maximum mean Recall of 0.998 with mean Precision 0.960.
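The K-Means step the abstract mentions clusters raw pixel colors. The following is a minimal numpy sketch of k-means on RGB pixels, not code from the thesis; the deterministic farthest-point initialisation and the toy two-cluster data are our own assumptions for illustration.

```python
import numpy as np

def kmeans_rgb(pixels, k=2, iters=20):
    """Minimal k-means over RGB pixel values of shape (N, 3).

    Returns (centers, labels). A toy stand-in for the clustering step
    described in the abstract; the thesis' actual pipeline combines
    this with CenterNet and DeepLabV3 outputs.
    """
    # Deterministic farthest-point initialisation (k-means++-style, no randomness).
    centers = [pixels[0].astype(float)]
    for _ in range(k - 1):
        d = np.min([np.square(pixels - c).sum(axis=1) for c in centers], axis=0)
        centers.append(pixels[int(d.argmax())].astype(float))
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned pixels.
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated pixel populations: dark background vs. bright marker.
dark = np.full((50, 3), 20.0)
bright = np.full((50, 3), 230.0)
pixels = np.vstack([dark, bright])
centers, labels = kmeans_rgb(pixels, k=2)
```

In a labeling pipeline, cluster assignments like these would only be a coarse proposal that human annotators then verify or correct.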
4

Serra, Sabina. "Deep Learning for Semantic Segmentation of 3D Point Clouds from an Airborne LiDAR." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-168367.

Abstract:
Light Detection and Ranging (LiDAR) sensors have many different application areas, from revealing archaeological structures to aiding the navigation of vehicles. However, it is challenging to interpret and fully use the vast amount of unstructured data that LiDARs collect. Automatic classification of LiDAR data would ease its utilization, whether for examining structures or aiding vehicles. In recent years there have been many advances in deep learning for semantic segmentation of automotive LiDAR data, but there is less research on aerial LiDAR data. This thesis investigates the current state-of-the-art deep learning architectures and how well they perform on LiDAR data acquired by an Unmanned Aerial Vehicle (UAV). It also investigates different training techniques for class-imbalanced and limited datasets, which are common challenges for semantic segmentation networks. Lastly, this thesis investigates whether pre-training can improve the performance of the models. The LiDAR scans were first projected to range images, and then a fully convolutional semantic segmentation network was used. Three different training techniques were evaluated: weighted sampling, data augmentation, and grouping of classes. No improvement was observed from weighted sampling, nor did grouping of classes have a substantial effect on performance. Pre-training on the large public dataset SemanticKITTI resulted in a small performance improvement, but data augmentation seemed to have the largest positive impact. The mIoU of the best model, which was trained with data augmentation, was 63.7%, and it performed very well on the classes Ground, Vegetation, and Vehicle. The other classes in the UAV dataset, Person and Structure, had very little data and were challenging for most models to classify correctly. In general, the models trained on UAV data performed similarly to the state-of-the-art models trained on automotive data.
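The projection of LiDAR scans to range images, which the abstract names as the first processing step, is commonly done spherically. Below is a minimal numpy sketch of such a projection; the image size and the field-of-view limits are illustrative assumptions, not the sensor parameters used in the thesis.

```python
import numpy as np

def project_to_range_image(points, h=32, w=512, fov_up=15.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud to an (h, w) range image.

    Spherical projection as commonly used before applying a 2D fully
    convolutional segmentation network.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                            # azimuth angle
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))      # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Normalise angles to [0, 1) image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi)                     # column coordinate
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r)  # row coordinate
    cols = np.clip((u * w).astype(int), 0, w - 1)
    rows = np.clip((v * h).astype(int), 0, h - 1)
    img = np.zeros((h, w))
    img[rows, cols] = r                               # last-written range per cell
    return img

pts = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 1.0], [-5.0, -5.0, -2.0]])
ri = project_to_range_image(pts)
```

Each non-empty cell then carries the range (and, in practice, further channels such as intensity and x/y/z) that the 2D segmentation network consumes.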
5

Westell, Jesper. "Multi-Task Learning using Road Surface Condition Classification and Road Scene Semantic Segmentation." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157403.

Abstract:
Understanding road surface conditions is an important component of active vehicle safety. Estimates can be obtained through image classification using increasingly popular convolutional neural networks (CNNs). In this paper, we explore the effects of multi-task learning by creating CNNs capable of simultaneously performing two tasks: road surface condition classification (RSCC) and road scene semantic segmentation (RSSS). A multi-task network, containing a shared feature extractor (VGG16, ResNet-18, ResNet-101) and two task-specific network branches, is built and trained using the Road-Conditions and Cityscapes datasets. We reveal that utilizing task-dependent homoscedastic uncertainty in the learning process improves multi-task model performance on both tasks. When performing task adaptation, using a small set of additional data labeled with semantic information, we gain considerable RSCC improvements on complex models. Furthermore, we demonstrate increased generalizability in multi-task models, with up to 12% higher F1-score compared to single-task models.
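Task-dependent homoscedastic uncertainty weighting is usually implemented as in Kendall et al. (2018): each task loss is scaled by a learned precision, with a log-variance term as regulariser. The sketch below shows that weighting scheme in plain numpy; it is our reconstruction of the standard formulation, not code from the thesis.

```python
import numpy as np

def multitask_loss(loss_cls, loss_seg, log_var_cls, log_var_seg):
    """Combine two task losses with learned homoscedastic uncertainty.

    Each loss is scaled by a precision exp(-log_var); the additive
    log_var terms stop the network from simply inflating both
    variances to zero out the losses.
    """
    precision_cls = np.exp(-log_var_cls)
    precision_seg = np.exp(-log_var_seg)
    return (precision_cls * loss_cls + log_var_cls
            + precision_seg * loss_seg + log_var_seg)

# With zero log-variances the weighting reduces to a plain sum of losses.
base = multitask_loss(1.0, 2.0, 0.0, 0.0)
# Raising the segmentation log-variance down-weights that task's loss.
noisy_seg = multitask_loss(1.0, 2.0, 0.0, 1.0)
```

In training, `log_var_cls` and `log_var_seg` would be trainable parameters optimised jointly with the network weights.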
6

Rydgård, Jonas, and Marcus Bejgrowicz. "Semantic Segmentation of Building Materials in Real World Images Using 3D Information." Thesis, Linköpings universitet, Datorseende, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176618.

Abstract:
The increasing popularity of drones has made it convenient to capture a large number of images of a property, which can then be used to build a 3D model. The condition of a building can be analyzed to plan renovations. This creates an interest in automatically identifying building materials, a task well suited for machine learning. With access to drone imagery of buildings as well as depth maps and normal maps, we created a dataset for semantic segmentation. Two different convolutional neural networks were trained and evaluated to see how well they perform material segmentation. DeepLabv3+, which uses RGB data, was compared to Depth-Aware CNN, which uses RGB-D data. Our experiments showed that DeepLabv3+ achieved higher mean intersection over union. To investigate whether the information in the depth maps and normal maps could give a performance boost, we conducted experiments with an encoding we call HMN: horizontal disparity, magnitude of the normal parallel with the ground, and normal parallel with gravity. This three-channel encoding was used to jointly train two CNNs, one on RGB and one on HMN, and then sum their predictions. This led to improved results for both DeepLabv3+ and Depth-Aware CNN.
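The HMN channels can be derived from a disparity map and per-pixel unit normals by splitting each normal into its gravity-aligned and ground-parallel components. The sketch below is a plausible reconstruction from the abstract alone; the thesis' exact channel definitions and coordinate conventions may differ.

```python
import numpy as np

def hmn_encoding(disparity, normals, gravity=np.array([0.0, 0.0, 1.0])):
    """Build a three-channel HMN image from (H, W) disparity and
    (H, W, 3) unit normals: horizontal disparity, magnitude of the
    ground-parallel normal component, and the normal component along
    gravity.
    """
    # Component of each unit normal along the (assumed) gravity direction.
    n_grav = normals @ gravity                                   # (H, W)
    # Magnitude of the remaining, ground-parallel component.
    n_ground = np.linalg.norm(normals - n_grav[..., None] * gravity, axis=-1)
    return np.stack([disparity, n_ground, n_grav], axis=-1)     # (H, W, 3)

disp = np.ones((2, 2))
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0            # a flat, upward-facing surface everywhere
hmn = hmn_encoding(disp, normals)
```

As the abstract describes, such an HMN image would feed a second CNN whose per-class predictions are summed with those of the RGB network before the final argmax.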
7

Sörsäter, Michael. "Active Learning for Road Segmentation using Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152286.

Abstract:
In recent years, the development of Convolutional Neural Networks has enabled high-performing semantic segmentation models. Generally, these deep learning based segmentation methods require a large amount of annotated data. Acquiring such annotated data for semantic segmentation is a tedious and expensive task. Within machine learning, active learning involves the selection of new data in order to limit the amount of annotated data needed. In active learning, the model is trained for several iterations, and additional samples that the model is uncertain of are selected. The model is then retrained on the additional samples and the process is repeated. In this thesis, an active learning framework has been applied to road segmentation, which is semantic segmentation of objects related to road scenes. The uncertainty in the samples is estimated with Monte Carlo dropout. In Monte Carlo dropout, several dropout masks are applied to the model and the variance of the predictions is captured, acting as an estimate of the model's uncertainty. Other metrics to rank the uncertainty evaluated in this work are: a baseline method that selects samples randomly, the entropy of the default predictions, and three additional variations/extensions of Monte Carlo dropout. Both the active learning framework and the uncertainty estimation are implemented in the thesis. Monte Carlo dropout performs slightly better than the baseline on 3 out of 4 metrics. Entropy outperforms all other implemented methods on all metrics. The three additional methods do not perform better than Monte Carlo dropout. An analysis of what kind of uncertainty Monte Carlo dropout captures is performed, together with a comparison of the samples selected by the baseline and by Monte Carlo dropout. Future development and possible improvements are also discussed.
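The Monte Carlo dropout estimate described above (several stochastic forward passes, variance as uncertainty) can be reduced to a few lines. The following numpy toy applies dropout to the input of a fixed linear "model"; it is an illustration of the general scheme, not the thesis' network or code.

```python
import numpy as np

def mc_dropout_uncertainty(forward, x, n_samples=50, p=0.5, seed=0):
    """Run `forward` with a fresh random dropout mask each pass and
    return (mean, variance) of the predictions.

    The variance across passes serves as the uncertainty estimate,
    as in Monte Carlo dropout.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= p            # keep with probability 1 - p
        preds.append(forward(x * mask / (1.0 - p)))  # inverted-dropout scaling
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.var(axis=0)

w = np.array([1.0, -2.0, 0.5])
forward = lambda v: v @ w                          # stand-in for a real network
mean, var = mc_dropout_uncertainty(forward, np.array([1.0, 1.0, 1.0]))
```

In the active-learning loop, unlabeled samples with the highest such variance would be sent for annotation first.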
8

Hu, Xikun. "Multispectral Remote Sensing and Deep Learning for Wildfire Detection." Licentiate thesis, KTH, Geoinformatik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-295655.

Abstract:
Remote sensing data has great potential for wildfire detection and monitoring with enhanced spatial resolution and temporal coverage. Earth Observation satellites have been employed to systematically monitor fire activity over large regions in two ways: (i) to detect the location of actively burning spots (during the fire event), and (ii) to map the spatial extent of the burned scars (during or after the event). Active fire detection plays an important role in wildfire early warning systems. The open access of Sentinel-2 multispectral data at 20-m resolution offers an opportunity to evaluate its complementary role to the coarse indication of hotspots provided by MODIS-like polar-orbiting and GOES-like geostationary systems. In addition, accurate and timely mapping of burned areas is needed for damage assessment. Recent advances in deep learning (DL) provide automatic, accurate, and bias-free large-scale options for burned area mapping using uni-temporal multispectral imagery. Therefore, the objective of this thesis is to evaluate multispectral remote sensing data (in particular Sentinel-2) for wildfire detection, including active fire detection using a multi-criteria approach and burned area detection using DL models. For active fire detection, a multi-criteria approach based on the reflectance of B4, B11, and B12 of Sentinel-2 MSI data is developed for several representative fire-prone biomes to extract unambiguous active fire pixels. The adaptive thresholds for each biome are statistically determined from 11 million Sentinel-2 observation samples acquired over summertime (June 2019 to September 2019) across 14 regions or countries. The primary criterion is derived from the 3-sigma prediction interval of an OLS regression of the observation samples for each biome. More specific criteria based on B11 and B12 are further introduced to reduce the omission errors (OE) and commission errors (CE).
The multi-criteria approach proves to be effective for cool smoldering fire detection in study areas with tropical & subtropical grasslands, savannas & shrublands using the primary criterion. At the same time, additional criteria that threshold the reflectance of B11 and B12 can effectively decrease the CE caused by extremely bright flames around the hot cores in testing sites with Mediterranean forests, woodlands & scrub. Another criterion, based on the reflectance ratio between B12 and B11, also avoids the CE caused by hot soil pixels in sites with tropical & subtropical moist broadleaf forests. Overall, the validation performance over testing patches reveals that CE and OE can be kept at a low level (0.14 and 0.04) as an acceptable trade-off. This multi-criteria algorithm is suitable for rapid active fire detection based on uni-temporal imagery, without the requirement of multi-temporal data. Medium-resolution multispectral data can be used as a complementary choice to coarse-resolution images for its ability to detect small burning areas and to detect active fires more accurately. For burned area mapping, this thesis aims to expound the capability of DL models for automatically mapping burned areas from uni-temporal multispectral imagery. Various burned area detection algorithms have been developed using Sentinel-2 and/or Landsat data, but most of these studies require a pre-fire image, dense time-series data, or an empirical threshold. In this thesis, several semantic segmentation network architectures, i.e., U-Net, HRNet, Fast-SCNN, and DeepLabv3+, are applied to Sentinel-2 imagery and Landsat-8 imagery over three testing sites in two local climate zones. In addition, three popular machine learning (ML) algorithms (LightGBM, KNN, and random forests) and NBR thresholding techniques (empirical and OTSU-based) are used in the same study areas for comparison.
The validation results show that DL algorithms outperform the ML methods in two of the three cases with compact burned scars, while ML methods seem to be more suitable for mapping dispersed scars in boreal forests. Using Sentinel-2 images, U-Net and HRNet exhibit comparatively identical performance with higher kappa (around 0.9) in one heterogeneous Mediterranean fire site in Greece; Fast-SCNN performs better than the others with kappa over 0.79 in one compact boreal forest fire with varying burn severity in Sweden. Furthermore, when directly transferring the trained models to corresponding Landsat-8 data, HRNet dominates among the DL models in the three test sites and preserves high accuracy. The results demonstrate that DL models can make full use of contextual information and capture spatial details at multiple scales from fire-sensitive spectral bands to map burned areas. With uni-temporal images, DL-based methods have the potential to be used on the next Earth observation satellites with onboard data processing and limited storage for previous scenes. In future work, DL models will be explored to detect active fire from multi-resolution remote sensing data. The existing problem of unbalanced labeled data can be addressed via advanced DL architectures, suitable configuration of the training dataset, and improved loss functions. To further explore the damage caused by wildfires, future work will focus on burn severity assessment based on DL models through multi-class semantic segmentation. In addition, the translation between optical and SAR imagery based on a Generative Adversarial Network (GAN) model could be explored to improve burned area mapping under different weather conditions.
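The multi-criteria active-fire test described in the abstract can be sketched as a set of per-pixel band conditions. The code below is a minimal illustration of the structure only: the slope, intercept, sigma, and the 0.1 / 1.2 thresholds are placeholder values, not the biome-specific thresholds derived in the thesis (which also uses B4).

```python
import numpy as np

def fire_mask(b11, b12, slope, intercept, sigma):
    """Flag candidate active-fire pixels from Sentinel-2 SWIR bands.

    Primary criterion: B12 reflectance above the 3-sigma upper
    prediction bound of an OLS fit of B12 on B11. Two extra
    placeholder criteria mimic the CE/OE-reducing thresholds on B12
    and on the B12/B11 ratio mentioned in the abstract.
    """
    upper = slope * b11 + intercept + 3.0 * sigma   # 3-sigma prediction bound
    primary = b12 > upper
    ratio = b12 / np.maximum(b11, 1e-6)             # guard against division by zero
    return primary & (b12 > 0.1) & (ratio > 1.2)

b11 = np.array([0.10, 0.10, 0.40])
b12 = np.array([0.05, 0.50, 0.45])                  # second pixel is an outlier
mask = fire_mask(b11, b12, slope=1.0, intercept=0.0, sigma=0.05)
```

Only the pixel whose B12 value breaks well above the regression bound, and passes the auxiliary thresholds, survives all three conditions.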
9

Phillips, Adon. "Melanoma Diagnostics Using Fully Convolutional Networks on Whole Slide Images." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36929.

Abstract:
Semantic segmentation, as an approach to recognizing and localizing objects within an image, is a major research area in computer vision. Now that convolutional neural networks are increasingly used for such tasks, there have been many improvements in grand challenge results and many new research opportunities in previously untenable areas. Using fully convolutional networks, we have developed a semantic segmentation pipeline for the identification of melanocytic tumor regions, epidermis, and dermis layers in whole slide microscopy images of cutaneous melanoma or cutaneous metastatic melanoma. This pipeline covers the annotation and preparation of a dataset from the output of a tissue slide scanner through to patch-based training and inference by an artificial neural network. We have curated a large dataset of 50 whole slide images containing cutaneous melanoma or cutaneous metastatic melanoma, fully annotated at 40× objective resolution by an expert pathologist. We will publish the source images of this dataset online. We also present two new FCN architectures that fuse multiple deconvolutional strides, combining coarse and fine predictions to improve accuracy over similar networks without multi-stride information. Our results show that the system performs better than our comparators. We include inference results on thousands of patches from four whole slide images, reassembling them into whole slide segmentation masks to demonstrate how our system generalizes to novel cases.
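Whole slide images are far too large to process at once, hence the patch-based inference and reassembly the abstract describes. Below is a minimal numpy sketch of that tiling/stitching step, with a toy per-patch "predictor" standing in for the FCN; it assumes non-overlapping tiles and dimensions divisible by the patch size, unlike a production pipeline.

```python
import numpy as np

def predict_whole_slide(image, patch, predict):
    """Tile an (H, W) image into non-overlapping patch x patch tiles,
    run `predict` on each tile, and stitch the per-tile masks back
    into a whole-slide segmentation mask.

    Assumes H and W are divisible by `patch`; no overlap or blending.
    """
    h, w = image.shape
    mask = np.zeros((h, w), dtype=int)
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tile = image[r:r + patch, c:c + patch]
            mask[r:r + patch, c:c + patch] = predict(tile)
    return mask

img = np.arange(16.0).reshape(4, 4)
# Toy "network": label a whole tile 1 if its mean intensity exceeds 7.5.
predict = lambda p: int(p.mean() > 7.5)
full_mask = predict_whole_slide(img, 2, predict)
```

Real pipelines typically predict a per-pixel mask per tile and blend overlapping tiles to suppress boundary artifacts, but the reassembly logic is the same.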
10

Radhakrishnan, Aswathnarayan. "A Study on Applying Learning Techniques to Remote Sensing Data." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586901481703797.

11

Djikic, Addi. "Segmentation and Depth Estimation of Urban Road Using Monocular Camera and Convolutional Neural Networks." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235496.

Abstract:
Deep learning for safe autonomous transport is rapidly emerging. Fast and robust perception for autonomous vehicles will be crucial for future navigation in urban areas with heavy traffic and human interplay. Previous work focuses on extracting full-image depth maps or on finding specific road features such as lanes. However, in urban environments lanes are not always present, and sensors such as LiDAR with 3D point clouds provide a rather sparse depth perception of the road, with demanding algorithmic approaches. In this thesis we derive a novel convolutional neural network that we call AutoNet. It is designed as an encoder-decoder network for pixel-wise depth estimation of urban drivable free-space, using only a monocular camera, and handled as a supervised regression problem. AutoNet is also constructed as a classification network to solely classify and segment the drivable free-space in real time with monocular vision, handled as a supervised classification problem, which proves to be a simpler and more robust solution than the regression approach. We also implement the state-of-the-art network ENet for comparison, which is designed for fast real-time semantic segmentation and fast inference. The evaluation shows that AutoNet outperforms ENet on every performance metric, but is slower in terms of frame rate. However, optimization techniques are proposed for future work on how to increase the frame rate of the network while still maintaining robustness and performance. All training and evaluation is done on the Cityscapes dataset. New ground truth labels for road depth perception are created for training, with a novel approach of fusing pre-computed depth maps with semantic labels. Data collection with a Scania vehicle, mounted with a monocular camera, is conducted to test the final derived models. The proposed AutoNet shows promising state-of-the-art performance in regard to road depth estimation as well as road classification.
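The ground-truth fusion step mentioned above (combining pre-computed depth maps with semantic labels to obtain road-only depth targets) amounts to masking the depth map by the road class. The sketch below is our reading of that step, not the thesis' exact procedure; 7 is the Cityscapes label id for "road", and `ignore_val` is an assumed sentinel for pixels excluded from the regression loss.

```python
import numpy as np

def road_depth_ground_truth(depth, semantic, road_id=7, ignore_val=0.0):
    """Fuse a pre-computed depth map with a semantic label map into a
    road-only depth target: keep depth where the semantic label is
    road, mark everything else with `ignore_val`.
    """
    return np.where(semantic == road_id, depth, ignore_val)

depth = np.array([[5.0, 9.0], [12.0, 30.0]])
semantic = np.array([[7, 7], [7, 26]])        # 26 = Cityscapes label id for "car"
gt = road_depth_ground_truth(depth, semantic)
```

During training, the regression loss would then be computed only over pixels whose target differs from `ignore_val`.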
12

Sokol, Norbert. "Segmentace biologických vzorků v obrazech z kryo-elektronového mikroskopu s využitím metod strojového učení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442577.

Full text
Abstract:
Imaging with cryo-electron microscopy has an irreplaceable place in the analysis of many biological structures. Localizing cells cultivated on a grid, and segmenting them from the background or from contamination, is a fundamental step. Together with the development of several deep learning methods, the success rate of semantic segmentation tasks has increased substantially. In this thesis, we develop a deep convolutional neural network for the task of semantic segmentation of cells cultivated on a grid. The dataset for this work was created using a dual-beam cryo-electron microscope developed by Thermo Fisher Scientific Brno.
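A standard way to score such cell-versus-background segmentations is the Dice coefficient. The thesis does not state its exact evaluation metric, so this is only an illustrative numpy sketch:

```python
import numpy as np

def dice(pred, truth, eps=1e-7):
    """Dice coefficient between two binary masks (1 = cell, 0 = background)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return float(2.0 * inter / (pred.sum() + truth.sum() + eps))

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
print(round(dice(pred, truth), 3))  # 0.667
```

Dice weights the overlap against the total foreground of both masks, which makes it less forgiving than plain pixel accuracy when the cells occupy only a small fraction of the grid image.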
APA, Harvard, Vancouver, ISO, and other styles
13

Ciano, Giorgio. "Multi-stage generation for segmentation of medical images." Doctoral thesis, 2022. http://hdl.handle.net/2158/1274075.

Full text
Abstract:
Recently, deep learning methods have had a tremendous impact on computer vision applications. The results obtained were unimaginable a few years ago. The problems of greatest interest are image classification, semantic segmentation, object detection, face recognition, and so on. All these tasks have in common the necessity of a sufficient quantity of data to train the model in a suitable manner. In fact, deep neural networks have a very high number of parameters, which requires a fairly large dataset of supervised examples for their training. This problem is particularly important in the medical field, especially when the goal is the semantic segmentation of images, both due to privacy issues and the high cost of image tagging by medical experts. The main objective of this thesis is to study new methods for generating synthetic images along with their label–maps for segmentation purposes. The generated images can be used to augment real datasets. In the thesis, in order to achieve this goal, new fully data–driven methods based on Generative Adversarial Networks are proposed. The main characteristic of these methods is that, unlike other approaches described in the literature, they are multi–stage, namely composed of several steps. Indeed, by splitting the generation procedure into steps, the task is simplified and the employed networks require a smaller number of examples for learning. In particular, a first proposed method consists of a two–stage image generation procedure, where the semantic label–maps are produced first, and then the image is generated from the label–maps. This approach has been used to generate retinal images along with the corresponding vessel segmentation label–maps. With this method, learning the generator requires only a handful of samples. The method generates realistic high–resolution retinal images.
Moreover, the generated images can be used to augment the training set of a segmentation algorithm. In this way, we achieved results that outperform the state of the art for the task of segmentation of retinal vessels. In the second part of the thesis, a three–stage approach is presented: the initial step consists of generating dots whose positions indicate the locations of the semantic objects represented in the image; then, in the second step, the dots are translated into semantic label–maps, which are, finally, transformed into the image. The method was evaluated on the segmentation of chest radiographic images. The experimental results are promising from both a qualitative and a quantitative point of view.
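The two-stage idea — produce a label–map first, then condition image synthesis on it — can be sketched as a simple composition of stages. The stub functions below stand in for the trained GANs; all names and shapes are illustrative, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_generate_label_map(latent, shape=(64, 64)):
    """Stub for the first GAN stage: returns a sparse binary 'vessel'
    label-map (the latent input is unused in this toy stub)."""
    noise = rng.random(shape)
    return (noise < 0.1).astype(np.uint8)

def stage2_generate_image(label_map):
    """Stub for the second GAN stage: renders an image conditioned on the
    label-map, with 'vessel' pixels drawn brighter than the background."""
    background = rng.random(label_map.shape) * 0.3
    return background + 0.7 * label_map

latent = rng.standard_normal(128)
label_map = stage1_generate_label_map(latent)
image = stage2_generate_image(label_map)
# The synthetic (image, label_map) pair can now augment a real training set.
```

The key property the composition illustrates is that every synthetic image comes with a pixel-perfect label–map for free, which is exactly what makes the approach useful for augmenting segmentation datasets.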
APA, Harvard, Vancouver, ISO, and other styles
14

Martins, Gil Lusquiños. "Image segmentation algorithms based on deep learning for drone aerial imagery from the Portuguese coastal zone." Master's thesis, 2021. http://hdl.handle.net/10773/33614.

Full text
Abstract:
As human-induced pressures continue to rise in the coastal zone, there is an increasing need to resourcefully predict, detect and monitor environmental patterns to support large-scale conservation strategies. The Portuguese coastal zone is home to profuse biological communities, including mussels, which are a key ecological species for the biodiversity of seashore ecosystems, supporting and shielding a vast number of invertebrate species. Additionally, improvements in unmanned aerial vehicles and high-resolution aerial photography have made it possible to produce large temporal and spatial datasets while reducing both biological and physical disturbances to the ecosystems. On this basis, a low-altitude, high-resolution aerial image set was captured by a research team from the Biology Department of the University of Aveiro to measure the coverage, size and density of mussels along the Portuguese shoreline. With this newly gathered dataset, a group from the Department of Electronics, Telecommunications and Informatics, from the same institution, took the initiative to create computer vision algorithms through deep learning in order to assist the analysis of the collected data and verify the viability of the data-gathering methods. This work presents all the procedures executed to answer the proposed challenge, from the development of a functional pixel-wise image segmentation dataset, to the development of predictive models using renowned architectures from the deep learning community, capable of achieving good results that enable an understanding of the dynamics of the ecosystem and prediction of mussel abundance under distinct environmental scenarios. Furthermore, the solution has the potential to grow and be improved further.
By exploring a new dataset that may open new doors for the understanding and classification of coastal zones, with models that could potentially be re-trained in the future for different kinds of shores and intertidal zones with more and other animal communities, this work also demonstrates the viability of using deep learning models to analyze image data acquired from drones, and aims to enable further research on the subject and on different types of areas and vegetation.
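Once a pixel-wise mussel mask has been predicted, the coverage measurement the biologists are after reduces to a pixel fraction. This is a hedged sketch of that final step; the thesis's actual measurement pipeline (size and density included) is more involved:

```python
import numpy as np

def coverage_percent(mask):
    """Percentage of image area covered by the segmented class."""
    return 100.0 * float(mask.astype(bool).mean())

mask = np.zeros((100, 100), dtype=np.uint8)
mask[:25, :] = 1  # top quarter of the frame predicted as mussel bed
print(coverage_percent(mask))  # 25.0
```

Because each drone image has a known ground footprint, a per-image coverage percentage like this converts directly into covered area along the surveyed shoreline.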
Master's in Informatics Engineering (Mestrado em Engenharia Informática)
APA, Harvard, Vancouver, ISO, and other styles
15

Ferreira, Joana Carlos Mesquita. "Extraction of Heart Rate from Multimodal Video Streams of Neonates using Methods of Machine Learning." Master's thesis, 2019. http://hdl.handle.net/10362/103155.

Full text
Abstract:
The World Health Organization estimates that more than one-tenth of births are premature. Premature births are linked to an increased mortality risk compared with full-term infants. In fact, preterm birth complications are the leading cause of perinatal mortality. These complications range from respiratory distress to cardiovascular disorders. Vital sign changes often precede these major complications, so it is crucial to perform continuous monitoring of these signals. Heart rate monitoring is particularly important. Nowadays, the standard method to monitor this vital sign requires adhesive electrodes or sensors attached to the infant. These contact-based methods can damage the skin of the infant, possibly leading to infections. Within this context, there is a need to move to remote heart rate monitoring methods. This thesis introduces a new method for region-of-interest selection to improve remote heart rate monitoring in neonatology through Photoplethysmography Imaging. The heart rate assessment is based on the standard photoplethysmography principle, which makes use of the subtle fluctuations of visible or infrared light reflected from the skin surface over the cardiac cycle. A camera is used instead of the contact-based sensors. Specifically, this thesis presents an alternative to manual region-of-interest selection using Machine Learning methods, aiming to improve the robustness of Photoplethysmography Imaging. This method comprises a highly efficient Fully Convolutional Neural Network to select six different body regions within each video frame. The developed neural network was built upon a ResNet network and a custom upsampling network. Additionally, a new post-processing method was developed to refine the body segmentation results, using a sequence of morphological operations and centre-of-mass analysis.
The developed region-of-interest selection method was validated with clinical data, demonstrating good agreement (78%) between the estimated heart rate and the reference.
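Once a skin region of interest is selected, a basic photoplethysmographic heart-rate estimate boils down to finding the dominant frequency of the mean pixel-intensity trace within the plausible cardiac band. This is a minimal sketch with a synthetic 2 Hz (120 bpm) pulse; the thesis's actual processing chain is assumed, not reproduced:

```python
import numpy as np

def estimate_heart_rate(signal, fs, lo=0.7, hi=4.0):
    """Dominant frequency (in bpm) of a PPG trace within the cardiac band.

    signal: 1-D mean-intensity trace from the ROI; fs: sampling rate in Hz.
    lo, hi: plausible heart-rate band in Hz (42-240 bpm).
    """
    signal = signal - signal.mean()          # remove the DC component
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo) & (freqs <= hi)     # restrict to cardiac frequencies
    return 60.0 * float(freqs[band][np.argmax(power[band])])

fs = 30.0                       # e.g. a 30 fps camera
t = np.arange(0, 10, 1.0 / fs)  # 10 s of video
trace = 0.01 * np.sin(2 * np.pi * 2.0 * t) + 0.5  # faint 2 Hz pulse component
print(estimate_heart_rate(trace, fs))  # 120.0
```

Restricting the search to the cardiac band is what keeps slow illumination drift and camera flicker from being picked as the peak, which is also why a robust ROI (skin only) matters so much for the signal-to-noise ratio.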
APA, Harvard, Vancouver, ISO, and other styles
16

(6738881), Soonam Lee. "Segmentation and Deconvolution of Fluorescence Microscopy Volumes." Thesis, 2019.

Find full text
Abstract:
Recent advances in optical microscopy have enabled biologists to collect fluorescence microscopy volumes of cellular and subcellular structures of living tissue. This results in large datasets of microscopy volumes and calls for automated quantification methods aided by image processing. To quantify biological structures, a first and fundamental step is segmentation. Yet, quantitative analysis of microscopy volumes is hampered by light diffraction, distortion created by lens aberrations in different directions, and the complex variation of biological structures. This thesis describes several proposed segmentation methods to identify various biological structures, such as nuclei or tubules, observed in fluorescence microscopy volumes. For nuclei segmentation, a multiscale edge detection method and a 3D active contours with inhomogeneity correction method are used. Our proposed 3D active contours with inhomogeneity correction method utilizes 3D microscopy volume information while addressing intensity inhomogeneity across vertical and horizontal directions. For tubule segmentation, an ellipse model fitting to tubule boundary method and a convolutional neural networks with inhomogeneity correction method are employed. More specifically, the ellipse fitting method utilizes a combination of adaptive and global thresholding, potentials, z-direction refinement, branch pruning, end point matching, and boundary fitting steps to delineate tubular objects. The deep learning based method combines intensity inhomogeneity correction and data augmentation, followed by a convolutional neural network architecture. Moreover, this thesis demonstrates a new deconvolution method to improve microscopy image quality without knowing the 3D point spread function, using a spatially constrained cycle-consistent adversarial network. The results of the proposed methods are visually and numerically compared with other methods.
Experimental results demonstrate that our proposed methods achieve better performance than other methods for nuclei/tubule segmentation as well as deconvolution.
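Morphology-based cleanup of a raw segmentation — suppressing speckle and tiny spurious detections before quantification — can be sketched with scipy.ndimage. This is a generic illustration of such post-processing, not the thesis's exact sequence of operations:

```python
import numpy as np
from scipy import ndimage

def refine_mask(mask, min_size=5):
    """Remove speckle by binary opening, then drop tiny connected components."""
    opened = ndimage.binary_opening(mask.astype(bool))
    labels, n = ndimage.label(opened)
    sizes = ndimage.sum(opened, labels, index=range(1, n + 1))
    keep = np.zeros_like(opened)
    for i, size in enumerate(sizes, start=1):
        if size >= min_size:
            keep |= labels == i
    return keep

mask = np.zeros((8, 8), dtype=bool)
mask[1:5, 1:5] = True   # a solid 4x4 'nucleus'
mask[7, 7] = True       # a single-pixel false detection
clean = refine_mask(mask, min_size=5)
```

The opening removes isolated pixels and thin protrusions, while the size filter discards any surviving component too small to be a plausible nucleus; both thresholds would be tuned to the voxel resolution of the actual volumes.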
APA, Harvard, Vancouver, ISO, and other styles
17

(9187466), Bharath Kumar Comandur Jagannathan Raghunathan. "Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels." Thesis, 2020.

Find full text
Abstract:
This dissertation addresses the problem of how to design a convolutional neural network (CNN) for giving semantic labels to the points on the ground given the satellite image coverage over the area and, for the ground truth, given the noisy labels in OpenStreetMap (OSM). This problem is made challenging by the fact that -- (1) Most of the images are likely to have been recorded from off-nadir viewpoints for the area of interest on the ground; (2) The user-supplied labels in OSM are frequently inaccurate and, not uncommonly, entirely missing; and (3) The size of the area covered on the ground must be large enough to possess any engineering utility. As this dissertation demonstrates, solving this problem requires that we first construct a DSM (Digital Surface Model) from a stereo fusion of the available images, and subsequently use the DSM to map the individual pixels in the satellite images to points on the ground. That creates an association between the pixels in the images and the noisy labels in OSM. The CNN-based solution we present yields a 4-8% improvement in the per-class segmentation IoU (Intersection over Union) scores compared to the traditional approaches that use the views independently of one another. The system we present is end-to-end automated, which facilitates comparing the classifiers trained directly on true orthophotos vis-à-vis first training them on the off-nadir images and subsequently translating the predicted labels to geographical coordinates. This work also presents, for arguably the first time, an in-depth discussion of large-area image alignment and DSM construction using tens of true multi-date and multi-view WorldView-3 satellite images on a distributed OpenStack cloud computing platform.
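The per-class IoU scores on which the reported 4-8% improvement is measured compare a predicted label-map with a reference map class by class. A minimal numpy sketch of the metric (the class IDs and map sizes are illustrative):

```python
import numpy as np

def per_class_iou(pred, truth, classes):
    """Intersection-over-Union per semantic class (e.g. road, building)."""
    ious = {}
    for c in classes:
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious[c] = float(inter / union) if union else float("nan")
    return ious

pred = np.array([[0, 0], [1, 2]])
truth = np.array([[0, 1], [1, 2]])
print(per_class_iou(pred, truth, classes=[0, 1, 2]))
```

Scoring each class separately keeps a dominant class (often background) from masking errors on rarer classes, which is why per-class IoU rather than overall pixel accuracy is the standard report for this kind of large-area semantic labeling.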
APA, Harvard, Vancouver, ISO, and other styles
