Academic literature on the topic 'Synthetic datasets'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Synthetic datasets.'

Journal articles on the topic "Synthetic datasets"

1

Hanel, A., D. Kreuzpaintner, and U. Stilla. "EVALUATION OF A TRAFFIC SIGN DETECTOR BY SYNTHETIC IMAGE DATA FOR ADVANCED DRIVER ASSISTANCE SYSTEMS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2 (May 30, 2018): 425–32. http://dx.doi.org/10.5194/isprs-archives-xlii-2-425-2018.

Abstract:
Recently, several synthetic image datasets of street scenes have been published. These datasets contain various traffic signs and can therefore be used to train and test machine learning-based traffic sign detectors. In this contribution, selected datasets are compared regarding their applicability for traffic sign detection. The comparison covers the process to produce the synthetic images and addresses the virtual worlds, needed to produce the synthetic images, and their environmental conditions. The comparison covers variations in the appearance of traffic signs and the labeling strategies used for the datasets, as well. A deep learning traffic sign detector is trained with multiple training datasets with different ratios between synthetic and real training samples to evaluate the synthetic SYNTHIA dataset. A test of the detector on real samples only has shown that an overall accuracy and ROC AUC of more than 95 % can be achieved for both a small rate of synthetic samples and a large rate of synthetic samples in the training dataset.
2

Arvanitis, Theodoros N., Sean White, Stuart Harrison, Rupert Chaplin, and George Despotou. "A method for machine learning generation of realistic synthetic datasets for validating healthcare applications." Health Informatics Journal 28, no. 2 (January 2022): 146045822210770. http://dx.doi.org/10.1177/14604582221077000.

Abstract:
Digital health applications can improve quality and effectiveness of healthcare, by offering a number of new tools to users, which are often considered a medical device. Assuring their safe operation requires, amongst others, clinical validation, needing large datasets to test them in realistic clinical scenarios. Access to datasets is challenging, due to patient privacy concerns. Development of synthetic datasets is seen as a potential alternative. The objective of the paper is the development of a method for the generation of realistic synthetic datasets, statistically equivalent to real clinical datasets, and demonstrate that the Generative Adversarial Network (GAN) based approach is fit for purpose. A generative adversarial network was implemented and trained, in a series of six experiments, using numerical and categorical variables, including ICD-9 and laboratory codes, from three clinically relevant datasets. A number of contextual steps provided the success criteria for the synthetic dataset. A synthetic dataset that exhibits very similar statistical characteristics with the real dataset was generated. Pairwise association of variables is very similar. A high degree of Jaccard similarity and a successful K-S test further support this. The proof of concept of generating realistic synthetic datasets was successful, with the approach showing promise for further work.
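Note: the abstract above names Jaccard similarity and a K-S test as checks on the synthetic data. The following is a minimal Python sketch of how two such checks could be computed. It is an illustration only, not the authors' implementation; the function names and the idea of applying Jaccard similarity to sets of categorical codes are assumptions made for this example.

```python
from bisect import bisect_right


def jaccard_similarity(codes_a, codes_b):
    """Jaccard similarity |A & B| / |A | B| of two collections of codes."""
    set_a, set_b = set(codes_a), set(codes_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def ks_statistic(sample_x, sample_y):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(sample_x), sorted(sample_y)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The supremum of |F_x - F_y| is attained at one of the observed values.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        largest_gap = max(largest_gap, abs(cdf_x - cdf_y))
    return largest_gap
```

A statistic near 0 suggests similar distributions and a value near 1 suggests disjoint ones. Turning the statistic into a p-value needs the K-S distribution, which a library routine such as scipy.stats.ks_2samp provides.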
3

Kannan, Subarmaniam. "Synthetic time series data generation for edge analytics." F1000Research 11 (January 20, 2022): 67. http://dx.doi.org/10.12688/f1000research.72984.1.

Abstract:
Background: Internet of Things (IoT) edge analytics enables data computation and storage to be available adjacent to the source of data generation at the IoT system. This method improves sensor data handling and speeds up analysis, prediction, and action. Using machine learning for analytics and task offloading in edge servers could minimise latency and energy usage. However, one of the key challenges in using machine learning in edge analytics is to find a real-world dataset to implement a more representative predictive model. This challenge has undeniably slowed down the adoption of machine learning methods in IoT edge analytics. Thus, the generation of realistic synthetic datasets can leverage the need to speed up methodological use of machine learning in edge analytics. Methods: We create synthetic data with features that are like data from IoT devices. We use an existing air quality dataset that includes temperature and gas sensor measurements. This real-time dataset includes component values for the Air Quality Index (AQI) and ppm concentrations for various polluting gases. We build a JavaScript Object Notation (JSON) model to capture the distribution of variables and the structure of this real dataset to generate the synthetic data. Based on the synthetic dataset and original dataset, we create a comparative predictive model. Results: Analysis of synthetic dataset predictive model shows that it can be successfully used for edge analytics purposes, replacing real-world datasets. There is no significant difference between the real-world dataset and the synthetic dataset. The generated synthetic data requires no modification to suit the edge computing requirements. Conclusions: The framework can generate representative synthetic datasets based on JSON schema attributes. The accuracy, precision, and recall values for the real and synthetic datasets indicate that the logistic regression model is capable of successfully classifying data.
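Note: the abstract above describes a JSON model that captures variable distributions and structure and drives the generation of synthetic records. A minimal Python sketch of that general idea follows. The JSON layout, the field names, the distribution types, and the parameter values are all hypothetical choices for this example; the paper's actual schema is not given in the abstract.

```python
import json
import random

# Hypothetical JSON model: one entry per variable, each with a distribution.
SPEC_JSON = """
{
  "temperature_c": {"type": "normal", "mean": 25.0, "std": 3.0},
  "co_ppm": {"type": "uniform", "low": 0.0, "high": 10.0},
  "aqi_band": {"type": "choice", "values": ["good", "moderate", "poor"]}
}
"""


def generate_rows(spec, n_rows, seed=0):
    """Draw n_rows synthetic records, one value per field in the spec."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, field in spec.items():
            kind = field["type"]
            if kind == "normal":
                row[name] = rng.gauss(field["mean"], field["std"])
            elif kind == "uniform":
                row[name] = rng.uniform(field["low"], field["high"])
            elif kind == "choice":
                row[name] = rng.choice(field["values"])
            else:
                raise ValueError(f"unknown field type: {kind}")
        rows.append(row)
    return rows


spec = json.loads(SPEC_JSON)
rows = generate_rows(spec, 5)
```

This sketch samples each field independently, so it does not reproduce correlations between variables; a model meant to mirror a real dataset would need to capture those as well.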
4

Poudevigne-Durance, Thomas, Owen Dafydd Jones, and Yipeng Qin. "MaWGAN: A Generative Adversarial Network to Create Synthetic Data from Datasets with Missing Data." Electronics 11, no. 6 (March 8, 2022): 837. http://dx.doi.org/10.3390/electronics11060837.

Abstract:
The creation of synthetic data is important for a range of applications, for example, to anonymise sensitive datasets or to increase the volume of data in a dataset. When the target dataset has missing data, then it is common to just discard incomplete observations, even though this necessarily means some loss of information. However, when the proportion of missing data is large, discarding incomplete observations may not leave enough data to accurately estimate their joint distribution. Thus, there is a need for data synthesis methods capable of using datasets with missing data, to improve accuracy and, in more extreme cases, to make data synthesis possible. To achieve this, we propose a novel generative adversarial network (GAN) called MaWGAN (for masked Wasserstein GAN), which creates synthetic data directly from datasets with missing values. As with existing GAN approaches, the MaWGAN synthetic data generator generates samples from the full joint distribution. We introduce a novel methodology for comparing the generator output with the original data that does not require us to discard incomplete observations, based on a modification of the Wasserstein distance and easily implemented using masks generated from the pattern of missing data in the original dataset. Numerical experiments are used to demonstrate the superior performance of MaWGAN compared to (a) discarding incomplete observations before using a GAN, and (b) imputing missing values (using the GAIN algorithm) before using a GAN.
5

So, Banghee, Jean-Philippe Boucher, and Emiliano A. Valdez. "Synthetic Dataset Generation of Driver Telematics." Risks 9, no. 4 (March 24, 2021): 58. http://dx.doi.org/10.3390/risks9040058.

Abstract:
This article describes the techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset generated has 100,000 policies that included observations regarding driver’s claims experience, together with associated classical risk variables and telematics-related variables. This work is aimed to produce a resource that can be used to advance models to assess risks for usage-based insurance. It follows a three-stage process while using machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated applying an extended SMOTE algorithm. The second stage is simulating values for the number of claims as multiple binary classifications applying feedforward neural networks. The third stage is simulating values for aggregated amount of claims as regression using feedforward neural networks, with number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualization and data summarization produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work to be valuable.
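Note: the first stage described above applies an extended SMOTE algorithm. The sketch below shows only the basic SMOTE interpolation step (a new point placed at a random position on the segment between a sample and one of its nearest neighbours). It is not the authors' extended variant, and the function name, the default k, and the seeding are assumptions made for this example.

```python
import math
import random


def smote_interpolate(points, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    if len(points) < 2:
        raise ValueError("need at least two points to interpolate")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # Rank all other points by Euclidean distance to the chosen base point.
        others = [p for p in points if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because every synthetic point lies on a segment between two existing points, the output stays inside the convex hull of the input, which is why plain interpolation of this kind suits numeric features and needs extra handling for categorical ones.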
6

Wu, Hao, Yue Ning, Prithwish Chakraborty, Jilles Vreeken, Nikolaj Tatti, and Naren Ramakrishnan. "Generating Realistic Synthetic Population Datasets." ACM Transactions on Knowledge Discovery from Data 12, no. 4 (July 13, 2018): 1–22. http://dx.doi.org/10.1145/3182383.

7

Minhas, Saad, Zeba Khanam, Shoaib Ehsan, Klaus McDonald-Maier, and Aura Hernández-Sabaté. "Weather Classification by Utilizing Synthetic Data." Sensors 22, no. 9 (April 21, 2022): 3193. http://dx.doi.org/10.3390/s22093193.

Abstract:
Weather prediction from real-world images can be termed a complex task when targeting classification using neural networks. Moreover, the number of images throughout the available datasets can contain a huge amount of variance when comparing locations with the weather those images are representing. In this article, the capabilities of a custom built driver simulator are explored specifically to simulate a wide range of weather conditions. Moreover, the performance of a new synthetic dataset generated by the above simulator is also assessed. The results indicate that the use of synthetic datasets in conjunction with real-world datasets can increase the training efficiency of the CNNs by as much as 74%. The article paves a way forward to tackle the persistent problem of bias in vision-based datasets.
8

Zhang, Jie, Xinyan Qin, Jin Lei, Bo Jia, Bo Li, Zhaojun Li, Huidong Li, Yujie Zeng, and Jie Song. "A Novel Auto-Synthesis Dataset Approach for Fitting Recognition Using Prior Series Data." Sensors 22, no. 12 (June 9, 2022): 4364. http://dx.doi.org/10.3390/s22124364.

Abstract:
To address power transmission line (PTL) traversing complex environments leading to data collection being difficult and costly, we propose a novel auto-synthesis dataset approach for fitting recognition using prior series data. The approach mainly includes three steps: (1) formulates synthesis rules by the prior series data; (2) renders 2D images based on the synthesis rules utilizing advanced virtual 3D techniques; (3) generates the synthetic dataset with images and annotations obtained by processing images using the OpenCV. The trained model using the synthetic dataset was tested by the real dataset (including images and annotations) with a mean average precision (mAP) of 0.98, verifying the feasibility and effectiveness of the proposed approach. The recognition accuracy by the test is comparable with training by real samples and the cost is greatly reduced to generate synthetic datasets. The proposed approach improves the efficiency of establishing a dataset, providing a training data basis for deep learning (DL) of fitting recognition.
9

Kugurakova, Vlada Vladimirovna, Vitaly Denisovich Abramov, Daniil Ivanovich Kostiuk, Regina Airatovna Sharaeva, Rim Radikovich Gazizova, and Murad Rustemovich Khafizov. "Generation of Three-Dimensional Synthetic Datasets." Russian Digital Libraries Journal 24, no. 4 (September 12, 2021): 622–52. http://dx.doi.org/10.26907/1562-5419-2021-24-4-622-652.

Abstract:
The work is devoted to the description of the process of developing a universal toolkit for generating synthetic data for training various neural networks. The approach used has shown its success and effectiveness in solving various problems, in particular, training a neural network to recognize shopping behavior inside stores through surveillance cameras and training a neural network for recognizing spaces with augmented reality devices without using auxiliary infrared cameras. Generalizing conclusions allow planning the further development of technologies for generating three-dimensional synthetic data.
10

Ma’sum, Muhammad Anwar. "Intelligent Clustering and Dynamic Incremental Learning to Generate Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification." Symmetry 12, no. 4 (April 24, 2020): 679. http://dx.doi.org/10.3390/sym12040679.

Abstract:
Classification in multi-modal data is one of the challenges in the machine learning field. The multi-modal data need special treatment as its features are distributed in several areas. This study proposes multi-codebook fuzzy neural networks by using intelligent clustering and dynamic incremental learning for multi-modal data classification. In this study, we utilized intelligent K-means clustering based on anomalous patterns and intelligent K-means clustering based on histogram information. In this study, clustering is used to generate codebook candidates before the training process, while incremental learning is utilized when the condition to generate a new codebook is sufficient. The condition to generate a new codebook in incremental learning is based on the similarity of the winner class and other classes. The proposed method was evaluated in synthetic and benchmark datasets. The experiment results showed that the proposed multi-codebook fuzzy neural networks that use dynamic incremental learning have significant improvements compared to the original fuzzy neural networks. The improvements were 15.65%, 5.31% and 11.42% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively, for incremental version 1. The incremental learning version 2 improved by 21.08%, 4.63%, and 14.35% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively. The multi-codebook fuzzy neural networks that use intelligent clustering also had significant improvements compared to the original fuzzy neural networks, achieving 23.90%, 2.10%, and 15.02% improvements on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively.

Dissertations / Theses on the topic "Synthetic datasets"

1

D'Agostino, Alessandro. "Automatic generation of synthetic datasets for digital pathology image analysis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21722/.

Abstract:
The project is inspired by an actual problem of timing and accessibility in the analysis of histological samples in the health-care system. In this project, I address the problem of synthetic histological image generation for the purpose of training Neural Networks for the segmentation of real histological images. The collection of real histological human-labeled samples is a very time consuming and expensive process and often is not representative of healthy samples, for the intrinsic nature of the medical analysis. The method I propose is based on the replication of the traditional specimen preparation technique in a virtual environment. The first step is the creation of a 3D virtual model of a region of the target human tissue. The model should represent all the key features of the tissue, and the richer it is the better will be the yielded result. The second step is to perform a sampling of the model through a virtual tomography process, which produces a first completely labeled image of the section. This image is then processed with different tools to achieve a histological-like aspect. The most significant aesthetical post-processing is given by the action of a style transfer neural network that transfers the typical histological visual texture on the synthetic image. This procedure is presented in detail for two specific models: one of pancreatic tissue and one of dermal tissue. The two resulting images compose a pair of images suitable for a supervised learning technique. The generation process is completely automatized and does not require the intervention of any human operator, hence it can be used to produce arbitrary large datasets. The synthetic images are inevitably less complex than the real samples and they offer an easier segmentation task to solve for the NN. However, the synthetic images are very abundant, and the training of a NN can take advantage of this feature, following the so-called curriculum learning strategy.
3

Hummel, Georg [Verfasser], Peter [Akademischer Betreuer] [Gutachter] Stütz, and Paolo [Gutachter] Remagnino. "On synthetic datasets for development of computer vision algorithms in airborne reconnaissance applications / Georg Hummel ; Gutachter: Peter Stütz, Paolo Remagnino ; Akademischer Betreuer: Peter Stütz ; Universität der Bundeswehr München, Fakultät für Luft- und Raumfahrttechnik." Neubiberg : Universitätsbibliothek der Universität der Bundeswehr München, 2017. http://d-nb.info/1147386331/34.

4

Zhao, Amy (Xiaoyu Amy). "Learning distributions of transformations from small datasets for applied image synthesis." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/128342.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2020
Cataloged from PDF of thesis. "February 2020."
Includes bibliographical references (pages 75-91).
Much of the recent research in machine learning and computer vision focuses on applications with large labeled datasets. However, in realistic settings, it is much more common to work with limited data. In this thesis, we investigate two applications of image synthesis using small datasets. First, we demonstrate how to use image synthesis to perform data augmentation, enabling the use of supervised learning methods with limited labeled data. Data augmentation -- typically the application of simple, hand-designed transformations such as rotation and scaling -- is often used to expand small datasets. We present a method for learning complex data augmentation transformations, producing examples that are more diverse, realistic, and useful for training supervised systems than hand-engineered augmentation. We demonstrate our proposed augmentation method for improving few-shot object classification performance, using a new dataset of collectible cards with fine-grained differences. We also apply our method to medical image segmentation, enabling the training of a supervised segmentation system using just a single labeled example. In our second application, we present a novel image synthesis task: synthesizing time lapse videos of the creation of digital and watercolor paintings. Using a recurrent model of paint strokes and a novel training scheme, we create videos that tell a plausible visual story of the painting process.
5

He, Wenbin. "Exploration and Analysis of Ensemble Datasets with Statistical and Deep Learning Models." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574695259847734.

6

Bartocci, John Timothy. "Generating a synthetic dataset for kidney transplantation using generative adversarial networks and categorical logit encoding." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1617104572023027.

7

Choudhury, Ananya. "WiSDM: a platform for crowd-sourced data acquisition, analytics, and synthetic data generation." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/72256.

Abstract:
Human behavior is a key factor influencing the spread of infectious diseases. Individuals adapt their daily routine and typical behavior during the course of an epidemic -- the adaptation is based on their perception of risk of contracting the disease and its impact. As a result, it is desirable to collect behavioral data before and during a disease outbreak. Such data can help in creating better computer models that can, in turn, be used by epidemiologists and policy makers to better plan and respond to infectious disease outbreaks. However, traditional data collection methods are not well suited to support the task of acquiring human behavior related information; especially as it pertains to epidemic planning and response. Internet-based methods are an attractive complementary mechanism for collecting behavioral information. Systems such as Amazon Mechanical Turk (MTurk) and online survey tools provide simple ways to collect such information. This thesis explores new methods for information acquisition, especially behavioral information that leverage this recent technology. Here, we present the design and implementation of a crowd-sourced surveillance data acquisition system -- WiSDM. WiSDM is a web-based application and can be used by anyone with access to the Internet and a browser. Furthermore, it is designed to leverage online survey tools and MTurk; WiSDM can be embedded within MTurk in an iFrame. WiSDM has a number of novel features, including, (i) ability to support a model-based abductive reasoning loop: a flexible and adaptive information acquisition scheme driven by causal models of epidemic processes, (ii) question routing: an important feature to increase data acquisition efficacy and reduce survey fatigue and (iii) integrated surveys: interactive surveys to provide additional information on survey topic and improve user motivation. We evaluate the framework's performance using Apache JMeter and present our results. 
We also discuss three other extensions of WiSDM: Adapter, Synthetic Data Generator, and WiSDM Analytics. The API Adapter is an ETL extension of WiSDM which enables extracting data from disparate data sources and loading to WiSDM database. The Synthetic Data Generator allows epidemiologists to build synthetic survey data using NDSSL's Synthetic Population as agents. WiSDM Analytics empowers users to perform analysis on the data by writing simple python code using Versa APIs. We also propose a data model that is conducive to survey data analysis.
Master of Science
8

Šlosár, Peter. "Generátor syntetické datové sady pro dopravní analýzu [Synthetic dataset generator for traffic analysis]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236021.

Abstract:
This Master's thesis deals with the design and development of tools for generating a synthetic dataset for traffic analysis purposes. The first part contains a brief introduction to the vehicle detection and rendering methods. Blender and the set of scripts are used to create highly customizable training images dataset and synthetic videos from a single photograph. Great care is taken to create very realistic output, that is suitable for further processing in field of traffic analysis. Produced images and videos are automatically richly annotated. Achieved results are tested by training a sample car detector and evaluated with real life testing data. Synthetic dataset outperforms real training datasets in this comparison of the detection rate. Computational demands of the tools are evaluated as well. The final part sums up the contribution of this thesis and outlines some extensions of the tools for the future.
9

Oškera, Jan. "Detekce dopravních značek a semaforů [Detection of traffic signs and traffic lights]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-432850.

Abstract:
The thesis focuses on modern methods of traffic sign detection and traffic light detection, both directly in traffic and with the use of back analysis. The main subject is convolutional neural networks (CNNs). The solution uses convolutional neural networks of the YOLO type. The main goal of this thesis is to achieve the greatest possible optimization of the speed and accuracy of the models. It examines suitable datasets. A number of datasets are used for training and testing. These are composed of real and synthetic data sets. For training and testing, the data were preprocessed using the Yolo mark tool. The training of the model was carried out at a computer center belonging to the virtual organization MetaCentrum VO. To evaluate the detector quality quantitatively, a program was created that shows its success statistically and graphically using the ROC curve and the COCO evaluation protocol. In this thesis I created a model that achieved an average success rate of up to 81 %. The thesis shows the best choice of threshold across versions, sizes and IoU. Extensions for mobile phones in TensorFlow Lite and Flutter have also been created.
10

Kola, Ramya Sree. "Generation of synthetic plant images using deep learning architecture." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18450.

Abstract:
Background: Generative Adversarial Networks (Goodfellow et al., 2014) (GANs) are the current state of the art machine learning data generating systems. They were designed with two neural networks in the initial architecture proposal: a generator and a discriminator. These neural networks compete in a zero-sum game technique, to generate data having realistic properties inseparable to that of original datasets. GANs have interesting applications in various domains like image synthesis, 3D object generation in the gaming industry, fake music generation (Dong et al.), text to image synthesis and many more. Despite having widespread application domains, GANs are popular for image data synthesis. Various architectures have been developed for image synthesis evolving from fuzzy images of digits to photorealistic images. Objectives: In this research work, we study the literature on different GAN architectures to understand significant works done to improve them. The primary objective of this research work is synthesis of plant images using Style GAN (Karras, Laine and Aila, 2018) variant of GAN using style transfer. The research also focuses on identifying various machine learning performance evaluation metrics that can be used to measure Style GAN model for the generated image datasets. Methods: A mixed method approach is used in this research. We review various literature work on GANs and elaborate in detail how each GAN network is designed and how they evolved over the base architecture. We then study the style GAN (Karras, Laine and Aila, 2018a) design details. We then study related literature works on GAN model performance evaluation and measure the quality of generated image datasets. We conduct an experiment to implement the Style based GAN on leaf dataset (Kumar et al., 2012) to generate leaf images that are similar to the ground truth.
We describe in detail various steps in the experiment like data collection, preprocessing, training and configuration. Also, we evaluate the performance of Style GAN training model on the leaf dataset. Results: We present the results of literature review and the conducted experiment to address the research questions. We review and elaborate various GAN architectures and their key contributions. We also review numerous qualitative and quantitative evaluation metrics to measure the performance of a GAN architecture. We then present the generated synthetic data samples from the Style based GAN learning model at various training GPU hours and the latest synthetic data sample after training for around 8 GPU days on leafsnap dataset (Kumar et al., 2012). The results we present have a decent quality to expand the dataset for most of the tested samples. We then visualize the model performance by tensorboard graphs and an overall computational graph for the learning model. We calculate the Fréchet Inception Distance score for our leaf Style GAN and observe it to be 26.4268 (the lower the better). Conclusion: We conclude the research work with an overall review of sections in the paper. The generated fake samples are very similar to the input ground truth and appear to be convincingly realistic for a human visual judgement. However, the calculated FID score to measure the performance of the leaf StyleGAN accumulates a large value compared to that of Style GANs original celebrity HD faces image data set. We attempted to analyze the reasons for this large score.
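Note: the Fréchet Inception Distance reported above compares Gaussians fitted to Inception-network features of real and generated images. For two one-dimensional Gaussians the Fréchet distance reduces to (m1 - m2)^2 + (s1 - s2)^2. The Python sketch below computes only this univariate case on plain lists of numbers, as an illustration of the quantity being measured. It is not the full FID, which uses mean vectors and covariance matrices of Inception features, and the function name and the use of the population standard deviation are assumptions made here.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(real_values, generated_values):
    """Squared Fréchet distance between Gaussians fitted to two 1-D samples."""
    mean_gap = fmean(real_values) - fmean(generated_values)
    # Population standard deviation is an assumption of this sketch.
    std_gap = pstdev(real_values) - pstdev(generated_values)
    return mean_gap ** 2 + std_gap ** 2
```

The distance is zero when both samples have the same mean and spread and grows as either drifts apart, which is the sense in which a lower FID is better.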

Books on the topic "Synthetic datasets"

1

Drechsler, Jörg. Synthetic Datasets for Statistical Disclosure Control. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5.

2

Drechsler, Jörg. Synthetic datasets for statistical disclosure control: Theory and implementation. New York: Springer, 2011.

3

Beckfield, Jason. Key Concepts, Measures, and Data. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780190492472.003.0001.

Abstract:
In this chapter, the author discusses several concepts, measures, and datasets that open new avenues for social epidemiologists to use political sociology to explain the distribution of population health. He also describes how social epidemiological concepts can feed back into political sociology to advance its agenda of understanding the social organization of power. Three themes integrate the concerns of these still-disconnected fields: (1) conceptualization of etiologic period, (2) definition of population, and (3) distinction between population averages and population variances. Mutual appreciation of the key concepts for each field is essential for the development of synthetic engagement between political sociology and social epidemiology.
4

Taberlet, Pierre, Aurélie Bonin, Lucie Zinger, and Eric Coissac. Environmental DNA. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198767220.001.0001.

Abstract:
Environmental DNA (eDNA), i.e. DNA released in the environment by any living form, represents a formidable opportunity to gather high-throughput and standard information on the distribution or feeding habits of species. It has therefore great potential for applications in ecology and biodiversity management. However, this research field is fast-moving, involves different areas of expertise and currently lacks standard approaches, which calls for an up-to-date and comprehensive synthesis. Environmental DNA for biodiversity research and monitoring covers current methods based on eDNA, with a particular focus on “eDNA metabarcoding”. Intended for scientists and managers, it provides the background information to allow the design of sound experiments. It revisits all steps necessary to produce high-quality metabarcoding data such as sampling, metabarcode design, optimization of PCR and sequencing protocols, as well as analysis of large sequencing datasets. All these different steps are presented by discussing the potential and current challenges of eDNA-based approaches to infer parameters on biodiversity or ecological processes. The last chapters of this book review how DNA metabarcoding has been used so far to unravel novel patterns of diversity in space and time, to detect particular species, and to answer new ecological questions in various ecosystems and for various organisms. Environmental DNA for biodiversity research and monitoring constitutes an essential reading for all graduate students, researchers and practitioners who do not have a strong background in molecular genetics and who are willing to use eDNA approaches in ecology and biomonitoring.

Book chapters on the topic "Synthetic datasets"

1

Drechsler, Jörg. "Fully Synthetic Datasets." In Synthetic Datasets for Statistical Disclosure Control, 39–51. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_6.

2

Drechsler, Jörg. "Partially Synthetic Datasets." In Synthetic Datasets for Statistical Disclosure Control, 53–63. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_7.

3

Brinkhoff, Thomas. "Real and Synthetic Test Datasets." In Encyclopedia of Database Systems, 1–5. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_1357-2.

4

Brinkhoff, Thomas. "Real and Synthetic Test Datasets." In Encyclopedia of Database Systems, 2339–44. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_1357.

5

Brinkhoff, Thomas. "Real and Synthetic Test Datasets." In Encyclopedia of Database Systems, 3110–14. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_1357.

6

Drechsler, Jörg. "Background on Multiply Imputed Synthetic Datasets." In Synthetic Datasets for Statistical Disclosure Control, 7–11. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_2.

7

Drechsler, Jörg. "Introduction." In Synthetic Datasets for Statistical Disclosure Control, 1–5. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_1.

8

Drechsler, Jörg. "Chances and Obstacles for Multiply Imputed Synthetic Datasets." In Synthetic Datasets for Statistical Disclosure Control, 99–102. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_10.

9

Drechsler, Jörg. "Background on Multiple Imputation." In Synthetic Datasets for Statistical Disclosure Control, 13–21. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_3.

10

Drechsler, Jörg. "The IAB Establishment Panel." In Synthetic Datasets for Statistical Disclosure Control, 23–25. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0326-5_4.


Conference papers on the topic "Synthetic datasets"

1

Cooley and Robinson. "Synthetic focus imaging using partial datasets." In Proceedings of IEEE Ultrasonics Symposium ULTSYM-94. IEEE, 1994. http://dx.doi.org/10.1109/ultsym.1994.401884.

2

Sokolov, N. A., E. P. Vasiliev, and A. A. Getmanskaya. "Generation and Study of the Synthetic Brain Electron Microscopy Dataset for Segmentation Purpose." In 32nd International Conference on Computer Graphics and Vision. Keldysh Institute of Applied Mathematics, 2022. http://dx.doi.org/10.20948/graphicon-2022-706-714.

Abstract:
Advanced microscopy technologies such as electron microscopy (EM) have opened up a new field of vision for biomedical researchers. Applying artificial intelligence methods to EM data is difficult, largely because little annotated data is available at the training stage. Therefore, we add synthetic images to an annotated real EM dataset or use a fully synthetic training dataset. In this work, we present an algorithm for the synthesis of 6 types of organelles. Based on the EPFL dataset, a training set of 860 real 256×256 fragments (ORG) and 6000 synthetic ones (SYN), as well as their combination (MIX), were generated. An experiment of training models for segmentation into 5 and 6 classes showed that, despite the imperfection of the synthetic data, for axons, which are poorly represented in the training dataset, using synthetic data improves the Dice metric from 0.3 on the original dataset to 0.8 on the mixed and synthetic datasets. The synthetic data strategy gives annotations for free, but shifts the effort to producing sufficiently realistic images.
3

Kar, Amlan, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, and Sanja Fidler. "Meta-Sim: Learning to Generate Synthetic Datasets." In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. http://dx.doi.org/10.1109/iccv.2019.00465.

4

Barth, R. "Large Synthetic Datasets for Improved Deep Learning." In Scientific Symposium FAIR Data Sciences for Green Life Sciences. Wageningen University & Research, 2018. http://dx.doi.org/10.18174/fairdata2018.16276.

5

Silva, Henrique Matheus F. da, Rafael S. Pereira Silva, and Fábio Porto. "SAGAD: Synthetic Data Generator for Tabular Datasets." In Simpósio Brasileiro de Banco de Dados. Sociedade Brasileira de Computação - SBC, 2021. http://dx.doi.org/10.5753/sbbd.2021.17861.

Abstract:
The accuracy of machine learning models implementing classification tasks is strongly dependent on the quality of the training dataset. This is a challenge for domains where data is not abundant, such as personalized medicine, or unbalanced, as in the case of images of plant species, where some species have very few samples while others offer a large number of samples. In both scenarios, the resulting models tend to offer poor performance. In this paper we present two techniques to face this challenge. Firstly, we present a data augmentation method called SAGAD, based on conditional entropy. SAGAD can balance minority classes in conjunction with the increase of the overall size of the training set. In our experiments, the application of SAGAD in small data problems with different machine learning algorithms yielded significant improvement in performance. We additionally present an extension of SAGAD for iterative learning algorithms, called DABEL, which generates new samples for each epoch using an optimization approach that continuously improves the model’s performance. The adoption of SAGAD and DABEL consistently extends the training dataset towards improved target classification performance.
6

Gao, Haoqi, and Koichi Ogawara. "Face alignment by learning from small real datasets and large synthetic datasets." In 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML). IEEE, 2022. http://dx.doi.org/10.1109/cacml55074.2022.00073.

7

de Melo, Vinicius V., and Ana C. Lorena. "Using Complexity Measures to Evolve Synthetic Classification Datasets." In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018. http://dx.doi.org/10.1109/ijcnn.2018.8489645.

8

Belenko, Viacheslav, Vasiliy Krundyshev, and Maxim Kalinin. "Synthetic datasets generation for intrusion detection in VANET." In SIN '18: 11th International Conference On Security Of Information and Networks. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3264437.3264479.

9

Basak, Shubhajit, Hossein Javidnia, Faisal Khan, Rachel McDonnell, and Michael Schukat. "Methodology for Building Synthetic Datasets with Virtual Humans." In 2020 31st Irish Signals and Systems Conference (ISSC). IEEE, 2020. http://dx.doi.org/10.1109/issc49989.2020.9180188.

10

Forestier, Germain, Francois Petitjean, Hoang Anh Dau, Geoffrey I. Webb, and Eamonn Keogh. "Generating Synthetic Time Series to Augment Sparse Datasets." In 2017 IEEE International Conference on Data Mining (ICDM). IEEE, 2017. http://dx.doi.org/10.1109/icdm.2017.106.


Reports on the topic "Synthetic datasets"

1

Agarwal, Deborah A., Marty Humphrey, Catharine van Ingen, Norm Beekwilder, Monte Goode, Keith Jackson, Matt Rodriguez, and Robin Weber. Fluxnet Synthesis Dataset Collaboration Infrastructure. Office of Scientific and Technical Information (OSTI), February 2008. http://dx.doi.org/10.2172/951101.

2

Tremblay, T., and M. Lamothe. New contributions to the ice-flow chronology in the Boothia-Lancaster Ice Stream catchment area. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/331062.

Abstract:
Within the Boothia-Lancaster Ice Stream (BLIS) catchment area, ice flow patterns were reconstructed based on the synthesis of striation directions and cross-cutting relationships, transport patterns of erratic boulders, glacial landforms, cold-based glacial landsystems, and ice-retreat chronology. New ArcticDEM data, high-definition satellite imagery and multibeam echosounder bathymetric datasets provided increased details on ice flow indicators. Convergent high-velocity ice flows through the BLIS main axis were major, persistent features in the northeastern Laurentide Ice Sheet through the last glaciation, and this study highlights intensity fluctuations and ice flow pattern variations that occurred during that time. Highly contrasting glacial geomorphology, notably in the abundance of moraines, reflects marked differences in ice-margin retreat rates and patterns during deglaciation between the western and eastern sides of the BLIS.
3

Lasko, Kristofer. Incorporating Sentinel-1 SAR imagery with the MODIS MCD64A1 burned area product to improve burn date estimates and reduce burn date uncertainty in wildland fire mapping. Engineer Research and Development Center (U.S.), September 2021. http://dx.doi.org/10.21079/11681/42122.

Abstract:
Wildland fires result in a unique signal detectable by multispectral remote sensing and synthetic aperture radar (SAR). However, in many regions, such as Southeast Asia, persistent cloud cover and aerosols temporarily obstruct multispectral satellite observations of burned area, including the MODIS MCD64A1 Burned Area Product (BAP). Multiple days between cloud-free pre- and postburn MODIS observations result in burn date uncertainty. We incorporate cloud-penetrating C-band SAR with the MODIS MCD64A1 BAP in Southeast Asia to exploit the strengths of each dataset, better estimate the burn date, and reduce the potential burn date uncertainty range. We incorporate built-in quality control using MCD64A1 to reduce erroneous pixel updating. We test the method over part of Laos and Thailand during April 2016 and find an average uncertainty reduction of 4.5 d, improving 15% of MCD64A1 pixels. A new BAP could improve monitoring of temporal trends of wildland fires, air quality studies, and monitoring of post-fire vegetation dynamics.
4

Heifetz, Yael, and Michael Bender. Success and failure in insect fertilization and reproduction - the role of the female accessory glands. United States Department of Agriculture, December 2006. http://dx.doi.org/10.32747/2006.7695586.bard.

Abstract:
The research problem. Understanding of insect reproduction has been critical to the design of insect pest control strategies including disruptions of mate-finding, courtship and sperm transfer by male insects. It is well known that males transfer proteins to females during mating that profoundly affect female reproductive physiology, but little is known about the molecular basis of female mating response and no attempts have yet been made to interfere with female post-mating responses that directly bear on the efficacy of fertilization. The female reproductive tract provides a crucial environment for the events of fertilization yet thus far those events and the role of the female tract in influencing them are poorly understood. For this project, we have chosen to focus on the lower reproductive tract because it is the site of two processes critical to reproduction: sperm management (storage, maintenance, and release from storage) and fertilization. Efforts during this project period centered on the elucidation of mating responses in the female lower reproductive tract. The central goals of this project were: 1. To identify mating-responsive genes in the female lower reproductive tract using DNA microarray technology. 2. In parallel, to identify mating-responsive genes in these tissues using proteomic assays (2D gels and LC-MS/MS techniques). 3. To integrate proteomic and genomic analyses of reproductive tract gene expression to identify significant genes for functional analysis. Our main achievements were: 1. Identification of mating-responsive genes in the female lower reproductive tract. We identified 539 mating-responsive genes using genomic and proteomic approaches. This analysis revealed a shift from gene silencing to gene activation soon after mating and a peak in differential gene expression at 6 hours post-mating. 
In addition, comparison of the two datasets revealed an expression pattern consistent with the model that important reproductive proteins are pre-programmed for synthesis prior to mating. This work was published in Mack et al. (2006). Validation experiments using real-time PCR techniques suggest that microarray assays provide a conservative estimate of the true transcriptional activity in reproductive tissues. 2. Integration of proteomics and genomics data sets. We compared the expression profiles from DNA microarray data with the proteins identified in our proteomic experiments. Although comparing the two data sets poses analytical challenges, it provides a more complete view of gene expression as well as insights into how specific genes may be regulated. This work was published in Mack et al. (2006). 3. Development of primary reproductive tract cell cultures. We developed primary cell cultures of dispersed reproductive tract cell types and determined conditions for organ culture of the entire reproductive tract. This work will allow us to rapidly screen mating-responsive genes for a variety of reproductive-tract specific functions. Scientific and agricultural significance. Together, these studies have defined the genetic response to mating in a part of the female reproductive tract that is critical for successful fertilization and have identified a large set of mating-responsive genes. This work is the first to combine both genomic and proteomic approaches in determining female mating response in these tissues and has provided important insights into insect reproductive behavior.
5

Allen, Kathy, Andy Nadeau, and Andy Robertston. Natural resource condition assessment: Salinas Pueblo Missions National Monument. National Park Service, May 2022. http://dx.doi.org/10.36967/nrr-2293613.

Abstract:
The Natural Resource Condition Assessment (NRCA) Program aims to provide documentation about the current conditions of important park natural resources through a spatially explicit, multi-disciplinary synthesis of existing scientific data and knowledge. Findings from the NRCA will help Salinas Pueblo Missions National Monument (SAPU) managers to develop near-term management priorities, engage in watershed or landscape scale partnership and education efforts, conduct park planning, and report program performance (e.g., Department of the Interior’s Strategic Plan “land health” goals, Government Performance and Results Act). The objectives of this assessment are to evaluate and report on current conditions of key park resources, to evaluate critical data and knowledge gaps, and to highlight selected existing stressors and emerging threats to resources or processes. For the purpose of this NRCA, staff from the National Park Service (NPS) and Saint Mary’s University of Minnesota – GeoSpatial Services (SMUMN GSS) identified key resources, referred to as “components” in the project. The selected components include natural resources and processes that are currently of the greatest concern to park management at SAPU. The final project framework contains nine resource components, each featuring discussions of measures, stressors, and reference conditions. This study involved reviewing existing literature and, where appropriate, analyzing data for each natural resource component in the framework to provide summaries of current condition and trends in selected resources. When possible, existing data for the established measures of each component were analyzed and compared to designated reference conditions. A weighted scoring system was applied to calculate the current condition of each component. Weighted Condition Scores, ranging from zero to one, were divided into three categories of condition: low concern, moderate concern, and significant concern. 
These scores help to determine the current overall condition of each resource. The discussions for each component, found in Chapter 4 of this report, represent a comprehensive summary of current available data and information for these resources, including unpublished park information and perspectives of park resource managers, and present a current condition designation when appropriate. Each component assessment was reviewed by SAPU resource managers, NPS Southern Colorado Plateau Network (SCPN) staff, or outside experts. Existing literature, short- and long-term datasets, and input from NPS and other outside agency scientists support condition designations for components in this assessment. However, in some cases, data were unavailable or insufficient for several of the measures of the featured components. In other instances, data establishing reference condition were limited or unavailable for components, making comparisons with current information inappropriate or invalid. In these cases, it was not possible to assign condition for the components. Current condition was not able to be determined for six of the ten components due to these data gaps. For those components with sufficient available data, the overall condition varied. Two components were determined to be in good condition: dark night skies and paleontological resources. However, both were at the edge of the good condition range, and any small decline in conditions could shift them into the moderate concern range. Of the components in good condition, a trend could not be assigned for paleontological resources and dark night skies is considered stable. Two components (wetland and riparian communities and viewshed) were of moderate concern, with no trend assigned for wetland and riparian communities and a stable trend for viewshed. Detailed discussion of these designations is presented in Chapters 4 and 5 of this report. 
Several park-wide threats and stressors influence the condition of priority resources in SAPU...