Academic literature on the topic 'Synthetic Database Generation'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Synthetic Database Generation.'

Journal articles on the topic 'Synthetic Database Generation'

1

Priyadarshini, Pallavi, Fengqiong Qin, Ee-Peng Lim, and Wee-Keong Ng. "Parameter driven synthetic web database generation." Journal of Systems and Software 69, no. 1-2 (2004): 29–42. http://dx.doi.org/10.1016/s0164-1212(03)00002-5.

2

Sanghi, Anupam, Shadab Ahmed, and Jayant R. Haritsa. "Projection-compliant database generation." Proceedings of the VLDB Endowment 15, no. 5 (2022): 998–1010. http://dx.doi.org/10.14778/3510397.3510398.

Abstract:
Synthesizing data using declarative formalisms has been persuasively advocated in contemporary data generation frameworks. In particular, they specify operator output volumes through row-cardinality constraints. However, thus far, adherence to these volumetric constraints has been limited to the Filter and Join operators. A critical deficiency is the lack of support for the Projection operator, which is at the core of basic SQL constructs such as Distinct, Union and Group By. The technical challenge here is that cardinality unions in multi-dimensional space, and not mere summations, need to be captured in the generation process. Further, dependencies across different data subspaces need to be taken into account. We address the above lacuna by presenting PiGen, a dynamic data generator that incorporates Projection cardinality constraints in its ambit. The design is based on a projection subspace division strategy that supports the expression of constraints using optimized linear programming formulations. Further, techniques of symmetric refinement and workload decomposition are introduced to handle constraints across different projection subspaces. Finally, PiGen supports dynamic generation, where data is generated on-demand during query processing, making it amenable to Big Data environments. A detailed evaluation on workloads derived from real-world and synthetic benchmarks demonstrates that PiGen can accurately and efficiently model Projection outcomes, representing an essential step forward in customized database generation.
3

Pujol, David, Amir Gilad, and Ashwin Machanavajjhala. "PreFair: Privately Generating Justifiably Fair Synthetic Data." Proceedings of the VLDB Endowment 16, no. 6 (2023): 1573–86. http://dx.doi.org/10.14778/3583140.3583168.

Abstract:
When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
4

Pavez, Vicente, Gabriel Hermosilla, Francisco Pizarro, Sebastián Fingerhuth, and Daniel Yunge. "Thermal Image Generation for Robust Face Recognition." Applied Sciences 12, no. 1 (2022): 497. http://dx.doi.org/10.3390/app12010497.

Abstract:
This article shows how to create a robust thermal face recognition system based on the FaceNet architecture. We propose a method for generating thermal images to create a thermal face database with six different attributes (frown, glasses, rotation, normal, vocal, and smile) based on various deep learning models. First, we use StyleCLIP, which oversees manipulating the latent space of the input visible image to add the desired attributes to the visible face. Second, we use the GANs N’ Roses (GNR) model, a multimodal image-to-image framework. It uses maps of style and content to generate thermal imaging from visible images, using generative adversarial approaches. Using the proposed generator system, we create a database of synthetic thermal faces composed of more than 100k images corresponding to 3227 individuals. When trained and tested using the synthetic database, the Thermal-FaceNet model obtained a 99.98% accuracy. Furthermore, when tested with a real database, the accuracy was more than 98%, validating the proposed thermal images generator system.
5

Dinges, Laslo, Ayoub Al-Hamadi, Moftah Elzobi, Sherif El-etriby, and Ahmed Ghoneim. "ASM Based Synthesis of Handwritten Arabic Text Pages." Scientific World Journal 2015 (2015): 1–18. http://dx.doi.org/10.1155/2015/323575.

Abstract:
Document analysis tasks, as text recognition, word spotting, or segmentation, are highly dependent on comprehensive and suitable databases for training and validation. However their generation is expensive in sense of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for the case of Arabic handwriting recognition, that involves different preprocessing, segmentation, and recognition methods, which have individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents and detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step ASM based representations are composed to words and text pages, smoothed by B-Spline interpolation and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages to train and test document analysis related methods on synthetic samples, whenever no sufficient natural ground truthed data is available.
6

Sazonova, Kateryna, Olena Nosovets, Vitalii Babenko, and Olga Averianova. "GENERATION OF SYNTHETICAL MEDICAL DATA BY MDR-ANALYSIS." Proceedings of the National Aviation University 87, no. 2 (2021): 31–36. http://dx.doi.org/10.18372/2306-1472.87.15719.

Abstract:
Purpose: The purpose of this article is to outline an algorithm for generating synthetic medical data in order to augment small samples of data. Methods: To achieve the research goal, methods such as: correlation analysis (to identify significant variables and the relationships between them), MDR analysis (to build logical chains of relationships between medical data), and regression analysis (to model medical data variables to use this to generate synthetic data) were used. Results: A database of heart failure patients that is publicly available was used to test the developed algorithm for generating synthetic medical data in action; as a result, statistical relationships between data were found and used to build linear regression models. Discussion: The proposed algorithm allows, with a few simple, yet important actions, to perform the generation of medical data, which makes it possible to obtain large data sets that can be used to implement machine learning methods in any tasks related to medicine.
7

Burman, Nitin, Claudia Manetti, Paulo Tostes, Joost Lumens, and Jan D'hooge. "A pipeline to enable large-scale generation of diverse 2D cardiac synthetic ultrasound recordings corresponding to healthy and heart failure virtual patients." Journal of the Acoustical Society of America 152, no. 4 (2022): A279. http://dx.doi.org/10.1121/10.0016267.

Abstract:
Simulated ultrasound (US) data are widely used in echocardiography to develop and validate rapidly growing convolutional neural networks (CNNs) based learning algorithms for image processing and analysis. In this context, a large and diverse database of synthetic US scans is considered vital for CNN training purposes, as clinical US data are scarce and difficult to access. Major hurdles in creating an extensive database are the long US simulation time and unstable heart models for extreme parameter settings. Here, we developed and implemented a cardiac US simulation pipeline that kinematically connects two state-of-the-art solutions in the field of US simulation (COLE) and cardiac modelling (CircAdapt), benefiting from the fast simulation time of the convolution-based ultrasound simulator and stability of the mechanical heart model to produce 2D synthetic cardiac US recordings. Furthermore, using our pipeline, we generated diverse set of 600 2D synthetic cardiac US recordings of healthy and heart failure virtual patients with variations in the shapes, motion patterns, and functions of the heart, along with their ground truth 2D myocardial velocity profiles and deformation curves. The resulting database is a potential tool for augmenting training databases of machine learning based US image processing algorithms. [Work funded by European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 860745.]
8

Kuriki, Mikaele Silva, Francisco Lledo Santos, and Cristiano Poleto. "Small-Scale Wetland Model for Synthetic Sewage Treatment." Ciência e Natura 44 (April 21, 2022): e25. http://dx.doi.org/10.5902/2179460x68834.

Abstract:
With the demand for electricity growing, the migration to renewable sources is a reality. In distributed generation, photovoltaic systems are a renewable and sustainable alternative to the main energy sources to generate electricity. Monitoring a photovoltaic system over its operating time guarantees its good performance. This requires solar radiation and temperature data measured at the installation site or the use of solarimetric stations databases. However, the differences between the results simulated with databases and with data measured at the installation site are not widely known, which would be the ideal case from a technical point of view. The aim of this study was to verify the feasibility of monitoring the performance of a 2.5 kWp photovoltaic system located in the city of Porto Alegre - Brazil using the System Advisor Model (SAM) modeling tool and a public database. Simulation results were compared using data provided by a station of the National Institute of Meteorology (INMET) with the results obtained with data measured at the site of the photovoltaic system. Differences were verified between the solar radiation measured on site and that of the INMET database, and the difference in accumulated radiation was 9.2% for the entire period analyzed. When comparing the measured and simulated alternating current energy using the radiation and temperature data measured on site for the non-shading time, it was found that the difference between the results was 0.5%. Using the INMET climate file, the monthly differences ranged from -6% to 14% and the difference in accumulated energy for the entire measurement period was 2.5%. The results showed that the use of a database measured by a public solarimetric station close to the site, in this case approximately 6 km away from the installation, is feasible for monitoring photovoltaic systems, since the differences found were not significant. This monitoring can identify system failures and performance loss over time.
9

Baowaly, Mrinal Kanti, Chia-Ching Lin, Chao-Lin Liu, and Kuan-Ta Chen. "Synthesizing electronic health records using improved generative adversarial networks." Journal of the American Medical Informatics Association 26, no. 3 (2018): 228–41. http://dx.doi.org/10.1093/jamia/ocy142.

Abstract:
Objective: The aim of this study was to generate synthetic electronic health records (EHRs). The generated EHR data will be more realistic than those generated using the existing medical Generative Adversarial Network (medGAN) method. Materials and Methods: We modified medGAN to obtain two synthetic data generation models, designated as medical Wasserstein GAN with gradient penalty (medWGAN) and medical boundary-seeking GAN (medBGAN), and compared the results obtained using the three models. We used 2 databases: MIMIC-III and National Health Insurance Research Database (NHIRD), Taiwan. First, we trained the models and generated synthetic EHRs by using these 3 models. We then analyzed and compared the models' performance by using a few statistical methods (Kolmogorov–Smirnov test, dimension-wise probability for binary data, and dimension-wise average count for count data) and 2 machine learning tasks (association rule mining and prediction). Results: We conducted a comprehensive analysis and found our models were adequately efficient for generating synthetic EHR data. The proposed models outperformed medGAN in all cases, and among the 3 models, boundary-seeking GAN (medBGAN) performed the best. Discussion: To generate realistic synthetic EHR data, the proposed models will be effective in the medical industry and related research from the viewpoint of providing better services. Moreover, they will eliminate barriers including limited access to EHR data and thus accelerate research on medical informatics. Conclusion: The proposed models can adequately learn the data distribution of real EHRs and efficiently generate realistic synthetic EHRs. The results show the superiority of our models over the existing model.
10

Loisel, Hubert, Daniel Schaffer Ferreira Jorge, Rick A. Reynolds, and Dariusz Stramski. "A synthetic optical database generated by radiative transfer simulations in support of studies in ocean optics and optical remote sensing of the global ocean." Earth System Science Data 15, no. 8 (2023): 3711–31. http://dx.doi.org/10.5194/essd-15-3711-2023.

Abstract:
Radiative transfer (RT) simulations have long been used to study the relationships between the inherent optical properties (IOPs) of seawater and light fields within and leaving the ocean, from which ocean apparent optical properties (AOPs) can be calculated. For example, inverse models used to estimate IOPs from ocean color radiometric measurements have been developed and validated using the results of RT simulations. Here we describe the development of a new synthetic optical database based on hyperspectral RT simulations across the spectral range of near-ultraviolet to near-infrared performed with the HydroLight radiative transfer code. The key component of this development is the generation of a synthetic dataset of seawater IOPs that serves as input to RT simulations. Compared to similar developments of optical databases in the past, the present dataset of IOPs is characterized by the probability distributions of IOPs that are consistent with global distributions representative of vast areas of open-ocean pelagic environments and coastal regions, covering a broad range of optical water types. The generation of synthetic data of IOPs associated with particulate and dissolved constituents of seawater was driven largely by an extensive set of field measurements of the phytoplankton absorption coefficient collected in diverse oceanic environments. Overall, the synthetic IOP dataset consists of 3320 combinations of IOPs. Additionally, the pure seawater IOPs were assumed following recent recommendations. The RT simulations were performed using 3320 combinations of input IOPs, assuming vertical homogeneity within an infinitely deep ocean. These input IOPs were used in three simulation scenarios associated with assumptions about inelastic radiative processes in the water column (not considered in previous synthetically generated optical databases) and three simulation scenarios associated with the sun zenith angle. Specifically, the simulations were made assuming no inelastic processes, the presence of Raman scattering by water molecules, and the presence of both Raman scattering and fluorescence of chlorophyll a pigment. Fluorescence of colored dissolved organic matter was omitted from all simulations. For each of these three simulation scenarios, the simulations were made for three sun zenith angles of 0, 30, and 60° assuming clear skies, standard atmosphere, and a wind speed of 5 m s−1. Thus, overall 29 880 RT simulations were performed. The output results of these simulations include radiance distributions, plane and scalar irradiances, and a whole set of AOPs, including remote-sensing reflectance, vertical diffuse attenuation coefficients, and mean cosines, where all optical variables are reported in the spectral range of 350 to 750 nm at 5 nm intervals for different depths between the sea surface and 50 m. The consistency of this new synthetic database has been assessed through comparisons with in situ data and previously developed empirical relationships involving IOPs and AOPs. The database is available at the Dryad open-access repository of research data (https://doi.org/10.6076/D1630T, Loisel et al., 2023).