Theses on the topic "Small datasets"
Browse the 26 best theses for your research on the topic "Small datasets".
Shi, Xiaojin. "Visual learning from small training datasets". Diss., Digital Dissertations Database. Restricted to UC campuses, 2005. http://uclibs.org/PID/11984.
Van, Koten Chikako. "Bayesian statistical models for predicting software effort using small datasets". University of Otago. Department of Information Science, 2007. http://adt.otago.ac.nz./public/adt-NZDU20071009.120134.
Zhao, Amy (Xiaoyu Amy). "Learning distributions of transformations from small datasets for applied image synthesis". Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/128342.
Cataloged from PDF of thesis. "February 2020."
Includes bibliographical references (pages 75-91).
Much of the recent research in machine learning and computer vision focuses on applications with large labeled datasets. However, in realistic settings, it is much more common to work with limited data. In this thesis, we investigate two applications of image synthesis using small datasets. First, we demonstrate how to use image synthesis to perform data augmentation, enabling the use of supervised learning methods with limited labeled data. Data augmentation -- typically the application of simple, hand-designed transformations such as rotation and scaling -- is often used to expand small datasets. We present a method for learning complex data augmentation transformations, producing examples that are more diverse, realistic, and useful for training supervised systems than hand-engineered augmentation. We demonstrate our proposed augmentation method for improving few-shot object classification performance, using a new dataset of collectible cards with fine-grained differences. We also apply our method to medical image segmentation, enabling the training of a supervised segmentation system using just a single labeled example. In our second application, we present a novel image synthesis task: synthesizing time lapse videos of the creation of digital and watercolor paintings. Using a recurrent model of paint strokes and a novel training scheme, we create videos that tell a plausible visual story of the painting process.
by Amy (Xiaoyu) Zhao.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
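The hand-designed augmentations that Zhao's abstract mentions as the usual baseline (rotation, scaling) can be illustrated with a minimal sketch. This is not the learned-transformation method of the thesis; the function names and the tiny nested-list "image" are hypothetical and serve only to show how simple transformations expand a small labeled dataset.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def scale2x(image):
    """Upscale by a factor of 2 using nearest-neighbour pixel repetition."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def augment(dataset):
    """Return each (image, label) pair plus a rotated and a scaled copy."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((rotate90(image), label))
        augmented.append((scale2x(image), label))
    return augmented


# Hypothetical one-example dataset: a 2x2 "image" with a made-up label.
tiny = [([[1, 2], [3, 4]], "card_a")]
expanded = augment(tiny)
```

Each original example yields three training examples here; a learned augmentation model would replace `rotate90` and `scale2x` with transformations estimated from data.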
Arzamasov, Vadim. "Comprehensible and Robust Knowledge Discovery from Small Datasets". Academic supervisor: K. Böhm. Karlsruhe: KIT-Bibliothek, 2021. http://d-nb.info/1238148166/34.
Lazarovici, Allan 1979. "Development of gene-finding algorithms for fungal genomes: dealing with small datasets and leveraging comparative genomics". Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/29681.
Includes bibliographical references (leaves 60-62).
A computer program called FUNSCAN was developed which identifies protein coding regions in fungal genomes. Gene structural and compositional properties are modeled using a Hidden Markov Model. Separate training and testing sets for FUNSCAN were obtained by aligning cDNAs from an organism to their genomic loci, generating a 'gold standard' set of annotated genes. The performance of FUNSCAN is competitive with other computer programs designed to identify protein coding regions in fungal genomes. A technique called 'Training Set Augmentation' is described which can be used to train FUNSCAN when only a small training set of genes is available. Techniques that combine alignment algorithms with FUNSCAN to identify novel genes are also discussed and explored.
by Allan Lazarovici.
M.Eng. and S.B.
Horečný, Peter. "Metody segmentace obrazu s malými trénovacími množinami". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412996.
Lucy, Caleb O. "Rapid Acquisition of Low Cost High-Resolution Elevation Datasets Using a Small Unmanned Aircraft System: An Application for Measuring River Geomorphic Change". Thesis, Boston College, 2015. http://hdl.handle.net/2345/bc-ir:104880.
Emerging methods for acquiring high-resolution topographic datasets have the potential to open new opportunities for quantitative geomorphic analysis. This study demonstrates a technique for rapidly obtaining structure from motion (SfM) photogrammetry-derived digital elevation models (DEMs) using aerial photographs acquired with a small unmanned aircraft system (sUAS). In conjunction with collection of aerial imagery, study sites are surveyed with a differential global positioning system (dGPS)-enabled total station (TPS) for georeferencing and accuracy assessment of sUAS SfM measurements. Results from sUAS SfM surveys of upland river channels in northern New England consistently produce DEMs and orthoimagery with ~1 cm pixel resolution. One-to-one point measurement comparisons demonstrate sUAS SfM systematically measures elevations about 0.16 ±0.23 m higher than TPS equivalents (0.28 m RMSE). Bathymetric (i.e. submerged or subaqueous) sUAS SfM measurements are 0.20 ±0.24 m (0.31 m RMSE) higher than TPS, whereas exposed (subaerial) points are 0.14 ±0.22 m (0.26 m RMSE) higher than TPS. Serial comparison of DEMs obtained before and after a two-year flood event indicates cut bank erosion and point bar deposition of ~0.10 m, consistent with expectations for channel evolution. DEMs acquired with the sUAS SfM are of comparable resolution but a lower-cost alternative to those from airborne light detection and ranging (lidar), the current standard for topographic imagery. Furthermore, lidar is not available for much of the United States and sUAS SfM provides an efficient means for expanding coverage of this critical elevation dataset. Due to their utility in municipal, land use, and emergency planning, the demand for high-resolution topographic datasets continues to increase among governments, research institutions, and private sector consulting firms. Terrain analysis using sUAS SfM could therefore be a boon to river management and restoration in northern New England and other regions.
Thesis (MS) — Boston College, 2015
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Geology and Geophysics
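The accuracy figures in Lucy's abstract (mean offset ± spread, plus RMSE) are linked by a simple identity: RMSE² = bias² + standard deviation², when the population standard deviation of the differences is used. The reported values are consistent with it (0.16² + 0.23² ≈ 0.28²). The sketch below computes the three quantities for paired elevations; the sample elevations and the function name are hypothetical and are not data from the thesis.

```python
import math
import statistics


def error_summary(sfm, tps):
    """Return (bias, spread, rmse) of SfM elevations relative to TPS ones."""
    diffs = [s - t for s, t in zip(sfm, tps)]
    bias = statistics.fmean(diffs)       # mean offset (systematic error)
    spread = statistics.pstdev(diffs)    # population standard deviation
    rmse = math.sqrt(statistics.fmean([d * d for d in diffs]))
    return bias, spread, rmse


# Hypothetical paired elevations in metres (not taken from the thesis).
sfm_elev = [101.30, 99.95, 100.42, 100.10]
tps_elev = [101.00, 100.00, 100.20, 99.90]

bias, spread, rmse = error_summary(sfm_elev, tps_elev)
# The identity rmse**2 == bias**2 + spread**2 holds up to rounding error.
```

Reporting bias and spread separately, as the abstract does, distinguishes a systematic offset that could be corrected from random scatter that cannot.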
Oppon, Ekow CruickShank. "Synergistic use of promoter prediction algorithms: a choice of small training dataset?" Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_8222_1185436339.
Promoter detection, especially in prokaryotes, has always been an uphill task and may remain so, because of the many varieties of sigma factors employed by various organisms in transcription. The situation is made more complex by the fact that any seemingly unimportant sequence segment may be turned into a promoter sequence by an activator or repressor (if the actual promoter sequence is made unavailable). Nevertheless, a computational approach to promoter detection has to be pursued for a number of reasons. The most obvious one is the long and tedious process involved in elucidating promoters in 'wet' laboratories, not to mention the financial aspect of such endeavors. Promoter detection/prediction for an organism with few characterized promoters (M. tuberculosis), as envisaged at the beginning of this work, was never going to be easy. Even for the few known Mycobacterial promoters, most of the respective sigma factors associated with their transcription were not known. If the information (promoter-sigma) were available, the research would have focused on categorizing the promoters according to sigma factors and training the methods on the respective categories. That is assuming that there would be enough training data for the respective categories. Most promoter detection/prediction studies have been carried out on E. coli because of the availability of a number of experimentally characterized promoters (approximately 310). Even then, no researcher to date has extended the research to the entire E. coli genome.
Forsberg, Fredrik and Gonzalez Pierre Alvarez. "Unsupervised Machine Learning: An Investigation of Clustering Algorithms on a Small Dataset". Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16300.
Gay, Antonin. "Pronostic de défaillance basé sur les données pour la prise de décision en maintenance : Exploitation du principe d'augmentation de données avec intégration de connaissances à priori pour faire face aux problématiques du small data set". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0059.
This CIFRE PhD is a joint project between ArcelorMittal and the CRAN laboratory, with the aim of optimizing industrial maintenance decision-making through the exploitation of the available sources of information, i.e. industrial data and knowledge, under the industrial constraints presented by the steel-making context. Current maintenance strategy on steel lines is based on regular preventive maintenance. Evolution of preventive maintenance towards a dynamic strategy is done through predictive maintenance. Predictive maintenance has been formalized within the Prognostics and Health Management (PHM) paradigm as a seven-step process. Among these PHM steps, this PhD's work focuses on decision-making and prognostics. The Industry 4.0 context puts emphasis on data-driven approaches, which require large amounts of data that industrial systems cannot systematically supply. The first contribution of the PhD consists in proposing an equation to link prognostics performance to the number of available training samples. This contribution makes it possible to predict the prognostics performance that could be obtained with additional data when dealing with small datasets. The second contribution of the PhD focuses on evaluating and analyzing the performance of data augmentation when applied to prognostics on small datasets. Data augmentation leads to an improvement of prognostics performance of up to 10%. The third contribution of the PhD consists in the integration of expert knowledge into data augmentation. Statistical knowledge integration proved efficient in avoiding the performance degradation caused by data augmentation under some unfavorable conditions. Finally, the fourth contribution consists in the integration of prognostics into maintenance decision-making cost modeling and the evaluation of the impact of prognostics on maintenance decision cost. It demonstrates that (i) the implementation of predictive maintenance reduces maintenance cost by up to 18-20% and (ii) the 10% prognostics improvement can reduce maintenance cost by an additional 1%.
Tilgner, Martin. "Detekce chodců ve snímku pomocí metod strojového učení". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400707.
Guin, Agneev. "Terrain Classification to find Drivable Surfaces using Deep Neural Networks: Semantic segmentation for unstructured roads combined with the use of Gabor filters to determine drivable regions trained on a small dataset". Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222021.
Autonomous vehicles face various challenges under difficult terrain conditions such as rural or forest roads, because of the lack of lane information, road signs, and traffic lights. In this thesis, we investigate a new approach that uses deep neural networks (DNNs) to classify terrain surfaces by their drivability in order to support autonomous navigation in unstructured environments. For example, terrain surfaces can be classified as asphalt, gravel, grass, mud, snow, etc. Images from a camera mounted on a mining truck were used to perform semantic segmentation and classify road surfaces. The images were manually divided into training sets of 16 and 9 classes, for all relevant classes and for drivable classes respectively. A small but diverse dataset of 100 images was augmented with neighboring frames from the video clips to expand the dataset. Neural networks were used to test classification performance under these terrain conditions. The pretrained network AlexNet was compared with networks without pretraining. Gabor filters, known for distinguishing textured surfaces, were further used to improve the results of the neural network. The experiments show that pretrained networks perform well with small datasets and many classes. A combination of Gabor filters with pretrained networks can produce a reliable navigation path under difficult terrain conditions. Although the results appear positive for images similar to the training scenes, the networks do not perform well in other situations. Although the tests indicate that large datasets are required for reliable results, this is a step closer to making autonomous vehicles drivable in difficult terrain conditions.
Durand, Marie. "La découverte et la compréhension des profils d’apprenants : classification semi-supervisée et acquisition d’une langue seconde". Thesis, Paris 8, 2019. http://www.theses.fr/2019PA080029.
This thesis aims to develop an effective methodology for the discovery and description of the learner's profile of an L2 based on acquisition data (perception, understanding and production). We want to detect patterns in the acquisition behaviours of subgroups of learners, taking into account the multidimensional aspect of the L2 learning process. The proposed methodology belongs to the field of artificial intelligence, more specifically to semi-supervised clustering techniques. Our algorithm has been applied to the database of the VILLA project, which includes the performance of learners from 5 different source languages (French, Italian, Dutch, German and English) with Polish as the target language. 156 adult learners were each tested with a variety of tasks in Polish during 14 hours of instruction, starting from the initial exposure. These tests made it possible to evaluate their performance on the levels of linguistic analysis that are phonology, morphology, morphosyntax and lexicon. The database also includes their sensitivity to input characteristics, such as the frequency and transparency of lexical elements used in linguistic tasks. The similarity measure used in traditional clustering techniques is revisited in this work in order to evaluate the distance between two learners from an acquisitionist point of view. It is based on the identification of the learner's response strategy to a specific language test structure. We show that this measure makes it possible to detect the presence or absence in the learner's responses of a strategy similar to the LC inflectional system, and so enables our algorithm to provide a resulting classification consistent with second language acquisition research. As a result, we claim that our algorithm might be relevant in the empirical establishment of learners' profiles and the discovery of new opportunities for reflection or analysis.
Hung-Yu Chen (陳泓佑). "Learning from small datasets containing nominal attributes". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/y2qgaw.
National Cheng Kung University (國立成功大學)
Institute of Information Management (資訊管理研究所)
107
In many small-data-learning problems, owing to the incomplete data structure, explicit information for decision makers is limited. Although machine learning algorithms are extensively applied to extract knowledge, most of them are developed without considering whether the training sets can fully represent the population properties. Focusing on small data which contains nominal inputs and continuous outputs, this paper develops an effective sample generating procedure based on fuzzy theories to tackle the learning issue by data preprocessing. According to the derived fuzzy relations between categories and continuous outputs, the possibilities of the combinations of categories (virtual samples) can be aggregated when continuous outputs are given. Proper virtual samples are further selected by using fuzzy alpha-cut on the possibility distributions, and these are added to the training sets to form new ones. In the experiment, sixteen datasets taken from the UC Irvine Machine Learning Repository are examined with back-propagation neural networks and support vector regressions. The results reveal that the forecasting accuracies of the two models are significantly improved when they are built with the proposed new training sets. Moreover, the results also indicate the proposed method outperforms bootstrap aggregating and the synthetic minority over-sampling technique-Nominal-Continuous with the greatest amount of statistical support.
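The selection step described in the abstract above, keeping only virtual samples whose possibility passes a fuzzy alpha-cut, can be sketched as a simple threshold filter. This is a minimal illustration under stated assumptions: the category combinations, their possibility values, and the threshold are invented, and the thesis's derivation of the possibility distributions from fuzzy relations is not reproduced.

```python
def alpha_cut(possibilities, alpha):
    """Keep the candidates whose possibility degree is at least alpha."""
    return {cand: p for cand, p in possibilities.items() if p >= alpha}


# Hypothetical possibility degrees for combinations of nominal categories,
# assumed to have been aggregated for one given continuous output value.
candidates = {
    ("red", "small"): 0.9,
    ("red", "large"): 0.4,
    ("blue", "small"): 0.7,
    ("blue", "large"): 0.1,
}

# Candidates that survive the cut would be added to the training set.
selected = alpha_cut(candidates, alpha=0.5)
```

Raising `alpha` admits fewer but more plausible virtual samples; lowering it enlarges the training set at the cost of less plausible ones.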
Chun-Wei Chen (陳俊偉). "Applying Box-and-Whisker Plots for Learning from Small Datasets". Thesis, 2011. http://ndltd.ncl.edu.tw/handle/24953023842861205688.
Chou, Tsai-Yuan (周才淵). "Generating Virtual Attributes by Fuzzy Clustering Algorithm for Small Datasets Learning". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/9dnjea.
Chien-Chih Chen (陳建智). "Employing Dependent Virtual Samples for Learning More Information from Small Datasets". Thesis, 2011. http://ndltd.ncl.edu.tw/handle/50934699766590312854.
Hong-Yang Lin (林泓暘). "Generating Aggregated Weights to Improve the Predictive Accuracy of Single-Model Ensemble Numerical Predicting Method in Small Datasets". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/6b385s.
National Cheng Kung University (國立成功大學)
Department of Industrial and Information Management (工業與資訊管理學系)
105
In the age of information explosion, information is easier to reach, so exploring and extracting useful information from limited data is an important topic in small-data learning. Current studies of ensemble methods mostly focus on the process rather than the result. Data mining methods can be divided into classification and prediction. In ensemble methods, voting is the most common way to handle classification, whereas in numerical prediction the most common way to compute the result is averaging, which is easily affected by extreme values, especially with small datasets. We propose an improvement to bagging. We use SVR as the prediction model and calculate the error of each prediction, from which we obtain a corresponding weight for each predicted value; we then compute a compromise prediction that aims at the smallest error. This stabilizes the system. We compare our method with the averaging method to examine its effect, and we also use a practical case from a panel factory to demonstrate the improvement in the single-model ensemble method.
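The contrast drawn in this abstract, between plain averaging of bagged predictions and an error-based weighted combination, can be sketched as follows. The inverse-error weighting used here is an assumption chosen for illustration; the abstract does not state the thesis's actual weight formula, and the predictions and error values are made up.

```python
def inverse_error_weights(errors, eps=1e-12):
    """Turn per-model errors into weights that sum to 1 (smaller error, larger weight)."""
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_prediction(predictions, errors):
    """Combine model predictions using inverse-error weights."""
    weights = inverse_error_weights(errors)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical outputs of four bagged regressors; the last one is an outlier
# and also has a large (hypothetical) validation error.
predictions = [10.0, 10.4, 9.8, 25.0]
errors = [0.5, 0.6, 0.4, 8.0]

simple_mean = sum(predictions) / len(predictions)
weighted = weighted_prediction(predictions, errors)
# The weighted result stays near the three agreeing models, while the
# simple mean is pulled toward the outlier.
```

With few training samples, a single poorly fitted bootstrap model can dominate a plain average, which is the instability the abstract attributes to the averaging method.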
Hult, Jim and Pontus Pihl. "Inspecting product quality with computer vision techniques: Comparing traditional image processing methods with deep learning methods on small datasets in finding surface defects". Thesis, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-54056.
Wu-Kuo Lin (林武國). "Rebuilding Sample Distributions for Small Dataset Learning". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/344kev.
National Cheng Kung University (國立成功大學)
Department of Industrial and Information Management (工業與資訊管理學系)
106
Over the past few decades, numerous learning algorithms have been proposed to extract knowledge from data. The majority of these algorithms have been developed with the assumption that training sets can denote populations. When the training sets contain only a few properties of their populations, the algorithms may extract minimal and/or biased knowledge for decision makers. This study develops a systematic procedure based on fuzzy theories to create new training sets by rebuilding the possible sample distributions, where the procedure contains new functions that estimate domains and a sample generating method. In this study, two real cases of a leading company in the thin film transistor liquid crystal display (TFT-LCD) industry are examined. Two learning algorithms, a back-propagation neural network and support vector regression, are employed for modeling, and two sample generation approaches, bootstrap aggregating (bagging) and the synthetic minority over-sampling technique (SMOTE), are employed to compare the accuracy of the models. The results indicate that the proposed method outperforms bagging and the SMOTE with the greatest amount of statistical support.
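The two baseline sample-generation approaches named in this abstract, bagging and SMOTE, rest on two simple operations: resampling with replacement, and interpolating between a sample and one of its neighbours. The sketch below shows only those core operations under simplifying assumptions: the neighbour is supplied directly rather than found by a nearest-neighbour search, and the data points are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, as bagging does for each model."""
    return rng.choices(data, k=len(data))


def smote_like(sample, neighbour, rng):
    """Create a synthetic point on the segment between sample and neighbour."""
    u = rng.random()  # interpolation factor in [0, 1)
    return [a + u * (b - a) for a, b in zip(sample, neighbour)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical small training set of two-feature samples.
data = [[1.0, 5.0], [2.0, 7.0], [1.5, 6.5]]

resampled = bootstrap_sample(data, rng)
synthetic = [smote_like(data[0], data[1], rng) for _ in range(3)]
```

Bootstrap resampling only repeats existing points, while interpolation creates new ones but stays within the observed range, which may be part of why the thesis instead estimates a wider domain before generating samples.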
Chang, Ya-Chun (張雅君). "A research on intelligent parameters searching in small dataset". Thesis, 2010. http://ndltd.ncl.edu.tw/handle/08483726387452009983.
Tunghai University (東海大學)
Department of Industrial Engineering and Enterprise Information (工業工程與經營資訊學系)
98
In practice, experimentation is the main method used in the R&D stage to search for the right parameter settings when developing a new product. However, the search procedure consumes considerable cost, time, and manpower, so a method that improves the speed and quality of the search would greatly benefit product development. This research focuses on developing a search mechanism for small datasets that reaches a better-quality region of parameter settings more quickly. A goal-oriented method is developed that uses information from previous experiments to limit the region explored further. The research adopts the Intervalized Kernel Density Estimation (IKDE) method to generate a virtual dataset from the existing small real dataset, and then uses a Support Vector Machine (SVM) to find the classifier. Three improved methods are developed: 1) IKDE alone combined with SVM to construct a classifier; 2) limiting the generation of the virtual dataset while achieving equal classifier quality, which improves computation time; and 3) using the roulette wheel method to explore the region of the virtual dataset without losing classifier quality, which shows better convergence. All three methods give better quality than general random methods, and the last method converges in a way that outperforms all the others.
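The roulette wheel method mentioned as the third improvement selects candidates with probability proportional to a score, so promising regions are explored more often without excluding the rest. The following is a generic sketch of roulette wheel selection; the region names and scores are hypothetical, and how the thesis scores regions of the virtual dataset is not stated in the abstract.

```python
import random


def roulette_pick(items, fitness, rng):
    """Pick one item with probability proportional to its non-negative fitness."""
    total = sum(fitness)
    threshold = rng.random() * total
    cumulative = 0.0
    for item, score in zip(items, fitness):
        cumulative += score
        if threshold < cumulative:
            return item
    return items[-1]  # guard against floating-point rounding


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical parameter regions with scores from earlier experiments.
regions = ["A", "B", "C"]
scores = [0.0, 1.0, 3.0]

picks = [roulette_pick(regions, scores, rng) for _ in range(1000)]
```

A region with score zero is never drawn, and one with three times the score of another is drawn roughly three times as often.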
I-Hsiang Wen (溫怡翔). "A New Data Transformation Model for Small Dataset Learning". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/81814135131034384018.
National Cheng Kung University (國立成功大學)
Department of Industrial and Information Management (工業與資訊管理學系)
104
In most highly competitive manufacturing industries, the sample sizes are usually very small in pilot runs, in order to quickly launch new products. However, it is always difficult for engineers to improve the quality in mass production runs based on the limited data obtained in this way. Past research has demonstrated that adding artificial samples can be an effective approach when learning with small datasets. However, a prior analysis of the data is needed to deduce the appropriate sample distributions within which the artificial samples are generated. Johnson transformation is one of the well-known models that can be applied to bring data close to a normal distribution with the satisfaction of certain statistical assumptions. The sample size required for such data transformation methods is usually large, and this thus motivates the efforts of the current study to develop a new method which is suitable for small datasets. Accordingly, this research proposes the Small-Johnson Data Transformation (SJDT) method to transform small raw data to normal distributions to generate virtual samples. When compared with four other methods, the results obtained with a real small dataset drawn from the Thin Film Transistor Liquid Crystal Display (TFT-LCD) industry in Taiwan demonstrate that the proposed method is able to effectively improve the forecasting ability with small sample sizes.
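The general idea behind this abstract, transforming data toward normality, sampling in the normal space, and mapping back to obtain virtual samples, can be sketched with the standard Johnson S_U transformation, z = γ + δ·asinh((x − ξ)/λ). This is not the SJDT method proposed in the thesis; the parameter values below are hypothetical placeholders, whereas in practice they would be estimated from the data.

```python
import math
import random


def johnson_su(x, gamma, delta, xi, lam):
    """Johnson S_U transform: map a raw value x to an approximately normal z."""
    return gamma + delta * math.asinh((x - xi) / lam)


def johnson_su_inverse(z, gamma, delta, xi, lam):
    """Inverse transform: map a normal-scale value z back to the raw scale."""
    return xi + lam * math.sinh((z - gamma) / delta)


# Hypothetical parameters (assumed, not fitted to any real dataset).
params = dict(gamma=0.3, delta=1.5, xi=10.0, lam=2.0)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Virtual samples: draw standard-normal values and map them to the raw scale.
virtual = [johnson_su_inverse(rng.gauss(0.0, 1.0), **params) for _ in range(5)]
```

Because the transform is invertible, every virtual sample generated in the normal space corresponds to exactly one value on the original measurement scale.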
Yu-Chun Chiang (江裕群). "Generating fuzzy-rule based attributes to improve small dataset learning". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/49511140842879638349.
Wei-Shan Ling (凌偉珊). "Constructing a new virtual sample generation technique for small dataset learning". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/74105744105429620295.
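National Cheng Kung University (國立成功大學)
Department of Industrial and Information Management, In-service Master's Program (工業與資訊管理學系碩士在職專班)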
104
Since the rise of the network generation, big data has become the hottest topic, and recently small data has drawn attention as well. Further analysis and prediction are difficult because small data is hard to obtain and costly. Virtual sample generation has proved an effective way to address the small-data problem. The main technique is mega-trend diffusion (MTD), which defines the data domain based on a uniform distribution and skewness. This study proposes a non-parametric multi-modal virtual sample generation method for multi-modal populations. After data preprocessing, it captures the maximum amount of useful data with the soft DBSCAN clustering method. It then estimates the data range with the MTD algorithm and generates virtual samples for prediction.
Mahdi, Md Safiur Rahman. "Identifying conserved microRNAs in a large dataset of wheat small RNAs". 2015. http://hdl.handle.net/1993/30677.
October 2015
Bing-Min Wang (王秉民). "Exploring neural network hyperparameters on small dataset and hand-crafted features: take credit scoring as an example". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/f3kq35.
National Cheng Kung University (國立成功大學)
Department of Electrical Engineering (電機工程學系)
106
Deep learning has achieved remarkable success in various fields, e.g. computer vision, natural language processing, and games, and has produced many novel techniques. These fields have large amounts of data with raw features. However, there are still numerous problems in other fields with little data and hand-crafted features, such as credit scoring, stock prediction, and HIV prediction. We want to explore whether deep learning techniques developed on these prominent tasks also work in other machine learning tasks. We compared combinations of 9 activation functions and 12 weight initializations and found that the results on the credit scoring dataset match those of the original paper. We further explored how regularization methods affect the results as the model gets deeper, and used the SMBO method in place of grid search and random search for hyperparameter tuning. Finally, we compared the training time of a neural network with that of an ensemble method (bstacking). We showed that the neural network achieves better accuracy while using 0.27 times the training time. We showed that deep learning can still outperform a traditional machine learning method (bstacking) on a small dataset with hand-crafted features, and that smaller networks should not be chosen merely for fear of overfitting. Instead, use a big network and choose regularization techniques carefully to control overfitting. In deep networks, L2 regularization and dropout are better choices than early stopping. In terms of efficiency, some traditional machine learning algorithms need much more time to train than neural networks.