Dissertations / Theses on the topic 'PREDICTION DATASET'
Consult the top 50 dissertations / theses for your research on the topic 'PREDICTION DATASET.'
Klus, Petr, 1985-. "'The Clever machine' - a computational tool for dataset exploration and prediction." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/482051.
The purpose of my doctoral studies was to develop an algorithm for the large-scale analysis of protein datasets. This thesis describes the methodology, the technical work carried out, and the biological cases involved in creating the main algorithm, the cleverMachine (CM), and its extensions multiCleverMachine (mCM) and cleverGO. The CM and mCM allow groups of proteins to be characterized and classified based on physico-chemical features, together with protein abundance and Gene Ontology annotation, to support sound data exploration. My method is intended for both computational and experimental scientists, with a comprehensive, easy-to-use interface for high-throughput screening and classification of protein sequences.
Clayberg, Lauren W. "Web element role prediction from visual information using a novel dataset." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/132734.
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 89-90).
Machine learning has enhanced many existing tech industries, including end-to-end test automation for web applications. One of the many goals that mabl and other companies have in this new tech initiative is to automatically gain insight into how web applications work. The task of web element role prediction is vital for the advancement of this newly emerging product category. I applied supervised visual machine learning techniques to the task. In addition, I created a novel dataset and present detailed attribute distribution and bias information. The dataset is used to provide updated baselines for performance using current day web applications, and a novel metric is provided to better quantify the performance of these models. The top performing model achieves an F1-score of 0.45 on ten web element classes. Additional findings include color distributions for different web element roles, and how some color spaces are more intuitive to humans than others.
by Lauren Clayberg.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Oppon, Ekow CruickShank. "Synergistic use of promoter prediction algorithms: a choice of small training dataset?" Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_8222_1185436339.
Promoter detection, especially in prokaryotes, has always been an uphill task and may remain so because of the many varieties of sigma factors employed by various organisms in transcription. The situation is made more complex by the fact that any seemingly unimportant sequence segment may be turned into a promoter sequence by an activator or repressor (if the actual promoter sequence is made unavailable). Nevertheless, a computational approach to promoter detection has to be pursued for a number of reasons. The most obvious is the long and tedious process involved in elucidating promoters in 'wet' laboratories, not to mention the financial cost of such endeavors. Promoter detection/prediction for an organism with few characterized promoters (M. tuberculosis), as envisaged at the beginning of this work, was never going to be easy. Even for the few known mycobacterial promoters, most of the sigma factors associated with their transcription were not known. If the promoter-sigma information were available, the research would have focused on categorizing the promoters according to sigma factors and training the methods on the respective categories, assuming that there would be enough training data for each category. Most promoter detection/prediction studies have been carried out on E. coli because of the availability of a number of experimentally characterized promoters (approximately 310). Even then, no researcher to date has extended the research to the entire E. coli genome.
Vandehei, Bailey R. "Leveraging Defects Life-Cycle for Labeling Defective Classes." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2111.
Sousa, Massáine Bandeira e. "Improving accuracy of genomic prediction in maize single-crosses through different kernels and reducing the marker dataset." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-07032018-163203/.
In plant breeding, genomic prediction (GP) is an efficient tool for increasing the selection efficiency of genotypes, especially across multiple environments. This technique has the advantage of increasing genetic gain for complex traits and reducing costs. However, strategies are still needed to increase accuracy and reduce the bias of genotypic genetic values. In this context, the objectives were: i) to compare two strategies for obtaining marker subsets based on their effects, with respect to their impact on the accuracy of genomic selection; ii) to compare the selection accuracy of four GP models including the genotype × environment (G×E) interaction effect and two kernels (GBLUP and Gaussian). For this, data from a rice diversity panel (RICE) and two maize datasets (HEL and USP) were used. These were evaluated for grain yield and plant height. In general, prediction accuracy and the efficiency of genomic selection increased when marker subsets were used. These subsets could be used to build arrays and, consequently, reduce genotyping costs. In addition, using the Gaussian kernel and including the G×E interaction effect increases the accuracy of genomic prediction models.
Johansson, David. "Price Prediction of Vinyl Records Using Machine Learning Algorithms." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96464.
Baveye, Yoann. "Automatic prediction of emotions induced by movies." Thesis, Ecully, Ecole centrale de Lyon, 2015. http://www.theses.fr/2015ECDL0035/document.
Never before have movies been as easily accessible to viewers, who can enjoy anywhere the almost unlimited potential of movies for inducing emotions. Thus, knowing in advance the emotions that a movie is likely to elicit in its viewers could help to improve the accuracy of content delivery, video indexing or even summarization. However, transferring this expertise to computers is a complex task due in part to the subjective nature of emotions. The present thesis work is dedicated to the automatic prediction of emotions induced by movies based on the intrinsic properties of the audiovisual signal. To deal with this problem computationally, a video dataset annotated with the emotions induced in viewers is needed. However, existing datasets are not public due to copyright issues or are of a very limited size and content diversity. To answer this specific need, this thesis addresses the development of the LIRIS-ACCEDE dataset. The advantages of this dataset are threefold: (1) it is based on movies under Creative Commons licenses and thus can be shared without infringing copyright, (2) it is composed of 9,800 good quality video excerpts with a large content diversity extracted from 160 feature films and short films, and (3) the 9,800 excerpts have been ranked through a pair-wise video comparison protocol along the induced valence and arousal axes using crowdsourcing. The high inter-annotator agreement reflects that annotations are fully consistent, despite the large diversity of raters' cultural backgrounds. Three other experiments are also introduced in this thesis. First, affective ratings were collected for a subset of the LIRIS-ACCEDE dataset in order to cross-validate the crowdsourced annotations. The affective ratings also made possible the learning of Gaussian Processes for Regression, modeling the noisiness from measurements, to map the whole ranked LIRIS-ACCEDE dataset into the 2D valence-arousal affective space.
Second, continuous ratings for 30 movies were collected in order to develop temporally relevant computational models. Finally, a last experiment was performed in order to collect continuous physiological measurements for the 30 movies used in the second experiment. The correlation between both modalities strengthens the validity of the results of the experiments. Armed with a dataset, this thesis presents a computational model to infer the emotions induced by movies. The framework builds on the recent advances in deep learning and takes into account the relationship between consecutive scenes. It is composed of two fine-tuned Convolutional Neural Networks. One is dedicated to the visual modality and uses as input crops of key frames extracted from video segments, while the second one is dedicated to the audio modality through the use of audio spectrograms. The activations of the last fully connected layer of both networks are concatenated to feed a Long Short-Term Memory Recurrent Neural Network to learn the dependencies between the consecutive video segments. The performance obtained by the model is compared to the performance of a baseline similar to previous work and shows very promising results but reflects the complexity of such tasks. Indeed, the automatic prediction of emotions induced by movies is still a very challenging task which is far from being solved.
Lamichhane, Niraj. "Prediction of Travel Time and Development of Flood Inundation Maps for Flood Warning System Including Ice Jam Scenario. A Case Study of the Grand River, Ohio." Youngstown State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463789508.
Rai, Manisha. "Topographic Effects in Strong Ground Motion." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/56593.
Ph. D.
Cooper, Heather. "Comparison of Classification Algorithms and Undersampling Methods on Employee Churn Prediction: A Case Study of a Tech Company." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2260.
Šalanda, Ondřej. "Strojové učení v úloze predikce vlivu nukleotidového polymorfismu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234918.
Phanse, Shruti. "Study on the performance of ontology based approaches to link prediction in social networks as the number of users increases." Thesis, Kansas State University, 2010. http://hdl.handle.net/2097/6914.
Department of Computing and Information Sciences
Doina Caragea
Recent advances in social network applications have resulted in millions of users joining such networks in the last few years. User data collected from social networks can be used for various data mining problems such as interest recommendations, friendship recommendations and many more. Social networks, in general, can be seen as a huge directed network graph representing users of the network (together with their information, e.g., user interests) and their interactions (also known as friendship links). Previous work [Hsu et al., 2007] on friendship link prediction has shown that graph features contain important predictive information. Furthermore, it has been shown that user interests can be used to improve link predictions if they are organized into an explicit or implicit ontology [Haridas, 2009; Parimi, 2010]. However, these previous studies were performed using a small set of users in the social network LiveJournal. The goal of this work is to study the performance of the ontology based approach proposed in [Haridas, 2009] when the number of users in the dataset is increased. More precisely, we study the performance of the approach for data sets consisting of 1000, 2000, 3000 and 4000 users. Our results show that the performance generally increases with the number of users. However, the problem quickly becomes intractable in terms of computation time. As a part of our study, we also compare our results obtained using the ontology-based approach [Haridas, 2009] with results obtained with the LDA based approach in [Parimi, 2010], when such results are available.
Andruccioli, Matteo. "Previsione del Successo di Prodotti di Moda Prima della Commercializzazione: un Nuovo Dataset e Modello di Vision-Language Transformer." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24956/.
Karamichalis, Nikolaos. "Using Machine Learning techniques to understand glucose fluctuation in response to breathing signals." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-87348.
Al, Tobi Amjad Mohamed. "Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models." Thesis, University of St Andrews, 2018. http://hdl.handle.net/10023/17050.
Ward, Neil M. "Tropical North African rainfall and worldwide monthly to multi-decadal climate variations : directed towards the development of a corrected ship wind dataset, and improved diagnosis, understanding and prediction of North African rainfall." Thesis, University of Reading, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.385252.
Sengupta, Aritra. "Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056.
Chen, Yang. "Robust Prediction of Large Spatio-Temporal Datasets." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23098.
However, STRE has been shown to be sensitive to outliers or anomalous observations. In our design, the St-RSTP model assumes that the measurement error follows Student's t-distribution, instead of a traditional Gaussian distribution. To handle the analytically intractable inference of the Student's t model, we propose an approximate inference algorithm in the framework of Expectation Propagation (EP). Extensive experimental evaluations, based on both simulation and real-life data sets, demonstrated the robustness and the efficiency of our Student-t prediction model compared with the STRE model.
Master of Science
Schöner, Holger. "Working with real world datasets preprocessing and prediction with large incomplete and heterogeneous datasets /." [S.l.] : [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=973424672.
Vagh, Yunous. "Mining climate data for shire level wheat yield predictions in Western Australia." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2013. https://ro.ecu.edu.au/theses/695.
Velecký, Jan. "Predikce vlivu mutace na rozpustnost proteinů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417288.
Giommi, Luca. "Predicting CMS datasets popularity with machine learning." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/9136/.
Chen, Linchao. "Predictive Modeling of Spatio-Temporal Datasets in High Dimensions." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1429586479.
Yang, Chaozheng. "Sufficient Dimension Reduction in Complex Datasets." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/404627.
Ph.D.
This dissertation focuses on two problems in dimension reduction. The first is using a permutation approach to test predictor contribution. The permutation approach applies to marginal coordinate tests based on dimension reduction methods such as SIR, SAVE and DR, and it no longer requires calculating the method-specific weights to determine the asymptotic null distribution. The second is combining a clustering method with robust regression (least absolute deviation) to estimate the dimension reduction subspace. Compared with ordinary least squares, the proposed method is more robust to outliers; it also replaces the global linearity assumption with the more flexible local linearity assumption through k-means clustering.
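The permutation idea in this abstract can be illustrated with a minimal, generic sketch. It is not the dissertation's procedure: the absolute Pearson correlation between one predictor and the response is assumed here as a stand-in for the SIR/SAVE/DR-based marginal coordinate statistic, and the names `pearson` and `permutation_p_value` are hypothetical. The point it shows is that the null distribution comes from shuffling the predictor, so no asymptotic weights are needed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Permutation p-value for 'predictor x contributes nothing to y'.

    Assumption: |Pearson correlation| is the test statistic (a stand-in,
    not the dimension-reduction statistic used in the dissertation).
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        # Shuffling x breaks any link between x and y, simulating the null.
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, y)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests the predictor carries information about the response; swapping in a different statistic only requires replacing the two `pearson` calls.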
Temple University--Theses
Van Koten, Chikako. "Bayesian statistical models for predicting software effort using small datasets." University of Otago. Department of Information Science, 2007. http://adt.otago.ac.nz./public/adt-NZDU20071009.120134.
Veganzones, David. "Corporate failure prediction models : contributions from a novel explanatory variable and imbalanced datasets approach." Thesis, Lille, 2018. http://www.theses.fr/2018LIL1A004.
This dissertation explores novel approaches to developing corporate failure prediction models and makes contributions in three areas. The first is a novel explanatory variable based on earnings management. For this purpose, we use two measures (accruals and real activities) that assess potential earnings manipulation. We show that models which include this novel variable in combination with financial information are more accurate than those relying only on financial data. The second analyzes the capacity of corporate failure models on imbalanced datasets. We relate the different degrees of imbalance to the loss of performance and the capacity to recover performance, which have never been studied in corporate failure. The third unifies the previous areas by evaluating the capacity of our proposed earnings management model on imbalanced datasets. The research covered in this thesis provides unique and relevant contributions to the corporate finance literature, especially in the corporate failure domain.
Bsoul, Abed Al-Raoof. "PROCESSING AND CLASSIFICATION OF PHYSIOLOGICAL SIGNALS USING WAVELET TRANSFORM AND MACHINE LEARNING ALGORITHMS." VCU Scholars Compass, 2011. http://scholarscompass.vcu.edu/etd/258.
Granato, Italo Stefanine Correia. "snpReady and BGGE: R packages to prepare datasets and perform genome-enabled predictions." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-21062018-134207/.
The use of molecular markers allows an increase in selection efficiency, as well as a better understanding of genetic resources in breeding programs. However, as the number of markers increases, the data must be processed before they are made available for use. In addition, to explore genotype × environment (G×E) interaction in the context of genomic prediction, some covariance matrices need to be obtained before the prediction step. Thus, to facilitate the introduction of genomic practices into breeding programs, two R packages were developed. The first, snpReady, was created to prepare datasets for genomic studies. This package offers three functions to achieve this goal: organizing the data and applying quality control, building the genomic relationship matrix, and estimating population genetic parameters. In addition, we present a new imputation method for missing markers. The second package is BGGE, created to generate kernels for some G×E genomic interaction models and to perform genomic predictions. It consists of two functions (getK and BGGE). The first is used to create kernels for the G×E models, and the second performs genomic predictions, with some features specific to G×E kernels that reduce computational time. The features covered in the two packages offer a fast and straightforward option to support the introduction and use of genomic analyses at the various stages of a breeding program.
Gorshechnikova, Anastasiia. "Likelihood approximation and prediction for large spatial and spatio-temporal datasets using H-matrix approach." Doctoral thesis, Università degli studi di Padova, 2019. http://hdl.handle.net/11577/3425427.
Rivera, Steven Anthony. "BeatDB v3 : a framework for the creation of predictive datasets from physiological signals." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113114.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 101-104).
BeatDB is a framework for fast processing and analysis of physiological data, such as arterial blood pressure (ABP) or electrocardiograms (ECG). BeatDB takes such data as input and processes it for machine learning analytics in multiple stages. It offers both beat and onset detection, feature extraction for beats and groups of beats over one or more signal channels and over the time domain, and an extraction step focused on finding condition windows and aggregate features within them. BeatDB has gone through multiple iterations, with its initial version running as a collection of single-use MATLAB and Python scripts run on VM instances in OpenStack and its second version (known as PhysioMiner) acting as a cohesive and modular cloud system on Amazon Web Services in Java. The goal of this project is primarily to modify BeatDB to support multi-channel waveform data like EEG and accelerometer data and to make the project more flexible to modification by researchers. Major software development tasks included rewriting condition detection to find windows in valid beat groups only, refactoring and writing new code to extract features and prepare training data for multi-channel signals, and fully redesigning and reimplementing BeatDB within Python, focusing on optimization and simplicity based on probable use cases of BeatDB. Compared with previous iterations, BeatDB v3 is more accurate in the datasets it generates, more usable for both developer and non-developer users, and more efficient in both performance and design, achieving an average AUROC increase of over 4% when comparing specific iterations.
by Steven Anthony Rivera.
M. Eng.
Yasarer, Hakan. "Decision making in engineering prediction systems." Diss., Kansas State University, 2013. http://hdl.handle.net/2097/16231.
Department of Civil Engineering
Yacoub M. Najjar
Access to databases has become easier since the digital revolution because large databases are progressively available. Knowledge discovery in these databases via intelligent data analysis technology is a relatively young and interdisciplinary field. In engineering applications, there is a demand for turning low-level data-based knowledge into high-level knowledge via the use of various data analysis methods. The main reason for this demand is that collecting and analyzing databases can be expensive and time consuming. In cases where experimental or empirical data are already available, prediction models can be used to characterize the desired engineering phenomena and/or eliminate unnecessary future experiments and their associated costs. Phenomena characterization, based on available databases, has been carried out via Artificial Neural Networks (ANNs) for more than two decades. However, there is a need to introduce new paradigms to improve the reliability of the available ANN models and optimize their predictions through a hybrid decision system. In this study, a new set of ANN modeling approaches/paradigms along with a new method to tackle partially missing data (Query method) are introduced for this purpose. The potential use of these methods via a hybrid decision making system is examined by utilizing seven available databases which are obtained from civil engineering applications. Overall, the new proposed approaches have shown notable prediction accuracy improvements on the seven databases in terms of quantified statistical accuracy measures. The proposed new methods are capable of effectively characterizing the general behavior of a specific engineering/scientific phenomenon and can be collectively used to optimize predictions with a reasonable degree of accuracy.
The proposed hybrid decision making system (HDMS), implemented in an Excel-based environment, can easily be applied by the end user to any available data-rich database without the need for extensive training.
Zhu, Cheng. "Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769.
Dilthey, Alexander Tilo. "Statistical HLA type imputation from large and heterogeneous datasets." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:1bca18bf-b9d5-4777-b58e-a0dca4c9dbea.
Duncan, Andrew Paul. "The analysis and application of artificial neural networks for early warning systems in hydrology and the environment." Thesis, University of Exeter, 2014. http://hdl.handle.net/10871/17569.
Chen, Kunru. "Recurrent Neural Networks for Fault Detection : An exploratory study on a dataset about air compressor failures of heavy duty trucks." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-38184.
Mauricio-Sanchez, David, Andrade Lopes Alneu de, and higuihara Juarez Pedro Nelson. "Approaches based on tree-structures classifiers to protein fold prediction." Institute of Electrical and Electronics Engineers Inc, 2017. http://hdl.handle.net/10757/622536.
Protein fold recognition is an important task in biology. Different machine learning methods such as multiclass classifiers, one-vs-all and ensemble nested dichotomies have been applied to this task, and in most cases multiclass approaches were used. In this paper, we compare classifiers organized in tree structures to classify folds. We used a benchmark dataset containing 125 features to predict folds, comparing different supervised methods and achieving 54% accuracy. An approach based on a tree structure of classifiers obtained better results than a hierarchical approach.
Peer reviewed
Bloodgood, Michael. "Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 200 p, 2009. http://proquest.umi.com/pqdweb?did=1818417671&sid=1&Fmt=2&clientId=8331&RQT=309&VName=PQD.
PETRINI, ALESSANDRO. "HIGH PERFORMANCE COMPUTING MACHINE LEARNING METHODS FOR PRECISION MEDICINE." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/817104.
Precision Medicine is a new paradigm which is reshaping several aspects of clinical practice, representing a major departure from the "one size fits all" approach in diagnosis and prevention featured in classical medicine. Its main goal is to find personalized prevention measures and treatments, on the basis of the personal history, lifestyle and specific genetic factors of each individual. Three factors contributed to the rapid rise of Precision Medicine approaches: the ability to quickly and cheaply generate a vast amount of biological and omics data, mainly thanks to Next-Generation Sequencing; the ability to efficiently access this vast amount of data, under the Big Data paradigm; the ability to automatically extract relevant information from data, thanks to innovative and highly sophisticated data processing analytical techniques. Machine Learning in recent years revolutionized data analysis and predictive inference, influencing almost every field of research. Moreover, high-throughput bio-technologies posed additional challenges to effectively manage and process Big Data in Medicine, requiring novel specialized Machine Learning methods and High Performance Computing techniques well-tailored to process and extract knowledge from big bio-medical data. In this thesis we present three High Performance Computing Machine Learning techniques that have been designed and developed for tackling three fundamental and still open questions in the context of Precision and Genomic Medicine: i) identification of pathogenic and deleterious genomic variants among the "sea" of neutral variants in the non-coding regions of the DNA; ii) detection of the activity of regulatory regions across different cell lines and tissues; iii) automatic protein function prediction and drug repurposing in the context of biomolecular networks.
For the first problem we developed parSMURF, a novel hyper-ensemble method able to deal with the huge data imbalance that characterizes the detection of pathogenic variants in the non-coding regulatory regions of the human genome. We implemented this approach with highly parallel computational techniques using supercomputing resources at CINECA (Marconi – KNL) and HPC Center Stuttgart (HLRS Apollo HAWK), obtaining state-of-the-art results. For the second problem we developed Deep Feed Forward and Deep Convolutional Neural Networks to process epigenetic and DNA sequence data, respectively, in order to detect active promoters and enhancers in specific tissues at genome-wide level, using GPU devices to parallelize the computation. Finally, we developed scalable semi-supervised graph-based Machine Learning algorithms based on parametrized Hopfield Networks to process large biological graphs in parallel on GPU devices, using a parallel coloring method that improves the classical Luby greedy algorithm. We also present ongoing extensions of parSMURF, which very recently received an award from the Partnership for Advanced Computing in Europe (PRACE) consortium to further develop the algorithm, apply it to huge genomic data and embed its results into Genomiser, a state-of-the-art computational tool for the detection of pathogenic variants associated with Mendelian genetic diseases, in the context of an international collaboration with the Jackson Lab for Genomic Medicine.
Malazizi, Ladan. "Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation." Thesis, University of Bradford, 2008. http://hdl.handle.net/10454/4262.
Matsumoto, Élia Yathie. "A methodology for improving computed individual regressions predictions." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-12052016-140407/.
This research proposes a methodology to improve predictions computed by a regression model without the need to modify its parameters or architecture. In other words, the aim is to obtain better results by adjusting the values computed by the regression, without altering or rebuilding the original prediction model. The proposal is to adjust the values predicted by the regression using individual reliability estimators capable of indicating whether a given estimated value is likely to produce an error considered critical by the user of the regression. The proposed method was tested in three sets of experiments using three different types of data. The first set of experiments worked with artificially generated data, the second with cross-sectional data extracted from the public UCI Machine Learning Repository, and the third with time series data extracted from ISO-NE (Independent System Operator in New England). The experiments with artificial data were run to verify the behavior of the method in controlled situations. In this case, the experiments achieved the best results for clean, artificially generated data and showed progressive deterioration as random elements were added. The experiments with real data extracted from the UCI and ISO-NE databases were carried out to investigate the applicability of the methodology in the real world. The proposed method was able to improve the values predicted by regressions in about 95% of the experiments performed with real data.
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
SAMADI, HEANG, and 黄善玉. "Applying Linear Hazard Transform for Mortality Prediction to Taiwanese Dataset." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/r8q3fn.
逢甲大學
風險管理與保險學系
106
This thesis presents and compares two mortality models, the Linear Hazard Transform (LHT) and Lee-Carter, using real data from Taiwan. Empirical observation of the force-of-mortality sequences for two different years (not necessarily consecutive) shows a linear relation between them, and the two coefficients of the LHT model can be estimated by multiple linear regression to capture mortality improvement. Using these two fitted coefficients, we plot the real and fitted survival curves and find that the LHT fits well under several statistical criteria. Once the parameters have been forecast, future mortality rates can be predicted year by year. Moreover, the optimal fitting period is chosen on the basis of numerous experiments, and the intercept coefficient in the model is modified. Lastly, an application to the TSO dataset for net single premium calculation is presented. Keywords: Linear Hazard Transform, Lee-Carter, Net Single Premium, Fitting Mortality, Mortality Projection, Model Selection, AIC, BIC, RMSE, MAE, MAPE.
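The linear relation described in this abstract can be illustrated with a minimal sketch. It assumes the transform has the form mu_target(x) = alpha * mu_base(x) + beta, fits the two coefficients by ordinary least squares, and uses synthetic Gompertz-style rates rather than Taiwanese data. The function names `fit_lht` and `transformed_survival` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def fit_lht(mu_base, mu_target):
    """Least-squares fit of mu_target ~ alpha * mu_base + beta (assumed LHT form)."""
    alpha, beta = np.polyfit(mu_base, mu_target, deg=1)  # returns slope, then intercept
    return alpha, beta

def transformed_survival(s_base, ages, alpha, beta):
    """Survival curve implied by the linear transform: S*(x) = S(x)**alpha * exp(-beta * x)."""
    return s_base ** alpha * np.exp(-beta * ages)

# Synthetic forces of mortality for two years (illustrative values only).
ages = np.arange(0, 101)
mu_year_a = 0.0001 * np.exp(0.09 * ages)
mu_year_b = 0.9 * mu_year_a + 0.0002  # a second year constructed to be exactly linear in the first

alpha, beta = fit_lht(mu_year_a, mu_year_b)
print(alpha, beta)
```

Because the second series is built to be exactly linear in the first, the fit recovers the coefficients used to construct it; real mortality data would leave residuals, which is where criteria such as RMSE or AIC come in.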
Lo, Chia-Yu, and 駱佳妤. "Recurrent Learning on PM2.5 Prediction Based on Clustered Airbox Dataset." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/r49hyt.
國立中央大學
資訊工程學系
107
Industrial development naturally increases the demand for electrical power. Unfortunately, because of fears about the safety of nuclear power plants, many countries rely on thermal power plants, which emit more air pollutants when burning coal. This, together with growing vehicle emissions, is the primary cause of serious air pollution. Inhaling too much particulate matter, especially PM2.5, may lead to respiratory diseases and even death. By predicting air pollutant concentrations, people can take precautions to avoid overexposure. Accurate PM2.5 prediction is therefore increasingly important. In this thesis, we propose a PM2.5 prediction system that uses datasets from EdiGreen Airbox and the Taiwan EPA. An autoencoder and linear interpolation are adopted to solve the missing-value problem. Spearman's correlation coefficient is used to identify the features most relevant to PM2.5. Two prediction models (LSTM and LSTM based on K-means) are implemented to predict the PM2.5 value for each Airbox device. To assess prediction performance, the daily average error and the hourly average accuracy over one week are calculated. The experimental results show that LSTM based on K-means performs best among all methods. Therefore, LSTM based on K-means is chosen to provide real-time PM2.5 prediction through the Linebot.
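Two of the preprocessing steps named in this abstract, linear interpolation of missing readings and Spearman's correlation for feature relevance, can be sketched as follows. This is an illustration under assumptions, not the thesis's code: the readings are made up, the helper names `fill_linear` and `spearman` are hypothetical, the rank computation assumes no tied values, and the autoencoder-based imputation is not shown.

```python
import numpy as np

def fill_linear(values):
    """Fill NaN gaps by linear interpolation between the nearest observed readings."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Made-up hourly readings; NaN marks a missing PM2.5 value.
pm25 = fill_linear([12.0, np.nan, 18.0, 21.0, np.nan, np.nan, 30.0])
humidity = np.array([80.0, 77.0, 70.0, 66.0, 61.0, 58.0, 52.0])
rho = spearman(pm25, humidity)
print(pm25, rho)
```

Features whose absolute rho with PM2.5 is high would be kept as model inputs; the cutoff is a design choice that the abstract does not state.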
Chen, Mei-Yun, and 陳鎂鋆. "Prediction Model for Semitransparent Watercolor Pigment Mixtures Using Deep Learning with a Dataset of Transmittance and Reflectance." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/24u7ek.
國立臺灣大學
資訊網路與多媒體研究所
107
Learning color mixing is difficult for novice painters. To support them, we propose a prediction model for semitransparent pigment mixtures and use its predictions to create a Smart Palette system. We first build a watercolor dataset with two types of color-mixing data, each measured as transmittance and reflectance: increasing amounts of the same primary pigment, and mixtures of two different pigments. Next, we use the collected data to train a deep neural network that predicts the results of semitransparent pigment mixtures. Finally, we construct a Smart Palette that gives easy-to-follow instructions for mixing a target color from two primary pigments in real life: when users pick a pixel (an RGB color) from an image, the system returns a mixing recipe that specifies the two primary pigments to use and their quantities. When the pigment mixtures predicted by the model were evaluated against ground truth, 83% of the test set registered a color distance of ΔE*ab < 5; a ΔE*ab above 5 is where average observers start to perceive the compared colors as two different colors. In addition, to examine the effectiveness of the Smart Palette system, we designed a user evaluation in which untrained users mix pigments with three methods: by intuition, based on Itten's color wheel, and with the Smart Palette. The results are compiled as three sets of color distance (ΔE*ab) values. The color distances of the three methods are then compared with a t-test to determine whether the differences are significant. Taken together, the color distances and the t-values show that mixtures produced with the Smart Palette are clearly closer to the target color than those produced with the other methods.
Based on these evaluations, the Smart Palette effectively helps users learn color mixing and perform better at it than with the traditional methods.
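The ΔE*ab metric used in this evaluation is the CIE76 color difference, the Euclidean distance between two colors in L*a*b* space. A minimal sketch of the metric and of the "share of test samples under the threshold" statistic follows. The color values are invented for illustration, and the helper names `delta_e_ab` and `share_within_threshold` are hypothetical.

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.sqrt(sum((c1 - c2) ** 2 for c1, c2 in zip(lab1, lab2)))

def share_within_threshold(pairs, threshold=5.0):
    """Fraction of (predicted, measured) pairs whose color difference is below the threshold."""
    close = sum(1 for pred, truth in pairs if delta_e_ab(pred, truth) < threshold)
    return close / len(pairs)

# Invented (predicted, measured) L*a*b* pairs, for illustration only.
pairs = [
    ((50.0, 0.0, 0.0), (53.0, 4.0, 0.0)),  # difference of exactly 5.0, so not below the threshold
    ((50.0, 0.0, 0.0), (51.0, 1.0, 1.0)),  # difference of about 1.73
]
print(share_within_threshold(pairs))
```

The comparison is strict (`<`), which matches the abstract's "ΔE*ab < 5" criterion.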
MUKUL. "EFFICIENT CLASSIFICATION ON THE BASIS OF DECISION TRESS." Thesis, 2019. http://dspace.dtu.ac.in:8080/jspui/handle/repository/17065.
Raj, Rohit. "Towards Robustness of Neural Legal Judgement System." Thesis, 2023. https://etd.iisc.ac.in/handle/2005/6145.
Lutu, P. E. N. (Patricia Elizabeth Nalwoga). "Dataset selection for aggregate model implementation in predictive data mining." Thesis, 2010. http://hdl.handle.net/2263/29486.
Thesis (PhD)--University of Pretoria, 2010.
Computer Science
unrestricted
Ralho, João Pedro Loureiro. "Learning Single-View Plane Prediction from autonomous driving datasets." Master's thesis, 2019. http://hdl.handle.net/10316/87852.
Traditional 3D reconstruction using multiple images has difficulty dealing with scenes with little or repeated texture, slanted surfaces, variable illumination, and specularities. This problem has been addressed using a planarity prior (PPR), which is quite common in man-made environments. Planar primitives yield a reconstruction that is more accurate, geometrically simpler, and visually more appealing than a point cloud. Single image depth estimation (SIDE) is a very appealing idea that has recently gained new prominence due to the emergence of learning methods and new ways to generate large, accurate RGB-D datasets. This dissertation intends to extend the work developed in SIDE to single image piece-wise planar reconstruction (SI-PPR). Existing methods struggle to output accurate planar information from images because there are no large collections of accurate PPR data from real imagery. Therefore, this dissertation proposes a pipeline to efficiently generate large PPR datasets for re-training PPR estimation approaches. The pipeline is composed of three main stages: depth data generation using COLMAP, manual labeling of a small percentage of the images in the dataset, and automatic propagation of labels and planes to neighbouring views using a new strategy based on geometric constraints and random sampling. The resulting pipeline is able to generate PPR data from real images efficiently and accurately.
"Predicting Demographic and Financial Attributes in a Bank Marketing Dataset." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.38651.
Dissertation/Thesis
Masters Thesis Computer Science 2016
Lin, Hong-Yang, and 林泓暘. "Generating Aggregated Weights to Improve the Predictive Accuracy of Single-Model Ensemble Numerical Predicting Method in Small Datasets." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/6b385s.
國立成功大學
工業與資訊管理學系
105
In the age of information explosion, information is easier to obtain, so how to explore limited data and draw useful conclusions from it is an important topic in small-data learning. Current studies of ensemble methods mostly focus on the process rather than the result. Data-mining methods can be divided into classification and prediction. In ensemble methods, voting is the most common way to handle classification, whereas in numerical prediction problems averaging is the most common way to compute the result; averaging, however, is easily affected by extreme values, especially with small datasets. We improve on Bagging. We use SVR as the prediction model and calculate the error of each model's predictions, which gives a corresponding weight for each predicted value; we then calculate a compromise prediction with the aim of minimizing the error. This stabilizes the system. We compare our method with the averaging method to examine its effect, and we use a practical case from a panel factory to demonstrate the improvement in the single-model ensemble method.
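The contrast between plain averaging and error-based weighting of bagged predictions can be sketched as below. The abstract does not give the weighting formula, so inverse-error weights are used here purely as an assumption. The prediction and error values are invented, and `inverse_error_weights` and `aggregate` are hypothetical names rather than the thesis's procedure.

```python
import numpy as np

def inverse_error_weights(errors, eps=1e-12):
    """Weights proportional to 1/error, normalised to sum to 1 (an assumed scheme)."""
    inv = 1.0 / (np.asarray(errors, dtype=float) + eps)
    return inv / inv.sum()

def aggregate(predictions, weights):
    """Weighted combination of the individual model predictions."""
    return float(np.dot(weights, predictions))

# Invented predictions from four bagged models for one sample, with each model's
# validation error; the last model is an outlier with a large error.
predictions = np.array([10.2, 9.8, 10.1, 14.0])
errors = np.array([0.2, 0.3, 0.25, 4.0])

weights = inverse_error_weights(errors)
weighted = aggregate(predictions, weights)
simple_mean = float(predictions.mean())
print(weighted, simple_mean)
```

In this toy case the outlying model pulls the simple mean upward, while its small weight leaves the weighted estimate close to the other three predictions.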