Dissertations on the topic "ENSEMBLE LEARNING TECHNIQUE"
Explore the top 23 dissertations for research on the topic "ENSEMBLE LEARNING TECHNIQUE".
King, Michael Allen. "Ensemble Learning Techniques for Structured and Unstructured Data." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/51667.
Ph. D.
Nguyen, Thanh Tien. "Ensemble Learning Techniques and Applications in Pattern Classification." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/366342.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Valenzuela, Russell. "Predicting National Basketball Association Game Outcomes Using Ensemble Learning Techniques." Thesis, California State University, Long Beach, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10980443.
There have been a number of studies that try to predict sporting event outcomes. Most previous research has involved results in football and college basketball. Recent years have seen similar approaches carried out in professional basketball. This thesis attempts to build upon existing statistical techniques and apply them to the National Basketball Association using a synthesis of algorithms as motivation. A number of ensemble learning methods will be utilized and compared in hopes of improving the accuracy of single models. Individual models used in this thesis will be derived from Logistic Regression, Naïve Bayes, Random Forests, Support Vector Machines, and Artificial Neural Networks, while aggregation techniques include Bagging, Boosting, and Stacking. Data from previous seasons and games from both players and teams will be used to train models in R.
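Bagging, the first aggregation technique listed, trains each base model on a bootstrap resample of the training data and combines the models by majority vote. The pure-Python sketch that follows only illustrates that idea and is not the thesis's R code: it assumes binary 0/1 labels and decision stumps as base learners, and the function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a decision stump: the single (feature, threshold, orientation)
    split with the highest training accuracy. Labels are assumed to be 0/1."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                correct = sum((left if row[j] <= t else right) == label
                              for row, label in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right

def bagging_fit(X, y, n_estimators=11, seed=0):
    """Train each base learner on a bootstrap sample: n rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Aggregate the base learners by (unweighted) majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]
```

An odd number of estimators avoids tied votes for binary labels. Boosting differs in that the base models are trained sequentially on reweighted data, and stacking replaces the fixed vote with a trained meta-model.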
Johansson, Alfred. "Ensemble approach to code smell identification : Evaluating ensemble machine learning techniques to identify code smells within a software system." Thesis, Tekniska Högskolan, Jönköping University, JTH, Datateknik och informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-49319.
Recamonde-Mendoza, Mariana. "Exploring ensemble learning techniques to optimize the reverse engineering of gene regulatory networks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/95693.
In this thesis we are concerned with the reverse engineering of gene regulatory networks from post-genomic data, a major challenge in Bioinformatics research. Gene regulatory networks are intricate biological circuits responsible for governing the expression levels (activity) of genes, thereby playing an important role in the control of many cellular processes, including cell differentiation, cell cycle and metabolism. Unveiling the structure of these networks is crucial to gain a systems-level understanding of organisms' development and behavior, and eventually to shed light on the mechanisms of diseases caused by the deregulation of these cellular processes. Due to the increasing availability of high-throughput experimental data and the large dimension and complexity of biological systems, computational methods have been essential tools in enabling this investigation. Nonetheless, their performance is considerably degraded by important computational and biological challenges posed by this scenario. In particular, the noisy and sparse nature of biological data turns network inference into a challenging combinatorial optimization problem, for which current methods fall short in the accuracy and robustness of their predictions. This thesis investigates the use of ensemble learning techniques as a means to overcome current limitations and enhance the inference process by exploiting the diversity among multiple inferred models. To this end, we develop computational methods both to generate diverse network predictions and to combine multiple predictions into an ensemble solution, and we apply this approach to a number of scenarios with different sources of diversity in order to understand its potential in this specific context. We show that the proposed solutions are competitive with traditional algorithms in the field and improve our capacity to accurately reconstruct gene regulatory networks.
Results obtained for the inference of transcriptional and post-transcriptional regulatory networks, two adjacent and complementary layers of the overall gene regulatory network, demonstrate the efficiency and robustness of our approach, encouraging the consolidation of ensemble systems as a promising methodology to decipher the structure of gene regulatory networks.
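The abstract does not specify how multiple network predictions are merged. One common way to combine inferred networks, shown here purely as an assumption, is rank averaging: each method ranks candidate regulator-target edges by confidence, and edges are re-ranked by their average rank across methods. The function name and the input format (one dict of edge scores per method) are hypothetical.

```python
from collections import defaultdict

def rank_average(predictions):
    """Combine several edge predictions into one consensus ranking.

    predictions: list of dicts mapping an edge (regulator, target) to a
    confidence score, higher meaning more likely. An edge missing from one
    prediction receives the worst possible rank in that prediction.
    """
    all_edges = set().union(*predictions)
    worst = len(all_edges)
    total_rank = defaultdict(float)
    for scores in predictions:
        # Rank 1 = highest-confidence edge in this prediction.
        ordered = sorted(scores, key=scores.get, reverse=True)
        rank = {edge: r for r, edge in enumerate(ordered, start=1)}
        for edge in all_edges:
            total_rank[edge] += rank.get(edge, worst)
    # Lower average rank = stronger consensus; ties are broken by edge name.
    return sorted(all_edges, key=lambda e: (total_rank[e] / len(predictions), e))
```

Working on ranks rather than raw scores sidesteps the fact that different inference methods produce confidence values on incomparable scales.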
Luong, Vu A. "Advanced techniques for classification of non-stationary streaming data and applications." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/420554.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Info & Comm Tech
Science, Environment, Engineering and Technology
Wang, Xian Bo. "A novel fault detection and diagnosis framework for rotating machinery using advanced signal processing techniques and ensemble extreme learning machines." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3951596.
Etienam, Clement. "Structural and shape reconstruction using inverse problems and machine learning techniques with application to hydrocarbon reservoirs." Thesis, University of Manchester, 2019. https://www.research.manchester.ac.uk/portal/en/theses/structural-and-shape-reconstruction-using-inverse-problems-and-machine-learning-techniques-with-application-to-hydrocarbon-reservoirs(e21f1030-64e7-4267-b708-b7f0165a5f53).html.
Taylor, Farrell R. "Evaluation of Supervised Machine Learning for Classifying Video Traffic." NSUWorks, 2016. http://nsuworks.nova.edu/gscis_etd/972.
Vandoni, Jennifer. "Ensemble Methods for Pedestrian Detection in Dense Crowds." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS116/document.
This study deals with pedestrian detection in high-density crowds from a mono-camera system. The detections can then be used both to obtain robust density estimation and to initialize a tracking algorithm. One of the most difficult challenges is that usual pedestrian detection methodologies do not scale well to high-density crowds, for reasons such as absence of background, high visual homogeneity, small size of the objects, and heavy occlusions. We cast the detection problem as a Multiple Classifier System (MCS) composed of two different ensembles of classifiers, the first one based on SVM (SVM-ensemble) and the second one based on CNN (CNN-ensemble), combined using Belief Function Theory (BFT) to exploit their strengths for pixel-wise classification. The SVM-ensemble is composed of several SVM detectors based on different gradient, texture and orientation descriptors, able to tackle the problem from different perspectives. BFT allows us to take into account the imprecision in addition to the uncertainty value provided by each classifier, which we consider to come from possible errors in the calibration procedure and from the heterogeneity of pixel neighborhoods in the image space. However, the scarcity of labeled data for specific dense crowd contexts makes it impossible to obtain robust training and validation sets. By exploiting belief functions directly derived from the classifiers' combination, we propose an evidential Query-by-Committee (QBC) active learning algorithm to automatically select the most informative training samples. On the other hand, we explore deep learning techniques by casting the problem as a segmentation task with soft labels, with a fully convolutional network designed to recover small objects thanks to a tailored use of dilated convolutions.
In order to obtain a pixel-wise measure of reliability for the network's predictions, we create a CNN-ensemble by means of dropout at inference time, and we combine the different realizations obtained in the context of BFT. Finally, we show that the output map given by the MCS can be employed to perform people counting. We propose an evaluation method that can be applied at every scale, also providing uncertainty bounds on the estimated density.
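In Belief Function Theory, each classifier's output for a pixel can be expressed as a mass function over the frame {pedestrian, background}, where mass placed on the whole frame encodes ignorance. The classical way to fuse two such sources is Dempster's rule of combination, sketched here; the thesis may use a different combination rule or additional discounting to model imprecision, so treat this only as an illustration of the BFT fusion step. The function name and the mass values in the example are made up.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset focal elements to masses that sum to 1.
    Mass falling on the empty intersection (conflict) is discarded and the
    remaining masses are renormalised by 1 - conflict.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources contradict each other here
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```

For instance, with `P = frozenset({"ped"})`, `B = frozenset({"bg"})` and `THETA = P | B`, fusing `{P: 0.6, B: 0.1, THETA: 0.3}` with `{P: 0.5, B: 0.2, THETA: 0.3}` gives a conflict of 0.17 and a combined pedestrian mass of 0.63 / 0.83, about 0.76, while 0.09 / 0.83 of the mass remains on `THETA` as residual ignorance.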
Pereira, Vinicius Gomes. "Using supervised machine learning and sentiment analysis techniques to predict homophobia in portuguese tweets." Repositório Institucional do FGV, 2018. http://hdl.handle.net/10438/24301.
This work studies the identification of homophobic tweets from a natural language processing and machine learning approach. The goal is to construct a predictive model that can detect, with reasonable accuracy, whether a tweet contains content offensive to LGBT individuals. The database used to train the predictive models was constructed by aggregating tweets from users who have interacted with politicians and/or political parties in Brazil. Tweets containing LGBT-related terms or referring to openly LGBT individuals were collected and manually classified. A large part of this work lies in constructing features that accurately capture not only the text of the tweet but also specific characteristics of the users and their language choices, including colloquial Portuguese expressions. In particular, the use of swear words and strong vocabulary is a very strong predictor of offensive tweets. Naturally, n-grams and term weighting schemes were also considered as features of the model. A total of 12 sets of features were constructed. A broad range of machine learning techniques was employed in the classification task: naive Bayes, regularized logistic regressions, feedforward neural networks, extreme gradient boosting (XGBoost), random forest and support vector machines. After estimating and tuning each model, they were combined using voting and stacking. Voting using 10 models obtained the best result, with 89.42% accuracy.
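Hard majority voting, one of the two combination schemes mentioned, fits in a few lines: every model casts one vote per tweet and the most frequent label wins. The abstract does not say whether hard voting or probability-averaging (soft) voting was used, so hard voting here is an assumption, and `majority_vote` is a hypothetical name.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting over several models.

    predictions: one list of predicted labels per model, all of equal length.
    Returns one label per sample; ties go to the label voted first, following
    Counter.most_common ordering.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same samples")
    # zip(*predictions) yields, for each sample, the tuple of all models' votes.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

With an even number of voters, such as the 10 models reported, ties are possible, so the tie-breaking rule is part of the design and should be fixed explicitly.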
Bui, Minh Thanh. "Statistical modeling, level-set and ensemble learning for automatic segmentation of 3D high-frequency ultrasound data : towards expedited quantitative ultrasound in lymph nodes from cancer patients." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066146/document.
This work investigates approaches to obtain automatic segmentation of three media (i.e., lymph node parenchyma, perinodal fat and normal saline) in lymph node (LN) envelope data to expedite quantitative ultrasound (QUS) in dissected LNs from cancer patients. A statistical modeling study identified a two-parameter gamma distribution as the best model for data from the three media based on its high fitting accuracy, its analytically less-complex probability density function (PDF), and closed-form expressions for its parameter estimation. Two novel level-set segmentation methods were developed that use localized statistics of envelope data to handle data inhomogeneities caused by attenuation and focusing effects. The first, local region-based gamma distribution fitting (LRGDF), employed the gamma PDFs to model speckle statistics of envelope data in local regions at a controllable scale using a smooth function with a compact support. The second, statistical transverse-slice-based level-set (STS-LS), used gamma PDFs to locally model speckle statistics in consecutive transverse slices. A novel method was then designed and evaluated to automatically initialize the LRGDF and STS-LS methods using random forest classification with newly proposed features. Methods developed in this research provided accurate, automatic and efficient segmentation results on simulated envelope data and on data acquired for LNs from colorectal- and breast-cancer patients, as compared with manual expert segmentation. Results also demonstrated that accurate QUS estimates are maintained when automatic segmentation is applied to evaluate excised LN data.
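One closed-form way to fit a two-parameter gamma distribution is the method of moments: with sample mean m and variance v, the shape is k = m^2 / v and the scale is theta = v / m. The abstract does not say which estimator the thesis uses (it may, for example, use a closed-form maximum-likelihood approximation), so the method of moments here is an assumption; the function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance

def gamma_mom(samples):
    """Method-of-moments fit of a gamma distribution.

    Returns (shape k, scale theta) from the closed-form expressions
    k = mean**2 / variance and theta = variance / mean.
    """
    mean = fmean(samples)
    var = pvariance(samples, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("gamma fit needs a positive mean and variance")
    return mean * mean / var, var / mean

def gamma_pdf(x, k, theta):
    """Gamma density x**(k-1) * exp(-x/theta) / (Gamma(k) * theta**k), computed in log space."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta))

# Example on synthetic positive values, for illustration only.
rng = random.Random(1)
data = [rng.gammavariate(2.0, 1.5) for _ in range(50_000)]  # shape 2.0, scale 1.5
k_hat, theta_hat = gamma_mom(data)
```

Because both estimates are simple functions of the first two moments, they can be recomputed cheaply inside every local region or slice, which is what makes a closed-form estimator attractive for localized level-set methods.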
Pacheco, Do Espirito Silva Caroline. "Feature extraction and selection for background modeling and foreground detection." Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS005/document.
In this thesis, we present a robust descriptor for background subtraction which is able to describe texture from an image sequence. The descriptor is less sensitive to noisy pixels and produces a short histogram, while preserving robustness to illumination changes. Moreover, a descriptor for dynamic texture recognition is also proposed. This descriptor extracts not only color information but also more detailed information from video sequences. Finally, we present an ensemble approach for feature selection that is able to select suitable features for each pixel to distinguish the foreground objects from the background ones. Our proposal uses a mechanism to update the relative importance of each feature over time. For this purpose, a heuristic approach is used to reduce the complexity of background model maintenance while maintaining the robustness of the background model. However, this method only reaches the highest accuracy when the number of features is very large. In addition, each base classifier learns a feature set instead of individual features. To overcome these limitations, we extended our previous approach by proposing a new methodology for selecting features based on wagging. We also adopted a superpixel-based approach instead of a pixel-level approach. This not only increases efficiency in terms of time and memory consumption but can also improve the segmentation performance of moving objects.
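Wagging (weight aggregation) is a bagging variant that, instead of resampling, trains every base learner on all instances under randomly perturbed instance weights. The sketch uses a Gaussian-noise variant (unit weights plus zero-mean noise, negative weights clipped to zero). The noise level, the 1-D weighted-centroid base learner and all names are assumptions for illustration; this is not the thesis's pixel-wise feature-selection method.

```python
import random
from collections import Counter

def wagging_weights(n, sigma=2.0, rng=None):
    """Wagging-style instance weights: start from 1, add zero-mean Gaussian
    noise, and clip negative weights to zero (which drops those instances)."""
    if rng is None:
        rng = random.Random()
    return [max(0.0, 1.0 + rng.gauss(0.0, sigma)) for _ in range(n)]

def fit_weighted_centroid(xs, ys, ws):
    """Hypothetical weighted base learner for 1-D inputs: one weighted mean per class."""
    centroids = {}
    for label in set(ys):
        pairs = [(x, w) for x, t, w in zip(xs, ys, ws) if t == label and w > 0]
        total = sum(w for _, w in pairs)
        if total > 0:
            centroids[label] = sum(x * w for x, w in pairs) / total
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def wagging_fit(xs, ys, n_models=15, sigma=2.0, seed=0):
    """Train every base learner on the full data under a different random weighting."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        weights = wagging_weights(len(xs), sigma, rng)
        if any(w > 0 for w in weights):  # redraw if every instance was dropped
            models.append(fit_weighted_centroid(xs, ys, weights))
    return models

def wagging_predict(models, x):
    """Combine the base learners by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

Unlike bootstrap sampling, the weights are continuous, so each base learner still sees every retained instance but with a different emphasis, which is one way to obtain diversity without discarding data.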
Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily by several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise in extracting useful knowledge from these evolving data streams, mainly that the stream needs to be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally due to the high, and increasing, data dimensionality, in addition to the potentially infinite amount of data. Both aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no, or minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that consists of reducing the dimensionality of input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, and k-NN using compressed sensing and UMAP; we also integrate them into ensemble methods.
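The hashing trick mentioned for Naive Bayes maps an open-ended feature space into a fixed number of buckets without storing a vocabulary, which suits streams whose features are not known in advance. A minimal sketch of signed feature hashing follows; the bucket count, the use of CRC32 and the function name are assumptions, since the abstract does not describe the exact hashing scheme.

```python
import zlib

def hash_features(tokens, n_buckets=16):
    """Signed feature hashing ("hashing trick") of a token sequence into a
    fixed-length count vector, with no vocabulary to store or update.

    CRC32 is used instead of Python's built-in hash(), which is salted per
    process for strings and would make the mapping non-reproducible.
    """
    vec = [0.0] * n_buckets
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))  # unsigned 32-bit integer
        index = h % n_buckets
        # The top bit picks a sign so that collisions tend to cancel out
        # rather than always inflating a bucket's count.
        sign = -1.0 if (h >> 31) & 1 else 1.0
        vec[index] += sign
    return vec
```

The memory cost is fixed at `n_buckets` per instance regardless of how many distinct features the stream eventually contains, at the price of occasional collisions between unrelated features.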
NIGAM, HARSHIT. "SOFTWARE DEFECT PREDICTION USING ENSEMBLE OF MACHINE LEARNING TECHNIQUE." Thesis, 2020. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18102.
YADAV, MAYANK. "USE OF ENSEMBLE LEARNERS TO PREDICT NUMBER OF DEFECTS IN A SOFTWARE." Thesis, 2023. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19838.
JAWA, MISHA. "COMPARISION OF ENSEMBLE LEARNING MODELS AND IMPACT OF DATA BALANCING TECHNIQUE FOR SOFTWARE EFFORT ESTIMATION." Thesis, 2022. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19229.
Повний текст джерелаDolo, Kgaugelo Moses. "Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection." Diss., 2019. http://hdl.handle.net/10500/26758.
School of Computing
M. Sc. (Computing)
Pisani, Francesco Sergio, Felice Crupi, and Gianluigi Folino. "Ensemble learning techniques for cyber security applications." Thesis, 2017. http://hdl.handle.net/handle/10955/1873.
Cyber security involves protecting information and systems from major cyber threats; frequently, high-level techniques such as data mining are used to efficiently fight, alleviate the effect of, or prevent the actions of cybercriminals. In particular, classification can be efficiently used for many cyber security applications, e.g. in intrusion detection systems, in the analysis of user behavior, and in risk and attack analysis. However, the complexity and diversity of modern systems have opened a wide range of new issues that are difficult to address. In fact, security software has to deal with missing data, privacy limitations and heterogeneous sources. Therefore, it is very unlikely that a single classification algorithm will perform well for all types of data, especially in the presence of changes and under real-time and scalability constraints. To this aim, this thesis proposes a framework based on the ensemble paradigm to cope with these problems. Ensemble is a learning paradigm where multiple learners are trained for the same task by a learning algorithm, and the predictions of the learners are combined to deal with new unseen instances. The ensemble method helps to reduce the variance of the error, the bias, and the dependence on a single dataset; furthermore, it can be built incrementally and lends itself to distributed implementations. It is also particularly suitable for distributed intrusion detection, because it allows a network profile to be built by combining different classifiers that together provide complementary information. However, the phase of building the ensemble could be computationally expensive, as the training phase must be restarted when new data arrives. For this reason, the framework is based on Genetic Programming to evolve a function for combining the classifiers composing the ensemble, an approach with several attractive characteristics.
First, the models composing the ensemble can be trained on only a portion of the training set, and then they can be combined and used without any extra training phase. Moreover, the models can be specialized for a single class and can be designed to handle the difficult problems of unbalanced classes and missing data. In case of changes in the data, the function can be recomputed incrementally with moderate computational effort and, in a streaming environment, drift strategies can be used to update the models. In addition, all the phases of the algorithm are distributed and can exploit the advantages of running on parallel/distributed architectures to cope with real-time constraints. The framework is oriented and specialized towards cyber security applications. For this reason, the algorithm is designed to work with missing data, unbalanced classes, models specialized on some tasks and models working with streaming data. Two typical scenarios in the cyber security domain are provided, and experiments are conducted on artificial and real datasets to test the effectiveness of the approach. The first scenario deals with user behavior: the actions taken by users could lead to data breaches, and the damage could have a very high cost. The second scenario deals with intrusion detection systems. In this research area, the ensemble paradigm is a very new technique, and researchers still need to fully understand the advantages of this solution.
Università della Calabria
NIGAM, HARSHIT. "SOFTWARE DEFECT PREDICTION USING ENSEMBLE OF MACHINE LEARNING TECHNIQUES." Thesis, 2020. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18103.
Amaro, Miguel Mendes. "Credit scoring: comparison of non‐parametric techniques against logistic regression." Master's thesis, 2020. http://hdl.handle.net/10362/99692.
Over the past decades, financial institutions have been giving increased importance to credit risk management as a critical tool to control their profitability. More than ever, it has become crucial for these institutions to discriminate well between good and bad clients so as to accept only the credit applications that are not likely to default. To calculate the probability of default of a particular client, most financial institutions have credit scoring models based on parametric techniques. Logistic regression is the current industry-standard technique in credit scoring models, and it is one of the techniques under study in this dissertation. Although it is regarded as a robust and intuitive technique, it is still not free from criticism of the model assumptions it makes, which can compromise its predictions. This dissertation evaluates the gains in performance resulting from using more modern non-parametric techniques instead of logistic regression, performing a model comparison over four different real-life credit datasets. Specifically, the techniques compared against logistic regression in this study consist of two single classifiers (decision tree and SVM with RBF kernel) and two ensemble methods (random forest and stacking with cross-validation). The literature review demonstrates that heterogeneous ensemble approaches have a weaker presence in credit scoring studies and, for that reason, stacking with cross-validation was considered in this study. The results demonstrate that logistic regression outperforms the decision tree classifier, performs similarly to SVM, and slightly underperforms both ensemble approaches to a similar extent.
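Stacking with cross-validation trains a meta-learner on out-of-fold predictions, so the meta-learner never sees a base model's output for a row that the model was trained on. The pure-Python sketch shows that mechanism only. The two toy learners (a class-mean threshold as base learner and an accuracy-weighted vote as meta-learner) are hypothetical stand-ins; in the dissertation's setting the base models would be classifiers such as the decision tree or SVM, and a common choice of meta-learner is logistic regression.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def out_of_fold_predictions(X, y, base_fitters, k=5, seed=0):
    """Meta-features: for every training row, each base model's prediction
    made by a copy of that model that was NOT trained on the row."""
    meta = [[None] * len(base_fitters) for _ in X]
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        for j, fit in enumerate(base_fitters):
            model = fit(train_X, train_y)
            for i in fold:
                meta[i][j] = model(X[i])
    return meta

def stacking_fit(X, y, base_fitters, meta_fitter, k=5, seed=0):
    """Train the meta-learner on out-of-fold predictions, then refit the
    base models on the full training set for use at prediction time."""
    meta_model = meta_fitter(out_of_fold_predictions(X, y, base_fitters, k, seed), y)
    full_models = [fit(X, y) for fit in base_fitters]
    return lambda x: meta_model([m(x) for m in full_models])

# --- Hypothetical toy learners for binary 0/1 labels (illustration only) ---

def threshold_learner(feature):
    """Base learner: cut halfway between the two class means on one feature.
    Assumes both classes are present in every training fold."""
    def fit(X, y):
        mean = {c: sum(x[feature] for x, t in zip(X, y) if t == c) / y.count(c)
                for c in (0, 1)}
        cut = (mean[0] + mean[1]) / 2
        low, high = (0, 1) if mean[0] <= mean[1] else (1, 0)
        return lambda x: low if x[feature] <= cut else high
    return fit

def accuracy_weighted_vote(meta_X, y):
    """Meta-learner: weight each base model by its out-of-fold accuracy."""
    weights = [sum(row[j] == t for row, t in zip(meta_X, y)) / len(y)
               for j in range(len(meta_X[0]))]
    def predict(preds):
        score = sum(w if p == 1 else -w for w, p in zip(weights, preds))
        return 1 if score > 0 else 0
    return predict
```

For example, with `X = [[float(i), float((i * 7) % 10)] for i in range(40)]` and `y = [1 if i >= 20 else 0 for i in range(40)]`, only the first feature separates the classes, so the meta-learner assigns the second feature's model a lower weight and the stacked prediction follows the first model.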
Reichenbach, Jonas. "Credit scoring with advanced analytics: applying machine learning methods for credit risk assessment at the Frankfurter sparkasse." Master's thesis, 2018. http://hdl.handle.net/10362/49557.
The need to control and manage credit risk obliges financial institutions to constantly reconsider their credit scoring methods. In recent years, machine learning has shown improvement over the common traditional methods for credit scoring. Even small improvements in prediction quality are of great interest to financial institutions. In this thesis, classification methods are applied to the credit data of the Frankfurter Sparkasse to score its credits. Since recent research has shown that ensemble methods deliver outstanding prediction quality for credit scoring, the model investigation and application focus on such methods. Additionally, the typically imbalanced class distribution of credit scoring datasets leads us to consider sampling techniques, which compensate for the imbalance in the training dataset. We evaluate and compare different types of models and techniques according to defined metrics. Besides delivering high prediction quality, the model's outcome should be interpretable as default probabilities. Hence, calibration techniques are considered to improve the interpretation of the model's scores. We find that ensemble methods deliver better results than the best single model. Specifically, the Random Forest delivers the best performance on the given dataset. Compared with the traditional credit scoring methods of the Frankfurter Sparkasse, the Random Forest shows significant improvement in predicting a borrower's default within a 12-month period. Logistic Regression is used as a benchmark to validate the performance of the model.
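The abstract mentions sampling techniques that compensate for class imbalance in the training data without naming them. Random oversampling of the minority class is one of the simplest such techniques and is shown here only as an example; `random_oversample` is a hypothetical name. It should be applied to the training split only, so that duplicated rows never leak into the evaluation data.

```python
import random

def random_oversample(X, y, seed=0):
    """Random oversampling: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest one.
    Original rows keep their order; duplicates are appended at the end."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(X, y):
        rows_by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in rows_by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in rows_by_class.items():
        for _ in range(target - len(rows)):
            X_out.append(rng.choice(rows))  # same object as the sampled row
            y_out.append(label)
    return X_out, y_out
```

Oversampling changes the class priors the model sees, so its raw scores no longer match real default rates; this is one reason a calibration step, as considered in the thesis, matters when scores must be read as probabilities.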
Dias, Didier Narciso. "Soil Classification Resorting to Machine Learning Techniques." Master's thesis, 2019. http://hdl.handle.net/10362/125335.
Soil classification is the act of summarizing the information about a soil profile into a single class, from which several properties can be inferred, even without knowledge of the study area. These classes make communication about soils, and about how they can be used in areas such as agriculture and forestry, simpler to understand. Unfortunately, soil classification is expensive, time-consuming, and requires specialists to carry out the experiments needed to classify the soil in question correctly. This master's thesis focused on evaluating machine learning algorithms for the soil classification problem in the region of Mexico, based mostly on the soils' intrinsic attributes. A database was used containing 6,760 soil profiles, the 19,464 horizons that make them up, and the chemical and physical properties of those horizons, such as pH and clay percentage. Four data modeling methods were tested (standard depths, n first layers, thickness, and area weighted thickness), as well as different values for a k-Nearest Neighbours-based imputation. A comparison was also carried out between machine learning algorithms, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks and Recurrent Neural Networks. All data modeling methods provided similar results when properly parameterized, reaching a Kappa value of 0.504 and an accuracy of 0.554, with the standard depths method showing the most consistent performance. The parameter k of the imputation method proved to have little impact on the variation of the results. The Gradient Tree Boosting algorithm obtained the best results, followed closely by the Random Forests model. The neuron-based methods had substantially worse results, never exceeding a Kappa value of 0.4.
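The results are reported both as accuracy (0.554) and as Cohen's Kappa (0.504). Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from the label frequencies, kappa = (p_o - p_e) / (1 - p_e), which is why it is lower than plain accuracy. A small pure-Python sketch follows; the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa between reference labels and predicted labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    # Observed agreement: the fraction of samples whose labels match (accuracy).
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)
```

For example, `cohen_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"])` has p_o = 0.75 and p_e = 0.5, giving a Kappa of 0.5.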