Dissertations / Theses on the topic 'Learning with noisy labels'
Consult the top 50 dissertations / theses for your research on the topic 'Learning with noisy labels.'
Yu, Xiyu. "Learning with Biased and Noisy Labels." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20125.
Caye, Daudt Rodrigo. "Convolutional neural networks for change analysis in earth observation images with noisy labels and domain shifts." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT033.
The analysis of satellite and aerial Earth observation images allows us to obtain precise information over large areas. A multitemporal analysis of such images is necessary to understand the evolution of such areas. In this thesis, convolutional neural networks are used to detect and understand changes using remote sensing images from various sources in supervised and weakly supervised settings. Siamese architectures are used to compare coregistered image pairs and to identify changed pixels. The proposed method is then extended into a multitask network architecture that is used to detect changes and perform land cover mapping simultaneously, which permits a semantic understanding of the detected changes. Then, classification filtering and a novel guided anisotropic diffusion algorithm are used to reduce the effect of biased label noise, which is a concern for automatically generated large-scale datasets. Weakly supervised learning is also achieved to perform pixel-level change detection using only image-level supervision through the usage of class activation maps and a novel spatial attention layer. Finally, a domain adaptation method based on adversarial training is proposed, which succeeds in projecting images from different domains into a common latent space where a given task can be performed. This method is tested not only for domain adaptation for change detection, but also for image classification and semantic segmentation, which proves its versatility.
Fang, Tongtong. "Learning from noisy labels by importance reweighting: a deep learning approach." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264125.
Incorrect annotations can lower classification performance. For deep networks in particular, this can lead to poor generalization. Recently, noise-robust deep learning has outperformed other learning methods in handling complex input data. Existing results from deep learning, however, cannot provide reasonable reweighting criteria. To address this knowledge gap, and inspired by domain adaptation, we propose a new robust deep learning method that uses reweighting. The reweighting is done by minimizing the maximum mean discrepancy between the loss distributions of mislabeled and correctly labeled data. In experiments, the proposed method outperforms other methods. The results show great research potential for applying domain adaptation. In addition, the proposed method motivates the investigation of other interesting problems in domain adaptation by enabling smart reweighting.
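To make the quantity named in the abstract above concrete, the following is a minimal sketch of a (biased) squared maximum mean discrepancy estimate between two samples of per-example loss values. It is not the thesis's training procedure: the Gaussian kernel, the `bandwidth` parameter, the function names, and the toy loss values are all assumptions made for illustration.

```python
import math


def rbf_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two scalar loss values."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0, weights=None):
    """Biased estimate of the squared MMD between samples xs and ys.

    `weights` optionally reweights the xs sample; the weights are
    normalised to sum to 1. The ys sample is weighted uniformly.
    """
    n, m = len(xs), len(ys)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    w = [wi / total for wi in weights]
    v = [1.0 / m] * m
    k_xx = sum(w[i] * w[j] * rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(n) for j in range(n))
    k_yy = sum(v[i] * v[j] * rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(m) for j in range(m))
    k_xy = sum(w[i] * v[j] * rbf_kernel(xs[i], ys[j], bandwidth)
               for i in range(n) for j in range(m))
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical per-example losses: the noisy sample contains two
# high-loss (possibly mislabeled) examples, the clean sample does not.
noisy_losses = [0.1, 0.2, 2.0, 2.5]
clean_losses = [0.1, 0.15, 0.2, 0.25]

unweighted = mmd_squared(noisy_losses, clean_losses)
# Down-weighting the high-loss examples moves the two loss
# distributions closer together, so the estimate shrinks.
reweighted = mmd_squared(noisy_losses, clean_losses, weights=[1, 1, 0, 0])
print(unweighted, reweighted)
```

In a reweighting scheme of the kind the abstract describes, the weights would be chosen to make this discrepancy small rather than fixed by hand as in the toy call above.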
Ainapure, Abhijeet Narhar. "Application and Performance Enhancement of Intelligent Cross-Domain Fault Diagnosis in Rotating Machinery." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1623164772153736.
Chan, Jeffrey (Jeffrey D.). "On boosting and noisy labels." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100297.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 53-56).
Boosting is a machine learning technique widely used across many disciplines. Boosting enables one to learn from labeled data in order to predict the labels of unlabeled data. A central property of boosting instrumental to its popularity is its resistance to overfitting. Previous experiments provide a margin-based explanation for this resistance to overfitting. In this thesis, the main finding is that boosting's resistance to overfitting can be understood in terms of how it handles noisy (mislabeled) points. Confirming experimental evidence emerged from experiments using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset commonly used in machine learning experiments. A majority vote ensemble filter identified, on average, 2.5% of the points in the dataset as noisy. The experiments chiefly investigated boosting's treatment of noisy points from a volume-based perspective. While the cell volume surrounding noisy points did not show a significant difference from other points, the decision volume surrounding noisy points was two to three times less than that of non-noisy points. Additional findings showed that decision volume not only provides insight into boosting's resistance to overfitting in the context of noisy points, but also serves as a suitable metric for identifying which points in a dataset are likely to be mislabeled.
by Jeffrey Chan.
M. Eng.
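The majority vote ensemble filter mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name and the toy predictions are hypothetical, and the sketch assumes that each model's predictions were obtained for points it was not trained on (for example, via cross-validation).

```python
def majority_vote_filter(labels, predictions_per_model):
    """Return indices of points that a strict majority of models misclassify.

    labels: list of given (possibly noisy) labels.
    predictions_per_model: one list of predicted labels per model,
        each aligned with `labels`.
    """
    n_models = len(predictions_per_model)
    flagged = []
    for i, label in enumerate(labels):
        wrong = sum(1 for preds in predictions_per_model if preds[i] != label)
        if wrong * 2 > n_models:  # strict majority disagrees with the label
            flagged.append(i)
    return flagged


# Toy example with three hypothetical models and four points.
labels = [0, 1, 1, 0]
predictions = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
print(majority_vote_filter(labels, predictions))  # → [2]
```

Only point 2 is flagged: all three models disagree with its label, while points 1 and 3 are each contradicted by a single model.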
Almansour, Amal. "Credibility assessment for Arabic micro-blogs using noisy labels." Thesis, King's College London (University of London), 2016. https://kclpure.kcl.ac.uk/portal/en/theses/credibility-assessment-for-arabic-microblogs-using-noisy-labels(6baf983a-940d-4c2c-8821-e992348b4097).html.
Northcutt, Curtis George. "Classification with noisy labels : "Multiple Account" cheating detection in Open Online Courses." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/111870.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 113-122).
Massive Open Online Courses (MOOCs) have the potential to enhance socioeconomic mobility through education. Yet, the viability of this outcome largely depends on the reputation of MOOC certificates as a credible academic credential. I describe a cheating strategy that threatens this reputation and holds the potential to render the MOOC certificate valueless. The strategy, Copying Answers using Multiple Existences Online (CAMEO), involves a user who gathers solutions to assessment questions using one or more harvester accounts and then submits correct answers using one or more separate master accounts. To estimate a lower bound for CAMEO prevalence among 1.9 million course participants in 115 HarvardX and MITx courses, I introduce a filter-based CAMEO detection algorithm and use a small-scale experiment to verify CAMEO use with certainty. I identify preventive strategies that can decrease CAMEO rates and show evidence of their effectiveness in science courses. Because the CAMEO algorithm functions as a lower bound estimate, it fails to detect many CAMEO cheaters. As a novelty of this thesis, instead of improving the shortcomings of the CAMEO algorithm directly, I recognize that we can think of the CAMEO algorithm as a method for producing noisy predicted cheating labels. Then a solution to the more general problem of binary classification with noisy labels (P̃Ñ learning) is a solution to CAMEO cheating detection. P̃Ñ learning is the problem of binary classification when training examples may be mislabeled (flipped) uniformly with noise rate ρ1 for positive examples and ρ0 for negative examples. I propose Rank Pruning to solve P̃Ñ learning and the open problem of estimating the noise rates. Unlike prior solutions, Rank Pruning is efficient and general, requiring O(T) for any unrestricted choice of probabilistic classifier with T fitting time.
I prove Rank Pruning achieves consistent noise estimation and equivalent expected risk as learning with uncorrupted labels in ideal conditions, and derive closed-form solutions when conditions are non-ideal. Rank Pruning achieves state-of-the-art noise rate estimation and F1, error, and AUC-PR on the MNIST and CIFAR datasets, regardless of noise rates. To highlight, Rank Pruning with a CNN classifier can predict if an MNIST digit is a one or not one with only 0.25% error, and 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples. Rank Pruning achieves similarly impressive results when as much as 50% of training examples are actually just noise drawn from a third distribution. Together, the CAMEO and Rank Pruning algorithms allow for a robust, general, and time-efficient solution to the CAMEO cheating detection problem. By ensuring the validity of MOOC credentials, we enable MOOCs to achieve both openness and value, and thus take one step closer to the greater goal of democratization of education.
by Curtis George Northcutt.
S.M.
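The class-conditional noise model in the abstract above (positives flipped at one rate, negatives at another) can be illustrated with a short sketch. This is not Rank Pruning itself; it only simulates the label-flipping process and applies Bayes' rule to relate the flip rates to the fraction of observed labels that are wrong. The function names are hypothetical, and the sketch assumes the reading rho1 = P(observed 0 | true 1) and rho0 = P(observed 1 | true 0).

```python
import random


def flip_labels(true_labels, rho1, rho0, rng):
    """Flip each positive with probability rho1 and each negative with rho0."""
    observed = []
    for y in true_labels:
        rate = rho1 if y == 1 else rho0
        observed.append(1 - y if rng.random() < rate else y)
    return observed


def inverse_noise_rates(p_pos, rho1, rho0):
    """Return (pi1, pi0) via Bayes' rule.

    pi1: fraction of observed positives that are truly negative.
    pi0: fraction of observed negatives that are truly positive.
    p_pos: prior probability that the true label is positive.
    """
    p_obs_pos = (1.0 - rho1) * p_pos + rho0 * (1.0 - p_pos)
    p_obs_neg = 1.0 - p_obs_pos
    pi1 = rho0 * (1.0 - p_pos) / p_obs_pos
    pi0 = rho1 * p_pos / p_obs_neg
    return pi1, pi0


# With 20% true positives, flipping half the positives (rho1 = 0.5) and
# one eighth of the negatives (rho0 = 0.125) means half of the observed
# positives are mislabeled negatives.
pi1, pi0 = inverse_noise_rates(p_pos=0.2, rho1=0.5, rho0=0.125)
print(pi1, pi0)

noisy = flip_labels([1, 1, 0, 0], rho1=0.5, rho0=0.125, rng=random.Random(0))
print(noisy)
```

The worked numbers mirror the setting quoted in the abstract, where 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negatives; the 20% class prior is an arbitrary choice for the example.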
Ekambaram, Rajmadhan. "Active Cleaning of Label Noise Using Support Vector Machines." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6830.
Balasubramanian, Krishnakumar. "Learning without labels and nonnegative tensor factorization." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33926.
Nugyen, Duc Tam [Verfasser], and Thomas [Akademischer Betreuer] Brox. "Robust deep learning for computer vision to counteract data scarcity and label noise." Freiburg : Universität, 2020. http://d-nb.info/1226657060/34.
Fonseca, Eduardo. "Training sound event classifiers using different types of supervision." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/673067.
Interest in automatic sound event recognition has increased in recent years, motivated by new applications in fields such as healthcare, smart homes, and urban planning. At the beginning of this thesis, research in sound event classification focused mainly on supervised learning using small datasets, often carefully annotated with vocabularies limited to specific domains (such as urban or domestic). However, such datasets do not allow training classifiers able to recognize the hundreds of sound events that occur in our environment, such as kettle whistles, bird sounds, cars passing by, or different alarms. At the same time, websites such as Freesound or YouTube host large amounts of environmental sound data, which can be useful for training classifiers with a larger vocabulary, particularly using deep learning methods that require large amounts of data. To advance the state of the art in sound event classification, this thesis investigates several aspects of dataset creation, as well as supervised and unsupervised learning to train large-vocabulary sound event classifiers, using different types of supervision in novel and alternative ways. Specifically, we focus on supervised learning using clean and noisy labels, as well as self-supervised representation learning from unlabeled data. The first part of this thesis focuses on the creation of FSD50K, a dataset with over 100h of audio manually labeled using 200 classes of sound events. We present a detailed description of the creation process and a comprehensive characterization of the dataset.
In addition, we explore architectural modifications to increase shift invariance in CNNs, improving robustness to time/frequency shifts in the input spectrograms. In the second part, we focus on training sound event classifiers using noisy labels. First, we propose a dataset that supports the investigation of real label noise. Then, we explore network-agnostic methods to mitigate the effect of label noise during training, including regularization techniques, noise-robust loss functions, and strategies to reject noisily labeled examples. Further, we develop a teacher-student method to address the problem of missing labels in sound event datasets. In the third part, we propose algorithms to learn audio representations from unlabeled data. In particular, we develop self-supervised contrastive learning methods, where representations are learned by comparing pairs of examples computed via data augmentation and automatic sound separation methods. Finally, we report on the organization of two DCASE Challenge Tasks on automatic audio tagging with noisy labels. By providing datasets as well as state-of-the-art methods and audio representations, this thesis contributes to the advancement of open sound event research and to the transition from traditional supervised learning using clean labels to other learning strategies less dependent on costly annotation efforts.
Louche, Ugo. "From confusion noise to active learning : playing on label availability in linear classification problems." Thesis, Aix-Marseille, 2016. http://www.theses.fr/2016AIXM4025/document.
The works presented in this thesis fall within the general framework of linear classification, that is, the problem of categorizing data into two or more classes based on a training set of labelled data. In practice, though, acquiring labeled examples might prove challenging and/or costly, as data are inherently easier to obtain than to label. Dealing with label scarceness has been a motivational goal in the machine learning literature, and this work discusses two settings related to this problem: learning in the presence of noise and active learning.
Akavia, Adi. "Learning noisy characters, multiplication codes, and cryptographic hardcore predicates." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/43032.
Includes bibliographical references (p. 181-187).
We present results in cryptography, coding theory and sublinear algorithms. In cryptography, we introduce a unifying framework for proving that a Boolean predicate is hardcore for a one-way function and apply it to a broad family of functions and predicates, showing new hardcore predicates for well known one-way function candidates such as RSA and discrete-log as well as reproving old results in an entirely different way. Our proof framework extends the list-decoding method of Goldreich and Levin [38] for showing hardcore predicates, by introducing a new class of error correcting codes and new list-decoding algorithm we develop for these codes. In coding theory, we introduce a novel class of error correcting codes that we name: Multiplication codes (MPC). We develop decoding algorithms for MPC codes, showing they achieve desirable combinatorial and algorithmic properties, including: (1) binary MPC of constant distance and exponential encoding length for which we provide efficient local list decoding and local self correcting algorithms; (2) binary MPC of constant distance and polynomial encoding length for which we provide efficient decoding algorithm in random noise model; (3) binary MPC of constant rate and distance. MPC codes are unique in particular in achieving properties as above while having a large group as their underlying algebraic structure. In sublinear algorithms, we present the SFT algorithm for finding the sparse Fourier approximation of complex multi-dimensional signals in time logarithmic in the signal length. We also present additional algorithms for related settings, differing in the model by which the input signal is given, in the considered approximation measure, and in the class of addressed signals. The sublinear algorithms we present are central components in achieving our results in cryptography and coding theory.
Reaching beyond theoretical computer science, we suggest employing our algorithms as tools for performance enhancement in data-intensive applications; in particular, we suggest replacing the O(N log N)-time FFT algorithm with our Õ(log N)-time SFT algorithm for settings where a sparse approximation suffices.
by Adi Akavia.
Ph.D.
Zhao, Yan. "Deep learning methods for reverberant and noisy speech enhancement." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348.
Cappozzo, Andrea. "Robust model-based classification and clustering: advances in learning from contaminated datasets." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2020. http://hdl.handle.net/10281/262919.
At the time of writing, an ever-increasing amount of data is collected every day, with its volume estimated to be doubling every two years. Thanks to technological advancements, datasets are becoming massive in terms of size and substantially more complex in nature. Nevertheless, this abundance of "raw information" does come at a price: wrong measurements, data-entry errors, breakdowns of automatic collection systems and several other causes may ultimately undermine the overall data quality. To this extent, robust methods have a central role in properly converting contaminated "raw information" to trustworthy knowledge: a primary goal of any statistical analysis. The present manuscript presents novel methodologies for performing reliable inference, within the model-based classification and clustering framework, in the presence of contaminated data. First, we propose a robust modification to a family of semi-supervised patterned models, for accomplishing classification when dealing with both class and attribute noise. Second, we develop a discriminant analysis method for anomaly and novelty detection, with the final aim of discovering label noise, outliers and unobserved classes in an unlabelled dataset. Third, we introduce two robust variable selection methods that effectively perform high-dimensional discrimination within an adulterated scenario.
Kim, Seungyeon. "Novel document representations based on labels and sequential information." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53946.
He, Jin. "Robust Mote-Scale Classification of Noisy Data via Machine Learning." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440413201.
Moon, Taesup. "Learning from noisy data with applications to filtering and denoising /." May be available electronically:, 2008. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.
Du, Yuxuan. "The Power of Quantum Neural Networks in The Noisy Intermediate-Scale Quantum Era." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/24976.
Vafaie, Parsa. "Learning in the Presence of Skew and Missing Labels Through Online Ensembles and Meta-reinforcement Learning." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42636.
Brodin, Johan. "Working with emotions : Recommending subjective labels to music tracks using machine learning." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-199278.
Curated music collections are a growing field, a direct consequence of the freedom that streaming music services such as Spotify give us. To categorize tracks based on subjective judgments in a scalable way, this thesis investigated whether recommending such labels is possible through machine learning. When 2464 tracks, each with one or more of 22 different core values, were analyzed, a profile for each track was built from attributes in three categories: editorial, cultural, and acoustic. For classifying the tracks, several multi-label classification methods were examined. By combining five transformation methods with three base classifiers and using two algorithm adaptations, a total of 17 configurations were constructed. The configurations were evaluated with several metrics, including (but not limited to) Hamming Loss, Ranking Loss, One error, F1 score, exact match, and both training time and testing time. The results showed that the transformation algorithm "Label Powerset" together with Sequential Minimal Optimization outclassed the other configurations. We also found promising results for artificial neural networks, something that should be investigated further in the future.
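As a brief illustration of two terms used in the abstract above, the sketch below shows the Label Powerset idea (each distinct set of labels becomes one class of an ordinary single-label problem) and the Hamming loss metric. The function names and the example tags are hypothetical; the thesis's own pipeline and label vocabulary are not reproduced here.

```python
def label_powerset_encode(label_sets):
    """Map every distinct label set to a single class id.

    Returns the encoded class ids and the mapping from label set to id.
    """
    classes = {}
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)  # next unused class id
        encoded.append(classes[key])
    return encoded, classes


def hamming_loss(true_sets, predicted_sets, n_labels):
    """Fraction of (example, label) pairs that are predicted incorrectly."""
    mistakes = sum(len(set(t) ^ set(p))  # labels present in exactly one set
                   for t, p in zip(true_sets, predicted_sets))
    return mistakes / (len(true_sets) * n_labels)


# Hypothetical track annotations.
tracks = [{"calm"}, {"calm", "happy"}, {"calm"}]
encoded, mapping = label_powerset_encode(tracks)
print(encoded)  # → [0, 1, 0]

loss = hamming_loss([{"calm"}, {"calm", "happy"}],
                    [{"calm"}, {"happy"}],
                    n_labels=2)
print(loss)  # → 0.25
```

After encoding, any single-label classifier can be trained on the class ids; the trade-off is that label sets never seen in training cannot be predicted.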
Schaeffer, Laura M. "Interaction of instructional material order and subgoal labels on learning in programming." Thesis, Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54459.
Hsu, Wei-Ning Ph D. Massachusetts Institute of Technology. "Speech processing with less supervision : learning from weak labels and multiple modalities." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/127021.
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 191-217).
In recent years, supervised learning has achieved great success in speech processing with powerful neural network models and vast quantities of in-domain labeled data. However, collecting a labeled dataset covering all domains can be either expensive due to the diversity of speech or almost impossible for some tasks such as speech-to-speech translation. Such a paradigm limits the applicability of speech technologies to high-resource settings. In sharp contrast, humans are good at reading the training signals from indirect supervision, such as from a small amount of explicit labels and from different modalities. This capability enables humans to learn from a wider variety of resources, including better domain coverage. In light of this observation, this thesis focuses on learning algorithms for speech processing that can utilize weak and indirect supervision to overcome the restrictions imposed by the supervised paradigm and make the most out of the data at hand for learning.
In the first part of the thesis, we devise a self-training algorithm for speech recognition that distills knowledge from a trained language model, a compact form of external non-speech prior knowledge. The algorithm is inspired by how humans use contextual and prior information to bias speech recognition and produce confident predictions. To distill knowledge within the language model, we implement a beam-search based objective to align the prediction probability with the likelihood of the language model among candidate hypotheses. Experimental results demonstrate state-of-the-art performance that recovers word error rates by up to 90% relative to using the same data with ground truth transcripts. Moreover, we show that the proposed algorithm can scale to 60,000 hours of unlabeled speech and yield further reduction in word error rates.
In the second part of the thesis, we present several text-to-speech synthesis models that enable fine-grained control of unlabeled non-textual attributes, including voice, prosody, acoustic environment properties and microphone channel effects. We achieve controllability of unlabeled attributes by formulating a text-to-speech system as a generative model with structured latent variables, and learn this generative process along with an efficient approximate inference model by adopting the variational autoencoder framework. We demonstrate that those latent variables can then be used to control the unlabeled variations in speech, making it possible to build a high-quality speech synthesis model using weakly-labeled mixed-quality speech data as the model learns to control the hidden factors.
In the last part of the thesis, we extend a cross-modal semantic embedding learning framework proposed in Harwath et al. (2019) to learn hierarchical discrete linguistic units from visually grounded speech, a form of multimodal sensory data. By utilizing a discriminative, multimodal grounding objective, the proposed framework forces the learned units to be useful for semantic image retrieval. In contrast, most of the previous work on linguistic unit discovery does not use multimodal data; it considers a reconstruction objective that encourages the learned units to be useful for reconstructing the speech, and hence those units may also encode non-linguistic factors. Experimental results show that the proposed framework outperforms state-of-the-art phonetic unit discovery frameworks by almost 50% on the ZeroSpeech 2019 ABX phone discriminative task, and learns word detectors that discover over 270 words with an F1 score of greater than 0.5. In addition, the learned units from the proposed framework are also more robust to nuisance variation compared to frameworks that learn from only speech.
by Wei-Ning Hsu.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Alt, Jonathan K. "Learning from Noisy and Delayed Rewards The Value of Reinforcement Learning to Defense Modeling and Simulation." Thesis, Monterey, California. Naval Postgraduate School, 2012. http://hdl.handle.net/10945/17313.
Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed in both benchmark environments and in challenging applied settings. Applications of reinforcement learning in the verification of the reward structure of a training simulation, the improvement in the performance of a discrete event simulation scheduling tool, and in enabling adaptive decision-making in combat simulation are presented. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that the potential for the use of reinforcement learning within modeling and simulation is great.
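The idea of using an exponentially weighted average of observed rewards as an action-value estimate can be sketched with the standard constant-step-size update Q ← Q + α(r − Q). This is a generic illustration under that assumption, not a reproduction of the thesis's algorithm; the class name, the step size `alpha`, and the toy rewards are hypothetical.

```python
class ExponentialAverageEstimator:
    """Tracks one action value per action as an exponentially
    weighted average of the rewards observed for that action."""

    def __init__(self, n_actions: int, alpha: float = 0.1):
        self.alpha = alpha            # weight given to the newest reward
        self.values = [0.0] * n_actions

    def update(self, action: int, reward: float) -> None:
        # Move the estimate a fraction alpha toward the new reward;
        # older rewards decay geometrically by (1 - alpha) per update.
        self.values[action] += self.alpha * (reward - self.values[action])

    def greedy_action(self) -> int:
        # Index of the action with the highest current estimate.
        return max(range(len(self.values)), key=self.values.__getitem__)


estimator = ExponentialAverageEstimator(n_actions=2, alpha=0.5)
estimator.update(0, 1.0)   # estimate for action 0 becomes 0.5
estimator.update(0, 1.0)   # estimate for action 0 becomes 0.75
estimator.update(1, 0.2)   # estimate for action 1 becomes 0.1
print(estimator.values, estimator.greedy_action())
```

Because recent rewards carry more weight than old ones, an estimator of this kind can track reward signals that drift over time, at the cost of more variance than a plain sample average when rewards are noisy.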
Tabassum, Binte Jafar Jeniya. "Information Extraction From User Generated Noisy Texts." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606315356821532.
Balda, Cañizares Emilio Rafael [Verfasser], Rudolf [Akademischer Betreuer] Mathar, and Bastian [Akademischer Betreuer] Leibe. "Robustness analysis of deep neural networks in the presence of adversarial perturbations and noisy labels / Emilio Rafael Balda Canizares ; Rudolf Mathar, Bastian Leibe." Aachen : Universitätsbibliothek der RWTH Aachen, 2019. http://d-nb.info/1216040931/34.
Holland, Hans Mullinnix. "Treatment of Instance-Based Classifiers Containing Ambiguous Attributes and Class Labels." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/84.
Khasgiwala, Anuj. "Word Recognition in Nutrition Labels with Convolutional Neural Network." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7101.
Jimenez, Blazquez Lara. "Mathematical Methods for Maritime Signal Curation in Noisy Environments." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-43653.
Qin, Zengchang. "Learning with fuzzy labels : a random set approach towards intelligent data mining systems." Thesis, University of Bristol, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.422575.
Mediani, Mohammed [Verfasser], and A. [Akademischer Betreuer] Waibel. "Learning from Noisy Data in Statistical Machine Translation / Mohammed Mediani ; Betreuer: A. Waibel." Karlsruhe : KIT-Bibliothek, 2017. http://d-nb.info/1137946598/34.
Williamson, Donald S. "DEEP LEARNING METHODS FOR IMPROVING THE PERCEPTUAL QUALITY OF NOISY AND REVERBERANT SPEECH." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1461018277.
Mariello, Andrea. "Learning from noisy data through robust feature selection, ensembles and simulation-based optimization." Doctoral thesis, Università degli studi di Trento, 2019. https://hdl.handle.net/11572/367772.
Mariello, Andrea. "Learning from noisy data through robust feature selection, ensembles and simulation-based optimization." Doctoral thesis, University of Trento, 2019. http://eprints-phd.biblio.unitn.it/3545/1/tesi_mariello.pdf.
Jayal, Ambikesh. "Framework to manage labels for e-assessment of diagrams." Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/4496.
Bista, Shachi. "Extracting Adverse Drug Reactions from Product Labels using Deep Learning and Natural Language Processing." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277815.
Pharmacovigilance concerns the activities that improve the understanding of adverse drug reactions. Despite the rigorous trials required for drug development, some adverse reactions remain unknown because of genetic, physiological, or demographic factors. Uppsala Monitoring Centre (UMC), in collaboration with the World Health Organization (WHO), is the custodian of the global database of reports on adverse drug reactions, VigiBase. VigiBase contains over 20 million reports of suspected adverse reactions from around the world. However, a share of these reports describe adverse reactions that are already known. In fact, there are over 3 million potential associations between all drugs and adverse reactions in the database. Finding the real and unknown adverse reactions requires powerful statistical methods as well as knowledge of the drug's known safety profile. There is a need for a database that maps drugs to all their known adverse reactions, but no such database exists today. The aim of this thesis is to develop a deep learning model that can read the text of drug labels (regulatory documents that describe a drug's safety profile) and map it to a standardized terminology with high precision. The problem can be broken into two phases, the first scanning and the second mapping. Scanning is about locating the position of text fragments in the labels. Mapping is about mapping the detected text fragments to the Medical Dictionary for Regulatory Activities (MedDRA), the terminology used at UMC for adverse reactions. An earlier attempt at UMC, a so-called dictionary-based approach, achieved a scanning F1 of 0.42 (0.31 precision; 0.64 recall) and a mapping macro-averaged F1 of 0.43 (0.39 macro-averaged precision; 0.64 macro-averaged recall). The best (state-of-the-art) systems achieved F1 above 0.8 and 0.7 for the scanning and mapping problems, respectively. I use the 2019 ADE Evaluation Challenge dataset to develop the algorithms in the project.
This dataset contains 100 drug labels annotated with adverse reactions and their mapping to MedDRA. This thesis explores three architectures for the scanning problem: 1) Bidirectional Long Short-Term Memory (BiLSTM) with softmax classification, 2) BiLSTM with Conditional Random Field (CRF) classification, and finally 3) BiLSTM with CRF classification and Embeddings from Language Model (ELMo) embeddings. For the mapping problem, I explore Information Retrieval methods using the search engines whoosh and Solr. To improve mapping performance, I explore Learning to Rank methods. BiLSTM with CRF performed best on the scanning problem, with an F1 of 0.67 (0.75 precision; 0.61 recall), a 0.06 absolute increase over the BiLSTM encoder with softmax classification. With ELMo, F1 dropped to 0.62. Error analysis showed that the Inside, Beginning, Outside (IOB2) tagging I chose to use is not suited to denoting discontinuous and compound spans and introduces considerable uncertainty into the training data. For the mapping problem, I looked at the search engines Solr and whoosh, with and without Learning to Rank. Solr proved to be the best-performing search engine, with a macro-averaged F1 of 0.49, compared with whoosh at a macro-averaged F1 of 0.47. The Learning to Rank algorithms reduced F1 by more than 0.1 for both search engines. The best-performing scanning and mapping algorithms beat the baseline system's F1 by 0.25 in the scanning phase and 0.06 in the mapping phase. A major source of error for the Solr search engine came from tokenization errors, which degraded performance throughout the pipeline. In conclusion, modern Natural Language Processing (NLP) techniques can greatly improve performance in detecting adverse reactions from labels and text compared with older dictionary methods, especially when context is important.
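The limitation of IOB2 tagging noted in the abstract above can be seen in a small sketch. The helper below converts contiguous token spans to IOB2 tags; the function name, the `ADR` label, and the example phrase are hypothetical and chosen only for illustration.

```python
def spans_to_iob2(tokens, spans, label="ADR"):
    """Convert contiguous spans to IOB2 tags.

    spans: list of (start, end) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the mention
    return tags


tokens = ["pain", "and", "swelling", "of", "the", "joints"]
# Two contiguous spans: "pain" and "swelling of the joints".
tags = spans_to_iob2(tokens, [(0, 1), (2, 6)])
print(list(zip(tokens, tags)))
```

The phrase plausibly contains two reactions, "pain ... of the joints" and "swelling of the joints", which share the tokens "of the joints". A single tag per token cannot mark "pain" and "of the joints" as one discontinuous mention, so the annotator has to settle for the contiguous approximation shown here, which is one way such tagging can add uncertainty to training data.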
Sävhammar, Simon. "Uniform interval normalization : Data representation of sparse and noisy data sets for machine learning." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19194.
Mirylenka, Katsiaryna. "Mining and Learning in Sequential Data Streams: Interesting Correlations and Classification in Noisy Settings." Doctoral thesis, Università degli studi di Trento, 2015. https://hdl.handle.net/11572/368620.
Mirylenka, Katsiaryna. "Mining and Learning in Sequential Data Streams: Interesting Correlations and Classification in Noisy Settings." Doctoral thesis, University of Trento, 2015. http://eprints-phd.biblio.unitn.it/1398/1/mirylenka.pdf.
Roy, Sujan K. "Kalman Filtering with Machine Learning Methods for Speech Enhancement." Thesis, Griffith University, 2021. http://hdl.handle.net/10072/404456.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Nguyen, Thanh Tan. "Selected non-convex optimization problems in machine learning." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/200748/1/Thanh_Nguyen_Thesis.pdf.
Young, William Albert II. "LEARNING RATES WITH CONFIDENCE LIMITS FOR JET ENGINE MANUFACTURING PROCESSES AND PART FAMILIES FROM NOISY DATA." Ohio University / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1131637106.
Tandon, Prateek. "Bayesian Aggregation of Evidence for Detection and Characterization of Patterns in Multiple Noisy Observations." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/658.
Jones, Nelda Morreau Lanny E. Lian Ming-Gon John. "Relationship between special education diagnostic labels and placement characteristics of children in foster care." Normal, Ill. Illinois State University, 1996. http://wwwlib.umi.com/cr/ilstu/fullcit?p9633420.
Title from title page screen, viewed May 23, 2006. Dissertation Committee: Lanny E. Morreau, Ming-Gon J. Lian (co-chairs), Keith E. Stearns, Kenneth H. Strand, Jeanne A. Howard. Includes bibliographical references (leaves 140-165) and abstract. Also available in print.
Leoni, Cristian. "Interpretation of Dimensionality Reduction with Supervised Proxies of User-defined Labels." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105622.
Reich, Christian [author], and Kristof Van Laerhoven [reviewer]. "Learning machine monitoring models from sparse and noisy sensor data annotations." Siegen : Universitätsbibliothek der Universität Siegen, 2020. http://d-nb.info/122050615X/34.
Reich, Christian [author], and Kristof Van Laerhoven [reviewer]. "Learning machine monitoring models from sparse and noisy sensor data annotations." Siegen : Universitätsbibliothek der Universität Siegen, 2020. http://nbn-resolving.de/urn:nbn:de:hbz:467-17183.
Zlicar, Blaz. "Algorithms for noisy and nonstationary data : advances in financial time series forecasting and pattern detection with machine learning." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10043123/.
Kraus, Vivien. "Apprentissage semi-supervisé pour la régression multi-labels : application à l’annotation automatique de pneumatiques." Thesis, Lyon, 2021. https://tel.archives-ouvertes.fr/tel-03789608.
With the advent and rapid growth of digital technologies, data has become both a precious asset and a plentiful one. Such abundance, however, raises issues of data quality and labelling. As available data volumes keep growing, human expert labelling remains important, but it is increasingly necessary to complement it with semi-supervised learning that exploits unlabeled data. This problem is all the more noticeable in the multi-label learning framework, and in particular for regression, where each statistical unit is associated with many different targets that take the form of numerical scores. This thesis focuses on this fundamental framework. First, we propose a method for semi-supervised regression, which we challenge through a detailed experimental study. Building on this new method, we present a second contribution that is better suited to the multi-label framework, and we show its efficiency with a comparative study on data sets from the literature. Furthermore, the dimension of the problem is a persistent pain point in machine learning, and reducing it attracts the interest of many researchers. Feature selection is one of the major tasks addressing this problem, and we study it here in a complex framework: semi-supervised, multi-label regression. Finally, an experimental validation is carried out on a real problem, the automatic annotation of tires, to address the needs expressed by the industrial partner of this thesis.
Manservigi, Lucrezia. "Detection and classification of faults and anomalies in gas turbine sensors by means of statistical filters and machine learning models." Doctoral thesis, Università degli studi di Ferrara, 2021. http://hdl.handle.net/11392/2478821.
Monitoring and diagnostics of gas turbines is a key challenge that can be addressed only if the unit is equipped with reliable sensors that report the actual operating condition of the energy system under investigation. Evaluating sensor reliability is therefore fundamental, since only a reliable measurement can lead to proper decisions about system operation and health state. A faulty sensor may provide misleading information for decision making, at the expense of business interruption and maintenance-related costs. For this reason, this thesis develops, tunes and validates comprehensive methodologies for the detection and classification of both faults and anomalies affecting gas turbine sensors. This purpose is achieved by means of two different analyses and related tools. First, the Improved Detection, Classification and Integrated Diagnostics of Gas Turbine Sensors (I-DCIDS) tool is developed. The I-DCIDS tool comprises two kernels, namely Fault Detection Tool and Sensor Overall Health State Analysis (SOHSA). The former detects and classifies the most frequent fault classes. The latter evaluates the sensor overall health state. The novel diagnostic tool is suitable for assessing the health state of both single sensors and redundant/correlated sensors. The methodology uses basic mathematical laws that require some user-defined configuration parameters. Thus, a sensitivity analysis is carried out on I-DCIDS parameters to derive their optimal setting. The sensitivity analysis is performed on four heterogeneous and challenging field datasets referring to correlated sensors. Then, the I-DCIDS tool is validated on an additional field dataset, which confirms its detection capability. Furthermore, the I-DCIDS tool is also used to evaluate the health state of several single sensors, by analyzing a large amount of field data covering six different physical quantities.
These analyses provide some rules of thumb for field operation, with the final aim of identifying the time of occurrence and magnitude of sensor faults. The results demonstrate the diagnostic capability of the I-DCIDS approach in a real-world scenario. Moreover, the methodology proves to be suitable for all types of datasets and physical quantities and, thanks to its optimal tuning, also capable of identifying the actual time point of fault onset. A further challenge addressed in this thesis concerns the evaluation of raw data reliability, which may be compromised by process anomalies. Such anomalies, which have rarely been investigated in the literature, may introduce errors whereby the unit of measure of a sensor is wrongly assumed. In this thesis such a situation is named Unit of Measure Inconsistency (UMI). This thesis therefore also aims to identify the approach that is best able to detect UMI occurrence and classify unlabeled data. Among several alternatives, the capability of three supervised Machine Learning classifiers, i.e., Support Vector Machine, Naïve Bayes and K-Nearest Neighbors, is investigated. In addition, a novel methodology, namely Improved Nearest Neighbor, is proposed and investigated. The capability of each classifier is assessed by means of several analyses, which investigate the influence of the number of classes and of the reliability of the data used to train the classifier. Among all tested approaches, the Naïve Bayes classifier and the novel Improved Nearest Neighbor prove to be the most effective, showing effectiveness, robustness and general validity in the majority of cases. Thanks to the selected classifiers, the actual unit of measure of raw data can be provided and further sensor diagnoses can be safely performed. Finally, all analyses reported in this thesis make use of field data acquired from sensors installed on Siemens gas turbines.
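To make the Unit of Measure Inconsistency idea concrete, the sketch below classifies the unit of a signal with a plain K-Nearest Neighbors vote. It is only an illustration under stated assumptions: the features (mean and standard deviation of a temperature signal), the synthetic values, and the labels `degC` and `K` are all invented here, and the code does not reproduce the thesis's Improved Nearest Neighbor method or its field data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbors vote; train is a list of (features, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic (mean, standard deviation) summaries of temperature signals,
# each labeled with the unit it was recorded in. All values are invented.
train = [
    ((15.0, 2.0), "degC"), ((22.0, 3.0), "degC"), ((30.0, 2.5), "degC"),
    ((288.0, 2.0), "K"), ((295.0, 3.0), "K"), ((303.0, 2.5), "K"),
]

# An unlabeled signal whose summary lies close to the Kelvin samples.
print(knn_predict(train, (291.0, 2.2)))
```

Once a signal's unit has been predicted this way, it can be compared with the unit assumed by the data acquisition system to flag a possible inconsistency.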