Theses on the topic "Learning with Limited Data"

To see other types of publications on this topic, follow the link: Learning with Limited Data.

Create an accurate reference in APA, MLA, Chicago, Harvard, and several other styles.


Consult the top 50 theses for your research on the topic "Learning with Limited Data".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever this information is included in the metadata.

Browse theses from a wide variety of disciplines and organize your bibliography correctly.

1

Chen, Si. « Active Learning Under Limited Interaction with Data Labeler ». Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/104894.

Full text
Abstract:
Active learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool. Traditional AL frameworks have two limitations: First, they perform data selection in a multi-round manner, which is time-consuming and impractical. Second, they usually assume that a small amount of labeled data is available in the same domain as the data in the unlabeled pool. In this thesis, we initiate the study of one-round active learning to address the first issue. We propose DULO, a general framework for the one-round setting based on the notion of data utility functions, which map a set of data points to some performance measure of the model trained on that set. We formulate the one-round active learning problem as data utility function maximization. We then propose D²ULO, built on DULO, as a solution that addresses both issues. Specifically, D²ULO leverages the idea of domain adaptation (DA) to train a data utility model on labeled source data. The trained utility model can then be used to select high-utility data in the target domain and, at the same time, provide an estimate of the utility of the selected data. Our experiments show that the proposed frameworks achieve better performance than state-of-the-art baselines in the same setting. In particular, D²ULO is applicable to scenarios where the source and target labels have mismatches, which is not supported by existing work.
M.S.
Machine Learning (ML) has achieved huge success in recent years. Machine-learning technologies such as recommendation systems, speech recognition and image recognition play an important role in daily human life. This success is mainly built upon the use of large amounts of labeled data: compared with traditional programming, an ML algorithm does not rely on explicit instructions from a human; instead, it takes the data along with the labels as input and aims to learn, by itself, a function that correctly maps data to the label space. However, data labeling requires human effort and can be time-consuming and expensive, especially for datasets that involve domain-specific knowledge (e.g., disease prediction). Active Learning (AL) is one of the solutions for reducing the data-labeling effort. Specifically, the learning algorithm actively selects data points that provide more information for the model, so that a better model can be achieved with less labeled data. While traditional AL strategies do achieve good performance, they require a small amount of labeled data as initialization and perform data selection over multiple rounds, which poses great challenges to their application, as there is rarely a platform providing timely online interaction with the data labeler, and the interaction is often time-inefficient. To deal with these limitations, we first propose DULO, in which a new AL setting is studied: data selection may be performed only once. To further broaden the applicability of our method, we propose D²ULO, which builds upon DULO and domain-adaptation techniques to avoid the use of initial labeled data. Our experiments show that both proposed frameworks achieve better performance than state-of-the-art baselines.
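To make the one-round idea above concrete, here is a minimal, hypothetical sketch (not the authors' DULO/D²ULO code): a stand-in utility model is fitted on labeled source data and then used to score the unlabeled pool once, so the whole labeling budget is spent in a single interaction round. All names and data below are placeholders.

    # Hypothetical sketch of one-round, utility-driven selection; data are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def one_round_select(pool_features, utility_model, budget):
        """Score every unlabeled point once and return the top-`budget` indices."""
        scores = utility_model.predict(pool_features)   # estimated utility per candidate point
        return np.argsort(scores)[::-1][:budget]

    # Stand-in utility model trained on source-domain pairs (point features -> utility).
    rng = np.random.default_rng(0)
    utility_model = RandomForestRegressor(n_estimators=50, random_state=0)
    utility_model.fit(rng.random((200, 8)), rng.random(200))   # placeholder training pairs

    to_label = one_round_select(rng.random((1000, 8)), utility_model, budget=50)
    print(to_label[:10])   # indices to send to the labeler in the single round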
2

Dvornik, Mikita. « Learning with Limited Annotated Data for Visual Understanding ». Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM050.

Full text
Abstract:
The ability of deep-learning methods to excel in computer vision depends highly on the amount of annotated data available for training. For some tasks, annotation may be too costly and labor-intensive, thus becoming the main obstacle to better accuracy. Algorithms that learn from data automatically, without human supervision, perform substantially worse than their fully supervised counterparts. There is therefore a strong motivation to work on effective methods for learning with limited annotations. This thesis proposes to exploit prior knowledge about the task and develops more effective solutions for scene understanding and few-shot image classification. The main challenges of scene understanding include object detection and semantic and instance segmentation. All these tasks aim at recognizing and localizing objects, at the region level or the more precise pixel level, which makes the annotation process difficult. The first contribution of this manuscript is a Convolutional Neural Network (CNN) that performs both object detection and semantic segmentation. We design a specialized network architecture that is trained to solve both problems in one forward pass and operates in real time. Thanks to the multi-task training procedure, both tasks benefit from each other in terms of accuracy, with no extra labeled data. The second contribution introduces a new technique for data augmentation, i.e., artificially increasing the amount of training data. It aims at creating new scenes by copy-pasting objects from one image to another within a given dataset. Placing an object in the right context was found to be crucial for improving scene-understanding performance. We propose to model visual context explicitly using a CNN that discovers correlations between object categories and their typical neighborhoods, and then proposes realistic locations for augmentation. Overall, pasting objects in the "right" locations improves object detection and segmentation performance, with higher gains in limited-annotation scenarios. For some problems, the data is extremely scarce, and an algorithm has to learn new concepts from a handful of examples. Few-shot classification consists of learning a predictive model that is able to adapt effectively to a new class, given only a few annotated samples. While most current methods concentrate on the adaptation mechanism, few works have tackled the problem of scarce training data explicitly. In our third contribution, we show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform more sophisticated existing techniques. Our approach consists of designing an ensemble of deep networks to leverage the variance of the classifiers, and introducing new strategies to encourage the networks to cooperate while encouraging prediction diversity. By matching the outputs of different networks on similar input images, we improve model accuracy and robustness compared to classical ensemble training. Moreover, a single network obtained by distillation shows performance similar to that of the full ensemble and yields state-of-the-art results with no computational overhead at test time.
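The cooperation idea described above (matching different networks' outputs on the same inputs) can be illustrated with a small, hypothetical PyTorch loss; this is a generic consistency penalty written for illustration, not the exact objective used in the thesis.

    # Hypothetical sketch: average cross-entropy per ensemble member plus a pairwise
    # output-matching penalty that encourages the networks to agree on the same inputs.
    import torch
    import torch.nn.functional as F

    def ensemble_loss(logits_list, targets, match_weight=0.1):
        ce = sum(F.cross_entropy(l, targets) for l in logits_list) / len(logits_list)
        probs = [F.softmax(l, dim=1) for l in logits_list]
        match, pairs = 0.0, 0
        for i in range(len(probs)):
            for j in range(i + 1, len(probs)):
                match = match + F.mse_loss(probs[i], probs[j])
                pairs += 1
        return ce + match_weight * match / max(pairs, 1)

    # Example with three "networks" producing logits for a batch of 4 samples, 5 classes.
    logits = [torch.randn(4, 5, requires_grad=True) for _ in range(3)]
    labels = torch.tensor([0, 2, 1, 4])
    ensemble_loss(logits, labels).backward()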
3

Moskvyak, Olga. « Learning from limited annotated data for re-identification problem ». Thesis, Queensland University of Technology, 2021. https://eprints.qut.edu.au/226866/1/Olga_Moskvyak_Thesis.pdf.

Full text
Abstract:
The project develops machine learning methods for the re-identification task, which is matching images from the same category in a database. The thesis proposes approaches to reduce the influence of two critical challenges in image re-identification: pose variations that affect the appearance of objects and the need to annotate a large dataset to train a neural network. Depending on the domain, these challenges occur to a different extent. Our approach demonstrates superior performance on several benchmarks for people, cars, and animal categories.
4

Xian, Yongqin [Verfasser]. « Learning from limited labeled data - Zero-Shot and Few-Shot Learning / Yongqin Xian ». Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2020. http://d-nb.info/1219904457/34.

Full text
5

Eriksson, Håkan. « Clustering Generic Log Files Under Limited Data Assumptions ». Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189642.

Full text
Abstract:
Complex computer systems are often prone to anomalous or erroneous behavior, which can lead to costly downtime as the systems are diagnosed and repaired. One source of information for diagnosing the errors and anomalies is log files, which are often generated in vast and diverse amounts. However, the log files' size and semi-structured nature make manual analysis of log files generally infeasible. Some automation is desirable to sift through the log files and find the source of the anomalies or errors. This project aimed to develop a generic algorithm that can cluster diverse log files in accordance with domain expertise. The results show that the developed algorithm performs well, in agreement with manual clustering, even under more relaxed data assumptions.
6

Boman, Jimmy. « A deep learning approach to defect detection with limited data availability ». Thesis, Umeå universitet, Institutionen för fysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-173207.

Full text
Abstract:
In industrial processes, products are often visually inspected for defects in order to verify their quality. Many automated visual-inspection algorithms exist, yet in many cases humans still perform the inspections. Advances in machine learning have shown that deep-learning methods lie at the forefront of reliability and accuracy in such inspection tasks. In order to detect defects, most deep-learning methods need large amounts of training data to learn from. This makes demonstrating such methods to a new customer problematic, since such data often does not exist beforehand and has to be gathered specifically for the task. The aim of this thesis is to develop a method to perform such demonstrations. With access to only a small dataset, the method should be able to analyse an image and return a map of binary values signifying which pixels in the original image belong to a defect and which do not. A method was developed that divides an image into overlapping patches and analyses each patch individually for defects using a deep-learning method. Three different deep-learning methods for classifying the patches were evaluated: a convolutional neural network, a transfer-learning model based on the VGG19 network, and an autoencoder. The three methods were first compared on a simple binary classification task, without the patching method. They were then tested together with the patching method on two sets of images. The transfer-learning model was able to identify every defect across both tests, having been trained using only four training images, showing that defect detection with deep learning can be done successfully even when little training data is available.
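The patching procedure described above can be sketched as follows; this is a hypothetical simplification (window size, stride and the classifier are placeholders, not the thesis's exact configuration).

    # Hypothetical sketch: classify overlapping patches and mark every pixel of a
    # patch predicted as defective, producing a binary defect map for the image.
    import numpy as np

    def defect_map(image, classify_patch, patch=32, stride=16):
        h, w = image.shape[:2]
        mask = np.zeros((h, w), dtype=bool)
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                if classify_patch(image[y:y + patch, x:x + patch]):
                    mask[y:y + patch, x:x + patch] = True   # mark the whole patch as defective
        return mask

    # Toy stand-in classifier: flag unusually bright patches (a trained CNN would go here).
    toy_classifier = lambda p: p.mean() > 0.8
    print(defect_map(np.random.rand(128, 128), toy_classifier).sum())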
7

Guo, Zhenyu. « Data famine in big data era : machine learning algorithms for visual object recognition with limited training data ». Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/46412.

Full text
Abstract:
Big data is an increasingly attractive concept in many fields, both in academia and in industry. The increasing amount of information creates the illusion that we will have enough data to solve all data-driven problems. Unfortunately, this is not true, especially in areas where machine-learning methods are heavily employed, since sufficient high-quality training data does not necessarily come with big data, and it is not easy, and sometimes impossible, to collect the sufficient training samples on which most computational algorithms depend. This thesis focuses on dealing with situations with limited training data in visual object recognition by developing novel machine-learning algorithms that overcome the limited-training-data difficulty. We investigate three issues in object recognition involving limited training data: 1. one-shot object recognition, 2. cross-domain object recognition, and 3. object recognition for images with different picture styles. For Issue 1, we propose an unsupervised feature-learning algorithm that constructs a deep structure of stacked Hierarchical Dirichlet Process (HDP) auto-encoders, in order to extract "semantic" information from unlabeled source images. For Issue 2, we propose a Domain Adaptive Input-Output Kernel Learning algorithm to reduce the domain shifts in both the input and output spaces. For Issue 3, we introduce a new problem involving images with different picture styles, successfully formulate the relationship between pixel mapping functions and gradient-based image descriptors, and propose a multiple-kernel-based algorithm to learn an optimal combination of basis pixel mapping functions that improves recognition accuracy. For all the proposed algorithms, experimental results on publicly available data sets demonstrate performance improvements over the previous state of the art.
8

Ayllon, Clemente Irene [Verfasser]. « Towards natural speech acquisition : incremental word learning with limited data / Irene Ayllon Clemente ». Bielefeld : Universitätsbibliothek Bielefeld, 2013. http://d-nb.info/1077063458/34.

Full text
9

Chang, Fengming. « Learning accuracy from limited data using mega-fuzzification method to improve small data set learning accuracy for early flexible manufacturing system scheduling ». Saarbrücken VDM Verlag Dr. Müller, 2005. http://d-nb.info/989267156/04.

Full text
10

Tania, Zannatun Nayem. « Machine Learning with Reconfigurable Privacy on Resource-Limited Edge Computing Devices ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292105.

Full text
Abstract:
Distributed computing allows effective data storage, processing and retrieval, but it poses security and privacy issues. Sensors are the cornerstone of IoT-based pipelines, since they constantly capture data until it can be analyzed at the central cloud resources. However, these sensor nodes are often constrained by limited resources. Ideally, it is desirable to make all the collected data features private, but due to resource limitations it may not always be possible. Making all the features private may cause overutilization of resources, which would in turn affect the performance of the whole system. In this thesis, we design and implement a system that is capable of finding the optimal set of data features to make private, given the device's maximum resource constraints and the desired performance or accuracy of the system. Using generalization techniques for data anonymization, we create user-defined injective privacy-encoder functions to make each feature of the dataset private. Regardless of the resource availability, some data features are defined by the user as essential features to make private. All other data features that may pose a privacy threat are termed non-essential features. We propose Dynamic Iterative Greedy Search (DIGS), a greedy search algorithm that takes the resource consumption of each non-essential feature as input and returns the optimal set of non-essential features that can be made private given the available resources. This optimal set contains the features that consume the least resources. We evaluate our system on a Fitbit dataset containing 17 data features, 4 of which are essential private features for a given classification application. Our results show that we can provide 9 additional private features apart from the 4 essential features of the Fitbit dataset containing 1663 records. Furthermore, we can save 26.21% memory compared to making all the features private. We also test our method on a larger dataset generated with a Generative Adversarial Network (GAN). However, the chosen edge device, a Raspberry Pi, is unable to cater to the scale of the large dataset due to insufficient resources. Our evaluations using 1/8th of the GAN dataset result in 3 extra private features with up to 62.74% memory savings compared to making all data features private. Maintaining privacy not only requires additional resources, but also has consequences for the performance of the designed applications. However, we discover that privacy encoding has a positive impact on the accuracy of the classification model for our chosen classification application.
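A budget-constrained greedy selection of this kind can be sketched as below; it is a simplified illustration of picking the cheapest non-essential features that still fit the remaining resource budget, not necessarily the exact DIGS algorithm from the thesis. The feature names and costs are placeholders.

    # Hypothetical sketch: greedily add the cheapest non-essential features to the
    # private set until the device's remaining resource budget is exhausted.
    def select_private_features(costs, budget):
        """costs: dict feature -> resource cost; budget: resources left after essentials."""
        chosen, used = [], 0.0
        for feature, cost in sorted(costs.items(), key=lambda kv: kv[1]):
            if used + cost <= budget:
                chosen.append(feature)
                used += cost
        return chosen, used

    # Toy numbers (placeholders, not the Fitbit measurements from the thesis).
    costs = {"steps": 1.2, "heart_rate": 3.5, "sleep": 0.8, "calories": 2.0}
    private, used = select_private_features(costs, budget=4.0)
    print(private, used)   # ['sleep', 'steps', 'calories'] 4.0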
11

Rücklé, Andreas [Verfasser], Iryna [Akademischer Betreuer] Gurevych, Jonathan [Akademischer Betreuer] Berant et Goran [Akademischer Betreuer] Glavaš. « Representation Learning and Learning from Limited Labeled Data for Community Question Answering / Andreas Rücklé ; Iryna Gurevych, Jonathan Berant, Goran Glavaš ». Darmstadt : Universitäts- und Landesbibliothek, 2021. http://d-nb.info/1236344472/34.

Full text
12

Omstedt, Fredrik. « A deep reinforcement learning approach to the problem of golf using an agent limited by human data ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277832.

Full text
Abstract:
In the sport of golf, the use of statistics has become prominent as a way of understanding and improving golfers’ golf swings. Even though swing data is easily accessible thanks to a variety of technological tools, it is not always clear how to use this data, especially for amateur golfers. This thesis investigates the feasibility of using reinforcement learning in conjunction with a golfer’s data to play golf, something that could provide insight on how the golfer can improve. Specifically, a Dueling Double Deep Q Network agent and a Multi Pass Deep Q Network agent were trained and evaluated on playing golf from pixel data on two simulated golf courses, using only shot data provided by a real golfer to hit shots. These two reinforcement learning agents were then compared with the golfer on how well they played, in regards to the number of shots hit and distances remaining to the golf holes when the holes were finished. The majority of the results showed no significant difference between either agent and the golfer on both golf courses tested, indicating that the agents could play on a similar level to the golfer. The complexity of the problem caused the agents to have a good knowledge of states that occurred often but poor knowledge otherwise, which is one likely reason why the agents played similarly to but not better than the golfer. Other reasons include lack of training time, as well as potentially non-representative data retrieved from the golfer. It is concluded that there is potential in using reinforcement learning for the problem of golf and possibly for similar problems as well. Moreover, further research could improve the agents such that more valuable insights regarding golfers’ data can be found.
13

Sherwin, Jason. « A computational approach to achieve situational awareness from limited observations of a complex system ». Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33955.

Full text
Abstract:
At the start of the 21st century, the topic of complexity remains a formidable challenge in engineering, science and other aspects of our world. It seems that when disaster strikes it is because some complex and unforeseen interaction causes the unfortunate outcome. Why did the financial system of the world meltdown in 2008-2009? Why are global temperatures on the rise? These questions and other ones like them are difficult to answer because they pertain to contexts that require lengthy descriptions. In other words, these contexts are complex. But we as human beings are able to observe and recognize this thing we call 'complexity'. Furthermore, we recognize that there are certain elements of a context that form a system of complex interactions - i.e., a complex system. Many researchers have even noted similarities between seemingly disparate complex systems. Do sub-atomic systems bear resemblance to weather patterns? Or do human-based economic systems bear resemblance to macroscopic flows? Where do we draw the line in their resemblance? These are the kinds of questions that are asked in complex systems research. And the ability to recognize complexity is not only limited to analytic research. Rather, there are many known examples of humans who, not only observe and recognize but also, operate complex systems. How do they do it? Is there something superhuman about these people or is there something common to human anatomy that makes it possible to fly a plane? - Or to drive a bus? Or to operate a nuclear power plant? Or to play Chopin's etudes on the piano? In each of these examples, a human being operates a complex system of machinery, whether it is a plane, a bus, a nuclear power plant or a piano. What is the common thread running through these abilities? The study of situational awareness (SA) examines how people do these types of remarkable feats. It is not a bottom-up science though because it relies on finding general principles running through a host of varied human activities. Nevertheless, since it is not constrained by computational details, the study of situational awareness provides a unique opportunity to approach complex tasks of operation from an analytical perspective. In other words, with SA, we get to see how humans observe, recognize and react to complex systems on which they exert some control. Reconciling this perspective on complexity with complex systems research, it might be possible to further our understanding of complex phenomena if we can probe the anatomical mechanisms by which we, as humans, do it naturally. At this unique intersection of two disciplines, a hybrid approach is needed. So in this work, we propose just such an approach. In particular, this research proposes a computational approach to the situational awareness (SA) of complex systems. Here we propose to implement certain aspects of situational awareness via a biologically-inspired machine-learning technique called Hierarchical Temporal Memory (HTM). In doing so, we will use either simulated or actual data to create and to test computational implementations of situational awareness. This will be tested in two example contexts, one being more complex than the other. The ultimate goal of this research is to demonstrate a possible approach to analyzing and understanding complex systems. By using HTM and carefully developing techniques to analyze the SA formed from data, it is believed that this goal can be obtained.
14

Harár, Pavol. « Klasifikace audia hlubokým učením s limitovanými zdroji dat ». Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-408054.

Full text
Abstract:
Standard procedures for the diagnosis of dysphonia by a clinical speech therapist have their drawbacks, above all that the process is highly subjective. Recently, however, automatic objective analysis of the speaker's condition has gained popularity. Researchers have successfully based their methods on various machine-learning algorithms and hand-crafted features. Unfortunately, these methods are not directly scalable to other voice disorders, and the feature-engineering process itself is laborious and demanding in terms of both funding and talent. Building on previous successes, a deep-learning-based approach can help bridge some of the problems with scalability and generalization, but the limited amount of training data is an obstacle. This is a common denominator of almost all systems for the automated analysis of medical data. The main goal of this thesis is to research new approaches to deep-learning-based predictive modeling using limited audio data sets, with a particular focus on the evaluation of pathological voices. This work is the first to experiment with deep learning in this area, on the largest combined database of dysphonic voices to date, which was created as part of this work. It presents a thorough survey of publicly available data sources and identifies their limitations. It describes the design of new time-frequency representations based on the Gabor transform and introduces a new class of loss functions that yield output representations beneficial for learning. In numerical experiments, it demonstrates improved performance of convolutional neural networks trained on limited audio data sets using the so-called "augmented target loss function" and the proposed "Gabor" and "Mel scattering" time-frequency representations.
15

Giaretta, Lodovico. « Pushing the Limits of Gossip-Based Decentralised Machine Learning ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-253794.

Full text
Abstract:
Recent years have seen a sharp increase in the ubiquity and power of connected devices, such as smartphones, smart appliances and smart sensors. These devices produce large amounts of data that can be extremely precious for training larger, more advanced machine learning models. Unfortunately, it is sometimes not possible to collect and process these datasets on a central system, due either to their size or to the growing privacy requirements of digital data handling. To overcome this limit, researchers developed protocols to train global models in a decentralised fashion, exploiting the computational power of these edge devices. These protocols do not require any of the data on the device to be shared, relying instead on communicating partially-trained models. Unfortunately, real-world systems are notoriously hard to control, and may present a wide range of challenges that are easily overlooked in academic studies and simulations. This research analyses the gossip learning protocol, one of the main results in the area of decentralised machine learning, to assess its applicability to real-world scenarios. Specifically, this work identifies the main assumptions built into the protocol, and performs carefully-crafted simulations in order to test its behaviour when these assumptions are lifted. The results show that the protocol can already be applied to certain environments, but that it fails when exposed to certain conditions that appear in some real-world scenarios. In particular, the models trained by the protocol may be biased towards the data stored in nodes with faster communication speeds or a higher number of neighbours. Furthermore, certain communication topologies can have a strong negative impact on the convergence speed of the models. While this study also suggests effective mitigations for some of these issues, it appears that the gossip learning protocol requires further research efforts, in order to ensure a wider industrial applicability.
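The gossip learning protocol analysed above can be reduced to a toy simulation like the one below; this is a hypothetical sketch of the basic merge-then-update cycle (a node sends its model to a random peer, which averages it into its own and takes a local step on its own data), not the exact protocol variant studied in the thesis.

    # Hypothetical sketch: nodes hold linear-model weights, gossip them to random
    # peers, average what they receive, and take one local gradient step on local data.
    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, dim, lr = 10, 5, 0.1
    true_w = rng.normal(size=dim)
    local_X = [rng.normal(size=(20, dim)) for _ in range(n_nodes)]
    local_y = [X @ true_w for X in local_X]
    models = [np.zeros(dim) for _ in range(n_nodes)]

    for _ in range(100):                       # gossip rounds
        for i in range(n_nodes):
            j = rng.integers(n_nodes)          # random peer (stand-in for the overlay network)
            models[j] = (models[j] + models[i]) / 2.0   # merge the received model
            grad = local_X[j].T @ (local_X[j] @ models[j] - local_y[j]) / len(local_y[j])
            models[j] -= lr * grad             # local update on the receiving node's data
    print(np.mean([np.linalg.norm(m - true_w) for m in models]))   # average model error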
16

Pettersson, Christoffer. « Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data : A comprehensive study using state of the art methods for text clustering and natural language processing ». Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189147.

Full text
Abstract:
The goal of this project is to investigate any correlation between marketing emails and their receivers using machine learning and only a limited amount of initial data. The data consists of roughly 1,200 emails and 98,000 receivers of these. Initially, the emails are grouped together based on their content using text clustering. They contain no information regarding prior labeling or categorization, which creates a need for an unsupervised learning approach using solely the raw text-based content as data. The project investigates state-of-the-art concepts like bag-of-words for calculating term importance and the gap statistic for determining an optimal number of clusters. The data is vectorized using term frequency-inverse document frequency (tf-idf) to determine the importance of terms relative to the document and to all documents combined. An inherent problem of this approach is high dimensionality, which is reduced using latent semantic analysis in conjunction with singular value decomposition. Once the resulting clusters have been obtained, the most frequently occurring terms for each cluster are analyzed and compared. Due to the absence of initial labeling, an alternative approach is required to evaluate the clusters' validity. To do this, the receivers of all emails in each cluster who actively opened an email are collected and investigated. Each receiver has different attributes regarding their purpose of using the service and some personal information. Once these were gathered and analyzed, the conclusion could be drawn that it is possible to find distinguishable connections between the resulting email clusters and their receivers, but only to a limited extent. The receivers from the same cluster did show attributes similar to each other and distinguishable from the receivers of other clusters. Hence, the resulting email clusters and their receivers are specific enough to distinguish themselves from each other, but too general to handle more detailed information. With more data, this could become a useful tool for determining which users of a service should receive a particular email, to increase the conversion rate and thereby reach more relevant people based on previous trends.
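The pipeline described above (tf-idf vectorization, LSA via truncated SVD, then clustering) maps closely onto standard library components; the sketch below is a hypothetical illustration with a placeholder corpus and a fixed number of clusters rather than one chosen by the gap statistic.

    # Hypothetical sketch: tf-idf -> LSA (truncated SVD) -> k-means, then top terms per cluster.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import KMeans

    emails = ["spring sale on shoes", "invoice for your order",
              "new arrivals in shoes", "your order has shipped"]   # placeholder corpus

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(emails)                 # term importance per document
    X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)   # reduce dimensionality
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsa)

    for c in range(2):                                   # most characteristic terms per cluster
        members = [i for i, l in enumerate(labels) if l == c]
        top = X[members].sum(axis=0).A1.argsort()[::-1][:3]
        print(c, [vectorizer.get_feature_names_out()[t] for t in top])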
17

Young, William Albert II. « LEARNING RATES WITH CONFIDENCE LIMITS FOR JET ENGINE MANUFACTURING PROCESSES AND PART FAMILIES FROM NOISY DATA ». Ohio University / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1131637106.

Full text
18

Trávníčková, Kateřina. « Interaktivní segmentace 3D CT dat s využitím hlubokého učení ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-432864.

Full text
Abstract:
This thesis deals with CT data segmentation using convolutional neural networks and describes the problem of training with limited training sets. User interaction is suggested as a means of improving segmentation quality for models trained on small training sets, and the possibility of using transfer learning is also considered. All of the chosen methods help improve segmentation quality in comparison with the baseline method, which is the use of an automatic, data-specific segmentation model. The segmentation improved by tens of percent in Dice score when trained with very small datasets. These methods can be used, for example, to simplify the creation of a new segmentation dataset.
19

Yang, Xuan, et 楊譞. « Budget-limited data disambiguation ». Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196458.

Full text
Abstract:
The problem of data ambiguity exists in a wide range of applications. In this thesis, we study "cost-aware" methods to alleviate data ambiguity problems in uncertain databases and social-tagging data. In database applications, ambiguous (or uncertain) data may originate from data integration and measurement error of devices. These ambiguous data are maintained by uncertain databases. In many situations, it is possible to "clean", or remove, ambiguities from these databases. For example, the GPS location of a user is inexact due to measurement error, but context information (e.g., what a user is doing) can be used to reduce the imprecision of the location value. In practice, a cleaning activity often involves a cost, may fail, and may not remove all ambiguities. Moreover, the statistical information about how likely database entities are to be cleaned may not be precisely known. We model the above aspects with the uncertain database cleaning problem, which requires us to make sensible decisions in selecting entities to clean in order to maximize the amount of ambiguous information removed under a limited budget. To solve this problem, we propose the Explore-Exploit (or EE) algorithm, which gathers valuable information during the cleaning process to determine how the remaining cleaning budget should be invested. We also study how to fine-tune the parameters of EE in order to achieve optimal cleaning effectiveness. Social-tagging data capture web users' textual annotations, called tags, for resources (e.g., webpages and photos). Since tags are given by casual users, they often contain noise (e.g., misspelled words) and may not cover all the aspects of each resource. In this thesis, we design a metric to systematically measure the tagging quality of each resource based on the tags it has received. We propose an incentive-based tagging framework in order to improve tagging quality. The main idea is to award users some incentive for giving (relevant) tags to resources. The challenge is: how should we allocate incentives to a large set of resources, so as to maximize the improvement of their tagging quality under a limited budget? To solve this problem, we propose a few efficient incentive-allocation strategies. Experiments show that our best strategy provides resources with a close-to-optimal gain in tagging quality. To summarize, we study the problem of budget-limited data disambiguation for uncertain databases and social-tagging data: given a set of objects (entities from uncertain databases or web resources), how can we make sensible decisions about which object to "disambiguate" (i.e., perform a cleaning activity on the entity or ask a user to tag the resource), in order to maximize the amount of ambiguous information reduced under a limited budget?
Published or final version
Computer Science
Doctoral
Doctor of Philosophy
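The budget-allocation idea behind the EE algorithm described in the abstract above can be conveyed with a generic epsilon-greedy sketch; this is a hypothetical simplification shown only to illustrate splitting a cleaning budget between exploring uncertain groups of entities and exploiting the groups that have cleaned well so far, not the thesis's actual EE algorithm.

    # Hypothetical sketch: spend one unit of budget per step, either exploring a random
    # group of entities or exploiting the group with the best observed cleaning success.
    import random

    def allocate_budget(groups, try_clean, budget, epsilon=0.2):
        """groups: list of group ids; try_clean(g) -> True if one cleaning attempt succeeded."""
        successes = {g: 0 for g in groups}
        attempts = {g: 0 for g in groups}
        for _ in range(budget):
            if random.random() < epsilon or not any(attempts.values()):
                g = random.choice(groups)                                            # explore
            else:
                g = max(groups, key=lambda x: successes[x] / max(attempts[x], 1))    # exploit
            attempts[g] += 1
            successes[g] += bool(try_clean(g))
        return successes, attempts

    # Toy simulation: group "B" entities clean successfully more often than group "A".
    rates = {"A": 0.2, "B": 0.7}
    print(allocate_budget(["A", "B"], lambda g: random.random() < rates[g], budget=100))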
20

Wang, Chen. « Global investigation of marine atmospheric boundary layer rolls using Sentinel-1 SAR data ». Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2020. http://www.theses.fr/2020IMTA0203.

Full text
Abstract:
This thesis exploits the global Sentinel-1 (S-1) wave mode (WV) synthetic aperture radar (SAR) data for marine atmospheric boundary layer (MABL) roll studies. A deep-learning-based model was developed to automatically identify rolls from the massive S-1 WV images. Evaluation shows that more and clearer rolls are visible at the larger incidence angle, with limitations at very low wind speeds and when the wind direction is perpendicular to the SAR antenna look direction. Beyond this, the huge data volume leads to a new result: on average and across all wind speeds, MABL rolls induce surface wind variations of ~8% (±3.5%) of the mean flow, seldom exceeding 20%. Global statistics confirm, in line with previous studies, that up to 90% of the identified rolls occur in near-neutral to slightly unstable conditions. Roll wavelength and orientation are extracted, with findings of multi-scale organization and a directional contrast between low and mid latitudes. The systematic distribution of roll orientation with respect to the surface wind from the tropics to the extratropics recalls the importance of the horizontal Coriolis force on rolls. Given the significance of these highlights for both atmosphere and ocean studies, extending the nearly global S-1 WV SAR data to rolls, convective cells and other key air-sea processes is highly anticipated. Results should be compared, explained, and complemented in the near future with in-depth theoretical and numerical studies.
21

Senecal, Joshua G. « Length-limited data transformation and compression / ». For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2005. http://uclibs.org/PID/11984.

Full text
22

Pople, Andrew James. « Value-based maintenance using limited data ». Thesis, University of Newcastle Upon Tyne, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.391958.

Full text
23

Hsu, Bo-June (Bo-June Paul). « Language Modeling for limited-data domains ». Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/52796.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 99-109).
With the increasing focus of speech recognition and natural language processing applications on domains with limited amount of in-domain training data, enhanced system performance often relies on approaches involving model adaptation and combination. In such domains, language models are often constructed by interpolating component models trained from partially matched corpora. Instead of simple linear interpolation, we introduce a generalized linear interpolation technique that computes context-dependent mixture weights from features that correlate with the component confidence and relevance for each n-gram context. Since the n-grams from partially matched corpora may not be of equal relevance to the target domain, we propose an n-gram weighting scheme to adjust the component n-gram probabilities based on features derived from readily available corpus segmentation and metadata to de-emphasize out-of-domain n-grams. In scenarios without any matched data for a development set, we examine unsupervised and active learning techniques for tuning the interpolation and weighting parameters. Results on a lecture transcription task using the proposed generalized linear interpolation and n-gram weighting techniques yield up to a 1.4% absolute word error rate reduction over a linearly interpolated baseline language model. As more sophisticated models are only as useful as they are practical, we developed the MIT Language Modeling (MITLM) toolkit, designed for efficient iterative parameter optimization, and released it to the research community. With a compact vector-based n-gram data structure and optimized algorithm implementations, the toolkit not only improves the running time of common tasks by up to 40x, but also enables the efficient parameter tuning for language modeling techniques that were previously deemed impractical.
by Bo-June (Paul) Hsu.
Ph.D.
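The contrast between simple and context-dependent interpolation described in the abstract above can be written out in a few lines; the sketch below is a hypothetical toy with two tiny component models and a hand-made weighting function, not the MITLM implementation.

    # Hypothetical sketch: interpolate two component n-gram models, with a mixture
    # weight that may depend on the history (context) instead of being one constant.
    def interpolate(p_in, p_out, weight_fn):
        """p_in/p_out: dict (history, word) -> prob; weight_fn(history) -> lambda in [0, 1]."""
        merged = {}
        for key in set(p_in) | set(p_out):
            history, _ = key
            lam = weight_fn(history)
            merged[key] = lam * p_in.get(key, 0.0) + (1.0 - lam) * p_out.get(key, 0.0)
        return merged

    p_lecture = {(("the",), "lecture"): 0.3, (("the",), "course"): 0.2}   # in-domain component
    p_general = {(("the",), "lecture"): 0.05, (("the",), "course"): 0.1}  # out-of-domain component

    simple = interpolate(p_lecture, p_general, lambda h: 0.5)             # fixed weight
    contextual = interpolate(p_lecture, p_general,
                             lambda h: 0.8 if h == ("the",) else 0.3)     # context-dependent weight
    print(simple[(("the",), "lecture")], contextual[(("the",), "lecture")])   # 0.175 0.25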
24

Chang, Eric I.-Chao. « Improving wordspotting performance with limited training data ». Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/38056.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.
Includes bibliographical references (leaves 149-155).
by Eric I-Chao Chang.
Ph.D.
25

Zama, Ramirez Pierluigi <1992&gt. « Deep Scene Understanding with Limited Training Data ». Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amsdottorato.unibo.it/9815/1/zamaramirez_pierluigi_tesi.pdf.

Full text
Abstract:
Scene understanding by a machine is a challenging task due to the profound variety of nature. Nevertheless, deep learning achieves impressive results in several scene-understanding tasks such as semantic segmentation, depth estimation, or optical flow. However, these kinds of approaches need a large amount of labeled data, leading to massive manual annotation, which is incredibly tedious and expensive to collect. In this thesis, we will focus on understanding a scene through deep learning with limited data availability. First of all, we will tackle the problem of the lack of data for semantic segmentation. We will show that computer graphics comes in handy for our purpose, both to create a new, efficient tool for annotation and to render synthetic annotated datasets quickly. However, a network trained only on synthetic data suffers from the so-called domain-shift problem, i.e., it is unable to generalize to real data. Thus, we will show that we can mitigate this problem using a novel deep image-to-image translation technique. In the second part of the thesis, we will focus on the relationships between scene-understanding tasks. We argue that building a model aware of the connections between tasks is the first building block in creating more robust, efficient, performant models that need less annotated training data. In particular, we demonstrate that we can decrease the need for labels by exploiting the relationships between visual tasks. Finally, in the last part, we propose a novel unified framework for comprehensive scene understanding, which exploits the synergies between tasks to be more robust, efficient, and performant.
Styles APA, Harvard, Vancouver, ISO, etc.
26

Caprioli, Francesco. « Optimal fiscal policy, limited commitment and learning ». Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7396.

Texte intégral
Résumé :
Esta tesis trata sobre cómo la autoridad fiscal debe fijar los impuestos distorsivos de manera óptima. El capítulo 1 analiza el problema de la política fiscal cuando el gobierno tiene un incentivo a hacer default con su deuda externa. El capítulo 2 trata sobre el problema de la política fiscal cuando los agentes no conocen cómo el gobierno fija las tasas impositivas. La principal conclusión que obtengo es que, en ambos contextos, el resultado de suavidad de las tasas, que es estándar en la literatura de imposición óptima, se rompe. Cuando los gobiernos no tienen una tecnología de compromiso, los impuestos responden a los incentivos de default; cuando los agentes poseen información parcial sobre el modelo subyacente de la economía, los impuestos dependen de sus expectativas sobre los mismos.
This thesis is about how the fiscal authority should optimally set distorting taxes. Chapter 1 deals with the optimal fiscal policy problem when the government has an incentive to default on external debt. Chapter 2 deals with the optimal fiscal policy problem when households do not know how the government sets taxes. The main conclusion I reach is that, in each of these two contexts, the tax smoothing result, which is the standard result in the optimal taxation literature, breaks down. When governments do not have a commitment technology, taxes respond to the incentives to default; when agents have partial information about the underlying economic model, taxes depend on their beliefs about it.
Styles APA, Harvard, Vancouver, ISO, etc.
27

O'Farrell, Michael Robert. « Estimating persistence of fished populations with limited data / ». For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2005. http://uclibs.org/PID/11984.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
28

Dowd, Michael. « Assimilation of data into limited-area coastal models ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/nq24773.pdf.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
29

McLaughlin, N. R. « Robust multimodal person identification given limited training data ». Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579747.

Texte intégral
Résumé :
This thesis presents a novel method of audio-visual fusion, known as multimodal optimal feature fusion (MOFF), for person identification where both the speech and facial modalities may be corrupted and there is a lack of prior knowledge about the corruption. Furthermore, it is assumed there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Similarity-based optimal feature selection and multi-condition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Low-level feature fusion is performed using optimal feature selection, which automatically changes the weighting given to each modality based on the level of corruption. The framework for robust person identification is also applied to noise-robust speaker identification, given very limited training data. Experiments have been carried out on a bimodal data set created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of speech and occlusion in the face images. Combining both modalities using MOFF leads to significantly improved identification accuracy compared to the component unimodal systems, even with simultaneous corruption of both modalities. A novel piecewise-constant illumination model (PCIM) is then introduced for illumination-invariant facial recognition. This method can be used given a single training facial image for each person, and assuming no prior knowledge of the illumination conditions of both the training and testing images. Small areas of the face are represented using magnitude Fourier features, which takes advantage of the shift-invariance of the magnitude Fourier representation to increase robustness to small misalignment errors and small facial expression changes. Finally, cosine similarity is used as an illumination-invariant similarity measure to compare small facial areas. Experiments have been carried out on the YaleB, extended YaleB and CMU-PIE facial illumination databases. Facial identification accuracy using PCIM is comparable to or exceeds that of the literature.
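As a rough illustration of the score-level side of audio-visual fusion with cosine similarity, the sketch below combines per-modality similarities with a single weight; in MOFF the weighting is driven by optimal feature selection and estimated corruption, which is not reproduced here.

```python
# Hedged sketch of score-level fusion with cosine similarity; the fixed
# weighting rule is illustrative, not the MOFF selection procedure.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_score(speech_probe, speech_gallery, face_probe, face_gallery, w_speech=0.5):
    """Combine per-modality cosine similarities; a real system would adapt
    w_speech to the estimated corruption of each modality."""
    s_speech = cosine(speech_probe, speech_gallery)
    s_face = cosine(face_probe, face_gallery)
    return w_speech * s_speech + (1.0 - w_speech) * s_face

# usage with random placeholder feature vectors
rng = np.random.default_rng(0)
print(fused_score(rng.normal(size=64), rng.normal(size=64),
                  rng.normal(size=128), rng.normal(size=128), w_speech=0.3))
```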
Styles APA, Harvard, Vancouver, ISO, etc.
30

Fattah, Polla. « Behaviour classification for temporal data with limited readings ». Thesis, University of Nottingham, 2017. http://eprints.nottingham.ac.uk/44677/.

Texte intégral
Résumé :
Classifying items using temporal data, i.e. several readings of the same attribute at different time points, has many applications in the real world. The pivotal question which motivates this study is: "Is it possible to quantify behavioural change in temporal data? And what is the best reference point to compare the behaviour change with?" The focus of this study is on applications in economics, such as playing many rounds of public goods games and share price moves in the stock market. There are many methods for classifying temporal data and many methods for measuring the change of items' behaviour in temporal data. However, the available methods for classifying temporal data produce complicated rules, and their models are buried in deep decision trees or complex neural networks that are hard for human experts to read and understand. Moreover, methods of measuring cluster changes do not focus on the individual item's behaviour; rather, they concentrate on the clusters and their changes over time. This research presents methods for classifying temporal data items and measuring their behavioural changes between time points. As case studies, public goods game and stock market price data are used to test novel methods of classification and behaviour change measurement. To represent the magnitude of the behaviour change, we use cluster validity measures in a novel way, by measuring the difference between item labels produced by the same clustering algorithm at each time point and a behaviour reference point. Such a reference point might be the first time point, the previous time point, or a point representing the general overall behaviour of the items in the temporal data. This method uses external cluster validity indices to measure the difference between labels provided by the same clustering method at different time points, rather than using different clustering methods for the same data set as is the case for relative clustering indices. To create a general behavioural reference point in temporal data, we present a novel temporal rule-based classification method that consists of two stages. In the first stage, initial rules are generated based on experts' definitions of the classes in the form of aggregated attributes of the temporal readings. These initial rules are not crisp and may overlap in their representation of the classes. This provides flexibility so that the rules can create a pool of classifiers to select from. This pool of classifiers is then optimised in the second stage so that an optimised classifier is selected from among them. The optimised classifier is a set of discrete classification rules which generates the most compact classes over all time points. Class compactness is measured using statistical dispersion measures or the Euclidean distance within class items. The classification results on the public goods game show that the proposed method can represent players better than the available methods by economists and general temporal classification methods. Moreover, measuring players' behaviour supports economists' view of how players' behaviour changes during game rounds. For the stock market data, we present a viable method for classifying stocks according to their stability, which might help provide insights into stock market predictability.
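A minimal sketch of the behaviour-change idea described above: cluster the items at each time point with the same algorithm and compare the labels against a reference labelling with an external validity index. The choice of k-means, three clusters, and the adjusted Rand index are assumptions for illustration, not the thesis' exact configuration.

```python
# Hedged sketch: quantify behaviour change by comparing cluster labels at each
# time point against a reference labelling with an external validity index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4, 3))      # 50 items, 4 time points, 3 attributes

# same clustering algorithm applied independently at each time point
labels = [KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data[:, t, :])
          for t in range(data.shape[1])]

reference = labels[0]                   # e.g. the first time point as reference
change = [1.0 - adjusted_rand_score(reference, labels[t]) for t in range(len(labels))]
print(change)                           # larger values = larger behaviour change
```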
Styles APA, Harvard, Vancouver, ISO, etc.
31

Ding, Silin. « Freeway Travel Time Estimation Using Limited Loop Data ». University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1205288596.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
32

Li, Jiawei. « Person re-identification with limited labeled training data ». HKBU Institutional Repository, 2018. https://repository.hkbu.edu.hk/etd_oa/541.

Texte intégral
Résumé :
With the growing installation of surveillance video cameras in both private and public areas, developing intelligent video analysis systems for large-scale camera networks is an immediate requirement. As a prerequisite step of person tracking and person retrieval in intelligent video analysis, person re-identification, which aims at matching person images across camera views, is an important topic in the computer vision community and has received increasing attention in recent years. In supervised learning methods, the person re-identification task is formulated as a classification problem that separates matched person images/videos (positives) from unmatched ones (negatives). Although state-of-the-art supervised classification models achieve encouraging re-identification performance, the assumption that label information is available for all cameras is impractical in a large-scale camera network, because collecting the label information of every training subject from every camera in the network can be extremely time-consuming and expensive. While unsupervised learning methods are flexible, their performance is typically weaker than that of supervised ones. Though sufficient labels of the training subjects are not available from all camera views, it is still reasonable to collect sufficient labels from a pair of camera views in the network, or a few labeled samples from each camera pair. Along this direction, this thesis addresses two scenarios of person re-identification in large-scale camera networks, i.e. unsupervised domain adaptation and semi-supervised learning, and proposes three methods to learn discriminative models using all available label information and domain knowledge. In the unsupervised domain adaptation scenario, we treat data with sufficient labels as the source domain, and data from the camera pair missing label information as the target domain. A novel domain adaptive approach is proposed to estimate the target label information and incorporate the labeled data from the source domain with the estimated target label information for discriminative learning. Since the discriminative constraint of Support Vector Machines (SVM) can be relaxed into a necessary condition that relies only on the mean of positive pairs (the positive mean), a suboptimal classification model can be learned without target positive data by using the target positive mean. A reliable positive mean estimate is obtained using both the labeled data from the source domain and potential positive data selected from the unlabeled data in the target domain. An Adaptive Ranking Support Vector Machines (AdaRSVM) method is also proposed to improve the discriminability of the suboptimal mean-based SVM model using source labeled data. Experimental results demonstrate the effectiveness of the proposed method. Different from the AdaRSVM method, which uses source labeled data, we can also improve the above mean-based method by adapting it to target unlabeled data. In a more general situation, we improve a pre-learned classifier by adapting it to target unlabeled data, where the pre-learned classifier can be domain adaptive or learned from source labeled data only. Since it is difficult to estimate positives from the imbalanced target unlabeled data, we propose instead to estimate positive neighbors, which refer to data close to any true target positive. An optimization problem for positive neighbor estimation from unlabeled data is derived and solved by aligning the cross-person score distributions while optimizing a multiple-graph-based label propagation. To utilize the positive neighbors for learning a discriminative classification model, a reliable multiple-region metric learning method is proposed to learn a target-adaptive metric using regularized affine hulls of positive neighbors as positive regions. Experimental results demonstrate the effectiveness of the proposed method. In the semi-supervised learning scenario, we propose discriminative feature learning using all available information from the surveillance videos. To enrich the labeled data from the target camera pair, image sequences (videos) of the tagged persons are collected from the surveillance videos by human tracking. To extract a discriminative and adaptable video feature representation, we propose to model the intra-view variations with a video variation dictionary and to build a video-level adaptable feature through multiple-source domain adaptation and an adaptability-discriminability fusion. First, a novel video variation dictionary learning method is proposed to model the large intra-view variations and is solved as a constrained sparse dictionary learning problem. Second, a frame-level adaptable feature is generated by multiple-source domain adaptation using the variation modeling. By mining the discriminative information of the frames from the reconstruction error of the variation dictionary, an adaptability-discriminability (AD) fusion is proposed to generate the video-level adaptable feature. Experimental results demonstrate the effectiveness of the proposed method.
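A heavily simplified sketch of the "positive mean" relaxation mentioned above, in which an estimated mean of positive-pair features stands in for missing target positives when training a linear max-margin classifier. All data, the estimation rule, and the use of scikit-learn's LinearSVC are illustrative assumptions, not the AdaRSVM formulation.

```python
# Hedged sketch of the positive-mean idea only (illustrative, not AdaRSVM).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
source_pos = rng.normal(loc=1.0, size=(40, 16))      # matched-pair difference features (source)
target_candidates = rng.normal(loc=0.8, size=(10, 16))  # likely positives mined from target
target_neg = rng.normal(loc=-1.0, size=(200, 16))    # unmatched-pair features (abundant)

# estimated target positive mean from source positives and candidate target positives
positive_mean = np.vstack([source_pos, target_candidates]).mean(axis=0, keepdims=True)

# train a linear max-margin model with the positive mean standing in for positives
X = np.vstack([np.repeat(positive_mean, 20, axis=0), target_neg])
y = np.array([1] * 20 + [0] * len(target_neg))
clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
print("decision value of a test pair:",
      float(clf.decision_function(rng.normal(size=(1, 16)))[0]))
```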
Styles APA, Harvard, Vancouver, ISO, etc.
33

Anderson, Christopher. « BANDWIDTH LIMITED 320 MBPS TRANSMITTER ». International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/607635.

Texte intégral
Résumé :
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
With every new spacecraft that is designed comes a greater density of information that will be stored once it is in operation. This, coupled with the desire to reduce the number of ground stations needed to download this information from the spacecraft, places new requirements on telemetry transmitters. These new transmitters must be capable of data rates of 320 Mbps and beyond. Although the necessary bandwidth is available for some non-bandwidth-limited transmissions in Ka-Band and above, many systems will continue to rely on more narrow allocations down to X-Band. These systems will require filtering of the modulation to meet spectral limits. The usual requirements of this filtering also include that it not introduce high levels of inter-symbol interference (ISI) to the transmission. These constraints have been addressed at CE by implementing a DSP technique that pre-filters a QPSK symbol set to achieve bandwidth-limited 320 Mbps operation. This implementation operates within the speed range of the radiation-hardened digital technologies that are currently available and consumes less power than the traditional high-speed FIR techniques.
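As general background to the bandwidth-limiting idea, the sketch below applies a plain FIR low-pass filter to an upsampled QPSK symbol stream; it is a generic pulse-shaping illustration, not the pre-filtering technique described in the abstract.

```python
# Hedged sketch of generic pulse shaping for bandwidth-limited QPSK.
import numpy as np
from scipy.signal import firwin, lfilter

rng = np.random.default_rng(5)
bits = rng.integers(0, 2, size=200)
symbols = (2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)   # QPSK symbols

sps = 4                                                      # samples per symbol
upsampled = np.zeros(len(symbols) * sps, dtype=complex)
upsampled[::sps] = symbols

taps = firwin(numtaps=65, cutoff=1.0 / sps)                  # band-limiting FIR filter
shaped = lfilter(taps, 1.0, upsampled)                       # filtered symbol stream
print(shaped[:5])
```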
Styles APA, Harvard, Vancouver, ISO, etc.
34

Watkins, Andrew B. « AIRS : a resource limited artificial immune classifier ». Master's thesis, Mississippi State : Mississippi State University, 2001. http://library.msstate.edu/etd/show.asp?etd=etd-11052001-102048.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
35

Zhang, Yi. « Learning with Limited Supervision by Input and Output Coding ». Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/156.

Texte intégral
Résumé :
In many real-world applications of supervised learning, only a limited number of labeled examples are available because the cost of obtaining high-quality examples is high. Even with a relatively large number of labeled examples, the learning problem may still suffer from limited supervision as the complexity of the prediction function increases. Therefore, learning with limited supervision presents a major challenge to machine learning. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra input and output information in learning. Information about the input space can be encoded by regularization. We first design a semi-supervised learning method for text classification that encodes the correlation of words inferred from seemingly irrelevant unlabeled text. We then propose a multi-task learning framework with a matrix-normal penalty, which compactly encodes the covariance structure of the joint input space of multiple tasks. To capture structure information that is more general than covariance and correlation, we study a class of regularization penalties on model compressibility. Then we design the projection penalty, which encodes the structure information from a dimension reduction while controlling the risk of information loss. Information about the output space can be exploited by error-correcting output codes. Using the composite likelihood view, we propose an improved pairwise coding for multi-label classification, which encodes pairwise label density (as opposed to label comparisons) and decodes using variational methods. We then investigate problem-dependent codes, where the encoding is learned from data instead of being predefined. We first propose a multi-label output code using canonical correlation analysis, where predictability of the code is optimized. We then argue that both discriminability and predictability are critical for output coding, and propose a max-margin formulation that promotes both discriminative and predictable codes. We empirically study our methods in a wide spectrum of applications, including document categorization, landmine detection, face recognition, brain signal classification, handwritten digit recognition, house price forecasting, music emotion prediction, medical decision, email analysis, gene function classification, outdoor scene recognition, and so forth. In all these applications, our proposed methods for encoding input and output information lead to significantly improved prediction performance.
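For context on output coding, the sketch below uses scikit-learn's predefined random error-correcting output code for multi-class classification; the thesis instead learns problem-dependent codes (e.g. via canonical correlation analysis or a max-margin objective), which this example does not attempt.

```python
# Hedged sketch of the error-correcting output code idea with a predefined
# random code; not the learned, problem-dependent codes proposed in the thesis.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# each class is mapped to a binary codeword; one binary learner per code bit
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2.0, random_state=0)
ecoc.fit(X_tr, y_tr)
print("accuracy:", ecoc.score(X_te, y_te))
```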
Styles APA, Harvard, Vancouver, ISO, etc.
36

GIOBERGIA, FLAVIO. « Machine learning with limited label availability : algorithms and applications ». Doctoral thesis, Politecnico di Torino, 2023. https://hdl.handle.net/11583/2976594.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
37

Van, Niekerk Daniel Rudolph. « Automatic speech segmentation with limited data / by D.R. van Niekerk ». Thesis, North-West University, 2009. http://hdl.handle.net/10394/3978.

Texte intégral
Résumé :
The rapid development of corpus-based speech systems, such as concatenative synthesis systems for under-resourced languages, requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time-consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques was characterised under different data conditions, and their efficient application was investigated in order to improve the accuracy of the resulting phonetic alignments. We found that the application of baseline speaker-specific Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions, and demonstrated how such models can be developed and applied efficiently in this context. The result is segmentation of sufficient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated refinement of phonetic alignments were investigated and an efficient corpus development strategy was proposed with suggestions for further work in this direction.
Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009.
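A minimal sketch of the alignment step only: a Viterbi-style dynamic programme that assigns contiguous frame spans to a known phone sequence given per-frame log-likelihoods. The random scores stand in for the speaker-specific HMM scores used in the thesis.

```python
# Hedged sketch of forced alignment as a dynamic programme over a known
# phone sequence; per-frame log-likelihoods are random placeholders here.
import numpy as np

def force_align(loglik):
    """loglik: (T frames, K phones) log-likelihood of frame t under phone k.
    Returns the phone index assigned to each frame along the best path."""
    T, K = loglik.shape
    best = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)          # 0 = stay in phone, 1 = advance
    best[0, 0] = loglik[0, 0]
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1, k]
            advance = best[t - 1, k - 1] if k > 0 else -np.inf
            if advance > stay:
                best[t, k], back[t, k] = advance + loglik[t, k], 1
            else:
                best[t, k], back[t, k] = stay + loglik[t, k], 0
    # backtrack from the last frame of the last phone
    path, k = [], K - 1
    for t in range(T - 1, -1, -1):
        path.append(k)
        k -= back[t, k]
    return path[::-1]

print(force_align(np.random.default_rng(2).normal(size=(20, 4))))
```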
Styles APA, Harvard, Vancouver, ISO, etc.
38

Nagy, Arnold B. « Priority area performance and planning areas with limited biological data ». Thesis, University of Sheffield, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425193.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
39

Dowler, John D. « Using Neural Networks with Limited Data to Estimate Manufacturing Cost ». Ohio University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1211293606.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
40

Szotten, David. « Limited data problems in X-ray and polarized light tomography ». Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/limited-data-problems-in-xray-and-polarized-light-tomography(5bc153b4-7344-4a62-9879-e23cc3d60b2d).html.

Texte intégral
Résumé :
We present new reconstruction results and methods for limited data problems in photoelastic tomography. We begin with a survey of the current state of x-ray tomography. Discussing the Radon transform and its inversion, we also consider some stability results for reconstruction in Sobolev spaces. We describe certain limited data problems and ways to tackle these, in particular the Two Step Hilbert reconstruction method. We then move on to photoelastic tomography, where we make use of techniques from scalar tomography to develop new methods for photoelastic tomographic reconstruction. We present the main mathematical model used in photoelasticity, the Truncated Transverse Ray Transform (TTRT). After some initial numerical studies, we extend a recently presented reconstruction algorithm for the TTRT from the Schwartz class to certain Sobolev spaces. We also give some stability results for inversion in these spaces. Moving on from general reconstruction to focus on inversion of some special cases of tensors, we consider solenoidal and potential tensor fields. We discuss existing reconstruction methods, present several novel reconstructions, and discuss their advantages over using more general machinery. We also extend our new algorithms, as well as existing ones, to certain cases of data truncation. Finally, we present numerical studies of the general reconstruction method. We give the first published results of TTRT reconstruction and go into some detail describing the implementation before presenting our results.
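For orientation, the sketch below runs a standard full-data filtered back-projection with scikit-image (assuming a recent version that accepts the filter_name argument); a limited-data problem of the kind studied in the thesis would restrict the projection angles in theta.

```python
# Hedged sketch of standard x-ray reconstruction via filtered back-projection;
# restricting the angles in `theta` yields a limited-data problem.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

image = rescale(shepp_logan_phantom(), 0.25)
theta = np.linspace(0.0, 180.0, 60, endpoint=False)    # 60 views; fewer = limited data
sinogram = radon(image, theta=theta)                    # forward Radon transform
reconstruction = iradon(sinogram, theta=theta, filter_name="ramp")
print("reconstruction RMSE:", float(np.sqrt(np.mean((reconstruction - image) ** 2))))
```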
Styles APA, Harvard, Vancouver, ISO, etc.
41

Qu, Lizhen [Verfasser], et Gerhard [Akademischer Betreuer] Weikum. « Sentiment analysis with limited training data / Lizhen Qu. Betreuer : Gerhard Weikum ». Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2013. http://d-nb.info/1053680104/34.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
42

Mehdawi, Nader. « Monitoring for Underdetermined Underground Structures during Excavation Using Limited Sensor Data ». Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5670.

Texte intégral
Résumé :
A realistic field monitoring application to evaluate close-proximity tunneling effects of a new tunnel on an existing tunnel is presented. A blind source separation (BSS)-based monitoring framework was developed using sensor data collected from the existing tunnel while the new tunnel was excavated. The developed monitoring framework is particularly useful for analyzing underdetermined systems, where the sensor data are insufficient to establish explicit input force-output deformation relations. The analysis results show that the eigen-parameters obtained from the correlation matrix of raw sensor data can be used as excellent indicators for assessing the tunnel's structural behavior during excavation, together with a powerful visualization of tunnel lining deformation. Since the presented methodology is data-driven and not limited to a specific sensor type, it can be employed in various proximity excavation monitoring applications.
M.S.
Masters
Civil, Environmental, and Construction Engineering
Engineering and Computer Science
Civil Engineering; Structures and Geotechnical Engineering
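A minimal sketch of the eigen-parameter indicator mentioned in the abstract: eigen-decomposition of the correlation matrix of raw sensor readings, whose leading eigenvalues can be tracked over excavation stages. The random data are placeholders.

```python
# Hedged sketch: eigen-parameters of the correlation matrix of sensor readings.
import numpy as np

rng = np.random.default_rng(3)
readings = rng.normal(size=(1000, 8))            # 1000 samples from 8 sensors
corr = np.corrcoef(readings, rowvar=False)       # 8 x 8 correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues sorted ascending
print("leading eigenvalue:", eigvals[-1])        # tracked across excavation stages
```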
Styles APA, Harvard, Vancouver, ISO, etc.
43

White, Susan Mary. « Sediment yield estimation from limited data sets : a Philippines case study ». Thesis, University of Exeter, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332300.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
44

Don, Michael, et Tom Harkins. « Achieving High Resolution Measurements Within Limited Bandwidth Via Sensor Data Compression ». International Foundation for Telemetering, 2012. http://hdl.handle.net/10150/581447.

Texte intégral
Résumé :
ITC/USA 2012 Conference Proceedings / The Forty-Eighth Annual International Telemetering Conference and Technical Exhibition / October 22-25, 2012 / Town and Country Resort & Convention Center, San Diego, California
The U.S. Army Research Laboratory (ARL) is developing an onboard instrument and telemetry system to obtain measurements of the 30mm MK310 projectile's in-flight dynamics. The small size, high launch acceleration, and extremely high rates of this projectile create many design challenges. Particularly challenging is the high spin rate which can reach 1400 Hz at launch. The bandwidth required to continuously transmit solar data using the current method for such a rate would leave no room for data from other sensors. To solve this problem, a data compression scheme is implemented that retains the resolution of the solar sensor data while providing room in the telemetry frame for other measurements.
Styles APA, Harvard, Vancouver, ISO, etc.
45

Abidin, Mohamed Roseli bin Zainal. « Hydrological and hydraulic sensitivity analyses for flood modelling with limited data ». Thesis, University of Birmingham, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.707174.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
46

Fisher, Emily. « Tools for assessing data-limited fisheries and communicating stock status information ». Thesis, Fisher, Emily (2012) Tools for assessing data-limited fisheries and communicating stock status information. PhD thesis, Murdoch University, 2012. https://researchrepository.murdoch.edu.au/id/eprint/14881/.

Texte intégral
Résumé :
This PhD study was focused on developing and exploring tools for assessing the status of data-limited fish stocks. A management strategy evaluation (MSE) framework was developed to explore the effectiveness of alternative strategies for managing fish stocks for which sufficient data are available to allow a catch curve-based assessment, but which lack the reliable time series data on catches and/or catch per unit of effort required for developing an integrated age-structured fishery model. Explorations using the operating model of this framework indicated that, particularly for demersal fish species with limited movements and which suffer high levels of post-release mortality, use of temporal closures throughout the full area of a fishery are likely to be more effective for reducing fishing mortality than reducing daily bag limits, imposing more restrictive size limits, or constraining the areas open to fishing. Implications of differences in biological characteristics of fish species, including longevity, annual recruitment variability and post-release mortality, for the effectiveness of different management controls were explored using the operating model. The effectiveness of the graphical interface employed by the MSE model in communicating stock assessment information to fisheries managers and stakeholders was evaluated in a “scenario testing” study involving university students. Students viewed model outputs for several hypothetical fish stocks with different biological attributes and initial exploitation states. Based on their perception of the true status of each stock, students then “pulled” various alternative “management levers” available in the program. Analyses of data resulting from the study indicated that, provided the design was not overly complex, the interface of the MSE framework was effective for communicating stock assessment information. The results of the study illustrated the potential of this type of approach for evaluating and improving the effectiveness of the ways in which stock assessment information is communicated to fisheries managers and other stakeholders. During the next project phase, several methods for estimating rates of mortality of fish stocks were developed and explored. Maximum likelihood estimates of total mortality, calculated assuming that the age composition of fully-recruited fish was drawn from a geometric distribution and that annual recruitment was variable, had lower root mean squared error (RMSE) than other estimates obtained using traditional methods of catch curve analysis that did not allow for such variability. This catch curve model, which also provided potentially valuable information on recruitment variability, was then extended to allow for a change in total mortality, as might result from a major change to management. Analyses demonstrated that, despite variability in annual recruitment, it was possible to distinguish such a change in mortality in the age composition data if the mortality change was of sufficient magnitude and adequate time had elapsed since the change in mortality. Bias in the estimates of mortality for the two periods was explored. Next, a model was developed to provide estimates of mortality for fish species which undertake pronounced unidirectional, size-dependent movements during life, e.g. a size-dependent, offshore movement of fish to deeper water, when it is only possible to obtain representative samples of age and size compositions from the different areas and not for the overall population. 
The model was able to “disentangle” the similar, but slightly different, influences of mortality and movement on size and age data. Following simulation testing, the technique was applied to “real” data for a fish species in Western Australia (Pseudocaranx georgianus). The model fills a “void” for existing methods for such fish species, particularly if those species are of insufficient economic value to warrant an expensive, large-scale tagging program. Areas in which the work presented in this thesis could be expanded are discussed in the light of some likely directions for future fisheries research relating to data-limited fisheries.
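As a point of reference for the catch-curve material above, the sketch below computes the classical Chapman-Robson estimator of survival and total mortality, which likewise assumes a geometric age distribution; the thesis' own maximum-likelihood model goes further by allowing for recruitment variability and changes in mortality.

```python
# Hedged sketch: Chapman-Robson estimator of survival S and total mortality Z
# from an age sample (a standard geometric-distribution estimator, not the
# thesis' extended catch-curve model).
import numpy as np

def chapman_robson(age_sample, age_full_recruitment):
    """Estimate annual survival S and total mortality Z from fully recruited ages."""
    ages = np.asarray(age_sample)
    coded = ages[ages >= age_full_recruitment] - age_full_recruitment
    n, total = coded.size, coded.sum()
    s_hat = total / (n + total - 1)
    return s_hat, -np.log(s_hat)

ages = [3, 3, 4, 4, 4, 5, 5, 6, 7, 9]            # toy age sample
print(chapman_robson(ages, age_full_recruitment=3))
```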
Styles APA, Harvard, Vancouver, ISO, etc.
47

Säfdal, Joakim. « Data-Driven Engine Fault Classification and Severity Estimation Using Interpolated Fault Modes from Limited Training Data ». Thesis, Linköpings universitet, Fordonssystem, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173916.

Texte intégral
Résumé :
Today modern vehicles are expected to be safe, environmentally friendly, durable and economical. Monitoring the health of the vehicle is therefore more important than ever. As the complexity of vehicular systems increases, the need for efficient monitoring methods increases as well. Traditional methods of deriving models for these systems are becoming less practical, as growing system complexity increases the time and skill needed to implement the models. An alternative is data-driven methods, where a collection of data associated with the behavior of the system is used to draw conclusions about the state of the system. Faults are, however, rare events, and collecting sufficient data to cover all possible faults threatening a vehicle would be impossible. A method for drawing conclusions from limited historical data would therefore be desirable. In this thesis an algorithm using distinguishability as a method for fault classification and fault severity estimation is proposed. Historical data is interpolated over a fault severity vector using Gaussian process regression as a way to estimate fault modes for unknown fault sizes. The algorithm is then tested against validation data to evaluate its ability to detect and identify known fault classes and fault severities, separate unknown fault classes from known fault classes, and estimate unknown fault sizes. The purpose of the study is to evaluate the possibility of using limited historical data to reduce the need for costly and time-consuming data collection. The study shows promising results, as fault class identification and fault size estimation using the proposed algorithm seem possible for fault sizes not included in the historical data.
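A minimal sketch of the interpolation idea in the abstract: fit Gaussian process regression over measured fault sizes and query it at an unseen severity. The feature values, kernel, and scikit-learn usage are illustrative assumptions, not the thesis' algorithm.

```python
# Hedged sketch: interpolate a fault-mode feature over fault severity with
# Gaussian process regression and predict it for an unseen fault size.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

severity = np.array([[0.1], [0.3], [0.6], [1.0]])      # measured fault sizes
residual = np.array([0.02, 0.09, 0.22, 0.41])          # fault-mode feature at each size

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(severity, residual)

query = np.array([[0.45]])                              # unseen fault size
mean, std = gpr.predict(query, return_std=True)
print(float(mean[0]), float(std[0]))                    # interpolated value and uncertainty
```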
Styles APA, Harvard, Vancouver, ISO, etc.
48

Dlamini, Luleka. « Exploring the potential of using remote sensing data to model agricultural systems in data-limited areas ». Master's thesis, Faculty of Science, 2020. http://hdl.handle.net/11427/32239.

Texte intégral
Résumé :
Crop models (CMs) can be a key component in addressing issues of global food security as they can be used to monitor and improve crop production. Despite their wide utilization, the employment of these models, particularly in isolated and rural areas, is often limited by the lack of reliable input data. This data scarcity increases uncertainties in model outputs. Nevertheless, some of these uncertainties can be mitigated by integrating remotely sensed data into the CMs. As such, increasing efforts are being made globally to integrate remotely sensed data into CMs to improve their overall performance and use. However, very few such studies have been done in South Africa. Therefore, this research assesses how well a crop model assimilated with remotely sensed data compares with a model calibrated with actual ground data (Maize_control), ultimately leading to improved knowledge of local cropping systems and the capacity to use CMs. The study calibrated the DSSAT-CERES-Maize model using two generic soils (i.e. heavy clay soil and medium sandy soil), selected based on the literature, to simulate soil moisture from 1985 to 2015 in Bloemfontein. Using the data assimilation approach, the model's soil parameters were then adjusted based on remotely sensed soil moisture (SM) observations. The observed improvement was mainly assessed through the lens of SM simulations, from the original generic set-up to the final remotely sensed informed soil profile set-up. The study also gave some measure of comparison with Maize_control and finally explored the impacts of this specific SM improvement on evapotranspiration (ET) and maize yield. The results show that, when compared to the observed data, assimilating remotely sensed data with the model significantly improved the mean simulation of SM while maintaining the representation of its variability. The improved SM, as a result of assimilating remotely sensed data, closely compares with Maize_control in terms of the mean, but there was no improvement in terms of variability. Data assimilation also improved the mean and variability of the ET simulation when compared with that of Maize_control, but only with heavy clay soil. However, maize yield was not improved in comparison. This confirms that these outputs were influenced by factors other than SM or the soil profile parameters. It was concluded that remote sensing data can be used to bias-correct model inputs and thus improve certain model outputs.
Styles APA, Harvard, Vancouver, ISO, etc.
49

Barrère, Killian. « Architectures de Transformer légères pour la reconnaissance de textes manuscrits anciens ». Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0017.

Texte intégral
Résumé :
En reconnaissance d’écriture manuscrite, les architectures Transformer permettent de faibles taux d’erreur, mais sont difficiles à entraîner avec le peu de données annotées disponibles. Dans ce manuscrit, nous proposons des architectures Transformer légères adaptées aux données limitées. Nous introduisons une architecture rapide basée sur un encodeur Transformer, et traitant jusqu’à 60 pages par seconde. Nous proposons aussi des architectures utilisant un décodeur Transformer pour inclure l’apprentissage de la langue dans la reconnaissance des caractères. Pour entraîner efficacement nos architectures, nous proposons des algorithmes de génération de données synthétiques adaptées au style visuel des documents modernes et anciens. Nous proposons également des stratégies pour l’apprentissage avec peu de données spécifiques, et la réduction des erreurs de prédiction. Nos architectures, combinées à l’utilisation de données synthétiques et de ces stratégies, atteignent des taux d’erreur compétitifs sur des lignes de texte de documents modernes. Sur des documents anciens, elles parviennent à s’entraîner avec des nombres limités de données annotées, et surpassent les approches de l’état de l’art. En particulier, 500 lignes annotées sont suffisantes pour obtenir des taux d’erreur caractères proches de 5%
Transformer architectures deliver low error rates but are challenging to train due to limited annotated data in handwritten text recognition. We propose lightweight Transformer architectures to adapt to the limited amounts of annotated handwritten text available. We introduce a fast Transformer architecture with an encoder, processing up to 60 pages per second. We also present architectures using a Transformer decoder to incorporate language modeling into character recognition. To effectively train our architectures, we offer algorithms for generating synthetic data adapted to the visual style of modern and historical documents. Finally, we propose strategies for learning with limited data and reducing prediction errors. Our architectures, combined with synthetic data and these strategies, achieve competitive error rates on lines of text from modern documents. For historical documents, they train effectively with minimal annotated data, surpassing state-of-the-art approaches. Remarkably, just 500 annotated lines are sufficient for character error rates close to 5%.
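A rough sketch of a light encoder-only recognizer with a CTC head, in the spirit of the fast encoder architecture described above; the toy convolutional stem, dimensions, and omission of positional encoding are assumptions, not the thesis' configuration.

```python
# Hedged sketch of a lightweight encoder-only text-line recognizer with a CTC
# head; dimensions and the stem are illustrative assumptions.
import torch
import torch.nn as nn

class LightHTREncoder(nn.Module):
    def __init__(self, n_chars, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.stem = nn.Sequential(                      # collapse image height, keep width
            nn.Conv2d(1, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_chars + 1)     # +1 for the CTC blank symbol

    def forward(self, line_image):                      # (B, 1, H, W)
        feats = self.stem(line_image).squeeze(2).transpose(1, 2)   # (B, W', d_model)
        return self.head(self.encoder(feats))           # (B, W', n_chars + 1) logits

logits = LightHTREncoder(n_chars=80)(torch.randn(2, 1, 64, 512))
print(logits.shape)
```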
Styles APA, Harvard, Vancouver, ISO, etc.
50

Sanakoyeu, Artsiom [Verfasser], et Björn [Akademischer Betreuer] Ommer. « Visual Representation Learning with Limited Supervision / Artsiom Sanakoyeu ; Betreuer : Björn Ommer ». Heidelberg : Universitätsbibliothek Heidelberg, 2021. http://d-nb.info/1231632488/34.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
