Dissertations: "Absence of training data"

1

Boxwell, Stephen Arthur. "A CCG-Based Method for Training a Semantic Role Labeler in the Absence of Explicit Syntactic Training Data." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322594816.


2

Whaley, Steven R. J. "Bayesian analysis of sickness absence data." Thesis, University of Aberdeen, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274884.

Abstract:

Sickness absence (SA) is a serious financial burden to UK industry, totalling £10-12 billion in 1999, the equivalent of £434 and 7.8 days lost per worker. A major change in the reporting of SA occurred on 14 June 1982 with the introduction of self-certification. Up to then, all episodes had to be certified by a general practitioner (GP). Since then, episodes lasting seven calendar days or fewer have not required a GP's certificate and are 'self-certified'. An SA episode consists of the date the individual went off sick, the duration of the episode and a medical diagnosis given either by a GP or by self-diagnosis. A common approach to the analysis of SA data is to model the number of times an individual went off sick during a period of follow-up via Poisson regression. Some studies on SA have examined the duration of SA, though most have concentrated on the probability of going off sick. This thesis uses an intensity-based approach to model the joint probability that a person goes off sick with a specific disease and has a specific duration of absence (the 'joint analysis'). A Bayesian hierarchical model, based on the conditional proportional hazards model, is formulated for the joint analysis and sampled using Markov chain Monte Carlo (MCMC) methods. Posterior expectations and 90% credible intervals are presented as summaries of the marginal posterior distributions of the parameters of the joint analysis. Trace plots of the log-joint posterior distribution are given to assess convergence of the MCMC sampler.


3

Yousefi, Sepehr. "Credit Risk Management in Absence of Financial and Market Data." Thesis, KTH, Matematisk statistik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188800.

Abstract:

Credit risk management is a significant part of financial institutions' security precautions against the downside of their investments. A major quandary within the subject of credit risk is the modeling of simultaneous defaults. Globalization causes economies to be affected by innumerable external factors and companies to become interdependent, which in turn increases the complexity of establishing reliable mathematical models. The precarious situation is exacerbated by the fact that managers often suffer from a lack of data. Default correlations are most often calibrated using financial and/or market information. However, there exist circumstances where these types of data are inaccessible or unreliable. The problem of scarce data also induces difficulties in the estimation of default probabilities. The frequency of insolvencies and changes in credit ratings are usually updated on an annual basis, and historical information covers 20-25 years at best. From a mathematical perspective, this is considered a small sample, and standard statistical models are inferior in such situations. The first part of this thesis specifies the so-called entropy model, which estimates the impact of macroeconomic fluctuations on the probability of defaults and aims to outperform standard statistical models for small samples. The second part specifies the CIMDO, a framework for modeling correlated defaults without financial and market data. The last part presents a risk analysis framework for calculating the uncertainty in the simulated losses. It is shown that the entropy model reduces the variance of the regression coefficients but increases their bias compared to OLS and Maximum Likelihood. Furthermore, there is a significant difference between the Student's t CIMDO and the t-Copula. The former appears to reduce the model uncertainty, but not to such an extent that clear conclusions could be drawn.
Credit risk management is the single most important part of the safeguards that banks and financial institutions take against the downside of their investments. A tangible difficulty within the subject is the modeling of simultaneous defaults. Globalization increases the number of parameters that affect the economy, which in turn makes it harder to establish reliable mathematical models. The precarious situation is exacerbated by the fact that analysts consistently lack sufficient data. Default correlation is most often calibrated using information from annual reports or from the market. Unfortunately, there are circumstances in which such types of data are inaccessible or unreliable. The same problem also makes it difficult to estimate the probability of default. Figures such as the frequency of insolvent companies or changes in credit ratings are generally updated annually, and historical data cover 20-25 years at best. The purpose of this thesis is to provide an overall framework for credit risk management in the absence of financial information and market data. This includes estimating the impact of macroeconomic fluctuations on the probability of default, modeling correlated defaults, and summarizing a framework for calculating the uncertainty in the estimated loss distribution. The first part of the thesis specifies the so-called entropy model. It estimates the impact of the macroeconomy on the probabilities of default and aims to outperform standard statistical models for small data sets. The second part specifies CIMDO, a framework for calculating default correlation when market and company data are missing. The last part presents a framework for risk analysis of the loss distribution. It is shown that the entropy model reduces the variance of the regression coefficients at the cost of worsening their bias. Furthermore, there is a significant difference between the Student's t CIMDO and the t-Copula. The former appears to reduce the uncertainty in the calculations, but not to the extent that clear conclusions can be drawn.


4

Nilsson, Maria. "Differences and similarities in work absence behavior: empirical evidence from micro data." Doctoral thesis, Växjö universitet, Ekonomihögskolan, EHV, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-626.

Abstract:

This thesis consists of three self-contained essays about absenteeism. Essay I analyzes whether the design of the insurance system affects work absence, i.e. the classic insurance problem of moral hazard. Several reforms of the sickness insurance system were implemented during the period 1991-1996. Using negative binomial models with fixed effects, the analysis shows that both workers and employers changed their behavior due to the reforms. We also find that the extent of moral hazard varies depending on work contract structures. The reforms reducing the compensation levels decreased workers’ absence, both the number of absent days and the number of absence spells. The reform in 1992, introducing sick pay paid by the employers, also decreased absence levels, which can probably be explained by changes in personnel policy such as increased use of monitoring and screening of workers. Essay II examines the background to gender differences in work absence. Women are found, as in many earlier studies, to have higher absence levels than men. Our analysis, using finite mixture models, reveals that there is a group of women, comprising about 41% of the women in our sample, that has a high average demand for absence. Among men, the high-demand group is smaller, consisting of about 36% of the male sample. Absence behavior differs as much between groups within each gender as it does between men and women. Access to panel data covering the period 1971-1991 enables an analysis of the increased gender gap over time. Our analysis shows that the increased gender gap can be attributed to changes in behavior rather than in observable characteristics. Essay III analyzes the difference in work absence between natives and immigrants. Immigrants are found to have higher absence than natives when measured as the number of absent days. For the number of absence spells, the pattern for immigrants and natives is about the same. 
The analysis, using panel data and count data models, shows that natives and immigrants have different characteristics concerning family situation, work conditions and health. We also find that natives and immigrants respond differently to these characteristics. We find, for example, that the absence of natives and that of immigrants are related differently to both economic incentives and work environment. Finally, our analysis shows that differences in work conditions and work environment can explain only a minor part of the ethnic differences in absence during the 1980s.


5

Johnston, Cristin D. "Observation training evaluating a procedure for generating self-rules in the absence of reinforcement /." abstract and full text PDF (UNR users only), 2008. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3316373.


6

Muhammad, Azhar Ranjha, and Adnan Ghalib Ahmad. "Data Analysis and Graph Presentation of Team Training Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-66998.

Abstract:

This report illustrates the presentation of team training system data as web-based graphs. The research is based on presenting information stored in a database in graphical form. The graph system is implemented using ICEfaces with an SQL database as the back-end data source. Through research and comparison, a suitable graph-generating system was identified for the analysis of C3Fire records. Several graph models were considered for the best visualization of the demography, and the one that best demonstrated the results was selected. The information that was previously displayed in tables stored in the database is now viewable in graphical format. The implementation was done by modifying and embedding code in the previous version, and it was completed successfully. The graphs display the values stored in the database and are updated dynamically as those values change. Four graph types were finally selected and implemented to show the data in the most viewable form: pie, bar, line and clustered bar graphs.
C3Fire


7

Powers, Richard. "Track-loss detection in the absence of truth data for target tracking in clutter." Connect to online resource, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3273736.


8

Goulette, Dana E. "Training assessment and modeling subjective data encapsulation for the National Training Center." Monterey, California. Naval Postgraduate School, 1997. http://hdl.handle.net/10945/9084.

Abstract:

Approved for public release; distribution is unlimited
The National Training Center (NTC), located at Fort Irwin, California, performs the critical Army mission of preparing battalion task forces and brigade staffs for combat. The NTC provides a unique opportunity to assess training proficiency. To assist in the training assessment of rotating units, the Army has spent millions of dollars on a state-of-the-art instrumentation system that transmits objective data from all player vehicles and stores the information in a database. Currently, no subjective observer-controller (O/C) observations of training are stored in the database. The primary emphasis of this research is to develop a training assessment system and to model subjective data encapsulation to enhance training performance analysis. The assessment system is designed to be incorporated into a relational database that will allow analysis of various measures of performance that provide input for platoon through brigade level After Action Reviews (AAR). Additionally, the database will support methods for simple data manipulation for the purpose of conducting post-rotation analysis and identifying trends.


9

Chang, Eric I.-Chao. "Improving wordspotting performance with limited training data." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/38056.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.
Includes bibliographical references (leaves 149-155).
by Eric I-Chao Chang.
Ph.D.


10

Georgsson, Adam, and Olof Christensson. "Visualization of training data reported by football players." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16694.

Abstract:

Background. Data from training sessions is gathered by a trainer from the players with the goal of analyzing and getting an overview of how the team is performing. The collected data is represented in tabular form, and over time the effort to interpret it becomes more demanding. Objectives. This thesis’ goal is to find out whether there is a solution in which collecting, processing and representing training data from football players can ease and improve the trainer’s analysis of the team. Methods. A dataset is received from a football trainer, and it contains information about training sessions for his team of football players. The dataset is used to find a suitable method and visualize the data. Feedback from the trainer is used to determine what works and what does not. Furthermore, a survey with examples of visualization is given to the players and the trainer to get an understanding of how the selected charts are interpreted. Results. Representing the most important attributes of the received dataset requires a chain of views (usage flow) to be introduced, from primary view to quaternary view. Each step in the chain tightens the level of detail represented. The box plot proved to be an appropriate choice to provide an overview of the team’s training data. Conclusions. Visualizing training data gives a significant advantage to the trainer regarding team analysis. With box plots, the trainer gets an overview of the team and can then dig into more detailed data while interacting with the charts.
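
To make the box-plot overview mentioned in this abstract concrete, the following is a minimal sketch of the five-number summary that a box plot draws. It is not taken from the thesis: the function name `five_number_summary`, the made-up `session_loads` values, and the choice of the "inclusive" quartile method are all assumptions for illustration.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the statistics a box plot displays."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # The "inclusive" method treats the data as the whole population
    # (an assumption; plotting tools differ in their quartile conventions).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical self-reported training loads for one session (made-up numbers).
session_loads = [3, 5, 6, 6, 7, 8, 9]
summary = five_number_summary(session_loads)
print(summary)
```

One summary per session or per player would give the kind of team-level overview described above, with the raw values available for the more detailed views.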


11

Alves, André Ribeiro. "Curricular training report in clinical data management." Master's thesis, Universidade de Aveiro, 2013. http://hdl.handle.net/10773/10875.

Abstract:

Master's in Pharmaceutical Biomedicine
This report describes the activities carried out during a 9-month internship in the Data Management Unit at Eurotrials, beginning in September 2011 and ending in May 2012. Eurotrials is a scientific consultancy that provides services to the pharmaceutical and biotechnology industries, notably in the conduct of clinical trials. In the development of a new medicine, clinical trials are the most important tool for verifying the safety and efficacy of the substance. Clinical data management plays a very important role in the conduct of clinical trials and aims to generate robust, high-quality data that can be analyzed. The data management team takes part in activities ranging from the planning of the study to its conclusion. The main activities carried out in data management were the design of the case report form, the creation of the database, discrepancy management and data standardization.
This report describes the activities undertaken during a curricular training period of 9 months in the Data Management Unit at Eurotrials, starting in September 2011 and ending in May 2012. Eurotrials is a contract research organization that provides services to the pharmaceutical and biotechnology industries, namely the conduct of clinical trials. In the drug development process, clinical trials are the most important tool for verifying whether a drug is safe and effective. Clinical data management plays a very important role in the conduct of clinical trials and aims to generate high-quality and reliable data that can be analyzed. The data management team is engaged in activities ranging from the design of the study to its completion. The main activities performed in clinical data management were the design of the case report form, database design, discrepancy management and data standardization.


12

Nicholson, Alexander Abu-Mostafa Yaser S. "Generalization error estimates and training data valuation /." Diss., Pasadena, Calif. : California Institute of Technology, 2002. http://resolver.caltech.edu/CaltechETD:etd-09062005-083717.


13

Fraser, D., T. Marder, Pamela J. Mims, and Bree Jimenez. "Training Teachers in Data-Based Decision Making." Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etsu-works/188.


14

Zama, Ramirez Pierluigi. "Deep Scene Understanding with Limited Training Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amsdottorato.unibo.it/9815/1/zamaramirez_pierluigi_tesi.pdf.

Abstract:

Scene understanding by a machine is a challenging task due to the profound variety of nature. Nevertheless, deep learning achieves impressive results in several scene understanding tasks such as semantic segmentation, depth estimation, or optical flow. However, these kinds of approaches need a large amount of labeled data, leading to massive manual annotations, which are incredibly tedious and expensive to collect. In this thesis, we will focus on understanding a scene through deep learning with limited data availability. First of all, we will tackle the problem of the lack of data for semantic segmentation. We will show that computer graphics comes in handy for our purpose, both to create a new, efficient tool for annotation and to render synthetic annotated datasets quickly. However, a network trained only on synthetic data suffers from the so-called domain-shift problem, i.e. it is unable to generalize to real data. Thus, we will show that we can mitigate this problem using a novel deep image-to-image translation technique. In the second part of the thesis, we will focus on the relationship between scene understanding tasks. We argue that building a model aware of the connections between tasks is the first building block for creating more robust, efficient, performant models that need less annotated training data. In particular, we demonstrate that we can decrease the need for labels by exploiting the relationship between visual tasks. Finally, in the last part, we propose a novel unified framework for comprehensive scene understanding, which exploits the synergies between tasks to be more robust, efficient, and performant.


15

Van, Eenoo Edward Charles Jr. "Theoretically Valid Aggregates in the Absence of Homothetic Preferences, Separable Utility, and Complete Price Data." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/9782.

Abstract:

The improper aggregation of commodities can have important consequences when estimating a system of group demand equations. Generally, aggregates are created under the assumptions that intra-group preferences are homothetic and the consumer's utility function is weakly separable over some partition. These assumptions place severe restrictions on the model that can significantly impact parameter and elasticity estimates. An alternative to imposing weak separability is to employ the Generalized Composite Commodity Theorem, which requires the relative intra-group commodity prices to be independent of the group price index. This study compares the results of estimating a demand system for composite beef, pork, and poultry products under the assumptions of weak separability and the Generalized Composite Commodity Theorem. Another important issue related to aggregation is the specification of an appropriate group price index. Price indices consistent with linear homogeneous preferences (a subset of the homothetic class of preferences) and non-homothetic intra-group preferences are identified and it is shown that several of the commonly employed indices are biased in the absence of complete price data.
Master of Science


16

McLaughlin, N. R. "Robust multimodal person identification given limited training data." Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579747.

Abstract:

This thesis presents a novel method of audio-visual fusion, known as multimodal optimal feature fusion (MOFF), for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, it is assumed there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Similarity-based optimal feature selection and multi-condition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Low-level feature fusion is performed using optimal feature selection, which automatically changes the weighting given to each modality based on the level of corruption. The framework for robust person identification is also applied to noise-robust speaker identification, given very limited training data. Experiments have been carried out on a bimodal data set created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of speech and occlusion in the face images. Combining both modalities using MOFF leads to significantly improved identification accuracy compared to the component unimodal systems, even with simultaneous corruption of both modalities. A novel piecewise-constant illumination model (PCIM) is then introduced for illumination-invariant facial recognition. This method can be used given a single training facial image for each person, and assuming no prior knowledge of the illumination conditions of both the training and testing images.
Small areas of the face are represented using magnitude Fourier features, which takes advantage of the shift-invariance of the magnitude Fourier representation to increase robustness to small misalignment errors and small facial expression changes. Finally, cosine similarity is used as an illumination-invariant similarity measure to compare small facial areas. Experiments have been carried out on the YaleB, extended YaleB and CMU-PIE facial illumination databases. Facial identification accuracy using PCIM is comparable to or exceeds that of the literature.
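
The two invariances this abstract relies on can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the thesis's method: it uses a 1-D signal instead of a 2-D facial patch, a circular shift as a stand-in for misalignment, a global brightness scale as a crude stand-in for an illumination change, and plain cosine similarity rather than the modified version the abstract mentions. All names and values are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real 1-D signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; unchanged by positive scaling."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 1-D "patch" of pixel intensities (made-up values).
patch = [1.0, 4.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5]
shifted = patch[3:] + patch[:3]        # circular shift by three samples
brighter = [2.0 * v for v in patch]    # uniform brightness scaling

mags = dft_magnitudes(patch)

# A circular shift only changes the phase of each DFT coefficient,
# so the magnitudes agree up to floating-point rounding.
shift_gap = max(abs(a - b) for a, b in zip(mags, dft_magnitudes(shifted)))

# Scaling the patch scales every magnitude by the same factor,
# which cosine similarity ignores.
sim = cosine_similarity(mags, dft_magnitudes(brighter))

print(shift_gap, sim)
```

The invariance is exact only for circular shifts and uniform scaling; for real image patches with small translations and uneven lighting it holds approximately at best, which is why the abstract speaks of robustness to small errors.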


17

Anastasiadis, Aristoklis. "Neural networks training and applications using biological data." Thesis, Birkbeck (University of London), 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.428055.


18

Diffner, Fredrik, and Hovig Manjikian. "Training a Neural Network using Synthetically Generated Data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280334.

Abstract:

A major challenge in training machine learning models is the gathering and labeling of a sufficiently large training data set. A common solution is the use of a synthetically generated data set to expand or replace a real data set. This paper examines the performance of a machine learning model trained on a synthetic data set versus the same model trained on real data. This approach was applied to the problem of character recognition using a machine learning model that implements convolutional neural networks. A synthetic data set of 1’240’000 images and two real data sets, Char74k and ICDAR 2003, were used. The result was that the model trained on the synthetic data set achieved an accuracy that was about 50% better than the accuracy of the same model trained on the real data set.
When developing machine learning models, the lack of a sufficiently large training data set can be a problem. A common solution is to use synthetically generated data to either extend or entirely replace a data set of real data. This thesis examines the performance of a machine learning model trained on synthetic data compared with the same model trained on real data. This was applied to the problem of using a convolutional neural network to recognize characters in images from "natural" environments. A synthetic data set of 1’240’000 images and two data sets with characters from images, Char74K and ICDAR2003, were used. The result shows that a model trained on the synthetic data set performed about 50% better than the same model trained on Char74K.


19

Curry, William. "Interpolation with prediction-error filters and training data /." May be available electronically:, 2008. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.


20

Li, Jiawei. "Person re-identification with limited labeled training data." HKBU Institutional Repository, 2018. https://repository.hkbu.edu.hk/etd_oa/541.

Abstract:

With the growing installation of surveillance video cameras in both private and public areas, there is an immediate need to develop intelligent video analysis systems for large-scale camera networks. As a prerequisite step for person tracking and person retrieval in intelligent video analysis, person re-identification, which aims at matching person images across camera views, is an important topic in the computer vision community and has received increasing attention in recent years. In supervised learning methods, the person re-identification task is formulated as a classification problem to separate matched person images/videos (positives) from unmatched person images/videos (negatives). Although state-of-the-art supervised classification models can achieve encouraging re-identification performance, the assumption that label information is available for all cameras is impractical in a large-scale camera network, because collecting the label information of every training subject from every camera in the network can be extremely time-consuming and expensive. While unsupervised learning methods are flexible, their performance is typically weaker than that of supervised ones. Though sufficient labels of the training subjects are not available from all the camera views, it is still reasonable to collect sufficient labels from one pair of camera views in the camera network, or a few labeled data from each camera pair. Along this direction, we address two scenarios of person re-identification in large-scale camera networks in this thesis, i.e. unsupervised domain adaptation and semi-supervised learning, and propose three methods to learn discriminative models using all available label information and domain knowledge in person re-identification. In the unsupervised domain adaptation scenario, we consider data with sufficient labels as the source domain, and data from the camera pair missing label information as the target domain. 
A novel domain adaptive approach is proposed to estimate the target label information and to combine the labeled data from the source domain with the estimated target label information for discriminative learning. Since the discriminative constraint of Support Vector Machines (SVM) can be relaxed into a necessary condition that relies only on the mean of positive pairs (positive mean), a suboptimal classification model can be learned without target positive data by using the target positive mean. A reliable positive mean estimate is obtained by using both the labeled data from the source domain and potential positive data selected from the unlabeled data in the target domain. An Adaptive Ranking Support Vector Machines (AdaRSVM) method is also proposed to improve the discriminability of the suboptimal mean-based SVM model using source labeled data. Experimental results demonstrate the effectiveness of the proposed method. Unlike the AdaRSVM method, which uses source labeled data, we can also improve the above mean-based method by adapting it to target unlabeled data. In a more general situation, we improve a pre-learned classifier by adapting it to target unlabeled data, where the pre-learned classifier can be domain adaptive or learned from only source labeled data. Since it is difficult to estimate positives from the imbalanced target unlabeled data, we propose instead to estimate positive neighbors, which are data close to any true target positive. An optimization problem for positive neighbor estimation from unlabeled data is derived and solved by aligning the cross-person score distributions together with optimizing for label propagation based on multiple graphs. To use the positive neighbors to learn a discriminative classification model, a reliable multiple region metric learning method is proposed to learn a target adaptive metric using regularized affine hulls of positive neighbors as positive regions. 
Experimental results demonstrate the effectiveness of the proposed method. In the semi-supervised learning scenario, we propose a discriminative feature learning using all available information from the surveillance videos. To enrich the labeled data from target camera pair, image sequences (videos) of the tagged persons are collected from the surveillance videos by human tracking. To extract the discriminative and adaptable video feature representation, we propose to model the intra-view variations by a video variation dictionary and a video level adaptable feature by multiple sources domain adaptation and an adaptability-discriminability fusion. First, a novel video variation dictionary learning is proposed to model the large intra-view variations and solved as a constrained sparse dictionary learning problem. Second, a frame level adaptable feature is generated by multiple sources domain adaptation using the variation modeling. By mining the discriminative information of the frames from the reconstruction error of the variation dictionary, an adaptability-discriminability (AD) fusion is proposed to generate the video level adaptable feature. Experimental results demonstrate the effectiveness of the proposed method.


21

Rawls, Allen Worthington. "A systematic approach for improving predicted arrival time using historical data in absence of schedule reliability." View electronic thesis, 2008. http://dl.uncw.edu/etd/2008-1/r1/rawlsa/allenrawls.pdf.


22

Guo, Zhenyu. "Data famine in big data era : machine learning algorithms for visual object recognition with limited training data." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/46412.

Abstract:

Big data is an increasingly attractive concept in many fields, both in academia and in industry. The increasing amount of information creates the illusion that we will have enough data to solve all data-driven problems. Unfortunately this is not true, especially in areas where machine learning methods are heavily employed: sufficient high-quality training data does not necessarily come with big data, and it is difficult, sometimes impossible, to collect the training samples on which most computational algorithms depend. This thesis focuses on situations with limited training data in visual object recognition and develops novel machine learning algorithms to overcome that limitation. We investigate three issues in object recognition involving limited training data: 1. one-shot object recognition, 2. cross-domain object recognition, and 3. object recognition for images with different picture styles. For Issue 1, we propose an unsupervised feature learning algorithm that constructs a deep structure of stacked Hierarchical Dirichlet Process (HDP) auto-encoders in order to extract "semantic" information from unlabeled source images. For Issue 2, we propose a Domain Adaptive Input-Output Kernel Learning algorithm to reduce the domain shifts in both input and output spaces. For Issue 3, we introduce a new problem involving images with different picture styles, formulate the relationship between pixel mapping functions and gradient-based image descriptors, and propose a multiple-kernel-based algorithm that learns an optimal combination of basis pixel mapping functions to improve recognition accuracy. For all the proposed algorithms, experimental results on publicly available data sets demonstrate performance improvements over the previous state of the art.


23

Granlund, David. "Economic policy in health care : Sickness absence and pharmaceutical costs." Doctoral thesis, Umeå : Department of Economics, Umeå University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1137.


24

Bailey, Roy Douglas. "Autogenic regulation training (ART), sickness absence, personal problems, time and the emotional-physical stress of student nurses in general training : a report of a longitudinal field investigation." Thesis, University of Hull, 1985. http://hydra.hull.ac.uk/resources/hull:5040.

Abstract:

A field investigation was carried out with student nurses entering General Training in a School of Nursing. Autogenic Regulation Training (ART), sickness absence, personal problems, time and their emotional-physical experience were evaluated. Measures used in the study included the Sickness Absence Record (SAR), the Mooney Problem Checklist (MPC), the Crown-Crisp Experiential Index (CCEI), and the Personal Observations Inventory (POI). Data were collected at different time periods early in their nurse education. The study was carried out to investigate the effectiveness of ART in providing a method of coping with individual stress. Analyses were made between and within an ART group of student nurses and a comparison group who did not receive training in ART. Consideration was also given to individual differences of student nurses in each group. Particular attention was paid to the hypotheses that 1) ART is associated with reduced sickness absence in student nurses when analysed against a comparison group of student nurses not trained in ART; and 2) ART is associated with reduced stress in student nurses when compared with student nurses not trained in ART. It is generally concluded that training in ART may reduce student nurses' level of sickness absence and can alleviate stress for some student nurses. However, examination of individual student nurse reports of ART and its usefulness and practice within these group data suggests more complex interpretations of the study. Despite the study limitations, implications for methods of stress control for nurses, curriculum development and cost-effective savings for nursing administrations are suggested, and possibilities for the development of comprehensive counselling services for nurses are raised. These issues, it is suggested, should be examined within a broader programme of research into coping with stress amongst nurses.


25

Lavelle, Stephen J. "Fabricating synthetic data in support of training for domestic terrorist activity data mining research." Thesis, Monterey, California. Naval Postgraduate School, 2010. http://hdl.handle.net/10945/5196.

Abstract:

Approved for public release; distribution is unlimited
Data mining is a mature technology, widespread in both government and industry. The proliferation of data storage in the public and private sectors has provided more information than can be expediently processed. Data mining provides a means to extract meaningful conclusions from this growing store of data. In the interests of countering criminal and terrorist activity, data mining has become a focus of law enforcement and government agencies. The use of databases containing information on persons may conflict with privacy rights and laws. Growing public awareness of government data mining programs and databases has been accompanied by concern about, and investigation of, these programs. Following a review of data mining and privacy issues, in 2008 the National Research Council (NRC) recommended that any training in the development of data mining programs involving personal data be conducted using synthesized data. This thesis presents a discussion of these issues, including data mining use, a simple data synthesis model for analysis to support the validity of the NRC recommendation, and the associated difficulties encountered in the process. Included is an analysis of the inherent difficulty in creating realistic and useful data.


26

Cornell, Axel. "Probabilistic Fault Isolation in Embedded Systems Using Training Data." Thesis, KTH, Reglerteknik, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-105874.

Abstract:

In the heavy vehicle industry, customers, laws and increasingly complex processes demand methods of supervising every aspect of a truck. Fault isolation systems are introduced to do just that. To ensure sustainable development, new types of isolation systems are investigated as substitutes for the consistency-based isolation systems of today. In this thesis, a probabilistic isolation method that ranks possible faults by their likelihood of being a fault in the process is implemented and evaluated as a possible future replacement for today's system. This method bases the isolation on training data collected from measurements on the process and observation of the process. The probabilistic isolation method is evaluated on how it performs under different circumstances, such as the effect of different amounts of training data and how well it performs if the tests and observations of the process are of varying quality. Solutions to several problems that arise when this method is implemented are also investigated, such as how the system handles cases where several faults occur at the same time, what happens if data are missing from the observations of the system, and how to address execution time, which is important in embedded systems. The results show that this probabilistic isolation system performs well on the process as it is today and that it is a good substitute when developing for future processes. There is, however, a need for further development of the system, such as improved isolation when several faults are present in the process, and questions on how to collect and store the training data remain to be answered. A full-scale implementation would allow for better comparison with the current system and give more information on runtime and storage problems.
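
To make the idea of ranking faults from training data concrete, here is a minimal sketch. It is not the method of the thesis, whose details the abstract does not give; it assumes a naive Bayes model with Laplace smoothing, binary test outcomes, and hypothetical names (`train`, `rank_faults`). Missing observations are marked `None` and simply skipped.

```python
from collections import Counter, defaultdict


def train(records):
    """Count, per fault, how often each diagnostic test was observed and alarmed.

    records: list of (fault_name, observations); each observation is 1 (alarm),
    0 (no alarm) or None (missing).
    """
    fault_counts = Counter()
    seen = defaultdict(Counter)   # seen[fault][i]: times test i was observed
    fired = defaultdict(Counter)  # fired[fault][i]: times test i alarmed
    for fault, observations in records:
        fault_counts[fault] += 1
        for i, value in enumerate(observations):
            if value is None:
                continue  # missing data contributes nothing
            seen[fault][i] += 1
            fired[fault][i] += value
    return fault_counts, seen, fired


def rank_faults(model, observations):
    """Return (fault, posterior) pairs sorted from most to least likely."""
    fault_counts, seen, fired = model
    total = sum(fault_counts.values())
    scores = {}
    for fault, count in fault_counts.items():
        score = count / total  # prior estimated from the training data
        for i, value in enumerate(observations):
            if value is None:
                continue
            # Laplace-smoothed probability that test i alarms under this fault
            p_alarm = (fired[fault][i] + 1) / (seen[fault][i] + 2)
            score *= p_alarm if value == 1 else 1.0 - p_alarm
        scores[fault] = score
    norm = sum(scores.values())
    ranked = [(fault, score / norm) for fault, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


training = [
    ("sensor", (1, 0)), ("sensor", (1, 0)),
    ("valve", (0, 1)), ("valve", (0, 1)),
]
model = train(training)
ranking = rank_faults(model, (1, None))  # second test result is missing
```

Skipping missing tests is one simple choice; it leaves the ranking driven by the prior and by whatever tests were actually observed.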


27

Ericson, Anton. "Object Recognition Using Digitally Generated Images as Training Data." Thesis, Uppsala universitet, Bildanalys och människa-datorinteraktion, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-200158.

Abstract:

Object recognition is a much studied computer vision problem, where the task is to find a given object in an image. This master's thesis presents a MATLAB implementation of an object recognition algorithm that finds three kinds of objects in images: electrical outlets, light switches and wall-mounted air-conditioning controls. Visually, these three objects are quite similar, and the aim is to locate them in an image as well as to distinguish them from one another. The object recognition was accomplished using Histogram of Oriented Gradients (HOG). During the training phase, the program was trained with images of the objects to be located, as well as reference images which did not contain the objects. A Support Vector Machine (SVM) was used in the classification phase. The performance was measured for two different setups: one where the training data consisted of photos, and one where the training data consisted of digitally generated images created using 3D modeling software, in addition to the photos. The results show that using digitally generated images as training images did not improve the accuracy in this case. The probable reason is that there is too little intraclass variability in the gradients of digitally generated images; they are too synthetic in a sense, which makes them poor at reflecting reality for this specific approach. The result might have been different if a higher number of digitally generated images had been used.


28

Dubey, Rohini. "PERFORMANCE EVALUATION of MILITARY TRAINING EXERCISES USING DATA MINING." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-13060.

Abstract:

Attaining training objectives is the measure of successful training, as objectives define the purpose of instructional events. Applying training objectives is challenging in large and complex military training. Training in the military domain focuses not only on completing the exercises but on effectively achieving their objectives. Performance toward the goal is strengthened by instructional processes and materials crafted to address specific training objectives. Simulation is an effective and realistic learning tool that can be used in training. Because simulation generates enormous amounts of data, analysing this data, which may contain hidden information, is a challenging task. Data mining is a solution to this problem. The aim of this project is to propose a framework for a system that instructors can follow to evaluate trainees' performance so that their fulfillment of the training objectives can be improved. The proposal studied in this project is to learn from previous training experiences, using data mining techniques, to improve the effectiveness of training by predicting the performance of the trainee. To select a good prediction model for estimating the learning outcome of the trainees, different classification techniques have been compared. The CRISP-DM model is used as the basis for the framework proposed in this dissertation. The proposed framework is then applied to a dataset obtained from the Swedish military for exercises that involved shooting at a target.


29

Varga, Tamás. "Off-line cursive handwriting recognition using synthetic training data." Berlin Aka, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2838183&prov=M&dok_var=1&dok_ext=htm.


30

Okuma, Kenji. "Active exploration of training data for improved object detection." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/40520.

Abstract:

This thesis concerns the problem of object detection, which is defined as finding all instances of an object class of interest and fitting each of them with a tight bounding window. This seemingly easy task for humans is still extremely difficult for machines. However, recent advances in object detection have enabled machines to categorize many classes of objects. Statistical models are often used for representing an object class of interest. These models learn from extensive training sets and generalize with low error rates to unseen data in a highly generic manner. But, these statistical methods have a major drawback in that they require a large amount of training data. We approach this problem by making the process of acquiring labels less tedious and less costly by reducing human labelling effort. Throughout this thesis, we explore means of efficient label acquisition for realizing cheaper training, faster development time, and higher-performance of object detectors. We use active learning with our novel interface to combine machine intelligence with human interventions, and effectively improve a state-of-the-art classifier by using additional unlabelled images from the Web. As the approach relies on a small amount of label input from a human oracle, there is still room to further reduce the amount of human effort. An ideal solution is, if possible, to have no humans involved in labelling novel data. Given a sparsely labelled video that contains very few labels, our novel self-learning approach achieves automatic acquisition of additional labels from the unlabelled portion of the video. Our approach combines colour segmentation, object detection and tracking in order to discover potential labels from novel data. We empirically show that our self-learning approach improves the performance of models that detect players in broadcast footage of sports games.
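
The abstract does not say how images are chosen for human labelling. As an illustration only, the sketch below uses uncertainty sampling, a common active-learning strategy: the images whose detector confidence is closest to 0.5 are sent to the human oracle first. The function name and the score dictionary are hypothetical.

```python
def select_queries(scores, budget):
    """Pick the `budget` items the detector is least certain about.

    scores: dict mapping an image id to a detector confidence in [0, 1].
    Ties are broken by image id so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda key: (abs(scores[key] - 0.5), key))
    return ranked[:budget]


# Hypothetical confidences for four unlabelled web images.
confidences = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
to_label = select_queries(confidences, budget=2)
```

Labelling only the uncertain images is what reduces human effort: confident detections and confident rejections are left alone.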


31

McClintick, Kyle W. "Training Data Generation Framework For Machine-Learning Based Classifiers." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1276.

Abstract:

In this thesis, we propose a new framework for the generation of training data for machine learning techniques used for classification in communications applications. Machine learning-based signal classifiers do not generalize well when training data does not describe the underlying probability distribution of real signals. The simplest way to accomplish statistical similarity between training and testing data is to synthesize training data passed through a permutation of plausible forms of noise. To accomplish this, a framework is proposed that implements arbitrary channel conditions and baseband signals. A dataset generated using the framework is considered, and is shown to be appropriately sized by having 11% lower entropy than state-of-the-art datasets. Furthermore, unsupervised domain adaptation can allow for powerful generalized training via deep feature transforms on unlabeled evaluation-time signals. A novel Deep Reconstruction-Classification Network (DRCN) application is introduced, which attempts to maintain near-peak signal classification accuracy despite dataset bias, or perturbations on testing data unforeseen in training. Together, feature transforms and diverse training data generated from the proposed framework, teaching a range of plausible noise, can train a deep neural net to classify signals well in many real-world scenarios despite unforeseen perturbations.
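
One plausible form of noise for such synthetic training data is additive white Gaussian noise at a chosen signal-to-noise ratio. The sketch below is an assumption for illustration, not the framework described in the thesis; `add_awgn` is a hypothetical helper operating on a real-valued sample list.

```python
import math
import random


def add_awgn(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - x) ** 2 for x, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# A sine wave stands in for a baseband signal.
clean = [math.sin(2.0 * math.pi * k / 50.0) for k in range(20000)]
noisy = add_awgn(clean, snr_db=10.0)
snr = measured_snr_db(clean, noisy)
```

Sweeping `snr_db` over a range of values, and doing the same for other impairments, is one way to produce training sets that cover many channel conditions.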


32

Valenzuela, Michael Lawrence. "Machine Learning, Optimization, and Anti-Training with Sacrificial Data." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/605111.

Abstract:

Traditionally the machine learning community has viewed the No Free Lunch (NFL) theorems for search and optimization as a limitation. I review, analyze, and unify the NFL theorem with the many frameworks to arrive at necessary conditions for improving black-box optimization, model selection, and machine learning in general. I review meta-learning literature to determine when and how meta-learning can benefit machine learning. We generalize meta-learning, in context of the NFL theorems, to arrive at a novel technique called Anti-Training with Sacrificial Data (ATSD). My technique applies at the meta level to arrive at domain specific algorithms and models. I also show how to generate sacrificial data. An extensive case study is presented along with simulated annealing results to demonstrate the efficacy of the ATSD method.


33

Nahari, Ammar Jamal. "Creating a Data Acquisition Platform for Robot Skill Training." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case153260922930446.


34

Friesch, Pius. "Generating Training Data for Keyword Spotting given Few Samples." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254960.

Abstract:

Speech recognition systems generally need a large quantity of training data with highly variable voice and recording conditions in order to produce robust results. In the specific case of keyword spotting, where only short commands are recognized instead of large vocabularies, the resource-intensive task of data acquisition has to be repeated for each keyword individually. Over the past few years, neural methods in speech synthesis and voice conversion have made tremendous progress and generate samples that are realistic to the human ear. In this work, we explore the feasibility of using such methods to generate training data for keyword spotting methods. In detail, we want to evaluate whether the generated samples are indeed realistic or only sound so, and whether a model trained on these generated samples can generalize to real samples. We evaluated three neural network speech synthesis and voice conversion techniques: (1) Speaker Adaptive VoiceLoop, (2) Factorized Hierarchical Variational Autoencoder (FHVAE), (3) Vector Quantised-Variational AutoEncoder (VQVAE). These three methods are evaluated as data augmentation or data generation techniques on a keyword spotting task. The performance of the models is compared to a baseline of changing the pitch, tempo, and speed of the original sample. The experiments show that using the neural network techniques can provide an up to 20% relative accuracy improvement on the validation set. The baseline augmentation technique performs at least twice as well. This seems to indicate that using multi-speaker speech synthesis or voice conversion naively does not yield varied or realistic enough samples.
Speech recognition systems generally need a large amount of training data with varying voice and recording conditions to give robust results. In the specific case of keyword spotting, where only short commands are recognized instead of large vocabularies, resource-intensive data collection must be carried out for each keyword individually. In recent years, neural methods in speech synthesis and voice conversion have made great progress and generate speech that is realistic to the human ear. In this work we investigate the possibility of using such methods to generate training data for keyword spotting. In detail, we want to evaluate whether the generated training data is indeed realistic or only sounds so, and whether a model trained on these generated examples generalizes well to real speech. We evaluated three methods for neural speech synthesis and voice conversion: (1) Speaker Adaptive VoiceLoop, (2) Factorized Hierarchical Variational Autoencoder (FHVAE), (3) Vector Quantised-Variational AutoEncoder (VQVAE). These three methods are used either to generate training data from text (speech synthesis) or to enrich an existing dataset to simulate several different speakers by means of voice conversion, and are evaluated in a keyword spotting system. The performance of the models is compared with a baseline based on traditional signal processing in which pitch and tempo are varied in the original training data. The experiments show that neural network methods can give up to a 20% relative accuracy improvement on the validation set compared with the original training data. The baseline method based on signal processing gives results at least twice as good. This seems to indicate that using multi-speaker speech synthesis or voice conversion does not yield sufficiently varied or representative training data.


35

Bergendal, Rasmus, and Andreas Rohlén. "A comparison of training algorithms when training a Convolutional Neural Network for classifying road signs." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254932.

Abstract:

This thesis is a comparison between three different training algorithms when training a Convolutional Neural Network for classifying road signs. The algorithms that were compared were Gradient Descent, Adadelta, and Adam. For this study the German Traffic Sign Recognition Benchmark (GTSRB) was used, which is a scientifically relevant dataset containing around 50000 annotated images. A combination of supervised and offline learning was used and the top accuracy of each algorithm was registered. Adam achieved the highest accuracy, followed by Adadelta and then Gradient Descent. Improvements to the neural network were implemented in the form of more convolutional layers and more feature-recognizing filters. This improved the accuracy of the CNN trained with Adam by 0.76 percentage points.
This degree project is a comparison of three different training algorithms for training a Convolutional Neural Network to classify road signs. The algorithms compared were Gradient Descent, Adadelta and Adam. The study used the German Traffic Sign Recognition Benchmark (GTSRB), a dataset used in research that contains around 50000 annotated images. A combination of supervised and offline learning was used, and each algorithm's top result was recorded. Adam achieved the highest result, followed by Adadelta and finally Gradient Descent. The neural network was improved by adding more convolutional layers and more feature-recognizing filters. This improved the accuracy of the network trained with Adam by 0.76 percentage points.
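
For readers unfamiliar with the three optimizers, the sketch below writes out their standard update rules on a toy one-dimensional objective, f(x) = (x - 3)^2. It is an illustration under assumed default hyperparameters, not the CNN training setup of the thesis, and behaviour on this toy problem says nothing about the GTSRB ranking reported above.

```python
import math


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def gradient_descent(x, steps, lr=0.1):
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def adadelta(x, steps, rho=0.95, eps=1e-6):
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * step * step
        x += step
    return x


def adam(x, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = 0.0  # first-moment (mean) estimate
    v = 0.0  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_gd = gradient_descent(0.0, steps=100)
x_adadelta = adadelta(0.0, steps=500)
x_adam = adam(0.0, steps=500)
```

Plain gradient descent uses one fixed step size, while Adadelta and Adam rescale each step using running statistics of past gradients.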


36

Low, Choy Samantha Jane. "Hierarchical models for 2D presence/absence data having ambiguous zeroes: With a biogeographical case study on dingo behaviour." Thesis, Queensland University of Technology, 2001. https://eprints.qut.edu.au/37098/12/Samantha%20Low%20Choy%20Thesis.pdf.

Abstract:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. 
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
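
The path sampling identity behind an estimator of this kind can be written out as follows. This is a sketch under an assumption, not a statement of the thesis's derivation: the binary MRF is taken to be an exponential family with unnormalized density $\exp\{\theta^{\top} s(x)\}$, where $s(x)$ is the vector of canonical statistics. The symbols $s$, $Z$, $n$ and the path $\theta(t)$ are notation introduced here.

```latex
\begin{align}
Z(\theta) &= \sum_{x \in \{0,1\}^{n}} \exp\{\theta^{\top} s(x)\}, \\
\nabla_{\theta} \log Z(\theta) &= \mathbb{E}_{\theta}\,[\, s(X) \,], \\
\log \frac{Z(\theta_{1})}{Z(\theta_{0})}
  &= \int_{0}^{1} \mathbb{E}_{\theta(t)}\,[\, s(X) \,]^{\top}\, \dot{\theta}(t)\, dt,
  \qquad \theta(0) = \theta_{0},\quad \theta(1) = \theta_{1}.
\end{align}
```

Under this assumption the log ratio of normalization constants is an integral of the mean canonical statistic along a path in parameter space, which is presumably where the name comes from. In practice the expectations at grid points along the path would be estimated from MCMC draws and the integral approximated by numerical quadrature.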


37

Maiga, Aïssata, and Johanna Löv. "Real versus Simulated data for Image Reconstruction : A comparison between training with sparse simulated data and sparse real data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302028.

Abstract:

Our study investigates how training with sparse simulated data versus sparse real data affects image reconstruction. We compared the two on several criteria, such as number of events, speed and high dynamic range (HDR). The results indicate that the difference between simulated data and real data is not large. Training with real data often performed better, but only by 2%. The findings confirm what earlier studies have shown: training with simulated data generalises well, and this study shows that it does so even when training on sparse datasets.
Our study investigates how training with sparse simulated data and sparse real data from an event camera affects image reconstruction. We trained two models, one with simulated data and one with real data, and then compared them on several criteria, such as number of events, speed and high dynamic range (HDR). The results show that the difference between training with simulated data and training with real data is not large. The model trained with real data performed better in most cases, but the average difference between the results is only 2%. The results confirm what earlier studies have shown: training with simulated data generalises well and, as this study shows, does so even when training on sparse datasets.


38

Winblad, Kjell. "The Impact of Training Data Division in Inductive Dependency Parsing." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-157139.


39

Jiang, Kailang. "Improve classification on infrequent discourse relations via training data enrichment." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/59844.

Abstract:

Discourse parsing is a popular technique widely used in text understanding, sentiment analysis, and other NLP tasks. However, for most discourse parsers, the performance varies significantly across different discourse relations. In this thesis, we first validate the underfitting hypothesis, i.e., the less frequent a relation is in the training data, the poorer the performance on that relation. We then explore how to increase the number of positive training instances, without resorting to manually creating additional labeled data. We propose a training data enrichment framework that relies on co-training of two different discourse parsers on unlabeled documents. Importantly, we show that co-training alone is not sufficient. The framework requires a filtering step to ensure that only “good quality” unlabeled documents can be used for enrichment and re-training. We propose and evaluate two ways to perform the filtering. The first is to use an agreement score between the two parsers. The second is to use only the confidence score of the faster parser. Our empirical results show that agreement score can help to boost the performance on infrequent relations, and that the confidence score is a viable approximation of the agreement score for infrequent relations.
Faculty of Science, Department of Computer Science, Graduate
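
The agreement-based filtering step described in this abstract can be sketched as follows. The sketch is hypothetical: it assumes both parsers segment a document identically, so that their relation labels can be compared position by position, and the function names and the 0.8 threshold are illustrative choices, not values from the thesis.

```python
def agreement_score(relations_a, relations_b):
    """Fraction of positions where both parsers predict the same relation."""
    if not relations_a or len(relations_a) != len(relations_b):
        return 0.0  # no basis for comparison
    matches = sum(1 for a, b in zip(relations_a, relations_b) if a == b)
    return matches / len(relations_a)


def select_documents(parsed_docs, threshold=0.8):
    """Keep documents whose parser agreement reaches the threshold.

    parsed_docs: list of (doc_id, relations_from_parser_a, relations_from_parser_b).
    """
    return [
        doc_id
        for doc_id, rel_a, rel_b in parsed_docs
        if agreement_score(rel_a, rel_b) >= threshold
    ]


parsed = [
    ("d1",
     ["Cause", "Contrast", "Elaboration", "Cause", "Cause"],
     ["Cause", "Contrast", "Elaboration", "Cause", "Condition"]),
    ("d2", ["Cause", "Contrast"], ["Condition", "Contrast"]),
]
kept = select_documents(parsed)
```

Only the documents that pass the filter would be added to the training data before re-training.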


40

Masko, David, and Paulina Hensman. "The Impact of Imbalanced Training Data for Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166451.

Abstract:

This thesis empirically studies the impact of imbalanced training data on Convolutional Neural Network (CNN) performance in image classification. Images from the CIFAR-10 dataset, a set containing 60 000 images of 10 different classes, are used to create training sets with different distributions between the classes. For example, some sets contain a disproportionately large number of images of one class, and others contain very few images of one class. These training sets are used to train a CNN, and the network's classification performance is measured for each training set. The results show that imbalanced training data can have a severely negative impact on overall CNN performance, and that balanced training data yields the best results. Following this, oversampling is used on the imbalanced training sets to raise their performance to that of the balanced set. It is concluded that oversampling is a viable way to counter the impact of imbalances in the training data.
This bachelor's thesis conducts an empirical study of the impact that unevenly distributed training data has on image classification results for Convolutional Neural Networks (CNNs). Images from the CIFAR-10 dataset, consisting of 60 000 images divided among 10 classes, are used to create training sets with different distributions between the classes. For example, some sets contain disproportionately many images of one class, while others contain very few images of one class. These datasets are used to train a CNN, and the network's classification results are recorded for each dataset. The results show that unevenly distributed training data can have a markedly negative impact on the average results for CNNs, and that balanced training data gives the best results. Oversampling is applied to the unevenly distributed training sets, which yields the same results as for the balanced training set. This shows that oversampling is a viable way to counteract the effects of unevenly distributed training data.
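
As an illustration of the oversampling idea mentioned in this abstract, here is a minimal sketch. The abstract does not say which oversampling variant was used; the version below assumes plain random oversampling, in which minority-class samples are duplicated (drawn with replacement) until every class matches the size of the largest class. The function name and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Duplicate minority-class samples until all classes are equally large."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    if not by_class:
        return [], []

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy example: four "cat" images and one "dog" image.
images = ["cat1", "cat2", "cat3", "cat4", "dog1"]
classes = ["cat", "cat", "cat", "cat", "dog"]
balanced_images, balanced_classes = random_oversample(images, classes)
class_counts = Counter(balanced_classes)
```

Oversampling should be applied to the training split only, so that duplicated images do not leak into the evaluation data.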

41

Whitworth, Timothy. "Channel estimation, data detection and training design for wireless communications." Thesis, University of Leeds, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.493283.

Abstract:

With modern wireless communications seeking to attain higher data rates, possibly over time-varying channels, there is an urgent need for fast and efficient techniques for accurate channel estimation and data detection. This thesis proposes a number of novel solutions to the aforementioned problem.

42

Bardolet, Pettersson Susana. "Managing imbalanced training data by sequential segmentation in machine learning." Thesis, Linköpings universitet, Avdelningen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155091.

Abstract:

Imbalanced training data is a common problem in machine learning applications. This problem refers to datasets in which the foreground pixels are significantly fewer than the background pixels. By training a machine learning model with imbalanced data, the result is typically a model that classifies all pixels as the background class. A result that indicates no presence of a specific condition when it is actually present is particularly undesired in medical imaging applications. This project proposes a sequential system of two fully convolutional neural networks to tackle the problem. Semantic segmentation of lung nodules in thoracic computed tomography images has been performed to evaluate the performance of the system. The imbalanced data problem is present in the training dataset used in this project, where the average percentage of pixels belonging to the foreground class is 0.0038 %. The sequential system achieved a sensitivity of 83.1 % representing an increase of 34 % compared to the single system. The system only missed 16.83% of the nodules but had a Dice score of 21.6 % due to the detection of multiple false positives. This method shows considerable potential to be a solution to the imbalanced data problem with continued development.
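
The combination of high sensitivity and a low Dice score reported in this abstract follows from how the two metrics treat false positives. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and Dice = 2·TP / (2·TP + FP + FN). The counts are invented for illustration and are not taken from the thesis.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true positives that were detected: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def dice_score(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN).

    The value is undefined when all counts are zero; returning 0.0 in that
    case is a convention chosen for this sketch.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: most positives are found, but with many false alarms.
tp, fp, fn = 80, 500, 20
sens = sensitivity(tp, fn)    # 80 / 100 = 0.8
dice = dice_score(tp, fp, fn)  # 160 / 680, roughly 0.235
```

Sensitivity ignores false positives entirely, whereas Dice penalizes them in the denominator, so a detector that rarely misses but often raises false alarms scores well on the first metric and poorly on the second.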

43

Girerd, Daniel. "Strategic Selection of Training Data for Domain-Specific Speech Recognition." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1847.

Abstract:

Speech recognition is now a key topic in computer science with the proliferation of voice-activated assistants and voice-enabled devices. Many companies offer a speech recognition service for developers to use to enable smart devices and services. These speech-to-text systems, however, have significant room for improvement, especially in domain-specific speech. IBM's Watson speech-to-text service attempts to support domain-specific uses by allowing users to upload their own training data for making custom models that augment Watson's general model. This requires deciding on a strategy for picking the training model. This thesis experiments with different training choices for custom language models that augment Watson's speech-to-text service. The results show that using recent utterances is the best choice of training data in our use case of Digital Democracy. We are able to improve speech recognition accuracy by 2.3 percent over the control with no custom model. However, choosing training utterances most specific to the use case is better when large enough volumes of such training data are available.

44

Sundin, Hannes, and Jakob Josefsson. "Evaluating synthetic training data for character recognition in natural images." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280292.

Abstract:

This thesis is centered on character recognition in natural images. More specifically, it evaluates the use of synthetic font images for training a Convolutional Neural Network (CNN), compared to natural training data. Training a CNN to recognize characters in natural images often demands a large amount of labeled data. One alternative is to instead generate synthetic data by using digital fonts. A total of 41,664 font images were generated, which in combination with already existing data yielded around 99,000 images. Using this synthetic dataset, the CNN was trained on incrementally increasing amounts of synthetic training data and tested on natural images. At the same time, different preprocessing methods were applied to the synthetic data in order to observe the effect on accuracy. Results show that even when using the best-performing preprocessing method and having access to 99,000 synthetic training images, a smaller set of natural training data yielded better results. However, results also show that synthetic data can perform better than natural data, provided that a good preprocessing method is used and the supply of natural images is limited.
This bachelor's thesis deals with character recognition in natural images. More specifically, synthetic font images are compared with natural images for training a Convolutional Neural Network (CNN). Training a CNN to recognize characters in natural images usually requires a large amount of labeled natural data. An alternative is to produce synthetic training data in the form of font images. In this study, 41,664 font images were created, which in combination with existing data gave us around 99,000 synthetic training images. A CNN was then trained on increasing amounts of font images and subsequently tested on natural images of characters. The result was then compared with the result of training on natural images. In addition, different preprocessing methods were tried in order to observe the effect of preprocessing on classification accuracy. The results showed that even with the best-performing preprocessing method and with much more data, training on synthetic images was not as effective as training on natural images. However, it was shown that with a good preprocessing method synthetic images can replace natural images, provided that the supply of natural images is limited.

45

Collin, Sofie. "Synthetic Data for Training and Evaluation of Critical Traffic Scenarios." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177779.

Abstract:

Modern camera-based vehicle safety systems rely heavily on machine learning and consequently require large amounts of training data to perform reliably. However, collecting and annotating the needed data is an extremely expensive and time-consuming process. In addition, it is exceptionally difficult to collect data that covers critical scenarios. This thesis investigates to what extent synthetic data can replace real-world data for these scenarios. Since only a limited amount of data consisting of such real-world scenarios is available, this thesis instead makes use of proxy scenarios, e.g. situations in which pedestrians are located closely in front of the vehicle (for example at a crosswalk). The presented approach involves training a detector on real-world data from which all samples of these proxy scenarios have been removed, and comparing it to other detectors trained on data in which the removed samples have been replaced with various degrees of synthetic data. A method for generating and automatically and accurately annotating synthetic data, using features in the CARLA simulator, is presented. Also, the domain gap between the synthetic and real-world data is analyzed, and methods in domain adaptation and data augmentation are reviewed. The presented experiments show that aligning statistical properties between the synthetic and real-world datasets distinctly mitigates the domain gap. There are also clear indications that synthetic data can help detect pedestrians in critical traffic situations.

The degree project was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.

46

Pfaff, Lee. "The effect of training on individuals' interactions with visual data." Thesis, Boston University, 2013. https://hdl.handle.net/2144/12186.

Abstract:

Thesis (M.A.)--Boston University
Introduction: Traditionally, students demonstrate their learning via testing and demonstrations, but little is known about how learners’ interaction with information changes during and after training. Previous studies have shown the difference between naive and expert individuals’ interactions with an image, but never in the same individuals before and after the educational process. Our lab’s goal is to explore this question using gaze tracking and quantitative measures. This will be done by looking at 3 specific variables: entry time, number of visits and fraction of viewing time. Hypotheses: We test 3 main hypotheses. (1) The trained group will attend to educationally salient features more than the non-trained group after the training. (2) The non-trained group will attend to visually salient features more than the trained group after training. (3) Training will cause the trained group to attend more to educationally salient features after the training, when compared to base, while the non-trained group will have no change. [TRUNCATED]

47

Gaddis, Margaret L. "Training Citizen Scientists for Data Reliability: A Multiple Case Study to Identify Themes in Current Training Initiatives." Thesis, The University of the Rockies, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=13423764.

Abstract:

This dissertation characterized trainings designed to prepare citizen scientists to collect ecological data in natural outdoor settings. Citizen scientists are volunteers who participate in scientific activities under the guidance of professional scientists and organizations. The work of citizen scientists greatly expands the data collection possibilities in natural resource management and increases science literacy among participants and their social communities. The general problem is that some scientists and land managers view the data collected by citizen scientists as unreliable. The specific problem is the absence of educational training measurement in citizen science program design and analysis with which to ascertain the learning gains of trained citizen scientists.

Through a sequenced methodology of data analysis, survey, and semi-structured interviews, deductive descriptors and codes guided a directed content analysis of data collected. The analysis indicated strong alignment between citizen science, andragogy, and social learning theory. The sample revealed a bimodal distribution related to the type of data collected and the subsequent training design. Little training existed when data collection involved photography only. Citizen scientists brought prior skills to the task but did not need to gain new procedural learning to complete their data collection task. When citizen scientists collected more complex measurements, classroom and field mentoring facilitated learning.

Citizen science leaders described their perception of the reliability of their citizen scientists’ data collection efforts. Computer technologies validated photo and water quality data. Therefore, quantitative data analysis supported the perception of data reliability. Terrestrial data had a range of reliability qualifications including video and paper quizzing, field observation of methods implemented, periodic data checks, and follow-up mentoring when data quality was poor. Managers of terrestrial citizen science programs were confident in the reliability of the data for the land management, policy, and research applications required.

48

Spreyer, Kathrin. "Does it have to be trees? : Data-driven dependency parsing with incomplete and noisy training data." Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2012/5749/.

Abstract:

We present a novel approach to training data-driven dependency parsers on incomplete annotations. Our parsers are simple modifications of two well-known dependency parsers, the transition-based Malt parser and the graph-based MST parser. While previous work on parsing with incomplete data has typically couched the task in frameworks of unsupervised or semi-supervised machine learning, we essentially treat it as a supervised problem. In particular, we propose what we call agnostic parsers which hide all fragmentation in the training data from their supervised components. We present experimental results with training data that was obtained by means of annotation projection. Annotation projection is a resource-lean technique which allows us to transfer annotations from one language to another within a parallel corpus. However, the output tends to be noisy and incomplete due to cross-lingual non-parallelism and error-prone word alignments. This makes the projected annotations a suitable test bed for our fragment parsers. Our results show that (i) dependency parsers trained on large amounts of projected annotations achieve higher accuracy than the direct projections, and that (ii) our agnostic fragment parsers perform roughly on a par with the original parsers which are trained only on strictly filtered, complete trees. Finally, (iii) when our fragment parsers are trained on artificially fragmented but otherwise gold standard dependencies, the performance loss is moderate even with up to 50% of all edges removed.
We present a novel approach to training data-driven dependency parsers on incomplete annotations. Our parsers are simple variants of two well-known dependency parsers, namely the transition-based Malt parser and the graph-based MST parser. While earlier work on parsing with incomplete data has mostly embedded the task in frameworks for unsupervised or weakly supervised machine learning, we treat it essentially with supervised learning methods. In particular, we propose "agnostic" parsers that hide all fragmentation of the training data from their data-driven learning components. We present experimental results with training data obtained by means of annotation projection. Annotation projection is a technique that allows us to transfer annotations from one language to another within a parallel corpus. Owing to limited cross-lingual parallelism and error-prone word alignment, however, the output of the projection step is usually noisy and incomplete. This is precisely what makes projected annotations a suitable test bed for our fragment-capable parsers. Our results show that (i) dependency parsers trained on large amounts of projected annotations achieve higher accuracy than the underlying direct projections, and that (ii) the accuracy of our agnostic, fragment-capable parsers is roughly on a par with that of the original parsers (trained on strictly filtered, completely projected trees). Finally, using artificially fragmented gold-standard data, we show that (iii) the loss in accuracy remains modest even when up to 50% of all edges in the training data are missing.

49

Säfdal, Joakim. "Data-Driven Engine Fault Classification and Severity Estimation Using Interpolated Fault Modes from Limited Training Data." Thesis, Linköpings universitet, Fordonssystem, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173916.

Abstract:

Modern vehicles are expected to be safe, environmentally friendly, durable and economical. Monitoring the health of the vehicle is therefore more important than ever. As the complexity of vehicular systems increases, the need for efficient monitoring methods has increased as well. Traditional methods of deriving models for the systems are no longer as efficient, because the complexity of the systems increases the time and skill needed to implement the models. An alternative is data-driven methods, in which a collection of data associated with the behavior of the system is used to draw conclusions about the state of the system. Faults are, however, rare events, and collecting sufficient data to cover all possible faults threatening a vehicle would be impossible. A method for drawing conclusions from limited historical data would therefore be desirable. In this thesis, an algorithm using distinguishability as a method for fault classification and fault severity estimation is proposed. Historical data is interpolated over a fault severity vector using Gaussian process regression as a way to estimate fault modes for unknown fault sizes. The algorithm is then tested against validation data to evaluate its ability to detect and identify known fault classes and fault severities, separate unknown fault classes from known fault classes, and estimate unknown fault sizes. The purpose of the study is to evaluate the possibility of using limited historical data to reduce the need for costly and time-consuming data collection. The study shows promising results, as fault class identification and fault size estimation using the proposed algorithm seem possible for fault sizes not included in the historical data.

50

Wang, Zhuoyu. "Bias from a missing covariate in the analysis of diagnostic test data in the absence of a gold-standard." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=119705.

Abstract:

Covariates that influence the sensitivity and/or specificity of different diagnostic tests can create correlations between these tests, conditional on disease status. Thus, ignoring such covariates in a latent class analysis of imperfect tests would amount to ignoring conditional dependence, potentially leading to biased estimates of the prevalence of the condition under study and the accuracies of the tests. In the case of a dichotomous covariate affecting two imperfect tests, we derive an expression showing that the conditional covariance is a function of the product of the change in test sensitivity (or specificity) within subgroups defined by the covariate. For a uniformly or normally distributed continuous covariate, similar results are obtained numerically. Using a series of simulated datasets, we study whether, in the absence of the covariate, unbiased estimates may be obtained by fitting a latent class model that allows for conditional dependence. We found that the bias induced by ignoring the dependence and using a conditional independence model is not large in most cases. In cases where bias is present, a conditional dependence model, which places no constraints on the covariance between the tests, works well in adjusting for all three types of missing covariates. Our methods are applied to diagnostic testing data for the detection of tuberculosis, which vary by the covariate HIV status.
Covariates that influence the sensitivity and/or specificity of different diagnostic tests can create correlations between these tests, conditional on disease status. Thus, ignoring these variables in a latent class analysis of imperfect tests amounts to ignoring conditional dependence, which can lead to biased estimates of the prevalence of the condition under study and of the accuracy of the tests. In the case of a dichotomous covariate affecting two imperfect tests, we derive an expression showing that the conditional covariance is a function of the product of the change in test sensitivity (or specificity) within the subgroups defined by the covariate. For a uniformly or normally distributed continuous covariate, similar results are obtained numerically. Using series of simulated data, we study whether, in the absence of the covariate, unbiased estimates can be obtained by fitting a latent class model that allows for conditional dependence. We found that the bias induced by ignoring the dependence and using a conditional independence model is not large in most cases. In cases where bias is present, a conditional dependence model that imposes no constraints on the covariance between the tests works well in adjusting for all three types of missing variables. Our methods are applied to diagnostic test data for the detection of tuberculosis, which vary according to the covariate HIV status.
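
The product form mentioned in this abstract can be recovered with a short calculation. The following is a sketch under assumptions that the abstract does not state explicitly: the two test results $T_1, T_2 \in \{0, 1\}$ are independent given disease status $D$ and the dichotomous covariate $Z$, $p = P(Z = 1 \mid D = 1)$, and $S_{jz}$ denotes the sensitivity of test $j$ in the subgroup $Z = z$. The notation is chosen here for illustration and is not taken from the thesis.

```latex
\begin{align*}
\operatorname{Cov}(T_1, T_2 \mid D = 1)
  &= E[T_1 T_2 \mid D = 1] - E[T_1 \mid D = 1]\, E[T_2 \mid D = 1] \\
  &= p\, S_{11} S_{21} + (1 - p)\, S_{10} S_{20}
     - \bigl(p\, S_{11} + (1 - p)\, S_{10}\bigr)
       \bigl(p\, S_{21} + (1 - p)\, S_{20}\bigr) \\
  &= p (1 - p)\, (S_{11} - S_{10})\, (S_{21} - S_{20}).
\end{align*}
```

Under these assumptions the covariance vanishes whenever the covariate leaves the sensitivity of either test unchanged. The same argument with $D = 0$ and specificities in place of sensitivities gives the corresponding expression among the non-diseased.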
