Dissertations / Theses on the topic 'Prediction models'

Consult the top 50 dissertations / theses for your research on the topic 'Prediction models.'


1

Haider, Peter. "Prediction with Mixture Models." Phd thesis, Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2014/6961/.

Abstract:
Learning a model for the relationship between the attributes and the annotated labels of data examples serves two purposes. Firstly, it enables the prediction of the label for examples without annotation. Secondly, the parameters of the model can provide useful insights into the structure of the data. If the data has an inherent partitioned structure, it is natural to mirror this structure in the model. Such mixture models predict by combining the individual predictions generated by the mixture components which correspond to the partitions in the data. Often the partitioned structure is latent, and has to be inferred when learning the mixture model. Directly evaluating the accuracy of the inferred partition structure is, in many cases, impossible because the ground truth cannot be obtained for comparison. However it can be assessed indirectly by measuring the prediction accuracy of the mixture model that arises from it. This thesis addresses the interplay between the improvement of predictive accuracy by uncovering latent cluster structure in data, and further addresses the validation of the estimated structure by measuring the accuracy of the resulting predictive model. In the application of filtering unsolicited emails, the emails in the training set are latently clustered into advertisement campaigns. Uncovering this latent structure allows filtering of future emails with very low false positive rates. In order to model the cluster structure, a Bayesian clustering model for dependent binary features is developed in this thesis. Knowing the clustering of emails into campaigns can also aid in uncovering which emails have been sent on behalf of the same network of captured hosts, so-called botnets. This association of emails to networks is another layer of latent clustering. Uncovering this latent structure allows service providers to further increase the accuracy of email filtering and to effectively defend against distributed denial-of-service attacks. To this end, a discriminative clustering model is derived in this thesis that is based on the graph of observed emails. The partitionings inferred using this model are evaluated through their capacity to predict the campaigns of new emails. Furthermore, when classifying the content of emails, statistical information about the sending server can be valuable. Learning a model that is able to make use of it requires training data that includes server statistics. In order to also use training data where the server statistics are missing, a model that is a mixture over potentially all substitutions thereof is developed. Another application is to predict the navigation behavior of the users of a website. Here, there is no a priori partitioning of the users into clusters, but to understand different usage scenarios and design different layouts for them, imposing a partitioning is necessary. The presented approach simultaneously optimizes the discriminative as well as the predictive power of the clusters. Each model is evaluated on real-world data and compared to baseline methods. The results show that explicitly modeling the assumptions about the latent cluster structure leads to improved predictions compared to the baselines. It is beneficial to incorporate a small number of hyperparameters that can be tuned to yield the best predictions in cases where the prediction accuracy can not be optimized directly.
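The prediction mechanism described above can be illustrated with a minimal sketch (not the thesis's models): a latent partition is inferred from the inputs, one predictor is trained per mixture component, and predictions are blended using the components' posterior responsibilities. The data, the Gaussian mixture and the linear experts below are invented stand-ins.

```python
# Minimal mixture-of-predictors sketch: combine component predictions weighted by
# the posterior probability ("responsibility") that an example belongs to each
# latent cluster. Illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data with a latent two-cluster structure and cluster-specific relations.
X = np.vstack([rng.normal(-2, 1, (200, 1)), rng.normal(3, 1, (200, 1))])
y = np.concatenate([2.0 * X[:200, 0] + 1.0, -1.5 * X[200:, 0] + 4.0])
y += rng.normal(0, 0.3, 400)

# Infer the latent partition from the inputs.
mix = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = mix.predict_proba(X)                      # responsibilities, shape (n, 2)

# One predictor per mixture component, trained with responsibility weights.
experts = [LinearRegression().fit(X, y, sample_weight=resp[:, k]) for k in range(2)]

def predict(X_new):
    """Blend the component predictions with the inferred responsibilities."""
    r = mix.predict_proba(X_new)
    preds = np.column_stack([e.predict(X_new) for e in experts])
    return (r * preds).sum(axis=1)

print(predict(np.array([[-2.0], [3.0]])))        # close to 2x+1 and -1.5x+4 respectively
```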
2

Vaidyanathan, Sivaranjani. "Bayesian Models for Computer Model Calibration and Prediction." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1435527468.

3

Charraud, Jocelyn, and Saez Adrian Garcia. "Bankruptcy prediction models on Swedish companies." Thesis, Umeå universitet, Företagsekonomi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-185143.

Abstract:
Bankruptcies have been a sensitive topic around the world for over 50 years. From their research, the authors found that only a few bankruptcy studies have been conducted in Sweden, and even fewer on the topic of bankruptcy prediction models. This thesis investigates the performance of the Altman, Ohlson and Zmijewski bankruptcy prediction models on all Swedish companies during the years 2017 and 2018. The study intends to shed light on some of the most famous bankruptcy prediction models and to explore the predictive abilities and usability of these three models in Sweden. Its second purpose is to create two models from the most significant variables of the three models studied and to test their prediction power, with the aim of producing two models designed for Swedish companies. We identified a research gap in Sweden, where bankruptcy prediction models have been rather unexplored, especially these three models. We also identified a second research gap regarding the time period of the research: only a few studies on bankruptcy prediction models have been conducted after the financial crisis of 2007/08. To achieve the purpose of the study we conducted a quantitative study using secondary data gathered from the Serrano database, following an abductive approach within a positivist paradigm, and covering all active Swedish companies between the years 2017 and 2018. Finally, this work contributes to the current field of knowledge through the analysis of the models' results on Swedish companies, drawing on liquidity theory, solvency and insolvency theory, the pecking order theory, profitability theory, cash flow theory, and the contagion effect. The results aligned with liquidity theory, solvency and insolvency theory and profitability theory. Moreover, we found that the Altman model has the lowest performance of the three models, followed by the Ohlson model, which shows mixed results depending on the statistical analysis. The Zmijewski model has the best performance of the three. Finally, the performance and prediction power of the two new models were significantly higher than those of the three models studied.
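For orientation, the oldest of the three models named above, the Altman model, is a linear combination of five accounting ratios. The sketch below uses the widely cited coefficients of the original Altman (1968) Z-score for public manufacturing firms together with its conventional zone cut-offs; the input figures are invented, and nothing here is taken from the thesis's re-estimated Swedish models.

```python
# Original Altman (1968) Z-score for public manufacturing firms; inputs are invented.
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    x1 = working_capital / total_assets           # liquidity
    x2 = retained_earnings / total_assets         # cumulative profitability
    x3 = ebit / total_assets                      # operating profitability
    x4 = market_value_equity / total_liabilities  # solvency / leverage
    x5 = sales / total_assets                     # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

z = altman_z(working_capital=120, retained_earnings=300, ebit=90,
             market_value_equity=800, sales=1500,
             total_assets=1000, total_liabilities=600)

# Conventional cut-offs: Z < 1.81 distress zone, 1.81-2.99 grey zone, > 2.99 safe zone.
zone = "distress" if z < 1.81 else ("grey" if z < 2.99 else "safe")
print(f"Z = {z:.2f} ({zone} zone)")
```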
4

Rice, Nigel. "Multivariate prediction models in medicine." Thesis, Keele University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.314647.

5

Brefeld, Ulf. "Semi-supervised structured prediction models." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2008. http://dx.doi.org/10.18452/15748.

Abstract:
Learning mappings between arbitrary structured input and output variables is a fundamental problem in machine learning. It covers many natural learning tasks and challenges the standard model of learning a mapping from independently drawn instances to a small set of labels. Potential applications include classification with a class taxonomy, named entity recognition, and natural language parsing. In these structured domains, labeled training instances are generally expensive to obtain while unlabeled inputs are readily available and inexpensive. This thesis deals with semi-supervised learning of discriminative models for structured output variables. The analytical techniques and algorithms of classical semi-supervised learning are lifted to the structured setting. Several approaches based on different assumptions of the data are presented. Co-learning, for instance, maximizes the agreement among multiple hypotheses while transductive approaches rely on an implicit cluster assumption. Furthermore, in the framework of this dissertation, a case study on email batch detection in message streams is presented. The involved tasks exhibit an inherent cluster structure and the presented solution exploits the streaming nature of the data. The different approaches are developed into semi-supervised structured prediction models and efficient optimization strategies thereof are presented. The novel algorithms generalize state-of-the-art approaches in structural learning such as structural support vector machines. Empirical results show that the semi-supervised algorithms lead to significantly lower error rates than their fully supervised counterparts in many application areas, including multi-class classification, named entity recognition, and natural language parsing.
6

Asterios, Geroukis. "Prediction of Linear Models: Application of Jackknife Model Averaging." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-297671.

Abstract:
When using linear models, a common practice is to find the single best-fitting model and use it for predictions. This, however, can cause problems such as misspecification and sometimes even wrong models due to spurious regression. An alternative prediction method examined in this study is Jackknife Model Averaging, developed by Hansen & Racine (2012), which assigns weights to all candidate models and allows the data to have heteroscedastic errors. This model averaging estimator is compared to Mallows's Model Averaging (Hansen, 2007) and to model selection by the Bayesian Information Criterion and Mallows's Cp. The results show that the Jackknife Model Averaging technique gives smaller prediction errors than the other methods. This study concludes that the Jackknife Model Averaging technique may be a useful choice for prediction.
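To make the weighting step concrete, below is a minimal sketch of jackknife model averaging for a set of nested linear models in the spirit of Hansen & Racine (2012): leave-one-out residuals are computed for every candidate model, and the weights are chosen on the unit simplex to minimise the resulting cross-validation criterion. The simulated data and candidate set are assumptions made purely for illustration.

```python
# Jackknife Model Averaging sketch: weights on the simplex minimise the
# leave-one-out cross-validation criterion built from each candidate model's
# jackknife residuals. Simulated, heteroscedastic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.8, 0.5, 0.2, 0.0, 0.0])
y = X @ beta + rng.normal(scale=1.0 + 0.5 * np.abs(X[:, 0]), size=n)

def loo_residuals(Xm, y):
    """Leave-one-out OLS residuals via the hat-matrix shortcut e_i / (1 - h_ii)."""
    H = Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)
    e = y - H @ y
    return e / (1.0 - np.diag(H))

# Candidate models: intercept plus the first k regressors, k = 1..p.
E = np.column_stack([
    loo_residuals(np.column_stack([np.ones(n), X[:, :k]]), y) for k in range(1, p + 1)
])

M = E.shape[1]
cv = lambda w: float(np.mean((E @ w) ** 2))      # jackknife criterion
res = minimize(cv, x0=np.full(M, 1.0 / M), method="SLSQP",
               bounds=[(0.0, 1.0)] * M,
               constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
print("JMA weights:", np.round(res.x, 3))
```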
7

Shrestha, Rakshya. "Deep soil mixing and predictive neural network models for strength prediction." Thesis, University of Cambridge, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.607735.

8

Grant, Stuart William. "Risk prediction models in cardiovascular surgery." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/risk-prediction-models-in-cardiovascular-surgery(1befbc5d-2aa6-4d24-8c32-e635cf55e339).html.

Abstract:
Objectives: Cardiovascular disease is the leading cause of mortality and morbidity in the developed world. Surgery can improve prognosis and relieve symptoms. Risk prediction models are increasingly being used to inform clinicians and patients about the risks of surgery, to facilitate clinical decision making and for the risk-adjustment of surgical outcome data. The importance of risk prediction models in cardiovascular surgery has been highlighted by the publication of cardiovascular surgery outcome data and the need for risk-adjustment. The overall objective of this thesis is to advance risk prediction modelling in cardiovascular surgery with a focus on the development of models for elective AAA repair and assessment of models for cardiac surgery. Methods: Three large clinical databases (two elective AAA repair and one cardiac surgery) were utilised. Each database was cleaned prior to analysis. Logistic regression was used to develop both regional and national risk prediction models for mortality following elective AAA repair. A regional model to identify the risk of developing renal failure following elective AAA repair was also developed. The performance of a widely used cardiac surgery risk prediction model (the logistic EuroSCORE) over time was evaluated using a national cardiac database. In addition an updated model version (EuroSCORE II) was validated and both models’ performance in emergency cardiac surgery was evaluated. Results: Regional risk models for mortality following elective AAA repair (VGNW model) and a model to predict post-operative renal failure were developed. Validation of the model for mortality using a national dataset demonstrated good performance compared to other available risk models. To improve generalisability a national model (the BAR score) with better discriminatory ability was developed. In a prospective validation of both models using regional data, the BAR score demonstrated excellent discrimination overall and good discrimination in procedural sub-groups. The EuroSCORE was found to have lost calibration over time due to a fall in observed mortality despite an increase in the predicted mortality of patients undergoing cardiac surgery. The EuroSCORE II demonstrated good performance for contemporary cardiac surgery. Both EuroSCORE models demonstrated inadequate performance for emergency cardiac surgery. Conclusions: Risk prediction models play an important role in cardiovascular surgery. Two accurate risk prediction models for mortality following elective AAA repair have been developed and can be used to risk-adjust surgical outcomes and facilitate clinical decision making. As surgical practice changes over time risk prediction models may lose accuracy which has implications for their application. Cardiac risk models may not be sufficiently accurate for high-risk patient groups such as those undergoing emergency surgery and specific emergency models may be required. Continuing research into new risk factors and model outcomes is needed and risk prediction models may play an increasing role in clinical decision making in the future.
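As a rough, generic illustration of the workflow described above (not the thesis's registries or models), the sketch below fits a logistic-regression risk model on simulated data, then reports discrimination as the area under the ROC curve and a simple calibration check of observed against mean predicted risk. All variables and coefficients are invented.

```python
# Generic risk-model sketch: develop on one split, check discrimination and
# calibration on another. Simulated stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 5000
age = rng.normal(70, 8, n)
creatinine = rng.normal(100, 25, n)
emergency = rng.integers(0, 2, n)
lin = -9.0 + 0.08 * age + 0.01 * creatinine + 1.2 * emergency
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))            # simulated mortality

X = np.column_stack([age, creatinine, emergency])
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
risk = model.predict_proba(X_val)[:, 1]

print("discrimination (AUC):", round(roc_auc_score(y_val, risk), 3))
print("calibration: observed rate", round(y_val.mean(), 3),
      "vs mean predicted risk", round(risk.mean(), 3))
```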
9

Jones, Margaret. "Point prediction in survival time models." Thesis, University of Newcastle Upon Tyne, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340616.

10

Monsch, Matthieu (Matthieu Frederic). "Large scale prediction models and algorithms." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84398.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Operations Research Center, 2013.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 129-132).
Over 90% of the data available across the world has been produced over the last two years, and the trend is increasing. It has therefore become paramount to develop algorithms which are able to scale to very high dimensions. In this thesis we are interested in showing how we can use structural properties of a given problem to come up with models applicable in practice, while keeping most of the value of a large data set. Our first application provides a provably near-optimal pricing strategy under large-scale competition, and our second focuses on capturing the interactions between extreme weather and damage to the power grid from large historical logs. The first part of this thesis is focused on modeling competition in Revenue Management (RM) problems. RM is used extensively across a swathe of industries, ranging from airlines to the hospitality industry to retail, and the internet has, by reducing search costs for customers, potentially added a new challenge to the design and practice of RM strategies: accounting for competition. This work considers a novel approach to dynamic pricing in the face of competition that is intuitive, tractable and leads to asymptotically optimal equilibria. We also provide empirical support for the notion of equilibrium we posit. The second part of this thesis was done in collaboration with a utility company in the North East of the United States. In recent years, there has been a number of powerful storms that led to extensive power outages. We provide a unified framework to help power companies reduce the duration of such outages. We first train a data driven model to predict the extent and location of damage from weather forecasts. This information is then used in a robust optimization model to optimally dispatch repair crews ahead of time. Finally, we build an algorithm that uses incoming customer calls to compute the likelihood of damage at any point in the electrical network.
by Matthieu Monsch.
Ph.D.
11

Wanigasekara, Prashan. "Latent state space models for prediction." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106269.

Abstract:
Thesis: S.M. in Engineering and Management, Massachusetts Institute of Technology, School of Engineering, System Design and Management Program, Engineering and Management Program, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 95-98).
In this thesis, I explore a novel algorithm to model the joint behavior of multiple correlated signals. Our chosen example is the ECG (electrocardiogram) and ABP (arterial blood pressure) signals from patients in the ICU (intensive care unit). I then use the generated models to predict blood pressure levels of ICU patients based on their historical ECG and ABP signals. The algorithm used is a variant of a hidden Markov model, and the new extension is termed the Latent State Space Copula Model. In this novel model, the ECG and ABP signals are considered to be correlated and are modeled using a bivariate Gaussian copula with Weibull marginals generated by a hidden state. We assume that there are hidden patient "states" that transition from one hidden state to another, driving the joint ECG-ABP behavior. We estimate the parameters of the model using a novel Gibbs sampling approach. Using this model, we generate predictors that are the state probabilities at any given time step and use them to predict a patient's future health condition. The predictions made by the model are binary and detect whether the mean arterial pressure (MAP) is going to be above or below a certain threshold at a future time step. Towards the end of the thesis, I compare the new Latent State Space Copula Model with a state-of-the-art classical discrete HMM. The Latent State Space Copula Model achieves an area under the ROC curve (AUROC) of .7917 for 5 states, while the classical discrete HMM achieves an AUROC of .7609 for 5 states.
by Prashan Wanigasekara.
S.M. in Engineering and Management
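A small sketch of the emission mechanism named in the abstract, a bivariate Gaussian copula with Weibull marginals, is given below: it shows how a single hidden state could generate correlated ECG-like and ABP-like values. All parameter values are invented, and the rest of the model (state transitions, Gibbs sampling) is not reproduced.

```python
# One hidden state's emissions: correlated standard normals -> uniforms via the
# normal CDF -> Weibull marginals via the inverse CDF. Parameters are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rho = 0.6                                    # copula correlation for this state
cov = np.array([[1.0, rho], [rho, 1.0]])

z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)
u = stats.norm.cdf(z)                        # uniform marginals, still correlated
ecg_like = stats.weibull_min.ppf(u[:, 0], c=1.5, scale=1.0)
abp_like = stats.weibull_min.ppf(u[:, 1], c=2.0, scale=80.0)

# Marginals are Weibull, but the two signals stay dependent through the copula.
print("rank correlation:", round(float(stats.spearmanr(ecg_like, abp_like)[0]), 2))
```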
12

Smith, Christopher P. "Surrogate models for aerodynamic performance prediction." Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/808464/.

Abstract:
Automatic optimisers can play a vital role in the design and development of engineering systems and processes. However, a lack of available data to guide the search can result in the global optimum solution never being found. Surrogate models can be used to address this lack of data, allowing more of the design space to be explored as well as providing an overall computational saving. In this thesis I have developed two novel long-term prediction methods that investigate the use of ensembles of surrogates to predict aerodynamic data. The models are built using intermediate computational fluid dynamics convergence data. The first method relies on a gradient-based learning algorithm to optimise the base learners, and the second utilises a hybrid multi-objective evolutionary algorithm. Different selection schemes are investigated to improve the prediction performance, and the accuracy of the ensembles is compared to the converged data as well as to the delta change between flow conditions. Three challenging real-world aerodynamic data sets have been used to test the developed algorithms, and insights into aerodynamic performance have been gained through analysis of the computational fluid dynamics convergence histories. The trends of the design space can be maintained while achieving suitable overall prediction accuracy. Selecting a subset improves ensemble performance, but no selection method is superior to the others. The hybrid multi-objective evolutionary algorithm approach is also tested on two standard time-series prediction tasks, and the results presented are competitive with others reported in the literature. In addition, a novel technique that improves a parameter-based surrogate's learning through the transfer of additional information is investigated to address the lack of data. Transfer learning has an initial impact on the learning rate of the surrogate, but negative transfer is observed with increasing numbers of epochs. Using the data available for the low-dimensional problems, it is shown that the convergence prediction results are comparable to those from the parameter-based surrogate. The convergence prediction method could therefore be used as a surrogate and form part of an aerodynamic optimisation task, although open questions remain, including how best to use the surrogate during the search.
13

Nsolo, Edward. "Prediction models for soccer sports analytics." Thesis, Linköpings universitet, Databas och informationsteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-149033.

Abstract:
In recent times there has been a substantial increase in research interest in soccer due to the increased availability of soccer statistics data. With the help of data provider firms, access to historical soccer data has become simpler, and as a result data scientists have started researching the field. In this thesis, we develop prediction models that could be applied by data scientists and other soccer stakeholders. As a case study, we run several machine learning algorithms on historical data from five major European leagues and compare them. The study is built upon the idea of investigating different approaches that could be used to simplify the models while maintaining their correctness and robustness. Such approaches include feature selection and the conversion of regression prediction problems into binary classification problems. Furthermore, a literature review did not reveal research attempts on generalising binary classification predictions by applying target class upper boundaries other than the 50% frequency binning. Thus, this thesis investigated the effects of such a generalisation on the simplicity and performance of the models; we aimed to extend the traditional discretisation of classes with the equal-frequency binning function, which is the standard for converting regression problems into binary classification in many applications. Furthermore, we sought to establish the important player features in individual leagues that could help team managers adopt cost-efficient transfer strategies. Those features were selected successfully by applying wrapper and filter algorithms. Both methods turned out to be useful, as the time taken to build the models was minimal and the models were able to make good predictions. We also noticed that different features matter for different leagues, and such considerations should be kept in mind when assessing the performance of players. Different machine learning algorithms were found to behave differently under different conditions; however, Naïve Bayes was determined to be the best fit in most cases. Moreover, the results suggest that it is possible to generalise binary classification problems and maintain performance to a reasonable extent, but it should be observed that the early stages of generalising binary classification models involve the tedious work of training datasets, a trade-off to consider when thinking of using this approach.
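The two ingredients highlighted above, a class boundary other than the 50% split and filter-based feature selection ahead of Naïve Bayes, can be sketched as follows on synthetic "player statistics". The features, the 25% boundary, and every other setting are illustrative assumptions rather than the thesis's data or tuning.

```python
# Binarise a continuous performance target at a chosen percentile, keep the k most
# informative features with a filter method, and fit Gaussian Naive Bayes.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 8))                      # e.g. passes, shots, tackles, ...
rating = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(scale=0.5, size=n)

top_share = 0.25                                 # label the top 25% rather than the top 50%
y = (rating >= np.quantile(rating, 1.0 - top_share)).astype(int)

model = make_pipeline(SelectKBest(f_classif, k=3), GaussianNB())
print("5-fold AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").round(3))
```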
14

Stone, Peter H. "Climate Prediction: The Limits of Ocean Models." MIT Joint Program on the Science and Policy of Global Change, 2004. http://hdl.handle.net/1721.1/4056.

Abstract:
We identify three major areas of ignorance which limit predictability in current ocean GCMs. One is the very crude representation of subgrid-scale mixing processes. These processes are parameterized with coefficients whose values and variations in space and time are poorly known. A second problem derives from the fact that ocean models generally contain multiple equilibria and bifurcations, but there is no agreement as to where the current ocean sits with respect to the bifurcations. A third problem arises from the fact that ocean circulations are highly nonlinear, but only weakly dissipative, and therefore are potentially chaotic. The few studies that have looked at this kind of behavior have not answered fundamental questions, such as what are the major sources of error growth in model projections, and how large is the chaotic behavior relative to realistic changes in climate forcings. Advances in computers will help alleviate some of these problems, for example by making it more practical to explore to what extent the evolution of the oceans is chaotic. However models will have to rely on parameterizations of key small-scale processes such as diapycnal mixing for a long time. To make more immediate progress here requires the development of physically based prognostic parameterizations and coupling the mixing to its energy sources. Another possibly fruitful area of investigation is the use of paleoclimate data on changes in the ocean circulation to constrain more tightly the stability characteristics of the ocean circulation.
Abstract in HTML and technical report in PDF available on the Massachusetts Institute of Technology Joint Program on the Science and Policy of Global Change website (http://mit.edu/globalchange/www/).
15

Faull, Nicholas Eric. "Ensemble climate prediction with coupled climate models." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442944.

16

Pasiouras, Fotios. "Development of bank acquisition targets prediction models." Thesis, Coventry University, 2005. http://curve.coventry.ac.uk/open/items/ecf1b00d-da92-9bd2-5b02-fa4fab8afb0c/1.

Abstract:
This thesis develops a range of prediction models for the purpose of predicting the acquisition of commercial banks in the European Union using publicly available data. Over the last thirty years, there have been approximately 30 studies that have attempted to identify potential acquisition targets, all of them focusing on non-bank sectors. We consider that prediction models developed specifically for the banking industry are essential due to the unusual structure of banks' financial statements, differences in the environment in which banks operate and other specific characteristics of banks that in general distinguish them from non-financial firms. We focus specifically on the EU banking sector, where M&As activity has been considerable in recent years, yet academic research relating to the EU has been rather limited compared to the case of the US. The methodology for developing prediction models involved identifying past cases of acquired banks and combining these with non-acquired banks in order to evaluate the prediction accuracy of various quantitative classification techniques. In this study, we construct a base sample of commercial banks covering 15 EU countries, and financial variables measuring capital strength, profit and cost efficiency, liquidity, growth, size and market power, with data in both raw and country-adjusted (i.e. raw variables divided by the average of the banking sector for the corresponding country) form. In order to allow for a proper comparative evaluation of classification methods, we select common subsets of the base sample and variables with high discriminatory power, dividing the sample period (1998-2002) into training sub-sample for model development (1998-2000), and holdout sub-sample for model evaluation (2001-2002). Although the results tend to support the findings of studies on non-financial firms, highlighting the difficulties in predicting acquisition targets, the prediction models we develop show classification accuracies generally higher than chance assignment based on prior probabilities. We also consider the use of equal and unequal matched holdout samples for evaluation, and find that overall classification accuracy tends to increase in the unequal matched samples, implying that equal matched samples do not necessarily overstate the prediction ability of models. The main goal of this study has been to compare and evaluate a variety of classification methods including statistical, econometric, machine learning and operational research techniques, as well as integrated techniques combining the predictions of individual classification methods. We found that some methods achieved very high accuracies in classifying non-acquired banks, but at the cost of relatively poor accuracy performance in classifying acquired banks. This suggests a trade-off in achieving high classification accuracy, although some methods (e.g. Discriminant) performed reasonably well in terms of achieving balanced overall classification accuracies of above chance predictions. Integrated prediction models offer the advantage of counterbalancing relatively poor performance of some classification methods with good performance of others, but in doing so could not out-perform all individual classification methods considered. In general, we found that the outcome of which method performed best depended largely on the group classification accuracy considered, as well as to some extent on the choice of the discriminatory variables. 
Concerning the use of raw or country-adjusted data, we found no clear effect on the prediction ability of the classification methods.
17

Gray, Eoin. "Validating and updating lung cancer prediction models." Thesis, University of Sheffield, 2018. http://etheses.whiterose.ac.uk/19206/.

Abstract:
Lung cancer is a global disease that affects millions of individuals worldwide. Additionally, the disease is beset with a poor 5-year survival rate, a direct consequence of a low rate of early-stage diagnosis. In an attempt to improve lung cancer prognosis, individuals at high risk of developing lung cancer should be identified for periodic screening. Prediction models are devised to predict an individual's risk of developing a disease over a specified time period. These can be used to identify high-risk individuals and be made publicly available to allow individuals to be conscious of their own risk. While prediction models have multiple uses, it is imperative that the models demonstrate a consistently good standard of performance when reviewed. The project conducted a systematic review analysing previously published lung cancer prediction models. The review identified that reporting of the existing models had been inadequate and that, where these models had been validated, this had not been consistent across different publications. As a consequence, the models have not been consistently considered as a selective screening tool. The project then validated the prediction models using datasets from the International Lung Cancer Consortium. The validation identified the leading models, which will allow a more targeted focus on these models in future research and could culminate in a model being implemented as a clinical utility. The final stage reviewed methods to update a single prediction model or aggregate multiple prediction models into a meta-model. A literature review identified and evaluated the different methods, discussing how different methods can be successful in different scenarios. The methods were also reviewed for their suitability for updating selected lung cancer prediction models, and appropriate methods were identified. These were then applied to create updated lung cancer models, which were validated to assess which methods were successful at improving the performance and robustness of lung cancer prediction models. As lung cancer research develops, particularly the research into genetic markers that may explain lung cancer risk, these factors could be incorporated into already successful prediction models using the appropriate model-updating methods identified in this research.
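One standard model-updating method from this literature, logistic recalibration of an existing model's linear predictor, can be sketched as follows; the "published" coefficients and the validation data are invented stand-ins, and the thesis's own selection of updating methods is not reproduced.

```python
# Logistic recalibration: keep the existing model's structure, refit only an
# intercept and a slope on its linear predictor using new data. Invented example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 3000
age = rng.normal(62, 8, n)
pack_years = rng.gamma(shape=2.0, scale=15.0, size=n)

# Pretend these are the published coefficients of an existing risk model.
old_intercept, b_age, b_py = -8.0, 0.07, 0.02
lin_pred = old_intercept + b_age * age + b_py * pack_years

# New population with a lower baseline risk than the development population.
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(lin_pred - 0.7))))

recal = LogisticRegression().fit(lin_pred.reshape(-1, 1), y)
slope, delta = recal.coef_[0, 0], recal.intercept_[0]
print(f"calibration slope = {slope:.2f}, recalibrated intercept = {delta:.2f}")

# Updated risks keep the old predictors but use the recalibrated intercept and slope.
updated_risk = 1.0 / (1.0 + np.exp(-(delta + slope * lin_pred)))
```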
18

Pencina, Karol M. "Quantification of improvement in risk prediction models." Thesis, Boston University, 2012. https://hdl.handle.net/2144/32045.

Abstract:
Thesis (Ph.D.)--Boston University, 2012.
The identification of new factors in modeling the probability of a binary outcome is the main challenge for statisticians and clinicians who want to improve risk prediction. This motivates researchers to search for measures that quantify the performance of new markers. There are many commonly used measures that assess the performance of a binary outcome model: logistic R-squares, the discrimination slope, the area under the ROC (receiver operating characteristic) curve (AUC) and the Hosmer-Lemeshow goodness-of-fit chi-square. However, metrics that work well for model assessment may not be as good for quantifying the usefulness of new risk factors, especially when we add a new predictor to a well-performing baseline model. The recently proposed measures of improvement, the Integrated Discrimination Improvement (IDI), defined as the difference between discrimination slopes, and the Net Reclassification Improvement (NRI), directly address the question of model performance and take it beyond the simple statistical significance of a new risk factor. Since these two measures are new and have not been studied as extensively as the older ones, a question of their interpretation naturally arises. In our research we propose meaningful interpretations of the new measures as well as extensions that address some of their potential shortcomings. Following the derivation of the maximum R-squared by Nagelkerke, we show how the IDI, which depends on the event rate, can be rescaled by its hypothetical maximum value to reduce this dependence. Furthermore, the IDI metric assumes a uniform distribution for all risk cut-offs. The application of clinically important thresholds prompted us to derive a metric that includes a prior distribution for the cut-off points and assigns different weights to sensitivity and specificity. Similarly, we propose the maximum and rescaled NRI. The latter is based on counting the number of categories by which the risk of a given person moved and guarantees that reclassification tables with equal marginal probabilities will lead to a zero NRI. All developments are investigated employing numerical simulations under the assumption of normality and varying effect sizes of the associations. We also illustrate the proposed concepts using examples from the Framingham Heart Study.
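The two improvement measures discussed above have simple sample formulas: the IDI is the change in discrimination slope between the baseline and extended models, and the NRI nets the proportions of events and non-events whose predicted risks move in the appropriate direction. The sketch below implements these standard definitions (using the category-free NRI variant) on invented predicted risks; the rescaled and maximum versions proposed in the thesis are not shown.

```python
# IDI and category-free NRI from baseline and extended predicted risks. Invented data.
import numpy as np

def idi(p_old, p_new, y):
    """Integrated Discrimination Improvement: change in discrimination slope."""
    y = np.asarray(y, dtype=bool)
    return (p_new[y].mean() - p_new[~y].mean()) - (p_old[y].mean() - p_old[~y].mean())

def nri_category_free(p_old, p_new, y):
    """Continuous NRI: net proportion of correct risk movements in each group."""
    y = np.asarray(y, dtype=bool)
    up, down = p_new > p_old, p_new < p_old
    return (up[y].mean() - down[y].mean()) + (down[~y].mean() - up[~y].mean())

rng = np.random.default_rng(6)
y = rng.binomial(1, 0.2, 500)
p_old = np.clip(0.20 + 0.10 * (y - 0.2) + rng.normal(0, 0.10, 500), 0.01, 0.99)
p_new = np.clip(p_old + 0.05 * (2 * y - 1) + rng.normal(0, 0.02, 500), 0.01, 0.99)

print("IDI:", round(idi(p_old, p_new, y), 3))
print("NRI:", round(nri_category_free(p_old, p_new, y), 3))
```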
19

Wang, Yangzhengxuan. "Corporate default prediction : models, drivers and measurements." Thesis, University of Exeter, 2011. http://hdl.handle.net/10036/3457.

Abstract:
This thesis identifies the optimal set of corporate default drivers and examines the prediction performance of corporate default measurement tools, using a sample of companies in the United States from 1970 to 2009. In the discussion of optimal default drivers, feature selection techniques including the t-test and stepwise methods are used to filter relevant default information collected from previous empirical studies. The optimal default driver information set consists of quantitative parameters from accounting ratios, market indices, macroeconomic indicators, default history, and firm age. While both accounting ratios and market information dominate the explanatory ability, followed by default history, macroeconomic indicators contribute additional explanation for default risk. Moreover, industry effects show significance across alternative models, with the retail industry presenting as the sector with highest risk. The results are robust in both traditional and advanced random models. In investigating the optimal prediction method, two newly developed random models, mixed logit and frailty model, are tested for their theoretical superiority in capturing default clusters and unobservable information for default risk. The prediction ability of both models has been improved upon using the extended optimal set of default drivers. While the mixed logit model provides better prediction accuracy and shows stability in robustness checks, the frailty model benefits from computational efficiency and explains default clusters more thoroughly. This thesis further compares the prediction performance of large dimensional models across five categories based on the default probabilities transferred from alternative results in different models. Besides the traditional assessment criteria - covering the receiver operating characteristic curve, accuracy ratios, and classification error rates – this thesis thoroughly evaluates forecasting performance using innovative proxies including model stability under financial crisis, profitability and misclassification costs for creditors using alternative risk measurements. The practical superiority of the two advanced random models has been verified further in the comparative study.
20

Mateus, Ana Teresa Moreirinha Vila Fernandes. "Quality management in laboratories- Effciency prediction models." Doctoral thesis, Universidade de Évora, 2021. http://hdl.handle.net/10174/29338.

Abstract:
In recent years, the adoption of quality tools by laboratories has increased significantly. This has contributed to growing competitiveness, requiring a new organizational posture to adapt to new challenges. In order to obtain competitive advantages in their respective sectors of activity, laboratories have increasingly invested in innovation. In this context, the main objective of this study is to develop efficiency models for laboratories using tools from the scientific area of Artificial Intelligence. Throughout this work, different studies are presented, carried out in water analysis laboratories, stem cell cryopreservation laboratories and dialysis care clinics, in which innovative solutions and better resource control were sought without compromising quality and while promoting greater sustainability. This work can be seen as a research opportunity that can be applied not only in laboratories and clinics, but also in organizations from different sectors, in order to define prediction models that allow the anticipation of future scenarios and the evaluation of ways of acting. The results show the feasibility of applying the models and that the normative references applied to laboratories and clinics can serve as a basis for structuring the systems.
21

Cordeiro, Silvio Ricardo. "Distributional models of multiword expression compositionality prediction." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0501/document.

Abstract:
Natural language processing systems often rely on the idea that language is compositional, that is, the meaning of a linguistic entity can be inferred from the meaning of its parts. This expectation fails in the case of multiword expressions (MWEs). For example, a person who is a "sitting duck" is neither a duck nor necessarily sitting. Modern computational techniques for inferring word meaning based on the distribution of words in the text have been quite successful at multiple tasks, especially since the rise of word embedding approaches. However, the representation of MWEs still remains an open problem in the field. In particular, it is unclear how one could predict from corpora whether a given MWE should be treated as an indivisible unit (e.g. "nut case") or as some combination of the meaning of its parts (e.g. "engine room"). This thesis proposes a framework of MWE compositionality prediction based on representations of distributional semantics, which we instantiate under a variety of parameters. We present a thorough evaluation of the impact of these parameters on three new datasets of MWE compositionality, encompassing English, French and Portuguese MWEs. Finally, we present an extrinsic evaluation of the predicted levels of MWE compositionality on the task of MWE identification. Our results suggest that the proper choice of distributional model and corpus parameters can produce compositionality predictions that are comparable to the state of the art
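A minimal sketch of the kind of compositionality score such a framework builds on is given below: the distributional vector of the whole expression is compared, via cosine similarity, with a composed vector of its parts (here, their average). The toy vectors are invented; actual models would use corpus-trained embeddings and the parameter choices studied in the thesis.

```python
# Compositionality as cosine similarity between an MWE vector and the average of
# its component vectors. Toy 4-dimensional "embeddings", invented for illustration.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compositionality(mwe_vec, part_vecs):
    """Higher values suggest the MWE means roughly what its parts suggest."""
    return cosine(mwe_vec, np.mean(part_vecs, axis=0))

vec = {
    "engine":      np.array([0.9, 0.1, 0.0, 0.2]),
    "room":        np.array([0.1, 0.9, 0.1, 0.0]),
    "engine_room": np.array([0.6, 0.6, 0.1, 0.1]),   # fairly compositional
    "nut":         np.array([0.0, 0.1, 0.9, 0.1]),
    "case":        np.array([0.1, 0.0, 0.1, 0.9]),
    "nut_case":    np.array([0.7, 0.1, 0.0, 0.6]),   # idiomatic: far from its parts
}

print("engine room:", round(compositionality(vec["engine_room"], [vec["engine"], vec["room"]]), 2))
print("nut case:   ", round(compositionality(vec["nut_case"], [vec["nut"], vec["case"]]), 2))
```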
22

Cordeiro, Silvio Ricardo. "Distributional models of multiword expression compositionality prediction." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/174519.

Abstract:
Natural language processing systems often rely on the idea that language is compositional, that is, the meaning of a linguistic entity can be inferred from the meaning of its parts. This expectation fails in the case of multiword expressions (MWEs). For example, a person who is a sitting duck is neither a duck nor necessarily sitting. Modern computational techniques for inferring word meaning based on the distribution of words in the text have been quite successful at multiple tasks, especially since the rise of word embedding approaches. However, the representation of MWEs still remains an open problem in the field. In particular, it is unclear how one could predict from corpora whether a given MWE should be treated as an indivisible unit (e.g. nut case) or as some combination of the meaning of its parts (e.g. engine room). This thesis proposes a framework of MWE compositionality prediction based on representations of distributional semantics, which we instantiate under a variety of parameters. We present a thorough evaluation of the impact of these parameters on three new datasets of MWE compositionality, encompassing English, French and Portuguese MWEs. Finally, we present an extrinsic evaluation of the predicted levels of MWE compositionality on the task of MWE identification. Our results suggest that the proper choice of distributional model and corpus parameters can produce compositionality predictions that are comparable to the state of the art.
23

Gao, Sheng. "Latent factor models for link prediction problems." Paris 6, 2012. http://www.theses.fr/2012PA066056.

Abstract:
With the rising of Internet as well as modern social media, relational data has become ubiquitous, which consists of those kinds of data where the objects are linked to each other with various relation types. Accordingly, various relational learning techniques have been studied in a large variety of applications with relational data, such as recommender systems, social network analysis, Web mining or bioinformatic. Among a wide range of tasks encompassed by relational learning, we address the problem of link prediction in this thesis. Link prediction has arisen as a fundamental task in relational learning, which considers to predict the presence or absence of links between objects in the relational data based on the topological structure of the network and/or the attributes of objects. However, the complexity and sparsity of network structure make this a great challenging problem. In this thesis, we propose solutions to reduce the difficulties in learning and fit various models into corresponding applications. Basically, in Chapter 3 we present a unified framework of latent factor models to address the generic link prediction problem, in which we specifically discuss various configurations in the models from computational perspective and probabilistic view. Then, according to the applications addressed in this dissertation, we propose different latentfactor models for two classes of link prediction problems: (i) structural link prediction. (ii) temporal link prediction. In terms of structural link prediction problem, in Chapter 4 we define a new task called Link Pattern Prediction (LPP) in multi-relational networks. By introducing a specific latent factor for different relation types in addition to using latent feature factors to characterize objects, we develop a computational tensor factorization model, and the probabilistic version with its Bayesian treatment to reveal the intrinsic causality of interaction patterns in multi-relational networks. Moreover, considering the complex structural patterns in relational data, in Chapter 5 we propose a novel model that simultaneously incorporates the effect of latent feature factors and the impact from the latent cluster structures in the network, and also develop an optimization transfer algorithm to facilitate the model learning procedure. In terms of temporal link prediction problem in time-evolving networks, in Chapter 6 we propose a unified latent factor model which integrates multiple information sources in the network, including the global network structure, the content of objects and the graph proximity information from the network to capture the time-evolving patterns of links. This joint model is constructed based on matrix factorization and graph regularization technique. Each model proposed in this thesis achieves state-of-the-art performances, extensive experiments are conducted on real world datasets to demonstrate their significant improvements over baseline methods. Almost all of themhave been published in international or national peer-reviewed conference proceedings
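As an illustration of the latent factor idea underlying these link prediction models, the sketch below fits a plain low-rank factorization of an adjacency matrix by stochastic gradient descent and scores an unobserved pair by an inner product of latent vectors. It is a minimal stand-in, not the LPP tensor model, the block-structure model or the temporal model described above; the toy network, rank and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def factorize_adjacency(A, observed, rank=4, lr=0.05, reg=0.05, epochs=300):
    """Plain latent-factor link prediction: fit A[i, j] ~ u_i . v_j on the
    observed pairs by stochastic gradient descent with L2 regularisation."""
    n = A.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        for i, j in observed:
            ui = U[i].copy()
            err = A[i, j] - ui @ V[j]
            U[i] += lr * (err * V[j] - reg * ui)
            V[j] += lr * (err * ui - reg * V[j])
    return U, V

# toy 6-node network made of two triangles; the pair (0, 3) is held out
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
observed = [(i, j) for i in range(6) for j in range(6)
            if i != j and {i, j} != {0, 3}]
U, V = factorize_adjacency(A, observed)
print("score for held-out pair (0, 3):       ", round(float(U[0] @ V[3]), 3))
print("score for within-triangle pair (0, 1):", round(float(U[0] @ V[1]), 3))
```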
APA, Harvard, Vancouver, ISO, and other styles
24

Zhang, Ruomeng. "Evaluation of Current Concrete Creep Prediction Models." University of Toledo / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1461963600.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Rantalainen, Mattias John. "Multivariate prediction models for bio-analytical data." Thesis, Imperial College London, 2008. http://hdl.handle.net/10044/1/1393.

Full text
Abstract:
Quantitative bio-analytical techniques that enable parallel measurements of large numbers of biomolecules generate vast amounts of information for studying and characterising biological systems. These analytical methods are commonly referred to as omics technologies, and can be applied for measurements of e.g. mRNA transcript, protein or metabolite abundances in a biological sample. The work presented in this thesis focuses on the application of multivariate prediction models for modelling and analysis of biological data generated by omics technologies. Omics data commonly contain up to tens of thousands of variables, which are often both noisy and multicollinear. Multivariate statistical methods have previously been shown to be valuable for visualisation and predictive modelling of biological and chemical data with similar properties to omics data. In this thesis, currently available multivariate modelling methods are used in new applications, and new methods are developed to address some of the specific challenges associated with modelling of biological data. Three closely related areas of multivariate modelling of biological data are described and demonstrated in this thesis. First, a multivariate projection method is used in a novel application for predictive modelling between omics data sets, demonstrating how data from two analytical sources can be integrated and modelled together by exploring covariation patterns between the data sets. This approach is exemplified by modelling of data from two studies, the first containing proteomic and metabolic profiling data and the second containing transcriptomic and metabolic profiling data. Second, a method for piecewise multivariate modelling of short time-series data is developed and demonstrated by modelling of simulated data as well as metabolic profiling data from a toxicity study, providing a new method for characterisation of multivariate bio-analytical time-series data. Third, a kernel-based method is developed and applied for non-linear multivariate prediction modelling of omics data, addressing the specific challenge of modelling non-linear variation in biological data.
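A minimal sketch of predictive modelling between two data blocks is given below, using scikit-learn's PLSRegression on simulated stand-ins for two omics data sets. The projection method, number of components and data are placeholders for illustration only and are not taken from the thesis.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# simulated stand-ins for two omics blocks: X (e.g. transcript profiles)
# predicting Y (e.g. metabolite profiles); both noisy and collinear
n, p, q, k = 100, 200, 30, 3
scores = rng.standard_normal((n, k))                  # shared latent variation
X = scores @ rng.standard_normal((k, p)) + 0.5 * rng.standard_normal((n, p))
Y = scores @ rng.standard_normal((k, q)) + 0.5 * rng.standard_normal((n, q))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

pls = PLSRegression(n_components=k)   # the number of latent components is a tuning choice
pls.fit(X_tr, Y_tr)
print("held-out R^2:", pls.score(X_te, Y_te))
```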
APA, Harvard, Vancouver, ISO, and other styles
26

Schwiegerling, James Theodore. "Visual performance prediction using schematic eye models." Diss., The University of Arizona, 1995. http://hdl.handle.net/10150/187327.

Full text
Abstract:
The goal of visual modeling is to predict the visual performance, or a change in performance, of an individual from a model of the human visual system. In designing a model of the human visual system, two distinct functions are considered. The first is the production of an image incident on the retina by the optical system of the eye, and the second is the conversion of this image into a perceived image by the retina and brain. The eye optics are evaluated using raytracing techniques familiar to the optical engineer. The effects of retinal and brain function are combined with the raytracing results by analyzing the modulation of the retinal image. Each of these processes is important for evaluating the performance of the entire visual system. Techniques for converting the abstract system performance measures used by optical engineers into clinically applicable measures such as visual acuity and contrast sensitivity are developed in this dissertation. Furthermore, a methodology for applying videokeratoscopic height data to the visual model is outlined. These tools are useful in modeling the visual effects of corrective lenses, ocular maladies and refractive surgeries. The modeling techniques are applied to examples of soft contact lenses, keratoconus, radial keratotomy, photorefractive keratectomy and automated lamellar keratoplasty. The modeling tools developed in this dissertation are meant to be general and modular. As improvements to the measurements of the properties and functionality of the various visual components are made, the new information can be incorporated into the visual system model. Furthermore, the examples discussed here represent only a small subset of the applications of the visual model. Additional ocular maladies and emerging refractive surgeries can be modeled as well.
APA, Harvard, Vancouver, ISO, and other styles
27

Rossi, A. "PREDICTIVE MODELS IN SPORT SCIENCE: MULTI-DIMENSIONAL ANALYSIS OF FOOTBALL TRAINING AND INJURY PREDICTION." Doctoral thesis, Università degli Studi di Milano, 2017. http://hdl.handle.net/2434/495229.

Full text
Abstract:
Because team sports such as football have a complex, multidirectional and intermittent nature, accurate planning of the training workload is needed in order to maximise the athletes' performance during matches and reduce their risk of injury. Although the evaluation of external workloads during trainings and matches has become easier and easier thanks to the advent of tracking technologies such as the Global Positioning System (GPS), planning the training workloads that yield the highest performance during matches and a lower risk of injury during sport stimuli is still a very difficult challenge for sport scientists, athletic trainers and coaches. The application of machine learning approaches to sport science aims to address this crucial issue. Combining data with sport scientists' domain expertise could therefore maximise the information that can be obtained from football training and match analysis. Thus, the aim of this thesis is to provide examples of the application of machine learning approaches in sport science. In particular, two studies are provided, with the aims of detecting a pattern during in-season football training weeks and of predicting injuries. For these studies, 23 elite football players were monitored in eighty in-season trainings by using a portable non-differential 10 Hz global positioning system (GPS) integrated with a 100 Hz 3-D accelerometer, a 3-D gyroscope, and a 3-D digital compass (STATSports Viper, Northern Ireland). Information about non-traumatic injuries was also recorded by the club's medical staff. In order to detect a pattern during the in-season training weeks and the injuries, an Extra Trees Random Forest Classifier (ETRFC) and a Decision Tree (DT) Classifier were computed, respectively. In the first study it was found that in-season football trainings follow a sinusoidal model (i.e. a zig-zag shape found in the autocorrelation analysis) because their periodization is characterized by repeated short-term cycles which consist of two parts: the first one (i.e. trainings long before the match) is constituted by high training loads, and the second one (i.e. trainings close to the match) by low ones. This short-term structure appears to be a strategy useful both to facilitate the decay of accumulated fatigue from high training loads performed at the beginning of the cycle and to promote readiness for the following performance. Indeed, a pattern was detected through the in-season football training weeks by the ETRFC. This machine learning process can accurately define the training loads to be performed on each training day to maintain higher performance throughout the season. Moreover, it was found that the most important features able to discriminate short-term training days are the distance covered above 20 W·kg-1, the number of accelerations above 2 m·s-2, the total distance, and the distance covered above 25.5 W·kg-1 and below 19.8 km·h-1. Thus, in accordance with the results found in this study, athletic trainers and coaches may use machine learning processes to define training loads with the aim of obtaining the best performance during all the season's matches. A player's training-load discrepancy, in comparison with the loads defined by athletic trainers and coaches as the best ones to obtain enhancement in match performance, might be considered an index of an individual physical issue, which could induce injuries.
Indeed, in the second study presented in this thesis, it was found that it is possible to correctly predict 60.9% of the injuries by using the rules defined by the DT classifier, assessing training loads in a predictive window of 6 days. In particular, it was found that the number of injuries that the player has suffered during the season, the total number of accelerations above 2 m·s-2 and 3 m·s-2, and the distance in metres covered when the Metabolic Power (energy consumption per kilogramme per second) is above the value of 25.5 W/kg per minute are the most important features able to predict injuries. Moreover, the football team analysed in this thesis should keep the discrepancy of these features under control when players return to regular training, because of the numerous relapses into injury that have been recorded. Thus, this machine learning approach enables football teams to identify when their players should pay more attention during both trainings and matches in order to reduce the injury risk, while improving team strategy. In conclusion, machine learning processes could help athletic trainers and coaches with the coaching process. In particular, they could define which training loads are likely to enhance sport performance and could predict injuries. The diversity of coaching processes and of the physical characteristics of the football players in each team does not permit making inferences about the general population of football players. Hence, these models should be built within each team in order to improve the accuracy of the machine learning processes.
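The sketch below illustrates the kind of pipeline described above with scikit-learn's ExtraTreesClassifier and DecisionTreeClassifier on synthetic workload features; the feature names, thresholds and injury labels are invented for illustration and are not the study's data or fitted rules.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# synthetic stand-ins for per-player, per-window workload features
n = 400
X = np.column_stack([
    rng.poisson(3, n),            # previous injuries this season
    rng.normal(150, 40, n),       # accelerations above 2 m/s^2
    rng.normal(1200, 300, n),     # distance above 25.5 W/kg (m)
    rng.normal(5500, 900, n),     # total distance (m)
])
risk = 0.3 * X[:, 0] + 0.004 * X[:, 1] + 0.001 * X[:, 2]
y = (risk + rng.normal(0, 1, n) > np.percentile(risk, 85)).astype(int)  # binary injury label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

forest = ExtraTreesClassifier(n_estimators=300, class_weight="balanced", random_state=0)
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
for model in (forest, tree):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test accuracy:", model.score(X_te, y_te))

# the decision tree yields human-readable rules, as used for injury prediction
print(export_text(tree, feature_names=["prev_injuries", "acc_2ms2", "dist_hi_power", "total_dist"]))
```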
APA, Harvard, Vancouver, ISO, and other styles
28

Eadie, Edward Norman. "Small resource stock share price behaviour and prediction." Title page, contents and abstract only, 2002. http://web4.library.adelaide.edu.au/theses/09CM/09cme11.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Gamalielsson, Jonas. "Models for Protein Structure Prediction by Evolutionary Algorithms." Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-623.

Full text
Abstract:

Evolutionary algorithms (EAs) have been shown to be competent at solving complex, multimodal optimisation problems in applications where the search space is large and badly understood. EAs are therefore among the most promising classes of algorithms for solving the Protein Structure Prediction Problem (PSPP). The PSPP is how to derive the 3D-structure of a protein given only its sequence of amino acids. This dissertation defines, evaluates and shows limitations of simplified models for solving the PSPP. These simplified models are off-lattice extensions to the lattice HP model which has been proposed and is claimed to possess some of the properties of real protein folding such as the formation of a hydrophobic core. Lattice models usually model a protein at the amino acid level of detail, use simple energy calculations and are used mainly for search algorithm development. Off-lattice models usually model the protein at the atomic level of detail, use more complex energy calculations and may be used for comparison with real proteins. The idea is to combine the fast energy calculations of lattice models with the increased spatial possibilities of an off-lattice environment allowing for comparison with real protein structures. A hypothesis is presented which claims that a simplified off-lattice model which considers other amino acid properties apart from hydrophobicity will yield simulated structures with lower Root Mean Square Deviation (RMSD) to the native fold than a model only considering hydrophobicity. The hypothesis holds for four of five tested short proteins with a maximum of 46 residues. Best average RMSD for any model tested is above 6Å, i.e. too high for useful structure prediction and excludes significant resemblance between native and simulated structure. Hence, the tested models do not contain the necessary biological information to capture the complex interactions of real protein folding. It is also shown that the EA itself is competent and can produce near-native structures if given a suitable evaluation function. Hence, EAs are useful for eventually solving the PSPP.

APA, Harvard, Vancouver, ISO, and other styles
30

Mousavi, Biouki Seyed Mohammad Mahdi. "Design and performance evaluation of failure prediction models." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25925.

Full text
Abstract:
Prediction of corporate bankruptcy (or distress) is one of the major activities in auditing firms’ risks and uncertainties. The design of reliable models to predict distress is crucial for many decision-making processes. Although a variety of models have been designed to predict distress, the relative performance evaluation of competing prediction models remains an exercise that is unidimensional in nature. To be more specific, although some studies use several performance criteria and their measures to assess the relative performance of distress prediction models, the assessment exercise of competing prediction models is restricted to their ranking by a single measure of a single criterion at a time, which leads to reporting conflicting results. The first essay of this research overcomes this methodological issue by proposing an orientation-free super-efficiency Data Envelopment Analysis (DEA) model as a multi-criteria assessment framework. Furthermore, the study performs an exhaustive comparative analysis of the most popular bankruptcy modelling frameworks for UK data. Also, it addresses two important research questions; namely, do some modelling frameworks perform better than others by design? and to what extent the choice and/or the design of explanatory variables and their nature affect the performance of modelling frameworks? Further, using different static and dynamic statistical frameworks, this chapter proposes new Failure Prediction Models (FPMs). However, within a super-efficiency DEA framework, the reference benchmark changes from one prediction model evaluation to another one, which in some contexts might be viewed as “unfair” benchmarking. The second essay overcomes this issue by proposing a Slacks-Based Measure Context-Dependent DEA (SBM-CDEA) framework to evaluate the competing Distress Prediction Models (DPMs). Moreover, it performs an exhaustive comparative analysis of the most popular corporate distress prediction frameworks under both a single criterion and multiple criteria using data of UK firms listed on London Stock Exchange (LSE). Further, this chapter proposes new DPMs using different static and dynamic statistical frameworks. Another shortcoming of the existing studies on performance evaluation lies in the use of static frameworks to compare the performance of DPMs. The third essay overcomes this methodological issue by suggesting a dynamic multi-criteria performance assessment framework, namely, Malmquist SBM-DEA, which by design, can monitor the performance of competing prediction models over time. Further, this study proposes new static and dynamic distress prediction models. Also, the study addresses several research questions as follows; what is the effect of information on the performance of DPMs? How the out-of-sample performance of dynamic DPMs compares to the out-of-sample performance of static ones? What is the effect of the length of training sample on the performance of static and dynamic models? Which models perform better in forecasting distress during the years with Higher Distress Rate (HDR)? On feature selection, studies have used different types of information including accounting, market, macroeconomic variables and the management efficiency scores as predictors. The recently applied techniques to take into account the management efficiency of firms are two-stage models. 
The two-stage DPMs incorporate multiple inputs and outputs to estimate the efficiency measure of a corporation relative to the most efficient ones, in the first stage, and use the efficiency score as a predictor in the second stage. The survey of the literature reveals that most of the existing studies failed to have a comprehensive comparison between two-stage DPMs. Moreover, the choice of inputs and outputs for DEA models that estimate the efficiency measures of a company has been restricted to accounting variables and features of the company. The fourth essay adds to the current literature of two-stage DPMs in several respects. First, the study proposes to consider the decomposition of Slack-Based Measure (SBM) of efficiency into Pure Technical Efficiency (PTE), Scale Efficiency (SE), and Mix Efficiency (ME), to analyse how each of these measures individually contributes to developing distress prediction models. Second, in addition to the conventional approach of using accounting variables as inputs and outputs of DEA models to estimate the measure of management efficiency, this study uses market information variables to calculate the measure of the market efficiency of companies. Third, this research provides a comprehensive analysis of two-stage DPMs through applying different DEA models at the first stage – e.g., input-oriented vs. output oriented, radial vs. non-radial, static vs. dynamic, to compute the measures of management efficiency and market efficiency of companies; and also using dynamic and static classifier frameworks at the second stage to design new distress prediction models.
APA, Harvard, Vancouver, ISO, and other styles
31

Gendron-Bellemare, Marc. "Learning prediction and abstraction in partially observable models." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18471.

Full text
Abstract:
Markov models have been a keystone in Artificial Intelligence for many decades. However, they remain unsatisfactory when the environment modeled is partially observable. There are pathological examples where no history of fixed length is sufficient for accurate decision making. On the other hand, working with a hidden state (such as in POMDPs) has a high computational cost. In order to circumvent this problem, I suggest the use of a context-based model. My approach replaces strict transition probabilities by a linear approximation of probability distributions. The method proposed provides a trade-off between a fully and partially observable model. I also discuss improving the approximation by constructing history-based features. Simple examples are given in order to show that the linear approximation can behave like certain Markov models. Empirical results on feature construction are also given to illustrate the power of the approach.
Depuis plusieurs décennies, les modèles de Markov forment l'une des bases de l'Intelligence Artificielle. Lorsque l'environnement modélisé n'est que partiellement observable, cependant, ceux-ci demeurent insatisfaisants. Il est connu que la prise de décision optimale dans certains problèmes exige un historique infini. D'un autre côté, faire appel au concept d'état caché (tel qu'à travers l'utilisation de POMDPs) implique un coût computationnel plus élevé. Afin de pallier ce problème, je propose un modèle se servant d'une représentation concise de l'historique. Plutôt que de stocker un modèle parfait des probabilités de transition, mon approche emploie une approximation linéaire des distributions de probabilités. La méthode proposée est un compromis entre les modèles partiellement et complètement observables. Je traite aussi de la construction d'éléments en lien avec l'historique afin d'améliorer l'approximation linéaire. Des exemples restreints sont présentés afin de montrer qu'une approximation linéaire de certains modèles de Markov peut être atteinte. Des résultats empiriques au niveau de la construction d'éléments sont aussi présentés afin d'illustrer les bénéfices de mon approche.
APA, Harvard, Vancouver, ISO, and other styles
32

Svensson, Jacob. "Boundary Layer Parametrization in Numerical Weather Prediction Models." Licentiate thesis, Stockholms universitet, Meteorologiska institutionen (MISU), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-117134.

Full text
Abstract:
Numerical weather prediction (NWP) and climate models have been shown to have difficulties in correctly simulating stable boundary layers and diurnal cycles. The aim of this study is to evaluate, describe and give suggestions for improvements of the descriptions of stable boundary layers in operational NWP models. Two papers are included. Paper I focuses on the description of the surface and the interactions between the surface and the boundary layer in COAMPS, a regional NWP model. The soil parametrization showed to be of great importance to the structure of the boundary layer. Moreover, it also showed that a low frequency of radiation calculations caused a bias in the solar energy received at the surface. In Paper II, the focus is on the formulation of the turbulent transport in stable boundary layers. There, an implementation of a diffusion parametrization based on the amount of turbulent kinetic energy (TKE) is tested in a single column model (SCM) version of the global NWP model Integrated Forecast System (IFS). The TKE parametrization turned out to behave similarly to the currently operational diffusion parametrization in convective and neutral regimes, but proved to be less diffusive in weakly stable and stable conditions. The formulation of the diffusion also turned out to be very dependent on the length-scale formulation. If the turbulence and the gradients of temperature and wind are weak, the magnitude of the turbulence can enter an oscillating mode. This oscillation can be avoided with the use of a lower limit on the length scale.
Det har visat sig att det är en stor utmaning för numeriska väderprognosmodeller (NWP-modeller) att simulera stabilt skiktade atmosfäriska gränsskikt och gränsskiktets dygnscykel på ett korrekt sätt. Syftet med denna studie är att utvärdera, beskriva och ge förslag på förbättringar av beskrivningen av gränsskiktet i NWP-modeller. Studien innehåller två artiklar. Den första fokuserar på beskrivningen av markytan och interaktionen mellan marken och gränsskiktet i den regionala NWP-modellen COAMPS. Det visade sig att beskrivningen av markytan har en signifikant inverkan på gränsskiktets struktur. Det framkom också att strålningsberäkningarna endast görs en gång i timmen vilket bland annat orsakar en bias i inkommande solinstrålning vid markytan. Den andra artikeln fokuserar på beskrivningen av den turbulenta transporten i stabila skiktade gränsskikt. En implementering av en diffusionsparametrisering som bygger på turbulent kinetisk energi (TKE) testas i en endimensionell version av NWP-modellen Integrated Forecast System (IFS), utvecklad vid European Centre for Medium-Range Weather Forecasts (ECMWF). Den TKE-baserade diffusionsparametriseringen är likvärdig med den nuvarande operationella parametriseringen i neutrala och konvektiva gränsskikt, men är mindre diffusiv i stabila gränsskikt. Diffusionens intensitet är beroende på den turbulenta längdskalan. Vidare kan turbulensen i TKE-formuleringen hamna i ett oscillerande läge om turbulensen är svag samtidigt som temperatur- och vindgradienten är kraftig. Denna oscillation kan förhindras om längdskalans minsta tillåtna värde begränsas.
APA, Harvard, Vancouver, ISO, and other styles
33

Voigt, Alexander. "Mass spectrum prediction in non-minimal supersymmetric models." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-152797.

Full text
Abstract:
Supersymmetry is an attractive extension of the Standard Model (SM) of particle physics. The minimal supersymmetric extension (MSSM) provides gauge coupling unification, a dark matter candidate particle and can explain the breaking of the electroweak symmetry dynamically. However, it suffers from the little hierarchy and the mu-problem. Non-minimal supersymmetric extensions of the SM with a larger particle content or a higher symmetry can evade the problems of the MSSM. Such models may be well-motivated by Grand Unified Theories (GUTs) and can provide a rich new phenomenology with an extended Higgs sector, exotic particles, additional interactions and a close connection to String Theory. Interesting examples are the Next-to Minimal Supersymmetric Standard Model (NMSSM), which is motivated by the mu-problem, and the Exceptional Supersymmetric Standard Model (E6SSM), which is inspired by E6 GUTs. For phenomenological investigations of supersymmetric (SUSY) models the pole mass spectrum must be calculated from the fundamental model parameters. This task, however, is non-trivial as the spectrum must be consistent with measured low-energy observables (fine-structure constant, Z boson pole mass, muon decay etc.) as well as electroweak symmetry breaking and potential universality conditions on the soft supersymmetry breaking parameters at the GUT scale. Programs, which calculate the SUSY mass spectrum consistent with constraints of this kind are called spectrum generators. In this thesis four different contributions to the prediction of mass spectra and model parameters in non-minimal SUSY models are presented. (i) One-loop matching corrections of the E6SSM gauge and Yukawa couplings to the SM are calculated to increase the precision of the mass spectrum prediction in the constrained E6SSM. (ii) The beta-functions of vacuum expectation values (VEVs) are calculated in a general and supersymmetric gauge theory at the one- and two-loop level. The results enable an accurate calculation of the renormalization group running of the VEVs in non-minimal SUSY models. (iii) An NMSSM extension of Softsusy, a spectrum generator for the MSSM, is implemented. It represents a precise alternative to the already existing spectrum generator NMSPEC. (iv) FlexibleSUSY is presented, a general framework which creates a fast, modular and precise spectrum generator for any user-defined SUSY model. It represents a generalization of the hand-written SUSY spectrum generators and allows the study of a large variety of new SUSY models easily with high precision.
APA, Harvard, Vancouver, ISO, and other styles
34

Hawkes, Richard Nathanael. "Linear state models for volatility estimation and prediction." Thesis, Brunel University, 2007. http://bura.brunel.ac.uk/handle/2438/7138.

Full text
Abstract:
This thesis concerns the calibration and estimation of linear state models for forecasting stock return volatility. In the first two chapters I present aspects of financial modelling theory and practice that are of particular relevance to the theme of the present work. In addition, I review the literature concerning these aspects with a particular emphasis on the area of dynamic volatility models. These chapters set the scene and lay the foundations for the subsequent empirical work and are a contribution in themselves. The structure of the models employed in the application chapters 4, 5 and 6 is the state-space structure; alternatively, the models are known as unobserved components models. In the literature these models have been applied to the estimation of volatility, both for high-frequency and low-frequency data. As opposed to what has been carried out in the literature, I propose the use of these models with Gaussian components. I suggest the implementation of these for high-frequency data for short- and medium-term forecasting. I then demonstrate the calibration of these models and compare medium-term forecasting performance for different forecasting methods and model variations, as well as that of GARCH and constant-volatility models. I then introduce implied volatility measurements, leading to two-state models, and verify whether this derivative-based information improves forecasting performance. In chapter 6 I compare different unobserved components model specifications and their forecasting performance. The appendices contain the extensive workings of the parameter estimates' standard error calculations.
APA, Harvard, Vancouver, ISO, and other styles
35

AMARAL, BERNARDO HALLAK. "PREDICTION OF FUTURE VOLATILITY MODELS: BRAZILIAN MARKET ANALYSIS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=20458@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
Realizar a previsão de volatilidade futura é algo que intriga muitos estudiosos, pesquisadores e pessoas do mercado financeiro. O modelo e a metodologia utilizados no cálculo são fundamentais para o apreçamento de opções e dependendo das variáveis utilizadas, o resultado se torna muito sensível, propiciando resultados diferentes. Tudo isso pode causar cálculos imprecisos e estruturação de estratégias erradas de compra e venda de ações e opções por empresas e investidores. Por isso, o objetivo deste trabalho é utilizar alguns modelos para o cálculo de volatilidade futura e analisar os resultados, avaliando qual o melhor modelo a ser empregado, propiciando uma melhor previsão da volatilidade futura.
Predicting future volatility is a subject that causes debate among scholars, researchers and people in the financial market. The model and methodology used in the calculation are fundamental to the pricing of options and, depending on the variables used, the result becomes very sensitive, giving different outcomes. All this can cause inaccurate calculations and wrong strategies for buying and selling stocks and options by companies and investors. Therefore, the objective of this work is to use several models for the calculation of future volatility and analyze the results, evaluating the best model to be used and allowing a better prediction of future volatility.
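As one concrete example of the kind of model such a comparison might include, the sketch below runs a GARCH(1,1) variance recursion with fixed, illustrative parameters on simulated returns; in practice the parameters would be estimated (e.g. by maximum likelihood), and this is not the specific set of models evaluated in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(3)

def garch11_forecast(returns, omega=1e-6, alpha=0.08, beta=0.90):
    """One-step-ahead conditional variance forecasts from a GARCH(1,1) recursion
    with fixed (illustrative) parameters:
        sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
    """
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = np.var(returns)           # simple initialisation
    for t, r in enumerate(returns):
        sigma2[t + 1] = omega + alpha * r**2 + beta * sigma2[t]
    return sigma2

# simulated daily returns as a stand-in for a stock or index series
returns = 0.01 * rng.standard_normal(500)
sigma2 = garch11_forecast(returns)
print("next-day volatility forecast (annualised):", np.sqrt(252 * sigma2[-1]))
```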
APA, Harvard, Vancouver, ISO, and other styles
36

Qarmalah, Najla Mohammed A. "Finite mixture models : visualisation, localised regression, and prediction." Thesis, Durham University, 2018. http://etheses.dur.ac.uk/12486/.

Full text
Abstract:
Initially, this thesis introduces a new graphical tool that can be used to summarise data possessing a mixture structure. Computation of the required summary statistics makes use of posterior probabilities of class membership obtained from a fitted mixture model. In this context, both real and simulated data are used to highlight the usefulness of the tool for the visualisation of mixture data in comparison to the use of a traditional boxplot. This thesis then uses localised mixture models to produce predictions from time series data. Estimation in these models is achieved using a kernel-weighted version of an EM-algorithm: exponential kernels with different bandwidths are used as weight functions. By modelling a mixture of local regressions at a target time point, but using different bandwidths, informative estimated mixture probabilities can be gained relating to the amount of information available in the data set. This information is given a scale of resolution that corresponds to each bandwidth. Nadaraya-Watson and local linear estimators are used to carry out the localised estimation. For prediction at a future time point, a new methodology of bandwidth selection and adequate methods are proposed for each local method, and then compared to competing forecasting routines. A simulation study is executed to assess the performance of this model for prediction. Finally, double-localised mixture models are presented, which can be used to improve predictions for a variable time series using additional information provided by other time series. Estimation for these models is achieved using a double-kernel-weighted version of the EM-algorithm, employing exponential kernels with different horizontal bandwidths and normal kernels with different vertical bandwidths, which are focused around a target observation at a given time point. Nadaraya-Watson and local linear estimators are used to carry out the double-localised estimation. For prediction at a future time point, different approaches are considered for each local method, and are compared to competing forecasting routines. Real data are used to investigate the performance of the localised and double-localised mixture models for prediction. The data used predominantly in this thesis are taken from the International Energy Agency (IEA).
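One of the building blocks mentioned above, the Nadaraya-Watson estimator with an exponential kernel, can be sketched in a few lines; the snippet below shows only that single ingredient on simulated data, with arbitrary bandwidths, not the full kernel-weighted EM or the double-localised mixture model.

```python
import numpy as np

def nadaraya_watson(t_query, t, y, bandwidth):
    """Nadaraya-Watson estimate at time t_query using an exponential kernel
    with the given bandwidth as the weight function."""
    w = np.exp(-np.abs(t_query - t) / bandwidth)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(4)
t = np.arange(100.0)
y = np.sin(t / 10.0) + 0.2 * rng.standard_normal(100)

# estimates at the end of the series under two resolutions (bandwidths)
for h in (2.0, 20.0):
    print(f"bandwidth {h}: estimate at t=100 is {nadaraya_watson(100.0, t, y, h):.3f}")
```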
APA, Harvard, Vancouver, ISO, and other styles
37

Li, Edwin. "LSTM Neural Network Models for Market Movement Prediction." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231627.

Full text
Abstract:
Interpreting time-varying phenomena is a key challenge in the capital markets. Time series analysis using autoregressive methods has been carried out over the last couple of decades, often with reassuring results. However, such methods sometimes fail to explain trends and cyclical fluctuations, which may be characterized by long-range dependencies or even dependencies between the input features. The purpose of this thesis is to investigate whether recurrent neural networks with LSTM cells can be used to capture these dependencies, and ultimately be used as a complement for index trading decisions. Experiments are made on different setups of the S&P-500 stock index, and two distinct models are built, each one being an improvement of the previous model. The first model is a multivariate regression model, and the second model is a multivariate binary classifier. The output of each model is used to reason about the future behavior of the index. The experiments show, for the configuration provided, that LSTM RNNs are unsuitable for predicting exact values of daily returns, but that they give satisfactory results when used to predict the direction of the movement.
Att förstå och kunna förutsäga hur index varierar med tiden och andra parametrar är ett viktigt problem inom kapitalmarknader. Tidsserieanalys med autoregressiva metoder har funnits sedan årtionden tillbaka, och har oftast gett goda resultat. Dessa metoder saknar dock möjligheten att förklara trender och cykliska variationer i tidsserien, något som kan karaktäriseras av tidsvarierande samband, men även samband mellan parametrar som indexet beror utav. Syftet med denna studie är att undersöka om recurrent neural networks (RNN) med long short-term memory-celler (LSTM) kan användas för att fånga dessa samband, för att slutligen användas som en modell för att komplettera indexhandel. Experimenten är gjorda mot en modifierad S&P-500 datamängd, och två distinkta modeller har tagits fram. Den ena är en multivariat regressionsmodell för att förutspå exakta värden, och den andra modellen är en multivariat klassifierare som förutspår riktningen på nästa dags indexrörelse. Experimenten visar för den konfiguration som presenteras i rapporten att LSTM RNN inte passar för att förutspå exakta värden för indexet, men ger tillfredsställande resultat när modellen ska förutsäga indexets framtida riktning.
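A minimal sketch of the binary-direction variant of this entry's models is given below using the Keras LSTM layer on simulated returns; the window length, layer size and training settings are placeholders, and the data is not the S&P-500 setup used in the thesis.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(5)

# simulated daily returns as a stand-in for the index features
returns = 0.01 * rng.standard_normal(2000).astype("float32")
window = 30
X = np.array([returns[i:i + window] for i in range(len(returns) - window)])
y = (returns[window:] > 0).astype("float32")       # next-day direction (up = 1)
X = X[..., None]                                    # shape: (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of an up-move
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.2, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
print("in-sample accuracy:", round(float(acc), 3))
```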
APA, Harvard, Vancouver, ISO, and other styles
38

Rogge-Solti, Andreas, Laura Vana, and Jan Mendling. "Time Series Petri Net Models - Enrichment and Prediction." CEUR Workshop Proceedings, 2015. http://epub.wu.ac.at/5394/1/paper8.pdf.

Full text
Abstract:
Operational support as an area of process mining aims to predict the temporal performance of individual cases and the overall business process. Although seasonal effects, delays and performance trends are well-known to exist for business processes, there is up until now no prediction model available that explicitly captures this. In this paper, we introduce time series Petri net models. These models integrate the control flow perspective of Petri nets with time series prediction. Our evaluation on the basis of our prototypical implementation demonstrates the merits of this model in terms of better accuracy in the presence of time series effects.
APA, Harvard, Vancouver, ISO, and other styles
39

Fernando, Warnakulasuriya Chandima. "Blood Glucose Prediction Models for Personalized Diabetes Management." Thesis, North Dakota State University, 2018. https://hdl.handle.net/10365/28179.

Full text
Abstract:
Effective blood glucose (BG) control is essential for patients with diabetes. This calls for an immediate need to closely keep track of patients' BG levels at all times. However, individual patients may sometimes be unable to monitor their BG level regularly due to all kinds of real-life interference. To address this issue, in this work we propose machine-learning based prediction models that can automatically predict a patient's BG level based on their historical data and known current status. We take two approaches: the first predicts BG level using only the individual's own data, and the second uses population data. Our experimental results illustrate the effectiveness of the proposed models.
APA, Harvard, Vancouver, ISO, and other styles
40

Bratières, Sébastien. "Non-parametric Bayesian models for structured output prediction." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/274973.

Full text
Abstract:
Structured output prediction is a machine learning task in which an input object is not just assigned a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word, and their interdependencies must be modelled. Non-parametric Bayesian (NPB) techniques are probabilistic modelling techniques which have the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems. We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms. The main contribution of this thesis consists in a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF. To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it on real data experiments, and explore two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method which allows distributed inference and prediction. The experimental validation demonstrates: (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to that of competitor models, and (c) model hyperparameters can be learnt from data.
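The MCMC ingredient named above, elliptical slice sampling, is generic enough to sketch on its own. The snippet below implements one standard elliptical slice transition (after Murray, Adams and MacKay, 2010) for a latent vector with a Gaussian prior and an arbitrary log-likelihood; the toy Bernoulli-logit likelihood is a stand-in, not the GPstruct CRF likelihood.

```python
import numpy as np

rng = np.random.default_rng(6)

def elliptical_slice_step(f, prior_chol, log_lik):
    """One elliptical slice sampling transition for a latent vector f with a
    zero-mean Gaussian prior (Cholesky factor prior_chol) and any log-likelihood."""
    nu = prior_chol @ rng.standard_normal(f.shape[0])    # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())           # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)   # proposal stays on the ellipse
        if log_lik(f_new) > log_y:
            return f_new
        # shrink the angle bracket towards theta = 0 and try again
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# toy example: squared-exponential GP prior on f, Bernoulli observations through a sigmoid
n = 20
K = np.exp(-0.5 * (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2 / 9.0) + 1e-6 * np.eye(n)
L = np.linalg.cholesky(K)
y = (np.arange(n) > n // 2).astype(float)
log_lik = lambda f: np.sum(y * f - np.log1p(np.exp(f)))  # Bernoulli-logit log-likelihood

f = np.zeros(n)
for _ in range(200):
    f = elliptical_slice_step(f, L, log_lik)
print("posterior sample of the latent function:", np.round(f, 2))
```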
APA, Harvard, Vancouver, ISO, and other styles
41

Gorthi, Swathi. "Prediction Models for Estimation of Soil Moisture Content." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/1090.

Full text
Abstract:
This thesis introduces the implementation of different supervised learning techniques for producing accurate estimates of soil moisture content using empirical information, including meteorological and remotely sensed data. The models thus developed can be extended to be used by the personal remote sensing systems developed in the Center for Self-Organizing Intelligent Systems (CSOIS). The different models employed extend over a wide range of machine-learning techniques, starting from basic linear regression models through models based on a Bayesian framework. Also, ensembling methods such as bagging and boosting are implemented on all models for considerable improvements in accuracy. The main research objective is to understand, compare, and analyze the mathematical backgrounds underlying, and the results obtained from, the different models and the respective improvement techniques employed.
APA, Harvard, Vancouver, ISO, and other styles
42

Cutugno, Carmen. "Statistical models for the corporate financial distress prediction." Thesis, Università degli Studi di Catania, 2011. http://hdl.handle.net/10761/283.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Kumar, Akhil. "Budget-Related Prediction Models in the Business Environment with Special Reference to Spot Price Predictions." Thesis, North Texas State University, 1986. https://digital.library.unt.edu/ark:/67531/metadc331533/.

Full text
Abstract:
The purpose of this research is to study and improve decision accuracy in the real world. Spot price prediction of petroleum products, in a budgeting context, is the task chosen to study prediction accuracy. Prediction accuracy of executives in a multinational oil company is examined. The Brunswik Lens Model framework is used to evaluate prediction accuracy. Predictions of the individuals, the composite group (mathematical average of the individuals), the interacting group, and the environmental model were compared. Predictions of the individuals were obtained through a laboratory experiment in which experts were used as subjects. The subjects were required to make spot price predictions for two petroleum products. Eight predictor variables that were actually used by the subjects in real-world predictions were elicited through an interview process. Data for a 15 month period were used to construct 31 cases for each of the two products. Prediction accuracy was evaluated by comparing predictions with the actual spot prices. Predictions of the composite group were obtained by averaging the predictions of the individuals. Interacting group predictions were obtained ex post from the company's records. The study found the interacting group to be the least accurate. The implication of this finding is that even though an interacting group may be desirable for information synthesis, evaluation, or working toward group consensus, it is undesirable if prediction accuracy is critical. The accuracy of the environmental model was found to be the highest. This suggests that apart from random error, misweighting of cues by individuals and groups affects prediction accuracy. Another implication of this study is that the environmental model can also be used as an additional input in the prediction process to improve accuracy.
APA, Harvard, Vancouver, ISO, and other styles
44

Sawert, Marcus. "Predicting deliveries from suppliers : A comparison of predictive models." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39314.

Full text
Abstract:
In the highly competitive environment that companies find themselves in today, it is key to have a well-functioning supply chain. For manufacturing companies, having a good supply chain is dependent on having a functioning production planning. The production planning tries to fulfill the demand while considering the resources available. This is complicated by the uncertainties that exist, such as the uncertainty in demand, in manufacturing and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity in the supply uncertainty, by considering it as a stochastic uncertainty. To improve these models, a prediction based on earlier data regarding the supplier or item could be used to see when the delivery is likely to arrive. This study looked to compare different predictive models to see which one could best be suited for this purpose. Historic data regarding earlier deliveries was gathered from a large international manufacturing company and was preprocessed before used in the models. The target value that the models were to predict was the actual delivery time from the supplier. The data was then tested with the following four regression models in Python: Linear regression, ridge regression, Lasso and Elastic net. The results were calculated by cross-validation and presented in the form of the mean absolute error together with the standard deviation. The results showed that the Elastic net was the overall best performing model, and that the linear regression performed worst.
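A minimal sketch of the comparison described above, using scikit-learn's four linear models and cross-validated mean absolute error, is given below; the synthetic features and their names are invented stand-ins for the company's delivery records.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

# synthetic stand-in for delivery records: promised lead time, order quantity,
# supplier's recent average delay -> actual delivery time in days
n = 500
X = np.column_stack([
    rng.integers(5, 40, n),        # promised lead time (days)
    rng.integers(1, 500, n),       # order quantity
    rng.normal(2.0, 1.5, n),       # supplier's recent average delay (days)
])
y = X[:, 0] + 0.005 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 2, n)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    scores = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name:12s} MAE: {scores.mean():.2f} +/- {scores.std():.2f}")
```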
APA, Harvard, Vancouver, ISO, and other styles
45

Wiseman, Scott. "Bayesian learning in graphical models." Thesis, University of Kent, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311261.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Hu, Zhongbo. "Atmospheric artifacts correction for InSAR using empirical model and numerical weather prediction models." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/668264.

Full text
Abstract:
InSAR has proved its unprecedented ability and merits for monitoring ground deformation on a large scale with centimetre- to millimetre-scale accuracy. However, several factors affect the reliability and accuracy of its applications. Among them, atmospheric artifacts due to spatial and temporal variations of the atmospheric state often add noise to interferograms. Therefore, the mitigation of atmospheric artifacts remains one of the biggest challenges to be addressed in the InSAR community. State-of-the-art research has revealed that atmospheric artifacts can be partially compensated with empirical models, temporal-spatial filtering approaches in InSAR time series, pointwise GPS zenith path delay, and numerical weather prediction models. In this thesis, firstly, we further develop a covariance-weighted linear empirical model correction method. Secondly, a realistic LOS-direction integration approach based on global reanalysis data is employed and comprehensively compared with the conventional method that integrates along the zenith direction. Finally, the realistic integration method is applied to local WRF numerical forecast model data. Moreover, detailed comparisons between different global reanalysis data and the local WRF model are assessed. In terms of empirical model correction methods, many publications have studied correcting the stratified tropospheric phase delay by assuming a linear model between it and topography. However, most of these studies have not considered the effect of turbulent atmospheric artefacts when adjusting the linear model to the data. In this thesis, an improved technique that minimizes the influence of the turbulent atmosphere in the model adjustment is presented. In the proposed algorithm, the model is adjusted to the phase differences of pixel pairs instead of using the unwrapped phase of each pixel. In addition, the phase differences are weighted as a function of their APS covariance, estimated from an empirical variogram, to reduce in the model adjustment the impact of pixel pairs with significant turbulent atmosphere. The performance of the proposed method has been validated with both simulated and real Sentinel-1 SAR data on Tenerife island, Spain. Considering methods that use meteorological observations to mitigate the APS, an accurate realistic computing strategy utilizing global atmospheric reanalysis data has been implemented. With this approach, the realistic LOS path between the satellite and the monitored points is considered, rather than converting from the zenith path delay. Compared with the zenith-delay-based method, the biggest advantage is that it avoids errors caused by anisotropic atmospheric behaviour. The accurate integration method is validated with Sentinel-1 data in three test sites: Tenerife island, Spain; Almeria, Spain; and Crete island, Greece. Compared to the conventional zenith method, the realistic integration method shows great improvement. A variety of global reanalysis data are available from different weather forecasting organizations, such as ERA-Interim, ERA5 and MERRA2. In this study, the realistic integration mitigation method is assessed on these different reanalysis data. The results show that these data can mitigate the APS to some extent in most cases. The assessment also demonstrates that ERA5 performs the best statistically, compared to the other global reanalysis data.
Moreover, as local numerical weather forecast models have the ability to predict atmospheric parameters at high spatial resolution, they also have the potential to be used for APS mitigation. In this thesis, the realistic integration method is therefore also employed on local WRF model data for the Tenerife and Almeria test sites. However, it turns out that the WRF model performs worse than the original global reanalysis data.
Las técnicas InSAR han demostrado su capacidad sin precedentes y méritos para el monitoreo de la deformación del suelo a gran escala con una precisión centimétrica o incluso milimétrica. Sin embargo, varios factores afectan la fiabilidad y precisión de sus aplicaciones. Entre ellos, los artefactos atmosféricos debidos a variaciones espaciales y temporales del estado de la atmósfera a menudo añaden ruido a los interferogramas. Por lo tanto, la mitigación de los artefactos atmosféricos sigue siendo uno de los mayores desafíos a abordar en la comunidad InSAR. Los trabajos de investigación de vanguardia han revelado que los artefactos atmosféricos se pueden compensar parcialmente con modelos empíricos, enfoques de filtrado temporal-espacial en series temporales InSAR, retardo puntual del camino cenital con GPS y modelos numéricos de predicción meteorológica. En esta tesis, en primer lugar, desarrollamos un método de corrección de modelo empírico lineal ponderado por covarianza. En segundo lugar, se emplea un enfoque realista de integración en dirección LOS basado en datos de reanálisis global y se compara exhaustivamente con el método convencional que se integra a lo largo de la dirección cenital. Finalmente, el método de integración realista se aplica a los datos del modelo de pronóstico numérico WRF local. Además, se evalúan las comparaciones detalladas entre diferentes datos de reanálisis global y el modelo WRF local. En términos de métodos de corrección con modelos empíricos, muchas publicaciones han estudiado la corrección del retraso estratificado de la fase troposférica asumiendo un modelo lineal entre ellos y la topografía. Sin embargo, la mayoría de estos estudios no han considerado el efecto de los artefactos atmosféricos turbulentos al ajustar el modelo lineal a los datos. En esta tesis, se ha presentado una técnica mejorada que minimiza la influencia de la atmósfera turbulenta en el ajuste del modelo. En el algoritmo propuesto, el modelo se ajusta a las diferencias de fase de los píxeles en lugar de utilizar la fase sin desenrollar de cada píxel. Además, las diferentes diferencias de fase se ponderan en función de su covarianza APS estimada a partir de un variograma empírico para reducir en el ajuste del modelo el impacto de los pares de píxeles con una atmósfera turbulenta significativa. El rendimiento del método propuesto ha sido validado con datos SAR Sentinel-1 simulados y reales en la isla de Tenerife, España. Teniendo en cuenta los métodos que utilizan observaciones meteorológicas para mitigar APS, se ha implementado una estrategia de computación realista y precisa que utiliza datos de reanálisis atmosférico global. Con el enfoque, se considera el camino realista de LOS a lo largo del satélite y los puntos monitoreados, en lugar de convertirlos desde el retardo de la ruta cenital. En comparación con el método basado en la demora cenital, la mayor ventaja es que puede evitar errores causados por el comportamiento atmosférico anisotrópico. El método de integración preciso se valida con los datos de Sentinel-1 en tres sitios de prueba: la isla de Tenerife, España, Almería, España y la isla de Creta, Grecia. En comparación con el método cenital convencional, el método de integración realista muestra una gran mejora.
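The covariance-weighted idea in the first contribution can be illustrated with a small sketch: fit the linear phase-topography slope on pixel-pair differences with per-pair weights. The snippet below uses synthetic phases and a placeholder weighting scheme instead of variogram-derived APS covariances, so it shows the structure of the estimator rather than the thesis's actual method.

```python
import numpy as np

rng = np.random.default_rng(8)

# synthetic interferogram phase: stratified (height-dependent) delay plus noise
n = 300
height = rng.uniform(0.0, 2000.0, n)                 # pixel elevations (m)
k_true = 2.0e-3                                      # delay-to-height ratio (rad/m)
phase = k_true * height + rng.normal(0.0, 0.4, n)    # turbulence folded into the noise here

# form pixel-pair differences and weight them; in the proposed method the weights
# come from the APS covariance estimated with an empirical variogram, whereas here
# pairs with large height separation are simply down-weighted as a placeholder
i, j = rng.integers(0, n, 2000), rng.integers(0, n, 2000)
dh, dphi = height[i] - height[j], phase[i] - phase[j]
w = 1.0 / (1.0 + 0.001 * np.abs(dh))

k_hat = np.sum(w * dh * dphi) / np.sum(w * dh * dh)  # weighted least-squares slope
print("estimated delay/height ratio:", k_hat, "true:", k_true)
```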
APA, Harvard, Vancouver, ISO, and other styles
47

Almeida, Mara Elisabeth Monteiro. "Bankruptcy prediction models: an analysis for Portuguese SMEs." Master's thesis, 2020. http://hdl.handle.net/1822/67108.

Full text
Abstract:
Master's project in Finance
This project intends to test the models developed by Altman (1983) and Ohlson (1980) and assess the predictive capacity of these models when applied to a dataset of Portuguese SMEs. This work is the result of a partnership with a Portuguese startup called nBanks, which dedicates its activity to providing financial services to its customers. In this sense, this project will allow nBanks to develop a new and innovative instrument that will allow its customers to access their probability or risk of default. The data were collected from the Amadeus database, for the period between 2011 and 2018. The dataset consists of 194,979 companies, of which 2,913 companies are in distress and the remaining 192,066 are healthy companies. From the application of the models, it was concluded that the O-score, using a cut-off of 3.8%, is better than the Z’’-Score as it is the model that minimizes the error of a company in distress being classified as a healthy company, although the Z’'-score presents the best overall accuracy. The O-score model is better than the Z’’-Score in forecasting financial distress when considering the group of companies in distress. It is also concluded that when the period up to 5 years before financial distress is analyzed, the accuracy of the models decreases as we move forward in the number of years. An analysis of the top 25% of the companies classified as distressed, based on the results of the O-score, showed that those companies are medium-sized companies, concentrated in the North of Portugal and the Wholesale trade sector, except for motor vehicles and motorcycles.
APA, Harvard, Vancouver, ISO, and other styles
48

Yuan, Yan. "Prediction Performance of Survival Models." Thesis, 2008. http://hdl.handle.net/10012/3974.

Full text
Abstract:
Statistical models are often used for the prediction of future random variables. There are two types of prediction, point prediction and probabilistic prediction. The prediction accuracy is quantified by performance measures, which are typically based on loss functions. We study the estimators of these performance measures, the prediction error and performance scores, for point and probabilistic predictors, respectively. The focus of this thesis is to assess the prediction performance of survival models that analyze censored survival times. To accommodate censoring, we extend the inverse probability censoring weighting (IPCW) method, thus arbitrary loss functions can be handled. We also develop confidence interval procedures for these performance measures. We compare model-based, apparent loss based and cross-validation estimators of prediction error under model misspecification and variable selection, for absolute relative error loss (in chapter 3) and misclassification error loss (in chapter 4). Simulation results indicate that cross-validation procedures typically produce reliable point estimates and confidence intervals, whereas model-based estimates are often sensitive to model misspecification. The methods are illustrated for two medical contexts in chapter 5. The apparent loss based and cross-validation estimators of performance scores for probabilistic predictors are discussed and illustrated with an example in chapter 6. We also draw connections between these performance measures.
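The snippet below is a simplified illustration of the IPCW idea mentioned above: uncensored losses are re-weighted by the inverse of a Kaplan-Meier estimate of the censoring survival function. It assumes random censoring and is not the extended estimator or the confidence-interval procedures developed in the thesis; the synthetic data and the absolute relative error loss are illustrative only.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t).
    `event` = 1 if the survival time was observed, 0 if censored,
    so a 'censoring event' here is event == 0."""
    order = np.argsort(time)
    t, e = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    # Product-limit estimator over censoring events.
    factors = np.where(e == 0, 1.0 - 1.0 / at_risk, 1.0)
    return t, np.cumprod(factors)

def G_at(grid, surv, t):
    """Left-continuous lookup G(t-) on the step function."""
    idx = np.searchsorted(grid, t, side="left") - 1
    return np.where(idx >= 0, surv[np.clip(idx, 0, len(surv) - 1)], 1.0)

def ipcw_prediction_error(time, event, predicted, loss):
    """IPCW estimate of E[loss(T, prediction)] with right-censored T."""
    grid, surv = km_censoring_survival(time, event)
    g = G_at(grid, surv, time)
    weights = np.where(event == 1, 1.0 / np.maximum(g, 1e-8), 0.0)
    return np.sum(weights * loss(time, predicted)) / len(time)

# Toy usage with an absolute relative error loss, as in chapter 3 of the abstract.
rng = np.random.default_rng(1)
T = rng.exponential(5.0, 300)                 # true survival times
C = rng.exponential(8.0, 300)                 # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(int)
pred = np.full(300, 5.0)                      # e.g. a model's point predictions
are = lambda t, p: np.abs(t - p) / t          # absolute relative error
err = ipcw_prediction_error(time, event, pred, are)
```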
APA, Harvard, Vancouver, ISO, and other styles
49

Auerbach, Jonathan Lyle. "Some Statistical Models for Prediction." Thesis, 2020. https://doi.org/10.7916/d8-gcvm-jj03.

Full text
Abstract:
This dissertation examines the use of statistical models for prediction. Examples are drawn from public policy and chosen because they represent pressing problems facing U.S. governments at the local, state, and federal level. The first five chapters provide examples where the perfunctory use of linear models, the prediction tool of choice in government, failed to produce reasonable predictions. Methodological flaws are identified, and more accurate models are proposed that draw on advances in statistics, data science, and machine learning. Chapter 1 examines skyscraper construction, where the normality assumption is violated and extreme value analysis is more appropriate. Chapters 2 and 3 examine presidential approval and voting (a leading measure of civic participation), where the non-collinearity assumption is violated and an index model is more appropriate. Chapter 4 examines changes in temperature sensitivity due to global warming, where the linearity assumption is violated and a first-hitting-time model is more appropriate. Chapter 5 examines the crime rate, where the independence assumption is violated and a block model is more appropriate. The last chapter provides an example where simple linear regression was overlooked as providing a sensible solution. Chapter 6 examines traffic fatalities, where the linear assumption provides a better predictor than the more popular non-linear probability model, logistic regression. A theoretical connection is established between the linear probability model, the influence score, and the predictivity.
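To make the last point concrete, the sketch below contrasts a linear probability model (ordinary least squares on a 0/1 outcome) with a logistic regression fitted by Newton-Raphson on synthetic data; the data-generating process and variable names are illustrative assumptions, not the dissertation's traffic-fatality analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_true = np.clip(0.05 + 0.03 * x, 0, 1)       # a nearly linear, low-probability outcome
y = rng.binomial(1, p_true)

# Linear probability model: OLS of y on [1, x].
X = np.column_stack([np.ones(n), x])
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
p_lpm = X @ beta_lpm

# Logistic regression fitted by a few Newton-Raphson steps.
beta_log = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(X @ beta_log)))
    grad = X.T @ (y - mu)                      # score vector
    hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
    beta_log += np.linalg.solve(hess, grad)
p_logit = 1.0 / (1.0 + np.exp(-(X @ beta_log)))

# Compare how well each set of fitted probabilities recovers the true ones.
mse_lpm = np.mean((p_true - p_lpm) ** 2)
mse_logit = np.mean((p_true - p_logit) ** 2)
```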
APA, Harvard, Vancouver, ISO, and other styles
50

Lin, Hsin-Yin, and 林欣穎. "The Application of Grey Prediction Models to Depreciation Expense Prediction." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/d6u94c.

Full text
Abstract:
Master's degree
National Taipei University of Technology
Graduate Institute of Automation Technology
103
Earnings per share (EPS) is an important indicator for investors when analyzing listed companies. Business owners who value their reputation and accountability also consider it important to achieve their annual EPS growth targets. A higher EPS means the company generates more profit per unit of capital and is more competitive than its peers; in other words, it can create higher profits with fewer resources. Among the cost items that affect a listed company's earnings per share is depreciation, a cost that cannot be ignored. There are many ways to calculate depreciation, and the method is chosen mainly according to how each fixed asset is used. This research takes a listed company as an example, using the company's capital expenditure data from 2013 to the first quarter of 2015 as the analysis sample, together with the index values calculated for each fixed asset and the related asset-transfer data. The GM(1,1) model from grey prediction theory is then used to analyze and forecast the company's monthly accumulated depreciation for the year, replacing the original manual monthly calculation and thereby increasing operational efficiency.
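For reference, the sketch below shows the standard GM(1,1) recipe referred to above (accumulated generating operation, least-squares estimation of the development coefficient and grey input, and inverse accumulation for the forecast). The series used here is invented for illustration; it is not the company's depreciation data from the thesis.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Standard GM(1,1): fit on the series x0 and forecast `steps` further values."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                               # AGO: accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]      # development coeff. and grey input

    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time response of the AGO series
    x0_hat = np.diff(x1_hat, prepend=0.0)              # IAGO: back to the original scale
    return x0_hat[n:]                                   # the forecast values only

# Toy usage: forecast the next three values of a monthly accumulated-depreciation-like series.
series = [120.0, 128.5, 137.2, 146.6, 156.3, 166.8]
print(gm11_forecast(series, steps=3))
```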
APA, Harvard, Vancouver, ISO, and other styles
