Dissertations / Theses on the topic 'MACHINE ALGORITHMS'

To see the other types of publications on this topic, follow the link: MACHINE ALGORITHMS.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'MACHINE ALGORITHMS.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Andersson, Viktor. "Machine Learning in Logistics: Machine Learning Algorithms : Data Preprocessing and Machine Learning Algorithms." Thesis, Luleå tekniska universitet, Datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-64721.

Full text
Abstract:
Data Ductus is a Swedish IT consultancy whose customer base ranges from small startups to large corporations. The company has grown steadily since the 1980s and has established offices in both Sweden and the US. With the help of machine learning, this project presents a possible solution to the errors caused by the human factor in the logistics business. A way of preprocessing data before applying it to a machine learning algorithm, as well as a couple of algorithms to use, will be presented.
APA, Harvard, Vancouver, ISO, and other styles
2

Romano, Donato. "Machine Learning algorithms for predictive diagnostics applied to automatic machines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22319/.

Full text
Abstract:
This thesis analyses the advent of Industry 4.0 within the packaging sector. In particular, it discusses the importance of predictive diagnostics, and several approaches for deriving descriptive models of the problem from data are analysed and tested. In addition, the main machine learning techniques are applied in order to classify the analysed data into their respective classes.
APA, Harvard, Vancouver, ISO, and other styles
3

Moon, Gordon Euhyun. "Parallel Algorithms for Machine Learning." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1561980674706558.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Roderus, Jens, Simon Larson, and Eric Pihl. "Hadoop scalability evaluation for machine learning algorithms on physical machines : Parallel machine learning on computing clusters." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20102.

Full text
Abstract:
The amount of available data has allowed the field of machine learning to flourish. But with growing data set sizes comes an increase in algorithm execution times. Cluster computing frameworks provide tools for distributing data and processing power over several computer nodes, allowing algorithms to run in feasible time frames even when data sets are large. Different cluster computing frameworks come with different trade-offs. In this thesis, the scalability of the execution time of machine learning algorithms running on the Hadoop cluster computing framework is investigated. A recent version of Hadoop and algorithms relevant to industrial machine learning, namely K-means, latent Dirichlet allocation and naive Bayes, are used in the experiments. This paper provides valuable information to anyone choosing between different cluster computing frameworks. The results show everything from moderate scalability to no scalability at all. These results indicate that Hadoop as a framework may have serious restrictions in how well tasks are actually parallelized. Possible scalability improvements could be achieved by modifying the machine learning library algorithms or by Hadoop parameter tuning.
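K-means, one of the algorithms benchmarked above, decomposes naturally into the map/reduce pattern that Hadoop parallelizes. A minimal single-process sketch of that decomposition (illustrative only; the function name and the 1-D data are invented, not taken from the thesis):

```python
def kmeans_mapreduce(points, centroids, iterations=10):
    """Toy MapReduce-style K-means: the map step labels each point with its
    nearest centroid; the reduce step averages the points per label."""
    for _ in range(iterations):
        # Map: emit (centroid_index, point) pairs.
        pairs = [(min(range(len(centroids)),
                      key=lambda c: (p - centroids[c]) ** 2), p)
                 for p in points]
        # Reduce: average the points assigned to each centroid.
        for c in range(len(centroids)):
            assigned = [p for idx, p in pairs if idx == c]
            if assigned:
                centroids[c] = sum(assigned) / len(assigned)
    return centroids

# 1-D example with two obvious clusters.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans_mapreduce(data, [0.0, 5.0]))
```

On a real cluster, the map step runs per data partition and the reduce step aggregates per-centroid sums across nodes; the single-process version above only mirrors the data flow.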
APA, Harvard, Vancouver, ISO, and other styles
5

Sahoo, Shibashankar. "Soft machine : A pattern language for interacting with machine learning algorithms." Thesis, Umeå universitet, Designhögskolan vid Umeå universitet, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-182467.

Full text
Abstract:
The computational nature of soft computing, e.g. machine learning and AI systems, has been hidden behind seamless interfaces for almost two decades now. This has led to loss of control, an inability to explore and adapt to needs, and privacy issues, from the individual level up to socio-technical problems on a global scale. I propose a soft machine: a set of cohesive design patterns, or 'seams', for interacting with everyday 'black-box' algorithms. Through participatory design and tangible sketching, I illustrate several interaction techniques that show how people can naturally control, explore, and adapt in-context algorithmic systems. Unlike existing design approaches, I treat machine learning as a playful 'design material', finding moments of interplay between human common sense and statistical intelligence. Further, I conceive of machine learning not as a 'technology' but as an iterative training 'process', which eventually changes the role of the user from a passive consumer of technology to an active trainer of algorithms.
APA, Harvard, Vancouver, ISO, and other styles
6

Dunkelberg, John S., Jr. "FEM Mesh Mapping to a SIMD Machine Using Genetic Algorithms." Digital WPI, 2001. https://digitalcommons.wpi.edu/etd-theses/1154.

Full text
Abstract:
The Finite Element Method is a computationally expensive method used to perform engineering analyses. By performing such computations on a parallel machine using a SIMD paradigm, these analyses' run time can be drastically reduced. However, the mapping of the FEM mesh elements to the SIMD machine processing elements is an NP-complete problem. This thesis examines the use of Genetic Algorithms as a search technique to find quality solutions to the mapping problem. A hill climbing algorithm is compared to a traditional genetic algorithm, as well as a "messy" genetic algorithm. The results and comparative advantages of these approaches are discussed.
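The genetic-algorithm search compared above can be sketched in miniature. The toy GA below (tournament selection, one-point crossover, per-bit mutation) maximises a trivial bit-count fitness rather than a real FEM-mesh mapping objective; all names and parameters are illustrative, not the thesis' implementation:

```python
import random

def genetic_search(fitness, length, pop_size=30, generations=60, seed=1):
    """Minimal genetic algorithm over bit strings: tournament selection,
    one-point crossover, and per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # tournament selection of size 2
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, length)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(length):             # per-bit mutation
                if rng.random() < 1.0 / length:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy objective standing in for mapping quality: maximise the 1-bits.
best = genetic_search(sum, 24)
print(sum(best), "of 24 bits set")
```

A hill climber flips single bits and keeps improvements; the GA's population plus crossover is what lets it escape the poor local optima the thesis compares against.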
APA, Harvard, Vancouver, ISO, and other styles
7

Williams, Cristyn Barry. "Colour constancy : human mechanisms and machine algorithms." Thesis, City University London, 1995. http://openaccess.city.ac.uk/7731/.

Full text
Abstract:
This thesis describes a quantitative experimental investigation into instantaneous colour constancy in humans. Colour constancy may be defined as the ability of the visual system to maintain a constant colour percept of a surface despite varying conditions of illumination. Instantaneous, in this context, refers to effects which happen very rapidly with the change of illumination, rather than those which may be due to long-term adaptation of the photoreceptors. The results of experiments are discussed in the context of current computational models of colour constancy. Experiments on subjects with damage to the cerebral cortex are described. These highlight the different uses of chromatic signals within the cerebral cortex and provide evidence for the location of the neural substrates which mediate instantaneous colour constancy. The introductory chapter briefly describes the visual system, in the first section, with particular reference to the processing of colour. The second section discusses the psychophysics of human colour vision and the third presents a summary of computational models of colour constancy described in the literature. Chapter two describes the dynamic colour matching technique developed for this investigation. This technique has the advantage of quantifying the level of constancy achieved whilst maintaining a constant state of adaptation. The C index is defined as a measure of constancy, with 0 representing no constancy and 1 perfect constancy. Calibration procedures for the computer monitor, and the transformations necessary to accurately simulate illuminant-reflectance combinations, are also described. Light scattered within the eye and its effect on colour constancy are discussed. Chapter three is concerned with the effects of altering the illuminant conditions on instantaneous colour constancy. The size of the illuminant shift is varied, and artificial illuminants are compared with those on the Planckian locus.
The effects of overall illuminance and the luminance contrast between target and surround are investigated. Chapter four considers the spatial structure of the visual scene. Simple uniform surrounds are compared with those which have a more complex spatiochromatic structure (Mondrians). The effects of varying the test target size and shape are investigated, and the decrease in constancy as a black border is placed between test target and surround is measured. Chapter five describes experiments on four subjects with damage to the cerebral cortex. Chromatic discrimination thresholds are investigated for three subjects with achromatopsia, as is the contribution of both sighted and blind hemifields to constancy for a subject with hemianopia. Contrary to the predictions of many of the current computational models, using unnatural illuminants has no substantial effect on the C index, nor does the size of the illuminant shift or the luminance contrast between experimental target and surround. The complexity of the surrounding field does not affect constancy. These findings are similar to those from chromatic induction experiments reported in the literature. However, the effect of a black annulus is found to have different spatial parameters than those reported from experiments on chromatic induction, suggesting that a different mechanism may be involved. The three achromatopsic subjects can be shown to exhibit instantaneous colour constancy. However, the blind hemifield of the hemianope does not contribute. This suggests that the fusiform gyrus is not the human homologue of V4 and that the primary visual cortex is necessary for instantaneous colour constancy.
APA, Harvard, Vancouver, ISO, and other styles
8

Mitchell, Brian. "Prepositional phrase attachment using machine learning algorithms." Thesis, University of Sheffield, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412729.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

PASSOS, BRUNO LEONARDO KMITA DE OLIVEIRA. "SCHEDULING ALGORITHMS APPLICATION FOR MACHINE AVAILABILITY CONSTRAINT." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24311@1.

Full text
Abstract:
Most literature in scheduling theory assumes that machines are always available during the scheduling time interval, which in practice is not true due to machine breakdowns or resource usage policies. We study a few available heuristics for the NP-hard problem of minimizing the makespan when breakdowns may happen. We also develop a new scheduling heuristic based on historical machine availability information. Our experimental study, with real data, suggests that this new heuristic is better in terms of makespan than other algorithms that do not take this information into account. We apply the results of our investigation for the asset-pricing problem of a fund portfolio in order to determine a full valuation market risk using idle technological resources of a company.
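The idea of feeding machine-availability statistics into the scheduling decision can be illustrated with a simple greedy list scheduler. This is a hypothetical sketch in the spirit of the heuristic described above, not the thesis' algorithm; the availability figures are invented:

```python
def schedule(jobs, availability):
    """Greedy list scheduling that weights each machine's load by its
    historical availability: a job of the given length is expected to take
    length / availability[m] wall-clock time on machine m. Each job goes to
    the machine with the smallest resulting expected completion time."""
    loads = [0.0] * len(availability)
    assignment = []
    for length in sorted(jobs, reverse=True):   # longest-processing-time order
        m = min(range(len(loads)),
                key=lambda i: loads[i] + length / availability[i])
        loads[m] += length / availability[m]
        assignment.append((length, m))
    return loads, assignment

# Two machines: one fully available, one up roughly half the time.
loads, plan = schedule([4, 3, 2, 2], [1.0, 0.5])
print(loads)
```

An availability-blind scheduler would treat both machines as equal and could pile work onto the unreliable one; weighting by availability is what lets historical statistics reduce the expected makespan.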
APA, Harvard, Vancouver, ISO, and other styles
10

Wen, Tong 1970. "Support Vector Machine algorithms : analysis and applications." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8404.

Full text
Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2002.
Includes bibliographical references (p. 89-97).
Support Vector Machines (SVMs) have attracted recent attention as a learning technique to attack classification problems. The goal of my thesis work is to improve computational algorithms as well as the mathematical understanding of SVMs, so that they can be easily applied to real problems. SVMs solve classification problems by learning from training examples. From the geometry, it is easy to formulate the finding of SVM classifiers as a linearly constrained Quadratic Programming (QP) problem. However, in practice its dual problem is actually computed. An important property of the dual QP problem is that its solution is sparse. The training examples that determine the SVM classifier are known as support vectors (SVs). Motivated by the geometric derivation of the primal QP problem, we investigate how the dual problem is related to the geometry of SVs. This investigation leads to a geometric interpretation of the scaling property of SVMs and an algorithm to further compress the SVs. A random model for the training examples connects the Hessian matrix of the dual QP problem to Wishart matrices. After deriving the distributions of the elements of the inverse Wishart matrix W_n^{-1}(n, nI), we give a conjecture about the summation of the elements of W_n^{-1}(n, nI). It becomes challenging to solve the dual QP problem when the training set is large. We develop a fast algorithm for solving this problem. Numerical experiments show that the MATLAB implementation of this projected Conjugate Gradient algorithm is competitive with benchmark C/C++ codes such as SVMlight and SvmFu. Furthermore, we apply SVMs to time series data.
(cont.) In this application, SVMs are used to predict the movement of the stock market. Our results show that using SVMs has the potential to outperform the solution based on the most widely used geometric Brownian motion model of stock prices.
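The sparsity of the dual QP solution described above can be seen on a toy problem. The sketch below runs plain projected gradient ascent (a much simpler stand-in for the thesis' projected Conjugate Gradient algorithm) on the bias-free, linear-kernel dual; the 1-D data and step size are invented for illustration:

```python
def svm_dual(xs, ys, C=10.0, eta=0.05, steps=4000):
    """Projected gradient ascent on the (bias-free, linear-kernel) SVM dual:
    maximise sum(a_i) - 0.5 * w**2 with w = sum(a_i * y_i * x_i), subject to
    0 <= a_i <= C. The solution is sparse: only support vectors keep a_i > 0."""
    a = [0.0] * len(xs)
    for _ in range(steps):
        w = sum(ai * yi * xi for ai, yi, xi in zip(a, ys, xs))
        for i in range(len(xs)):
            grad = 1.0 - ys[i] * xs[i] * w      # partial derivative dW/da_i
            a[i] = min(C, max(0.0, a[i] + eta * grad))
    return a

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
alphas = svm_dual(xs, ys)
print([round(ai, 3) for ai in alphas])
```

The points at x = ±1 sit on the margin and end up as support vectors; the outer points at x = ±2 are classified with slack and their dual variables are driven to exactly zero by the projection.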
by Tong Wen.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
11

Johansson, Samuel, and Karol Wojtulewicz. "Machine learning algorithms in a distributed context." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148920.

Full text
Abstract:
Interest in distributed approaches to machine learning has increased significantly in recent years due to continuously increasing data sizes for training machine learning models. In this thesis we describe three popular machine learning algorithms: decision trees, Naive Bayes and support vector machines (SVM), and present existing ways of distributing them. We also perform experiments with decision trees distributed with bagging, boosting and hard data partitioning, and evaluate them in terms of performance measures such as accuracy, F1 score and execution time. Our experiments show that the execution time of bagging and boosting increases linearly with the number of workers, and that boosting performs significantly better than bagging and hard data partitioning in terms of F1 score. The hard data partitioning algorithm works well for large datasets, where the execution time decreases as the number of workers increases without any significant loss in accuracy or F1 score, while the algorithm performs poorly on small data, with an increase in execution time and a loss in accuracy and F1 score as the number of workers increases.
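As a small illustration of the bagging scheme evaluated above, the sketch below trains decision stumps (a minimal stand-in for full decision trees) on bootstrap resamples and combines them by majority vote. The data set and parameters are invented; each resample could be fitted on a separate worker:

```python
import random

def stump(train):
    """Fit a 1-D decision stump: the threshold/sign minimising training errors."""
    best = None
    for t in sorted(set(x for x, _ in train)):
        for sign in (1, -1):
            errs = sum(1 for x, y in train
                       if (sign if x >= t else -sign) != y)
            if best is None or errs < best[0]:
                best = (errs, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def bagged_predict(train, xs, n_models=25, seed=7):
    """Bagging: train each stump on a bootstrap resample, then majority-vote."""
    rng = random.Random(seed)
    models = [stump([rng.choice(train) for _ in train])
              for _ in range(n_models)]
    return [1 if sum(m(x) for m in models) >= 0 else -1 for x in xs]

train = [(0.1, -1), (0.4, -1), (0.35, -1), (0.6, 1), (0.8, 1), (0.9, 1)]
print(bagged_predict(train, [0.2, 0.85]))
```

Because the bootstrap fits are independent, bagging parallelizes trivially across workers; boosting, by contrast, fits models sequentially, which is one reason its distributed variants behave differently in the experiments above.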
APA, Harvard, Vancouver, ISO, and other styles
12

Shen, Chenyang. "Regularized models and algorithms for machine learning." HKBU Institutional Repository, 2015. https://repository.hkbu.edu.hk/etd_oa/195.

Full text
Abstract:
Multi-label learning (ML), multi-instance multi-label learning (MIML), large network learning and random under-sampling systems are four active research topics in machine learning which have been studied intensively recently. There are still many open problems in these topics, which attract worldwide attention from researchers. This thesis mainly focuses on several novel methods designed for these research tasks respectively. The main difference between ML learning and the traditional classification task is that in ML learning, one object can be characterized by several different labels (or classes). One important observation is that the labels received by similar objects in ML data are usually highly correlated with each other. In order to explore this correlation between the labels of objects, which might be a key issue in ML learning, we require the resulting label indicator to be low rank. In the proposed model, the nuclear norm, a well-known convex relaxation of the intractable matrix rank, is applied to the label indicator in order to exploit the underlying correlation in the label domain. Motivated by the idea of spectral clustering, we also incorporate information from the feature domain by constructing a graph among objects based on their features. Then, with partial label information available, we integrate these together into a convex low-rank-based model designed for ML learning. The proposed model can be solved efficiently by the alternating direction method of multipliers (ADMM). We test the performance on several benchmark ML data sets and make comparisons with state-of-the-art algorithms. The classification results demonstrate the efficiency and effectiveness of the proposed low-rank-based methods.
One step further, we consider the MIML learning problem, which is usually more complicated than ML learning: besides the possibility of having multiple labels, each object can be described by multiple instances simultaneously, which may significantly increase the size of the data. To handle the MIML learning problem we first propose and develop a novel sparsity-based MIML learning algorithm. Our idea is to formulate and construct a transductive objective function for the label indicator to be learned by using the method of random walk with restart, which exploits the relationships among instances and labels of objects and computes the affinities among the objects. Sparsity can then be introduced into the label indicator of the objective function such that relevant and irrelevant objects with respect to a given class can be distinguished. The resulting sparsity-based MIML model can be given as a constrained convex optimization problem, and it can be solved very efficiently by the augmented Lagrangian method (ALM). Experimental results on benchmark data have shown that the proposed sparse-MIML algorithm is computationally efficient and effective in label prediction for MIML data, and that its performance is better than the other MIML learning algorithms tested. Moreover, one big concern of an MIML learning algorithm is computational efficiency, especially when solving classification problems for large data sets. Most of the existing methods for solving MIML problems in the literature may take a long computational time and have a huge storage cost for large MIML data sets. In this thesis, our main aim is to propose and develop an efficient Markov-chain-based learning algorithm for MIML problems. Our idea is to perform label classification among objects and feature identification iteratively through two Markov chains constructed by using objects and features respectively.
The classification of objects can be obtained by using label propagation via training data in the iterative method. Because it is not necessary to compute and store a huge affinity matrix among objects/instances, both the storage and the computational time can be reduced significantly. For instance, when we handle an MIML image data set of 10000 objects and 250000 instances, the proposed algorithm takes about 71 seconds. Experimental results on some benchmark data sets are also reported to illustrate the effectiveness of the proposed method in one-error, ranking loss, coverage and average precision, and show that it is competitive with the other methods. In addition, we consider module identification in large biological networks. Nowadays, the interactions among different genes, proteins and other small molecules are becoming more and more significant and have been studied intensively. One general way to help people understand these interactions is to analyze networks constructed from genes/proteins. In particular, module structure, as a common property of most biological networks, has drawn much attention from researchers in different fields. However, biological networks might be corrupted by noise in the data, which often leads to the misidentification of module structure. Besides, some edges in the network might be removed (or some nodes might be misconnected) when improper parameters are selected, which may also significantly affect the modules identified. In conclusion, module identification results are sensitive to noise as well as to the parameter selection of the network. In this thesis, we consider employing multiple networks for consistent module detection in order to reduce the effect of noise and parameter settings. Instead of studying different networks separately, our idea is to combine multiple networks together by building them into tensor-structured data.
Then, given any node as prior label information, tensor-based Markov chains are constructed iteratively for identification of the modules shared by the multiple networks. In addition, the proposed tensor-based Markov chain algorithm is capable of simultaneously evaluating the contribution from each network, which is useful for measuring the consistency of modules across the multiple networks. In the experiments, we test our method on two groups of gene co-expression networks from humans, and we validate the biological meaning of the modules identified by the proposed method. Finally, we introduce random under-sampling techniques with application to X-ray computed tomography (CT). Under-sampling techniques are powerful tools for reducing the scale of a problem, especially for large data analysis. However, information loss seems to be unavoidable, which inspires different under-sampling strategies for preserving more useful information. Here we focus on under-sampling for the real-world CT reconstruction problem. The main motivation is to reduce the total radiation dose delivered to the patient, which has raised significant clinical concern for CT imaging. We compare two popular regular CT under-sampling strategies with ray random under-sampling. The results support the conclusion that random under-sampling always outperforms regular under-sampling, especially for high down-sampling ratios. Moreover, based on the random ray under-sampling strategy, we propose a novel scatter removal method which further improves the performance of ray random under-sampling in CT reconstruction.
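The random walk with restart underlying the sparsity-based MIML algorithm above can be sketched on a small graph. The power iteration below is a generic illustration of the technique (r = (1-c) P^T r + c e), not the thesis' implementation; the example graph is invented:

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Power iteration for random walk with restart: at each step the walker
    follows a uniformly random outgoing edge with probability 1 - restart,
    or jumps back to the start node with probability restart. The stationary
    vector r ranks nodes by affinity to the start node."""
    n = len(adj)
    out = [sum(row) for row in adj]
    r = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(iters):
        nxt = [restart if i == start else 0.0 for i in range(n)]
        for u in range(n):
            if out[u]:
                share = (1 - restart) * r[u] / out[u]
                for v in range(n):
                    if adj[u][v]:
                        nxt[v] += share
        r = nxt
    return r

# Two triangles joined by one bridge edge; affinity decays across the bridge.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
scores = random_walk_with_restart(adj, start=0)
print([round(s, 3) for s in scores])
```

Nodes in the start node's triangle receive much higher affinity than the far triangle, which is exactly the property the thesis exploits to separate relevant from irrelevant objects for a given class.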
APA, Harvard, Vancouver, ISO, and other styles
13

Choudhury, A. "Fast machine learning algorithms for large data." Thesis, University of Southampton, 2002. https://eprints.soton.ac.uk/45907/.

Full text
Abstract:
Traditional machine learning has been largely concerned with developing techniques for small or modestly sized datasets. These techniques fail to scale up well for large data problems, a situation becoming increasingly common in today's world. This thesis is concerned with the problem of learning with large data. In particular, it considers solving the three basic tasks in machine learning, viz., classification, regression and density approximation. We develop fast, memory-efficient algorithms for kernel machine training and deployment. These include efficient preprocessing steps for speeding up existing training algorithms, as well as a general-purpose framework for machine learning using kernel methods. Emphasis is placed on the development of computationally efficient greedy schemes which leverage state-of-the-art techniques from the field of numerical linear algebra. The algorithms presented here underline a basic premise: that it is possible to efficiently train a kernel machine on large data which generalizes well and yet has a sparse expansion, leading to improved runtime performance. Empirical evidence is provided in support of this premise throughout the thesis.
APA, Harvard, Vancouver, ISO, and other styles
14

Westerlund, Fredrik. "CREDIT CARD FRAUD DETECTION (Machine learning algorithms)." Thesis, Umeå universitet, Statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136031.

Full text
Abstract:
Credit card fraud is a field in which perpetrators perform illegal actions that may affect other individuals or companies negatively. For instance, a criminal can steal credit card information from an account holder and then conduct fraudulent transactions. These activities are a potential contributory factor to how illegal organizations such as terrorists and drug traffickers support themselves financially. Within the machine learning area, there are several methods that can detect credit card fraud transactions: supervised and unsupervised learning algorithms. This essay investigates the supervised approach, where two algorithms (Hellinger Distance Decision Tree (HDDT) and Random Forest) are evaluated on a real-life dataset of 284,807 transactions. Under those circumstances, the main purpose is to develop a well-functioning model with a reasonable capacity to categorize transactions as fraudulent or legitimate. As the data is heavily unbalanced, reducing the false-positive rate is also an important part of conducting research in the chosen area. In conclusion, the evaluated algorithms present fairly similar outcomes, where both models have the capability to distinguish the classes from each other. However, the Random Forest approach performs better than HDDT in all measures of interest.
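On data this unbalanced, raw accuracy is nearly uninformative, which is why measures tied to the false-positive rate matter in the evaluation above. A small illustration of computing the relevant measures from confusion-matrix counts (the detector's counts here are hypothetical; only the 284,807-transaction total echoes the dataset):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts; on
    heavily unbalanced data the first three are far more informative than
    raw accuracy, which is dominated by the majority class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical detector on a fraud-like split of 284,807 transactions:
# it catches 400 frauds, misses 92, and raises 100 false alarms.
p, r, f1, acc = confusion_metrics(tp=400, fp=100, fn=92, tn=284215)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.5f}")
```

Note how accuracy stays above 99.9% regardless of how the detector trades precision against recall; a model that flagged nothing at all would score almost as well on accuracy alone.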
APA, Harvard, Vancouver, ISO, and other styles
15

Li, Yunming. "Machine vision algorithms for mining equipment automation." Thesis, Queensland University of Technology, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
16

Liu, Ming. "Design and Evaluation of Algorithms for Online Machine Scheduling Problems." PhD thesis, Ecole Centrale Paris, 2009. http://tel.archives-ouvertes.fr/tel-00453316.

Full text
Abstract:
In this thesis, we propose and evaluate algorithms for solving online scheduling problems. For decades, scheduling studies have considered deterministic models in which all the information needed to define the problem is assumed to be known in advance. This assumption is generally unrealistic, which has motivated the study of online scheduling. In an online scheduling problem, an algorithm must make decisions without knowledge of the future. Competitive analysis is the method generally used to evaluate the performance of such algorithms. In this analysis, the performance of an online algorithm is measured by its competitive ratio: the worst-case ratio between the value of the solution it obtains and that of an optimal offline solution. We mainly consider two online paradigms: one where the jobs are presented in a list, and one where the jobs arrive over time. Based on these two paradigms, we consider several models: a single machine, two identical parallel machines, two uniform parallel machines, batch machines, and open shops. For each problem, we prove a lower bound on the competitive ratio and propose online algorithms. We then evaluate the performance of these algorithms using competitive analysis. For some problems, we show that the proposed algorithms are optimal in the sense that their competitive ratio matches the lower bound.
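A classic example of the "jobs presented in a list" paradigm analysed above is Graham's list scheduling on m identical parallel machines, which is (2 - 1/m)-competitive for makespan. A minimal sketch (the instance is illustrative; this is the textbook algorithm, not one of the thesis' new ones):

```python
def list_scheduling(jobs, m):
    """Graham's online list scheduling: each job, on arrival, goes to the
    currently least-loaded of m identical machines. The resulting makespan
    is at most (2 - 1/m) times the offline optimum."""
    loads = [0.0] * m
    for job in jobs:
        loads[loads.index(min(loads))] += job
    return max(loads)

# On 2 machines: two unit jobs followed by one long job force the bound.
print(list_scheduling([1, 1, 2], 2), "vs offline optimum 2")
```

Here the online makespan is 3 while the offline optimum is 2 (pair the unit jobs on one machine), matching the ratio 2 - 1/2 = 3/2; this style of worst-case instance is how lower bounds on competitive ratios are typically proved.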
APA, Harvard, Vancouver, ISO, and other styles
17

Thompson, Simon Giles. "Distributed boosting algorithms." Thesis, University of Portsmouth, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Wang, Gang. "Solution path algorithms : an efficient model selection approach /." View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?CSED%202007%20WANGG.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Li, Xiao. "Regularized adaptation : theory, algorithms, and applications /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/5928.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Kalyanasundaram, Subrahmanyam. "Turing machine algorithms and studies in quasi-randomness." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42808.

Full text
Abstract:
Randomness is an invaluable resource in theoretical computer science. However, pure random bits are hard to obtain. Quasi-randomness is a tool that has been widely used in eliminating or reducing the randomness of randomized algorithms. In this thesis, we study some aspects of quasi-randomness in graphs. Specifically, we provide an algorithm and a lower bound for two different kinds of regularity lemmas. Our algorithm for FK-regularity is derived using a spectral characterization of quasi-randomness. We use a similar spectral connection to answer an open question about quasi-random tournaments. We then provide a "Wowzer"-type lower bound (for the number of parts required) for the strong regularity lemma. Finally, we study the derandomization of complexity classes using Turing machine simulations. 1. Connections between quasi-randomness and graph spectra. Quasi-random (or pseudo-random) objects are deterministic objects that behave almost like truly random objects. These objects have been widely studied in various settings (graphs, hypergraphs, directed graphs, set systems, etc.). In many cases, quasi-randomness is very closely related to the spectral properties of the combinatorial object under study. In this thesis, we discover spectral characterizations of quasi-randomness in two different cases to solve open problems. A Deterministic Algorithm for Frieze-Kannan Regularity: The Frieze-Kannan regularity lemma asserts that any given graph of large enough size can be partitioned into a number of parts such that, across parts, the graph is quasi-random. It was unknown whether there was a deterministic algorithm that could produce a partition satisfying the conditions of the Frieze-Kannan regularity lemma in deterministic sub-cubic time. In this thesis, we answer this question by designing an O(n^w) time algorithm for constructing such a partition, where w is the exponent of fast matrix multiplication.
Even Cycles and Quasi-Random Tournaments: Chung and Graham had provided several equivalent characterizations of quasi-randomness in tournaments. One of them concerns the number of "even" cycles, where even is defined in the following sense. A cycle is said to be even if, when walking along it, an even number of edges point in the wrong direction. Chung and Graham showed that if close to half of the 4-cycles in a tournament T are even, then T is quasi-random. They asked if the same statement is true if instead of 4-cycles, we consider k-cycles, for an even integer k. We resolve this open question by showing that for every fixed even integer k ≥ 4, if close to half of the k-cycles in a tournament T are even, then T must be quasi-random. 2. A Wowzer type lower bound for the strong regularity lemma. The regularity lemma of Szemeredi asserts that one can partition every graph into a bounded number of quasi-random bipartite graphs. Alon, Fischer, Krivelevich and Szegedy obtained a variant of the regularity lemma that allows one to have arbitrary control on this measure of quasi-randomness. However, their proof was only guaranteed to produce a partition where the number of parts is given by the Wowzer function, which is the iterated version of the Tower function. We show here that a bound of this type is unavoidable by constructing a graph H with the property that even if one wants a very mild control on the quasi-randomness of a regular partition, then any such partition of H must have a number of parts given by a Wowzer-type function. 3. How fast can we deterministically simulate nondeterminism? We study an approach towards derandomizing complexity classes using Turing machine simulations. We look at the problem of deterministically counting the exact number of accepting computation paths of a given nondeterministic Turing machine. We provide a deterministic algorithm which runs in time roughly O(sqrt(S)), where S is the size of the configuration graph.
The best of the previously known methods required time linear in S. Our result implies a simulation of probabilistic time classes like PP, BPP and BQP in the same running time. This is an improvement over the currently best known simulation by van Melkebeek and Santhanam.
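The path-counting task in part 3 can be pictured on an explicit configuration graph. A minimal Python sketch (assuming the graph is handed over as an edge list, which a real Turing machine simulation could not afford to store) counts accepting computation paths by memoised traversal, i.e. the time-linear-in-S baseline the thesis improves on:

```python
from collections import defaultdict

def count_accepting_paths(edges, start, accepting):
    """Count paths from `start` to any accepting configuration in a
    configuration DAG by memoised depth-first traversal (linear in the
    graph size -- the baseline the thesis improves on)."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    memo = {}
    def paths(u):
        if u in accepting:
            return 1
        if u in memo:
            return memo[u]
        memo[u] = sum(paths(v) for v in graph[u])
        return memo[u]
    return paths(start)

# A toy nondeterministic branching structure: two of three leaves accept.
edges = [("s", "a"), ("s", "b"), ("a", "acc1"), ("a", "rej"), ("b", "acc2")]
print(count_accepting_paths(edges, "s", {"acc1", "acc2"}))  # 2
```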
APA, Harvard, Vancouver, ISO, and other styles
21

Janagam, Anirudh, and Saddam Hossen. "Analysis of Network Intrusion Detection System with Machine Learning Algorithms (Deep Reinforcement Learning Algorithm)." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Torcolacci, Veronica. "Implementation of Machine Learning Algorithms on Hardware Accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
Nowadays, cutting-edge technology, innovation and efficiency are the cornerstones on which industries are based. Therefore, prognosis and health management have started to play a key role in the prevention of crucial faults and failures. Recognizing malfunctions in a system in advance is fundamental in both economic and safety terms. This obviously requires a lot of data – mainly information from sensors or machine control – to be processed, and it is in this scenario that Machine Learning comes to the aid. This thesis aims to apply these methodologies to prognosis in automatic machines and has been carried out at LIAM lab (Laboratorio Industriale Automazione Macchine per il packaging), an industrial research laboratory born from the experience of leading companies in the sector. Machine learning techniques such as neural networks will be exploited to solve the classification problems that derive from the system under examination. Such algorithms will be combined with system identification techniques that perform an estimate of the plant parameters and a feature reduction by compressing the data. This makes it easier for the neural networks to distinguish the different operating conditions and perform a good prognosis activity. In practice, the algorithms will be developed in Python and then implemented on two hardware accelerators, whose performance will be evaluated.
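The pipeline this abstract describes (compress the features, then classify the operating condition with a neural network) can be sketched with scikit-learn; the synthetic data and network size below are placeholders, not the thesis's actual sensor data or models:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Synthetic "sensor" data standing in for machine-state measurements.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compress the features first (feature reduction), then classify the
# operating condition with a small neural network.
model = make_pipeline(PCA(n_components=6),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```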
APA, Harvard, Vancouver, ISO, and other styles
23

Lim, Choon Kee. "Hypercube machine implementation of low-level vision algorithms." Ohio : Ohio University, 1989. http://www.ohiolink.edu/etd/view.cgi?ohiou1182864143.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Ouyang, Hua. "Optimal stochastic and distributed algorithms for machine learning." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49091.

Full text
Abstract:
Stochastic and data-distributed optimization algorithms have received much attention from the machine learning community due to the tremendous demand from large-scale learning and big-data related optimization. Many stochastic and deterministic learning algorithms have been proposed recently under various application scenarios. Nevertheless, many of these algorithms are based on heuristics and their optimality in terms of the generalization error is not sufficiently justified. In this thesis, I explain the concept of an optimal learning algorithm, and show that given a time budget and proper hypothesis space, only those achieving the lower bounds of the estimation error and the optimization error are optimal. Guided by this concept, we investigated the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We proposed a novel algorithm named Accelerated Nonsmooth Stochastic Gradient Descent, which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algorithm that can achieve the optimal O(1/t) rate for minimizing nonsmooth loss functions. The fast rates are confirmed by empirical comparisons with state-of-the-art algorithms including averaged SGD. The Alternating Direction Method of Multipliers (ADMM) is another flexible method to exploit function structures. In the second part we proposed a stochastic ADMM that can be applied to a general class of convex and nonsmooth functions, beyond the smooth and separable least squares loss used in lasso. We also demonstrate the rates of convergence for our algorithm under various structural assumptions on the stochastic function: O(1/sqrt(t)) for convex functions and O(log t/t) for strongly convex functions. A novel application named Graph-Guided SVM is proposed to demonstrate the usefulness of our algorithm.
We also extend the scalability of stochastic algorithms to nonlinear kernel machines, where the problem is formulated as a constrained dual quadratic optimization. The simplex constraint can be handled by the classic Frank-Wolfe method. The proposed stochastic Frank-Wolfe methods achieve comparable or even better accuracies than state-of-the-art batch and online kernel SVM solvers, and are significantly faster. The last part investigates the problem of data-distributed learning. We formulate it as a consensus-constrained optimization problem and solve it with ADMM. It turns out that the underlying communication topology is a key factor in achieving a balance between a fast learning rate and computation resource consumption. We analyze the linear convergence behavior of consensus ADMM so as to characterize the interplay between the communication topology and the penalty parameters used in ADMM. We observe that given optimal parameters, the complete bipartite and the master-slave graphs exhibit the fastest convergence, followed by bi-regular graphs.
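As context for the nonsmooth stochastic setting discussed above, here is a plain averaged stochastic subgradient method on the regularised hinge loss, a baseline of the kind the thesis's accelerated algorithm improves on (an illustrative sketch on a toy problem, not the proposed method itself):

```python
import numpy as np

def sgd_hinge(X, y, steps=2000, lam=0.01, seed=0):
    """Averaged stochastic subgradient descent on the regularised hinge
    loss -- the kind of nonsmooth objective the thesis targets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)
        margin = y[i] * X[i] @ w
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w -= grad / (lam * t)         # standard 1/(lambda*t) step size
        w_avg += (w - w_avg) / t      # running average of the iterates
    return w_avg

# Linearly separable toy problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = sgd_hinge(X, y)
train_acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {train_acc:.2f}")
```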
APA, Harvard, Vancouver, ISO, and other styles
25

Odetayo, Michael Omoniyi. "On genetic algorithms in machine learning and optimisation." Thesis, University of Strathclyde, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239866.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Sengupta, Sudipta 1974. "Algorithms and approximation schemes for machine scheduling problems." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80240.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Brunning, James Jonathan Jesse. "Alignment models and algorithms for statistical machine translation." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608922.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Al-Abri, Eman S. "Modelling atmospheric ozone concentration using machine learning algorithms." Thesis, Loughborough University, 2016. https://dspace.lboro.ac.uk/2134/25091.

Full text
Abstract:
Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings of pollution levels rising to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone, nitrogen oxides etc. Despite this observation, the research and technology area of machine learning has recently advanced significantly with the introduction of ensemble learning techniques, convolutional and deep neural networks etc. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms with optimised algorithmic settings and the appropriate choice of base layer algorithms to create effective and efficient models for the prediction and forecasting of, specifically, ground level ozone (O3). Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (homogeneous and heterogeneous) algorithms was investigated and compared with all popular and widely used single base learning algorithms. The results showed the impressive prediction performance improvement obtainable by using meta learning (Bagging, Stacking, and Voting) algorithms. The performances of the three investigated meta learning algorithms were similar in nature, giving an average correlation coefficient of 0.91 in prediction accuracy.
Thus, as a second contribution, the effective use of feature selection and parameter-based optimisation was carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machine, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis includes the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The results reported surpass the accuracy levels reported for forecasting ozone concentration variations based on widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to using standard learning algorithms such as Artificial Neural Networks and Support Vector Machines often available within popular commercial software packages.
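The meta-learning idea evaluated in this thesis (a heterogeneous base layer combined by a meta-learner) can be sketched with scikit-learn's stacking ensemble; the regression data below is synthetic, standing in for the meteorological predictors of ozone concentration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for predictors of ground-level ozone.
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Heterogeneous base layer combined by a meta-learner (stacking).
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge())
stack.fit(X_tr, y_tr)

# Evaluate with the same metric the abstract reports: correlation coefficient.
corr = np.corrcoef(stack.predict(X_te), y_te)[0, 1]
print(f"correlation coefficient: {corr:.2f}")
```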
APA, Harvard, Vancouver, ISO, and other styles
29

Lim, Choon Kee. "Hypercube machine implementation of low-level vision algorithms." Ohio University / OhioLINK, 1988. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1182864143.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Dabert, Geoffrey. "Application of Machine Learning techniques to Optimization algorithms." Thesis, KTH, Optimeringslära och systemteori, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207471.

Full text
Abstract:
Optimization problems were immune to any attempt at combination with machine learning until a decade ago, but this is now an active research field. This thesis has studied the potential implementation of a machine learning heuristic to improve the resolution of optimization scheduling problems based on a Constraint Programming solver. Some scheduling problems, known as NP-hard problems, suffer from large computational cost (a large number of jobs to schedule) and consequent human effort (well-suited heuristics need to be derived). Moreover, industrial scheduling problems obviously evolve over time, but many features and the basic structure remain the same. Hence they have potential for the implementation of a supervised-learning-based heuristic. The first part of the study was to model a given benchmark of instances and implement some well-known heuristics (such as earliest due date, combined with the largest duration) in order to solve the benchmark. Based on the non-optimality of the returned solutions, primary instances were chosen to implement our method. The second part represents the procedure which was set up to design a supervised-learning-based heuristic. An instance generator was first built to map the potential industrial evolutions of the instances. It returned secondary instances representing the learning database. Then a CP-well-suited node extraction scheme was set up to collect relevant information from the resolution of the search tree. It collects data from nodes of the search tree given a proper criterion. These nodes are next projected onto a constant-dimensional space which describes the system, the underlying subtree and the impact of the assignments. Upon these features and designed target values, statistical models are implemented. Linear and gradient boosting regressions have been implemented, calibrated and tuned on the data. The last step was to integrate the supervised-learning model into a heuristic framework.
This was done via a soft propagation module that tries the instantiation of all the children of the considered node and applies the given model to them. The selection decision rule was based upon a reconstructed score. The third part was to test the procedure implemented. New secondary instances were generated and the supervised-learning-based heuristic was tested against the earliest-due-date one. The procedure was tested upon two different instances. The integrated heuristic returned positive results for both instances. For the first one (10 jobs to schedule), a gain in the first solution found (resp. the number of backtracks) of 18% (resp. 13%) was realized. For the second instance (90 jobs to schedule), a gain in the first solution found of at least 16% was realized. The results validate the procedure implemented and the methodology used.
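One of the baseline dispatching rules the abstract compares against, earliest due date, is easy to state concretely. A generic single-machine sketch (not the thesis's Constraint Programming model; the job data is made up):

```python
def edd_schedule(jobs):
    """Earliest-due-date dispatching rule. `jobs` maps a job name to a
    (duration, due_date) pair; returns the processing order and the
    maximum lateness that order incurs on a single machine."""
    order = sorted(jobs, key=lambda j: jobs[j][1])  # sort by due date
    t, max_lateness = 0, float("-inf")
    for j in order:
        duration, due = jobs[j]
        t += duration                               # job finishes at time t
        max_lateness = max(max_lateness, t - due)
    return order, max_lateness

# Three toy jobs: name -> (duration, due date).
jobs = {"A": (3, 4), "B": (2, 9), "C": (4, 6)}
order, lateness = edd_schedule(jobs)
print(order, lateness)  # ['A', 'C', 'B'] 1
```

On a single machine, sorting by due date (Jackson's rule) minimises the maximum lateness, which is why it is a natural baseline heuristic.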
APA, Harvard, Vancouver, ISO, and other styles
31

Awe, Olusegun P. "Machine learning algorithms for cognitive radio wireless networks." Thesis, Loughborough University, 2015. https://dspace.lboro.ac.uk/2134/19609.

Full text
Abstract:
In this thesis new methods are presented for achieving spectrum sensing in cognitive radio wireless networks. In particular, supervised, semi-supervised and unsupervised machine learning based spectrum sensing algorithms are developed and various techniques to improve their performance are described. The spectrum sensing problem in multi-antenna cognitive radio networks is considered and a novel eigenvalue-based feature is proposed which has the capability to enhance the performance of support vector machine algorithms for signal classification. Furthermore, spectrum sensing under the multiple primary users condition is studied and a new re-formulation of the sensing task as a multiple-class signal detection problem, where each class embeds one or more states, is presented. Moreover, an error-correcting output codes based multi-class support vector machine algorithm is proposed and investigated for solving the multiple-class signal detection problem using two different coding strategies. In addition, the performance of parametric classifiers for spectrum sensing under a slow fading channel is studied. To address the attendant performance degradation problem, a Kalman filter based channel estimation technique is proposed for tracking the temporally correlated slow fading channel and updating the decision boundary of the classifiers in real time. Simulation studies are included to assess the performance of the proposed schemes. Finally, techniques for improving the quality of the learning features and improving the detection accuracy of sensing algorithms are studied and a novel beamforming based pre-processing technique is presented for feature realization in multi-antenna cognitive radio systems. Furthermore, using the beamformer-derived features, new algorithms are developed for multiple hypothesis testing facilitating joint spatio-temporal spectrum sensing.
The key performance metrics of the classifiers are evaluated to demonstrate the superiority of the proposed methods in comparison with previously proposed alternatives.
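Eigenvalue-based features of the kind this abstract builds on come from the sample covariance across the receive antennas. A numpy sketch of the classic max/min eigenvalue ratio (a generic illustration of the idea, not the specific feature proposed in the thesis):

```python
import numpy as np

def eigen_feature(samples):
    """Max/min eigenvalue ratio of the sample covariance across antennas,
    a classic quantity behind eigenvalue-based sensing features."""
    cov = samples @ samples.conj().T / samples.shape[1]
    eig = np.linalg.eigvalsh(cov)   # ascending order
    return eig[-1] / eig[0]

rng = np.random.default_rng(0)
antennas, n = 4, 2000
noise = rng.normal(size=(antennas, n))                    # H0: noise only
signal = np.outer(rng.normal(size=antennas), rng.normal(size=n))
occupied = noise + 3 * signal                             # H1: signal present

# A rank-one primary-user signal inflates the ratio well above the
# noise-only case, so the feature separates the two hypotheses.
print(eigen_feature(noise), eigen_feature(occupied))
```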
APA, Harvard, Vancouver, ISO, and other styles
32

Granström, Daria, and Johan Abrahamsson. "Loan Default Prediction using Supervised Machine Learning Algorithms." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252312.

Full text
Abstract:
It is essential for a bank to estimate the credit risk it carries and the magnitude of exposure it has in case of non-performing customers. Estimation of this kind of risk has been done by statistical methods for decades and, with respect to recent developments in the field of machine learning, there has been an interest in investigating whether machine learning techniques can perform better quantification of the risk. The aim of this thesis is to examine which method from a chosen set of machine learning techniques exhibits the best performance in default prediction with regard to chosen model evaluation parameters. The investigated techniques were Logistic Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificial Neural Network and Support Vector Machine. An oversampling technique called SMOTE was implemented in order to treat the imbalance between classes for the response variable. The results showed that XGBoost without implementation of SMOTE obtained the best result with respect to the chosen model evaluation metric.
It is necessary for a bank to have a good estimate of how much risk it carries with respect to customer defaults. Various statistical methods have been used to estimate this risk, but with the current development in the field of machine learning, there has been an interest in exploring whether machine learning methods can improve the quality of the risk estimation. The purpose of this thesis is to investigate which of the implemented machine learning methods performs best for modelling default prediction with respect to the chosen model validation parameters. The implemented methods were Logistic Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificial Neural Networks and Support Vector Machine. An oversampling technique, SMOTE, was used to treat the imbalance in the class distribution of the response variable. The result was as follows: XGBoost without implementation of SMOTE showed the best result with respect to the chosen metric.
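The oversampling step evaluated in this thesis can be illustrated with a hand-rolled version of SMOTE's core interpolation, synthesising new minority-class (defaulting-customer) points between nearest neighbours. This is a sketch of the general technique on made-up data, not the implementation used in the thesis:

```python
import numpy as np

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: synthesise minority-class points by linear
    interpolation between a sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        dists = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        x_nb = minority[rng.choice(neighbours)]
        new_points.append(x + rng.random() * (x_nb - x))
    return np.array(new_points)

rng = np.random.default_rng(1)
defaults = rng.normal(loc=2.0, size=(10, 2))      # rare "default" class
synthetic = smote(defaults, n_new=20)
print(synthetic.shape)  # (20, 2)
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region of the original observations.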
APA, Harvard, Vancouver, ISO, and other styles
33

Lubbe, H. G., and B. J. Kotze. "Machine learning through self generating programs." Interim : Interdisciplinary Journal, Vol 6, Issue 2: Central University of Technology Free State Bloemfontein, 2007. http://hdl.handle.net/11462/407.

Full text
Abstract:
Published Article
People have tried different ways to make machines intelligent. One option is to use a simulated neural net as a platform for Genetic Algorithms. Neural nets are a combination of neurons in a certain pattern. Neurons in a neural net system are a simulation of neurons in an organism's brain. Genetic Algorithms represent an emulation of evolution in nature. The question arose as to why one should write a program to simulate neurons if a program can execute the functions a combination of neurons would generate. For this reason the virtual robot indicated in Figure 1 was made "intelligent" by developing a process where the robot creates a program for itself. Although Genetic Algorithms might have been used in the past to generate a program, a new method called Single-Chromosome-Evolution-Algorithms (SCEA) was introduced and compared to the operation of Genetic Algorithms. Instructions in the program were changed by using either Genetic Algorithms or alternatively SCEA, where only one simulation per generation needed to be tested for the fitness of the system.
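The single-chromosome scheme described above can be sketched as a (1+1)-style loop: one candidate per generation, one mutated gene, keep the mutant only if fitness does not drop. The bit-string encoding and the OneMax-style toy fitness below are assumptions for illustration, not the article's robot-instruction encoding:

```python
import random

def evolve_program(fitness, length=20, generations=1000, seed=0):
    """Single-chromosome evolution in the spirit of the article's SCEA:
    one candidate per generation, mutate one gene, and keep the mutant
    only if its fitness is at least as good."""
    rng = random.Random(seed)
    chromosome = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(generations):
        mutant = list(chromosome)
        mutant[rng.randrange(length)] ^= 1       # flip one instruction bit
        if fitness(mutant) >= fitness(chromosome):
            chromosome = mutant
    return chromosome

# Toy fitness: match a target bit pattern.
target = [1] * 20
best = evolve_program(lambda c: sum(a == b for a, b in zip(c, target)))
print(sum(best))  # number of correct bits after evolution
```

The contrast with a Genetic Algorithm is that no population, crossover, or selection pool is needed, so each generation costs exactly one fitness evaluation of a new candidate.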
APA, Harvard, Vancouver, ISO, and other styles
34

Tu, Zhuozhuo. "Towards Robust and Reliable Machine Learning: Theory and Algorithms." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28832.

Full text
Abstract:
Machine learning models, especially deep neural networks, have achieved impressive performance across a variety of domains including image classification, natural language processing, and speech recognition. However, recent examples have shown that these models are susceptible to test-time shift such as adversarial attacks or distributional shift. Additionally, machine learning algorithms require having access to personal data, and the learned model can be discriminatory with respect to minority social groups, raising privacy and fairness risks. To tackle these issues, in this thesis, we study several topics on robustness and reliability in machine learning, with a focus on generalization, adversarial examples, distributional robustness and fairness (privacy). We start with the generalization problem in recurrent neural networks. We propose new generalization bounds for recurrent neural networks based on matrix 1-norm and Fisher-Rao norm. Our bound has no explicit dependency on the size of networks and can potentially explain the effect of noise training on generalization of recurrent neural networks as demonstrated by our empirical results. We then move forward to dataset shift robustness, which involves adversarial examples and distributional shift. For adversarial examples, we theoretically analyze the adversarially robust generalization properties of machine learning models. For distributional shift, we focus on learning a robust model and propose new algorithms to solve Wasserstein distributionally robust optimization problem which apply to arbitrary level of robustness and general loss functions. Lastly, to ensure both privacy and fairness, we present a fairness-aware federated learning framework and provide an efficient and provably convergent algorithm to solve it. Experimental results show that our method can lead to significant benefits in practice in terms of both accuracy and fairness.
APA, Harvard, Vancouver, ISO, and other styles
35

Granek, Justin. "Application of machine learning algorithms to mineral prospectivity mapping." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/59988.

Full text
Abstract:
In the modern era of diminishing returns on fixed exploration budgets, challenging targets, and ever-increasing numbers of multi-parameter datasets, proper management and integration of available data is a crucial component of any mineral exploration program. Machine learning algorithms have successfully been used for years by the technology sector to accomplish just this task on their databases, and recent developments aim at transferring these successes to the field of mineral exploration. Framing the exploration task as a supervised learning problem, the geological, geochemical and geophysical information can be used as training data, and known mineral occurrences can be used as training labels. The goal is to parameterize the complex relationships between the data and the labels such that mineral potential can be estimated in under-explored regions using available geoscience data. Numerous models and algorithms have been attempted for mineral prospectivity mapping in the past, and in this thesis we propose two new approaches. The first is a modified support vector machine algorithm which incorporates uncertainties on both the data and the labels. Due to the nature of geoscience data and the characteristics of the mineral prospectivity mapping problem, uncertainties are known to be very important. The algorithm is demonstrated on a synthetic dataset to highlight this importance, and then used to generate a prospectivity map for copper-gold porphyry targets in central British Columbia using the QUEST dataset as a case study. The second approach, convolutional neural networks, was selected due to its inherent sensitivity to spatial patterns. Though neural networks have been used for mineral prospectivity mapping, convolutional neural nets have yet to be applied to the problem.
Having gained extreme popularity in the computer vision field for tasks involving image segmentation, identification and anomaly detection, the algorithm is ideally suited to handle the mineral prospectivity mapping problem. A CNN code is developed in Julia, then tested on a synthetic example to illustrate its effectiveness at identifying coincident structures in a multi-modal dataset. Finally, a subset of the QUEST dataset is used to generate a prospectivity map using CNNs.
Science, Faculty of
Earth, Ocean and Atmospheric Sciences, Department of
Graduate
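The spatial sensitivity that motivated the CNN approach rests on 2-D cross-correlation. A numpy sketch on toy grids (not the QUEST data) shows a filter firing where anomalies in two data channels coincide:

```python
import numpy as np

def correlate2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Two toy "geophysical" grids with a coincident 2x2 anomaly at (4, 4).
grid_a = np.zeros((10, 10)); grid_a[4:6, 4:6] = 1.0
grid_b = np.zeros((10, 10)); grid_b[4:6, 4:6] = 1.0
stacked = grid_a * grid_b                  # coincidence of the two channels

response = correlate2d(stacked, np.ones((2, 2)))
peak = np.unravel_index(response.argmax(), response.shape)
print(peak)  # (4, 4) -- the filter fires where the anomalies coincide
```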
APA, Harvard, Vancouver, ISO, and other styles
36

Chen, Tracy Lin. "Performance of ordinal algorithms for parallel machine scheduling problems." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/6458.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.

Full text
Abstract:
In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). The partial or total automation of this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method to find some of the best configurations of several machine learning algorithms intended to solve the automatic bug assignment problem. These four steps are used, respectively, to select a combination of pre-processing techniques, a bug report representation, a potential feature selection technique, and to tune several classifiers. The aforementioned method has been applied to three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations have been applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points.
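The core of such a pipeline (a bug-report representation feeding a tunable classifier) can be sketched with scikit-learn. The tiny made-up reports and team labels below are illustrative only; the thesis works with tens of thousands of real reports and 619 compared configurations:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up bug reports and the teams they were assigned to.
reports = ["null pointer exception in parser",
           "crash when parsing malformed xml",
           "button misaligned on settings page",
           "dark theme renders wrong colour",
           "parser fails on unicode input",
           "window layout broken after resize"]
teams = ["core", "core", "ui", "ui", "core", "ui"]

# Bug-report representation (TF-IDF) feeding a classifier whose
# hyperparameters would then be tuned in step four of the method.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reports, teams)
pred = model.predict(["exception thrown by the xml parser"])[0]
print(pred)
```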
APA, Harvard, Vancouver, ISO, and other styles
38

Darnald, Johan. "Predicting Attrition in Financial Data with Machine Learning Algorithms." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-225852.

Full text
Abstract:
For most businesses there are costs involved in acquiring new customers, and longer relationships with customers are therefore often more profitable. Predicting whether an individual is prone to leave the business is then a useful tool to help any company take actions to mitigate this cost. The event when a person ends their relationship with a business is called attrition or churn. Predicting people's actions is, however, hard, and many different factors can affect their choices. This thesis investigates different machine learning methods for predicting attrition in the customer base of a bank. Four different methods are chosen based on the results they have shown in previous research, and these are then tested and compared to find which works best for predicting these events. Four different datasets from two different products and with two different applications are created from real-world data from a European bank. All methods are trained and tested on each dataset. The results of the tests are then evaluated and compared to find what works best. The methods found in previous research to most reliably achieve good results in predicting churn in banking customers are the Support Vector Machine, Neural Network, Balanced Random Forest, and the Weighted Random Forest. The results show that the Balanced Random Forest achieves the best results with an average AUC of 0.698 and an average F-score of 0.376. The accuracy and precision of the model are concluded not to be enough to make definite decisions, but can be used with other factors such as profitability estimations to improve the effectiveness of any actions taken to prevent the negative effects of churn.
For most companies there is a cost involved in acquiring new customers, and longer customer relationships are therefore often more profitable. Being able to predict whether a customer is about to leave the company is therefore a useful tool for taking actions to reduce this cost. The event when a customer ends their relationship with a company is hereafter called churn. Predicting people's actions is, however, hard, and many different factors can affect their choices. This thesis investigates different machine learning methods for predicting churn at a bank. Four methods are chosen based on previous research, and these are then tested and compared to find which works best for predicting these events. Four datasets from two different products and with two different applications are created from real data from a European bank. All methods are trained and tested on each dataset. The results of these tests are then evaluated and compared to find out which method works best. The methods that, according to previous research, give the most reliable and best results for predicting churn at banks are the support vector machine, neural networks, the balanced random forest and the weighted random forest. The test results show that a balanced random forest achieves the best results, with an average AUC of 0.698 and an F-score of 0.376. The accuracy and positive predictive value of the method are not sufficient for taking definitive actions, but can be used together with other factors such as profitability calculations to improve the effectiveness of actions taken to reduce the negative effects of churn.
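The balanced-forest idea the thesis found best can be approximated in scikit-learn by reweighting classes per bootstrap sample; the imbalanced dataset below is a synthetic stand-in for churn data (roughly 10% of customers leaving), not the bank's data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced stand-in for churn data: ~10% positive (churning) customers.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced_subsample" reweights each tree's bootstrap sample,
# approximating a balanced random forest.
forest = RandomForestClassifier(n_estimators=100,
                                class_weight="balanced_subsample",
                                random_state=0)
forest.fit(X_tr, y_tr)

# Evaluate with AUC, the headline metric the abstract reports.
auc = roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```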
APA, Harvard, Vancouver, ISO, and other styles
39

Raykar, Vikas Chandrakant. "Scalable machine learning for massive datasets fast summation algorithms /." College Park, Md. : University of Maryland, 2007. http://hdl.handle.net/1903/6797.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2007.
Thesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
40

Xu, Yi-Chang. "Parallel thinning algorithms and their implementation on hypercube machine." Ohio : Ohio University, 1991. http://www.ohiolink.edu/etd/view.cgi?ohiou1183989550.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Poke, Marius [Verfasser]. "Algorithms for High-Performance State-Machine Replication / Marius Poke." Hamburg : Helmut-Schmidt-Universität, Bibliothek, 2019. http://d-nb.info/1192766512/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Ibrahim, Osman Ali Sadek. "Evolutionary algorithms and machine learning techniques for information retrieval." Thesis, University of Nottingham, 2017. http://eprints.nottingham.ac.uk/47696/.

Full text
Abstract:
In the context of Artificial Intelligence research, Evolutionary Algorithms and Machine Learning (EML) techniques play a fundamental role in optimising Information Retrieval (IR). However, numerous studies have not considered the limitations of using EML when first establishing an IR system, while others have compared EML techniques by presenting only overall final results, without analysing important experimental settings such as the training or evolving run-times against the IR effectiveness obtained. Furthermore, most papers on EML techniques in the IR domain have not considered the memory requirements of applying such techniques. This thesis seeks to address some of these research gaps. It also proposes applying a (1+1)-Evolutionary Strategy ((1+1)-ES), with and without a gradient step-size, to achieve improvements in IR systems. The thesis starts by identifying a limitation of applying EML techniques in the early stage of an IR system: all IR test collections are only partially judged, and only against some user queries. This means that the majority of documents in IR test collections have no relevance labels for any of the user queries. These relevance labels are used to check the quality of the evolved solution in each iteration of an EML technique. Thus, this thesis introduces a mathematical approach instead of an EML technique in the early stage of establishing the IR system, and shows the impact of the pre-processing procedure on this approach. The heuristic limitations in IR processes, such as the pre-processing procedure, motivate the use of EML techniques to optimise IR systems once relevance labels have been gathered. This thesis proposes a (1+1)-Evolutionary Gradient Strategy ((1+1)-EGS) to evolve Global Term Weights (GTW) in IR documents. A GTW is a value assigned to each index term to indicate the topic of the documents; it captures the discrimination value of the term for distinguishing between documents in the same collection.
The (1+1)-EGS technique is applied in two methods, one fully and one partially evolved. Of the two, the partially evolved method outperformed the mathematical model (Term Frequency-Average Term Occurrence (TF-ATO)), the probabilistic model (Okapi-BM25) and the fully evolved method. The evaluation metrics for these experiments were the Mean Average Precision (MAP), the Average Precision (AP) and the Normalised Discounted Cumulative Gain (NDCG). Another important process in IR is supervised Learning to Rank (LTR) on fully judged datasets, after relevance labels have been gathered from user interaction. The relevance labels indicate that every document is relevant or irrelevant, to a certain degree, to a user query. LTR is one of the current problems in IR attracting attention from researchers; it is mainly about ranking the retrieved documents in search engines, question answering and product recommendation systems. There are a number of LTR approaches from the EML areas, but most have the limitation of being too slow, not very effective, or presenting too large a problem size. This thesis investigates a new application of a (1+1)-Evolutionary Strategy with three initialisation techniques, resulting in three algorithm variations (ES-Rank, IESR-Rank and IESVM-Rank), to tackle the LTR problem. Experimental results comparing the proposed method to fourteen EML techniques from the literature show that IESR-Rank achieves the best overall performance. Ten datasets (the MSLR-WEB10K dataset, the LETOR 4 datasets and the LETOR 3 datasets) and five performance metrics (Mean Average Precision (MAP), Root Mean Square Error (RMSE), Precision (P@10), Reciprocal Rank (RR@10) and Normalised Discounted Cumulative Gain (NDCG@10), at the top 10 query-document pairs retrieved) were used in the experiments.
Finally, this thesis presents the benefits of using ES-Rank to optimise an online click model that simulates user click interactions. Overall, the contribution of this thesis is an effective and efficient EML method for tackling various processes within IR. The thesis advances the understanding of how EML techniques can be applied to improve IR systems.
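The parent-plus-mutant loop at the core of a (1+1)-Evolutionary Strategy can be sketched in a few lines. The following is a generic illustration with a made-up toy fitness function, not the thesis's ES-Rank implementation, which adds gradient step-sizes and LTR-specific initialisation:

```python
import random

def one_plus_one_es(fitness, dim, iterations=500, sigma=0.1, seed=0):
    """(1+1)-ES: perturb a single parent with Gaussian noise each iteration
    and keep the mutant only if it is at least as fit as the parent."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    parent_fit = fitness(parent)
    for _ in range(iterations):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_fit = fitness(child)
        if child_fit >= parent_fit:  # elitist selection: never accept a worse solution
            parent, parent_fit = child, child_fit
    return parent, parent_fit

# Toy fitness: negative squared distance to a known target weight vector,
# standing in for the IR effectiveness measure an LTR system would evolve against.
target = [0.3, -0.7, 1.2]
fitness = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
weights, score = one_plus_one_es(fitness, dim=3)
```

In the LTR setting described above, the evolved vector would instead weight query-document features, and the fitness would be a rank-based metric such as MAP or NDCG@10 on the training queries.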
43

Shah, Niyati S. "Implementing Machine Learning Algorithms for Identifying Microstructure of Materials." Thesis, California State University, Long Beach, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10837912.

Full text
Abstract:

Alloys of different materials are used extensively in many fields of day-to-day life. Several studies are performed at a microscopic level to analyze the properties of such alloys. Manually evaluating these microscopic structures (microstructures) can be time-consuming. This thesis attempts to build different models that can automate the identification of an alloy from its microstructure. The models were developed with various supervised and unsupervised machine learning algorithms, and the results of all the models were compared. The best accuracies achieved were 92.01 ± 0.54% for identifying the type of an alloy from its microstructure (Task 1) and 94.31 ± 0.59% for classifying the microstructure as belonging to the Ferrous, Non-Ferrous or Others class (Task 2). The model that gave the best accuracy was then used to build an Image Search Engine (ISE) that can predict the type of an alloy from its microstructure, search the microstructures by different keywords and search for visually similar microstructures.

44

Johansson, David. "Price Prediction of Vinyl Records Using Machine Learning Algorithms." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96464.

Full text
Abstract:
Machine learning algorithms have been used for price prediction within several application areas. Examples include real estate, the stock market, tourist accommodation, electricity, art, cryptocurrencies, and fine wine. Common approaches in studies are to evaluate the accuracy of predictions and to compare different algorithms, such as Linear Regression or Neural Networks. There is a thriving global second-hand market for vinyl records, but research on price prediction in this area is very limited. The purpose of this project was to expand on existing knowledge of price prediction in general by evaluating some aspects of price prediction for vinyl records. That included investigating the achievable level of accuracy and comparing the efficiency of algorithms. A dataset of 37000 samples of vinyl records was created with data from the Discogs website, and multiple machine learning algorithms were utilized in a controlled experiment. Among the conclusions drawn from the results were that the Random Forest algorithm generally produced the strongest results, that results can vary substantially between different artists or genres, and that a large share of the predictions reached a good accuracy level, but that a relatively small number of large errors had a considerable effect on the overall results.
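As a minimal illustration of this regression setting, a k-nearest-neighbour price predictor can be sketched as follows. This is a simpler baseline than the Random Forest models evaluated in the thesis, and the feature vectors and prices below are made up:

```python
def knn_price(train, query, k=3):
    """Predict a price as the mean price of the k training records whose
    feature vectors lie closest (squared Euclidean distance) to the query."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda record: dist(record[0], query))[:k]
    return sum(price for _, price in nearest) / k

# Hypothetical records: (feature vector, observed price). In practice the
# features could encode attributes such as pressing year, condition or rarity.
records = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0), ([10.0], 100.0)]
```

A Random Forest would replace the distance-based averaging with an ensemble of decision trees, but the evaluation loop (predict held-out prices, measure the error) is the same in either case.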
45

Vandehzad, Mashhood. "Efficient flight schedules with utilizing Machine Learning prediction algorithms." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20663.

Full text
Abstract:
While data is becoming more and more pervasive and ubiquitous in today's life, businesses in modern societies prefer to take advantage of data, in particular Big Data, in their decision-making and analytical processes to increase product efficiency. Software applications used in the airline industry are among the most complex and sophisticated, and applying data analysis techniques to them can make many decision-making processes easier and faster. Flight delays are one of the most important areas under investigation here, because they cause considerable overhead costs for airline companies on the one hand and airports on the other. The aim of this study is to apply different machine learning algorithms to real-world data in order to predict flight delays from all causes, such as weather, passenger delays, maintenance and airport congestion, and thereby create more efficient flight schedules. We will use Python as the programming language to create an artifact for our prediction purposes. We will analyse different algorithms from the accuracy perspective and propose a combined method in order to optimise our prediction results.
46

Roychowdhury, Anirban. "Robust and Scalable Algorithms for Bayesian Nonparametric Machine Learning." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1511901271093727.

Full text
47

Liang, Jiongqian. "Human-in-the-loop Machine Learning: Algorithms and Applications." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1523988406039076.

Full text
48

Lanctot, J. Kevin (Joseph Kevin). "Discrete estimator algorithms: a mathematical model of machine learning." Dissertation, Department of Mathematics, Carleton University, Ottawa, 1989.

Find full text
49

Li, Ling. "Data complexity in machine learning and novel classification algorithms." Diss., Pasadena, Calif.: Caltech, 2006. http://resolver.caltech.edu/CaltechETD:etd-04122006-114210.

Full text
50

Bäckman, David. "EVALUATION OF MACHINE LEARNING ALGORITHMS FOR SMS SPAM FILTERING." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163188.

Full text
Abstract:
The purpose of this thesis is to evaluate different machine learning algorithms and methods for text representation in order to determine what is best suited to distinguish between spam SMS and legitimate SMS. A data set containing 5573 real SMS has been used to train the algorithms K-Nearest Neighbor, Support Vector Machine, Naive Bayes and Logistic Regression. The methods used to represent text are Bag of Words, Bigram and Word2Vec. In particular, it has been investigated whether semantic text representations can improve classification performance. A total of 12 combinations have been evaluated with the help of the metrics accuracy and F1-score. The results show that Logistic Regression together with Bag of Words reaches the highest accuracy and F1-score. Bigram as a text representation seems to work worse than the other methods. Word2Vec can increase the performance for K-Nearest Neighbor but not for the other algorithms.
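One of the algorithm and representation pairings evaluated in this setting, a bag-of-words multinomial Naive Bayes classifier, can be sketched in plain Python. This is a generic illustration with Laplace smoothing and a made-up toy corpus, not the thesis's exact setup:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Train a multinomial Naive Bayes model on (label, text) pairs,
    using a bag-of-words representation of each message."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior; Laplace smoothing
    keeps unseen words from zeroing out a class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label in class_counts:
        logp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            logp += math.log((word_counts[label][word] + 1)
                             / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

model = train_nb([
    ("spam", "win a free prize now"),
    ("spam", "free prize claim now"),
    ("ham", "see you at lunch"),
    ("ham", "lunch at noon today"),
])
```

Swapping the word counts for bigram counts, or the log-posterior for a logistic-regression score over the same bag-of-words vectors, gives the other combinations the thesis compares.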