Dissertations / Theses on the topic 'Ranking algorithms'

Consult the top 50 dissertations / theses for your research on the topic 'Ranking algorithms.'

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Liqun. "Algorithms for random ranking generation." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0021/MQ54338.pdf.

2

Wong, Brian Wai Fung. "Deep-web search engine ranking algorithms." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61246.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 79-80).
The deep web refers to content that is hidden behind HTML forms. The deep web contains a large collection of data that are unreachable by link-based search engines. A study conducted at the University of California, Berkeley estimated that the deep web consists of around 91,000 terabytes of data, whereas the surface web is only about 167 terabytes. To access this content, one must submit valid input values to the HTML form. Several researchers have studied methods for crawling deep web content. One of the most promising methods uses unique wrappers for HTML forms. User inputs are first filtered through the wrappers before being submitted to the forms. However, this method requires a new algorithm for ranking search results generated by the wrappers. In this paper, I explore methods for ranking search results returned from a wrapper-based deep web search engine.
by Brian Wai Fung Wong.
M.Eng.
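
The wrapper mechanism this abstract describes — filtering user input before it is submitted to a deep-web form — can be pictured with a minimal Python sketch. The form URL, field names, and filter rules below are hypothetical placeholders, not details from the thesis:

    import requests  # widely-used HTTP library

    # A toy "wrapper" for one hypothetical deep-web form: it maps raw user
    # input onto the fields the form expects and filters out invalid values.
    FORM_URL = "https://example.org/search"
    ALLOWED_CATEGORIES = {"books", "articles"}

    def wrap_query(user_query, category):
        """Filter raw user input into a valid form submission payload."""
        if category not in ALLOWED_CATEGORIES:
            category = "articles"                 # fall back to a safe default
        return {"q": user_query.strip()[:100],    # respect the form's length limit
                "cat": category}

    def search_deep_web(user_query, category):
        payload = wrap_query(user_query, category)
        response = requests.post(FORM_URL, data=payload, timeout=10)
        return response.text                      # results page, still to be ranked

    print(wrap_query("  ranking algorithms  ", "movies"))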
3

Trailović, Lidija. "Ranking and optimization of target tracking algorithms." Online access from Digital Dissertation Consortium, 2002. http://libweb.cityu.edu.hk/cgi-bin/er/db/ddcdiss.pl?3074810.

4

Spanias, Demetris. "Professional tennis : quantitative models and ranking algorithms." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24813.

Abstract:
Professional singles tennis is a popular global sport that attracts spectators and speculators alike. In recent years, financial trading related to sport outcomes has become a reality, thanks to the rise of online betting exchanges and the ever-increasing development and deployment of quantitative models for sports. This thesis investigates the extent to which the outcome of a match between two professional tennis players can be forecast using quantitative models parameterised by historical data. Three different approaches are explored, each having its own advantages and disadvantages. Firstly, the problem is approached using a Markov chain to model a tennis point, estimating the probability of a player winning a point while serving. Such a probability can be used as a parameter to existing hierarchical models to estimate the probability of a player winning the match. We demonstrate how this probability can be estimated using varying subsets of historical player data and investigate their effect on results. Averaged historical data over varying opponents with different skill sets does not necessarily provide a fair basis of comparison when evaluating the performance of players. The second approach presented is a technique that uses data including only matches played against common opponents to find the difference between the modelled players' probabilities of winning a point on serve against each common opponent. This difference in probability for each common opponent is a 'transitive contribution' towards victory in the match being modelled. By combining these 'contributions', the 'Common-Opponent' model overcomes the problems of using average historical statistics, at the cost of a shrinking data set. Finally, the thesis ventures into the field of player rankings. Rankings provide a fast and simple method for predicting match winners and comparing players. We present a variety of methods to generate such player rankings, either by making use of network analysis or of hierarchical models. The generated rankings are then evaluated using their ability to correctly represent the subset of matches that were used to generate them, as well as their ability to forecast future matches.
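
The point-level Markov chain mentioned in the first approach feeds a standard hierarchical step: given a server's probability p of winning a point, the probability of winning a game has a closed form obtained by enumerating paths to game and to deuce. A sketch of that textbook computation (generic tennis modelling, not the thesis's specific parameterisation):

    def prob_win_game(p):
        """Server's probability of winning a game, given point-win probability p.

        Standard hierarchical tennis step: win to 0, 15 or 30 directly, or
        reach deuce (3-3) and then win two consecutive points before the
        opponent does.
        """
        q = 1.0 - p
        win_to_0_15_30 = p**4 * (1 + 4*q + 10*q**2)
        reach_deuce = 20 * p**3 * q**3            # C(6,3) ways to reach 3-3
        win_from_deuce = p**2 / (1 - 2*p*q)       # geometric series over deuce cycles
        return win_to_0_15_30 + reach_deuce * win_from_deuce

    for p in (0.50, 0.60, 0.65):
        print(f"p(point) = {p:.2f}  ->  p(game) = {prob_win_game(p):.3f}")

At p = 0.60 this gives roughly a 0.736 probability of holding serve, illustrating how small point-level edges are amplified at the game level.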
5

Trotman, Andrew. "Searching and ranking structured documents." University of Otago. Department of Computer Science, 2007. http://adt.otago.ac.nz./public/adt-NZDU20070403.110440.

Abstract:
It is common to see documents with explicit structure marked up in languages such as XML. Queries, on the other hand, typically have no structure. There is a clear mismatch: although documents contain structure, it is typically not used in information retrieval. An efficient index structure for document-centric searching is proposed and its efficiency is discussed. It is shown to be at worst linear with respect to the number of occurrences of a given search term. The algorithm is then extended to accommodate element-centric information retrieval. Ranking algorithms for structured documents are examined. Genetic Algorithms are used to learn different weights for each structure present in a document, and applying these weights as part of a ranking function is shown to yield significant precision improvements for some functions. Genetic Programming is then used to learn an entire ranking function. This function is shown to be portable between document collections. A query language for structured information retrieval is proposed. Use of this language in the 2004 INEX workshop resulted in a large decrease in query errors. Structured information retrieval is now a viable alternative to its unstructured counterpart. A successful query language, efficient indexing structures, and improved ranking functions are all presented.
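
The per-structure weighting idea can be pictured as a weighted sum of per-element scores. The sketch below shows only the scoring side, with made-up element names and weight values standing in for what a Genetic Algorithm might learn on a real collection:

    # Per-structure weights for ranking structured documents. The element
    # names and weight values are illustrative placeholders for what a
    # Genetic Algorithm would learn on a real collection.
    LEARNED_WEIGHTS = {"title": 3.0, "abstract": 1.5, "body": 1.0, "footnote": 0.2}

    def structured_score(term_freqs_by_element):
        """Combine per-element term frequencies using per-structure weights."""
        return sum(LEARNED_WEIGHTS.get(elem, 1.0) * tf
                   for elem, tf in term_freqs_by_element.items())

    # Two title hits outrank three body hits, because titles carry more weight.
    print(structured_score({"title": 2, "body": 1}))   # 7.0
    print(structured_score({"body": 3}))               # 3.0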
6

Dunaiski, Marcel Paul. "Analysing ranking algorithms and publication trends on scholarly citation networks." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/96106.

Abstract:
Thesis (MSc)--Stellenbosch University, 2014.
Citation analysis is an important tool in the academic community. It can aid universities, funding bodies, and individual researchers to evaluate scientific work and direct resources appropriately. With the rapid growth of the scientific enterprise and the increase of online libraries that include citation analysis tools, the need for a systematic evaluation of these tools becomes more important. The research presented in this study deals with scientific research output, i.e., articles and citations, and how they can be used in bibliometrics to measure academic success. More specifically, this research analyses algorithms that rank academic entities such as articles, authors and journals to address the question of how well these algorithms can identify important and high-impact entities. A consistent mathematical formulation is developed on the basis of a categorisation of bibliometric measures such as the h-index, the Impact Factor for journals, and ranking algorithms based on Google's PageRank. Furthermore, the theoretical properties of each algorithm are laid out. The ranking algorithms and bibliometric methods are computed on the Microsoft Academic Search citation database, which contains 40 million papers and over 260 million citations that span multiple academic disciplines. We evaluate the ranking algorithms by using a large test data set of papers and authors that won renowned prizes at numerous Computer Science conferences. The results show that using citation counts is, in general, the best ranking metric. However, for certain tasks, such as ranking important papers or identifying high-impact authors, algorithms based on PageRank perform better. As a secondary outcome of this research, publication trends across academic disciplines are analysed to show changes in publication behaviour over time and differences in publication patterns between disciplines.
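
The PageRank-style rankings evaluated in this thesis follow the standard power-iteration recurrence. A minimal version on a toy citation graph (an assumed adjacency structure, not the Microsoft Academic Search data):

    def pagerank(citations, damping=0.85, iters=50):
        """Power iteration for PageRank on a citation graph.

        citations[p] lists the papers that p cites; rank mass flows from a
        paper to its references, so much-cited papers accumulate weight.
        """
        papers = list(citations)
        n = len(papers)
        rank = {p: 1.0 / n for p in papers}
        for _ in range(iters):
            new = {p: (1 - damping) / n for p in papers}
            for p, refs in citations.items():
                if refs:                                  # distribute over references
                    for c in refs:
                        new[c] += damping * rank[p] / len(refs)
                else:                                     # dangling node: spread evenly
                    for q in papers:
                        new[q] += damping * rank[p] / n
            rank = new
        return rank

    toy = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B", "C"]}
    print(sorted(pagerank(toy).items(), key=lambda kv: -kv[1]))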
7

Sun, Mingxuan. "Visualizing and modeling partial incomplete ranking data." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45793.

Abstract:
Analyzing ranking data is an essential component in a wide range of important applications including web-search and recommendation systems. Rankings are difficult to visualize or model due to the computational difficulties associated with the large number of items. On the other hand, partial or incomplete rankings induce more difficulties since approaches that adapt well to typical types of rankings cannot apply generally to all types. While analyzing ranking data has a long history in statistics, construction of an efficient framework to analyze incomplete ranking data (with or without ties) is currently an open problem. This thesis addresses the problem of scalability for visualizing and modeling partial incomplete rankings. In particular, we propose a distance measure for top-k rankings with the following three properties: (1) metric, (2) emphasis on top ranks, and (3) computational efficiency. Given the distance measure, the data can be projected into a low-dimensional continuous vector space via multi-dimensional scaling (MDS) for easy visualization. We further propose a non-parametric model for estimating distributions of partial incomplete rankings. For the non-parametric estimator, we use a triangular kernel that is a direct analogue of the Euclidean triangular kernel. The computational difficulties for large n are simplified using combinatorial properties and generating functions associated with symmetric groups. We show that our estimator is computationally efficient for rankings of arbitrary incompleteness and tie structure. Moreover, we propose an efficient learning algorithm to construct a preference elicitation system from partial incomplete rankings, which can be used to solve the cold-start problems in ranking recommendations. The proposed approaches are examined in experiments with real search engine and movie recommendation data.
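
One generic way to realise the three listed properties is a Kendall-style distance restricted to top-k lists, in the spirit of Fagin et al., followed by MDS on a precomputed distance matrix. The sketch below is an illustration under those assumptions, not the thesis's actual metric:

    from itertools import combinations
    import numpy as np
    from sklearn.manifold import MDS

    def topk_kendall(a, b, p=0.5):
        """Kendall-style distance between two top-k lists (Fagin et al. flavour).

        Pairs ranked by both lists cost 1 when inverted; pairs whose relative
        order is implied by membership cost 1 when contradicted; pairs whose
        order is unknowable from the two lists cost the neutral penalty p.
        """
        pos_a = {x: i for i, x in enumerate(a)}
        pos_b = {x: i for i, x in enumerate(b)}
        d = 0.0
        for x, y in combinations(set(a) | set(b), 2):
            xa, ya, xb, yb = x in pos_a, y in pos_a, x in pos_b, y in pos_b
            if xa and ya and xb and yb:              # pair fully ranked by both
                d += (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
            elif xa and ya and (xb or yb):           # ranked in a, one seen in b
                d += (pos_a[x] < pos_a[y]) != (x in pos_b)
            elif xb and yb and (xa or ya):           # ranked in b, one seen in a
                d += (pos_b[x] < pos_b[y]) != (x in pos_a)
            elif (xa and ya) or (xb and yb):         # both items only in one list
                d += p
            else:                                    # x only in one, y only in the other
                d += 1
        return d

    lists = [["a", "b", "c"], ["b", "a", "d"], ["c", "b", "e"], ["a", "c", "e"]]
    D = np.array([[topk_kendall(u, v) for v in lists] for u in lists])
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)    # 2-D embedding for plotting
    print(coords)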
8

Zacharia, Giorgos 1974. "Regularized algorithms for ranking, and manifold learning for related tasks." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/47753.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Includes bibliographical references (leaves 119-127).
This thesis describes an investigation of regularized algorithms for ranking problems in user preferences and information retrieval. We utilize regularized manifold algorithms to appropriately incorporate data from related tasks. This investigation was inspired by personalization challenges in both user preference and information retrieval ranking problems. We formulate the ranking problem of related tasks as a special case of semi-supervised learning. We examine how to incorporate instances from related tasks, with the appropriate penalty in the loss function, to optimize performance on the hold-out sets. We present a regularized manifold approach that allows us to learn a distance metric for the different instances directly from the data. This approach allows incorporation of information from related task examples, without prior estimation of cross-task coefficient covariances. We also present applications of ranking problems in two text analysis problems: a) supervised content-word learning, and b) company entity matching for record linkage problems.
by Giorgos Zacharia.
Ph.D.
9

Halverson, Ranette Hudson. "Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design." Thesis, University of North Texas, 1993. https://digital.library.unt.edu/ark:/67531/metadc278153/.

Abstract:
The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching.
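
For background, the classic parallel technique for linked list ranking is pointer jumping (Wyllie's algorithm): in each round, every node adds its successor's rank to its own and doubles the span of its successor pointer, so ranks are computed in O(log n) parallel rounds. A sequential simulation of those rounds (generic background, not necessarily the dissertation's specific algorithms):

    import math

    def list_rank(succ):
        """Pointer-jumping list ranking (Wyllie's algorithm), simulated sequentially.

        succ[i] is the successor of node i; the tail is its own successor.
        Returns rank[i] = number of links from i to the tail. On a PRAM each
        round's comprehension runs on all nodes at once: O(log n) rounds.
        """
        n = len(succ)
        succ = list(succ)
        rank = [0 if succ[i] == i else 1 for i in range(n)]
        for _ in range(max(1, math.ceil(math.log2(n)))):
            # Synchronous round: both updates read the old arrays.
            rank = [rank[i] + rank[succ[i]] for i in range(n)]
            succ = [succ[succ[i]] for i in range(n)]
        return rank

    # The list 3 -> 1 -> 4 -> 0 -> 2, where node 2 is the tail.
    print(list_rank([2, 4, 2, 1, 0]))   # -> [1, 3, 0, 4, 2]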
10

Lee, Chun-fan, and 李俊帆. "Fitting factor models for ranking data using efficient EM-type algorithms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31227557.

11

Salomon, Sophie. "Bias Mitigation Techniques and a Cost-Aware Framework for Boosted Ranking Algorithms." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1586450345426827.

12

McMeen, John Norman Jr. "Ranking Methods for Global Optimization of Molecular Structures." Digital Commons @ East Tennessee State University, 2014. https://dc.etsu.edu/etd/2447.

Abstract:
This work presents heuristics for searching large sets of molecular structures for low-energy, stable systems. The goal is to find the globally optimal structures in less time or by consuming fewer computational resources. The strategies intermittently evaluate and rank structures during molecular dynamics optimizations, culling likely weaker solutions early so that better solutions receive more simulation time. Although some imprecision was introduced by not allowing all structures to fully optimize before ranking, the strategies identify metrics that can be used to make these searches more efficient when computational resources are limited.
13

Stojkovic, Ivan. "Functional Norm Regularization for Margin-Based Ranking on Temporal Data." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/522550.

Abstract:
Computer and Information Science
Ph.D.
Quantifying the properties of interest is an important problem in many domains, e.g., assessing the condition of a patient, estimating the risk of an investment, or the relevance of a search result. However, the properties of interest are often latent and hard to assess directly, making it difficult to obtain the classification or regression labels needed to learn predictive models from observable features. In such cases, it is typically much easier to obtain a relative comparison of two instances, i.e., to assess which one is more intense with respect to the property of interest. One framework able to learn from this kind of supervision is the ranking SVM, which forms the basis of our approach. Applications on biomedical datasets typically come with specific additional challenges. The first, and the major one, is the limited number of data examples, due to expensive measuring technology and/or the infrequency of conditions of interest. Such a limited number of examples makes both the identification of patterns/models and their validation less useful and reliable. Repeated samples from the same subject are collected on multiple occasions over time, which breaks the IID sample assumption and introduces a dependency structure that needs to be taken into account appropriately. Also, feature vectors are high-dimensional, typically of much higher cardinality than the number of samples, making models less useful and their learning less efficient. The hypothesis of this dissertation is that functional norm regularization can help alleviate these challenges by improving the generalization abilities and/or learning efficiency of predictive models, in this case specifically of approaches based on the ranking SVM framework. The temporal nature of the data was addressed with a loss that fosters temporal smoothness of the functional mapping, thus accounting for the assumption that temporally proximate samples are more correlated. The large number of feature variables was handled using the sparsity-inducing L1 norm, such that most of the features have zero effect in the learned functional mapping. The proposed sparse (temporal) ranking objective is convex but non-differentiable, therefore a smooth dual form is derived, taking the form of a quadratic function with box constraints, which allows efficient optimization. For the case where there are multiple similar tasks, a joint learning approach based on matrix norm regularization, using the trace norm L* and the row-sparse L2,1 norm, was also proposed. An alternating minimization scheme with a proximal optimization algorithm was developed to solve this multi-task objective. The generalization potential of the proposed high-dimensional and multi-task ranking formulations was assessed in a series of evaluations on synthetically generated and real datasets. The high-dimensional approach was applied to disease severity score learning from gene expression data in human influenza cases, and compared against several alternative approaches. The application resulted in a scoring function with improved predictive performance, as measured by the fraction of correctly ordered testing pairs, and a set of selected features of high robustness according to three similarity measures. The multi-task approach was applied to three human viral infection problems, and to learning exam scores in Math and English. The proposed formulation with a mixed matrix norm was overall more accurate than formulations with single-norm regularization.
Temple University--Theses
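
The core of the sparse ranking-SVM formulation described above is a pairwise hinge loss plus an L1 penalty. A bare-bones subgradient sketch of that objective — the dissertation itself works with a smooth dual form rather than this naive optimizer:

    import numpy as np

    def sparse_rank_svm(X, pairs, lam=0.01, lr=0.05, epochs=200):
        """Subgradient descent on an L1-regularised ranking-SVM objective:

            sum over ordered pairs (i, j) of max(0, 1 - w.(x_i - x_j)) + lam * ||w||_1

        where pair (i, j) means example i should rank above example j.
        """
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            grad = lam * np.sign(w)               # subgradient of the L1 term
            for i, j in pairs:
                diff = X[i] - X[j]
                if 1.0 - w @ diff > 0:            # margin violated: push pair apart
                    grad -= diff
            w -= lr * grad
        return w

    # Toy data: feature 0 carries the true ordering, feature 1 is noise.
    X = np.array([[3.0, 0.1], [2.0, 0.9], [1.0, 0.5], [0.0, 0.2]])
    pairs = [(0, 1), (1, 2), (2, 3)]              # row 0 ranks above row 1, etc.
    w = sparse_rank_svm(X, pairs)
    print("weights:", w.round(2), "scores:", (X @ w).round(2))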
14

Puthiya, Parambath Shameem Ahamed. "New methods for multi-objective learning." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2322/document.

Abstract:
Multi-objective problems arise in many real-world scenarios where one has to find an optimal solution considering the trade-off between different competing objectives. Typical examples of multi-objective problems arise in classification, information retrieval, dictionary learning, online learning, etc. In this thesis, we study and propose algorithms for multi-objective machine learning problems. We give many interesting examples of multi-objective learning problems, actively pursued by the research community, to motivate our work. The majority of the state-of-the-art algorithms proposed for multi-objective learning fall under what is called the "scalarization method", an efficient approach for solving multi-objective optimization problems. Having motivated our work, we study two multi-objective learning tasks in detail. In the first task, we study the problem of finding the optimal classifier for multivariate performance measures. The problem has been studied very actively, and recent papers have proposed many algorithms in different classification settings. We study the problem as finding an optimal trade-off between different classification errors, and propose an algorithm based on cost-sensitive classification. In the second task, we study the problem of diverse ranking in information retrieval tasks, in particular recommender systems. We propose an algorithm for diverse ranking that makes use of domain-specific information, formulating the problem as a submodular maximization problem for coverage maximization in a weighted similarity graph. Finally, we conclude that scalarization-based algorithms work well for multi-objective learning problems. But when considering algorithms for multi-objective learning problems, scalarization need not be the go-to approach. It is very important to consider the domain-specific information and objective functions. We end this thesis by proposing some immediate future work, which is currently being experimented with, and some short-term future work which we plan to carry out.
15

Safran, Mejdl Sultan. "EFFICIENT LEARNING-BASED RECOMMENDATION ALGORITHMS FOR TOP-N TASKS AND TOP-N WORKERS IN LARGE-SCALE CROWDSOURCING SYSTEMS." OpenSIUC, 2018. https://opensiuc.lib.siu.edu/dissertations/1511.

Abstract:
A pressing need for efficient personalized recommendations has emerged in crowdsourcing systems. On the one hand, workers confront a flood of tasks and often spend too much time finding tasks matching their skills and interests. Thus, workers want effective recommendation of the most suitable tasks with regard to their skills and preferences. On the other hand, requesters sometimes receive low-quality results, since a less-qualified worker may start working on a task before a better-skilled worker gets the chance. Thus, requesters want reliable recommendation of the best workers for their tasks in terms of workers' qualifications and accountability. The task and worker recommendation problems in crowdsourcing systems have unique characteristics that are not present in traditional recommendation scenarios, i.e., the huge flow of tasks with short lifespans, the importance of workers' capabilities, and the quality of the completed tasks. These unique features make traditional recommendation approaches (mostly developed for e-commerce markets) no longer satisfactory for task and worker recommendation in crowdsourcing systems. In this research, we reveal our insight into the essential difference between the tasks in crowdsourcing systems and the products/items in e-commerce markets, and the difference between buyers' interests in products/items and workers' interests in tasks. Our insight inspires us to bring up categories as a key mediation mechanism between workers and tasks. We propose a two-tier data representation scheme (defining a worker-category suitability score and a worker-task attractiveness score) to support personalized task and worker recommendation. We also extend two optimization methods, namely least mean square error (LMS) and Bayesian personalized rank (BPR), in order to better fit the characteristics of task/worker recommendation in crowdsourcing systems. We then integrate the proposed representation scheme and the extended optimization methods with two adapted popular learning models, i.e., matrix factorization and kNN, resulting in two lines of top-N recommendation algorithms for crowdsourcing systems: (1) Top-N-Tasks (TNT) recommendation algorithms for discovering the top-N most suitable tasks for a given worker, and (2) Top-N-Workers (TNW) recommendation algorithms for identifying the top-N best workers for a task requester. An extensive experimental study is conducted that validates the effectiveness and efficiency of a broad spectrum of algorithms, accompanied by our analysis and the insights gained.
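
Bayesian personalized ranking (BPR), one of the two optimization methods the dissertation extends, maximises the probability that an observed worker-task pair scores above an unobserved one. A minimal matrix-factorisation BPR step with assumed toy data (generic BPR, not the proposed TNT/TNW algorithms):

    import numpy as np

    def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
        """One SGD step of Bayesian Personalized Ranking with matrix factorisation.

        Raises score(u, i) above score(u, j), where i is a task worker u
        interacted with and j is a sampled negative task.
        """
        x_uij = P[u] @ (Q[i] - Q[j])              # current score difference
        g = 1.0 / (1.0 + np.exp(x_uij))           # gradient scale: sigma(-x_uij)
        pu = P[u].copy()                          # cache before in-place updates
        P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
        Q[i] += lr * (g * pu - reg * Q[i])
        Q[j] += lr * (-g * pu - reg * Q[j])

    rng = np.random.default_rng(0)
    n_workers, n_tasks, k = 5, 8, 4
    P = rng.normal(scale=0.1, size=(n_workers, k))    # worker latent factors
    Q = rng.normal(scale=0.1, size=(n_tasks, k))      # task latent factors
    observed = {0: [1, 2], 1: [3]}                    # worker -> interacted tasks
    for _ in range(2000):
        u = rng.choice(list(observed))
        i = rng.choice(observed[u])
        j = int(rng.integers(n_tasks))                # uniform negative sample
        if j not in observed[u]:
            bpr_step(P, Q, u, i, j)
    print("task scores for worker 0:", (P[0] @ Q.T).round(2))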
16

Owusu-Kesseh, Daniel. "The Relative Security Metric of Information Systems: Using AIMD Algorithms." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1462278857.

17

Williams, Garrick J. "Abstracting Glicko-2 for Team Games." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427962458.

18

Lacour, Renaud. "Approches de résolution exacte et approchée en optimisation combinatoire multi-objectif, application au problème de l'arbre couvrant de poids minimal." Thesis, Paris 9, 2014. http://www.theses.fr/2014PA090067/document.

Abstract:
This thesis deals with several aspects related to solving multi-objective problems, without restriction to the bi-objective case. We consider exact solving, which generates the nondominated set, and approximate solving, which computes an approximation of the nondominated set with an a priori guarantee on the quality. We first consider the determination of an explicit representation of the search region. The search region, defined with respect to a set of known feasible points, excludes from the objective space the part which is dominated by these points. Future efforts to find all nondominated points should therefore be concentrated on the search region. Then we review branch-and-bound and ranking algorithms and propose a new hybrid approach for the determination of the nondominated set. We show how the proposed method can be adapted to generate an approximation of the nondominated set. This approach is instantiated on the minimum spanning tree problem. We review several properties of this problem which enable us to specialize some procedures of the proposed approach and integrate specific preprocessing rules. This approach is finally supported through experimental results.
19

Krestel, Ralf [Verfasser]. "On the use of language models and topic models in the web : new algorithms for filtering, classification, ranking, and recommendation / Ralf Krestel." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2012. http://d-nb.info/1022753363/34.

20

Harrington, Edward. "Aspects of Online Learning." The Australian National University. Research School of Information Sciences and Engineering, 2004. http://thesis.anu.edu.au./public/adt-ANU20060328.160810.

Abstract:
Online learning algorithms have several key advantages compared to their batch learning counterparts: they are generally more memory efficient and computationally more efficient; they are simpler to implement; and they are able to adapt to changes where the learning model is time varying. Online algorithms, because of their simplicity, are very appealing to practitioners. This thesis investigates several online learning algorithms and their application. The thesis has an underlying theme of combining several simple algorithms to give better performance. In this thesis we investigate: combining weights, combining hypotheses, and (sort of) hierarchical combining.

Firstly, we propose a new online variant of the Bayes point machine (BPM), called the online Bayes point machine (OBPM). We study the theoretical and empirical performance of the OBPM algorithm. We show that the empirical performance of the OBPM algorithm is comparable with other large margin classifier methods such as the approximately large margin algorithm (ALMA) and methods which maximise the margin explicitly, like the support vector machine (SVM). The OBPM algorithm, when used with a parallel architecture, offers potential computational savings compared to ALMA. We compare the test error performance of the OBPM algorithm with other online algorithms: the Perceptron, the voted-Perceptron, and Bagging. We demonstrate that the combination of the voted-Perceptron algorithm and the OBPM algorithm, called the voted-OBPM algorithm, has better test error performance than the voted-Perceptron and Bagging algorithms. We investigate the use of various online voting methods on the problem of ranking and the problem of collaborative filtering of instances. We look at the application of online Bagging and OBPM algorithms to the telecommunications problem of channel equalization. We show that both online methods were successful at reducing the effect of label flipping and additive noise on the test error.

Secondly, we introduce a new mixture of experts algorithm, the fixed-share hierarchy (FSH) algorithm. The FSH algorithm is able to track the mixture of experts when the switching rate between the best experts may not be constant. We study the theoretical aspects of the FSH algorithm and its practical application to adaptive equalization. Using simulations we show that the FSH algorithm is able to track the best expert, or mixture of experts, in both the case where the switching rate is constant and the case where the switching rate is time varying.
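
The FSH algorithm builds on the classic fixed-share update of Herbster and Warmuth: multiply each expert's weight by an exponentiated loss, then let every expert share a fraction alpha of its weight with the pool so the weights can track a switching best expert. A minimal constant-switching-rate version (the FSH extension to time-varying rates is not reproduced here):

    import numpy as np

    def fixed_share(losses, eta=2.0, alpha=0.05):
        """Herbster-Warmuth fixed-share over a (T, n_experts) loss matrix.

        Each round: multiply weights by exp(-eta * loss), renormalise, then
        let every expert donate a fraction alpha of its weight to the pool,
        which is what allows tracking a switching best expert.
        """
        T, n = losses.shape
        w = np.full(n, 1.0 / n)
        history = []
        for t in range(T):
            history.append(w.copy())              # weights in force at time t
            w = w * np.exp(-eta * losses[t])      # loss update
            w /= w.sum()
            w = (1 - alpha) * w + alpha / n       # share update
        return np.array(history)

    # Expert 0 is best for 30 rounds, then expert 1 takes over.
    rng = np.random.default_rng(1)
    losses = rng.uniform(0.4, 1.0, size=(60, 3))
    losses[:30, 0] = rng.uniform(0.0, 0.2, size=30)
    losses[30:, 1] = rng.uniform(0.0, 0.2, size=30)
    hist = fixed_share(losses)
    print("t=29:", hist[29].round(2), " t=59:", hist[59].round(2))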
21

Guan, Wei. "New support vector machine formulations and algorithms with application to biomedical data analysis." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41126.

Abstract:
The Support Vector Machine (SVM) classifier seeks to find the separating hyperplane w·x = r that maximizes the margin distance 1/||w||₂². It can be formalized as an optimization problem that minimizes the hinge loss Σᵢ (1 − yᵢ f(xᵢ))₊ plus the L₂-norm of the weight vector. SVM is now a mainstay method of machine learning. The goal of this dissertation work is to solve different biomedical data analysis problems efficiently using extensions of SVM, in which we augment the standard SVM formulation based on the application requirements. The biomedical applications we explore in this thesis include: cancer diagnosis, biomarker discovery, and energy function learning for protein structure prediction. Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We investigate a sample set consisting of 44 women diagnosed with serous papillary ovarian cancer and 50 healthy women or women with benign conditions. We profile the relative metabolite levels in the patient sera using a high-throughput ambient ionization mass spectrometry technique, Direct Analysis in Real Time (DART). We then reduce the diagnostic classification on these metabolic profiles into a functional classification problem and solve it with the functional Support Vector Machine (fSVM) method. The assay distinguished between the cancer and control groups with an unprecedented 99% accuracy (100% sensitivity, 98% specificity) under leave-one-out cross-validation. This approach has significant clinical potential as a cancer diagnostic tool. High-throughput technologies provide simultaneous evaluation of thousands of potential biomarkers to distinguish different patient groups. In order to assist biomarker discovery from these low-sample-size, high-dimensional cancer data, we first explore a convex relaxation of the L₀-SVM problem and solve it using mixed-integer programming techniques. We further propose a more efficient L₀-SVM approximation, the fractional norm SVM, by replacing the L₂-penalty with an L_q-penalty (q in (0,1)) in the optimization formulation. We solve it through the Difference of Convex functions (DC) programming technique. Empirical studies on synthetic data sets as well as real-world biomedical data sets support the effectiveness of our proposed L₀-SVM approximation methods over other commonly-used sparse SVM methods such as the L₁-SVM method. A critical open problem in ab initio protein folding is protein energy function design. We reduce the problem of learning an energy function for ab initio folding to a standard machine learning problem, learning-to-rank. Based on the application requirements, we constrain the reduced ranking problem with non-negative weights and develop two efficient algorithms for non-negativity constrained SVM optimization. We conduct an empirical study on an energy data set for random conformations of 171 proteins that fall into the ab initio folding class. We compare our approach with the optimization approach used in the protein structure prediction tool TASSER. Numerical results indicate that our approach was able to learn energy functions with improved rank statistics (evaluated by pairwise agreement) as well as improved correlation between the total energy and structural dissimilarity.
22

Peel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.

Abstract:
The first part of this thesis introduces new algorithms for the sparse encoding of signals. Based on Matching Pursuit (MP), they address the following problem: how to reduce the computation time of MP's selection step, which is often very costly. As an answer, we sub-sample the dictionary at each iteration, in rows and in columns. We show that this theoretically grounded approach has good empirical performance. We then propose a block coordinate gradient descent algorithm for feature selection problems in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be seen as a simultaneous sparse encoding of signals problem. The second part presents new empirical Bernstein inequalities. Firstly, they concern the theory of U-statistics and are applied in order to design generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, and we propose an efficient algorithm to compute it. Then, we present an empirical version of the Bernstein-type inequality for martingales by Freedman [1975]. Again, the strength of our result lies in the variance estimator, which is computable from the data. This allows us to propose generalization bounds for online learning algorithms which improve the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information.
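
The dictionary sub-sampling idea from the first part can be sketched around the standard Matching Pursuit loop: only a random subset of atoms (columns) and signal dimensions (rows) is examined during the selection step. The sampling fractions below are arbitrary illustrative choices, not the thesis's analysed values:

    import numpy as np

    def subsampled_mp(D, y, n_iter=10, col_frac=0.5, row_frac=0.5, seed=0):
        """Matching Pursuit whose selection step sees only a random subset of
        atoms (columns) and signal dimensions (rows) at each iteration.

        D: (d, K) dictionary with unit-norm columns; y: (d,) signal.
        Returns sparse coefficients x with y ~ D @ x.
        """
        rng = np.random.default_rng(seed)
        d, K = D.shape
        x = np.zeros(K)
        residual = y.astype(float).copy()
        for _ in range(n_iter):
            cols = rng.choice(K, size=max(1, int(col_frac * K)), replace=False)
            rows = rng.choice(d, size=max(1, int(row_frac * d)), replace=False)
            corr = D[np.ix_(rows, cols)].T @ residual[rows]   # cheap partial correlations
            k = cols[np.argmax(np.abs(corr))]                 # atom chosen from the sample
            coef = D[:, k] @ residual                         # full-precision MP update
            x[k] += coef
            residual -= coef * D[:, k]
        return x

    rng = np.random.default_rng(0)
    D = rng.normal(size=(64, 256))
    D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
    y = 2.0 * D[:, 3] - 1.5 * D[:, 100]           # sparse ground truth
    print("largest |x|:", np.argsort(-np.abs(subsampled_mp(D, y, n_iter=20)))[:3])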
23

Yang, Bo. "Analyses bioinformatiques et classements consensus pour les données biologiques à haut débit." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112250/document.

Abstract:
Solving biological questions using bioinformatics approaches has become more and more important in the post-genomic era. This thesis focuses on two problems related to high-throughput data: bioinformatics analysis at a large scale, and the development of algorithms for consensus ranking. In molecular biology and genetics, RNA splicing is a modification of the nascent pre-messenger RNA (pre-mRNA) transcript in which introns are removed and exons are joined. The U2AF heterodimer has been well studied for its role in defining functional 3' splice sites in pre-mRNA splicing, but multiple critical problems are still outstanding, including the functional impact of their cancer-associated mutations. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to define ~88% of functional 3' splice sites in the human genome. Numerous U2AF binding events also occur in other genomic locations, and metagene and minigene analysis suggests that upstream intronic binding events interfere with the immediate downstream 3' splice site associated with either the alternative exon, causing exon skipping, or the competing constitutive exon, inducing inclusion of the alternative exon. We further build a U2AF65 scoring scheme for predicting its target sites based on high-throughput sequencing data, using a maximum entropy machine learning method, and the scores on the up- and down-regulated cases are consistent with our regulation model. These findings reveal the genomic function and regulatory mechanism of U2AF, which helps us understand the associated diseases. Ranking biological data is a crucial need. Instead of developing new ranking methods, Cohen-Boulakia and her colleagues proposed generating a consensus ranking that highlights the common points of a set of rankings while minimizing their disagreements, to combat the noise and errors in biological data. However, this is an NP-hard problem even for only four rankings under the Kendall-tau distance. In this thesis, we propose a new variant of pivot algorithms named Consistent-Pivot. It uses a new strategy for pivot selection and element assignment, and performs better than previous pivot algorithms in both computation time and accuracy.
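
The pivot family to which Consistent-Pivot belongs works like quicksort driven by pairwise majority preferences: choose a pivot, place every other element before or after it according to the majority of input rankings, and recurse. A generic KwikSort-style sketch without ties — the thesis's new pivot-selection and assignment strategy is not reproduced here:

    import random

    def majority_prefers(rankings, a, b):
        """True if a strict majority of the input rankings place a before b."""
        votes = sum(r.index(a) < r.index(b) for r in rankings)
        return 2 * votes > len(rankings)

    def pivot_consensus(items, rankings):
        """KwikSort-style pivot algorithm for Kendall-tau consensus ranking:
        pick a random pivot, split the remaining items by majority preference
        against it, and recurse (a randomised constant-factor approximation
        in expectation, after Ailon, Charikar and Newman)."""
        if len(items) <= 1:
            return list(items)
        pivot = random.choice(items)
        rest = [x for x in items if x != pivot]
        before = [x for x in rest if majority_prefers(rankings, x, pivot)]
        after = [x for x in rest if not majority_prefers(rankings, x, pivot)]
        return pivot_consensus(before, rankings) + [pivot] + pivot_consensus(after, rankings)

    random.seed(0)
    rankings = [["a", "b", "c", "d"], ["b", "a", "c", "d"], ["a", "c", "b", "d"]]
    print(pivot_consensus(["a", "b", "c", "d"], rankings))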
24

Cure, Morgane. "Concurrence à l'ère du numérique : exemples dans l'industrie hôtelière." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG013.

Abstract:
The growing digitalization of the economy has been disrupting sellers' distribution channels and favoring the emergence of new players: intermediation platforms. With them, the traditional resale model gives way to an agency model and creates fertile ground for various cases of vertical restraints. The increasing digitalization of markets therefore pushes competition authorities to question and adapt their economic analysis of practices. This thesis focuses on the hotel industry, which has been the subject of several cases, especially in Europe. Contractual practices such as price parity clauses imposed by online travel agencies on hotels have been the subject of numerous investigations. The first chapter of this thesis develops a structural demand estimation model to assess the degree of substitution between the online distribution channels of a hotel chain, a crucial element in market definition. Following the various competition cases, price parity clauses were partially or completely prohibited in several countries. In response, the platforms developed new programs offering hotels increased visibility in exchange for voluntary compliance with the price parity clause. The second chapter of this thesis studies the effect of the adoption of such a program on the prices set by hotels, separating the effect of the demand increase enabled by visibility gains from the effects of clause compliance and of the higher fees attached to the program. This thesis also deals with the link between online travel agencies and another type of platform on this market: price comparison websites. The latter promise consumers the display of the most competitive offers on the market, but the criteria used in their ranking algorithms are now debated. Moreover, the vertical integration of some of these platforms into larger groups that already own online travel agencies raises questions about their impartiality. The third chapter studies the impact of the integration of Kayak and several online travel agencies (such as Booking.com) within the Booking Holding group on the rankings of hotels and sales channels displayed on the price comparison website.
25

Robbiano, Sylvain. "Méthodes d'apprentissage statistique pour le ranking : théorie, algorithmes et applications." Phd thesis, Telecom ParisTech, 2013. http://tel.archives-ouvertes.fr/tel-00936092.

Abstract:
Multipartite ranking is a statistical learning problem that consists of ordering observations belonging to a high-dimensional space in the same order as their labels, so that observations with the highest label appear at the top of the list. This thesis aims to understand the probabilistic nature of the multipartite ranking problem in order to obtain theoretical guarantees for ranking algorithms. In this framework, the output of a ranking algorithm takes the form of a scoring function, a function that maps the observation space to the real line, and the final order is built using the order induced by the real line. The contributions of this manuscript are the following: first, we focus on characterizing the optimal solutions of multipartite ranking. A new condition on the likelihood ratios is introduced and shown to be necessary and sufficient for the multipartite ranking problem to be well posed. Next, we examine criteria for evaluating the scoring function and propose to use a generalization of the ROC curve, called the ROC surface, as well as the volume induced by this surface. For use in applications, the empirical counterpart of the ROC surface is studied and results on its consistency are established. The second research theme is the design of algorithms to produce scoring functions. The first procedure is based on aggregating scoring functions learned on binary ranking sub-problems. In order to aggregate the orders induced by the scoring functions, we use a metric approach based on the Kendall τ to find a median scoring function. The second procedure is a recursive method, inspired by the TreeRank algorithm, which can be seen as a weighted version of CART. A simple modification is proposed to obtain an approximation of the optimal ROC surface using a piecewise-constant scoring function. These procedures are compared to state-of-the-art multipartite ranking algorithms on real and simulated data sets. Their performance highlights the cases where our procedures are well suited, in particular when the dimension of the feature space is much larger than the number of labels. Finally, we return to the binary ranking problem in order to establish adaptive minimax rates of convergence. These rates are shown for classes of distributions controlled by the complexity of the posterior distribution and a low-noise condition. The procedure achieving these rates is based on plug-in estimators of the posterior distribution and an aggregation method using exponential weights.
26

李莉華 and Lei-wah Lee. "On improving the relevancy ranking algorithm in web search engine." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31222973.

27

Lee, Lei-wah. "On improving the relevancy ranking algorithm in web search engine /." Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B21607448.

28

Kangas, Carl-Evert. "Ranking Highscores : Evaluation of a dynamic Bucket with Global Query algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-127677.

Abstract:
The task of ranking highscores in a computer game may sound trivial. It turns out it is not, because the naive solution has a time complexity unsuitable for online applications in terms of response time and running cost. An overview of a few approaches to ranking is presented: how an N-ary tree could be used to do ranking, and how to do linear approximation. Two ways of obtaining a model for doing linear approximation are demonstrated: a method called Buckets with Global Query is described, and a method based on Frugal Streaming is elaborated on. Finally, a variant of the Buckets with Global Query algorithm, where the buckets are adjusted continuously according to changes in the distribution of highscores, is evaluated. The dynamic variant of the algorithm performs well in terms of accuracy for at least 100 000 highscore updates, but shows no significant gains in reduced CPU time.
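
The underlying Buckets with Global Query idea can be sketched simply: keep a count of scores per bucket, so that a score's rank is the total count of all higher buckets plus an exact query inside its own bucket. The boundaries below are fixed and illustrative; the evaluated dynamic variant adjusts them as the highscore distribution changes.

    import bisect

    class BucketRanker:
        """Approximate highscore ranking with buckets plus a per-bucket query.

        Bucket boundaries are fixed here; the evaluated dynamic variant moves
        them continuously to follow the highscore distribution.
        """
        def __init__(self, boundaries):
            self.boundaries = boundaries                  # ascending bucket edges
            self.counts = [0] * (len(boundaries) + 1)     # scores per bucket
            self.scores = [[] for _ in self.counts]       # per-bucket score lists

        def _bucket(self, score):
            return bisect.bisect_left(self.boundaries, score)

        def add(self, score):
            b = self._bucket(score)
            self.counts[b] += 1
            self.scores[b].append(score)

        def rank(self, score):
            """1-based rank: every score in higher buckets, plus an exact
            count inside the score's own bucket (the 'global query')."""
            b = self._bucket(score)
            higher = sum(self.counts[b + 1:])
            within = sum(s > score for s in self.scores[b])
            return higher + within + 1

    r = BucketRanker(boundaries=[100, 500, 1000, 5000])
    for s in [50, 300, 700, 800, 2000, 6000]:
        r.add(s)
    print(r.rank(700))  # 4th best: behind 6000, 2000 and 800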
29

Yang, Fang. "A Comprehensive Approach for Bulk Power System Reliability Assessment." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14488.

Abstract:
The goal of this research is to advance the state of the art in bulk power system reliability assessment. Bulk power system reliability assessment is an important procedure at both the planning and operating stages to assure reliable and acceptable electricity service to customers. With the increase in the complexity of modern power systems and advances in the power industry toward restructuring, the system models and algorithms of traditional reliability assessment techniques are becoming obsolete, as they suffer from nonrealistic system models and slow convergence (even non-convergence) when multi-level contingencies are considered and the system is overstressed. To allow more rigor in system modeling and higher computational efficiency in reliability evaluation procedures, this research proposes an analytically-based security-constrained adequacy evaluation (SCAE) methodology that performs bulk power system reliability assessment. The SCAE methodology adopts a single-phase quadratized power flow (SPQPF) model as a basis and encompasses three main steps: (1) critical contingency selection, (2) effects analysis, and (3) reliability index computations. In the critical contingency selection, an improved contingency selection method is developed using a wind-chime contingency enumeration scheme and a performance index approach based on the system state linearization technique, which can rank critical contingencies with high accuracy and efficiency. In the effects analysis for selected critical contingencies, a non-divergent optimal quadratized power flow (NDOQPF) algorithm is developed to (1) incorporate major system operating practices, security constraints, and remedial actions in a constrained optimization problem and (2) guarantee convergence and provide a solution under all conditions. This algorithm is also capable of efficiently solving the ISO/RTO operational mode in deregulated power systems. Based on the results of the effects analysis, reliability indices that provide a quantitative indication of the system reliability level are computed. In addition, this research extends the proposed SCAE framework to include the effects of protection system hidden failures on bulk power system reliability. The overall SCAE methodology is implemented and applied to IEEE reliability test systems, and evaluation results demonstrate the expected features of the proposed techniques. Finally, the contributions of this research are summarized and recommendations for future research are proposed.
APA, Harvard, Vancouver, ISO, and other styles
30

Mittal, Arpit. "Human layout estimation using structured output learning." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:bb290cfd-5216-42d7-b3d2-c2b4b01614bc.

Full text
Abstract:
In this thesis, we investigate the problem of human layout estimation in unconstrained still images. This involves predicting the spatial configuration of body parts. We start our investigation with pictorial structure models and propose an efficient method of model fitting using skin regions. To detect the skin, we learn a colour model locally from the image by detecting the facial region. The resulting skin detections are also used for hand localisation. Our next contribution is a comprehensive dataset of 2D hand images. We collected this dataset from publicly available image sources, and annotated images with hand bounding boxes. The bounding boxes are not axis aligned, but are rather oriented with respect to the wrist. Our dataset is quite exhaustive as it includes images of different hand shapes and layout configurations. Using our dataset, we train a hand detector that is robust to background clutter and lighting variations. Our hand detector is implemented as a two-stage system. The first stage involves proposing hand hypotheses using complementary image features, which are then evaluated by the second stage classifier. This improves both precision and recall and results in a state-of-the-art hand detection method. In addition we develop a new method of non-maximum suppression based on super-pixels. We also contribute an efficient training algorithm for structured output ranking. In our algorithm, we reduce the time complexity of an expensive training component from quadratic to linear. This algorithm has a broad applicability and we use it for solving human layout estimation and taxonomic multiclass classification problems. For human layout, we use different body part detectors to propose part candidates. These candidates are then combined and scored using our ranking algorithm. By applying this bottom-up approach, we achieve accurate human layout estimation despite variations in viewpoint and layout configuration. In the multiclass classification problem, we define the misclassification error using a class taxonomy. The problem then reduces to a structured output ranking problem and we use our ranking method to optimise it. This allows inclusion of semantic knowledge about the classes and results in a more meaningful classification system. Lastly, we substantiate our ranking algorithm with theoretical proofs and derive the generalisation bounds for it. These bounds prove that the training error reduces to the lowest possible error asymptotically.
APA, Harvard, Vancouver, ISO, and other styles
31

Vayatis, Nicolas. "Approches statistiques en apprentissage : boosting et ranking." Habilitation à diriger des recherches, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00120738.

Full text
Abstract:
Over the past decade, the statistical theory of learning has expanded considerably. The advent of highly effective algorithms for classifying high-dimensional data, such as boosting and kernel machines (SVM), raised numerous statistical questions that Vapnik-Chervonenkis (VC) theory could not resolve: the Empirical Risk Minimization principle does not account for practical learning methods, and the combinatorial notion of VC dimension cannot explain the generalization ability of algorithms that select an estimator from a massive class, such as the convex hull of a VC class. The first part of the dissertation recalls the interpretation of boosting algorithms as implementations of convex risk minimization principles and studies their properties from that angle. In particular, it shows the importance of regularization for obtaining consistent strategies. It also develops a new class of stochastic gradient algorithms, called mirror descent algorithms with averaging, and assesses their behaviour through computer simulations. After presenting the fundamental principles of boosting, the second part addresses more advanced questions such as the derivation of oracle inequalities. It studies the precise calibration of penalties as a function of the cost criteria used, presents non-asymptotic results on the performance of penalized boosting estimators, notably fast rates under margin conditions of the Mammen-Tsybakov type, and describes the approximation capabilities of boosting with decision stumps. The third part explores the ranking problem. An important issue in applications such as document mining or credit scoring is to order instances rather than to categorize them. A simple formulation of this problem is proposed, under which ranking can be interpreted as classification over pairs of observations. The difference in this case is that the empirical criteria are U-statistics, and the theory of classification is therefore developed for this setting. The question of generalizing the ranking error is also explored, so as to include priors on the order of instances, as in the case where only the "best" instances are of interest.
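The reduction of ranking to classification over pairs described above is commonly written as follows (standard notation, not quoted from the memoir): the empirical ranking error of a scoring rule r on a sample (X_1, Y_1), ..., (X_n, Y_n) is the order-two U-statistic

    L_n(r) \;=\; \frac{2}{n(n-1)} \sum_{1 \le i < j \le n}
    \mathbb{1}\big\{ (Y_i - Y_j)\,\big(r(X_i) - r(X_j)\big) < 0 \big\},

whose summands are not independent, which is why the classical i.i.d. classification theory has to be redeveloped for U-processes.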
APA, Harvard, Vancouver, ISO, and other styles
32

Silva, Sérgio Francisco da. "Seleção de características por meio de algoritmos genéticos para aprimoramento de rankings e de modelos de classificação." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19072011-151501/.

Full text
Abstract:
Content-based image retrieval (CBIR) and classification systems rely on feature vectors extracted from images according to specific visual criteria. It is common for a feature vector to contain hundreds of elements. As the size (dimensionality) of the feature vector increases, so do the degrees of redundancy and irrelevancy, leading to the "curse of dimensionality" problem. The selection of relevant features is therefore a key step in a CBIR or classification system. This thesis presents new feature selection methods based on genetic algorithms (GA), aimed at improving similarity queries and classification models. The proposed Fc ("Fitness coach") family of fitness functions takes advantage of single-valued ranking evaluation functions to develop a new GA-based feature selection approach tailored to improve the accuracy of CBIR systems. Guided by the proposed evaluation criteria (the Fc family), the GA search improved the precision of similarity queries by up to 22% on the analyzed databases when compared with traditional wrapper feature selection methods based on decision trees (C4.5), naive Bayes, support vector machines, 1-nearest neighbor and association rule mining. Two further contributions of this thesis are filter-based feature selection algorithms for classification purposes, which use the supervised simplified silhouette statistic as evaluation function: silhouette-based greedy search (SiGS) and silhouette-based genetic algorithm search (SiGAS). The proposed algorithms outperformed the state-of-the-art methods (CFS, FCBF and ReliefF, among others). It is important to stress that the gain in accuracy obtained by the proposed Fc family, SiGS and SiGAS methods is allied to a significant decrease in feature vector size, of up to 90%.
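As an illustration of the wrapper-style genetic search described above, here is a minimal, mutation-only sketch in Python. The fitness used here is plain cross-validated 1-NN accuracy, a stand-in for the thesis's Fc ranking-based and silhouette-based criteria; the dataset, population size and mutation rate are arbitrary choices.

    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X, y = load_wine(return_X_y=True)
    n_features = X.shape[1]

    def fitness(mask):
        # wrapper-style fitness: 1-NN accuracy on the selected features only
        if not mask.any():
            return 0.0
        clf = KNeighborsClassifier(n_neighbors=1)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    pop = rng.random((20, n_features)) < 0.5      # random binary feature masks
    for generation in range(30):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]   # truncation selection
        children = parents[rng.integers(0, 10, 10)].copy()
        flips = rng.random(children.shape) < 0.05  # bit-flip mutation
        children[flips] = ~children[flips]
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected features:", np.flatnonzero(best))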
APA, Harvard, Vancouver, ISO, and other styles
33

Vogel, Robin. "Similarity ranking for biometrics : theory and practice." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT031.

Full text
Abstract:
The rapid growth in population, combined with the increased mobility of people, has created a need for sophisticated identity management systems. For this purpose, biometrics refers to the identification of individuals using behavioral or biological characteristics. The most popular approaches, i.e. fingerprint, iris or face recognition, are all based on computer vision methods. The adoption of deep convolutional networks, enabled by general-purpose computing on graphics processing units, made the recent advances in computer vision possible. These advances have led to drastic improvements for conventional biometric methods, which boosted their adoption in practical settings and stirred up public debate about these technologies. In this respect, biometric systems providers face many challenges when learning those networks. In this thesis, we consider those challenges from the angle of statistical learning theory, which leads us to propose or sketch practical solutions. First, we answer the proliferation of papers on similarity learning for deep neural networks that optimize objective functions disconnected from the natural ranking aim sought in biometrics. Precisely, we introduce the notion of similarity ranking, by highlighting the relationship between bipartite ranking and the requirements for similarities that are well suited to biometric identification. We then extend the theory of bipartite ranking to this new problem, by adapting it to the specificities of pairwise learning, particularly those regarding its computational cost. Usual objective functions optimize for predictive performance, but recent work has underlined the necessity of considering other aspects when training a biometric system, such as dataset bias, prediction robustness or notions of fairness. The thesis tackles all three of these examples by proposing a careful statistical analysis of each, as well as practical methods that provide biometric systems manufacturers with the necessary tools to address those issues without jeopardizing the performance of their algorithms.
APA, Harvard, Vancouver, ISO, and other styles
34

Jaini, Nor. "An efficient ranking analysis in multi-criteria decision making." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/an-efficient-ranking-analysis-in-multicriteria-decision-making(c5a694d5-fd43-434f-9f9f-b86f7581b97c).html.

Full text
Abstract:
This study is conducted with the aim of developing a new ranking method for multi-criteria decision making problems with conflicting criteria. Such a problem has a set of Pareto solutions, where improving one objective's value results in worsening some of the others; thus, in this type of problem there is no unique solution. However, out of many available options, the Decision Maker eventually has to choose only one. With this problem as the motivation, the current study develops a compromise ranking algorithm, namely a trade-off ranking method, which gives as the best solution the trade-off with the least compromise compared to the other choices. The properties of the algorithm are studied in the thesis on several test cases. The proposed method is compared against several multi-criteria decision making methods whose rankings are based on distance measures, namely TOPSIS, relative distance and VIKOR. A sensitivity analysis and an uncertainty test are carried out to examine the methods' robustness, and a critical criteria analysis tests for the most critical criterion in a multi-criteria problem. The decision making method is considered further in a fuzzy environment, where a fuzzy trade-off ranking is developed and compared against existing fuzzy decision making methods.
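For reference, the distance-based baseline family against which the trade-off ranking is compared can be sketched compactly. Below is standard TOPSIS in Python; the decision matrix and weights are made-up numbers, and the thesis's own trade-off ranking algorithm is not reproduced here.

    import numpy as np

    def topsis(decision_matrix, weights, benefit):
        # Standard TOPSIS: rank alternatives by relative closeness to the
        # ideal solution. benefit[j] is True when criterion j is maximised.
        M = np.asarray(decision_matrix, dtype=float)
        R = M / np.linalg.norm(M, axis=0)      # vector normalisation
        V = R * weights
        ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
        anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
        d_pos = np.linalg.norm(V - ideal, axis=1)
        d_neg = np.linalg.norm(V - anti, axis=1)
        closeness = d_neg / (d_pos + d_neg)
        return np.argsort(-closeness)          # best alternative first

    # three alternatives, criteria: cost (minimise), two benefits (maximise)
    order = topsis([[250, 16, 12], [200, 16, 8], [300, 32, 16]],
                   weights=[0.4, 0.3, 0.3],
                   benefit=[False, True, True])
    print(order)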
APA, Harvard, Vancouver, ISO, and other styles
35

Junior, Lucelindo Dias Ferreira. "Sistema de Engenharia Kansei para apoiar a descrição da visão do produto no contexto do Gerenciamento Ágil de Projetos de produtos manufaturados." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/18/18156/tde-09032012-141046/.

Full text
Abstract:
Agile Project Management is a useful approach for projects with a high degree of complexity and uncertainty. Two of its characteristics are: customer involvement in decision making about the product design; and the use of a product vision, an artifact that represents and communicates the fundamental and high-priority features of the product to be developed. There are methods to support the creation of the product vision, but they have shortcomings in operationalizing customer involvement. On the other hand, there is Kansei Engineering, a methodology to capture the needs of a large number of consumers and correlate them to product features. This work presents a detailed study of the Kansei Engineering methodology and analyzes how it can support the description of the product vision in the context of Agile Project Management of manufactured products. To verify this proposition, it presents the development of a Kansei Engineering System based on Quantification Theory Type I, fuzzy arithmetic and genetic algorithms, tested on the design of a pen aimed at graduate students. The project employed a set of methods and procedures, including a systematic literature review, mathematical development, computational development, and a case study. The proposed Kansei Engineering System and the results of the applied case are analyzed to ascertain its potential. Evidence indicates that the Kansei Engineering System is capable of generating requirements on product configurations from the perspective of the potential consumer, and that these configurations are useful for describing the product vision and for evolving this vision during the product design project.
APA, Harvard, Vancouver, ISO, and other styles
36

Pascoal, Luiz Mário Lustosa. "Um método social-evolucionário para geração de rankings que apoiem a recomendação de eventos." Universidade Federal de Goiás, 2014. http://repositorio.bc.ufg.br/tede/handle/tede/4345.

Full text
Abstract:
With the development of Web 2.0, social networks have gained great space on the internet, and many users now provide information about themselves and their interests. Expert systems that make use of these interests to recommend different products are known as Recommender Systems. One of the main techniques of a Recommender System is (user-based) Collaborative Filtering, which recommends products to users based on what other, similar people liked in the past. This work presents a ranking-generating function approximation model that, through a Genetic Algorithm, learns an approximation function composed of different social variables, customized for each Facebook user. The learned function must be able to reproduce a ranking of people (friends), originally created from the user's information, who exert some influence on the user's decisions. As a case study, this work addresses the context of events, using information on the attendance of some users at several distinct events. Two different approaches to learning and applying the approximation function were developed: a generalist model, which learns a function in advance and then applies it to a set of test data, and a specialist model, which learns a specific function for each test scenario. Two proposals for evaluating the ordering produced by the learned function, called objective functions A and B, are also presented; the results for both show that good solutions can be obtained with both the generalist and the specialist approaches of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
37

Browning, James Paul. "On detection and ranking methods for a distributed radio-frequency sensor network : theory and algorithmic implementation." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10047710/.

Full text
Abstract:
A theoretical foundation for pre-detection fusion of sensors is needed if the United States Air Force is ever to field a system of distributed and layered sensors that can detect and perform parameter estimation of complex, extended targets in difficult interference environments, without human intervention, in near real-time. This research is relevant to the United States Air Force within its layered sensing and cognitive radar/sensor initiatives. The asymmetric threat of the twenty-first century introduces stressing sensing conditions that may exceed the ability of traditional monostatic sensing systems to perform their required intelligence, surveillance and reconnaissance (ISR) missions. In particular, there is growing interest within the United States Air Force in moving beyond single-sensor sensing systems and instead fielding and leveraging distributed sensing systems to overcome the inherent challenges imposed by the modern threat space. This thesis analyzes the impact of integrating target echoes in the angular domain, to determine whether better detection and ranking performance is achieved through the use of a distributed sensor network. Bespoke algorithms are introduced for detection and ranking ISR missions leveraging a distributed network of radio-frequency sensors: the first set is based upon a depth-based nonparametric detection algorithm, which is shown to enhance the recovery of targets at lower signal-to-noise ratios than an equivalent monostatic radar system; the second set is based upon random matrix theory and concentration-of-measure mathematics, and is demonstrated to outperform the depth-based nonparametric approach. The latter approach is shown to be effective across a broad range of signal-to-noise ratios, both positive and negative.
APA, Harvard, Vancouver, ISO, and other styles
38

Milchevski, Evica [Verfasser], and Sebastian [Akademischer Betreuer] Michel. "Similarity Search Algorithms over Top-k Rankings and Class-Constrained Objects / Evica Milchevski ; Betreuer: Sebastian Michel." Kaiserslautern : Technische Universität Kaiserslautern, 2019. http://d-nb.info/1194372554/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Kim, Jinhan. "J-model : an open and social ensemble learning architecture for classification." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7672.

Full text
Abstract:
Ensemble learning is a promising direction of research in machine learning, in which an ensemble classifier gives better predictive and more robust performance on classification problems by combining other learners. Meanwhile, agent-based systems provide frameworks for sharing knowledge from multiple agents in an open context. This thesis combines multi-agent knowledge sharing with ensemble methods to produce a new style of learning system for open environments. We are now surrounded by many smart objects such as wireless sensors, ambient communication devices, mobile medical devices and even information supplied via other humans. When we coordinate smart objects properly, we can produce a form of collective intelligence from their collaboration. Traditional ensemble methods and agent-based systems have complementary advantages and disadvantages in this context. Traditional ensemble methods show better classification performance, while agent-based systems might not guarantee their performance for classification; traditional ensemble methods work as closed and centralised systems (so they cannot handle classifiers in an open context), while agent-based systems are natural vehicles for classifiers in an open context. We designed an open and social ensemble learning architecture, named J-model, to merge the conflicting benefits of the two research domains. The J-model architecture is based on a service choreography approach for coordinating classifiers. Coordination protocols are defined by interaction models that describe how classifiers will interact with one another in a peer-to-peer manner. The peer ranking algorithm recommends the more appropriate classifiers to participate in an interaction model, to boost the success rate of the results of their interactions. Coordinated participant classifiers recommended by the peer ranking algorithm become an ensemble classifier within J-model. We evaluated J-model's classification performance on 13 UCI machine learning benchmark data sets and on a virtual screening problem as a realistic classification task. J-model showed better accuracy than 8 other representative traditional ensemble methods on 9 of the 13 benchmark sets, and better specificity on 7 of them. In the virtual screening problem, J-model gave better results than already published results for 12 out of 16 bioassays. We defined different interaction models for each specific classification task, and the peer ranking algorithm was used across all the interaction models. Our research contributions to knowledge are as follows. First, we showed that service choreography can be an effective ensemble coordination method for classifiers in an open context. Second, we used interaction models that implement task-specific coordinations of classifiers to solve a variety of representative classification problems. Third, we designed the peer ranking algorithm, which is generally and independently applicable to the task of recommending appropriate member classifiers from a classifier pool, based on an open pool of interaction models and classifiers.
APA, Harvard, Vancouver, ISO, and other styles
40

Xie, Lin. "Statistical inference for rankings in the presence of panel segmentation." Diss., Kansas State University, 2011. http://hdl.handle.net/2097/13247.

Full text
Abstract:
Doctor of Philosophy
Department of Statistics
Paul Nelson
Panels of judges are often used to estimate consumer preferences for m items such as food products. Judges can either evaluate each item on several ordinal scales and indirectly produce an overall ranking, or directly report a ranking of the items. A complete ranking orders all the items from best to worst. A partial ranking, as we use the term, only reports the best q out of m items. Direct ranking, the subject of this report, does not require the widespread but questionable practice of treating ordinal measurements as though they were on ratio or interval scales. Here, we develop and study segmentation models in which the panel may consist of relatively homogeneous subgroups, the segments. Judges within a subgroup will tend to agree among themselves and differ from judges in the other subgroups. We develop and study the statistical analysis of mixture models where it is not known to which segment a judge belongs or, in some cases, how many segments there are. Viewing segment membership indicator variables as latent data, an E-M algorithm was used to find the maximum likelihood estimators of the parameters specifying a mixture of Mallows' (1957) distance models for complete and partial rankings. A simulation study was conducted to evaluate the behavior of the E-M algorithm in terms of issues such as the fraction of data sets for which the algorithm fails to converge, the sensitivity of the convergence rate to initial values, and the performance of the maximum likelihood estimators in terms of bias and mean square error, where applicable. A Bayesian approach was developed and credible set estimators were constructed. Simulation was used to evaluate the performance of these credible sets as confidence sets. A method for predicting segment membership from covariates measured on a judge was derived using a logistic model applied to a mixture of Mallows probability distance models, and the effects of covariates on segment membership were assessed. Likelihood sets for parameters specifying mixtures of Mallows distance models were constructed and explored.
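The mixture structure described here can be written compactly. In illustrative notation (not copied from the dissertation), a ranking π drawn from a G-segment mixture of Mallows distance models, with modal rankings σ_g, dispersions θ_g and weights p_g, has probability

    P(\pi) \;=\; \sum_{g=1}^{G} p_g\,
    \frac{\exp\{-\theta_g\, d(\pi, \sigma_g)\}}{C(\theta_g)},

and the E-step of the E-M algorithm computes the posterior segment responsibilities

    \hat z_{ig} \;=\;
    \frac{p_g\, \exp\{-\theta_g\, d(\pi_i, \sigma_g)\}/C(\theta_g)}
         {\sum_{h=1}^{G} p_h\, \exp\{-\theta_h\, d(\pi_i, \sigma_h)\}/C(\theta_h)},

where d is a distance on rankings (e.g. Kendall's) and C(θ) is the normalizing constant, which for right-invariant distances does not depend on the modal ranking.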
APA, Harvard, Vancouver, ISO, and other styles
41

Ben, Qingyan. "Flight Sorting Algorithm Based on Users’ Behaviour." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294132.

Full text
Abstract:
The model predicts the best flight ordering and recommends the best flights to users. The thesis can be divided into three parts: feature selection, data preprocessing, and experiments with various algorithms. For feature selection, besides the original information about the flight itself, we add the user's selection status to the model: the chosen flight class, and whether the user travels with children. In the data preprocessing stage, data cleaning is used to handle incomplete and duplicated records, and a normalization method removes noise from the data. After trying various balancing treatments, the class-imbalanced data is best corrected with the SMOTE method. Based on the available data, I chose classification models and a sequential ranking algorithm, using price, direct flight or not, travel time, etc. as features, and click or no click as the label. The classification algorithms used include Logistic Regression, Gradient Boosting, KNN, Decision Tree, Random Forest, Gaussian Process Classifier, Gaussian Naive Bayes and Quadratic Discriminant Analysis; in addition, a sequential ranking algorithm was adopted. The results show that Random Forest with SMOTE performs best, with ROC AUC = 0.94 and accuracy = 0.8998.
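The winning pipeline (SMOTE rebalancing followed by a random forest) is easy to reproduce in outline. A minimal sketch with the scikit-learn and imbalanced-learn libraries follows, on synthetic stand-in data since the flight click logs are not public:

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the click/no-click flight data; the thesis's
    # real features include price, direct flight, travel time, cabin class.
    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.95], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)

    # Oversample only the training fold to avoid leakage, then fit.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(n_estimators=200,
                                 random_state=0).fit(X_res, y_res)
    print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))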
APA, Harvard, Vancouver, ISO, and other styles
42

Zapién, Arreola Karina. "Algorithme de chemin de régularisation pour l'apprentissage statistique." Thesis, Rouen, INSA, 2009. http://www.theses.fr/2009ISAM0001/document.

Full text
Abstract:
The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, several classes of models ordered by "complexity" are considered, and the model selection procedure becomes a search for the optimal complexity, allowing us to estimate a model that ensures good generalization. This model selection problem amounts to estimating one or more hyperparameters defining the model complexity, in contrast to the parameters that specify a model within the chosen complexity class. The usual approach to determining these hyperparameters is a "grid search": given a set of possible values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach consisting in calculating the complete set of possible solutions for all hyperparameter values, which is called the regularization path. It can be shown that for the problems we are interested in, parametric quadratic programs (PQP), the corresponding regularization path is piecewise linear, and its computation is no more complex than calculating a single PQP solution. The thesis is organized in three parts. The first introduces the general setting of a learning problem within the Support Vector Machine (SVM) framework, together with the theory and algorithms that allow a solution to be found. The second part deals with supervised learning problems for classification and ranking within the SVM framework. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset [Ross 07b] are given via the subdifferential. These results lead to algorithms for solving the mentioned supervised problems. The third part deals with semi-supervised and then unsupervised learning problems. For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization path algorithm. Graph-based dimensionality reduction methods are used for the unsupervised learning problems; our main contribution is a novel algorithm that allows the number of nearest neighbors to be chosen in an adaptive and appropriate way, contrary to classical approaches based on a fixed number of neighbors.
APA, Harvard, Vancouver, ISO, and other styles
43

Paris, Bruno Mendonça. "Learning to rank: combinação de algoritmos aplicando stacking e análise dos resultados." Universidade Presbiteriana Mackenzie, 2017. http://tede.mackenzie.br/jspui/handle/tede/3494.

Full text
Abstract:
With the growth in the amount of information available in recent years, which will continue to grow due to the increase in users, devices and information shared over the internet, accessing the desired information should be quick, so that not too much time is spent looking for what one wants. In a search on engines like Google, Yahoo or Bing, the first results are expected to bring the desired information. The area that aims to bring relevant documents to the user is known as Information Retrieval, and it can be aided by Learning to Rank algorithms, which apply machine learning to try to present important documents to users in the best possible ordering. This work verifies a way to obtain an even better ordering of documents, using a technique for combining algorithms known as Stacking. To do so, it uses the RankLib tool, part of the Lemur Project and developed in Java, which contains several Learning to Rank algorithms, together with datasets from a collection maintained by the Microsoft Research group known as LETOR.
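The stacking idea can be sketched in a few lines: the scores of the base rankers become the features of a meta-model that produces the final ordering. The sketch below is schematic Python on synthetic scores (RankLib itself is a Java tool; the base-ranker scores and relevance labels here are random stand-ins):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Stand-ins for per-document scores of two base rankers (e.g. two
    # RankLib models evaluated on held-out LETOR queries).
    scores_a = rng.random(1000)
    scores_b = rng.random(1000)
    relevant = (0.6 * scores_a + 0.4 * scores_b
                + rng.normal(0, 0.1, 1000)) > 0.5

    # Level-1 (meta) model: base scores in, relevance out; documents are
    # then re-ranked by the meta-model's predicted relevance. In practice
    # the meta-model is fit on a separate fold to avoid overfitting.
    meta_X = np.column_stack([scores_a, scores_b])
    meta = LogisticRegression().fit(meta_X, relevant)
    final_order = np.argsort(-meta.predict_proba(meta_X)[:, 1])
    print(final_order[:10])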
APA, Harvard, Vancouver, ISO, and other styles
44

Niu, Yue S., Ning Hao, and Heping Zhang. "Multiple Change-Point Detection: A Selective Overview." INST MATHEMATICAL STATISTICS, 2016. http://hdl.handle.net/10150/622820.

Full text
Abstract:
Very long and noisy sequence data arise from biological sciences to social science including high throughput data in genomics and stock prices in econometrics. Often such data are collected in order to identify and understand shifts in trends, for example, from a bull market to a bear market in finance or from a normal number of chromosome copies to an excessive number of chromosome copies in genetics. Thus, identifying multiple change points in a long, possibly very long, sequence is an important problem. In this article, we review both classical and new multiple change-point detection strategies. Considering the long history and the extensive literature on the change-point detection, we provide an in-depth discussion on a normal mean change-point model from aspects of regression analysis, hypothesis testing, consistency and inference. In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods because of its attractiveness in both computational and theoretical properties.
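The canonical model discussed in the review can be stated in one line. In standard notation (ours, not quoted from the article), the normal mean multiple change-point model with K change points 0 = τ_0 < τ_1 < ... < τ_K < τ_{K+1} = n is

    y_i \;=\; \mu_k + \varepsilon_i, \qquad \tau_{k-1} < i \le \tau_k, \quad
    k = 1, \dots, K+1, \qquad
    \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),

with adjacent segment means μ_k ≠ μ_{k+1}; the task is to estimate both the number of change points K and their locations τ_k.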
APA, Harvard, Vancouver, ISO, and other styles
45

Atanassova, Iana. "Exploitation informatique des annotations sémantiques automatiques d'Excom pour la recherche d'informations et la navigation." Thesis, Paris 4, 2012. http://www.theses.fr/2012PA040252.

Full text
Abstract:
Based on the Excom engine for semantic annotation, we have built an information retrieval system that relies on semantic categories produced by automatic linguistic analyses, in order to propose a new approach to text mining. The annotations are obtained by the Contextual Exploration method, a knowledge-based linguistic approach that models linguistic knowledge as markers and disambiguation rules. Queries are formulated according to search viewpoints, which are at the heart of the information retrieval strategy. Our approach uses annotation categories organised into linguistic ontologies structured as graphs. In order to provide relevant results to the user, we have designed algorithms for ranking answers and managing redundancy. These algorithms mainly exploit the structure of the linguistic ontologies used for annotation. We have carried out an evaluation of the relevance of the system's results that takes into account the specificity of our approach. The user interfaces we have developed allow the construction of new information products, such as synthesis sheets offering information extraction structured according to semantic criteria. This approach also aims to offer tools dedicated to strategic monitoring and business intelligence.
APA, Harvard, Vancouver, ISO, and other styles
46

Adkins, Laura Jean. "A Generalization of the EM Algorithm for Maximum Likelihood Estimation in Mallows' Model Using Partially Ranked Data and Asymptotic Relative Efficiencies for Some Ranking Tests of The K-Sample Problem /." The Ohio State University, 1996. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487933245538208.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Hatefi, Armin. "Mixture model analysis with rank-based samples." Statistica Sinica, 2013. http://hdl.handle.net/1993/23849.

Full text
Abstract:
Simple random sampling (SRS) is the most commonly used sampling design in data collection. In many applications (e.g., in fisheries and medical research) quantification of the variable of interest is either time-consuming or expensive but ranking a number of sampling units, without actual measurement on them, can be done relatively easy and at low cost. In these situations, one may use rank-based sampling (RBS) designs to obtain more representative samples from the underlying population and improve the efficiency of the statistical inference. In this thesis, we study the theory and application of the finite mixture models (FMMs) under RBS designs. In Chapter 2, we study the problems of Maximum Likelihood (ML) estimation and classification in a general class of FMMs under different ranked set sampling (RSS) designs. In Chapter 3, deriving Fisher information (FI) content of different RSS data structures including complete and incomplete RSS data, we show that the FI contained in each variation of the RSS data about different features of FMMs is larger than the FI contained in their SRS counterparts. There are situations where it is difficult to rank all the sampling units in a set with high confidence. Forcing rankers to assign unique ranks to the units (as RSS) can lead to substantial ranking error and consequently to poor statistical inference. We hence focus on the partially rank-ordered set (PROS) sampling design, which is aimed at reducing the ranking error and the burden on rankers by allowing them to declare ties (partially ordered subsets) among the sampling units. Studying the information and uncertainty structures of the PROS data in a general class of distributions, in Chapter 4, we show the superiority of the PROS design in data analysis over RSS and SRS schemes. In Chapter 5, we also investigate the ML estimation and classification problems of FMMs under the PROS design. Finally, we apply our results to estimate the age structure of a short-lived fish species based on the length frequency data, using SRS, RSS and PROS designs.
APA, Harvard, Vancouver, ISO, and other styles
48

Brancotte, Bryan. "Agrégation de classements avec égalités : algorithmes, guides à l'utilisateur et applications aux données biologiques." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112184/document.

Full text
Abstract:
Rank aggregation consists in building a consensus among a set of rankings (ordered elements). Although this problem has numerous applications (consensus among user votes, consensus between results ordered differently by different search engines, etc.), computing an optimal consensus is rarely feasible in real applications (the problem is NP-hard). Many approximation algorithms and heuristics have therefore been designed. However, their performance (in time and in the quality of the consensus produced) differs widely and depends on the datasets to be aggregated. Several studies have compared these algorithms, but they have generally not considered the case, common in real datasets, where elements can be tied in rankings (ranked at the same position). Choosing a suitable consensus algorithm for a given dataset is thus a particularly important problem to study, and an open one in the sense that none of the existing studies answers it. More formally, a consensus ranking is a ranking that minimizes the sum of the distances between itself and the input rankings; like much of the state of the art, we consider the generalized Kendall-Tau distance and variants. This thesis makes three contributions. First, we propose new complexity results for the cases encountered in real data, where rankings may be incomplete and several elements may be tied. We isolate the different "features" that can explain variations in the results produced by aggregation algorithms (for example, use of the generalized Kendall-Tau distance or variants, pre-processing of the datasets by unification or projection), propose a guide for characterizing a user's context and needs so as to guide the choice of both a pre-treatment of the data and the distance used to compute the consensus, and adapt existing algorithms to this new context. Second, we evaluate these algorithms on a large and varied set of both real and synthetic datasets reproducing real features such as similarity between rankings, the presence of ties, and different pre-treatments. This large evaluation includes a new method, based on a Markov-chain model, for generating synthetic data with similarities. The evaluation isolates the dataset features that impact the performance of the aggregation algorithms and yields a guide for characterizing a user's needs and advising on the algorithm to favour; a web platform to reproduce and extend these analyses is available (rank-aggregation-with-ties.lri.fr). Finally, we demonstrate the value of the rank aggregation approach in two use cases. We provide a tool that reformulates users' text queries on the fly using biomedical terminologies, then queries biological databases and produces a consensus of the results obtained for each reformulation (conqur-bio.lri.fr); compared with the reference platform, it shows a clear improvement in the quality of results. We also compute consensus rankings between lists of workflows established by experts, in the context of similarity between scientific workflows, and observe that the computed consensus agrees with the experts in a very large proportion of cases.
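The consensus objective referred to above can be stated explicitly. In the usual notation (ours), for input rankings r_1, ..., r_m, the consensus r* is

    r^{*} \;=\; \operatorname*{arg\,min}_{r} \sum_{j=1}^{m} K^{(p)}(r, r_j),

where K^{(p)} is the generalized Kendall-Tau distance: it counts the pairs of elements ordered oppositely in the two rankings, and charges a penalty p (typically p = 1/2) for each pair tied in one ranking but ordered in the other.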
APA, Harvard, Vancouver, ISO, and other styles
49

Khaki, Kazimali M. "Weightless neural networks for face recognition." Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/8025.

Full text
Abstract:
The interface with the real world has proved extremely challenging throughout the past 70 years in which computer technology has been developing. The problem was initially assumed to be somewhat trivial, as humans are exceptionally skilled at interpreting real-world data, for example pictures and sounds. Traditional analytical methods have so far not provided a complete answer to what will be termed pattern recognition. Biological inspiration has motivated pattern recognition researchers since the early days of the subject, and the idea of a neural network with self-evolving properties has always been seen as a potential solution to this endeavour. Unlike the development of computer technology, in which successive generations of improved devices have been produced, the neural network approach has been less successful, with major setbacks occurring in its development. However, the fact that natural processing in animals and humans is a voltage-based process, devoid of software and self-evolving, provides an ongoing motivation for pattern recognition in artificial neural networks. This thesis addresses the application of weightless neural networks using a ranking pre-processor to implement general pattern recognition, with specific reference to face processing. The evaluation of the system is carried out on open-source databases in order to obtain a direct comparison of the efficacy of the method; in particular, considerable use is made of the MIT-CBCL face database. The methodology is cost-effective in both software and hardware forms, offers real-time video processing, and can be implemented on all computer platforms. The results of this research show significant improvements over published results and provide a viable commercial methodology for general pattern recognition.
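The thesis's exact pipeline is not reproduced on this page, but the following Python sketch shows one plausible reading of a ranking pre-processor feeding a weightless (RAM-based, WiSARD-style) network: pixel intensities are replaced by their ranks, which are invariant to monotonic lighting changes, thermometer-encoded to bits, and looked up by per-class RAM discriminators. The encoding, the tuple size, and the helper names are illustrative assumptions.

import numpy as np

def rank_encode(image, levels=8):
    """Ranking pre-processor (one plausible reading of the abstract): replace
    each pixel by its rank quantile over the image, then thermometer-encode.
    Ranks are invariant to any monotonic change in illumination."""
    flat = image.ravel()
    ranks = flat.argsort().argsort()              # rank 0..N-1 of each pixel
    quantile = (ranks * levels) // len(flat)      # quantize ranks into `levels` bins
    # thermometer code: quantile q becomes q ones followed by (levels - q) zeros
    return (np.arange(levels) < quantile[:, None]).astype(np.uint8).ravel()

class WisardDiscriminator:
    """WiSARD-style discriminator: random n-tuples of input bits address RAM
    cells; training marks the addressed cells, recall counts how many match."""
    def __init__(self, n_bits, tuple_size, rng):
        self.tuple_size = tuple_size
        self.mapping = rng.permutation(n_bits)    # fixed random bit-to-RAM mapping
        self.rams = [set() for _ in range(n_bits // tuple_size)]

    def _addresses(self, bits):
        used = bits[self.mapping][: len(self.rams) * self.tuple_size]
        for i, chunk in enumerate(used.reshape(len(self.rams), self.tuple_size)):
            yield i, tuple(chunk)

    def train(self, bits):
        for i, addr in self._addresses(bits):
            self.rams[i].add(addr)

    def response(self, bits):
        return sum(addr in self.rams[i] for i, addr in self._addresses(bits))

Classification would then use one discriminator per identity: train each on that person's encoded face images and predict the identity whose discriminator gives the strongest response.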
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Bo. "Variable Ranking by Solution-path Algorithms." Thesis, 2012. http://hdl.handle.net/10012/6496.

Full text
Abstract:
Variable selection has always been a very important problem in statistics. We often meet situations where a huge data set is given and we want to find the relationship between the response and the corresponding variables. With a huge number of variables, we often end up with a big model even after deleting the insignificant ones. There are two reasons why a final model with too many variables is unsatisfactory. The first is prediction accuracy: though the prediction bias might be small under a big model, the variance is usually very high. The second is interpretation: with a large number of variables in the model, it is hard to determine a clear relationship and to explain the effects of the variables we are interested in. Many variable selection methods have been proposed. One disadvantage of variable selection, however, is that different model sizes require different tuning parameters, which are hard to choose for non-statisticians. Xin and Zhu advocate variable ranking instead of variable selection: once variables are ranked properly, the selection can be made by adopting a threshold rule. In this thesis, we rank variables using Least Angle Regression (LARS). Shrinkage methods such as the Lasso and LARS can shrink coefficients to zero, and their advantage is that they produce a solution path describing the order in which variables enter the model; this provides an intuitive way to rank variables based on the path. However, the Lasso can be difficult to apply to variable ranking directly, because in a Lasso solution path variables may enter the model and later be dropped, which makes it hard to rank them by order of entrance. LARS, which is closely related to the Lasso, does not have this dropping issue; we make use of this property and rank variables using the LARS solution path, as the sketch below illustrates.
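As a concrete illustration of ranking by entry order, here is a small Python sketch using scikit-learn's lars_path (our library choice, not the thesis's); with method='lar', plain LARS never drops variables from the active set, so activation order yields an unambiguous ranking. The helper name and the toy data are assumptions.

import numpy as np
from sklearn.linear_model import lars_path

def rank_variables_by_lars(X, y):
    """Rank variables by their order of entry along the LARS solution path.
    With method='lar' (plain LARS, not its Lasso modification) variables never
    leave the active set, so entry order gives a well-defined ranking."""
    _, active, _ = lars_path(X, y, method='lar')
    ranking = list(active)                            # first-entered variable first
    never_entered = [j for j in range(X.shape[1]) if j not in ranking]
    return ranking + never_entered                    # variables that never entered rank last

# Toy example: y depends strongly on x0, weakly on x1, and not on x2..x9,
# so the path should activate variable 0 first and variable 1 second.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(200)
print(rank_variables_by_lars(X, y))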
APA, Harvard, Vancouver, ISO, and other styles