Dissertations / Theses on the topic 'Ranking algorithms'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Ranking algorithms.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Xu, Liqun. "Algorithms for random ranking generation." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0021/MQ54338.pdf.
Wong, Brian Wai Fung. "Deep-web search engine ranking algorithms." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61246.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 79-80).
The deep web refers to content that is hidden behind HTML forms. The deep web contains a large collection of data that is unreachable by link-based search engines. A study conducted at the University of California, Berkeley estimated that the deep web consists of around 91,000 terabytes of data, whereas the surface web is only about 167 terabytes. To access this content, one must submit valid input values to the HTML form. Several researchers have studied methods for crawling deep web content. One of the most promising methods uses unique wrappers for HTML forms. User inputs are first filtered through the wrappers before being submitted to the forms. However, this method requires a new algorithm for ranking search results generated by the wrappers. In this paper, I explore methods for ranking search results returned from a wrapper-based deep web search engine.
by Brian Wai Fung Wong.
M.Eng.
Trailović, Lidija. "Ranking and optimization of target tracking algorithms." online access from Digital Dissertation Consortium access full-text, 2002. http://libweb.cityu.edu.hk/cgi-bin/er/db/ddcdiss.pl?3074810.
Spanias, Demetris. "Professional tennis : quantitative models and ranking algorithms." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24813.
Trotman, Andrew. "Searching and ranking structured documents." University of Otago. Department of Computer Science, 2007. http://adt.otago.ac.nz./public/adt-NZDU20070403.110440.
Dunaiski, Marcel Paul. "Analysing ranking algorithms and publication trends on scholarly citation networks." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/96106.
ENGLISH ABSTRACT: Citation analysis is an important tool in the academic community. It can help universities, funding bodies, and individual researchers to evaluate scientific work and direct resources appropriately. With the rapid growth of the scientific enterprise and the increase of online libraries that include citation analysis tools, a systematic evaluation of these tools becomes ever more important. The research presented in this study deals with scientific research output, i.e., articles and citations, and how they can be used in bibliometrics to measure academic success. More specifically, this research analyses algorithms that rank academic entities such as articles, authors and journals, to address the question of how well these algorithms can identify important and high-impact entities. A consistent mathematical formulation is developed on the basis of a categorisation of bibliometric measures such as the h-index, the Impact Factor for journals, and ranking algorithms based on Google’s PageRank. Furthermore, the theoretical properties of each algorithm are laid out. The ranking algorithms and bibliometric methods are computed on the Microsoft Academic Search citation database, which contains 40 million papers and over 260 million citations spanning multiple academic disciplines. We evaluate the ranking algorithms using a large test data set of papers and authors that won renowned prizes at numerous Computer Science conferences. The results show that citation counts are, in general, the best ranking metric. However, for certain tasks, such as ranking important papers or identifying high-impact authors, algorithms based on PageRank perform better. As a secondary outcome of this research, publication trends across academic disciplines are analysed to show changes in publication behaviour over time and differences in publication patterns between disciplines.
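Dunaiski's comparison of citation counts with PageRank-based rankings can be illustrated with a minimal sketch (pure Python; the citation graph and paper names A-D are purely hypothetical, not data from the thesis):

```python
def pagerank(citations, d=0.85, iters=50):
    """Score papers on a citation graph: citations[p] = list of papers p cites."""
    papers = list(citations)
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in papers}
        for p, cited in citations.items():
            if cited:  # distribute p's score over the papers it cites
                share = d * score[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:      # dangling node: spread its mass uniformly
                for q in papers:
                    new[q] += d * score[p] / n
        score = new
    return score

# Tiny illustrative citation graph (hypothetical papers).
graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
ranks = pagerank(graph)
by_pagerank = sorted(ranks, key=ranks.get, reverse=True)
by_citations = sorted(graph, key=lambda p: sum(p in v for v in graph.values()),
                      reverse=True)
```

On this toy graph the two metrics agree on the top paper; the thesis's point is that on real citation networks they diverge for some tasks.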
Sun, Mingxuan. "Visualizing and modeling partial incomplete ranking data." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45793.
Zacharia, Giorgos 1974. "Regularized algorithms for ranking, and manifold learning for related tasks." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/47753.
Includes bibliographical references (leaves 119-127).
This thesis describes an investigation of regularized algorithms for ranking problems in user preference and information retrieval settings. We utilize regularized manifold algorithms to appropriately incorporate data from related tasks. This investigation was inspired by personalization challenges in both user preference and information retrieval ranking problems. We formulate the ranking problem over related tasks as a special case of semi-supervised learning. We examine how to incorporate instances from related tasks, with an appropriate penalty in the loss function, to optimize performance on the hold-out sets. We present a regularized manifold approach that allows us to learn a distance metric for the different instances directly from the data. This approach allows incorporation of information from related task examples without prior estimation of cross-task coefficient covariances. We also present applications of ranking problems in two text analysis problems: a) supervised content-word learning, and b) company entity matching for record linkage problems.
by Giorgos Zacharia.
Ph.D.
Halverson, Ranette Hudson. "Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design." Thesis, University of North Texas, 1993. https://digital.library.unt.edu/ark:/67531/metadc278153/.
Lee, Chun-fan, and 李俊帆. "Fitting factor models for ranking data using efficient EM-type algorithms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31227557.
Salomon, Sophie. "Bias Mitigation Techniques and a Cost-Aware Framework for Boosted Ranking Algorithms." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1586450345426827.
McMeen, John Norman Jr. "Ranking Methods for Global Optimization of Molecular Structures." Digital Commons @ East Tennessee State University, 2014. https://dc.etsu.edu/etd/2447.
Stojkovic, Ivan. "Functional Norm Regularization for Margin-Based Ranking on Temporal Data." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/522550.
Ph.D.
Quantifying properties of interest is an important problem in many domains, e.g., assessing the condition of a patient, estimating the risk of an investment, or judging the relevance of a search result. However, the properties of interest are often latent and hard to assess directly, making it difficult to obtain the classification or regression labels needed to learn predictive models from observable features. In such cases, it is typically much easier to obtain a relative comparison of two instances, i.e., to assess which one is more intense with respect to the property of interest. One framework able to learn from this kind of supervised information is the ranking SVM, and it forms the basis of our approach. Applications on biomedical datasets typically pose specific additional challenges. The first, and the major one, is the limited number of data examples, due to expensive measuring technology and/or the infrequency of the conditions of interest. Such a limited number of examples makes both the identification of patterns/models and their validation less reliable. Repeated samples from the same subject are collected on multiple occasions over time, which breaks the i.i.d. sample assumption and introduces a dependency structure that needs to be taken into account appropriately. Also, feature vectors are high-dimensional, typically of much higher dimensionality than the number of samples, making models less useful and their learning less efficient. The hypothesis of this dissertation is that functional norm regularization can help alleviate these challenges by improving the generalization ability and/or learning efficiency of predictive models, here specifically of approaches based on the ranking SVM framework. The temporal nature of the data was addressed with a loss that fosters temporal smoothness of the functional mapping, accounting for the assumption that temporally proximate samples are more correlated.
The large number of feature variables was handled using the sparsity-inducing L1 norm, such that most features have zero effect in the learned functional mapping. The proposed sparse (temporal) ranking objective is convex but non-differentiable, so a smooth dual form is derived, taking the form of a quadratic function with box constraints, which allows efficient optimization. For the case where there are multiple similar tasks, a joint learning approach based on matrix norm regularization, using the trace norm L* and the sparse-row L21 norm, was also proposed. An alternate minimization scheme with a proximal optimization algorithm was developed to solve this multi-task objective. The generalization potential of the proposed high-dimensional and multi-task ranking formulations was assessed in a series of evaluations on synthetically generated and real datasets. The high-dimensional approach was applied to disease severity score learning from gene expression data in human influenza cases and compared against several alternative approaches. The application resulted in a scoring function with improved predictive performance, as measured by the fraction of correctly ordered testing pairs, and a set of selected features of high robustness according to three similarity measures. The multi-task approach was applied to three human viral infection problems and to learning exam scores in Math and English. The proposed formulation with a mixed matrix norm was overall more accurate than formulations with single-norm regularization.
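As a rough illustration of the kind of sparse pairwise ranking objective this abstract describes, the following sketch minimizes a pairwise hinge loss with an L1 penalty by subgradient descent (toy data; the thesis itself solves a smooth dual form with box constraints, which this sketch does not reproduce):

```python
def sparse_rank_train(pairs, dim, lam=0.01, lr=0.1, epochs=200):
    """Pairwise hinge loss with an L1 penalty, minimized by subgradient descent.
    pairs: list of (x_hi, x_lo) feature tuples where x_hi should outrank x_lo."""
    w = [0.0] * dim
    for _ in range(epochs):
        # Subgradient of lam * ||w||_1 (sign of each weight).
        grad = [lam * ((wi > 0) - (wi < 0)) for wi in w]
        for hi, lo in pairs:
            diff = [a - b for a, b in zip(hi, lo)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1:  # hinge active: push w toward ranking hi above lo
                for i, di in enumerate(diff):
                    grad[i] -= di
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy data: feature 0 carries the ordering, feature 1 is noise.
pairs = [((2.0, 0.3), (1.0, 0.9)), ((3.0, 0.1), (2.0, 0.2)), ((1.5, 0.5), (0.5, 0.4))]
w = sparse_rank_train(pairs, dim=2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
```

The L1 term drives the weight of the noise feature toward zero, mirroring the feature-sparsity goal described above.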
Temple University--Theses
Puthiya, Parambath Shameem Ahamed. "New methods for multi-objective learning." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2322/document.
Multi-objective problems arise in many real-world scenarios where one has to find an optimal solution considering the trade-off between different competing objectives. Typical examples of multi-objective problems arise in classification, information retrieval, dictionary learning, online learning, etc. In this thesis, we study and propose algorithms for multi-objective machine learning problems. To motivate our work, we give many interesting examples of multi-objective learning problems that are actively pursued by the research community. The majority of the state-of-the-art algorithms proposed for multi-objective learning fall under the so-called "scalarization method", an efficient approach for solving multi-objective optimization problems. Having motivated our work, we study two multi-objective learning tasks in detail. In the first task, we study the problem of finding the optimal classifier for multivariate performance measures. The problem has been studied very actively, and recent papers have proposed many algorithms in different classification settings. We study the problem as finding an optimal trade-off between different classification errors, and propose an algorithm based on cost-sensitive classification. In the second task, we study the problem of diverse ranking in information retrieval tasks, in particular recommender systems. We propose an algorithm for diverse ranking that makes use of domain-specific information, formulating the problem as a submodular maximization problem for coverage maximization in a weighted similarity graph. Finally, we conclude that scalarization-based algorithms work well for multi-objective learning problems. But when considering algorithms for multi-objective learning problems, scalarization need not be the go-to approach: it is very important to consider the domain-specific information and objective functions.
We end this thesis by proposing some immediate future work, which is currently being experimented with, and some short-term future work that we plan to carry out.
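The weighted-sum form of the "scalarization method" mentioned in this abstract can be sketched as follows (the two one-dimensional objectives are illustrative, not taken from the thesis):

```python
def weighted_sum_scalarize(objectives, weights):
    """Collapse several objectives (to be minimized) into one scalar objective."""
    def scalar(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return scalar

# Two competing objectives on a single decision variable x in [0, 1]:
f1 = lambda x: x * x            # prefers x = 0
f2 = lambda x: (x - 1) ** 2     # prefers x = 1
candidates = [i / 100 for i in range(101)]

# Sweeping the weight vector traces out different trade-off solutions.
solutions = []
for w1 in (0.9, 0.5, 0.1):
    f = weighted_sum_scalarize([f1, f2], [w1, 1 - w1])
    solutions.append(min(candidates, key=f))
```

Each weight vector yields a different compromise between the two objectives, which is why scalarization reduces a multi-objective problem to a family of single-objective ones.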
Safran, Mejdl Sultan. "EFFICIENT LEARNING-BASED RECOMMENDATION ALGORITHMS FOR TOP-N TASKS AND TOP-N WORKERS IN LARGE-SCALE CROWDSOURCING SYSTEMS." OpenSIUC, 2018. https://opensiuc.lib.siu.edu/dissertations/1511.
Owusu-Kesseh, Daniel. "The Relative Security Metric of Information Systems: Using AIMD Algorithms." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1462278857.
Williams, Garrick J. "Abstracting Glicko-2 for Team Games." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427962458.
Lacour, Renaud. "Approches de résolution exacte et approchée en optimisation combinatoire multi-objectif, application au problème de l'arbre couvrant de poids minimal." Thesis, Paris 9, 2014. http://www.theses.fr/2014PA090067/document.
This thesis deals with several aspects of solving multi-objective problems, without restriction to the bi-objective case. We consider exact solving, which generates the nondominated set, and approximate solving, which computes an approximation of the nondominated set with an a priori guarantee on quality. We first consider the determination of an explicit representation of the search region. The search region, defined with respect to a set of known feasible points, excludes from the objective space the part dominated by these points. Future efforts to find all nondominated points should therefore be concentrated on the search region. Then we review branch-and-bound and ranking algorithms, and we propose a new hybrid approach for the determination of the nondominated set. We show how the proposed method can be adapted to generate an approximation of the nondominated set. This approach is instantiated on the minimum spanning tree problem. We review several properties of this problem which enable us to specialize some procedures of the proposed approach and integrate specific preprocessing rules. The approach is finally supported by experimental results.
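The nondominated set that exact solving generates can be computed, for a finite set of points in objective space, with a direct sketch (minimization convention; the bi-objective points are illustrative only):

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, weight) objective vectors of candidate spanning trees.
pts = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
front = nondominated(pts)
```

This quadratic-time filter is only a definitional sketch; the point of the search region machinery above is precisely to avoid such exhaustive comparisons.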
Krestel, Ralf [Verfasser]. "On the use of language models and topic models in the web : new algorithms for filtering, classification, ranking, and recommendation / Ralf Krestel." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2012. http://d-nb.info/1022753363/34.
Harrington, Edward. "Aspects of Online Learning." The Australian National University. Research School of Information Sciences and Engineering, 2004. http://thesis.anu.edu.au./public/adt-ANU20060328.160810.
Guan, Wei. "New support vector machine formulations and algorithms with application to biomedical data analysis." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41126.
Peel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.
The first part of this thesis introduces new algorithms for the sparse encoding of signals. Based on Matching Pursuit (MP), they focus on the following problem: how to reduce the computation time of the selection step of MP. As an answer, we sub-sample the dictionary by rows and columns at each iteration. We show that this theoretically grounded approach has good empirical performance. We then propose a block coordinate gradient descent algorithm for feature selection problems in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be seen as a simultaneous sparse signal encoding problem. The second part presents new empirical Bernstein inequalities. First, they concern the theory of U-statistics and are applied to design generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, and we propose an efficient algorithm to compute it. Then, we present an empirical version of the Bernstein-type inequality for martingales by Freedman [1975]. Again, the strength of our result lies in a variance estimator computable from the data. This allows us to propose generalization bounds for online learning algorithms that improve on the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information.
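For reference, the MP selection step whose cost the first part addresses looks roughly as follows (plain Python, full dictionary scan on a toy orthonormal dictionary; the thesis sub-samples the rows and columns of exactly this step, which the sketch does not do):

```python
import math

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding: repeatedly pick the atom most correlated with the
    residual and subtract its contribution.
    dictionary: list of unit-norm atoms (lists of floats)."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_atoms):
        # Selection step: argmax over |<residual, atom>| -- the costly part.
        dots = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(dots[i]))
        coeffs[k] = coeffs.get(k, 0.0) + dots[k]
        residual = [r - dots[k] * a for r, a in zip(residual, dictionary[k])]
    return coeffs, residual

# Toy 2-D example: two canonical atoms plus a redundant diagonal atom.
atoms = [[1.0, 0.0], [0.0, 1.0], [math.sqrt(0.5), math.sqrt(0.5)]]
coeffs, residual = matching_pursuit([3.0, 1.0], atoms, n_atoms=2)
```

Each iteration scans every atom, so the selection step dominates the runtime for large dictionaries, which motivates the sub-sampling studied in the thesis.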
Yang, Bo. "Analyses bioinformatiques et classements consensus pour les données biologiques à haut débit." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112250/document.
Solving biological questions with bioinformatics approaches is considered more and more important in the post-genomic era. This thesis focuses on two problems related to high-throughput data: large-scale bioinformatics analysis, and the development of consensus ranking algorithms. In molecular biology and genetics, RNA splicing is a modification of the nascent pre-messenger RNA (pre-mRNA) transcript in which introns are removed and exons are joined. The U2AF heterodimer has been well studied for its role in defining functional 3’ splice sites in pre-mRNA splicing, but multiple critical problems are still outstanding, including the functional impact of its cancer-associated mutations. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to define ~88% of functional 3’ splice sites in the human genome. Numerous U2AF binding events also occur in other genomic locations, and metagene and minigene analysis suggests that upstream intronic binding events interfere with the immediately downstream 3’ splice site, associated either with the alternative exon, causing exon skipping, or with the competing constitutive exon, inducing inclusion of the alternative exon. We further build a U2AF65 scoring scheme for predicting its target sites from high-throughput sequencing data using a maximum entropy machine learning method, and the scores on the up- and down-regulated cases are consistent with our regulation model. These findings reveal the genomic function and regulatory mechanism of U2AF, which helps us understand the associated diseases. Ranking biological data is a crucial need. Instead of developing new ranking methods, Cohen-Boulakia and her colleagues proposed generating a consensus ranking to highlight the common points of a set of rankings while minimizing their disagreements, in order to combat the noise and error in biological data.
However, this is an NP-hard problem even for only four rankings under the Kendall-tau distance. In this thesis, we propose a new variant of pivot algorithms named Consistent-Pivot. It uses a new strategy for pivot selection and for the assignment of the other elements, and performs better than previous pivot algorithms in both computation time and accuracy.
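The objective behind such consensus algorithms can be illustrated with the Kendall-tau distance and a brute-force (Kemeny) consensus, tractable only for tiny inputs given the NP-hardness noted above (toy rankings; this sketch is not the Consistent-Pivot algorithm itself):

```python
from itertools import permutations

def kendall_tau(r1, r2):
    """Number of item pairs ordered differently by the two rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    items = list(r1)
    return sum(
        1
        for i in range(len(items))
        for j in range(i + 1, len(items))
        if (pos1[items[i]] - pos1[items[j]]) * (pos2[items[i]] - pos2[items[j]]) < 0
    )

def brute_force_consensus(rankings):
    """Exhaustive Kemeny consensus: minimize the total Kendall-tau distance."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(kendall_tau(cand, r) for r in rankings))

rankings = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
consensus = brute_force_consensus(rankings)
```

Pivot algorithms such as Consistent-Pivot approximate this minimizer without enumerating all n! permutations.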
Cure, Morgane. "Concurrence à l'ère du numérique : exemples dans l'industrie hôtelière." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG013.
The growing digitalization of the economy has been disrupting sellers' distribution channels and favoring the emergence of new players: intermediation platforms. Meanwhile, the traditional resale model gives way to an agency model and creates fertile ground for various cases of vertical restraints. The increasing digitalization of markets therefore pushes competition authorities to question and adapt their economic analysis of these practices. This thesis focuses on the hotel industry, which has been the subject of several specific cases, especially in Europe. Contractual practices such as the price parity clauses imposed on hotels by online travel agencies have been the subject of numerous investigations. The first chapter of this thesis develops a structural demand estimation model, allowing us to assess the degree of substitution between the online distribution channels of a hotel chain, a crucial element in market definition. Following the various competition cases, price parity clauses were partially or completely prohibited in several countries. In response, the platforms developed new programs offering hotels increased visibility in exchange for voluntary compliance with price parity clauses. The second chapter of this thesis studies the effect of the adoption of this program on the prices set by hotels, separating the effects linked to the demand increase, thanks to visibility gains, from those linked to clause compliance and the fee increase associated with the program. This thesis also deals with the link between online travel agencies and another type of platform: price comparison websites. The latter promise consumers the display of the most competitive offers on the market, but the criteria used in their ranking algorithms are now debated. Moreover, their vertical integration into larger groups that also own online travel agencies raises questions about their impartiality.
The third chapter studies the impact of the integration of Kayak and several online travel agencies (such as Booking.com) within the Booking Holdings group on the ranking of hotels and sales channels displayed on the price comparison website.
Robbiano, Sylvain. "Méthodes d'apprentissage statistique pour le ranking : théorie, algorithmes et applications." Phd thesis, Telecom ParisTech, 2013. http://tel.archives-ouvertes.fr/tel-00936092.
李莉華 and Lei-wah Lee. "On improving the relevancy ranking algorithm in web search engine." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31222973.
Lee, Lei-wah. "On improving the relevancy ranking algorithm in web search engine /." Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B21607448.
Kangas, Carl-Evert. "Ranking Highscores : Evaluation of a dynamic Bucket with Global Query algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-127677.
Yang, Fang. "A Comprehensive Approach for Bulk Power System Reliability Assessment." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14488.
Mittal, Arpit. "Human layout estimation using structured output learning." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:bb290cfd-5216-42d7-b3d2-c2b4b01614bc.
Vayatis, Nicolas. "Approches statistiques en apprentissage : boosting et ranking." Habilitation à diriger des recherches, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00120738.
Full textsélectionnant un estimateur au sein d'une classe massive telle que l'enveloppe convexe d'une classe de VC. Dans le premier volet du mémoire, on rappelle les interprétations des algorithmes de boosting comme des implémentations de principes de minimisation
de risques convexes et on étudie leurs propriétés sous cet angle. En particulier, on montre l'importance de la
régularisation pour obtenir des stratégies consistantes. On développe également une nouvelle classe d'algorithmes de type gradient stochastique appelés algorithmes de descente miroir avec moyennisation et on évalue leur comportement à travers des simulations informatiques. Après avoir présenté les principes fondamentaux du boosting, on s'attache dans le
deuxième volet à des questions plus avancées telles que
l'élaboration d'inégalités d'oracle. Ainsi, on étudie la
calibration précise des pénalités en fonction des critères
de coût utilisés. On présente des résultats
non-asymptotiques sur la performance des estimateurs du boosting pénalisés, notamment les vitesses rapides sous les conditions de marge de type Mammen-Tsybakov et on décrit les capacités d'approximation du boosting utilisant les "rampes" (stumps) de décision. Le troisième volet du mémoire explore le problème du ranking. Un enjeu important dans des applications
telles que la fouille de documents ou le "credit scoring" est d'ordonner les instances plutôt que de les catégoriser. On propose une formulation simple de ce problème qui permet d'interpréter le ranking comme une classification sur des paires d'observations. La différence dans ce cas vient du fait que les
critères empiriques sont des U-statistiques et on développe donc la théorie de la classification adaptée à ce contexte. On explore également la question de la généralisation de l'erreur de ranking afin de pouvoir inclure des a priori sur l'ordre des instances, comme dans le cas où on ne s'intéresse qu'aux "meilleures" instances.
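The interpretation of ranking as classification over pairs of observations described in this abstract has a standard empirical criterion: the rate of correctly ordered (positive, negative) pairs, a U-statistic that coincides with the empirical AUC. A minimal sketch with made-up scores:

```python
def pairwise_ranking_accuracy(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs the scorer orders correctly.
    This U-statistic over pairs equals the empirical AUC; ties count as 1/2."""
    correct = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in scores_pos
        for sn in scores_neg
    )
    return correct / (len(scores_pos) * len(scores_neg))

# Scores a hypothetical ranker assigns to relevant vs irrelevant documents.
auc = pairwise_ranking_accuracy([0.9, 0.8, 0.4], [0.7, 0.3, 0.3])
```

Because every positive is compared with every negative, the summands are not independent, which is what motivates the U-statistic theory developed in the dissertation.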
Silva, Sérgio Francisco da. "Seleção de características por meio de algoritmos genéticos para aprimoramento de rankings e de modelos de classificação." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19072011-151501/.
Content-based image retrieval (CBIR) and classification systems rely on feature vectors extracted from images according to specific visual criteria. The size of a feature vector is commonly of the order of hundreds of elements. As the size (dimensionality) of the feature vector increases, a higher degree of redundancy and irrelevancy can be observed, leading to the "curse of dimensionality" problem. Thus, the selection of relevant features is a key aspect of a CBIR or classification system. This thesis presents new methods based on genetic algorithms (GA) to perform feature selection. The proposed Fc ("Fitness coach") family of fitness functions takes advantage of single-valued ranking evaluation functions in order to develop a new method of genetic feature selection tailored to improve the accuracy of CBIR systems. The ability of genetic algorithms to boost feature selection through these evaluation criteria (fitness functions) improves the precision of query answers by up to 22% in the analyzed databases, when compared to traditional wrapper feature selection methods based on decision trees (C4.5), naive Bayes, support vector machines, 1-nearest neighbor, and association rule mining. Other contributions of this thesis are two filter-based feature selection algorithms for classification purposes, which use the simplified silhouette statistic as an evaluation function: silhouette-based greedy search (SiGS) and silhouette-based genetic algorithm search (SiGAS). The proposed algorithms outperform state-of-the-art ones (CFS, FCBF and ReliefF, among others). It is important to stress that the gain in accuracy of the proposed Fc family, SiGS, and SiGAS methods is allied to a significant decrease in feature vector size, which can reach up to 90%.
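The simplified silhouette statistic used by SiGS and SiGAS, under its common definition (distances to centroids instead of all pairwise distances), can be sketched as follows (illustrative data; details may differ from the thesis's exact formulation):

```python
import math

def simplified_silhouette(points, labels):
    """Mean of (b - a) / max(a, b), where a is the distance from a point to its
    own cluster centroid and b the distance to the nearest other centroid."""
    clusters = sorted(set(labels))
    centroids = {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    total = 0.0
    for p, l in zip(points, labels):
        a = math.dist(p, centroids[l])
        b = min(math.dist(p, centroids[c]) for c in clusters if c != l)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

# Two well-separated clusters in a 2-feature subspace: a high score is expected,
# which is why the statistic can serve as a filter criterion for feature subsets.
data = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
sil = simplified_silhouette(data, [0, 0, 1, 1])
```

A feature subset that separates the classes well yields a silhouette near 1, giving the filter methods a label-aware score without training a classifier.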
Vogel, Robin. "Similarity ranking for biometrics : theory and practice." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT031.
The rapid growth in population, combined with the increased mobility of people, has created a need for sophisticated identity management systems. In this context, biometrics refers to the identification of individuals using behavioral or biological characteristics. The most popular approaches, i.e. fingerprint, iris or face recognition, are all based on computer vision methods. The adoption of deep convolutional networks, enabled by general-purpose computing on graphics processing units, made the recent advances in computer vision possible. These advances have led to drastic improvements for conventional biometric methods, which boosted their adoption in practical settings and stirred up public debate about these technologies. In this respect, biometric system providers face many challenges when learning those networks. In this thesis, we consider those challenges from the angle of statistical learning theory, which leads us to propose or sketch practical solutions. First, we respond to the proliferation of papers on similarity learning for deep neural networks that optimize objective functions disconnected from the natural ranking aim sought in biometrics. Precisely, we introduce the notion of similarity ranking by highlighting the relationship between bipartite ranking and the requirements for similarities that are well suited to biometric identification. We then extend the theory of bipartite ranking to this new problem, adapting it to the specificities of pairwise learning, particularly those regarding its computational cost. Usual objective functions optimize for predictive performance, but recent work has underlined the necessity of considering other aspects when training a biometric system, such as dataset bias, prediction robustness, or notions of fairness.
The thesis tackles all three of those examples by proposing careful statistical analyses of them, as well as practical methods that give biometric system manufacturers the tools needed to address those issues without jeopardizing the performance of their algorithms.
Jaini, Nor. "An efficient ranking analysis in multi-criteria decision making." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/an-efficient-ranking-analysis-in-multicriteria-decision-making(c5a694d5-fd43-434f-9f9f-b86f7581b97c).html.
Full textJunior, Lucelindo Dias Ferreira. "Sistema de Engenharia Kansei para apoiar a descrição da visão do produto no contexto do Gerenciamento Ágil de Projetos de produtos manufaturados." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/18/18156/tde-09032012-141046/.
Full textThe Agile Project Management is a useful approach for projects with a high degree of complexity and uncertainty. Two of its singularities are: customer involvement in decision making about the product design; and the use of a product vision, an artifact that represents and communicates the fundamental and high-priority features of the product to be developed. There are methods to support the creation of the product vision, but they have shortcomings in operationalizing the customer involvement. On the other hand, there is Kansei Engineering, a methodology to capture the needs of a large number of consumers and correlate them to product features. This work presents a detailed study of the Kansei Engineering methodology and analyzes how it can be useful to support the description of the product vision, in the context of Agile Project Management of manufactured products. Then, to verify this proposition, it presents the development of a Kansei Engineering System based on Quantification Theory Type I, Fuzzy Arithmetic and Genetic Algorithms, tested for the design of a pen aimed at graduate students. To implement the project we used a set of methods and procedures, such as systematic literature review, mathematical development, computational development, and case study. It analyzes the proposed Kansei Engineering System and the results of the applied case study, to ascertain their potential. Evidence indicates that the Kansei Engineering System is capable of generating requirements on product configurations from the perspective of the potential consumer, and that these configurations are useful for the description of the product vision and for the progression of this vision during the product's project.
Pascoal, Luiz Mário Lustosa. "Um método social-evolucionário para geração de rankings que apoiem a recomendação de eventos." Universidade Federal de Goiás, 2014. http://repositorio.bc.ufg.br/tede/handle/tede/4345.
Full textMade available in DSpace on 2015-03-24T21:19:16Z (GMT). Previous issue date: 2014-08-22
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
With the development of Web 2.0, social networks have gained great space on the internet, and many users provide information about themselves and their interests. There are expert systems that make use of the user's interests to recommend different products; these systems are known as Recommender Systems. One of the main techniques of a Recommender System is (user-based) Collaborative Filtering, which recommends products to users based on what other similar people liked in the past. This work presents a ranking-generating function-approximation model that, through a Genetic Algorithm, learns an approximation function composed of different social variables, customized for each Facebook user. The learned function must be able to reproduce a ranking of people (friends), originally created from the user's information, who exert some influence on the user's decisions. As a case study, this work addresses the context of events through information regarding the frequency of participation of some users in several distinct events. Two different approaches to learning and applying the approximation function were developed. The first provides a generalist model that learns a function in advance and then applies it to a set of test data; the second presents a specialist model that learns a specific function for each test scenario. Two proposals for evaluating the ordering created by the learned function, called objective functions A and B, were also presented; the results for both objective functions show that it is possible to obtain good solutions with both the generalist and the specialist approaches of the proposed method.
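The abstract above describes a genetic algorithm that learns attribute weights whose induced ranking reproduces an observed ranking of friends. A minimal sketch of that idea follows; it is not the thesis's implementation, and the pairwise-agreement fitness below is only a stand-in for its objective functions A and B:

```python
import random

def ranking_from_scores(weights, people):
    """Rank people by a weighted sum of their social attributes."""
    scored = sorted(people, key=lambda p: -sum(w * x for w, x in zip(weights, p["features"])))
    return [p["name"] for p in scored]

def fitness(weights, people, target):
    """Count pairwise agreements between the produced ranking and the
    target ranking (a stand-in for objective functions A and B)."""
    pos = {name: i for i, name in enumerate(ranking_from_scores(weights, people))}
    agree = 0
    for i in range(len(target)):
        for j in range(i + 1, len(target)):
            if pos[target[i]] < pos[target[j]]:
                agree += 1
    return agree

def genetic_search(people, target, n_weights, pop=30, gens=40, seed=0):
    """Evolve weight vectors: keep the fittest half, breed the rest by
    averaging two parents and adding Gaussian mutation noise."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: -fitness(w, people, target))
        parents = population[: pop // 2]
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            children.append([(x + y) / 2 + rng.gauss(0, 0.1) for x, y in zip(a, b)])
        population = parents + children
    return max(population, key=lambda w: fitness(w, people, target))

# Hypothetical demo: friends described by (mutual friends, event co-attendance).
friends = [
    {"name": "ana", "features": [12.0, 5.0]},
    {"name": "bia", "features": [3.0, 9.0]},
    {"name": "caio", "features": [1.0, 1.0]},
]
observed = ["ana", "bia", "caio"]  # ranking built from the user's own data
best = genetic_search(friends, observed, n_weights=2)
print(ranking_from_scores(best, friends))
```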
Browning, James Paul. "On detection and ranking methods for a distributed radio-frequency sensor network : theory and algorithmic implementation." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10047710/.
Full textMilchevski, Evica [Verfasser], and Sebastian [Akademischer Betreuer] Michel. "Similarity Search Algorithms over Top-k Rankings and Class-Constrained Objects / Evica Milchevski ; Betreuer: Sebastian Michel." Kaiserslautern : Technische Universität Kaiserslautern, 2019. http://d-nb.info/1194372554/34.
Full textKim, Jinhan. "J-model : an open and social ensemble learning architecture for classification." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7672.
Full textXie, Lin. "Statistical inference for rankings in the presence of panel segmentation." Diss., Kansas State University, 2011. http://hdl.handle.net/2097/13247.
Full textDepartment of Statistics
Paul Nelson
Panels of judges are often used to estimate consumer preferences for m items such as food products. Judges can either evaluate each item on several ordinal scales and indirectly produce an overall ranking, or directly report a ranking of the items. A complete ranking orders all the items from best to worst. A partial ranking, as we use the term, only reports rankings of the best q out of m items. Direct ranking, the subject of this report, does not require the widespread but questionable practice of treating ordinal measurements as though they were on ratio or interval scales. Here, we develop and study segmentation models in which the panel may consist of relatively homogeneous subgroups, the segments. Judges within a subgroup will tend to agree among themselves and differ from judges in the other subgroups. We develop and study the statistical analysis of mixture models where it is not known to which segment a judge belongs or, in some cases, how many segments there are. Viewing segment membership indicator variables as latent data, an E-M algorithm was used to find the maximum likelihood estimators of the parameters specifying a mixture of Mallows' (1957) distance models for complete and partial rankings. A simulation study was conducted to evaluate the behavior of the E-M algorithm, in terms of such issues as the fraction of data sets for which the algorithm fails to converge and the sensitivity of the convergence rate to initial values, and the performance of the maximum likelihood estimators, in terms of bias and mean square error, where applicable. A Bayesian approach was developed and credible set estimators were constructed. Simulation was used to evaluate the performance of these credible sets as confidence sets. A method for predicting segment membership from covariates measured on a judge was derived using a logistic model applied to a mixture of Mallows probability distance models. The effects of covariates on segment membership were assessed.
Likelihood sets for parameters specifying mixtures of Mallows distance models were constructed and explored.
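The ingredients named in this abstract can be sketched in a few lines: the Kendall-tau distance, the Mallows distance model built on it, and the E-step responsibilities of the segment mixture. This is an illustrative sketch, not the thesis's code; a closed form for the normalizing constant exists, but here it is summed over all permutations for clarity, so only tiny item sets are feasible:

```python
import math
from itertools import permutations

def kendall_tau(r1, r2):
    """Number of item pairs ordered differently by the two rankings."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    items = list(r1)
    d = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            a, b = items[i], items[j]
            if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0:
                d += 1
    return d

def mallows_pmf(ranking, center, theta, items):
    """P(ranking) proportional to exp(-theta * d(ranking, center));
    the normalizer is summed over all m! permutations in this sketch."""
    z = sum(math.exp(-theta * kendall_tau(list(p), center)) for p in permutations(items))
    return math.exp(-theta * kendall_tau(ranking, center)) / z

def responsibilities(ranking, segments, items):
    """E-step of the mixture E-M: posterior probability that a judge's
    ranking came from each segment, given as (center, theta, weight)."""
    likes = [w * mallows_pmf(ranking, c, t, items) for (c, t, w) in segments]
    total = sum(likes)
    return [l / total for l in likes]
```

The M-step (re-estimating each segment's center, dispersion theta and weight from these responsibilities) is where the thesis's technical work on partial rankings lies and is omitted here.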
Ben, Qingyan. "Flight Sorting Algorithm Based on Users’ Behaviour." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294132.
Full textModellen förutsäger den bästa flygordern och rekommenderar bästa flyg till användarna. Avhandlingen kan delas in i följande tre delar: Funktionsval, databehandling och olika algoritms experiment. För funktionsval, förutom den ursprungliga informationen om själva flygningen, lägger vi till användarens urvalsstatus i vår modell, vilken flygklassen är , tillsammans med barn eller inte. Datarengöring används för att hantera dubbletter och ofullständiga data. Därefter tar en normaliserings metod bort bruset i data. Efter olika balanserings behandlingar är SMOTE-metoden mest lämplig för att korrigera klassobalans flyg data. Baserat på våra befintliga data väljer jag klassificerings modell och sekventiell ranknings algoritm. Använd pris, direktflyg eller inte, restid etc. som funktioner, och klicka eller inte som etikett. Klassificerings algoritmerna som jag använde inkluderar Logistic Regression, Gradient Boost, KNN, Decision Tree, Random Forest, Gaussian Process Classifier, Gaussian NB Bayesian and Quadratic Discriminant Analysis. Dessutom antog vi också Sequential ranking algoritm. Resultaten visar att Random Forest-SMOTE presterar bäst med AUC för ROC = 0.94, noggrannhet = 0.8998.
Zapién, Arreola Karina. "Algorithme de chemin de régularisation pour l'apprentissage statistique." Thesis, Rouen, INSA, 2009. http://www.theses.fr/2009ISAM0001/document.
Full textThe selection of a proper model is an essential task in statistical learning. In general, for a given learning task, a set of parameters has to be chosen, each parameter corresponding to a different degree of "complexity". In this situation, the model selection procedure becomes a search for the optimal "complexity", allowing us to estimate a model that assures good generalization. This model selection problem can be summarized as the calculation of one or more hyperparameters defining the model complexity, in contrast to the parameters that specify a model within the chosen complexity class. The usual approach to determining these parameters is a "grid search": given a set of possible values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach consisting of calculating the complete set of possible solutions for all hyperparameter values. This is what is called the regularization path. It can be shown that for the problems we are interested in, parametric quadratic programming (PQP), the corresponding regularization path is piecewise linear. Moreover, its calculation is no more complex than calculating a single PQP solution. This thesis is organized in three chapters. The first introduces the general setting of a learning problem under the Support Vector Machines (SVM) framework, together with the theory and algorithms that allow us to find a solution. The second part deals with supervised learning problems for classification and ranking using the SVM framework. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset [Ross 07b] are given via the subdifferential. These results lead to the corresponding algorithms for solving the mentioned supervised problems. The third part deals with semi-supervised learning problems followed by unsupervised learning problems.
For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization path algorithm. Graph-based dimensionality reduction methods are used for unsupervised learning problems. Our main contribution is a novel algorithm that allows choosing the number of nearest neighbors in an adaptive and appropriate way, contrary to classical approaches based on a fixed number of neighbors.
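The piecewise-linearity claim can be seen in the simplest regularization path of all: the lasso under an orthonormal design, where each coefficient follows the closed-form soft-thresholding rule. This is an analogy chosen for brevity, not the thesis's SVM path:

```python
def soft_threshold_path(b, lambdas):
    """Closed-form lasso path under an orthonormal design:
    beta_j(lambda) = sign(b_j) * max(|b_j| - lambda, 0).
    Each coefficient is piecewise linear in lambda, the simplest
    instance of the piecewise-linear regularization paths above."""
    path = []
    for lam in lambdas:
        beta = []
        for x in b:
            val = max(abs(x) - lam, 0.0)
            beta.append(-val if (x < 0 and val > 0) else val)
        path.append(beta)
    return path

# Ordinary least-squares coefficients, shrunk along the path.
print(soft_threshold_path([2.0, -0.5], [0.0, 0.5, 1.0, 2.5]))
# prints [[2.0, -0.5], [1.5, 0.0], [1.0, 0.0], [0.0, 0.0]]
```

Between the breakpoints (here at lambda = 0.5 and 2.0) every coefficient moves linearly, which is why the entire path costs little more than a single solve.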
Paris, Bruno Mendonça. "Learning to rank: combinação de algoritmos aplicando stacking e análise dos resultados." Universidade Presbiteriana Mackenzie, 2017. http://tede.mackenzie.br/jspui/handle/tede/3494.
Full textMade available in DSpace on 2018-04-04T11:43:59Z (GMT). Previous issue date: 2017-11-07
With the growth in the amount of information available in recent years, which will continue due to the increase in users, devices and information shared over the internet, accessing the desired information should be quick, so that too much time is not spent looking for what one wants. In a search on engines like Google, Yahoo or Bing, the first results are expected to bring the desired information. The area that aims to bring relevant documents to the user is known as Information Retrieval, and it can be aided by Learning to Rank algorithms, which apply machine learning to try to present important documents to users in the best possible ordering. This work aims to verify a way to obtain an even better ordering of documents, using a technique for combining algorithms known as Stacking. To do so, it uses the RankLib tool, part of the Lemur Project, developed in the Java language, which contains several Learning to Rank algorithms, and datasets from a base maintained by the Microsoft Research Group known as LETOR.
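Stacking combines the outputs of several base rankers through a learned meta-model. A much simpler stand-in that conveys the combination step is a weighted Borda count over the base rankers' outputs; in a real stacking setup the weights would come from the meta-learner trained on held-out queries, whereas here they are hypothetical:

```python
def borda_stack(rankings, weights):
    """Combine base rankers by weighted Borda counts: each ranker gives
    m - position points to every document, scaled by that ranker's
    weight (a simple stand-in for a stacking meta-learner)."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        m = len(ranking)
        for pos, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + w * (m - pos)
    return sorted(scores, key=lambda d: -scores[d])

# Two hypothetical base rankers over the same four documents.
r1 = ["d1", "d2", "d3", "d4"]
r2 = ["d2", "d1", "d4", "d3"]
print(borda_stack([r1, r2], [0.7, 0.3]))  # prints ['d1', 'd2', 'd3', 'd4']
```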
Niu, Yue S., Ning Hao, and Heping Zhang. "Multiple Change-Point Detection: A Selective Overview." INST MATHEMATICAL STATISTICS, 2016. http://hdl.handle.net/10150/622820.
Full textAtanassova, Iana. "Exploitation informatique des annotations sémantiques automatiques d'Excom pour la recherche d'informations et la navigation." Thesis, Paris 4, 2012. http://www.theses.fr/2012PA040252.
Full textUsing the Excom engine for semantic annotation, we have constructed an Information Retrieval System based on semantic categories from automatic language analyses in order to propose a new approach to text search. The annotations are obtained by the Contextual Exploration method, which is a knowledge-based linguistic approach using markers and disambiguation rules. The queries are formulated according to search viewpoints which are at the heart of the Information Retrieval strategy. Our approach uses the annotation categories, which are organised in linguistic ontologies structured as graphs. In order to provide relevant results to the user, we have designed algorithms for ranking and paraphrase identification. These algorithms exploit principally the structure of the linguistic ontologies for the annotation. We have carried out an evaluation of the relevance of the system results taking into account the specificity of our approach. We have developed user interfaces allowing the construction of new information products such as structured text syntheses using information extraction according to semantic criteria. This approach also aims to offer tools in the field of economic intelligence.
Adkins, Laura Jean. "A Generalization of the EM Algorithm for Maximum Likelihood Estimation in Mallows' Model Using Partially Ranked Data and Asymptotic Relative Efficiencies for Some Ranking Tests of The K-Sample Problem /." The Ohio State University, 1996. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487933245538208.
Full textHatefi, Armin. "Mixture model analysis with rank-based samples." Statistica Sinica, 2013. http://hdl.handle.net/1993/23849.
Full textBrancotte, Bryan. "Agrégation de classements avec égalités : algorithmes, guides à l'utilisateur et applications aux données biologiques." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112184/document.
Full textThe rank aggregation problem is to build a consensus among a set of rankings (ordered elements). Although this problem has numerous applications (consensus among user votes, consensus between results ordered differently by different search engines, ...), computing an optimal consensus is rarely feasible in real applications (the problem is NP-hard). Many approximation algorithms and heuristics have therefore been designed. However, their performance (computation time and quality of the consensus produced) differs widely and depends on the datasets to be aggregated. Several studies have compared these algorithms, but they have generally not considered the case (yet common in real datasets) in which elements can be tied in rankings (elements at the same rank). Choosing a consensus algorithm for a given dataset is therefore a particularly important issue to study (many applications), and it is an open problem in the sense that none of the existing studies addresses it. More formally, a consensus ranking is a ranking that minimizes the sum of the distances between this consensus and the input rankings. Like much of the state of the art, we have considered in our studies the generalized Kendall-tau distance and variants. Specifically, this thesis makes three contributions. First, we propose new complexity results associated with cases encountered in real data, namely that rankings may be incomplete and that multiple items can be ranked equally (ties). We isolate the different features that can explain variations in the results produced by the aggregation algorithms (for example, using the generalized Kendall-tau distance or variants, or pre-processing the datasets with unification or projection). We propose a guide to characterize the context and needs of a user, guiding them in the choice both of a pre-treatment of their datasets and of the distance to use to compute the consensus. We finally adapt existing algorithms to this new context.
Second, we evaluate these algorithms on a large and varied set of datasets, both real and synthetic, reproducing real features such as similarity between rankings, the presence of ties and different pre-treatments. This large evaluation comes with the proposal of a new method to generate synthetic data with similarities based on Markov chain modeling. This evaluation led to isolating the dataset features that impact the performance of the aggregation algorithms, and to designing a guide that characterizes the needs of a user and advises them in the choice of the algorithm to use. A web platform to replicate and extend these analyses is available (rank-aggregation-with-ties.lri.fr). Finally, we demonstrate the value of the rank aggregation approach in two use cases. We provide a tool for reformulating user text queries through biomedical terminologies, then querying biological databases, and ultimately producing a consensus of the results obtained for each reformulation (conqur-bio.lri.fr). We compare the results to the reference platform and show a clear improvement in result quality. We also compute consensus between lists of workflows established by experts in the context of similarity between scientific workflows. We note that the computed consensus agrees with the experts in a very large majority of cases.
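The central object in this abstract, the generalized Kendall-tau distance between rankings with ties, can be sketched directly by representing a ranking as a list of buckets (sets of tied elements). The tie penalty p = 1 below is one common convention; the thesis studies several variants. The exhaustive consensus picker is only usable on tiny instances, which is exactly why the heuristics compared in the thesis exist:

```python
def kendall_with_ties(r1, r2, p=1):
    """Generalized Kendall-tau between two rankings with ties, each
    given as a list of buckets (sets of tied elements). A pair ordered
    oppositely costs 1; a pair tied in exactly one ranking costs p."""
    def positions(r):
        return {x: i for i, bucket in enumerate(r) for x in bucket}
    p1, p2 = positions(r1), positions(r2)
    items = sorted(p1)
    d = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            a, b = items[i], items[j]
            s1 = (p1[a] > p1[b]) - (p1[a] < p1[b])
            s2 = (p2[a] > p2[b]) - (p2[a] < p2[b])
            if s1 * s2 == -1:
                d += 1   # strictly opposite orders
            elif s1 != s2:
                d += p   # tied in exactly one of the two rankings
    return d

def consensus_among(candidates, inputs):
    """Pick, among candidate rankings, the one minimizing the sum of
    distances to the inputs (exhaustive, so only for tiny instances)."""
    return min(candidates, key=lambda c: sum(kendall_with_ties(c, r) for r in inputs))
```

For example, `kendall_with_ties([{'a'}, {'b'}, {'c'}], [{'a', 'b'}, {'c'}])` is 1: only the pair (a, b) is tied in one ranking but not the other.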
Khaki, Kazimali M. "Weightless neural networks for face recognition." Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/8025.
Full textWang, Bo. "Variable Ranking by Solution-path Algorithms." Thesis, 2012. http://hdl.handle.net/10012/6496.
Full text