Dissertations / Theses on the topic 'Ranking learning'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Ranking learning.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Latham, Andrew C. "Multiple-Instance Feature Ranking." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1440642294.
Full textSinsel, Erik W. "Ensemble learning for ranking interesting attributes." Morgantown, W. Va. : [West Virginia University Libraries], 2005. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=4400.
Full textTitle from document title page. Document formatted into pages; contains viii, 81 p. : ill. Includes abstract. Includes bibliographical references (p. 72-74).
Mattsson, Fredrik, and Anton Gustafsson. "Optimize Ranking System With Machine Learning." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-37431.
Full textAchab, Mastane. "Ranking and risk-aware reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
Full textThis thesis divides into two parts: the first part is on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed through that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends this setting to the continuous ranking problem, a variant where the labels are taking continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation and more precisely the Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality reduction approach where a distribution on rankings is approximated by a simpler distribution, which is not necessarily reduced to a Dirac mass as in ranking aggregation.For that purpose, we rely on mathematical tools from the theory of optimal transport such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent is interacting with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means
Korba, Anna. "Learning from ranking data : theory and methods." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLT009/document.
Full textRanking data, i.e., ordered list of items, naturally appears in a wide variety of situations, especially when the data comes from human activities (ballots in political elections, survey answers, competition results) or in modern applications of data processing (search engines, recommendation systems). The design of machine-learning algorithms, tailored for these data, is thus crucial. However, due to the absence of any vectorial structure of the space of rankings, and its explosive cardinality when the number of items increases, most of the classical methods from statistics and multivariate analysis cannot be applied in a direct manner. Hence, a vast majority of the literature rely on parametric models. In this thesis, we propose a non-parametric theory and methods for ranking data. Our analysis heavily relies on two main tricks. The first one is the extensive use of the Kendall’s tau distance, which decomposes rankings into pairwise comparisons. This enables us to analyze distributions over rankings through their pairwise marginals and through a specific assumption called transitivity, which prevents cycles in the preferences from happening. The second one is the extensive use of embeddings tailored to ranking data, mapping rankings to a vector space. Three different problems, unsupervised and supervised, have been addressed in this context: ranking aggregation, dimensionality reduction and predicting rankings with features.The first part of this thesis focuses on the ranking aggregation problem, where the goal is to summarize a dataset of rankings by a consensus ranking. Among the many ways to state this problem stands out the Kemeny aggregation method, whose solutions have been shown to satisfy many desirable properties, but can be NP-hard to compute. In this work, we have investigated the hardness of this problem in two ways. Firstly, we proposed a method to upper bound the Kendall’s tau distance between any consensus candidate (typically the output of a tractable procedure) and a Kemeny consensus, on any dataset. Then, we have casted the ranking aggregation problem in a rigorous statistical framework, reformulating it in terms of ranking distributions, and assessed the generalization ability of empirical Kemeny consensus.The second part of this thesis is dedicated to machine learning problems which are shown to be closely related to ranking aggregation. The first one is dimensionality reduction for ranking data, for which we propose a mass-transportation approach to approximate any distribution on rankings by a distribution exhibiting a specific type of sparsity. The second one is the problem of predicting rankings with features, for which we investigated several methods. Our first proposal is to adapt piecewise constant methods to this problem, partitioning the feature space into regions and locally assigning as final label (a consensus ranking) to each region. Our second proposal is a structured prediction approach, relying on embedding maps for ranking data enjoying theoretical and computational advantages
FILHO, FRANCISCO BENJAMIM. "RANKING OF WEB PAGES BY LEARNING MULTIPLE LATENT CATEGORIES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19540@1.
Full textO crescimento explosivo e a acessibilidade generalizada da World Wide Web (WWW) levaram ao aumento da atividade de pesquisa na área da recuperação de informação para páginas Web. A WWW é um rico e imenso ambiente em que as páginas se assemelham a uma comunidade grande de elementos conectada através de hiperlinks em razão da semelhança entre o conteúdo das páginas, a popularidade da página, a autoridade sobre o assunto e assim por diante, sabendo-se que, em verdade, quando um autor de uma página a vincula à outra, está concebendo-a como importante para si. Por isso, a estrutura de hiperlink da WWW é conhecida por melhorar significativamente o desempenho das pesquisas para além do uso de estatísticas de distribuição simples de texto. Nesse sentido, a abordagem Hyperlink Induced Topic Search (HITS) introduz duas categorias básicas de páginas Web, hubs e autoridades, que revelam algumas informações semânticas ocultas a partir da estrutura de hiperlink. Em 2005, fizemos uma primeira extensão do HITS, denominada de Extended Hyperlink Induced Topic Search (XHITS), que inseriu duas novas categorias de páginas Web, quais sejam, novidades e portais. Na presente tese, revisamos o XHITS, transformando-o em uma generalização do HITS, ampliando o modelo de duas categorias para várias e apresentando um algoritmo eficiente de aprendizagem de máquina para calibrar o modelo proposto valendo-se de múltiplas categorias latentes. As descobertas aqui expostas indicam que a nova abordagem de aprendizagem fornece um modelo XHITS mais preciso. É importante registrar, por fim, que os experimentos realizados com a coleção ClueWeb09 25TB de páginas da WWW, baixadas em 2009, mostram que o XHITS pode melhorar significativamente a eficácia da pesquisa Web e produzir resultados comparáveis aos do TREC 2009/2010 Web Track, colocando-o na sexta posição, conforme os resultados publicados.
The rapid growth and generalized accessibility of the World Wide Web (WWW) have led to an increase in research in the field of the information retrieval for Web pages. The WWW is an immense and prodigious environment in which Web pages resemble a huge community of elements. These elements are connected via hyperlinks on the basis of similarity between the content of the pages, the popularity of a given page, the extent to which the information provided is authoritative in relation to a given field etc. In fact, when the author of a Web page links it to another, s/he is acknowledging the importance of the linked page to his/her information. As such the hyperlink structure of the WWW significantly improves research performance beyond the use of simple text distribution statistics. To this effect, the HITS approach introduces two basic categories of Web pages, hubs and authorities which uncover certain hidden semantic information using the hyperlink structure. In 2005, we made a first extension of HITS, called Extended Hyperlink Induced Topic Search (XHITS), which inserted two new categories of Web pages, which are novelties and portals. In this thesis, we revised the XHITS, transforming it into a generalization of HITS, broadening the model from two categories to various and presenting an efficient machine learning algorithm to calibrate the proposed model using multiple latent categories. The findings we set out here indicate that the new learning approach provides a more precise XHITS model. It is important to note, in closing, that experiments with the ClueWeb09 25TB collection of Web pages, downloaded in 2009, demonstrated that the XHITS is capable of significantly improving Web research efficiency and producing results comparable to those of the TREC 2009/2010 Web Track.
Cheung, Chi-Wai. "Probabilistic rank aggregation for multiple SVM ranking /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20CHEUNG.
Full textVogel, Robin. "Similarity ranking for biometrics : theory and practice." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT031.
Full textThe rapid growth in population, combined with the increased mobility of people has created a need for sophisticated identity management systems.For this purpose, biometrics refers to the identification of individuals using behavioral or biological characteristics. The most popular approaches, i.e. fingerprint, iris or face recognition, are all based on computer vision methods. The adoption of deep convolutional networks, enabled by general purpose computing on graphics processing units, made the recent advances incomputer vision possible. These advances have led to drastic improvements for conventional biometric methods, which boosted their adoption in practical settings, and stirred up public debate about these technologies. In this respect, biometric systems providers face many challenges when learning those networks.In this thesis, we consider those challenges from the angle of statistical learning theory, which leads us to propose or sketch practical solutions. First, we answer to the proliferation of papers on similarity learningfor deep neural networks that optimize objective functions that are disconnected with the natural ranking aim sought out in biometrics. Precisely, we introduce the notion of similarity ranking, by highlighting the relationship between bipartite ranking and the requirements for similarities that are well suited to biometric identification. We then extend the theory of bipartite ranking to this new problem, by adapting it to the specificities of pairwise learning, particularly those regarding its computational cost. Usual objective functions optimize for predictive performance, but recentwork has underlined the necessity to consider other aspects when training a biometric system, such as dataset bias, prediction robustness or notions of fairness. The thesis tackles all of those three examplesby proposing their careful statistical analysis, as well as practical methods that provide the necessary tools to biometric systems manufacturers to address those issues, without jeopardizing the performance of their algorithms
Zacharia, Giorgos 1974. "Regularized algorithms for ranking, and manifold learning for related tasks." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/47753.
Full textIncludes bibliographical references (leaves 119-127).
This thesis describes an investigation of regularized algorithms for ranking problems for user preferences and information retrieval problems. We utilize regularized manifold algorithms to appropriately incorporate data from related tasks. This investigation was inspired by personalization challenges in both user preference and information retrieval ranking problems. We formulate the ranking problem of related tasks as a special case of semi-supervised learning. We examine how to incorporate instances from related tasks, with the appropriate penalty in the loss function to optimize performance on the hold out sets. We present a regularized manifold approach that allows us to learn a distance metric for the different instances directly from the data. This approach allows incorporation of information from related task examples, without prior estimation of cross-task coefficient covariances. We also present applications of ranking problems in two text analysis problems: a) Supervise content-word learning, and b) Company Entity matching for record linkage problems.
by Giorgos Zacharia.
Ph.D.
Guo, Li Li. "Direct Optimization of Ranking Measures for Learning to Rank Models." Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1341520987.
Full textRoss, Jacob W. "Features for Ranking Tweets Based on Credibility and Newsworthiness." Wright State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=wright1431037057.
Full textLyubchyk, Leonid, Oleksy Galuza, and Galina Grinberg. "Ranking Model Real-Time Adaptation via Preference Learning Based on Dynamic Clustering." Thesis, ННК "IПСА" НТУУ "КПI iм. Iгоря Сiкорського", 2017. http://repository.kpi.kharkov.ua/handle/KhPI-Press/36819.
Full textMatsubara, Edson Takashi. "Relações entre ranking, análise ROC e calibração em aprendizado de máquina." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04032009-114050/.
Full textSupervised learning has been used mostly for classification. In this work we show the benefits of a welcome shift in attention from classification to ranking. A ranker is an algorithm that sorts a set of instances from highest to lowest expectation that the instance is positive, and a ranking is the outcome of this sorting. Usually a ranking is obtained by sorting scores given by classifiers. In this work, we are concerned about novel approaches to promote the use of ranking. Therefore, we present the differences and relations between ranking and classification followed by a proposal of a novel ranking algorithm called LEXRANK, whose rankings are derived not from scores, but from a simple ranking of attribute values obtained from the training data. One very important field which uses rankings as its main input is ROC analysis. The study of decision trees and ROC analysis suggested an interesting way to visualize the tree construction in ROC graphs, which has been implemented in a system called PROGROC. Focusing on ROC analysis, we observed that the slope of segments obtained from the ROC convex hull is equivalent to the likelihood ratio, which can be converted into probabilities. Interestingly, this ROC convex hull calibration method is equivalent to Pool Adjacent Violators (PAV). Furthermore, the ROC convex hull calibration method optimizes Brier Score, and the exploration of this measure leads us to find an interesting connection between the Brier Score and ROC Curves. Finally, we also investigate rankings build in the selection method which increments the labelled set of CO-TRAINING, a semi-supervised multi-view learning algorithm
Ibstedt, Julia, Elsa Rådahl, Erik Turesson, and Voorde Magdalena vande. "Application and Further Development of TrueSkill™ Ranking in Sports." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-384863.
Full textAtaman, Kaan. "Learning to rank by maximizing the AUC with linear programming for problems with binary output." Diss., University of Iowa, 2007. http://ir.uiowa.edu/etd/151.
Full textPuthiya, Parambath Shameem Ahamed. "New methods for multi-objective learning." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2322/document.
Full textMulti-objective problems arise in many real world scenarios where one has to find an optimal solution considering the trade-off between different competing objectives. Typical examples of multi-objective problems arise in classification, information retrieval, dictionary learning, online learning etc. In this thesis, we study and propose algorithms for multi-objective machine learning problems. We give many interesting examples of multi-objective learning problems which are actively persuaded by the research community to motivate our work. Majority of the state of the art algorithms proposed for multi-objective learning comes under what is called “scalarization method”, an efficient algorithm for solving multi-objective optimization problems. Having motivated our work, we study two multi-objective learning tasks in detail. In the first task, we study the problem of finding the optimal classifier for multivariate performance measures. The problem is studied very actively and recent papers have proposed many algorithms in different classification settings. We study the problem as finding an optimal trade-off between different classification errors, and propose an algorithm based on cost-sensitive classification. In the second task, we study the problem of diverse ranking in information retrieval tasks, in particular recommender systems. We propose an algorithm for diverse ranking making use of the domain specific information, and formulating the problem as a submodular maximization problem for coverage maximization in a weighted similarity graph. Finally, we conclude that scalarization based algorithms works well for multi-objective learning problems. But when considering algorithms for multi-objective learning problems, scalarization need not be the “to go” approach. It is very important to consider the domain specific information and objective functions. We end this thesis by proposing some of the immediate future work, which are currently being experimented, and some of the short term future work which we plan to carry out
Stojkovic, Ivan. "Functional Norm Regularization for Margin-Based Ranking on Temporal Data." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/522550.
Full textPh.D.
Quantifying the properties of interest is an important problem in many domains, e.g., assessing the condition of a patient, estimating the risk of an investment or relevance of the search result. However, the properties of interest are often latent and hard to assess directly, making it difficult to obtain classification or regression labels, which are needed to learn a predictive models from observable features. In such cases, it is typically much easier to obtain relative comparison of two instances, i.e. to assess which one is more intense (with respect to the property of interest). One framework able to learn from such kind of supervised information is ranking SVM, and it will make a basis of our approach. Applications in bio-medical datasets typically have specific additional challenges. First, and the major one, is the limited amount of data examples, due to an expensive measuring technology, and/or infrequency of conditions of interest. Such limited number of examples makes both identification of patterns/models and their validation less useful and reliable. Repeated samples from the same subject are collected on multiple occasions over time, which breaks IID sample assumption and introduces dependency structure that needs to be taken into account more appropriately. Also, feature vectors are highdimensional, and typically of much higher cardinality than the number of samples, making models less useful and their learning less efficient. Hypothesis of this dissertation is that use of the functional norm regularization can help alleviating mentioned challenges, by improving generalization abilities and/or learning efficiency of predictive models, in this case specifically of the approaches based on the ranking SVM framework. The temporal nature of data was addressed with loss that fosters temporal smoothness of functional mapping, thus accounting for assumption that temporally proximate samples are more correlated. Large number of feature variables was handled using the sparsity inducing L1 norm, such that most of the features have zero effect in learned functional mapping. Proposed sparse (temporal) ranking objective is convex but non-differentiable, therefore smooth dual form is derived, taking the form of quadratic function with box constraints, which allows efficient optimization. For the case where there are multiple similar tasks, joint learning approach based on matrix norm regularization, using trace norm L* and sparse row L21 norm was also proposed. Alternate minimization with proximal optimization algorithm was developed to solve the mentioned multi-task objective. Generalization potentials of the proposed high-dimensional and multi-task ranking formulations were assessed in series of evaluations on synthetically generated and real datasets. The high-dimensional approach was applied to disease severity score learning from gene expression data in human influenza cases, and compared against several alternative approaches. Application resulted in scoring function with improved predictive performance, as measured by fraction of correctly ordered testing pairs, and a set of selected features of high robustness, according to three similarity measures. The multi-task approach was applied to three human viral infection problems, and for learning the exam scores in Math and English. Proposed formulation with mixed matrix norm was overall more accurate than formulations with single norm regularization.
Temple University--Theses
Alshehri, Adel. "A Machine Learning Approach to Predicting Community Engagement on Social Media During Disasters." Scholar Commons, 2019. https://scholarcommons.usf.edu/etd/7728.
Full textSalomon, Sophie. "Bias Mitigation Techniques and a Cost-Aware Framework for Boosted Ranking Algorithms." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1586450345426827.
Full textReed, Jeremy T. "Acoustic segment modeling and preference ranking for music information retrieval." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37189.
Full textKim, Jinhan. "J-model : an open and social ensemble learning architecture for classification." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7672.
Full textLipscombe, Trevor, and n/a. "Different teachers for different students? : The relationship between learning style, other student variables and students' ranking of teacher characteristics." University of Canberra. Education, 1989. http://erl.canberra.edu.au./public/adt-AUC20060817.141319.
Full textDokania, Puneet Kumar. "High-Order Inference, Ranking, and Regularization Path for Structured SVM." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC044/document.
Full textThis thesis develops novel methods to enable the use of structured prediction in computer vision and medical imaging. Specifically, our contributions are four fold. First, we propose a new family of high-order potentials that encourage parsimony in the labeling, and enable its use by designing an accurate graph cuts based algorithm to minimize the corresponding energy function. Second, we show how the average precision SVM formulation can be extended to incorporate high-order information for ranking. Third, we propose a novel regularization path algorithm for structured SVM. Fourth, we show how the weakly supervised framework of latent SVM can be employed to learn the parameters for the challenging deformable registration problem.In more detail, the first part of the thesis investigates the high-order inference problem. Specifically, we present a novel family of discrete energy minimization problems, which we call parsimonious labeling. It is a natural generalization of the well known metric labeling problems for high-order potentials. In addition to this, we propose a generalization of the Pn-Potts model, which we call Hierarchical Pn-Potts model. In the end, we propose parallelizable move making algorithms with very strong multiplicative bounds for the optimization of the hierarchical Pn-Potts model and the parsimonious labeling.Second part of the thesis investigates the ranking problem while using high-order information. Specifically, we introduce two alternate frameworks to incorporate high-order information for the ranking tasks. The first framework, which we call high-order binary SVM (HOB-SVM), optimizes a convex upperbound on weighted 0-1 loss while incorporating high-order information using joint feature map. The rank list for the HOB-SVM is obtained by sorting samples using max-marginals based scores. The second framework, which we call high-order AP-SVM (HOAP-SVM), takes its inspiration from AP-SVM and HOB-SVM (our first framework). Similar to AP-SVM, it optimizes upper bound on average precision. However, unlike AP-SVM and similar to HOB-SVM, it can also encode high-order information. The main disadvantage of HOAP-SVM is that estimating its parameters requires solving a difference-of-convex program. We show how a local optimum of the HOAP-SVM learning problem can be computed efficiently by the concave-convex procedure. Using standard datasets, we empirically demonstrate that HOAP-SVM outperforms the baselines by effectively utilizing high-order information while optimizing the correct loss function.In the third part of the thesis, we propose a new algorithm SSVM-RP to obtain epsilon-optimal regularization path of structured SVM. We also propose intuitive variants of the Block-Coordinate Frank-Wolfe algorithm (BCFW) for the faster optimization of the SSVM-RP algorithm. In addition to this, we propose a principled approach to optimize the SSVM with additional box constraints using BCFW and its variants. In the end, we propose regularization path algorithm for SSVM with additional positivity/negativity constraints.In the fourth and the last part of the thesis (Appendix), we propose a novel weakly supervised discriminative algorithm for learning context specific registration metrics as a linear combination of conventional metrics. Conventional metrics can cope partially - depending on the clinical context - with tissue anatomical properties. In this work we seek to determine anatomy/tissue specific metrics as a context-specific aggregation/linear combination of known metrics. We propose a weakly supervised learning algorithm for estimating these parameters conditionally to the data semantic classes, using a weak training dataset. We show the efficacy of our approach on three highly challenging datasets in the field of medical imaging, which vary in terms of anatomical structures and image modalities
Jiao, Jian. "A framework for finding and summarizing product defects, and ranking helpful threads from online customer forums through machine learning." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/23159.
Full textPh. D.
Safran, Mejdl Sultan. "EFFICIENT LEARNING-BASED RECOMMENDATION ALGORITHMS FOR TOP-N TASKS AND TOP-N WORKERS IN LARGE-SCALE CROWDSOURCING SYSTEMS." OpenSIUC, 2018. https://opensiuc.lib.siu.edu/dissertations/1511.
Full textHarrington, Edward, and edwardharrington@homemail com au. "Aspects of Online Learning." The Australian National University. Research School of Information Sciences and Engineering, 2004. http://thesis.anu.edu.au./public/adt-ANU20060328.160810.
Full textParis, Bruno Mendonça. "Learning to rank: combinação de algoritmos aplicando stacking e análise dos resultados." Universidade Presbiteriana Mackenzie, 2017. http://tede.mackenzie.br/jspui/handle/tede/3494.
Full textApproved for entry into archive by Paola Damato (repositorio@mackenzie.br) on 2018-04-04T11:43:59Z (GMT) No. of bitstreams: 2 Bruno Mendonça Paris.pdf: 2393892 bytes, checksum: 0cd807e0fd978642fc513bf059389c1f (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Made available in DSpace on 2018-04-04T11:43:59Z (GMT). No. of bitstreams: 2 Bruno Mendonça Paris.pdf: 2393892 bytes, checksum: 0cd807e0fd978642fc513bf059389c1f (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) Previous issue date: 2017-11-07
With the growth of the amount of information available in recent years, which will continue to grow due to the increase in users, devices and information shared over the internet, accessing the desired information should be done in a quick way so it is not spent too much time looking for what you want. A search in engines like Google, Yahoo, Bing is expected that the rst results bring the desired information. An area that aims to bring relevant documents to the user is known as Information Retrieval and can be aided by Learning to Rank algorithms, which applies machine learning to try to bring important documents to users in the best possible ordering. This work aims to verify a way to get an even better ordering of documents, using a technique of combining algorithms known as Stacking. To do so, it will used the RankLib tool, part of Lemur Project, developed in the Java language that contains several Learning to Rank algorithms, and the datasets from a base maintained by Microsoft Research Group known as LETOR.
Com o crescimento da quantidade de informação disponível nos últimos anos, a qual irá continuar crescendo devido ao aumento de usuários, dispositivos e informações compartilhadas pela internet, acessar a informação desejada deve ser feita de uma maneira rápida afim de não se gastar muito tempo procurando o que se deseja. Uma busca em buscadores como Google, Yahoo, Bing espera-se que os primeiros resultados tragam a informação desejada. Uma área que tem o objetivo de trazer os documentos relevantes para o usuário é conhecida por Recuperação de Informação e pode ser auxiliada por algoritmos Learning to Rank, que aplica aprendizagem de máquina para tentar trazer os documentos importantes aos usuários na melhor ordenação possível. Esse trabalho visa verificar uma maneira de obter uma ordenação ainda melhor de documentos, empregando uma técnica de combinar algoritmos conhecida por Stacking. Para isso será utilizada a ferramenta RankLib, parte de um projeto conhecido por Lemur, desenvolvida na linguagem Java, que contém diversos algoritmos Learning to Rank, e o conjuntos de dados provenientes de uma base mantida pela Microsoft Research Group conhecida por LETOR.
Zhang, Ganqin. "Bipartite RankBoost+: An Improvement to Bipartite RankBoost." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case160767885657324.
Full textArvidsson, Simon, and Marcus Gullstrand. "Predicting forest strata from point clouds using geometric deep learning." Thesis, Jönköping University, JTH, Avdelningen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-54155.
Full textMittal, Arpit. "Human layout estimation using structured output learning." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:bb290cfd-5216-42d7-b3d2-c2b4b01614bc.
Full textLin, Xiao. "Leveraging Multimodal Perspectives to Learn Common Sense for Vision and Language Tasks." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/79521.
Full textPh. D.
Bjöörn, Anton. "Employing a Transformer Language Model for Information Retrieval and Document Classification : Using OpenAI's generative pre-trained transformer, GPT-2." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281766.
Full textInformationsflödet på Internet fortsätter att öka vilket gör det allt lättare att missa viktiga nyheter som inte intresserar en stor mängd människor. För att bekämpa detta problem behövs allt mer sofistikerade informationssökningsmetoder. Förtränade transformermodeller har sedan ett par år tillbaka tagit över som de mest framstående neurala nätverken för att hantera text. Det här arbetet undersöker hur väl en sådan språkmodell, Open AIs General Pre-trained Transformer 2 (GPT-2), kan generalisera från att generera text till att användas för informationssökning och klassificering av texter. För att utvärdera detta jämförs en transformerbaserad modell med en mer traditionell Term Frequency- Inverse Document Frequency (TF-IDF) vektoriseringsmodell. Målet är att klargöra hur användbara förtränade transformermodeller faktiskt är i skapandet av specialiserade informationssökningssystem. Den minsta versionen av språkmodellen GPT-2 anpassas och tränas om till att ranka och klassificera nyhetsartiklar, skrivna på engelska, och uppnår liknande prestanda som den TF-IDF baserade modellen. Den GPT-2 baserade modellen hade i genomsnitt 0.74 procentenheter högre Normalized Discounted Cumulative Gain (NDCG) men provstorleken var ej stor nog för att ge dessa resultat hög statistisk säkerhet.
Chen, Xi. "Learning with Sparcity: Structures, Optimization and Applications." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/228.
Full textBlumm, Nicolas C. "On the Purpose & Ethics of Elite Higher Education." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1713.
Full textLee, Joonseok. "Local approaches for collaborative filtering." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53846.
Full textBen, Qingyan. "Flight Sorting Algorithm Based on Users’ Behaviour." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294132.
Full textModellen förutsäger den bästa flygordern och rekommenderar bästa flyg till användarna. Avhandlingen kan delas in i följande tre delar: Funktionsval, databehandling och olika algoritms experiment. För funktionsval, förutom den ursprungliga informationen om själva flygningen, lägger vi till användarens urvalsstatus i vår modell, vilken flygklassen är , tillsammans med barn eller inte. Datarengöring används för att hantera dubbletter och ofullständiga data. Därefter tar en normaliserings metod bort bruset i data. Efter olika balanserings behandlingar är SMOTE-metoden mest lämplig för att korrigera klassobalans flyg data. Baserat på våra befintliga data väljer jag klassificerings modell och sekventiell ranknings algoritm. Använd pris, direktflyg eller inte, restid etc. som funktioner, och klicka eller inte som etikett. Klassificerings algoritmerna som jag använde inkluderar Logistic Regression, Gradient Boost, KNN, Decision Tree, Random Forest, Gaussian Process Classifier, Gaussian NB Bayesian and Quadratic Discriminant Analysis. Dessutom antog vi också Sequential ranking algoritm. Resultaten visar att Random Forest-SMOTE presterar bäst med AUC för ROC = 0.94, noggrannhet = 0.8998.
Zeng, Kaiman. "Next Generation of Product Search and Discovery." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2312.
Full textAlaya, Mili Nourhene. "Managing the empirical hardness of the ontology reasoning using the predictive modelling." Thesis, Paris 8, 2016. http://www.theses.fr/2016PA080062/document.
Full textHighly optimized reasoning algorithms have been developed to allow inference tasks on expressive ontology languages such as OWL (DL). Nevertheless, reasoning remains a challenge in practice. In overall, a reasoner could be optimized for some, but not all ontologies. Given these observations, the main purpose of this thesis is to investigate means to cope with the reasoner performances variability phenomena. We opted for the supervised learning as the kernel theory to guide the design of our solution. Our main claim is that the output quality of a reasoner is closely depending on the quality of the ontology. Accordingly, we first introduced a novel collection of features which characterise the design quality of an OWL ontology. Afterwards, we modelled a generic learning framework to help predicting the overall empirical hardness of an ontology; and to anticipate a reasoner robustness under some online usage constraints. Later on, we discussed the issue of reasoner automatic selection for ontology based applications. We introduced a novel reasoner ranking framework. Correctness and efficiency are our main ranking criteria. We proposed two distinct methods: i) ranking based on single label prediction, and ii) a multi-label ranking method. Finally, we suggested to extract the ontology sub-parts that are the most computationally demanding ones. Our method relies on the atomic decomposition and the locality modules extraction techniques and employs our predictive model of the ontology hardness. Excessive experimentations were carried out to prove the worthiness of our approaches. All of our proposals were gathered in a user assistance system called "ADSOR"
Clémençon, Stéphan. "Résumé des Travaux en Statistique et Applications des Statistiques." Habilitation à diriger des recherches, Université de Nanterre - Paris X, 2006. http://tel.archives-ouvertes.fr/tel-00138299.
Full textPeel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.
Full textThe first part of this thesis introduces new algorithms for the sparse encoding of signals. Based on Matching Pursuit (MP) they focus on the following problem : how to reduce the computation time of the selection step of MP. As an answer, we sub-sample the dictionary in line and column at each iteration. We show that this theoretically grounded approach has good empirical performances. We then propose a bloc coordinate gradient descent algorithm for feature selection problems in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be seen as a simultaneous sparse encoding of signals problem. The second part exposes new empirical Bernstein inequalities. Firstly, they concern the theory of the U-Statistics and are applied in order to design generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator and we propose an efficient algorithm to compute it. Then, we present an empirical version of the Bernstein type inequality for martingales by Freedman [1975]. Again, the strength of our result lies in the variance estimator computable from the data. This allows us to propose generalization bounds for online learning algorithms which improve the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information
Ouni, Zaïd. "Statistique pour l’anticipation des niveaux de sécurité secondaire des générations de véhicules." Thesis, Paris 10, 2016. http://www.theses.fr/2016PA100099/document.
Full textRoad safety is a world, European and French priority. Because light vehicles (or simply“vehicles”) are obviously one of the main actors of road activity, the improvement of roadsafety necessarily requires analyzing their characteristics in terms of traffic road accident(or simply “accident”). If the new vehicles are developed in engineering department and validated in laboratory, it is the reality of real-life accidents that ultimately characterizesthem in terms of secondary safety, ie, that demonstrates which level of security they offer to their occupants in case of an accident. This is why car makers want to rank generations of vehicles according to their real-life levels of safety. We address this problem by exploiting a French data set of accidents called BAAC (Bulletin d’Analyse d’Accident Corporel de la Circulation). In addition, fleet data are used to associate a generational class (GC) to each vehicle. We elaborate two methods of ranking of GCs in terms of secondary safety. The first one yields contextual rankings, ie, rankings of GCs in specified contexts of accident. The second one yields global rankings, ie, rankings of GCs determined relative to a distribution of contexts of accident. For the contextual ranking, we proceed by “scoring”: we look for a score function that associates a real number to any combination of GC and a context of accident; the smaller is this number, the safer is the GC in the given context. The optimal score function is estimated by “ensemble learning”, under the form of an optimal convex combination of scoring functions produced by a library of ranking algorithms by scoring. An oracle inequality illustrates the performance of the obtained meta-algorithm. The global ranking is also based on “scoring”: we look for a scoring function that associates any GC with a real number; the smaller is this number, the safer is the GC. Causal arguments are used to adapt the above meta-algorithm by averaging out the context. The results of the two ranking procedures are in line with the experts’ expectations
Лозова, К. А. "Оцінювання якості професорсько-викладацького складу ВНЗ у дистанційній освіті." Thesis, Сумський державний університет, 2015. http://essuir.sumdu.edu.ua/handle/123456789/39684.
Full textMehta, Khushang Samir. "Using Machine Learning for Incremental Aggregation of Collaborative Rankings." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613745957050039.
Full textDepecker, Marine. "Méthodes d'apprentissage statistique pour le scoring." Phd thesis, Télécom ParisTech, 2010. http://pastel.archives-ouvertes.fr/pastel-00572421.
Full textGuan, Wei. "New support vector machine formulations and algorithms with application to biomedical data analysis." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41126.
Full textLiu, Xialei. "Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/670154.
Full textLas redes neuronales convolucionales profundas (CNNS) han alcanzado resultados muy positivos en diferentes aplicaciones de reconocimiento visual, tales como clasificación, detección o segmentación de imágenes. En esta tesis, abordamos dos limitaciones de las CNNs. La primera, entrenar CNNs profundas requiere grandes cantidades de datos etiquetados, los cuales sonmuy costosos y arduos de conseguir. La segunda es que entrenar en sistemas de aprendizaje continuo es un problema abierto para la investigación. El olvido catastrófico en redes es muy común cuando se adapta un modelo entrenado a nuevos entornos o nuevas tareas. Por lo tanto, en esta tesis, tenemos como objetivo mejorar las CNNs para aplicaciones con datos limitados y adaptarlas de forma continua a nuevas tareas. El aprendizaje auto-supervisado compensa la falta de datos etiquetados con la introducción de tareas auxiliares en las cuales los datos están fácilmente disponibles. En la primera parte de la tesis, mostramos cómo los ránquings se pueden utilizar de forma parecida a una tarea auto-supervisada para los problemas de regresión. Después, proponemos una técnica de propagación hacia atrás eficiente para redes siamesas que previene el computo redundante introducido por las arquitecturas de red multi-rama. Además, demostramos quemedir la incertidumbre de las redes en las tareas parecidas a las auto-supervisadas, es una buena medida de la cantidad de información que contienen los datos no etiquetados. Dicha medida puede ser entonces usada para la ejecución de algoritmos de aprendizaje activo. Estosmarcos que proponemos los aplicamos entonces a dos problemas de regresión: Evaluación de Calidad de Imagen (IQA) y el contador de personas. En los dos casos, mostramos cómo generar de forma automática grupos de imágenes ranqueadas para los datos no etiquetados. Nuestros resultados muestran que las redes entrenadas para la regresión de las anotaciones de los datos etiquetados, a la vez que para aprender a ordenar los ránquings de los datos no etiquetados, obtienen resultados significativamente mejores al estado del arte. También demostramos que el aprendizaje activo utilizando ránquings puede reducir la cantidad de etiquetado en un 50% para ambas tareas de IQA y contador de personas. En la segunda parte de la tesis, proponemos dos métodos para evitar el olvido catastrófico en escenarios de aprendizaje secuencial de tareas. El primer método deriva del de Consolidación Elástica de Pesos, el cuál utiliza la diagonal de laMatriz de Información de Fisher (FIM) para medir la importancia de los pesos de la red. No obstante, la aproximación asumida no es realista. Por lo tanto, diagonalizamos la aproximación de la FIM utilizando un grupo de parámetros de rotación factorizada proporcionando una mejora significativa en el rendimiento de tareas secuenciales para el caso del aprendizaje continuo. Para el segundo método, demostramos que el olvido se manifiesta de forma diferente en cada capa de la red y proponemos un método híbrido donde la destilación se utiliza para el extractor de características y la rememoración en el clasificador mediante generación de características. Nuestro método soluciona la limitación de la rememoración mediante generación de imágenes y la destilación de probabilidades (como la utilizada en elmétodo Aprendizaje Sin Olvido), y puede añadir de forma natural nuevas tareas en un único clasificador bien calibrado. Los experimentos confirman que el método propuesto sobrepasa las métricas de referencia y parte del estado del arte.
Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition application, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trainedmodels to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks. Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50% for both IQA and crowd counting. In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from ElasticWeight Consolidation, which uses a diagonal Fisher InformationMatrix (FIM) tomeasure the importance of the parameters of the network. However the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirmthat our proposed approach outperforms the baselines and some start-of-the-art methods.
Prati, Ronaldo Cristiano. ""Novas abordagens em aprendizado de máquina para a geração de regras, classes desbalanceadas e ordenação de casos"." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-01092006-155445/.
Full textMachine learning algorithms are often the most appropriate algorithms for a great variety of data mining applications. However, most machine learning research to date has mainly dealt with the well-circumscribed problem of finding a model (generally a classifier) given a single, small and relatively clean dataset in the attribute-value form, where the attributes have previously been chosen to facilitate learning. Furthermore, the end-goal is simple and well-defined, such as accurate classifiers in the classification problem. Data mining opens up new directions for machine learning research, and lends new urgency to others. With data mining, machine learning is now removing each one of these constraints. Therefore, machine learning's many valuable contributions to data mining are reciprocated by the latter's invigorating effect on it. In this thesis, we explore this interaction by proposing new solutions to some problems due to the application of machine learning algorithms to data mining applications. More specifically, we contribute to the following problems. New approaches to rule learning. In this category, we propose two new methods for rule learning. In the first one, we propose a new method for finding exceptions to general rules. The second one is a rule selection algorithm based on the ROC graph. Rules come from an external larger set of rules and the algorithm performs a selection step based on the current convex hull in the ROC graph. Proportion of examples among classes. We investigated several aspects related to this issue. Firstly, we carried out a series of experiments on artificial data sets in order to verify our hypothesis that overlapping among classes is a complicating factor in highly skewed data sets. We also carried out a broadly experimental analysis with several methods (some of them proposed by us) that artificially balance skewed datasets. Our experiments show that, in general, over-sampling methods perform better than under-sampling methods. Finally, we investigated the relationship between class imbalance and small disjuncts, as well as the influence of the proportion of examples among classes in the process of labelling unlabelled cases in the semi-supervised learning algorithm Co-training. New method for combining rankings. We propose a new method called BordaRanking to construct ensembles of rankings based on borda count voting, which could be applied whenever only the rankings are available. Results show an improvement upon the base-rankings constructed by taking into account the ordering given by classifiers which output continuous-valued scores, as well as a comparable performance with the fusion of such scores.
Efes, Stergios. "Zero-shot, One Kill: BERT for Neural Information Retrieval." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444835.
Full textGentile, Teresa Anna Rita, Nito Ernesto De, and Walter Vesperi. "A Survey on Knowledge Management in Universities in the QS Rankings: E-learning and MOOCs." TUDpress, 2016. https://tud.qucosa.de/id/qucosa%3A33980.
Full textCoimbra, Andre Rodrigues. "Método automático para descoberta de funções de ordenação utilizando programação genética paralela em GPU." Universidade Federal de Goiás, 2014. http://repositorio.bc.ufg.br/tede/handle/tede/4525.
Full textApproved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2015-05-15T13:37:45Z (GMT) No. of bitstreams: 2 Dissertação - André Rodrigues Coimbra - 2014.pdf: 5214859 bytes, checksum: d951502129d7be5d60b6a785516c3ad1 (MD5) license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Made available in DSpace on 2015-05-15T13:37:45Z (GMT). No. of bitstreams: 2 Dissertação - André Rodrigues Coimbra - 2014.pdf: 5214859 bytes, checksum: d951502129d7be5d60b6a785516c3ad1 (MD5) license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) Previous issue date: 2014-03-28
Ranking functions have a vital role in the performance of information retrieval systems ensuring that documents more related to the user’s search need – represented as a query – are shown in the top results, preventing the user from having to examine a range of documents that are not really relevant. Therefore, this work uses Genetic Programming (GP), an Evolutionary Computation technique, to find ranking functions automaticaly and systematicaly. Moreover, in this project the technique of GP was developed following a strategy that exploits parallelism through graphics processing units. Other known methods in the context of information retrieval as classification committees and the Lazy strategy were combined with the proposed approach – called Finch. These combinations were only feasible due to the GP nature and the use of parallelism. The experimental results with the Finch, regarding the ranking functions quality, surpassed the results of several strategies known in the literature. Considering the time performance, significant gains were also achieved. The solution developed exploiting the parallelism spends around twenty times less time than the solution using only the central processing unit.
Funções de ordenação têm um papel vital no desempenho de sistemas de recuperação de informação garantindo que os documentos mais relacionados com o desejo do usuário – representado através de uma consulta – sejam trazidos no topo dos resultados, evitando que o usuário tenha que analisar uma série de documentos que não sejam realmente relevantes. Assim, utiliza-se a Programação Genética (PG), uma técnica da Computação Evolucionária, para descobrir de forma automática e sistemática funções de ordenação. Além disso, neste trabalho a técnica de PG foi desenvolvida seguindo uma estratégia que explora o paralelismo através de unidades gráficas de processamento. Foram agregados ainda na abordagem proposta – denominada Finch – outros métodos conhecidos no contexto de recuperação de informação como os comitês de classificação e a estratégia Lazy. Sendo que essa complementação só foi viável devido a natureza da PG e em virtude da utilização do paralelismo. Os resultados experimentais encontrados com a Finch, em relação à qualidade das funções de ordenação descobertas, superaram os resultados de diversas estratégias conhecidas na literatura. Considerando o desempenho da abordagem em função do tempo, também foram alcançados ganhos significativos. A solução desenvolvida explorando o paralelismo gasta, em média, vinte vezes menos tempo que a solução utilizando somente a unidade central de processamento.