Dissertations / Theses on the topic 'Search for the nearest neighbour'


Consult the top 50 dissertations / theses for your research on the topic 'Search for the nearest neighbour.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Kibriya, Ashraf Masood. "Fast Algorithms for Nearest Neighbour Search." The University of Waikato, 2007. http://hdl.handle.net/10289/2463.

Full text
Abstract:
The nearest neighbour problem is of practical significance in a number of fields. Often we are interested in finding an object near to a given query object. The problem is old, and a large number of solutions have been proposed for it in the literature. However, it remains the case that even the most popular of the techniques proposed for its solution have not been compared against each other. Also, many techniques, including the old and popular ones, can be implemented in a number of ways, and often the different implementations of a technique have not been thoroughly compared either. This research presents a detailed investigation of different implementations of two popular nearest neighbour search data structures, KDTrees and Metric Trees, and compares the different implementations of each of the two structures against each other. The best implementations of these structures are then compared against each other and against two other techniques, the Annulus Method and Cover Trees. The Annulus Method is an old technique that was rediscovered during the research for this thesis. Cover Trees are one of the most novel and promising data structures for nearest neighbour search that have been proposed in the literature.
2

Shehu, Usman Gulumbe. "Cube technique for Nearest Neighbour(s) search." Thesis, University of Strathclyde, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248365.

Full text
3

Casselryd, Oskar, and Filip Jansson. "Troll detection with sentiment analysis and nearest neighbour search." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209474.

Full text
Abstract:
Internet trolls are gaining more influence in society due to the rapid growth of social media. A troll farm is a group of Internet trolls that get paid to spread certain opinions or information online. Identifying a troll farm can be difficult, since the trolls try to stay hidden. This study examines whether it is possible to identify troll farms on Twitter by conducting a sentiment analysis on user tweets and modelling it as a nearest neighbor problem. The experiment was done with 4 simulated trolls and 150 normal Twitter users. The users were modelled into data points based on the sentiment, frequency and time of their tweets. The result of the nearest neighbor search could not show a clear link between the trolls, as their behaviour was not similar enough.
4

Kumar, Susmit. "Nearest Neighbor Search in Distributed Databases." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1022879916.

Full text
5

Ram, Parikshit. "New paradigms for approximate nearest-neighbor search." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49112.

Full text
Abstract:
Nearest-neighbor search is a very natural and universal problem in computer science. Oftentimes, the problem size necessitates approximation. In this thesis, I present new paradigms for nearest-neighbor search (along with new algorithms and theory in these paradigms) that make nearest-neighbor search more usable and accurate. First, I consider a new notion of search error, the rank error, for an approximate neighbor candidate. Rank error corresponds to the number of possible candidates which are better than the approximate neighbor candidate. I motivate this notion of error and present new efficient algorithms that return approximate neighbors with rank error no more than a user-specified amount. Then I focus on approximate search in a scenario where the user does not specify the tolerable search error (error constraint); instead, the user specifies the amount of time available for search (time constraint). After differentiating between these two scenarios, I present some simple algorithms for time-constrained search with provable performance guarantees. I use this theory to motivate a new space-partitioning data structure, the max-margin tree, for improved search performance in the time-constrained setting. Finally, I consider the scenario where we do not require our objects to have an explicit fixed-length representation (vector data). This allows us to search with a large class of objects, including images, documents, graphs, strings, time series and natural language. For nearest-neighbor search in this general setting, I present a provably fast novel exact search algorithm. I also discuss the empirical performance of all the presented algorithms on real data.
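Stated symbolically (our notation, paraphrasing the definition above, not formulas from the thesis): for a query $q$, a dataset $S$ with distance $d$, and a returned candidate $\hat{x}$, the rank error is

$$\tau(\hat{x}, q) \;=\; \bigl|\{\, x \in S \;:\; d(q, x) < d(q, \hat{x}) \,\}\bigr|,$$

so an exact nearest neighbor has rank error 0, and a guarantee of $\tau \le k$ means the returned candidate is always among the true $k+1$ nearest neighbors, independently of the scale of the distances involved.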
6

Chanzy, Philippe. "Range search and nearest neighbor search in k-d trees." Thesis, McGill University, 1993. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68164.

Full text
Abstract:
This thesis presents an analysis of the expected complexity of range searching and nearest neighbor searching in random 2-d trees. We show that range searching in a random rectangle $\Delta_x \times \Delta_y$ can be done in $O[\Delta_x \Delta_y n + (\Delta_x + \Delta_y) n^{\alpha} + \ln n]$ expected time. A matching lower bound is also obtained. We also show that nearest neighbor searching in random 2-d trees by any algorithm must take time bounded by $\Omega[n^{\alpha - 1/2} / (\ln n)^{\alpha}]$, where $\alpha = (\sqrt{17} - 3)/2$. This disproves a conjecture by Bentley that nearest neighbor search in random 2-d trees can be done in $O(1)$ expected time.
7

Mesejo-Leon, Daniel Alejandro. "Approximate Nearest Neighbor Search for the Kullback-Leibler Divergence." Pontifícia Universidade Católica do Rio de Janeiro, 2018. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=33305@1.

Full text
Abstract:
In a number of applications, data points can be represented as probability distributions. For instance, documents can be represented as topic models, images can be represented as histograms, and music too can be represented as a probability distribution. In this work, we address the Approximate Nearest Neighbor problem where the points are probability distributions and the distance function is the Kullback-Leibler (KL) divergence. We show how to accelerate existing data structures, such as the Bregman Ball Tree, by posing the KL divergence as an inner product embedding. On the practical side, we investigate the use of two very popular indexing techniques: Inverted Index and Locality Sensitive Hashing. Experiments performed on 6 real-world datasets showed that the Inverted Index performs better than LSH and the Bregman Ball Tree in terms of queries per second and precision.
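The embedding idea can be sketched with a standard identity (our derivation for illustration, not text from the thesis): for a query distribution $q$ and a database distribution $x$,

$$\mathrm{KL}(q \,\|\, x) \;=\; \sum_i q_i \log \frac{q_i}{x_i} \;=\; \underbrace{\sum_i q_i \log q_i}_{\text{independent of } x} \;-\; \langle q, \log x \rangle,$$

so minimizing the KL divergence over the database amounts to maximizing the inner product $\langle q, \log x \rangle$, i.e., a maximum inner product search (MIPS) over the log-transformed database vectors.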
8

Varricchio, Valerio. "Efficient nearest-neighbor search algorithms for sub-Riemannian geometries." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122500.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2019
The Motion Planning problem has been at the core of a significant amount of research in the past decades and it has recently gained traction outside academia with the rise of commercial interest in self-driving cars and autonomous aerial vehicles. Among the leading algorithms to tackle the problem are sampling-based planners, such as Probabilistic Road Maps (PRMs), Rapidly-exploring Random Trees (RRTs) and a large number of variants thereof. In this thesis, we focus on a crucial building block shared by these algorithms: nearest-neighbor search. While nearest-neighbor search is known as the asymptotically dominant bottleneck of sampling-based planners, popular algorithms to efficiently identify neighbors are limited to robots capable of unconstrained motions, commonly referred to as holonomic.
Nevertheless, this is rarely the case in the vast majority of practical applications, where the dynamical system at hand is often subject to a class of differential constraints called nonholonomic. We tackle the problem with sub-Riemannian geometries, a mathematical tool to study manifolds that can be traversed under local constraints. After drawing the parallel with nonholonomic mechanical systems, we exploit peculiar properties of these geometries and their natural notion of distance to devise specialized, efficient nearest-neighbor search algorithms. Our contributions are two-fold: First, we generalize existing space-partitioning techniques (k-d trees) to sub-Riemannian metrics. This is achieved by introducing i) a criterion - the outer Box Bound - that discards halfspaces consistently with the metric and ii) a space-partitioning technique - the Lie splitting strategy - that organizes the dataset for optimal asymptotic performance.
Second, we propose pruning techniques to further improve the query runtime. This is achieved by reducing the number of distance evaluations required to discern the nearest neighbors and exploiting heuristics that provably approximate a sub-Riemannian metric up to a constant factor, asymptotically.
9

Zhang, Peiwu, and 张培武. "Voronoi-based nearest neighbor search for multi-dimensional uncertain databases." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B49618179.

Full text
Abstract:
In Voronoi-based nearest neighbor search, the Voronoi cell of every point p in a database can be used to check whether p is the closest to some query point q. We extend the notion of Voronoi cells to support uncertain objects, whose attribute values are inexact. Particularly, we propose the Possible Voronoi cell (or PV-cell). A PV-cell of a multi-dimensional uncertain object o is a region R, such that for any point p ∈ R, o may be the nearest neighbor of p. If the PV-cells of all objects in a database S are known, they can be used to identify objects that have a chance to be the nearest neighbor of q. However, there is no efficient algorithm for computing an exact PV-cell. We hence study how to derive an axis-parallel hyper-rectangle (called the Uncertain Bounding Rectangle, or UBR) that tightly contains a PV-cell. We further develop the PV-index, a structure that stores UBRs, to evaluate probabilistic nearest neighbor queries over uncertain data. An advantage of the PV-index is that upon updates on S, it can be incrementally updated. Extensive experiments on both synthetic and real datasets are carried out to validate the performance of the PV-index.
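In symbols (our paraphrase of the definition above): for an uncertain object $o$ in a database $S$,

$$PV(o) \;=\; \{\, p \in \mathbb{R}^d \;:\; \Pr[\, o \text{ is the nearest neighbor of } p \,] > 0 \,\},$$

and the UBR of $o$ is the smallest axis-parallel hyper-rectangle containing $PV(o)$: if a query point $q$ falls outside the UBR, then $o$ can be pruned, since it has no chance of being the nearest neighbor of $q$.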
10

Andoni, Alexandr. "Nearest neighbor search : the old, the new, and the impossible." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/55090.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Over the last decade, an immense amount of data has become available. From collections of photos, to genetic data, and to network traffic statistics, modern technologies and cheap storage have made it possible to accumulate huge datasets. But how can we effectively use all this data? The ever growing sizes of the datasets make it imperative to design new algorithms capable of sifting through this data with extreme efficiency. A fundamental computational primitive for dealing with massive datasets is the Nearest Neighbor (NN) problem. In the NN problem, the goal is to preprocess a set of objects, so that later, given a query object, one can efficiently find the data object most similar to the query. This problem has a broad set of applications in data processing and analysis. For instance, it forms the basis of a widely used classification method in machine learning: to give a label for a new object, find the most similar labeled object and copy its label. Other applications include information retrieval, searching image databases, finding duplicate files and web pages, vector quantization, and many others. To represent the objects and the similarity measures, one often uses geometric notions. For example, a black-and-white image may be modeled by a high-dimensional vector, with one coordinate per pixel, whereas the similarity measure may be the standard Euclidean distance between the resulting vectors. Many other, more elaborate ways of representing objects by high-dimensional feature vectors have been studied. In this thesis, we study the NN problem, as well as other related problems that occur frequently when dealing with massive datasets.
Our contribution is two-fold: we significantly improve the algorithms within the classical approaches to NN, as well as propose new approaches where the classical ones fail. We focus on several key distances and similarity measures, including the Euclidean distance, string edit distance and the Earth-Mover Distance (a popular method for comparing images). We also give a number of impossibility results, pointing out the limits of the NN algorithms. The high-level structure of our thesis is summarized as follows. New algorithms via the classical approaches: We give a new algorithm for the approximate NN problem in the d-dimensional Euclidean space. For an approximation factor $c > 1$, our algorithm achieves $dn^{\rho}$ query time and $dn^{1+\rho}$ space for $\rho = 1/c^2 + o(1)$. This greatly improves on the previous algorithms, which achieved a $\rho$ that was only slightly smaller than $1/c$. The same technique also yields an algorithm with $dn^{O(\rho)}$ query time and space near-linear in $n$. Furthermore, our algorithm is near-optimal in the class of "hashing" algorithms. Failure of the classical approaches for some hard distances: We give evidence that the classical approaches to NN under certain hard distances, such as the string edit distance, meet a concrete barrier at a nearly logarithmic approximation. Specifically, we show that for all classical approaches to NN under the edit distance, involving embeddings into a general class of spaces (such as $\ell_1$, powers of $\ell_2$, etc.), the resulting approximation has to be at least near-logarithmic in the strings' length. A new approach to NN under hard distances: Motivated by the above impossibility results, we develop a new approach to the NN problem, where the classical approaches fail. Using this approach, we give a new efficient NN algorithm for a variant of the edit distance, the Ulam distance, which achieves a double-logarithmic approximation. This is an exponential improvement over the lower bound on the approximation achievable via the previous classical approaches to this problem. Data structure lower bounds: To complement our algorithms, we prove lower bounds on NN data structures for the Euclidean distance and for the mysterious but important case of the $\ell_\infty$ distance. In both cases, our lower bounds are the first ones to hold in the same computational model as the respective upper bounds. Furthermore, for both problems, our lower bounds are optimal in the considered models. External applications: Although our main focus is on the NN problem, our techniques naturally extend to related problems. We give such applications for each of our algorithmic tools. For example, we give an algorithm for computing the edit distance between two strings of length $d$ in near-linear time. Our algorithm achieves approximation $2^{\tilde{O}(\sqrt{\log d})}$, improving over the previous bound of $d^{1/3 + o(1)}$. We note that this problem has a classical exact algorithm based on dynamic programming, running in quadratic time.
11

Kuhlman, Caitlin Anne. "Pivot-based Data Partitioning for Distributed k Nearest Neighbor Mining." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/1212.

Full text
Abstract:
This thesis addresses the need for a scalable distributed solution for k-nearest-neighbor (kNN) search, a fundamental data mining task. This unsupervised method poses particular challenges on shared-nothing distributed architectures, where global information about the dataset is not available to individual machines. The distance to search for neighbors is not known a priori, and therefore a dynamic data partitioning strategy is required to guarantee that exact kNN can be found autonomously on each machine. Pivot-based partitioning has been shown to facilitate bounding of partitions, however state-of-the-art methods suffer from prohibitive data duplication (upwards of 20x the size of the dataset). In this work an innovative method for solving exact distributed kNN search called PkNN is presented. The key idea is to perform computation over several rounds, leveraging pivot-based data partitioning at each stage. Aggressive data-driven bounds limit communication costs, and a number of optimizations are designed for efficient computation. Experimental study on large real-world data (over 1 billion points) compares PkNN to the state-of-the-art distributed solution, demonstrating that the benefits of additional stages of computation in the PkNN method heavily outweigh the added I/O overhead. PkNN achieves a data duplication rate close to 1, significant speedup over previous solutions, and scales effectively in data cardinality and dimension. PkNN can facilitate distributed solutions to other unsupervised learning methods which rely on kNN search as a critical building block. As one example, a distributed framework for the Local Outlier Factor (LOF) algorithm is given. Testing on large real-world and synthetic data with varying characteristics measures the scalability of PkNN and the distributed LOF framework in data size and dimensionality.
12

Bergholm, Marcus, and Viktor Kronvall. "Disc : Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186502.

Full text
Abstract:
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries for photon gathering are shown to reduce artifacts due to this false bleeding. Shifted Sorting has been found to yield high performance on GPUs while simultaneously retaining a high approximation rate. This study presents an algorithm for approximately solving the K-Nearest Neighbors problem, modified to use a distance measure creating an ellipsoidal search boundary. The ellipsoidal search boundary is used to alleviate the issue of false bleeding of colors between surfaces in Photon Mapping. The Approximate K-Nearest Neighbors algorithm presented is a modification of the Shifted Sorting algorithm. The algorithm is found to be highly parallelizable and processes 86% as many queries per millisecond as a reference implementation using spherical search boundaries implied by the Euclidean distance. The rate of compression from spherical to ellipsoidal search boundaries is appropriately chosen in the range 3.0 to 7.0. The algorithm is found to scale well with respect to increases in both the number of data points and the number of query points.
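One concrete way to obtain such an ellipsoidal boundary (a sketch in our own notation; the thesis may parameterize it differently) is to inflate the Euclidean metric along the surface normal $\hat{n}$ at the gather point by a compression factor $c$:

$$d_c(p, q)^2 \;=\; \|p - q\|^2 \;+\; (c^2 - 1)\,\langle p - q, \hat{n} \rangle^2.$$

The ball $\{p : d_c(p, q) \le r\}$ is then an ellipsoid of radius $r$ tangent to the surface but only $r/c$ along the normal, so photons deposited on nearby but disconnected surfaces are excluded; the range $3.0 \le c \le 7.0$ corresponds to the compression rates found appropriate in the study.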
13

Woerner, August Eric. "On the Neutralome of Great Apes and Nearest Neighbor Search in Metric Spaces." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/621578.

Full text
Abstract:
Problems of population genetics are magnified by problems of big data. My dissertation spans the disciplines of computer science and population genetics, leveraging computational approaches to biological problems to address issues in genomics research. In this dissertation I develop more efficient metric search algorithms. I also show that the vast majority of the genomes of great apes are impacted by the forces of natural selection. Finally, I introduce a heuristic to identify neutralomes—regions that are evolving with minimal selective pressures—and use these neutralomes for inferences on effective population size in great apes. We begin with a formal and far-reaching problem that impacts a broad array of disciplines including biology and computer science: the 𝑘-nearest neighbors problem in generalized metric spaces. The 𝑘-nearest neighbors (𝑘-NN) problem is deceptively simple. The problem is as follows: given a query q and a dataset D of size 𝑛, find the 𝑘 closest points to q. This problem can be easily solved by algorithms that compute 𝑘th order statistics in O(𝑛) time and space. It follows that if D can be ordered, then it is perhaps possible to solve 𝑘-NN queries in sublinear time. While this is not possible for an arbitrary distance function on the points in D, I show that if the points are constrained by the triangle inequality (such as with metric spaces), then the dataset can be properly organized into a dispersion tree (Appendix A). Dispersion trees are a hierarchical data structure built around a large dispersed set of points. Dispersion trees have construction times that are sub-quadratic (O(𝑛¹·⁵ log⁡ 𝑛)) and use O(𝑛) space, and they use a provably optimal search strategy that minimizes the number of times the distance function is invoked. While all metric data structures have worst-case O(𝑛) search times, dispersion trees have average-case search times that are substantially faster than a large sampling of comparable data structures in the vast majority of spaces sampled. Exceptions to this include extremely high-dimensional spaces (d>20), which devolve into near-linear scans of the dataset, and unstructured low-dimensional (d<6) Euclidean spaces. Dispersion trees have empirical search times that appear to scale as O(𝑛ᶜ) for 0 < 𝑐 < 1.
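As a baseline for the discussion above, a linear-scan 𝑘-NN with an O(𝑛) selection step takes only a few lines (a generic sketch in Python/NumPy, not code from the dissertation):

```python
import numpy as np

def knn_brute_force(D: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k points of D nearest to q under the Euclidean metric.

    One distance evaluation per point plus an O(n) selection
    (np.argpartition uses introselect, a kth-order-statistic algorithm);
    this is the linear baseline that metric trees such as dispersion
    trees aim to beat on average.
    """
    dists = np.linalg.norm(D - q, axis=1)   # n distance evaluations
    idx = np.argpartition(dists, k)[:k]     # O(n) selection of the k smallest
    return idx[np.argsort(dists[idx])]      # sort only the k survivors

# toy usage
rng = np.random.default_rng(0)
D = rng.standard_normal((10_000, 8))
print(knn_brute_force(D, rng.standard_normal(8), k=5))
```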
14

Thomas, Joseph Scott. "An Adaptive Data Structure for Nearest Neighbors Search in a General Metric Space." Thesis, The University of Arizona, 2010. http://hdl.handle.net/10150/146680.

Full text
Abstract:
We consider the problem of computing nearest neighbors in an arbitrary metric space, particularly a metric space that cannot be easily embedded in ℝⁿ. We present a data structure, the Partition Tree, that can be constructed in O(n log n) time, where n is the size of the set of points to be searched, and has been experimentally shown to have an average query time that is a sublinear function of n. Our experiments show that this data structure could have applications in bioinformatics, particularly protein secondary structure prediction, where it can be used for similarity search among short sequences of proteins' primary structure.
15

José, Silva Leite Pedro. "Massively parallel nearest neighbors searches in dynamic point clouds on GPU." Universidade Federal de Pernambuco, 2010. https://repositorio.ufpe.br/handle/123456789/2356.

Full text
Abstract:
This dissertation introduces a grid-based data structure implemented on the GPU. It was developed for nearest neighbor searches in dynamic point clouds, in a massively parallel fashion. The implementation achieves real-time performance, with both the grid construction and the (exact and approximate) nearest neighbor searches executed on the GPU. In this way, memory transfers between host and device are minimized, increasing overall performance. The proposed algorithm can be used in different applications with static or dynamic scenes. Moreover, the data structure supports three-dimensional point clouds and, given its dynamic nature, the user can change its parameters at runtime; the same applies to the number of neighbors searched. A CPU reference implementation was also developed, and performance comparisons justify the use of GPUs as massively parallel processors. In addition, the performance of the proposed data structure is compared with CPU and GPU implementations from previous works. Finally, a point-based rendering application was developed to verify the potential of the data structure.
16

Heinrich-Litan, Laura. "Exact L-nearest neighbor search in high dimensions = Exakte L-nächster-Nachbar-Suche in hohen Dimensionen." [S.l. : s.n.], 2002. http://www.diss.fu-berlin.de/2003/80/index.html.

Full text
17

Pfeifer, John. "Novel Data Structures for Advanced Computational Movement Analysis." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/27296.

Full text
Abstract:
In this work we present novel data structures and algorithms that answer queries on movement data, which is defined as the recording of an object's spatial location over a set of discrete time intervals. Specifically, we describe three extensive studies that propose new solutions to the problem of proximity searches on trajectory data under the continuous Fréchet distance. We provide both theoretical analysis and experimental results that improve over previous work. The first study discusses a scalable approach for range and k-nearest-neighbor trajectory queries on an input trajectory data set. We introduce a novel metric index tree structure whose size is linear in the number of input trajectories, improve on or introduce new upper and lower bound algorithms for computing the continuous Fréchet distance, and give search heuristics that reduce computations. The second study researches the problem of searching for the nearest sub-trajectory within a large input trajectory, given a query trajectory. A new linear size simplification tree structure and adaptive clustering-based query algorithm are described, and experiments on real and synthetic data sets show effective search space pruning compared to baseline approaches. The final study explores the problem of accurately recognizing human sign language words by mining and constructing geometric feature space trajectories from human body movement. Nearest-neighbor classification applies speed-invariant distance measures to trajectories, which improve sign recognition accuracy over recent state-of-the-art neural network approaches. Our research focuses on practical and scalable solutions which give exact or approximate results, and can be implemented in industrial applications. Moreover, we strive for interpretable algorithms where it is easy to follow the steps and understand precisely how an answer is computed.
18

Madrigali, Andrea. "Analysis of Local Search Methods for 3D Data." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Find full text
Abstract:
This thesis analyzes several search methods for 3D data. It gives a general overview of the field of Computer Vision, of the state of the art in acquisition sensors, and of some of the formats used to describe 3D data. It then examines 3D Object Recognition in depth: besides describing the entire matching process between Local Features, it focuses on the detection phase of salient points. In particular, a Learned Keypoint detector, based on machine learning techniques, is analyzed. The latter is illustrated with the implementation of two neighbor search algorithms: an exhaustive one (K-d tree) and an approximate one (Radial Search). Finally, some experimental evaluations of the efficiency and speed of the detector implemented with the different search methods are reported, showing an effective performance improvement, without a considerable loss of accuracy, for the approximate search.
19

Günther, Michael. "FREDDY." Association for Computing Machinery, 2018. https://tud.qucosa.de/id/qucosa%3A38451.

Full text
Abstract:
Word embeddings are useful in many tasks in Natural Language Processing and Information Retrieval, such as text mining and classification, sentiment analysis, sentence completion, or dictionary construction. Word2vec and its successor fastText, both well-known models for producing word embeddings, are powerful techniques to study the syntactic and semantic relations between words by representing them as low-dimensional vectors. By applying algebraic operations on these vectors, semantic relationships such as word analogies, gender inflections, or geographical relationships can be easily recovered. The aim of this work is to investigate how word embeddings could be utilized to augment and enrich queries in DBMSs, e.g. to compare text values according to their semantic relation or to group rows according to the similarity of their text values. For this purpose, we use pre-trained word embedding models of large text corpora such as Wikipedia. By exploiting this external knowledge during query processing we are able to apply inductive reasoning on text values. Thereby, we reduce the demand for explicit knowledge in database systems. In the context of the IMDB database schema, this makes it possible, for example, to query movies that are semantically close to genres such as historical fiction or road movie without maintaining this information. Another example query, sketched in Listing 1, returns the top-3 nearest neighbors (NN) of each movie in IMDB. Given the movie “Godfather” as input, this results in “Scarface”, “Goodfellas” and “Untouchables”.
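The query in Listing 1 is not reproduced here, but its effect can be sketched outside the DBMS (a toy illustration with made-up vectors; the names and values are ours, not FREDDY's API):

```python
import numpy as np

# toy word-embedding table; FREDDY would instead load pre-trained
# vectors (e.g. word2vec trained on Wikipedia) into the database
emb = {
    "godfather":    np.array([0.90, 0.10, 0.05]),
    "scarface":     np.array([0.85, 0.15, 0.10]),
    "goodfellas":   np.array([0.88, 0.12, 0.07]),
    "untouchables": np.array([0.80, 0.20, 0.12]),
    "finding_nemo": np.array([0.10, 0.90, 0.30]),
}

def top_k(word: str, k: int = 3):
    """Rank all other entries by cosine similarity to `word`."""
    q = emb[word]
    sim = lambda v: float(v @ q) / (np.linalg.norm(v) * np.linalg.norm(q))
    return sorted(((w, sim(v)) for w, v in emb.items() if w != word),
                  key=lambda t: -t[1])[:k]

print(top_k("godfather"))   # top-3 nearest neighbours of "godfather"
```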
20

Carraher, Lee A. "A Parallel Algorithm for Query Adaptive, Locality Sensitive Hash Search." University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1337886738.

Full text
21

Sammon, Ryan. "Data Collection, Analysis, and Classification for the Development of a Sailing Performance Evaluation System." Thèse, Université d'Ottawa / University of Ottawa, 2013. http://hdl.handle.net/10393/25481.

Full text
Abstract:
The work described in this thesis contributes to the development of a system to evaluate sailing performance. This work was motivated by the lack of tools available to evaluate sailing performance. The goal of the work presented is to detect and classify the turns of a sailing yacht. Data was collected using a BlackBerry PlayBook affixed to a J/24 sailing yacht. This data was manually annotated with three types of turn: tack, gybe, and mark rounding. This manually annotated data was used to train classification methods. Classification methods tested were multi-layer perceptrons (MLPs) of two sizes in various committees and nearest-neighbour search. Pre-processing algorithms tested were Kalman filtering, categorization using quantiles, and residual normalization. The best solution was found to be an averaged answer committee of small MLPs, with Kalman filtering and residual normalization performed on the input as pre-processing.
22

Silva, Eliezer de Souza da 1988. "Metric space indexing for nearest neighbor search in multimedia context : Indexação de espaços métricos para busca de vizinho mais próximo em contexto multimídia." [s.n.], 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/258942.

Full text
Abstract:
Advisor: Eduardo Alves do Valle Junior
Dissertation (Master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Abstract: The increasing availability of multimedia content poses a challenge for information retrieval researchers. Users want not only to have access to multimedia documents, but also to make sense of them --- the ability to find specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, Multimedia Information Retrieval systems must rely on the ability to perform search by similarity efficiently. However, multimedia documents are often represented by high-dimensional feature vectors, or by other complex representations in metric spaces. Providing efficient similarity search for that kind of data is extremely challenging. In this project, we explore one of the most cited families of solutions for similarity search, Locality-Sensitive Hashing (LSH), which is based upon the creation of hashing functions which assign, with higher probability, the same key to data that are similar. LSH is available only for a handful of distance functions, but, where available, it has been found to be extremely efficient for architectures with uniform access cost to the data. Most existing LSH functions are restricted to vector spaces. We propose two novel LSH methods (VoronoiLSH and VoronoiPlex LSH) for generic metric spaces based on metric hyperplane partitioning (random centroids and K-medoids). We present a comparison with well-established LSH methods in vector spaces and with recent competing new methods for metric spaces. We develop a theoretical probabilistic model of the behavior of the proposed algorithms and show some relations and bounds for the probability of hash collision. Among the algorithms proposed for generalizing LSH to metric spaces, this theoretical development is new. Although the problem is very challenging, our results demonstrate that it can be successfully tackled. This dissertation presents the development of the methods, along with a theoretical and experimental discussion of their performance.
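A minimal instance of the LSH principle described above — similar inputs collide with higher probability — is sign-of-projection hashing for cosine similarity (a classical vector-space scheme shown for illustration; the dissertation's VoronoiLSH and VoronoiPlex LSH replace the hyperplanes with metric partitions):

```python
import numpy as np
from collections import defaultdict

class RandomHyperplaneLSH:
    """Each hash bit is the side of a random hyperplane a vector falls on;
    vectors at a small angle agree on most bits, so they land in the same
    bucket with higher probability than dissimilar vectors."""

    def __init__(self, dim: int, n_bits: int = 12, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)

    def key(self, x: np.ndarray) -> tuple:
        return tuple((self.planes @ x >= 0).tolist())

    def index(self, data: np.ndarray) -> None:
        for i, x in enumerate(data):
            self.buckets[self.key(x)].append(i)

    def candidates(self, q: np.ndarray) -> list:
        # only the colliding bucket is scanned, not the whole dataset
        return self.buckets.get(self.key(q), [])
```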
23

Giusti, Giulia. "Similarità tra stringhe, applicazione dell'algoritmo TLSH a testi di ingegneria." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19129/.

Full text
Abstract:
Similarity search, the search for similar objects, is a fundamental technique for obtaining important insights from the enormous amount of data produced every day. The project concerns the development of software, based on the TLSH function, for similarity search, clustering, and classification of corporate documents. To this end, a similarity search was carried out at three levels: section, paragraph, and sentence. The TLSH function employs the best technologies, in terms of efficiency and applicability to the Nearest Neighbors problem: Locality Sensitive Hashing (LSH), Context Triggered Piecewise Hashing (CTPH), and feature extraction. These verify the property of producing similar digests for similar objects, which is fundamental in similarity search. The system built for this thesis consists of two distinct parts: the test environment for the TLSH function and the API service for comparing LSH functions. The environment was evaluated through two different analyses: quantitative and qualitative. The former examines the efficiency of the environment in terms of execution time and the precision of the results obtained, while the qualitative evaluation concerns user satisfaction with the system's usability, measured with the SUS (System Usability Scale) questionnaire. The quantitative tests showed satisfactory results in efficiency and precision, and the qualitative results proved excellent, with a SUS score well above the experimentally obtained average. Future developments of the TLSH system could include a method for inserting digests into a hash table, making full use of the properties of LSH, and removing the minimum character limit required to create a digest.
24

Jain, Himalaya. "Learning compact representations for large scale image search." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S027/document.

Full text
Abstract:
This thesis addresses the problem of large-scale image search. To tackle image search at large scale, images must be encoded with compact representations which can be efficiently employed to compare images meaningfully. Obtaining such a compact representation can be done either by compressing effective high-dimensional representations or by learning compact representations in an end-to-end manner. The work in this thesis explores and advances in both of these directions. In our first contribution, we extend structured vector quantization approaches such as Product Quantization by proposing a weighted codeword sum representation. We test and verify the benefits of our approach for approximate nearest neighbor search on local and global image features, which is an important way to approach large-scale image search. Learning compact representations for image search has recently received a lot of attention, with various deep hashing based approaches being proposed. In such approaches, deep convolutional neural networks are learned to encode images into compact binary codes. In this thesis we propose a deep supervised learning approach for structured binary representation which is reminiscent of structured vector quantization approaches such as PQ. Our approach benefits from asymmetric search over deep hashing approaches and gives a clear improvement in search accuracy at the same bit-rate. The inverted index is another important part of a large-scale search system, apart from the compact representation. To this end, we extend our ideas for supervised compact representation learning to building inverted indexes. In this work we approach inverted indexing with supervised deep learning and make an attempt to unify the learning of the inverted index and the compact representation. We thoroughly evaluate all the proposed methods on various publicly available datasets. Our methods either outperform, or are competitive with, the state of the art.
25

Tagami, Yukihiro. "Practical Web-scale Recommender Systems." Kyoto University, 2018. http://hdl.handle.net/2433/235110.

Full text
26

Lloyd, Michael. "Nearest neighbour epidemic processes." Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/747.

Full text
27

Bermejo Sánchez, Sergio. "Learning with nearest neighbour classifiers." Doctoral thesis, Universitat Politècnica de Catalunya, 2000. http://hdl.handle.net/10803/6323.

Full text
Abstract:
Extraordinary ex-aequo prize in the field of Electronics and Telecommunications, 1999-2000 call.
Nearest Neighbour (NN) classifiers are one of the most celebrated algorithms in machine learning. In recent years, interest in these methods has flourished again in several fields (including statistics, machine learning and pattern recognition) since, in spite of their simplicity, they have proven to be powerful non-parametric classification systems in real-world problems. The present work is mainly devoted to the development of new learning algorithms for these classifiers and is focused on the following topics:

- Development of learning algorithms for crisp and soft k-NN classifiers with large margin
- Extension and generalization of Kohonen's LVQ algorithms
- Local stabilization techniques for ensembles of NN classifiers
- Study of the finite-sample convergence of the on-line LVQ1 and k-means algorithms

Besides, a novel oriented principal component analysis (OPCA), addressed to feature extraction in classification, is introduced. The method integrates feature extraction into the classifier and performs global training to extract those features useful for the classifier. The application of this general technique in the context of NN classifiers leads to a problem of learning their weight metric.
28

Kini, Ananth Ullal. "On the effect of INQUERY term-weighting scheme on query-sensitive similarity measures." Texas A&M University, 2005. http://hdl.handle.net/1969.1/3116.

Full text
Abstract:
Cluster-based information retrieval systems often use a similarity measure to compute the association among text documents. In this thesis, we focus on a class of similarity measures named Query-Sensitive Similarity (QSS) measures. Recent studies have shown QSS measures to positively influence the outcome of a clustering procedure. These studies have used QSS measures in conjunction with the ltc term-weighting scheme. Several term-weighting schemes have superseded the ltc term-weighting scheme and demonstrated better retrieval performance relative to the latter. We test whether introducing one of these schemes, INQUERY, will offer any benefit over the ltc scheme when used in the context of QSS measures. The testing procedure uses the Nearest Neighbor (NN) test to quantify the clustering effectiveness of QSS measures and the corresponding term-weighting scheme. The NN tests are applied on certain standard test document collections and the results are tested for statistical significance. On analyzing results of the NN test relative to those obtained for the ltc scheme, we find several instances where the INQUERY scheme improves the clustering effectiveness of QSS measures. To be able to apply the NN test, we designed a software test framework, Ferret, by complementing the features provided by dtSearch, a search engine. The test framework automates the generation of NN coefficients by processing standard test document collection data. We provide an insight into the construction and working of the Ferret test framework.
29

Gundogdu, Erhan. "Feature Detection And Matching Towards Augmented Reality Applications On Mobile Devices." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614618/index.pdf.

Full text
Abstract:
Local feature detection and its applications in different problems are quite popular in vision research. In order to analyze a scene, its invariant features, which are distinguishable in many views of this scene, are used in pose estimation, object detection and augmented reality. However, the required performance metrics might change according to the application type; in general, the main metrics are accepted to be accuracy and computational complexity. The contributions in this thesis improve these metrics and can be divided into three parts: local feature detection, local feature description and description matching in different views of the same scene. In this thesis an efficient feature detection algorithm with sufficient repeatability performance is proposed. This detection method is convenient for real-time applications. For local description, a novel local binary pattern outperforming state-of-the-art binary patterns is proposed. As a final task, a fuzzy decision tree method is presented for approximate nearest neighbor search. In all parts of the system, computational efficiency is considered and the algorithms are designed according to limited processing time. Finally, an overall system capable of matching different views of the same scene has been proposed and executed on a mobile platform. The results are quite promising, such that the presented system can be used in real-time applications, such as augmented reality, object retrieval, object tracking and pose estimation.
30

Asyhari, Agustian Taufiq. "Nearest neighbour decoding for fading channels." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610448.

Full text
31

Curtin, Ryan Ross. "Improving dual-tree algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54354.

Full text
Abstract:
This large body of work is entirely centered around dual-tree algorithms, a class of algorithm based on spatial indexing structures that often provide large amounts of acceleration for various problems. This work focuses on understanding dual-tree algorithms using a new, tree-independent abstraction, and using this abstraction to develop new algorithms. Stated more clearly, the thesis of this entire work is that we may improve and expand the class of dual-tree algorithms by focusing on and providing improvements for each of the three independent components of a dual-tree algorithm: the type of space tree, the type of pruning dual-tree traversal, and the problem-specific BaseCase() and Score() functions. This is demonstrated by expressing many existing dual-tree algorithms in the tree-independent framework, and focusing on improving each of these three pieces. The result is a formidable set of generic components that can be used to assemble dual-tree algorithms, including faster traversals, improved tree theory, and new algorithms to solve the problems of max-kernel search and k-means clustering.
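The tree-independent decomposition can be sketched as follows (our schematic, with the problem-specific BaseCase()/Score() hooks named as in the thesis; the space tree and the traversal shown are deliberately the simplest choices):

```python
import math

def dual_depth_first(q_node, r_node, base_case, score):
    """A plain pruning dual-tree traversal.

    base_case(q_point, r_point) does the per-pair work (e.g. updates
    nearest-neighbour candidates); score(q_node, r_node) returns
    math.inf when the node pair can be pruned. Any space tree whose
    nodes expose `.points` and `.children` can be plugged in.
    """
    if score(q_node, r_node) == math.inf:
        return  # prune: no descendant pair can affect the result
    for qp in q_node.points:
        for rp in r_node.points:
            base_case(qp, rp)
    for qc in q_node.children:
        for rc in r_node.children:
            dual_depth_first(qc, rc, base_case, score)
```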
32

Mittinty, Murthy N. "Nearest neighbour imputation and variance estimation methods." Thesis, University of Canterbury. Mathematics and Statistics, 2004. http://hdl.handle.net/10092/5643.

Full text
Abstract:
In large-scale surveys, non-response is a common phenomenon. This non-response can be of two types: unit and item non-response. In this thesis we deal with item non-response, as other responses from the survey unit can be used for adjustment. Usually non-response adjustment is carried out in one of three ways: weighting, imputation, or no adjustment. Imputation is the most commonly used adjustment method, either as single imputation or multiple imputations. In this thesis we study single imputation, in particular nearest neighbour methods, and we have developed a new method. Our method is based on dissimilarity measures, is nonparametric, and handles categorical and continuous covariates without requiring any transformations. One drawback of this method is that it is relatively computer-intensive, so we investigated data reduction methods. For data reduction we developed a new method that uses propensity scores. The propensity score is used as it has properties that suggest that it would make a good method for matching the respondents and non-respondents. We also looked at subset selection of the covariates using graphical modelling and principal component analysis. We found that the data reduction methods gave as good a result as when using all variables, and there was considerable reduction in computation time, especially with the propensity score method. As the imputed values are not true values, estimating the variance of the parameter of interest using standard methods would underestimate the variance if no allowance is made for the extra uncertainty due to imputed data being used. We examined various existing methods of variance estimation, particularly the bootstrap method, because both nearest neighbour imputation and the bootstrap are nonparametric. Also, the bootstrap is a unified method for estimating smooth as well as non-smooth parameters. Shao and Sitter (1996) proposed a bootstrap method, but for some extreme situations this method has problems. We have modified the bootstrap method of Shao and Sitter to overcome this problem, and simulations indicate that both methods give good results. The conclusions from the study are that our new method of multivariate nearest neighbour imputation is at least as good as regression-based nearest neighbour imputation and is often better. For large data sets, data reduction may be desirable, and we recommend our propensity score method as it was observed to be the fastest among the subset selection methods as well as having some other advantages over the others. Imputing using any of the subset methods we looked at appears to give similar results to imputing using all covariates. To compute the variance of the imputed data, we recommend the method proposed by Shao and Sitter or our modification of Shao and Sitter's method.
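For contrast with the dissimilarity-based method developed in the thesis, the generic Euclidean variant of nearest-neighbour imputation is available off the shelf (a scikit-learn sketch shown for illustration, not the thesis's algorithm):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# each missing entry is filled from the k donor rows closest on the
# jointly observed covariates (nan-aware Euclidean distance)
imputer = KNNImputer(n_neighbors=2, weights="distance")
print(imputer.fit_transform(X))
```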
APA, Harvard, Vancouver, ISO, and other styles
33

Kucuktunc, Onur. "Result Diversification on Spatial, Multidimensional, Opinion, and Bibliographic Data." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374148621.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Muja, Marius. "Scalable nearest neighbour methods for high dimensional data." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44402.

Full text
Abstract:
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbour matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbour matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this thesis, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbour algorithm and its parameters depend on the dataset characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular dataset. In order to scale to very large datasets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbour matching framework that can be used with any of the algorithms described in the thesis. All this research has been released as an open source library called FLANN (Fast Library for Approximate Nearest Neighbours), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbour matching.
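FLANN's incorporation into OpenCV makes the approach easy to try. A minimal usage sketch assuming OpenCV's Python bindings; the index and search parameters below (a 4-tree randomized k-d forest, 64 checks per query) are illustrative choices, not recommendations from the thesis:

```python
import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1                            # randomized k-d forest
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=4)
search_params = dict(checks=64)                   # leaves to visit per query

flann = cv2.FlannBasedMatcher(index_params, search_params)

train = np.random.rand(1000, 128).astype(np.float32)   # stand-in SIFT-style descriptors
query = np.random.rand(10, 128).astype(np.float32)

matches = flann.knnMatch(query, train, k=2)       # 2 approximate nearest neighbours each
for m, n in matches:
    if m.distance < 0.7 * n.distance:             # Lowe's ratio test to keep good matches
        print(m.queryIdx, "->", m.trainIdx, m.distance)
```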
APA, Harvard, Vancouver, ISO, and other styles
35

Payne, Terry R. "Dimensionality reduction and representation for nearest neighbour learning." Thesis, University of Aberdeen, 1999. https://eprints.soton.ac.uk/257788/.

Full text
Abstract:
An increasing number of intelligent information agents employ Nearest Neighbour learning algorithms to provide personalised assistance to the user. This assistance may be in the form of recognising or locating documents that the user might find relevant or interesting. To achieve this, documents must be mapped into a representation that can be presented to the learning algorithm. Simple heuristic techniques are generally used to identify relevant terms from the documents. These terms are then used to construct large, sparse training vectors. The work presented here investigates an alternative representation based on sets of terms, called set-valued attributes, and proposes a new family of Nearest Neighbour learning algorithms that utilise this set-based representation. The importance of discarding irrelevant terms from the documents is then addressed, and this is generalised to examine the behaviour of the Nearest Neighbour learning algorithm with high dimensional data sets containing such values. A variety of selection techniques used by other machine learning and information retrieval systems are presented, and empirically evaluated within the context of a Nearest Neighbour framework. The thesis concludes with a discussion of ways in which attribute selection and dimensionality reduction techniques may be used to improve the selection of relevant attributes, and thus increase the reliability and predictive accuracy of the Nearest Neighbour learning algorithm.
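A minimal sketch of the set-based representation, assuming simple Jaccard overlap as the similarity (the thesis explores its own family of set-based measures, so this is only a stand-in):

```python
def jaccard(a: set, b: set) -> float:
    """Set overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_predict(train, query_terms, k=3):
    """train: list of (term_set, label); predict by majority label of the top-k."""
    ranked = sorted(train, key=lambda tl: jaccard(tl[0], query_terms), reverse=True)
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

docs = [({"ski", "snow", "alps"}, "travel"),
        ({"loss", "gradient", "epoch"}, "ml"),
        ({"snow", "resort", "hotel"}, "travel")]
print(knn_predict(docs, {"snow", "hotel", "flight"}))   # -> 'travel'
```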
APA, Harvard, Vancouver, ISO, and other styles
36

Andersson, Josefine. "Insurances against job loss and disability : Private and public interventions and their effects on job search and labor supply." Doctoral thesis, Uppsala universitet, Nationalekonomiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-327916.

Full text
Abstract:
Essay I: Employment Security Agreements, which are elements of Swedish collective agreements, offer a unique opportunity to study very early job search counselling of displaced workers. These agreements provide individual job search assistance to workers who are dismissed due to redundancy, often as early as during the period of notice. Compared to traditional labor market policies, the assistance provided is earlier and more responsive to the needs of the individual worker. In this study, I investigate the effects of the individual counseling and job search assistance provided through the Employment Security Agreement for Swedish blue-collar workers on job finding and subsequent job quality. The empirical strategy is based on the rules of eligibility in a regression discontinuity framework. I estimate the effect for workers with short tenure who are dismissed through mass layoffs. My results do not suggest that the program has an effect on the probability of becoming unemployed, the duration of unemployment, or income. However, the results indicate that the program has a positive effect on the duration of the next job.
Essay II: The well-known positive relationship between the unemployment benefit level and unemployment duration can be separated into two potential sources: a moral hazard effect, and a liquidity effect pertaining to the increased ability to smooth consumption. The latter is a socially optimal response due to credit and insurance market failures. These two effects are difficult to separate empirically, but the social optimality of an unemployment insurance policy can be evaluated by studying the effect of a non-distortionary lump-sum severance grant on unemployment durations. In this study, I evaluate the effects on unemployment duration and subsequent job quality of a lump-sum severance grant provided to displaced workers by means of a Swedish collective agreement. I use a regression discontinuity design based on the strict age requirement for eligibility for the grant. I find that the lump-sum grant has a positive effect on the probability of becoming unemployed and on the length of the completed unemployment duration, but no effect on subsequent job quality. My analysis also indicates that spousal income is important for the consumption-smoothing abilities of displaced workers, and that the grant may have a greater effect in times of more favorable labor market conditions.
Essay III: Evidence from around the world suggests that individuals who are awarded disability benefits in some cases still have residual working capacity, while disability insurance systems typically involve strong disincentives for benefit recipients to work. Some countries have introduced policies to incentivize disability insurance recipients to use their residual working capacity on the labor market. One such policy is the continuous deduction program in Sweden, introduced in 2009. In this study, I investigate whether the financial incentives provided by this program induce disability insurance recipients to increase their labor supply or education level. Retroactively determined eligibility for the program with respect to the time of benefit award provides a setting resembling a natural experiment, which can be used to estimate the effects of the program using a regression discontinuity design. However, a simultaneous regime change in disability insurance eligibility causes covariate differences between treated and controls, which I adjust for using a matching strategy. My results suggest that the financial incentives provided by the program have not had any effect on labor supply or educational attainment.
APA, Harvard, Vancouver, ISO, and other styles
37

Chandgotia, Nishant. "Markov random fields and measures with nearest neighbour Gibbs potential." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/37000.

Full text
Abstract:
This thesis will discuss the relationship between stationary Markov random fields and probability measures with a nearest neighbour Gibbs potential. While the relationship has been well explored when the measures are fully supported, we shall discuss what happens when we weaken this assumption.
APA, Harvard, Vancouver, ISO, and other styles
38

Guo, Gongde. "A study on the nearest neighbour method and its applications." Thesis, University of Ulster, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.407756.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Hatko, Stan. "k-Nearest Neighbour Classification of Datasets with a Family of Distances." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/33361.

Full text
Abstract:
The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier. We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We discuss the limitations of Stone's lemma and Stone's theorem, particularly with respect to quasinorms and adaptively choosing a distance for k-NN based on the labelled sample. We show the universal consistency of a two stage k-NN type classifier where we select the distance adaptively based on a split labelled sample and the query. We conclude by giving some examples of improvements of the accuracy of classifying various datasets using the above techniques.
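On the practical side, a k-NN classifier with a non-Euclidean distance is easy to experiment with, since scikit-learn accepts a user-supplied metric. The weighted l_p family below is an illustrative choice for drawing a distance from a family of distances, not one taken from the thesis:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def weighted_lp(p, w):
    """Return a callable distance d(x, y) = (sum_j w_j |x_j - y_j|^p)^(1/p)."""
    def dist(x, y):
        return np.sum(w * np.abs(x - y) ** p) ** (1.0 / p)
    return dist

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

for p in (1.0, 2.0, 4.0):                        # compare a few members of the family
    clf = KNeighborsClassifier(n_neighbors=5,
                               metric=weighted_lp(p, np.array([1.0, 0.5])))
    clf.fit(X[:150], y[:150])
    print(p, clf.score(X[150:], y[150:]))
```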
APA, Harvard, Vancouver, ISO, and other styles
40

Lazar, Iustin. "A multi-level nearest-neighbour algorithm for predicting protein secondary structure." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ39987.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Luk, Andrew. "Some new results in nearest neighbour classification and lung sound analysis." Thesis, University of Glasgow, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280756.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Halachkin, Aliaksei. "Klasifikace vozidel na základě odezvy indukčních senzorů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-316383.

Full text
Abstract:
This project is dedicated to the problem of vehicle classification using inductive loop sensors. We created a dataset containing more than 11000 labeled inductive loop signatures collected at different times and from different parts of the world. Multiple classification methods and their optimizations were applied to the vehicle classification task. The final model, which combines k-nearest neighbors and logistic regression, achieves 94% accuracy on a classification scheme with 9 classes. The vehicle classifier was implemented in C++.
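The abstract does not specify how k-NN and logistic regression are combined; one plausible reading is a soft-voting ensemble, sketched here on synthetic data standing in for the inductive-loop signatures:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic multi-class data standing in for labeled loop signatures.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=7)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft")                                # average the predicted probabilities
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```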
APA, Harvard, Vancouver, ISO, and other styles
43

Carrier, Kevin. "Recherche de presque-collisions pour le décodage et la reconnaissance de codes correcteurs." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS281.

Full text
Abstract:
Error correcting codes are tools whose original function is to correct errors caused by imperfect communication channels. In a non-cooperative context, the problem arises of identifying unknown codes from the knowledge of noisy codewords alone. This problem can be difficult for certain code families, in particular LDPC codes, which are ubiquitous in modern telecommunication systems. In this thesis, we propose new techniques to recognize these codes more easily. At the end of the 1970s, McEliece had the idea of repurposing the original function of codes to use them in ciphers, thus initiating a family of cryptographic solutions that is an alternative to those based on number theory. One of the advantages of code-based cryptography is that it appears to withstand the quantum computing paradigm, notably thanks to the robustness of the generic decoding problem. The latter has been studied in depth for the last 60 years. The latest improvements all rely on algorithms for finding pairs of nearby points in a list, the so-called near-collision search problem. In this thesis, we improve generic decoding, notably by proposing a new way of searching for close pairs. Our method relies on list decoding of polar codes to build fuzzy hashing functions. In this manuscript, we also address the search for pairs of distant points. Our solution can be used to improve decoding at large distances, which has recently found applications in signature designs.
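The near-collision search primitive at the heart of these improvements can be illustrated with the simplest fuzzy hash of all, bit-sampling LSH in Hamming space. This sketch only shows the generic bucketing idea; the thesis's actual contribution, fuzzy hash functions built from list decoding of polar codes, is not attempted here:

```python
import itertools
import random

def near_collisions(words, n, sample_bits=12, trials=10, radius=8):
    """Find pairs of n-bit words within Hamming distance `radius` by
    repeatedly bucketing on a random sample of bit positions."""
    found = set()
    for _ in range(trials):
        positions = random.sample(range(n), sample_bits)
        buckets = {}
        for w in words:                           # words sharing the sampled bits collide
            key = tuple((w >> p) & 1 for p in positions)
            buckets.setdefault(key, []).append(w)
        for bucket in buckets.values():           # verify candidate pairs exactly
            for a, b in itertools.combinations(bucket, 2):
                if bin(a ^ b).count("1") <= radius:
                    found.add((min(a, b), max(a, b)))
    return found

random.seed(0)
n = 64
words = [random.getrandbits(n) for _ in range(500)]
words += [words[0] ^ (1 << 3) ^ (1 << 40)]        # plant one close pair (distance 2)
print(len(near_collisions(words, n)))             # the planted pair is found w.h.p.
```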
APA, Harvard, Vancouver, ISO, and other styles
44

Wasito, Ito. "Least squares algorithms with nearest neighbour techniques for imputing missing data values." Thesis, Birkbeck (University of London), 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.406224.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Berrett, Thomas Benjamin. "Modern k-nearest neighbour methods in entropy estimation, independence testing and classification." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/267832.

Full text
Abstract:
Nearest neighbour methods are a classical approach in nonparametric statistics. The k-nearest neighbour classifier can be traced back to the seminal work of Fix and Hodges (1951) and they also enjoy popularity in many other problems including density estimation and regression. In this thesis we study their use in three different situations, providing new theoretical results on the performance of commonly-used nearest neighbour methods and proposing new procedures that are shown to outperform these existing methods in certain settings. The first problem we discuss is that of entropy estimation. Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this chapter, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient in up to three dimensions. A related topic of study is the estimation of the mutual information between two random vectors, and its application to testing for independence. We propose tests for the two different situations of the marginal distributions being known or unknown and analyse their performance. Finally, we study the classical k-nearest neighbour classifier of Fix and Hodges (1951) and provide a new asymptotic expansion for its excess risk. We also show that, in certain situations, a new modification of the classifier that allows k to vary with the location of the test point can provide improvements. This has applications to the field of semi-supervised learning, where, in addition to labelled training data, we also have access to a large sample of unlabelled data.
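The weighted estimators studied in the thesis build on the classical unweighted Kozachenko-Leonenko estimator, which is compact enough to sketch: with r_i the distance from the i-th sample point to its k-th nearest neighbour and V_d the volume of the d-dimensional unit ball, the estimate is psi(n) - psi(k) + log(V_d) + (d/n) * sum_i log(r_i). A minimal Python version, assuming SciPy:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(X, k=3):
    """Unweighted Kozachenko-Leonenko entropy estimate from k-NN distances."""
    n, d = X.shape
    tree = cKDTree(X)
    r = tree.query(X, k=k + 1)[0][:, -1]          # k-th NN distance (excluding self)
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)   # log volume of unit ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))                    # standard 2-d Gaussian sample
print(kl_entropy(X), np.log(2 * np.pi * np.e))    # estimate vs. true entropy log(2*pi*e)
```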
APA, Harvard, Vancouver, ISO, and other styles
46

Stiff, Philip, and Carl Holmqvist. "Jämförelse av avståndsmått för K-nearest neighbour-klassificering av resmål hos nya användare." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186354.

Full text
Abstract:
User behavior prediction is becoming increasingly important for companies that want to offer services tailored to their customers. In order to compete in the market, companies want to be able to propose a service before customers know they want it. There are several well-known algorithms for achieving this. In this study we investigate the K-nearest neighbour algorithm and how it should be adapted to measure the similarity between instances of customers in a database. To do this, we compare a new method based on the instances' general relationships with some existing methods. The comparison is performed on a database containing user accounts from a travel agency and is made for several values of the K-nearest neighbour algorithm's parameters. To study the performance of the various methods, their accuracy in terms of the number of correct classifications is compared. The results show a very slight difference between the methods, which indicates a skew in the chosen database rather than reflecting how well the methods perform. Thus, not much can be said about how the chosen methods compare against each other.
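A sketch of this kind of distance-measure comparison, using a public dataset as a stand-in for the travel-agency accounts:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same k-NN classifier, different distance measures, scored by accuracy.
X, y = load_iris(return_X_y=True)
for metric in ("euclidean", "manhattan", "chebyshev", "cosine"):
    clf = KNeighborsClassifier(n_neighbors=5, metric=metric)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{metric:>10}: {score:.3f}")
```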
APA, Harvard, Vancouver, ISO, and other styles
47

Faustino, Bruno Filipe Fernandes Simões Salgueiro. "Implementation for spatial data of the shared nearest neighbour with metric data structures." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8489.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Stiernborg, Sebastian, and Sara Ervik. "Evaluation of Machine Learning Classification Methods : Support Vector Machines, Nearest Neighbour and Decision Tree." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209119.

Full text
Abstract:
With more and more data available, the interest in and use of machine learning is growing, and so does the need for classification. Classification is an important method within machine learning for data simplification and prediction. This report evaluates three classification methods for supervised learning: Support Vector Machines (SVM) with several kernels, Nearest Neighbour (k-NN) and Decision Tree (DT). The methods were evaluated on accuracy, precision, recall and time. The experiments were conducted on artificial data created to represent a variation of distributions, limited to 2 features and 3 classes. Different distributions of data were chosen to challenge each classification method. The results show that the measurements for accuracy and time vary considerably across the differently distributed datasets. SVM with the RBF kernel achieved higher accuracy than the other classification methods. k-NN scored slightly lower accuracy values than SVM with the RBF kernel in general, but performed better on the most challenging dataset. DT was the least time-consuming algorithm and was significantly faster than the other classification methods. The only method that could compete with DT on time was k-NN, which was faster than DT on the dataset with small spread and coinciding classes. Although a clear trend can be seen in the results, the area needs further study before a comprehensive conclusion can be drawn, owing to the limitations of the artificially generated datasets in this study.
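The experimental setup is straightforward to reproduce in outline. A sketch with scikit-learn, using make_blobs as a stand-in for the report's artificial 2-feature, 3-class data; cluster_std is the knob that makes a dataset easy or hard:

```python
import time

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=1500, centers=3, n_features=2,
                  cluster_std=2.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("SVM (RBF)", SVC(kernel="rbf")),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("Decision tree", DecisionTreeClassifier())]:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)                           # time the training step
    elapsed = time.perf_counter() - t0
    print(f"{name:>14}: accuracy={clf.score(X_te, y_te):.3f}, fit={elapsed:.4f}s")
```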
APA, Harvard, Vancouver, ISO, and other styles
49

Borén, Mirjam. "Classification of discrete stress levels in users using eye tracker and K- Nearest Neighbour algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-176258.

Full text
Abstract:
Head-mounted displays (HMDs) used for virtual reality (VR) have advanced considerably, and eye tracking is now available in some HMDs. The eyes show physiological responses when healthy individuals are stressed, justifying eye tracking as a tool to detect, at a minimum, the presence of stress. Stress can present itself in many shapes and may be caused by different factors such as work, social situations, cognitive load and many others. The Group Stroop Color Word Test (GSCWT), a stress test, can induce four different levels of stress in users: no stress, low stress, medium stress and high stress. In this thesis, GSCWT was implemented in virtual reality, and users had their pupil dilation and blinking rate recorded. The data was then used to train and test a K-nearest neighbour (KNN) algorithm. The KNN algorithm could not accurately distinguish between the four stress classes, but it could predict the presence or absence of stress. VR has been used successfully as a tool for practicing different social skills and other everyday life skills for individuals with Autism Spectrum Disorder (ASD). By correctly identifying the stress level of a user in VR, tools for practicing social skills for individuals with ASD could be made more personalized and improved.
APA, Harvard, Vancouver, ISO, and other styles
50

Zandi, Zand Sajjad. "FPGA implementation of ROI extraction for visual-IR smart cameras." Thesis, Mittuniversitetet, Avdelningen för elektronikkonstruktion, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-26076.

Full text
Abstract:
Video surveillance systems have been popular as a security tool for years, and technological development helps in monitoring accident-prone areas with the help of digital image processing. A thermal and a visual camera are used in the surveillance project. The thermal camera is sensitive to the heat emitted by objects, and it is essential to employ it because the visual camera is only useful in the presence of light. These cameras do not provide images of the same resolution. In order to extract the region of interest (ROI) from the visual camera, the images of the two cameras need to have the same resolution; therefore the thermal images are processed to match the size of the visual image. ROI extraction is needed in order to reduce the data that needs to be transmitted. The region of interest is extracted from the visual image, while the required processing is mostly done on the thermal image, as it has lower resolution and therefore requires less computation. The image taken from the thermal camera is upscaled using the nearest neighbor algorithm and zero-padded to make the resolutions of the two images equal; the region of interest is then extracted by masking the result with the corresponding visual image converted to the YCbCr color space.
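The nearest-neighbor upscaling and zero-padding steps are simple enough to sketch in a few lines (a software model only; the thesis implements them on an FPGA):

```python
import numpy as np

def nn_upscale(img, sy, sx):
    """Nearest-neighbor upscaling: each output pixel copies the closest input pixel."""
    rows = np.arange(img.shape[0] * sy) // sy
    cols = np.arange(img.shape[1] * sx) // sx
    return img[rows][:, cols]

def pad_to(img, h, w):
    """Zero-pad an image to the target resolution."""
    out = np.zeros((h, w), dtype=img.dtype)
    out[:img.shape[0], :img.shape[1]] = img
    return out

thermal = np.arange(12, dtype=np.uint8).reshape(3, 4)   # toy low-resolution thermal frame
up = nn_upscale(thermal, 4, 4)                          # 3x4 -> 12x16
print(pad_to(up, 16, 20).shape)                         # padded to the "visual" resolution
```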
APA, Harvard, Vancouver, ISO, and other styles