Dissertations / Theses on the topic 'Query algorithm'

To see the other types of publications on this topic, follow the link: Query algorithm.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Query algorithm.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Cheng, Jiang. "Preserving query privacy with a query-based memorizing algorithm." Thesis, Wichita State University, 2014. http://hdl.handle.net/10057/10950.

Full text
Abstract:
Query privacy is a critical concern to users of location-based services. A majority of existing query privacy protection techniques are based on the notion of k-anonymity, wherein a user's exact location is obfuscated into a spatial range containing at least k users, called the cloaking region. Thus, the user who issues the query cannot be distinguished from k-1 other users. However, when mobile users issue continuous queries using such a k-anonymity scheme, an adversary can exploit the overlapped areas of the corresponding cloaking regions to determine the query issuer with a significantly higher probability. This thesis proposes a query-based memorizing algorithm to specifically address this issue. The main idea in this thesis is to memorize the identity of the users in an anonymity set or cloaking region. When a user issues sequential location-based queries, the cloaking regions are determined such that they include a maximum number of users that have appeared in the past cloaking regions. The query-based memorizing approach is empirically evaluated by means of simulation experiments and a detailed comparative analysis with three other popular privacy protection algorithms using standard privacy metrics is performed. The results show that the proposed algorithm efficiently protects users' query privacy against the overlapped area attack, especially when users are highly mobile.
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science
APA, Harvard, Vancouver, ISO, and other styles
2

Suryavanshi, Chetna. "Query AutoAwesome." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7546.

Full text
Abstract:
This research investigates how to improve legacy queries. Legacy queries are queries that programmers have coded and are used in applications. A database application typically has tens to hundreds of such queries. One way to improve legacy queries is to add new, interesting queries that are similar to or based on the set of queries. We propose Query AutoAwesome, a tool to generate new queries from legacy queries. The Query AutoAwesome philosophy is taken from Google’s AutoAwesomizer tool for photos, which automatically improves a photo uploaded to Google by animating the photo or adding special effects. In a similar vein, Query AutoAwesome automatically enhances a query by ingesting a database and the query. Query AutoAwesome produces a set of enhanced queries that a user can then choose to use or discard. A key problem that we solve is that the space of potential enhancements is large, so we introduce objective functions to narrow the search space to a tractable space. We describe our plans for implementing Query AutoAwesome and discuss our ideas for future work.
APA, Harvard, Vancouver, ISO, and other styles
3

Lin, Han-Hsuan. "Topics in quantum algorithms : adiabatic algorithm, quantum money, and bomb query complexity." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/99300.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Physics, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 111-115).
In this thesis, I present three results on quantum algorithms and their complexity. The first one is a numerical study on the quantum adiabatic algorithm( QAA) . We tested the performance of the QAA on random instances of MAX 2-SAT on 20 qubits and showed 3 strategics that improved QAA's performance, including a counter intuitive strategy of decreasing the overall evolution time. The second result is a security proof for the quantum money by knots proposed by Farhi et. al. We proved that quantum money by knots can not be cloned in a black box way unless graph isomorphism is efficiently solvable by a quantum computer. Lastly we defined a modified quantum query model, which we called bomb query complexity B(J), inspired by the Elitzur-Vaidman bomb-testing problem. We completely characterized bomb query complexity be showing that B(f) = [Theta](Q(f)2 ). This result implies a new method to find upper bounds on quantum query complexity, which we applied on the maximum bipartite matching problem to get an algorithm with O(n1.75) quantum query complexity, improving from the best known trivial O(n2 ) upper bound.
by Han-Hsuan Lin.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
4

Samara, Rafat. "TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20108.

Full text
Abstract:
Top-k and Skyline queries are a long study topic in database and information retrieval communities and they are two popular operations for preference retrieval. Top-k query returns a subset of the most relevant answers instead of all answers. Efficient top-k processing retrieves the k objects that have the highest overall score. In this paper, some algorithms that are used as a technique for efficient top-k processing for different scenarios have been represented. A framework based on existing algorithms with considering based cost optimization that works for these scenarios has been presented. This framework will be used when the user can determine the user ranking function. A real life scenario has been applied on this framework step by step. Skyline query returns a set of points that are not dominated (a record x dominates another record y if x is as good as y in all attributes and strictly better in at least one attribute) by other points in the given datasets. In this paper, some algorithms that are used for evaluating the skyline query have been introduced. One of the problems in the skyline query which is called curse of dimensionality has been presented. A new strategy that based on the skyline existing algorithms, skyline frequency and the binary tree strategy which gives a good solution for this problem has been presented. This new strategy will be used when the user cannot determine the user ranking function. A real life scenario is presented which apply this strategy step by step. Finally, the advantages of the top-k query have been applied on the skyline query in order to have a quickly and efficient retrieving results.
APA, Harvard, Vancouver, ISO, and other styles
5

Pielech, Bradford Charles. "Adaptive Scheduling Algorithm Selection in a Streaming Query System." Digital WPI, 2004. https://digitalcommons.wpi.edu/etd-theses/79.

Full text
Abstract:
Many modern applications process queries over unbounded streams of data. These applications include tracking financial data from international markets, intrusion detection in networks, monitoring remote sensors, and monitoring patients vital signs. These data streams arrive in real time, are unbounded in length and have unpredictable arrival patterns due to external uncontrollable factors such as network congestion or weather in the case of remote sensors. This thesis presents a novel technique for adapting the execution of stream queries that, to my knowledge, is not present in any other continuous query system to date. This thesis hypothesizes that utilizing a single scheduling algorithm to execute a continuous query, as is employed in other state-of-the-art continuous query systems, is not sufficient because existing scheduling algorithms all have inherent flaws or tradeoffs. Thus, one scheduling algorithm cannot optimally meet an arbitrary set of Quality of Service (QoS) requirements. Therefore, to meet unique features of specific monitoring applications, an adaptive strategy selector guidable by QoS requirements was developed. The adaptive strategy selector monitors the effects of its behavior on its environment through a feedback mechanism, with the aim of exploiting previously beneficial behavior and exploring alternative behavior. The feedback mechanism is guided by qualitatively comparing how well each algorithm has met the QoS requirements. Then the next scheduling algorithm is chosen by spinning a roulette wheel where each candidate is chosen with a probability equal to its performance score. The adaptive algorithm is general, being able to employ any candidate scheduling algorithm and to react to any combination of quality of service preferences. As part of this thesis, the Raindrop system was developed as exploratory test bed in which to conduct an experimental study. In that experimental study, the adaptive algorithm was shown to be effective in outperforming single scheduling algorithms for many QoS combinations and data arrival patterns.
APA, Harvard, Vancouver, ISO, and other styles
6

Staicu, Laurian. "Multiple query points parallel search algorithm (Comb algorithm) for multimedia database systems." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ59340.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lim, Heechul. "Evaluation of Shortest Path Query Algorithm in Spatial Databases." Thesis, University of Waterloo, 2003. http://hdl.handle.net/10012/1085.

Full text
Abstract:
Many variations of algorithms for finding the shortest path in a large graph have been introduced recently due to the needs of applications like the Geographic Information System (GIS) or Intelligent Transportation System (ITS). The primary subjects of those algorithms are materialization and hierarchical path views. Some studies focus on the materialization and sacrifice the pre-computational costs and storage costs for faster computation of a query. Other studies focus on the shortest-path algorithm, which has less pre-computation and storage but takes more time to compute the shortest path. The main objective of this thesis is to accelerate the computation time for the shortest-path queries while keeping the degree of materialization as low as possible. This thesis explores two different categories: 1) the reduction of the I/O-costs for multiple queries, and 2) the reduction of search spaces in a graph. The thesis proposes two simple algorithms to reduce the I/O-costs, especially for multiple queries. To tackle the problem of reducing search spaces, we give two different levels of materializations, namely, the boundary set distance matrix and x-Hop sketch graph, both of which materialize the shortest-path view of the boundary nodes in a partitioned graph. Our experiments show that a combination of the suggested solutions for 1) and 2) performs better than the original Disk-based SP algorithm [7], on which our work is based, and requires much less storage than HEPV [3].
APA, Harvard, Vancouver, ISO, and other styles
8

Pielech, Bradford Charles. "Adaptive scheduling algorithm selection in a streaming query system." Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0113104-194126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Carraher, Lee A. "A Parallel Algorithm for Query Adaptive, Locality Sensitive Hash Search." University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1337886738.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Chen, Chen. "An evaluation of a 2-way semijoin distributed query processing algorithm." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ62199.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Kangas, Carl-Evert. "Ranking Highscores : Evaluation of a dynamic Bucket with Global Query algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-127677.

Full text
Abstract:
The task of ranking highscores in a computer game may sound like a trivial task. It turns out it is not, because the naive solution have a time complexity not suitable for online applications in terms of response time and running cost. An overview of a few approaches to ranking is presented: how an N-ary tree could be used to do ranking and how to do linear approximation. Two ways of obtaining a model for doing linear approximation are demonstrated, a method called Buckets with Global Queryis described and a method based on Frugal Streaming is elaborated on.Finally, a variant of the Buckets with Global Query algorithm where the buckets are adjusted continuosly according to the changes in the distribution of high scores is evaluated. The dynamic variant of the algorithm performs well in terms of accuracy for at least 100 000 highscore up-dates but have no significant gains in reduced CPU-time.
APA, Harvard, Vancouver, ISO, and other styles
12

Jayakeerthy, Arunkumar Thippur Lim Alvin S. "Query-localized route repair mechanism for ad-hoc on-demand distance vector routing algorithm." Auburn, Ala, 2009. http://hdl.handle.net/10415/1608.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Yang, Di. "Mining and Managing Neighbor-Based Patterns in Data Streams." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/16.

Full text
Abstract:
The current data-intensive world is continuously producing huge volumes of live streaming data through various kinds of electronic devices, such as sensor networks, smart phones, GPS and RFID systems. To understand these data sources and thus better leverage them to serve human society, the demands for mining complex patterns from these high speed data streams have significantly increased in a broad range of application domains, such as financial analysis, social network analysis, credit fraud detection, and moving object monitoring. In this dissertation, we present a framework to tackle the mining and management problem for the family of neighbor-based patterns in data streams, which covers a broad range of popular pattern types, including clusters, outliers, k-nearest neighbors and others. First, we study the problem of efficiently executing single neighbor-based pattern mining queries. We propose a general optimization principle for incremental pattern maintenance in data streams, called "Predicted Views". This general optimization principle exploits the "predictability" of sliding window semantics to eliminate both the computational and storage effort needed for handling the expiration of stream objects, which usually constitutes the most expensive operations for incremental pattern maintenance. Second, the problem of multiple query optimization for neighbor-based pattern mining queries is analyzed, which aims to efficiently execute a heavy workload of neighbor-based pattern mining queries using shared execution strategies. We present an integrated pattern maintenance strategy to represent and incrementally maintain the patterns identified by queries with different query parameters within a single compact structure. Our solution realizes fully shared execution of multiple queries with arbitrary parameter settings. Third, the problem of summarization and matching for neighbor-based patterns is examined. To solve this problem, we first propose a summarization format for each pattern type. Then, we present computation strategies, which efficiently summarize the neighbor-based patterns either during or after the online pattern extraction process. Lastly, to compare patterns extracted on different time horizon of the stream, we design an efficient matching mechanism to identify similar patterns in the stream history for any given pattern of interest to an analyst. Our comprehensive experimental studies, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrate superiority of our proposed strategies over alternate methods in both effectiveness and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
14

Sevinc, Ender. "Genetic Algorithms For Distributed Database Design And Distributed Database Query Optimization." Phd thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/3/12611194/index.pdf.

Full text
Abstract:
The increasing performance of computers, reduced prices and ability to connect systems with low cost gigabit ethernet LAN and ATM WAN networks make distributed database systems an attractive research area. However, the complexity of distributed database query optimization is still a limiting factor. Optimal techniques, such as dynamic programming, used in centralized database query optimization are not feasible because of the increased problem size. The recently developed genetic algorithm (GA) based optimization techniques presents a promising alternative. We compared the best known GA with a random algorithm and showed that it achieves almost no improvement over the random search algorithm generating an equal number of random solutions. Then, we analyzed a set of possible GA parameters and determined that two-point truncate technique using GA gives the best results. New mutation and crossover operators defined in our GA are experimentally analyzed within a synthetic distributed database having increasing the numbers of relations and nodes. The designed synthetic database replicated relations, but there was no horizontal/vertical fragmentation. We can translate a select-project-join query including a fragmented relation with N fragments into a corresponding query with N relations. Comparisons with optimal results found by exhaustive search are only 20% off the results produced by our new GA formulation showing a 50% improvement over the previously known GA based algorithm.
APA, Harvard, Vancouver, ISO, and other styles
15

Stoler, Moshe. "Query processing optimization for distributed relational database systems: an implementation of a heuristic based algorithm." Master's thesis, Virginia Polytechnic Institute and State University, 1987. http://hdl.handle.net/10919/64491.

Full text
Abstract:
The first step of the program is to input the statistical information concerning the relations of th· database. This information is stored in the log file and the file matrix data structures. Next, the query itself is read and stored in an array called the query matrix. The program examines the various fields of this matrix and decides which relations in the database are necessary to answer the query. For these relations it determines those attributes which should be eliminated and those which should be preserved for further processing. The key attributes are identified and are projected along with the other attributes. After the initial projection is completed the sizes of the new temporary relations are evaluated and stored in the appropriate fields of the file matrix structure. The program then examines that part of the query which contains the various restrictions on the attributes. The values of the attributes are sorted and those values which do not match the restrictions are eliminated from the log file. Again, the sizes of the new relations are estimated according to the method described by Egyhazy et al. [6]. A second projection is performed to eliminate attributes which were required by the selection phase but are not part of the final answer to the query. The remaining relations are those relations which need to be joined to form a relation with the required information. In order to decide upon which relations to join, a special table, the join matrix, is created. This table contains pairs of relations which have common attributes and common values and therefore are joinable. The LP algorithm is used to determine the least expensive join out of all the possible joins. This process is repeated until all of the relations are joined to form a single relation which answers the query. As in the case of projection and selection the size of the temporary relations after each join is estimated. As a last step, we remove the key attributes which helped in joining the files but are not part of the answer to the query.
Master of Engineering
APA, Harvard, Vancouver, ISO, and other styles
16

Guzun, Gheorghi. "Distributed indexing and scalable query processing for interactive big data explorations." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2087.

Full text
Abstract:
The past few years have brought a major surge in the volumes of collected data. More and more enterprises and research institutions find tremendous value in data analysis and exploration. Big Data analytics is used for improving customer experience, perform complex weather data integration and model prediction, as well as personalized medicine and many other services. Advances in technology, along with high interest in big data, can only increase the demand on data collection and mining in the years to come. As a result, and in order to keep up with the data volumes, data processing has become increasingly distributed. However, most of the distributed processing for large data is done by batch processing and interactive exploration is hardly an option. To efficiently support queries over large amounts of data, appropriate indexing mechanisms must be in place. This dissertation proposes an indexing and query processing framework that can run on top of a distributed computing engine, to support fast, interactive data explorations in data warehouses. Our data processing layer is built around bit-vector based indices. This type of indexing features fast bit-wise operations and scales up well for high dimensional data. Additionally, compression can be applied to reduce the index size, and thus utilize less memory and network communication. Our work can be divided into two areas: index compression and query processing. Two compression schemes are proposed for sparse and dense bit-vectors. The design of these encoding methods is hardware-driven, and the query processing is optimized for the available computing hardware. Query algorithms are proposed for selection, aggregation, and other specialized queries. The query processing is supported on single machines, as well as computer clusters.
APA, Harvard, Vancouver, ISO, and other styles
17

Silva, Asima. "Multiple continuous query processing with relative window predicates "Juggler"." Link to electronic thesis, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0527104-223456/.

Full text
Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: reordering predicates; multi-join operator; sliding windows; window predicates; join algorithm; continuous queries. Includes bibliographical references (p. 101-103).
APA, Harvard, Vancouver, ISO, and other styles
18

Stantic, Bela, and n/a. "Access Methods for Temporal Databases." Griffith University. School of Information and Communication Technology, 2005. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20060906.144815.

Full text
Abstract:
A Temporal database is one that supports some aspect of time distinct from user defined time. Over the last two decades interest in the field of temporal databases has increased significantly, with contributions from many researchers. However, the lack of efficient access methods is perhaps one of the reasons why commercial RDBMS vendors have been reluctant to adopt the advances in temporal database research. Therefore, an obvious research question is: can we develop more robust and more efficient access methods for temporal databases than the existing ones? This thesis attempts to address this question, and the main contributions of this study are summarised as follows: We investigated different representations of 'now' and how the modelling of current time influences the efficiency of accessing 'now relative' temporal data. A new method, called the 'Point' approach, is proposed. Our approach not only elegantly models the current time but also significantly outperforms the existing methods. We proposed a new index structure, called a Virtual Binary tree (VB-tree), based on spatial representation of interval data and a regular triangular decomposition of this space. Further, we described a sound and complete query algorithm. The performance of the algorithm is then evaluated both asymptotically and experimentally with respect to the state-of-the-art in the field. We claim that the VB-tree requires less space and uses fewer disk accesses than the currently best known structure - the RI-tree.
APA, Harvard, Vancouver, ISO, and other styles
19

Stantic, Bela. "Access Methods for Temporal Databases." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365973.

Full text
Abstract:
A Temporal database is one that supports some aspect of time distinct from user defined time. Over the last two decades interest in the field of temporal databases has increased significantly, with contributions from many researchers. However, the lack of efficient access methods is perhaps one of the reasons why commercial RDBMS vendors have been reluctant to adopt the advances in temporal database research. Therefore, an obvious research question is: can we develop more robust and more efficient access methods for temporal databases than the existing ones? This thesis attempts to address this question, and the main contributions of this study are summarised as follows: We investigated different representations of 'now' and how the modelling of current time influences the efficiency of accessing 'now relative' temporal data. A new method, called the 'Point' approach, is proposed. Our approach not only elegantly models the current time but also significantly outperforms the existing methods. We proposed a new index structure, called a Virtual Binary tree (VB-tree), based on spatial representation of interval data and a regular triangular decomposition of this space. Further, we described a sound and complete query algorithm. The performance of the algorithm is then evaluated both asymptotically and experimentally with respect to the state-of-the-art in the field. We claim that the VB-tree requires less space and uses fewer disk accesses than the currently best known structure - the RI-tree.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
20

Dieng, Cheikh Tidiane. "Etude et implantation de l'extraction de requetes frequentes dans les bases de donnees multidimensionnelles." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0530.

Full text
Abstract:
Au cours de ces dernières années, le problème de la recherche de requêtes fréquentes dans les bases de données est un problème qui a suscité de nombreuses recherches. En effet, beaucoup de motifs intéressants comme les règles d'association, des dépendances fonctionnelles exactes ou approximatives, des dépendances fonctionnelles conditionnelles exactes ou approximatives peuvent être découverts simplement, contrairement au méthodes classiques qui requièrent plusieurs transformations de la base pour extraire de tels motifs.Cependant, le problème de la recherche de requêtes fréquentes dans les bases de données relationnelles est un problème difficile car, d'une part l'espace de recherche est très grand (puisque égal à l'ensemble de toutes les requêtes pouvant être posées sur une base de données), et d'autre part, savoir si deux requêtes sont équivalentes (donc engendrant les calculs de support redondants) est un problème NP-Complet.Dans cette thèse, nous portons notre attention sur les requêtes de type projection-selection-jointure, et nous supposons que la base de données est définie selon un schéma étoile. Sous ces hypothèses, nous définissons une relation de pré-ordre (≼) entre les requêtes et nous montrons que :1. La mesure de support est anti-monotone par rapport à ≼, et2. En définissant, q ≡ q′ si et seulement si q ≼ q′ et q′ ≼ q, alors toutes les requêtes d'une même classe d'équivalence ont même support.Les principales contributions de cette thèse sont, d'une part d'étudier formellement les propriétés du pré-ordre et de la relation d'équivalence ci-dessus, et d'autre part, de proposer un algorithme par niveau de type Apriori pour rechercher l'ensemble des requêtes fréquentes d'une base de données définie sur un schéma étoile. De plus, cet algorithme a été implémenté et les expérimentations que nous avons réalisées montrent que, selon notre approche, le temps de calcul des requêtes fréquentes dans une base de données définie sur un schéma étoile reste acceptable, y compris dans le cas de grandes tables de faits
The problem of mining frequent queries in a database has motivated many research efforts during the last two decades. This is so because many interesting patterns, such as association rules, exact or approximative functional dependencies and exact or approximative conditional functional dependencies can be easily retrieved, which is not possible using standard techniques.However, the problem mining frequent queries in a relational database is not easy because, on the one hand, the size of the search space is huge (because encompassing all possible queries that can be addressed to a given database), and on the other hand, testing whether two queries are equivalent (which entails redundant support computations) is NP-Complete.In this thesis, we focus on projection-selection-join queries, assuming that the database is defined over a star schema. In this setting, we define a pre-ordering (≼) between queries and we prove the following basic properties:1. The support measure is anti-monotonic with respect to ≼, and2. Defining q ≡ q′ if and only if q ≼ q′ and q′ ≼ q, all equivalent queries have the same support.The main contributions of the thesis are, on the one hand to formally sudy properties of the pre-ordering and the equivalence relation mentioned above, and on the other hand, to prose a levewise, Apriori like algorithm for the computation of all frequent queries in a relational database defined over a star schema. Moreover, this algorithm has been implemented and the reported experiments show that, in our approach, runtime is acceptable, even in the case of large fact tables
APA, Harvard, Vancouver, ISO, and other styles
21

Maniu, Silviu. "Gestion des données dans les réseaux sociaux." Thesis, Paris, ENST, 2012. http://www.theses.fr/2012ENST0053/document.

Full text
Abstract:
Nous abordons dans cette thèse quelques-unes des questions soulevées par I'émergence d'applications sociales sur le Web, en se concentrant sur deux axes importants: l'efficacité de recherche sociale dans les applications Web et l'inférence de liens sociaux signés à partir des interactions entre les utilisateurs dans les applications Web collaboratives. Nous commençons par examiner la recherche sociale dans les applications de "tag- ging". Ce problème nécessite une adaptation importante des techniques existantes, qui n'utilisent pas des informations sociaux. Dans un contexte ou le réseau est importante, on peut (et on devrait) d'exploiter les liens sociaux, ce qui peut indiquer la façon dont les utilisateurs se rapportent au demandeur et combien de poids leurs actions de "tagging" devrait avoir dans le résultat. Nous proposons un algorithme qui a le potentiel d'évoluer avec la taille des applications actuelles, et on le valide par des expériences approfondies. Comme les applications de recherche sociale peut être considérée comme faisant partie d'une catégorie plus large des applications sensibles au contexte, nous étudions le problème de répondre aux requêtes à partir des vues, en se concentrant sur deux sous-problèmes importants. En premier, la manipulation des éventuelles différences de contexte entre les différents points de vue et une requête d'entrée conduit à des résultats avec des score incertains, valables pour le nouveau contexte. En conséquence, les algorithmes top-k actuels ne sont plus directement applicables et doivent être adaptés aux telle incertitudes dans les scores des objets. Deuxièmement, les techniques adaptées de sélection de vue sont nécessaires, qui peuvent s’appuyer sur les descriptions des requêtes et des statistiques sur leurs résultats. Enfin, nous présentons une approche pour déduire un réseau signé (un "réseau de confiance") à partir de contenu généré dans Wikipedia. Nous étudions les mécanismes pour deduire des relations entre les contributeurs Wikipédia - sous forme de liens dirigés signés - en fonction de leurs interactions. Notre étude met en lumière un réseau qui est capturée par l’interaction sociale. Nous examinons si ce réseau entre contributeurs Wikipedia représente en effet une configuration plausible des liens signes, par l’étude de ses propriétés globaux et locaux du reseau, et en évaluant son impact sur le classement des articles de Wikipedia
We address in this thesis some of the issues raised by the emergence of social applications on the Web, focusing on two important directions: efficient social search inonline applications and the inference of signed social links from interactions between users in collaborative Web applications. We start by considering social search in tagging (or bookmarking) applications. This problem requires a significant departure from existing, socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose an algorithm that has the potential to scale to current applications, and validate it via extensive experiments. As social search applications can be thought of as part of a wider class of context-aware applications, we consider context-aware query optimization based on views, focusing on two important sub-problems. First, handling the possible differences in context between the various views and an input query leads to view results having uncertain scores, i.e., score ranges valid for the new context. As a consequence, current top-k algorithms are no longer directly applicable and need to be adapted to handle such uncertainty in object scores. Second, adapted view selection techniques are needed, which can leverage both the descriptions of queries and statistics over their results. Finally, we present an approach for inferring a signed network (a "web of trust")from user-generated content in Wikipedia. We investigate mechanisms by which relationships between Wikipedia contributors - in the form of signed directed links - can be inferred based their interactions. Our study sheds light into principles underlying a signed network that is captured by social interaction. We investigate whether this network over Wikipedia contributors represents indeed a plausible configuration of link signs, by studying its global and local network properties, and at an application level, by assessing its impact in the classification of Wikipedia articles.javascript:nouvelleZone('abstract');_ajtAbstract('abstract')
APA, Harvard, Vancouver, ISO, and other styles
22

Phan, Duy-Hung. "Algorithmes d'aggrégation pour applications Big Data." Electronic Thesis or Diss., Paris, ENST, 2016. http://www.theses.fr/2016ENST0043.

Full text
Abstract:
Les bases de données traditionnelles sont confrontées à des problèmes de scalabilité et d'efficacité en raison d’importants volumes de données. Ainsi, les systèmes de gestion de base de données modernes, tels que Apache Hadoop et Spark, peuvent désormais être distribués sur des clusters de milliers de machines: ces systèmes sont donc devenus les principaux outils pour le traitement des données à grande échelle. De nombreuses optimisations ont été développées pour les bases de données conventionnelles, cependant celles-ci ne peuvent être appliquées aux nouvelles architectures et modèles de programmation. Dans ce contexte, cette thèse vise à optimiser une des opérations les plus prédominantes dans le traitement des données : l'agrégation de données pour ces systèmes à grande échelle. Nos principales contributions sont les optimisations logiques et physiques de l'agrégation de grands volumes de données. Ces optimisations sont fortement interconnectées : le problème d'optimisation d'agrégation de données ne pourrait être entièrement résolu si l’une d’entre elles venait à manquer. Par ailleurs, nous avons intégré les optimisations dans le moteur d'optimisation multi-requêtes, ce qui est transparent pour les usagers. Le moteur, les optimisations logiques et physiques proposées dans cette thèse forment une solution complété exécutable et prête à répondre aux requêtes d'agrégation de données à grande échelle. Nos optimisations ont été évaluées de manière théorique et expérimentale. Les résultats d'analyses ont démontré que le passage à l’échelle et l’efficacité de nos algorithmes et techniques surpassent les résultats des études antérieures
Traditional databases are facing problems of scalability and efficiency dealing with a vast amount of big-data. Thus, modern data management systems that scale to thousands of nodes, like Apache Hadoop and Spark, have emerged and become the de-facto platforms to process data at massive scales. In such systems, many data processing optimizations that were well studied in the database domain have now become futile because of the novel architectures and programming models. In this context, this dissertation pledged to optimize one of the most predominant operations in data processing: data aggregation for such systems.Our main contributions were the logical and physical optimizations for large-scale data aggregation, including several algorithms and techniques. These optimizations are so intimately related that without one or the other, the data aggregation optimization problem would not be solved entirely. Moreover, we integrated these optimizations in our multi-query optimization engine, which is totally transparent to users. The engine, the logical and physical optimizations proposed in this dissertation formed a complete package that is runnable and ready to answer data aggregation queries at massive scales. We evaluated our optimizations both theoretically and experimentally. The theoretical analyses showed that our algorithms and techniques are much more scalable and efficient than prior works. The experimental results using a real cluster with synthetic and real datasets confirmed our analyses, showed a significant performance boost and revealed various angles about our works. Last but not least, our works are published as open sources for public usages and studies
APA, Harvard, Vancouver, ISO, and other styles
23

Saltin, Joakim. "Interactive visualization of financial data : Development of a visual data mining tool." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-181225.

Full text
Abstract:
In this project, a prototype visual data mining tool was developed, allowing users to interactively investigate large multi-dimensional datasets visually (using 2D visualization techniques) using so called drill-down, roll-up and slicing operations. The project included all steps of the development, from writing specifications and designing the program to implementing and evaluating it. Using ideas from data warehousing, custom methods for storing pre-computed aggregations of data (commonly referred to as materialized views) and retrieving data from these were developed and implemented in order to achieve higher performance on large datasets. View materialization enables the program to easily fetch or calculate a view using other views, something which can yield significant performance gains if view sizes are much smaller than the underlying raw dataset. The choice of which views to materialize was done in an automated manner using a well-known algorithm - the greedy algorithm for view materialization - which selects the fraction of all possible views that is likely (but not guaranteed) to yield the best performance gain. The use of materialized views was shown to have good potential to increase performance for large datasets, with an average speedup (compared to on-the-fly queries) between 20 and 70 for a test dataset containing 500~000 rows. The end result was a program combining flexibility with good performance, which was also reflected by good scores in a user-acceptance test, with participants from the company where this project was carried out.
APA, Harvard, Vancouver, ISO, and other styles
24

Onder, Ibrahim Seckin. "Execution Of Distributed Database Queries On A Hpc System." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/2/12611524/index.pdf.

Full text
Abstract:
Increasing performance of computers and ability to connect computers with high speed communication networks make distributed databases systems an attractive research area. In this study, we evaluate communication and data processing capabilities of a HPC machine. We calculate accurate cost formulas for high volume data communication between processing nodes and experimentally measure sorting times. A left deep query plan executer has been implemented and experimentally used for executing plans generated by two different genetic algorithms for a distributed database environment using message passing paradigm to prove that a parallel system can provide scalable performance by increasing the number of nodes used for storing database relations and processing nodes. We compare the performance of plans generated by genetic algorithms with optimal plans generated by exhaustive search algorithm. Our results have verified that optimal plans are better than those of genetic algorithms, as expected.
APA, Harvard, Vancouver, ISO, and other styles
25

Verlaine, Lionel. "Optimisation des requêtes dans une machine bases de données." Paris 6, 1986. http://www.theses.fr/1986PA066532.

Full text
Abstract:
CCette thèse propose des solutions optimisant l'évaluation de questions et la jointure. Ces propositions sont étudiées et mises en œuvre à partir du SGBD Sabrina issu du projet SABRE sur matériel Carrousel à la SAGEM. L'évaluation de questions permet d'optimiser le niveau logique du traitement d'une requête. La décomposition la plus pertinente est établie en fonction d'heuristiques simples. L'algorithme de jointure propose utilise des mécanismes minimisant à la fois le nombre d'entrées/sorties disque et le nombre de comparaisons. Il admet un temps d'exécution proportionnel au nombre de tuples. L'ordonnancement de jointures est résolu par un algorithme original de jointure multi-relations et par une méthode d'ordonnancement associée permettant un haut degré de parallélisme.
APA, Harvard, Vancouver, ISO, and other styles
26

Perez-Urbina, Hector M. "Tractable query answering for description logics via query rewriting." Thesis, University of Oxford, 2010. http://ora.ox.ac.uk/objects/uuid:cd62cd80-aa62-467b-87cd-4b9d0cfb2dbd.

Full text
Abstract:
We consider the problem of answering conjunctive queries over description logic knowledge bases via query rewriting. Given a conjunctive query Q and a TBox T, we compute a new query Q′ that incorporates the semantic consequences of T such that, for any ABox A, evaluating Q over T and A can be done by evaluating the new query Q′ over A alone. We present RQR—a novel resolution-based rewriting algorithm for the description logic ELHIO¬ that generalizes and extends existing approaches. RQR not only handles a spectrum of logics ranging from DL-Lite_core up to ELHIO¬, but it is worst-case optimal with respect to data complexity for all of these logics; moreover, given the form of the rewritten queries, their evaluation can be delegated to off-the-shelf (deductive) database systems. We use RQR to derive the novel complexity results that conjunctive query answering for ELHIO¬ and DL-Lite+ are, respectively, PTime and NLogSpace complete with respect to data complexity. In order to show the practicality of our approach, we present the results of an empirical evaluation. Our evaluation suggests that RQR, enhanced with various straightforward optimizations, can be successfully used in conjunction with a (deductive) database system in order to answer queries over knowledge bases in practice. Moreover, in spite of being a more general procedure, RQR will often produce significantly smaller rewritings than the standard query rewriting algorithm for the DL-Lite family of logics.
APA, Harvard, Vancouver, ISO, and other styles
27

Preisinger, Timotheus. "Graph-based algorithms for Pareto preference query evaluation." Norderstedt Books on Demand, 2009. http://d-nb.info/1000465993/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Faisal, Farhan. "Query-by-Pointing: Algorithms and Pointing Error Compensation." Fogler Library, University of Maine, 2003. http://www.library.umaine.edu/theses/pdf/FaisalF2003.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Pieris, Andreas. "Ontological query answering : new languages, algorithms and complexity." Thesis, University of Oxford, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.547515.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Cao, Phuong Thao. "Approximation of OLAP queries on data warehouses." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00905292.

Full text
Abstract:
We study the approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: the uniform sampling, the measure-based sampling and the statistical model. We introduce also an edit distance between data warehouses with edit operations adapted for data warehouses. Then, in the OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built by streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We describe also a method to discover the statistical dependencies, a new notion we introduce. We are looking for them based on the decision tree. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is the collection of RSS from the web sites on Internet.
APA, Harvard, Vancouver, ISO, and other styles
31

Wåreus, Linus, and Max Wällstedt. "Comparison and Implementation of Query Containment Algorithms for XPath." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186467.

Full text
Abstract:
This thesis investigates the practical aspects of implementing Query Containment algorithms for the query language XPath. Query Containment is the problem to decide if the results of one query are a subset of the results of another query for any database. Query Containment algorithms can be used for the purpose of optimising the querying process in database systems. Two algorithms have been implemented and compared, The Canonical Model and The Homomorphism Technique. The algorithms have been compared with respect to speed, ease of implementation, accuracy and usability in database systems. Benchmark tests were developed to measure the execution times of the algorithms on a specific set of queries. A simple database system was developed to investigate the performance gain of using the algorithms. It was concluded that The Homomorphism Technique outperforms The Canonical Model in every test case with respect to speed. The Canonical Model is however more accurate than The Homomorphism Technique. Both algorithms were easy to implement, but The Homomorphism Technique was easier. In the database system, there was performance to be gained by using Query Containment algorithms for a certain type of queries, but in most cases there was a performance loss. A database system that utilises Query Containment algorithms for optimisation would for every issued query have to evaluate if such an algorithm should be used.
Denna rapport undersöker de praktiska aspekterna av att implementera Query Containment-algoritmer för queryspråket XPath. Query Containment är problemet att avgöra om resultaten av en query är en delmängd av resultaten av en annan query, oavsett databas. Query Containment-algoritmer kan användas för ändamålet att optimera queryingprocessen i databassystem. Två algoritmer har implementerats och jämförts, The Canonical Model och The Homomorphism Technique. Algoritmerna har jämförts med avseende på hastighet, lätthet att implementera, exakthet och användbarhet i riktiga databassystem. Prestandatester utvecklades för att mäta exekveringstider för algoritmerna på specifikt framtagna queries. Ett enkelt databassystem utvecklades för att undersöka prestandavinsten av att använda algoritmerna. Slutsatsen att The Homomorphism Technique presterar bättre än The Canonical Model i samtliga testfall med avseende på hastighet drogs. The Canonical Model är dock mer exakt än The Homomorphism Technique. Båda algoritmerna var lätta att implementera, men The Homomorphism Technique var lättare. I databassystemet fanns det en prestandavinst i att använda Query Containment-algoritmer för en viss typ av queries, men i de flesta fall var det en prestandaförlust. Ett databassystem som använder Query Containment-algoritmer för optimering bör för varje query avgöra om en sådan algoritm ska användas.
APA, Harvard, Vancouver, ISO, and other styles
32

Eriksson, Simon. "COMPARING NATURAL LANGUAGE PROCESSING TO STRUCTURED QUERY LANGUAGE ALGORITHMS." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163310.

Full text
Abstract:
Using natural language processing to create Structured Query Language (SQL) queries has many benefi€ts in theory. Even though SQL is an expressive and powerful language it requires certain technical knowledge to use. An interface effectively utilizing natural language processing would instead allow the user to communicate with the SQL database as if they were communicating with another human being. In this paper I compare how two of the currently most advanced open source algorithms (TypeSQL and SyntaxSQL) in this €field can understandadvanced SQL. I show that SyntaxSQL is signi€cantly more accurate but makes some sacri€ces in execution time compared to TypeSQL.
APA, Harvard, Vancouver, ISO, and other styles
33

Thomazo, Michaël. "Conjunctive Query Answering Under Existential Rules - Decidability, Complexity, and Algorithms." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00925722.

Full text
Abstract:
L'objectif du problème appelé "Ontology-based data access" (OBDA) est d'améliorer la réponse à des requêtes en prenant en compte des connaissances d'ordre général durant l'évaluation des requêtes. Ces connaissances générales sont représentées à l'aide d'une ontologie, qui est exprimée dans cette thèse grâce à des formules logiques du premier ordre, appelées règles existentielles, et aussi connues sous le nom de "tuple-generating dependencies" et Datalog+/-. L'expressivité des formules utilisées est telle que l'évaluation de requêtes devient un problème indécidable, et cela a conduit la communauté à définir de nombreux cas décidables, c'est-à-dire des restrictions sur les ensembles de règles existentielles considérés. La contribution de cette thèse est double : tout d'abord, nous proposons une vue unifiée sur une grande fraction des cas décidables connus, et fournissons par là même une analyse de complexité et un algorithme optimal dans le pire des cas. Nous considérons également l'approche couramment utilisée de réécriture de requêtes, et proposons un algorithme générique qui permet de surmonter certaines causes évidentes d'explosion combinatoire qui rendent les approches classiques pratiquement inapplicables.
APA, Harvard, Vancouver, ISO, and other styles
34

Preisinger, Timotheus [Verfasser]. "Graph-based algorithms for Pareto preference query evaluation / Timotheus Preisinger." Norderstedt : Books on Demand, 2009. http://d-nb.info/1000465993/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Dai, Xiangyuan. "Spatial queries based on non-spatial constraints." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B38436395.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Dai, Xiangyuan, and 戴祥元. "Spatial queries based on non-spatial constraints." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B38436395.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Kaiyang. "Efficient structural join processing algorithms /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LIU.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Trias, Mansilla Albert. "Unstructured P2P social search query routing algorithms for agentified social networks." Doctoral thesis, Universitat de Girona, 2013. http://hdl.handle.net/10803/131395.

Full text
Abstract:
The village paradigm presents some benefits compared with the library paradigm; people can adapt the answer content as a function of who is requesting the information, or furthermore, people can perform clarifications of the answer content; at the same time, the explanations in documents remain static and are the same for all of the readers, and no clarifications can be performed. The proliferation of online social networks along with the advances in artificial intelligence allow us to consider village paradigm automation. The contributions of this doctoral dissertation are the following: to analyze the village paradigm, seeking the aspects that might be automated; the Asknext social search protocol, which uses stop messages; to study the effect of social network topologies on the proposed protocol; the algorithm Question Waves, which contributes in improving the relevance of the received answers
El paradigma del poble presenta alguns beneficis enfront al de la biblioteca, com que les persones poden adaptar el contingut de la resposta en funció de qui tenen davant o fins i tot poden fer aclariments sobre el contingut, mentre el contingut dels textos es manté estàtic. La proliferació de les xarxes socials, conjuntament amb els avenços en intel·ligència artificial, permeten considerar l’automatització del paradigma del poble. Les contribucions d’aquesta tesi són: analitzar el paradigma del poble per veure quins aspectes són automatitzables; el protocol de cerca social Asknext, que utilitza missatges d’aturada; estudiar l’efecte de les característiques de les topologies de les xarxes socials en el protocol Asknext; l’algorisme Question Waves, que contribueix en millorar la rellevància de les respostes rebudes en el procés de cerca
APA, Harvard, Vancouver, ISO, and other styles
39

Piwko, Karel. "Nativní XML rozhraní pro relační databázi." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-235541.

Full text
Abstract:
XML has emerged as leading document format for exchanging data. Because of vast amounts of XML documents available and transfered, there is a strong need to store and query information in these documents. However, the most companies are still using a RDBMS for their data warehouses and it is often necessary to combine legacy data with the ones in XML format, so it might be useful to consider storage possibilities for XML documents in a relation database. In this thesis we focused on structured and semi-structured data-based XML documents, because they are the most common when exchanging data and they can be easily validated against an XML schema. We propose a slightly modified Hybrid algorithm to shred doc- uments into relations using an XSD scheme and we allowed redundancy to make queries faster. Our goal was not to provide an academic solution, but fully working system supporting latest standards, which will beat up native XML databases both by performance and vertical scalability.
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Wangda, and 张望达. "Evaluating multi-way joins over discounted hitting time." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196484.

Full text
Abstract:
The prevalence of graphs in emerging applications has recently raised a lot of research interests. To acquire interesting information hidden in large graphs, tasks including link prediction, collaborative recommendation, and reputation ranking, all make use of proximities between graph nodes. The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, has shown to be useful in various applications. In this thesis, we examine a novel query, called the multi-way join (or n-way join), over DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a ranked list of n-tuples with the k highest scores, according to some aggregation function of DHT values. By extracting such top-k results, this query enables the analysis and prediction of various complex relationships among n sets of nodes on a large graph. Since an n-way join is expensive to evaluate, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since the process of PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which saves the trouble of recomputation and allows the results of top-(m+1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. For better performance, we further examine efficient processing algorithms and pruning techniques for 2-way joins. Through extensive experiments on three real graph datasets, we show that the proposed PJ algorithm accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.
published_or_final_version
Computer Science
Master
Master of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
41

Yang, Jing. "Designing Superior Evolutionary Algorithms via Insights From Black-Box Complexity Theory." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLX054/document.

Full text
Abstract:
Il a été observé que l'exécution des heuristiques de recherche aléatoire dépend d'un ou de plusieurs paramètres. Un certain nombre de résultats montrent un avantage des paramètres dynamiques, c'est-à-dire que les paramètres de l'algorithme sont modifiés au cours de son exécution. Dans ce travail, nous montrons que la complexité de la boîte noire sans biais de la classe de fonction de référence OneMax est $n ln(n) - cn pm o(n)$ pour une constante $c$ comprise entre $0.2539$ et $0.2665$. L'exécution peut être réalisé avec un algorithme simple de type-(1+1) utilisant une puissance de mutation fitness dépendant. Une fois traduite dans le cas du budget fixe, notre algorithme trouve des solutions plus proches de l'optimum de 13% que celles des meilleurs algorithmes connus.Basé sur la puissance de mutation optimale analysée pour OneMaX, nous montrons qu'un choix auto-ajusté du nombre de bits à retourner atteint le même temps d'exécution (excepté $o(n)$ termes inférieurs) et le même (asymptotique) 13% amélioration de la fitness-distance par rapport au RLS. Le mécanisme d'ajustement doit apprendre de manière adaptative la puissance de mutation actuellement optimale des itérations précédentes. Cela vise à la fois à exploiter le fait que des problèmes généralement différents peuvent nécessiter des puissances de mutation différentes et que, pour un problème fixe, différentes puissances peuvent devenir optimales à différentes étapes du processus d'optimisation.Nous étendons ensuite notre stratégie d'auto-ajustement aux algorithmes évolutifs basés sur la population dans des espaces discrets de recherche. Grosso modo, il consiste à créer la moitié de la descendance avec un taux de mutation qui est deux fois plus élevé que le taux de mutation actuel et l'autre moitié avec la moitié du taux actuel. Le taux de mutation est ensuite mis à jour au taux utilisé dans cette sous-population qui contient la meilleure descendance. Nous analysons comment l'algorithme d'évolution $(1+lambda)$ avec ce taux de mutation auto-ajustable optimise la fonction de test OneMax. Nous montrons que cette version dynamique de $(1+lambda)$~EA trouve l'optimum dans un temps d'optimisation attendu (nombre d'évaluations de la fitness) de $O(nlambda/loglambda+nlog n)$. Le temps est asymptotiquement plus petit que le temps d'optimisation de l'EA classique $(1+lambda)$. Des travaux antérieurs montrent que cette performance est la meilleure possible parmi tous les algorithmes de boîtes noires sans biais unaire basés sur des mutations $lambda$-parallèles.Nous proposons et analysons également une version auto-réglage de l'algorithme évolutionnaire $(1,lambda)$ dans lequel le taux de mutation actuel fait partie de l'individu et donc également sujet à mutation. Une analyse d'exécution rigoureuse sur la fonction de référence OneMax révèle qu'un simple schéma de mutation pour le taux conduit à un temps d'optimisation attendu du meilleur $O(nlambda/loglambda+nlog n)$. Notre résultat montre que l'auto-réglage dans le calcul évolutif peut trouver automatiquement des paramètres optimaux complexes. En même temps, cela prouve qu'un schéma d'auto-ajustement relativement compliqué pour le taux de mutation peut être remplacé par notre schéma endogène simple
It has been observed that the runtime of randomized search heuristics depend on one or more parameters. A number of results show an advantage of dynamic parameter settings, that is, the parameters of the algorithm are changed during its execution. In this work, we prove that the unary unbiased black-box complexity of the OneMax benchmark function class is $n ln(n) - cn pm o(n)$ for a constant $c$ which is between $0.2539$ and $0.2665$. This runtime can be achieved with a simple (1+1)-type algorithm using a fitness-dependent mutation strength. When translated into the fixed-budget perspective, our algorithm finds solutions which are roughly 13% closer to the optimum than those of the best previously known algorithms.Based on the analyzed optimal mutation strength for OneMax, we show that a self-adjusting choice of the number of bits to be flipped attains the same runtime (apart from $o(n)$ lower-order terms) and the same (asymptotic) 13% fitness-distance improvement over RLS. The adjusting mechanism is to adaptively learn the currently optimal mutation strength from previous iterations. This aims both at exploiting that generally different problems may need different mutation strengths and that for a fixed problem different strengths may become optimal in different stages of the optimization process.We then extend our self-adjusting strategy to population-based evolutionary algorithms in discrete search spaces. Roughly speaking, it consists of creating half the offspring with a mutation rate that is twice the current mutation rate and the other half with half the current rate. The mutation rate is then updated to the rate used in that subpopulation which contains the best offspring. We analyze how the $(1+lambda)$ evolutionary algorithm with this self-adjusting mutation rate optimizes the OneMax test function. We prove that this dynamic version of the $(1+lambda)$~EA finds the optimum in an expected optimization time (number of fitness evaluations) of $O(nlambda/loglambda+nlog n)$. This time is asymptotically smaller than the optimization time of the classic $(1+lambda)$ EA. Previous work shows that this performance is best-possible among all $lambda$-parallel mutation-based unbiased black-box algorithms.We also propose and analyze a self-adaptive version of the $(1,lambda)$ evolutionary algorithm in which the current mutation rate is part of the individual and thus also subject to mutation. A rigorous runtime analysis on the OneMax benchmark function reveals that a simple local mutation scheme for the rate leads to an expected optimization time of the best possible $O(nlambda/loglambda+nlog n)$. Our result shows that self-adaptation in evolutionary computation can find complex optimal parameter settings on the fly. At the same time, it proves that a relatively complicated self-adjusting scheme for the mutation rate can be replaced by our simple endogenous scheme
APA, Harvard, Vancouver, ISO, and other styles
42

Zavodny, Jakub. "Factorisation in relational databases." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:54c9a3a7-caac-40d9-90fb-83797ced9c5a.

Full text
Abstract:
We study representation systems for relational data based on relational algebra expressions with unions, products, and singleton relations. Algebraic factorisation using the distributivity of product over union allows succinct representation of many-to-many relationships; further succinctness is brought by sharing repeated subexpressions. We show that these techniques are especially applicable to results of conjunctive queries. In the first part of the dissertation we derive tight asymptotic size bounds for two flavours of factorised representations of results of conjunctive queries. Any conjunctive query is characterised by rational parameters that govern the factorisability of its results independently of the database instance. We relate these parameters to fractional edge covers and fractional hypertree decompositions. Factorisation naturally extends from relational data to its provenance. We characterise conjunctive queries by tight bounds on their readability, which captures how many times each input tuple is used to contribute to an output tuple, and we define syntactically the class of queries with bounded readability. In the second part of the dissertation we describe FDB, a relational database engine that uses factorised representations at the physical layer to reduce data redundancy and boost query performance. We develop algorithms for optimisation and evaluation of queries with selection, projection, join, aggregation and order-by clauses on factorised representations. By introducing novel operators for factorisation restructuring and a new optimisation objective to maintain intermediate and final results succinctly factorised, we allow query evaluation with lower time complexity than on flat relations. Experiments show that for data sets with many-to-many relationships, FDB can outperform relational engines by orders of magnitude.
APA, Harvard, Vancouver, ISO, and other styles
43

Waite, Edwin Richard. "Web Based Query Optimization Simulator." CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2519.

Full text
Abstract:
The Web Based Query Optimization Simulator (WBQOS) is a software tool designed to enhance understanding of query optimization with a Relational Database Management System (RDBMS). WBQOS allows the user to visualize and participate in query optimization, which enhances the learning process.
APA, Harvard, Vancouver, ISO, and other styles
44

Wang, Wei. "Structural join : processing algorithms and size estimation /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20WANG.

Full text
Abstract:
Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 107-120). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
45

Palacios, Villa Jesus Alejandro. "CGU: A common graph utility for DL Reasoning and Conjunctive Query Optimization." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1101.

Full text
Abstract:
We consider the overlap between reasoning involved in conjunctive query optimization (CQO) and in tableaux-based approaches to reasoning about subsumption in description logics (DLs). In both cases, an underlying graph is created, searched and modified. This process is determined by a given query and database schema in the first case and by a given description and terminology in the second. The opportunities for overlap derive from an abundance of reductions of various schema languages to terminologies for common DL dialects, and from the fact that descriptions can in turn be viewed as queries that compute a single column.

Our main contributions are as follows. We present the design and implementation of a common graph utility that integrates the requirements for both CQO and DL reasoning. We then verify this model by also presenting the design and implementation for two drivers, one that implements a query optimizer for a conjunctive query language extended with descriptions, and one that implements a complete DL reasoner for a feature based DL dialect.
APA, Harvard, Vancouver, ISO, and other styles
46

Yang, Yin. "Join processing in non-conventional databases /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20YANG.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Poppe, Olga. "Event stream analytics." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-dissertations/530.

Full text
Abstract:
Advances in hardware, software and communication networks have enabled applications to generate data at unprecedented volume and velocity. An important type of this data are event streams generated from financial transactions, health sensors, web logs, social media, mobile devices, and vehicles. The world is thus poised for a sea-change in time-critical applications from financial fraud detection to health care analytics empowered by inferring insights from event streams in real time. Event processing systems continuously evaluate massive workloads of Kleene queries to detect and aggregate event trends of interest. Examples of these trends include check kites in financial fraud detection, irregular heartbeat in health care analytics, and vehicle trajectories in traffic control. These trends can be of any length. Worst yet, their number may grow exponentially in the number of events. State-of-the-art systems do not offer practical solutions for trend analytics and thus suffer from long delays and high memory costs. In this dissertation, we propose the following event trend detection and aggregation techniques. First, we solve the trade-off between CPU processing time and memory usage while computing event trends over high-rate event streams. Namely, our event trend detection approach guarantees minimal CPU processing time given limited memory. Second, we compute online event trend aggregation at multiple granularity levels from fine (per matched event), to medium (per event type), to coarse (per pattern). Thus, we minimize the number of aggregates – reducing both time and space complexity compared to the state-of-the-art approaches. Third, we share intermediate aggregates among multiple event sequence queries while avoiding the expensive construction of matched event sequences. In several comprehensive experimental studies, we demonstrate the superiority of the proposed strategies over the state-of-the-art techniques with respect to latency, throughput, and memory costs.
APA, Harvard, Vancouver, ISO, and other styles
48

Calonder, Michael. "Multi-objective clustering of gene expression data with evolutionary algorithms a query gene approach /." Zürich : ETH, Eidgenössische Technische Hochschule Zürich, Institut für Technische Informatik und Kommunikationsnetze, 2006. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=229.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Rode, Henning. "Methods and cost models for XPath Query Processing in main memory databases." [S.l. : s.n.], 2003. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB11051694.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Ye, Xun. "Design and Implementation of Routing Algorithms for Supporting Multi-dimensional Range Query in HD Tree." Thesis, University of Ottawa (Canada), 2011. http://hdl.handle.net/10393/28826.

Full text
Abstract:
The HD Tree is a novel data structure, which was recently proposed by Yunfeng Gu to support multi-dimensional range queries in a distributed environment. My thesis work is a follow-up to the research on the HD Tree. It focuses on the basic routing strategies in order to support multi-dimensional range queries in the HD Tree. There are two basic routing strategies supported by the HD Tree data structure: hierarchical routing and distributed routing. Hierarchical routing is an adapted version of routing in the tree structure, while distributed routing is a novel routing strategy built over the distributed structure of the HD Tree in order to accommondate the distributed environment. Hierarchical routing can be applied to any source and destination pair in the HD Tree, but results in a massive work load on nodes near the root. It also has very limited options in case of a node failure. Although distributed routing is applied only to the source and destination pair at the same depth of the HD Tree, its communication load is distributed to two sets of nodes at neighboring depths. Distributed routing has more options to survive a node failure. In my thesis work, four distributed routing algorithms were explored. Each of these four is a stand-alone routing algorithm, and can be applied in different routing conditions. Both theoretical analysis and experimental results indicate that any routing algorithm alone is able to achieve similar or better performance compared to the equivalent tree structure. However, the significance of this thesis work pertains not only to the performance issue. These different routing algorithms provide multiple options to accommondate a changing environment. Based on my thesis work, there were two advanced routing algorithms developed, a X2X routing algorithm which is a combination of four distributed routing algorithms, and a DROCR algorithm which is a combination of the hierarchical and distributed routing algorithms. These algorithms not only achieve considerable performance gain over its equivalent tree structure, but also make the routing in the HD Tree highly error resilient and load balancing. They are the fundamental algorithms in supporting multi-dimensional range queries in the HD Tree.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography