Academic literature on the topic 'Query processing and optimisation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Query processing and optimisation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Query processing and optimisation"

1

Diallo, Ousmane, Joel J. P. C. Rodrigues, Mbaye Sene, and Feng Xia. "Real-time query processing optimisation for wireless sensor networks." International Journal of Sensor Networks 18, no. 1/2 (2015): 49. http://dx.doi.org/10.1504/ijsnet.2015.069863.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Akili, Samira, and Matthias Weidlich. "Reasoning on the Efficiency of Distributed Complex Event Processing." Fundamenta Informaticae 179, no. 2 (March 10, 2021): 113–34. http://dx.doi.org/10.3233/fi-2021-2017.

Abstract:
Complex event processing (CEP) evaluates queries over streams of event data to detect situations of interest. If the event data are produced by geographically distributed sources, CEP may exploit in-network processing that distributes the evaluation of a query among the nodes of a network. To this end, a query is modularized and individual query operators are assigned to nodes, especially those that act as data sources. Existing solutions for such operator placement, however, are limited in that they assume all query results to be gathered at one designated node, commonly referred to as a sink. Hence, existing techniques postulate a hierarchical structure of the network that generates and processes the event data. This largely neglects the optimisation potential that stems from truly decentralised query evaluation with potentially many sinks. To address this gap, in this paper, we propose Multi-Sink Evaluation (MuSE) graphs as a formal computational model to evaluate common CEP queries in a decentralised manner. We further prove the completeness of query evaluation under this model. Striving for distributed CEP that can scale to large volumes of high-frequency event streams, we show how to reason on the network costs induced by distributed query evaluation and prune inefficient query execution plans. As such, our work lays the foundation for distributed CEP that is both sound and efficient.
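The kind of stateful pattern a CEP query evaluates can be illustrated with a minimal, self-contained sketch of windowed sequence detection over an event stream. This is a generic CEP illustration under assumed semantics (a time-windowed A-followed-by-B pattern), not the MuSE model itself:

```python
# Minimal sketch of windowed sequence detection: find every event of
# type A followed by an event of type B within `window` time units.
# Generic illustration only, not the MuSE model.

def detect_sequence(events, first_type, second_type, window):
    """events: (timestamp, type) pairs in timestamp order.
    Returns (t1, t2) pairs matching first_type -> second_type."""
    matches, pending = [], []          # pending: open first_type timestamps
    for t, etype in events:
        pending = [t1 for t1 in pending if t - t1 <= window]  # expire
        if etype == first_type:
            pending.append(t)
        elif etype == second_type:
            matches.extend((t1, t) for t1 in pending)
    return matches

stream = [(1, "A"), (2, "C"), (3, "B"), (9, "B")]
print(detect_sequence(stream, "A", "B", window=5))  # [(1, 3)]
```

In a decentralised setting, the point of operator placement is that state like `pending` can live on the node closest to the source producing the A events.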
3

Kumar A, Dinesh, and S. Smys. "A clique-based scheduling in real-time query processing optimisation for cloud-based wireless body area networks." International Journal of Biomedical Engineering and Technology 29, no. 4 (2019): 327. http://dx.doi.org/10.1504/ijbet.2019.10022031.

4

A, Dinesh Kumar, and S. Smys. "A clique-based scheduling in real-time query processing optimisation for cloud-based wireless body area networks." International Journal of Biomedical Engineering and Technology 29, no. 4 (2019): 327. http://dx.doi.org/10.1504/ijbet.2019.100268.

5

Chun, Sejin, Jooik Jung, Seungmin Seo, Wonwoo Ro, and Kyong-Ho Lee. "An adaptive plan-based approach to integrating semantic streams with remote RDF data." Journal of Information Science 43, no. 6 (October 1, 2016): 852–65. http://dx.doi.org/10.1177/0165551516670278.

Abstract:
To satisfy a user’s complex requirements, Resource Description Framework (RDF) Stream Processing (RSP) systems envision the fusion of remote RDF data with semantic streams, using common data models to query semantic streams continuously. While streaming data are changing at a high rate and are pushed into RSP systems, the remote RDF data are retrieved from different remote sources. With the growth of SPARQL endpoints that provide access to remote RDF data, RSP systems can easily integrate the remote data with streams. Such integration provides new opportunities for mixing static (or quasi-static) data with streams on a large scale. However, current RSP systems do not offer any optimisation for the integration. In this article, we present an adaptive plan-based approach to efficiently integrate semantic streams with the static data from a remote source. We create a query execution plan based on temporal constraints among constituent services for the timely acquisition of remote data. To predict the change of remote sources in real time, we propose an adaptive process of detecting a source update, forecasting future updates, deciding on a new plan to obtain the remote data, and reacting by adopting the new plan. We extend a SPARQL query with operators for describing the multiple strategies of the proposed adaptive process. Experimental results show that our approach is more efficient than the conventional RSP systems in distributed settings.
6

Haryanto, Anasthasia Agnes, David Taniar, and Kiki Maulana Adhinugraha. "Group Reverse kNN Query optimisation." Journal of Computational Science 11 (November 2015): 205–21. http://dx.doi.org/10.1016/j.jocs.2015.09.006.

7

Deshpande, Amol, Zachary Ives, and Vijayshankar Raman. "Adaptive Query Processing." Foundations and Trends® in Databases 1, no. 1 (2007): 1–140. http://dx.doi.org/10.1561/1900000001.

8

WEI, Xiao-Juan. "Skyline Query Processing." Journal of Software 19, no. 6 (October 21, 2008): 1386–400. http://dx.doi.org/10.3724/sp.j.1001.2008.01386.

9

Tang, Dixin, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, and Michael J. Franklin. "Intermittent query processing." Proceedings of the VLDB Endowment 12, no. 11 (July 2019): 1427–41. http://dx.doi.org/10.14778/3342263.3342278.

10

Haritsa, Jayant R. "Robust query processing." Proceedings of the VLDB Endowment 13, no. 12 (August 2020): 3425–28. http://dx.doi.org/10.14778/3415478.3415561.


Dissertations / Theses on the topic "Query processing and optimisation"

1

Manolescu, Ioana. "Efficient XML query processing." Habilitation à diriger des recherches, Université Paris Sud - Paris XI, 2009. http://tel.archives-ouvertes.fr/tel-00542801.

Abstract:
We present work on the efficient evaluation of XML queries. The first part concerns optimising access to XML data in centralised databases. The second part considers large-scale distributed architectures for sharing XML data.
2

Al-Hoqani, Noura Y. S. "In-network database query processing for wireless sensor networks." Thesis, Loughborough University, 2018. https://dspace.lboro.ac.uk/2134/36226.

Abstract:
In recent years, smart sensor devices have matured to the point that large, distributed networks of such sensors can be deployed. Such networks can include tens or hundreds of independent nodes that perform their functions without human interaction, such as recharging of batteries, configuration of network routes, and others. Each sensor in a wireless sensor network is a microsystem consisting of memory, a processor, transducers, and a low-bandwidth, low-range radio transceiver. This study investigates an adaptive sampling strategy for wireless sensor systems aimed at reducing the number of data samples by sensing data only when a significant change in the monitored process is detected. The detection strategy is based on an extension to Holt's method and a statistical model. To investigate this strategy, household water consumption is used as a case study. A query distribution approach is proposed and presented in detail in chapter 5. Our wireless sensor query engine is programmed on the Sensinode cc2430 testbed. The implemented model used on the wireless sensor platform and the architecture of the model are presented in chapters six, seven, and eight. This thesis contributes by designing the experimental simulation setup and developing the required database-interface GUI sensing system, which enables the end user to send queries to the sensor network whenever needed. The On-Demand Query Sensing system (ODQS) is enhanced with a probabilistic model so that sensing occurs only when the system cannot otherwise answer user queries. Moreover, a dynamic aggregation methodology is integrated to make the system more adaptive to query message costs. A dynamic on-demand approach for aggregated queries is implemented in a wireless sensor network by applying dynamic programming to reach the most optimal query decision, where the optimality factor in our experiments is the query cost.
In-network query processing in wireless sensor networks is discussed in detail in order to develop a more energy-efficient approach to query processing. Initially, a survey of existing WSN query processing approaches is presented. Building on this background, the primary novel achievements include an adaptive sampling mechanism and a dynamic query optimiser. These new approaches are especially helpful when existing statistics are not sufficient to generate an optimal plan. There are two distinct aspects of query processing optimisation: dynamic adaptive query plans, which focus on improving the initial execution of a query, and dynamic adaptive statistics, which provide the best query execution plan to improve subsequent executions of aggregated on-demand queries requested by multiple end users. In-network query processing is attractive to researchers developing user-friendly sensing systems. Since sensors are resource-limited, battery-powered devices, more robust features are recommended to limit communication access to the sensor nodes in order to maximise sensor lifetime. For this reason, a new architecture is proposed that combines a probability-modelling technique with dynamic programming (DP) query processing to optimise the communication cost of queries. In this thesis, a dynamic technique to enhance the query engine for the interactive sensing system interface is developed. The probability technique is responsible for reducing communication costs for each query executed outside the wireless sensor network. As remote sensors have limited resources and rely on battery power, control strategies should limit communication access to sensor nodes to maximise battery life. We propose an energy-efficient data acquisition system to extend the battery life of nodes in wireless sensor networks.
The system considers a graph-based network structure, evaluates multiple query execution plans, and selects the plan with the lowest cost obtained from an energy consumption model. A genetic algorithm is also used to analyse the performance of the approach. Experimental testing demonstrates the capability of the proposed on-demand sensing system to successfully predict the answer to a query injected by the end user, based on the sensor network architecture and the attributes of the input query statement, and the query engine's ability to determine the best, close-to-optimal execution plan given specific constraints on these query attributes. As a result, the thesis contributes to the state of the art in distributed wireless sensor network query design, implementation, analysis, evaluation, performance, and optimisation.
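The adaptive sampling idea based on an extension to Holt's method can be sketched as follows. This is an illustrative reconstruction: the smoothing parameters, the threshold, and the reporting policy are assumptions, not the thesis's exact design:

```python
# Illustrative sketch of adaptive sampling driven by Holt's linear
# exponential smoothing: a reading is transmitted only when it deviates
# from the one-step-ahead forecast by more than a threshold.
# alpha, beta, and threshold are assumed values, not the thesis's.

def adaptive_sample(readings, alpha=0.5, beta=0.3, threshold=2.0):
    level, trend = readings[0], 0.0
    reported = [readings[0]]           # always transmit the first reading
    for x in readings[1:]:
        forecast = level + trend       # Holt's one-step-ahead forecast
        if abs(x - forecast) > threshold:
            reported.append(x)         # significant change: transmit
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return reported

water = [10.0, 10.1, 10.2, 15.0, 10.3]    # e.g. water-consumption readings
print(adaptive_sample(water))             # [10.0, 15.0, 10.3]
```

Only three of the five readings cross the radio, which is the energy saving the strategy targets.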
3

Belghoul, Abdeslem. "Optimizing Communication Cost in Distributed Query Processing." Thesis, Université Clermont Auvergne, 2017. http://www.theses.fr/2017CLFAC025/document.

Abstract:
In this thesis, we take a complementary look at the problem of optimizing the time for communicating query results in distributed query processing, by investigating the relationship between the communication time and the middleware configuration. Indeed, the middleware determines, among other things, how data is divided into batches and messages before being communicated over the network. Concretely, we focus on the research question: given a query Q and a network environment, what is the best middleware configuration that minimizes the time for transferring the query result over the network? To the best of our knowledge, the database research community does not have well-established strategies for middleware tuning. We first present an intensive experimental study that emphasizes the crucial impact of middleware configuration on the time for communicating query results. We focus on two middleware parameters that we empirically identified as having an important influence on the communication time: (i) the fetch size F (i.e., the number of tuples in a batch that is communicated at once to an application consuming the data) and (ii) the message size M (i.e., the size in bytes of the middleware buffer, which corresponds to the amount of data that can be communicated at once from the middleware to the network layer; a batch of F tuples can be communicated via one or several messages of M bytes). Then, we describe a cost model for estimating the communication time, which is based on how data is communicated between computation nodes. Precisely, our cost model is based on two crucial observations: (i) batches and messages are communicated differently over the network: batches are communicated synchronously, whereas messages in a batch are communicated in pipeline (asynchronously), and (ii) due to network latency, it is more expensive to communicate the first message in a batch than any other message that is not the first in its batch.
We propose an effective strategy for calibrating the network-dependent parameters of the communication time estimation function, i.e., the costs of the first and non-first messages in a batch. Finally, we develop an optimization algorithm to effectively compute the values of the middleware parameters F and M that minimize the communication time. The proposed algorithm quickly finds (in a small fraction of a second) values of F and M that represent a good trade-off between low resource consumption and low communication time. The proposed approach has been evaluated using a dataset from an application in astronomy.
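The batch/message cost model sketched in the abstract can be captured in a few lines. The weights `w_first` and `w_next` stand in for the calibrated costs of the first and non-first messages of a batch; all numeric values below are illustrative assumptions, not values from the thesis:

```python
import math

# Hedged sketch of the cost model above: batches of F tuples are sent
# synchronously; each batch is split into messages of M bytes sent in
# pipeline, and the first message of a batch is costlier because of
# network latency. w_first and w_next are calibration weights; all
# numbers here are illustrative, not values from the thesis.

def comm_time(n_tuples, tuple_size, F, M, w_first, w_next):
    n_batches = math.ceil(n_tuples / F)
    msgs_per_batch = math.ceil(F * tuple_size / M)
    # one expensive first message, then pipelined non-first messages
    per_batch = w_first + (msgs_per_batch - 1) * w_next
    return n_batches * per_batch

# A larger fetch size amortises the expensive first message:
t_small = comm_time(10_000, 100, F=10, M=4096, w_first=5.0, w_next=1.0)
t_large = comm_time(10_000, 100, F=500, M=4096, w_first=5.0, w_next=1.0)
assert t_large < t_small
```

An optimizer in this spirit would search the (F, M) space for the configuration minimizing `comm_time` subject to buffer-size constraints.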
4

Mesmoudi, Amin. "Declarative parallel query processing on large scale astronomical databases." Thesis, Lyon 1, 2015. http://www.theses.fr/2015LYO10326.

Abstract:
This work is carried out in the framework of the PetaSky project. The objective of this project is to provide a set of tools for managing peta-bytes of data from astronomical observations. Our work concerns the design of a scalable approach. We first analyzed the ability of MapReduce-based systems supporting SQL to manage the LSST data and to provide optimization capabilities for certain types of queries. We analyzed the impact of data partitioning, indexing and compression on query performance. From our experiments, it follows that there is no “magic” technique to partition, store and index data; the efficiency of dedicated techniques depends mainly on the type of queries and the typology of data that are considered. Based on our benchmarking work, we identified some techniques to be integrated into large-scale data management systems. We designed a new system that supports multiple partitioning mechanisms and several evaluation operators. We used the BSP (Bulk Synchronous Parallel) model as a parallel computation paradigm. Unlike the MapReduce model, we send intermediate results to workers that can continue their processing. Data is logically represented as a graph. Queries are evaluated by exploring the data graph using forward and backward edges. We also offer a semi-automatic partitioning approach, i.e., we provide the system administrator with a set of tools for choosing how to partition data using the database schema and domain knowledge. The first experiments show that our approach provides a significant performance improvement with respect to Map/Reduce systems.
5

Oğuz, Damla. "Méthodes d'optimisation pour le traitement de requêtes réparties à grande échelle sur des données liées." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30067/document.

Abstract:
Linked Data is a term for a set of best practices for publishing and interlinking structured data on the Web. As the number of Linked Data providers increases, the Web becomes a huge global data space. Query federation is one of the approaches for efficiently querying this distributed data space. It is employed via a federated query engine which aims to minimize the response time and the completion time. Response time is the time to generate the first result tuple, whereas completion time refers to the time to provide all result tuples. There are three basic steps in a federated query engine: data source selection, query optimization, and query execution. This thesis contributes to the subject of query optimization for query federation. Most studies focus on static query optimization, which generates the query plans before execution and needs statistics. However, the Linked Data environment has several difficulties, such as unpredictable data arrival rates and unreliable statistics. As a consequence, static query optimization can produce inefficient execution plans. These constraints show that adaptive query optimization should be used for federated query processing on Linked Data. In this thesis, we first propose an adaptive join operator which aims to minimize the response time and the completion time for federated queries over SPARQL endpoints. Second, we extend the first proposal to further reduce the completion time. Both proposals can change the join method and the join order during execution by using adaptive query optimization. The proposed operators can handle different data arrival rates of relations and the lack of statistics about them. The performance evaluation in this thesis shows the efficiency of the proposed adaptive operators. They provide faster completion times and almost the same response times, compared to symmetric hash join.
Compared to bind join, the proposed operators perform substantially better with respect to response time and can also provide faster completion times. In addition, the second proposed operator provides considerably faster response times than bind-bloom join and can improve the completion time as well. The second proposal also provides faster completion times than the first in all conditions. In conclusion, the proposed adaptive join operators provide the best trade-off between response time and completion time. Even though our main objective is to manage different data arrival rates of relations, the performance evaluation reveals that they are successful with both fixed and varying data arrival rates.
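Symmetric hash join, the baseline the proposed operators are compared against, is a standard streaming operator and can be sketched briefly; this is a generic textbook version, not the thesis's implementation:

```python
# Generic textbook sketch of symmetric hash join: every arriving tuple
# is inserted into its side's hash table and immediately probed against
# the other side's, so join results stream out as soon as both matching
# tuples have arrived, regardless of arrival rates.

from collections import defaultdict

def symmetric_hash_join(stream):
    """stream: (side, key, value) triples with side in {'L', 'R'}.
    Yields (key, left_value, right_value) incrementally."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, value in stream:
        tables[side][key].append(value)          # build own side
        other = "R" if side == "L" else "L"
        for match in tables[other][key]:         # probe the other side
            left, right = (value, match) if side == "L" else (match, value)
            yield (key, left, right)

arrivals = [("L", 1, "a"), ("R", 2, "x"), ("R", 1, "y"), ("L", 1, "b")]
print(list(symmetric_hash_join(arrivals)))  # [(1, 'a', 'y'), (1, 'b', 'y')]
```

Because neither input blocks the other, the first result appears as soon as the first matching pair has arrived, which is why response time is the natural metric for comparing against it.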
6

Gillani, Syed. "Semantically-enabled stream processing and complex event processing over RDF graph streams." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES055/document.

Abstract:
No French abstract was provided by the author.
There is a paradigm shift in the nature and processing of today’s data: data used to be mostly static, stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner in the form of streams. The users of these applications thus expect to process a large volume of data with fresh, low-latency results. This has led to the introduction of Data Stream Management Systems (DSMSs) and the Complex Event Processing (CEP) paradigm, each with a distinctive aim: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which raises the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations that treat RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insights into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data-structure methodologies, we also contribute the design of a new query language for SCEP. Our contributions in these two fields are as follows: • RDF Graph Stream Processing.
We first propose an RDF graph stream model, where each data item/event within a stream comprises an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner. • Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, Kleene-+, negation, optional, conjunction, disjunction, and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automaton model to evaluate these operators in an optimised manner. We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied the proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF-structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of the proposed methods.
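The automaton style of temporal pattern matching mentioned in the abstract can be illustrated with a toy sequence matcher. This sketch supports only a plain SEQ pattern under skip-till-any-match semantics; Kleene-+, negation, windows, and the RDF layer are omitted, and nothing here is from the thesis's actual language:

```python
# Toy automaton-based matcher for a SEQ pattern: each partial match
# ("run") tracks the index of the next expected event type; runs stay
# open across non-matching events (skip-till-any-match semantics).
# Illustration only; Kleene-+, negation, and windows are omitted.

def nfa_seq_match(pattern, events):
    """Count complete matches of the sequence `pattern` in `events`."""
    runs = []          # each run = index of the next pattern symbol
    complete = 0
    for e in events:
        new_runs = []
        for i in runs:
            if e == pattern[i]:
                if i + 1 == len(pattern):
                    complete += 1              # run reached final state
                else:
                    new_runs.append(i + 1)     # run advances
            else:
                new_runs.append(i)             # run stays open (skip)
        if e == pattern[0]:                    # start a fresh run here
            if len(pattern) == 1:
                complete += 1
            else:
                new_runs.append(1)
        runs = new_runs
    return complete

print(nfa_seq_match(["A", "B"], ["A", "C", "B", "A"]))  # 1
```

Real CEP engines bound the number of open runs with windows and selection strategies; without them, skip-till-any-match can explode combinatorially.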
7

Alrammal, Muath. "Algorithms for XML stream processing : massive data, external memory and scalable performance." Phd thesis, Université Paris-Est, 2011. http://tel.archives-ouvertes.fr/tel-00779309.

Abstract:
Several modern applications require the processing of massive streams of XML data, which creates technical challenges. Among these are the design and implementation of tools to optimise the processing of XPath queries and to provide an accurate estimate of their cost when evaluated over a massive XML data stream. In this thesis, we propose a new performance prediction model that estimates a priori the cost (in terms of space used and time elapsed) of structural Forward XPath queries. In doing so, we conduct an experimental study that confirms the linear relationship between stream processing and data access resources, and we present a mathematical model (linear regression functions) to predict the cost of a given XPath query. In addition, we present a novel selectivity estimation technique. It consists of two elements. The first is the path tree synopsis: a concise and accurate representation of the structure of an XML document. The second is the selectivity estimation algorithm: an efficient streaming algorithm that traverses the path tree synopsis to estimate the values of the cost parameters. These parameters are then used by the mathematical model to determine the cost of a given XPath query. We compare the performance of our model with existing approaches. Furthermore, we present a use case of an online stream-querying system. The system uses our performance prediction model to estimate the cost (in terms of time and memory) of a given XPath query and provides an accurate answer to the query's author. This use case illustrates the practical benefits of performance management with our techniques.
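The linear regression cost functions mentioned in the abstract can be illustrated with a minimal least-squares fit; the sample sizes and timings below are invented for the example, and the single-variable model is a simplification of the thesis's cost parameters:

```python
# Illustrative sketch of a linear cost model: fit cost = a*n + b by
# ordinary least squares from observed (stream size, cost) measurements,
# then predict the cost of a query over a larger stream. The sample
# numbers are invented for the example.

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x

sizes = [1000, 2000, 4000]     # observed stream sizes (events)
times = [12.0, 22.0, 42.0]     # measured processing times (ms)
a, b = fit_linear(sizes, times)
print(a * 8000 + b)            # predicted time (ms) for 8000 events
```

An online system can use such a prediction to warn the query author before running an expensive query over the stream.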
APA, Harvard, Vancouver, ISO, and other styles
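The linear cost model described in this abstract can be sketched as an ordinary least-squares fit. The calibration data, the single "events processed" predictor, and the function names below are invented for illustration and are not taken from the thesis:

```python
# Sketch of a linear performance prediction model: fit cost = a * n + b
# relating a data-access parameter n (here, hypothetical "events processed"
# in the stream) to elapsed time, then predict a query's cost before running it.

def fit_linear(xs, ys):
    """Ordinary least squares for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                # slope: cost per unit of data accessed
    b = mean_y - a * mean_x      # intercept: fixed overhead
    return a, b

def predict_cost(a, b, n_events):
    """Predicted elapsed time for a query touching n_events items."""
    return a * n_events + b

# Calibration runs: (events processed, elapsed ms) -- invented numbers.
calibration = [(1_000, 12.0), (2_000, 22.0), (4_000, 42.0), (8_000, 82.0)]
a, b = fit_linear([x for x, _ in calibration], [y for _, y in calibration])
print(round(predict_cost(a, b, 16_000), 1))  # extrapolated cost estimate: 162.0
```

In the thesis the regression inputs come from the path tree synopsis rather than from hand-picked calibration points; this sketch only shows the regression step itself.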
8

Phan, Duy-Hung. "Algorithmes d'aggrégation pour applications Big Data." Electronic Thesis or Diss., Paris, ENST, 2016. http://www.theses.fr/2016ENST0043.

Full text
Abstract:
Traditional databases face problems of scalability and efficiency when dealing with vast amounts of big data. Thus, modern data management systems that scale to thousands of nodes, like Apache Hadoop and Spark, have emerged and become the de facto platforms for processing data at massive scales. In such systems, many data processing optimizations that were well studied in the database domain have become futile because of the novel architectures and programming models. In this context, this dissertation aims to optimize one of the most predominant operations in data processing: data aggregation for such systems. Our main contributions are the logical and physical optimizations for large-scale data aggregation, including several algorithms and techniques. These optimizations are so intimately related that without one or the other, the data aggregation optimization problem would not be solved entirely. Moreover, we integrated these optimizations into our multi-query optimization engine, which is totally transparent to users. The engine and the logical and physical optimizations proposed in this dissertation form a complete package that is runnable and ready to answer data aggregation queries at massive scales. We evaluated our optimizations both theoretically and experimentally. The theoretical analyses showed that our algorithms and techniques are much more scalable and efficient than prior works. The experimental results, using a real cluster with synthetic and real datasets, confirmed our analyses, showed a significant performance boost, and revealed various angles of our work. Last but not least, our work is published as open source for public use and study.
APA, Harvard, Vancouver, ISO, and other styles
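As one concrete, much simplified illustration of a physical optimisation for large-scale aggregation (not the dissertation's actual algorithms), a classic technique is map-side partial aggregation: each partition pre-aggregates locally, so only one partial result per key and per partition crosses the network instead of every raw record:

```python
# Two-phase distributed aggregation sketch: local "combiner" per partition,
# then a reduce-side merge of the partial results.
from collections import defaultdict

def local_combine(partition):
    """Map-side partial aggregation: SUM per key within one partition."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)

def merge_partials(partials):
    """Reduce-side merge of the per-partition partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

partitions = [
    [("a", 1), ("b", 2), ("a", 3)],   # records on node 1
    [("b", 4), ("c", 5), ("a", 6)],   # records on node 2
]
partials = [local_combine(p) for p in partitions]
print(merge_partials(partials))  # {'a': 10, 'b': 6, 'c': 5}
```

The same two-phase shape underlies combiners in Hadoop MapReduce and partial aggregates in Spark; it works for any algebraic aggregate (SUM, COUNT, MIN, MAX) whose partial results can be merged.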
9

Camacho Rodriguez, Jesus. "Efficient techniques for large-scale Web data management." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112229/document.

Full text
Abstract:
The recent development of commercial cloud computing environments has strongly impacted research and development in distributed software platforms. Cloud providers offer a distributed, shared-nothing infrastructure that may be used for data storage and processing. In parallel with the development of cloud platforms, programming models that seamlessly parallelize the execution of data-intensive tasks over large clusters of commodity machines have received significant attention, starting with the MapReduce model, very well known by now, and continuing through other novel and more expressive frameworks. As these models are increasingly used to express analytical-style data processing tasks, the need arises for higher-level languages that ease the burden of writing complex queries for these systems. This thesis investigates the efficient management of Web data on large-scale infrastructures. In particular, we study the performance and cost of exploiting cloud services to build Web data warehouses, and the parallelization and optimization of query languages tailored towards querying Web data declaratively. First, we present AMADA, an architecture for warehousing large-scale Web data in commercial cloud platforms. AMADA operates in a Software as a Service (SaaS) approach, allowing users to upload, store, and query large volumes of Web data. Since cloud users bear monetary costs directly connected to their consumption of resources, our focus is not only on query performance from an execution-time perspective, but also on the monetary costs associated with this processing.
In particular, we study the applicability of several content indexing strategies, and show that they lead not only to reduced query evaluation time but also, importantly, to reduced monetary costs associated with the exploitation of the cloud-based warehouse. Second, we consider the efficient parallelization of the execution of complex queries over XML documents, implemented within our system PAXQuery. We provide novel algorithms showing how to translate such queries into plans expressed in the PArallelization ConTracts (PACT) programming model. These plans are then optimized and executed in parallel by the Stratosphere system. We demonstrate the efficiency and scalability of our approach through experiments on hundreds of GB of XML data. Finally, we present a novel approach for identifying and reusing common subexpressions occurring in Pig Latin scripts. In particular, we lay the foundation of our reuse-based algorithms by formalizing the semantics of the Pig Latin query language with extended nested relational algebra for bags. Our algorithm, named PigReuse, operates on the algebraic representations of Pig Latin scripts, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and merges other equivalent expressions to share their results. We bring several extensions to the algorithm to improve its performance. Our experimental results demonstrate the efficiency and effectiveness of our reuse-based algorithms and optimization strategies.
APA, Harvard, Vancouver, ISO, and other styles
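The subexpression-reuse idea behind PigReuse can be sketched at toy scale. The canonical-string keys and the example operator trees below are invented for illustration; they do not reflect the actual algebraic representation used by the system:

```python
# Detect common subexpressions across queries: give every operator subtree
# a canonical string key, and flag any key seen more than once as a
# candidate to compute once and share.

def canonical(expr):
    """expr is either a leaf name (str) or a tuple (operator, *children)."""
    if isinstance(expr, str):
        return expr
    op, *children = expr
    return "(" + op + " " + " ".join(canonical(c) for c in children) + ")"

def find_shared_subexpressions(exprs):
    seen = {}
    shared = set()
    def walk(e):
        key = canonical(e)
        if key in seen and not isinstance(e, str):
            shared.add(key)          # subtree occurs twice: merge and share
        seen[key] = e
        if not isinstance(e, str):
            for child in e[1:]:
                walk(child)
    for e in exprs:
        walk(e)
    return shared

q1 = ("join", ("filter", "logs"), "users")             # first script
q2 = ("group", ("join", ("filter", "logs"), "users"))  # second script
print(sorted(find_shared_subexpressions([q1, q2])))
```

A real implementation would additionally rank merging opportunities by a cost function, as the abstract describes; this sketch stops at identification.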
10

Geng, Ke. "XML semantic query optimisation." Thesis, University of Auckland, 2011. http://hdl.handle.net/2292/6815.

Full text
Abstract:
XML Semantic Query Optimisation (XSQO) is a method that optimises the execution of queries based on semantic constraints extracted from XML documents. Currently, most research into XSQO concentrates on optimisation based on structural constraints in XML documents; research that optimises XML query execution based on semantic constraints has been limited because of the flexibility of XML. In this thesis, we introduce a method that optimises XML query execution based on constraints on the content of XML documents. In our method, elements are analysed and classified based on the distribution of values of sub-elements. Information about the classification is extracted, represented in OWL, and stored in the database together with the XML document. The user's input XML query is evaluated and, based on the element classification information, transformed into a new query that executes faster and returns exactly the same results. There are three kinds of transformation that may be carried out in our method: Elimination, which blocks non-result queries; Reduction, which simplifies the query conditions by removing redundant conditions; and Introduction, which reduces the search area by introducing a new query condition. Two engines were designed and built for the research. The data analysis engine analyses the XML documents and classifies the specified elements. The query transformation engine evaluates the input XML queries and carries out the query transformation automatically based on the classification information. A case study was carried out with the data analysis engine, and a series of experiments was run with the query transformation engine. The results show that: (a) XML documents can be analysed and elements can be classified using our method, and the classification results satisfy the requirements of XML query transformation; and (b) content-based XML query transformation can improve XML query execution performance by about 20% to 30%. In this thesis, we also introduce a data generator, designed and built to support the research. With this generator, users can build semantic information into an XML dataset with specified structure, size, and selectivity. A case study with the generator shows that it satisfies the requirements of content-based XSQO research.
APA, Harvard, Vancouver, ISO, and other styles
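The three transformations named in this abstract (Elimination, Reduction, Introduction) can be illustrated on a simple value-range constraint. The element name, the bounds, and the rewrite rules below are a hypothetical sketch, not the thesis's actual method:

```python
# Content-based query rewriting against a known value-range constraint.
# Assume analysis of the documents established that every 'price' value
# lies in [10, 100].
CONSTRAINT = {"price": (10, 100)}

def transform(element, lo, hi):
    """Rewrite the predicate lo <= element <= hi using the known range."""
    cmin, cmax = CONSTRAINT[element]
    if hi < cmin or lo > cmax:
        return "eliminate"        # no document can match: block the query
    if lo <= cmin and hi >= cmax:
        return "reduce"           # predicate is always true: drop it
    # Otherwise, tighten the search range using the constraint bounds.
    return ("introduce", max(lo, cmin), min(hi, cmax))

print(transform("price", 200, 300))   # eliminate
print(transform("price", 0, 1000))    # reduce
print(transform("price", 0, 50))      # ('introduce', 10, 50)
```

All three rewrites preserve the result set, which is exactly the guarantee the abstract requires of a semantic transformation.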

Books on the topic "Query processing and optimisation"

1

Photothongtham, Sant. Query processing and optimisation on ERT-SQL. Manchester: UMIST, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

A, Al-Ghabra M. Investigation of Databases Query Optimisation. London: University of East London, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Deshpande, Amol. Adaptive query processing. Boston: Now, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Catania, Barbara, and Lakhmi C. Jain, eds. Advanced Query Processing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-28323-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Gao, Yunjun, and Xiaoye Miao. Query Processing over Incomplete Databases. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-031-01863-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kim, Won, David S. Reiner, and Don S. Batory, eds. Query Processing in Database Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 1985. http://dx.doi.org/10.1007/978-3-642-82375-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kim, Won. Query Processing in Database Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 1985.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cardiff, J. The design of an efficient and extensible system for performing semantic query optimisation. Dublin: Trinity College, Department of Computer Science, 1991.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Freytag, Johann Christoph, David Maier, and Gottfried Vossen, eds. Query processing for advanced database systems. San Mateo, Calif.: Morgan Kaufmann Publishers, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Signorile, Danielle Larocca. SAP Query Reporting. Upper Saddle River: Sams Publishing, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Query processing and optimisation"

1

Noon, Nan N., and Janusz R. Getta. "Optimisation of Query Processing with Multilevel Storage." In Intelligent Information and Database Systems, 691–700. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016. http://dx.doi.org/10.1007/978-3-662-49390-8_67.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bertino, Elisa, Barbara Catania, and Elena Ferrari. "Query Processing." In Multimedia Databases in Perspective, 181–217. London: Springer London, 1997. http://dx.doi.org/10.1007/978-1-4471-0957-0_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Pitoura, Evaggelia. "Query Processing." In Encyclopedia of Database Systems, 1–2. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_860-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Pitoura, Evaggelia. "Query Processing." In Encyclopedia of Database Systems, 2288. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_860.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sciore, Edward. "Query Processing." In Database Design and Implementation, 213–38. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-33836-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Pitoura, Evaggelia. "Query Processing." In Encyclopedia of Database Systems, 3026–27. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_860.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Liu, Qianhong, and Peter A. Ng. "Query Transformation." In Document Processing and Retrieval, 201–18. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1295-6_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Böhlen, Michael H. "Temporal Query Processing." In Encyclopedia of Database Systems, 1–4. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_408-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Qing. "Approximate Query Processing." In Encyclopedia of Database Systems, 1–7. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_534-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Sattler, Kai-Uwe. "Distributed Query Processing." In Encyclopedia of Database Systems, 1–6. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_704-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Query processing and optimisation"

1

Ul Ain Ali, Qurat. "Heterogeneous Model Query Optimisation." In 2021 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). IEEE, 2021. http://dx.doi.org/10.1109/models-c53483.2021.00104.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Harangsri, Banchong, John Shepherd, and Anne Ngu. "Query optimisation in multidatabase systems using query classification." In the 1996 ACM symposium. New York, New York, USA: ACM Press, 1996. http://dx.doi.org/10.1145/331119.331170.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hartmann, Sven, and Sebastian Link. "XML Query Optimisation: Specify your Selectivity." In 18th International Conference on Database and Expert Systems Applications (DEXA 2007). IEEE, 2007. http://dx.doi.org/10.1109/dexa.2007.19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Hartmann, Sven, and Sebastian Link. "XML Query Optimisation: Specify your Selectivity." In 18th International Conference on Database and Expert Systems Applications (DEXA 2007). IEEE, 2007. http://dx.doi.org/10.1109/dexa.2007.4312851.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Tsialiamanis, Petros, Lefteris Sidirourgos, Irini Fundulaki, Vassilis Christophides, and Peter Boncz. "Heuristics-based query optimisation for SPARQL." In the 15th International Conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2247596.2247635.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Tamine, L., and M. Boughanem. "Query optimisation using an improved genetic algorithm." In the ninth international conference. New York, New York, USA: ACM Press, 2000. http://dx.doi.org/10.1145/354756.354842.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Cornacchia, Roberto, Alex van Ballegooij, and Arjen P. de Vries. "A case study on array query optimisation." In the 1st international workshop. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1039470.1039476.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chawla, Tanvi, Girdhari Singh, and Emmanuel S. Pilli. "A shortest path approach to SPARQL chain query optimisation." In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2017. http://dx.doi.org/10.1109/icacci.2017.8126102.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Jindal, Vandana, and Anil Kumar Verma. "Query Processing." In International Conference on Computer Applications — Database Systems. Singapore: Research Publishing Services, 2010. http://dx.doi.org/10.3850/978-981-08-7300-4_1662.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

de Moor, Oege, Damien Sereni, Pavel Avgustinov, and Mathieu Verbaere. "Type inference for datalog and its application to query optimisation." In the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1376916.1376957.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Query processing and optimisation"

1

Liu, Jane. Monotone Approximate Query Processing. Fort Belvoir, VA: Defense Technical Information Center, September 1992. http://dx.doi.org/10.21236/ada267153.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Armbrust, Michael P. Scale-Independent Relational Query Processing. Fort Belvoir, VA: Defense Technical Information Center, October 2013. http://dx.doi.org/10.21236/ada597352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wu, Kesheng, Ekow Otoo, and Arie Shoshani. Compressed bitmap indices for efficient query processing. Office of Scientific and Technical Information (OSTI), September 2001. http://dx.doi.org/10.2172/808915.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kavraki, Lydia, Jean-Claude Latombe, Rajeew Motwani, and P. Raghavan. Randomized Query Processing in Robot Motion Planning. Fort Belvoir, VA: Defense Technical Information Center, December 1994. http://dx.doi.org/10.21236/ada326821.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Rotem, Doron, Kurt Stockinger, and Kesheng Wu. Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices. Office of Scientific and Technical Information (OSTI), September 2005. http://dx.doi.org/10.2172/881846.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

SAMATOVA, Nagiza Faridovna. In Situ Indexing and Query Processing of AMR Data. Office of Scientific and Technical Information (OSTI), August 2018. http://dx.doi.org/10.2172/1502394.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Perich, Filip, Jeffrey Undercoffer, Lalana Kagal, Anupam Joshi, Timothy Finin, and Yelena Yesha. In Reputation We Believe: Query Processing in Mobile Ad-Hoc Networks. Fort Belvoir, VA: Defense Technical Information Center, January 2005. http://dx.doi.org/10.21236/ada439635.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Yanpei, Sara Alspaugh, and Randy H. Katz. Interactive Query Processing in Big Data Systems: A Cross Industry Study of MapReduce Workloads. Fort Belvoir, VA: Defense Technical Information Center, April 2012. http://dx.doi.org/10.21236/ada561769.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Furey, John, Austin Davis, and Jennifer Seiter-Moser. Natural language indexing for pedoinformatics. Engineer Research and Development Center (U.S.), September 2021. http://dx.doi.org/10.21079/11681/41960.

Full text
Abstract:
The multiple schemas for the classification of soils rely on differing criteria, but the major soil science systems, including the United States Department of Agriculture (USDA) and the internationally harmonized World Reference Base for Soil Resources soil classification systems, are primarily based on inferred pedogenesis. Largely these classifications are compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in nonquantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge due to the rarity of its diction in the broader context. We introduce an extension to the common English vocabulary that allows for nearly complete indexing of USDA Soil Series Descriptions.
APA, Harvard, Vancouver, ISO, and other styles
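The indexing idea in this abstract can be sketched as a toy inverted index. The document IDs and descriptions below are invented, and plain tokens stand in for the syntactic elements and part-of-speech tags the report actually indexes:

```python
# Toy inverted index: map each token to the set of soil-series descriptions
# containing it, so a query touches only short postings lists instead of
# scanning the full text of every description.
from collections import defaultdict

docs = {
    "series_A": "well drained loamy soils formed in alluvium",
    "series_B": "poorly drained clayey soils formed in till",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def query(*terms):
    """Documents containing all query terms (conjunctive lookup)."""
    postings = [index[t] for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(query("drained", "soils"))   # ['series_A', 'series_B']
print(query("clayey"))             # ['series_B']
```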