
Journal articles on the topic "Cardinality Estimation"


Listed below are the top 50 scholarly journal articles on the topic "Cardinality Estimation".


1

Harmouch, Hazar, and Felix Naumann. "Cardinality estimation". Proceedings of the VLDB Endowment 11, no. 4 (December 2017): 499–512. http://dx.doi.org/10.1145/3186728.3164145.
2

Kwon, Suyong, Woohwan Jung, and Kyuseok Shim. "Cardinality estimation of approximate substring queries using deep learning". Proceedings of the VLDB Endowment 15, no. 11 (July 2022): 3145–57. http://dx.doi.org/10.14778/3551793.3551859.

Abstract:
Cardinality estimation of an approximate substring query is an important problem in database systems. Traditional approaches build a summary from the text data and estimate the cardinality using the summary with some statistical assumptions. Since deep learning models can learn underlying complex data patterns effectively, they have been successfully applied and shown to outperform traditional methods for cardinality estimations of queries in database systems. However, since they are not yet applied to approximate substring queries, we investigate a deep learning approach for cardinality estimation of such queries. Although the accuracy of deep learning models tends to improve as the train data size increases, producing a large train data is computationally expensive for cardinality estimation of approximate substring queries. Thus, we develop efficient train data generation algorithms by avoiding unnecessary computations and sharing common computations. We also propose a deep learning model as well as a novel learning method to quickly obtain an accurate deep learning-based estimator. Extensive experiments confirm the superiority of our data generation algorithms and deep learning model with the novel learning method.
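Since the abstract identifies training-data generation as the expensive step, the brute-force baseline it improves upon can be sketched as follows: each training query is labeled with its true cardinality by scanning the data with Sellers' approximate-matching dynamic program. This is an illustrative baseline, not the paper's optimized generation algorithms; all names are hypothetical.

```python
def substring_edit_distance(pattern, text):
    """Minimum edit distance between `pattern` and any substring of `text`
    (Sellers' algorithm: row 0 is all zeros, so a match may start at any
    text position; the answer is the minimum of the last row)."""
    m, n = len(pattern), len(text)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitute / match
                         prev[j] + 1,         # delete from pattern
                         cur[j - 1] + 1)      # insert into pattern
        prev = cur
    return min(prev)

def true_cardinality(pattern, delta, strings):
    """Label for one training example: how many strings contain a
    substring within edit distance `delta` of `pattern`."""
    return sum(substring_edit_distance(pattern, s) <= delta for s in strings)
```

For example, `true_cardinality("abc", 1, corpus)` counts every string holding a substring at most one edit away from `abc`; doing this for thousands of (pattern, delta) pairs is exactly the cost the paper's shared-computation algorithms attack.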
3

Liu, Jie, Wenqian Dong, Qingqing Zhou, and Dong Li. "Fauce". Proceedings of the VLDB Endowment 14, no. 11 (July 2021): 1950–63. http://dx.doi.org/10.14778/3476249.3476254.

Abstract:
Cardinality estimation is a fundamental and critical problem in databases. Recently, many estimators based on deep learning have been proposed to solve this problem and they have achieved promising results. However, these estimators struggle to provide accurate results for complex queries, due to not capturing real inter-column and inter-table correlations. Furthermore, none of these estimators contain the uncertainty information about their estimations. In this paper, we present a join cardinality estimator called Fauce. Fauce learns the correlations across all columns and all tables in the database. It also contains the uncertainty information of each estimation. Among all studied learned estimators, our results are promising: (1) Fauce is a light-weight estimator, it has 10× faster inference speed than the state of the art estimator; (2) Fauce is robust to the complex queries, it provides 1.3×--6.7× smaller estimation errors for complex queries compared with the state of the art estimator; (3) To the best of our knowledge, Fauce is the first estimator that incorporates uncertainty information for cardinality estimation into a deep learning model.
4

Sun, Ji, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. "Learned cardinality estimation". Proceedings of the VLDB Endowment 15, no. 1 (September 2021): 85–97. http://dx.doi.org/10.14778/3485450.3485459.

Abstract:
Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially based on histograms and samplings, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samplings can only be used to summarize one or few columns, which fall short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of histograms and samplings over the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple columns, with multiple predicates, and joins between multiple tables. Recently, learned cardinality estimators have been widely studied. Because these learned estimators can better capture the data distribution and query characteristics, empowered by the recent advance of (deep learning) models, they outperform non-learned methods on many cases. The goals of this paper are to provide a design space exploration of learned cardinality estimators and to have a comprehensive comparison of the SOTA learned approaches so as to provide a guidance for practitioners to decide what method to use under various practical scenarios.
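As a concrete illustration of the non-learned baseline this abstract describes, here is a minimal sketch (not the paper's code) of a per-column equi-width histogram combined under the attribute-value-independence assumption, which is exactly where the multi-column errors discussed above originate:

```python
class EquiWidthHistogram:
    """Per-column equi-width histogram, the classic non-learned summary
    contrasted with learned estimators (illustrative sketch)."""
    def __init__(self, values, buckets=10):
        self.n = len(values)
        self.lo, self.hi = min(values), max(values)
        self.width = (self.hi - self.lo) / buckets or 1.0
        self.counts = [0] * buckets
        for v in values:
            idx = min(int((v - self.lo) / self.width), buckets - 1)
            self.counts[idx] += 1

    def selectivity_leq(self, x):
        """Estimated fraction of rows with value <= x, assuming a
        uniform distribution inside each bucket."""
        if x < self.lo:
            return 0.0
        if x >= self.hi:
            return 1.0
        idx = int((x - self.lo) / self.width)
        total = float(sum(self.counts[:idx]))
        total += self.counts[idx] * (x - (self.lo + idx * self.width)) / self.width
        return total / self.n

def estimate_conjunction(n_rows, *selectivities):
    """Attribute-value independence: per-column selectivities are simply
    multiplied, which is precisely what breaks on correlated columns."""
    est = float(n_rows)
    for s in selectivities:
        est *= s
    return est
```

On correlated columns (say, `city` and `zip_code`), the multiplied estimate can be off by orders of magnitude, motivating the learned joint-distribution models surveyed in the paper.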
5

Han, Yuxing, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, et al. "Cardinality estimation in DBMS". Proceedings of the VLDB Endowment 15, no. 4 (December 2021): 752–65. http://dx.doi.org/10.14778/3503585.3503586.

Abstract:
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of the query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate several of the most representative CardEst methods into the open-source DBMS PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, as well as other important aspects affecting their applicability. We obtain a number of key findings under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the generated query plan quality. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. It could serve as a better optimization objective for future CardEst methods.
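For reference, the Q-error metric criticized in this abstract is simply the symmetric ratio between estimated and true cardinality (P-Error depends on plan costs and is not reproduced here):

```python
def q_error(estimate, truth):
    """Symmetric ratio error: 1.0 is a perfect estimate, and an over- or
    under-estimate by the same factor scores identically."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)  # guard zeros
    return max(estimate / truth, truth / estimate)
```

Because `q_error(200, 100)` and `q_error(50, 100)` both score 2.0 regardless of where the sub-plan sits in the join tree, the metric cannot tell which misestimates actually hurt the final plan, which is the gap P-Error addresses.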
6

Yang, Zongheng, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. "Deep unsupervised cardinality estimation". Proceedings of the VLDB Endowment 13, no. 3 (November 2019): 279–92. http://dx.doi.org/10.14778/3368289.3368294.
7

Chen, Jeremy, Yuqing Huang, Mushi Wang, Semih Salihoglu, and Ken Salem. "Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs". Proceedings of the VLDB Endowment 15, no. 8 (April 2022): 1533–45. http://dx.doi.org/10.14778/3529337.3529339.

Abstract:
This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show these estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains sub-queries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. While on acyclic queries and queries with small-size cycles, using the maximum-weight path is effective to address the well known underestimation problem, on queries with larger cycles these estimates tend to overestimate, which can be addressed by using minimum weight paths. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.
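The path-picking idea in this abstract can be made concrete with a toy example: nodes are sub-queries, edge weights are average-degree statistics, and each bottom-to-top path multiplies its weights into an estimate. The graph, relation names, and weights below are invented for illustration; the paper's CEGs are built from real pre-computed statistics.

```python
def path_estimates(ceg, start, goal, base_cardinality):
    """Enumerate all bottom-to-top paths in a tiny CEG (adjacency dict
    mapping node -> {neighbor: avg_degree}) and return the estimate each
    path yields: |base relation| * product of average degrees on the path."""
    results = []
    def dfs(node, acc):
        if node == goal:
            results.append(acc)
            return
        for nxt, avg_deg in ceg.get(node, {}).items():
            dfs(nxt, acc * avg_deg)
    dfs(start, float(base_cardinality))
    return results

# Toy CEG for a two-step join; the weights are made-up average degrees.
ceg = {"R": {"RS": 3.0, "RT": 5.0},
       "RS": {"RST": 2.0},
       "RT": {"RST": 1.5}}
ests = path_estimates(ceg, "R", "RST", 100)
optimistic_max = max(ests)  # max-weight path, counters underestimation
optimistic_min = min(ests)  # min-weight path, tempers cyclic overestimates
```

Existing optimistic estimators fix one such path choice; the paper's observation is that the better choice (max vs. min weight) depends on whether the query is acyclic or has large cycles.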
8

Chen, Jeremy, Yuqing Huang, Mushi Wang, Semih Salihoglu, and Kenneth Salem. "Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs". ACM SIGMOD Record 52, no. 1 (June 7, 2023): 94–102. http://dx.doi.org/10.1145/3604437.3604458.

Abstract:
We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains subqueries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.
9

Jie, Xu, Lan Haoliang, Ding Wei, and Ju Ao. "Network Host Cardinality Estimation Based on Artificial Neural Network". Security and Communication Networks 2022 (March 24, 2022): 1–14. http://dx.doi.org/10.1155/2022/1258482.

Abstract:
Cardinality estimation plays an important role in network security. It is widely used in host cardinality calculation of high-speed network. However, the cardinality estimation algorithm itself is easy to be disturbed by random factors and produces estimation errors. How to eliminate the influence of these random factors is the key to further improving the accuracy of estimation. To solve the above problems, this paper proposes an algorithm that uses artificial neural network to predict the estimation bias and adjust the cardinality estimation value according to the prediction results. Based on the existing algorithms, the novel algorithm reduces the interference of random factors on the estimation results and improves the accuracy by adding the steps of cardinality estimation sampling, artificial neural network training, and error prediction. The experimental results show that, using the cardinality estimation algorithm proposed in this paper, the average absolute deviation of cardinality estimation can be reduced by more than 20%.
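The correction step the abstract describes, predicting the estimator's bias and adjusting the raw value, can be sketched with a one-parameter least-squares fit standing in for the paper's artificial neural network (purely illustrative; the paper's predictor is far richer):

```python
def fit_bias_correction(raw_estimates, truths):
    """Least-squares multiplicative correction factor c minimizing
    sum((truth - c * raw)^2); a stand-in for the ANN bias predictor."""
    num = sum(t * e for t, e in zip(truths, raw_estimates))
    den = sum(e * e for e in raw_estimates)
    return num / den

def corrected(raw_estimate, factor):
    """Adjust a new raw cardinality estimate by the learned factor."""
    return raw_estimate * factor
```

Fitted on sampled (raw estimate, true cardinality) pairs, the factor shifts future estimates toward the truth in the same spirit as the paper's sample-train-predict loop.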
10

Gao, Jintao, Zhanhuai Li, and Wenjie Liu. "A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result". Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 4 (August 2018): 768–77. http://dx.doi.org/10.1051/jnwpu/20183640768.

Abstract:
Cardinality estimation is an important component of query optimization. Its accuracy and efficiency directly decide effect of query optimization. Traditional cardinality estimation strategy is based on original table or sample to collect statistics, then inferring cardinality by collected statistics. It will be low-efficiency when handling big data; Statistics exist update latency and are gotten by inferring, which can not guarantee correctness; Some strategies can get the actual cardinality by executing some subqueries, but they do not keep the result, leading to low efficiency of fetching statistics. Against these problems, this paper proposes a novel cardinality estimation strategy, called cardinality estimation based on query result(CEQR). For keeping correctness of cardinality, CEQR directly gets statistics from query results, which is not related with data size; we build a cardinality table to store the statistics of basic tables and middle results under specific predicates. Cardinality table can provide cardinality services for subsequent queries, and we build a suit of rules to maintain cardinality table; To improve the efficiency of fetching statistics, we introduce the source aware strategy, which hashes cardinality item to appropriate cache. This paper gives the adaptability and deviation analytic of CEQR, and proves that CEQR is more efficient than traditional cardinality estimation strategy by experiments.
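The core of the CEQR idea, keeping the actual cardinalities observed in query results and consulting them before falling back to a conventional estimate, can be sketched as a small cache (the paper's source-aware hashing to multiple caches is reduced here to a plain dict; names are illustrative):

```python
class CardinalityTable:
    """Stores true cardinalities taken from executed query results,
    keyed by (table, predicate); a sketch of CEQR's cardinality table."""
    def __init__(self):
        self._cache = {}

    def record(self, table, predicate, result_rows):
        # Called after a (sub)query executes: keep the true cardinality.
        self._cache[(table, predicate)] = len(result_rows)

    def lookup(self, table, predicate, fallback_estimate):
        # Exact answer if this predicate was seen before, else the
        # conventional statistics-based estimate.
        return self._cache.get((table, predicate), fallback_estimate)
```

Because entries come from results rather than from sampled statistics, they are exact and independent of data size, which is the correctness argument the abstract makes.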
11

Grigorev, Y. A., and O. Yu Pluzhnikova. "ESTIMATION OF ATTRIBUTE VALUES IN JOIN TABLES WHILE OPTIMIZING RELATIONAL DATABASE QUERY". Informatika i sistemy upravleniya, no. 1 (2021): 3–18. http://dx.doi.org/10.22250/isu.2021.67.3-18.

Abstract:
The article analyzes the problem of estimating the cardinality of join tables when calculating the cost of a relational database query plan. A new algorithm for estimating the number of distinct attribute values is proposed. The algorithm reduces the inaccuracy of cardinality estimation, and its consistency is proved.
12

Sakr, Sherif. "Algebra‐based XQuery cardinality estimation". International Journal of Web Information Systems 4, no. 1 (April 4, 2008): 6–47. http://dx.doi.org/10.1108/17440080810865611.
13

Rusu, Florin, Zixuan Zhuang, Mingxi Wu, and Chris Jermaine. "Workload-Driven Antijoin Cardinality Estimation". ACM Transactions on Database Systems 40, no. 3 (October 23, 2015): 1–41. http://dx.doi.org/10.1145/2818178.
14

Cohen, Reuven, Liran Katzir, and Aviv Yehezkel. "Cardinality Estimation Meets Good-Turing". Big Data Research 9 (September 2017): 1–8. http://dx.doi.org/10.1016/j.bdr.2017.04.002.
15

Suciu, Dan. "Technical Perspective: Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs". ACM SIGMOD Record 52, no. 1 (June 7, 2023): 93. http://dx.doi.org/10.1145/3604437.3604457.

Abstract:
Query engines are really good at choosing an efficient query plan. Users don't need to worry about how they write their query, since the optimizer makes all the right choices for executing the query, while taking into account all aspects of data, such as its size, the characteristics of the storage device, the distribution pattern, the availability of indexes, and so on. The query optimizer always makes the best choice, no matter how complex the query is, or how contrived it was written. Or, this is what we expect today from a modern query optimizer. Unfortunately, reality is not as nice.
16

Potoniec, Jedrzej. "Mining Cardinality Restrictions in OWL". Foundations of Computing and Decision Sciences 45, no. 3 (September 1, 2020): 195–216. http://dx.doi.org/10.2478/fcds-2020-0011.

Abstract:
We present an approach to mine cardinality restriction axioms from an existing knowledge graph, in order to extend an ontology describing the graph. We compare frequency estimation with kernel density estimation as approaches to obtain the cardinalities in restrictions. We also propose numerous strategies for filtering the obtained axioms in order to make them more useful to the ontology engineer. We report the results of an experimental evaluation on DBpedia 2016-10 and show that using kernel density estimation to compute the cardinalities in cardinality restrictions yields more robust results than using frequency estimation. We also show that while filtering is of limited usability for minimum cardinality restrictions, it is much more important for maximum cardinality restrictions. The presented findings can be used to extend existing ontology engineering tools in order to support ontology construction and enable more efficient creation of knowledge-intensive artificial intelligence systems.
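The simpler of the two approaches compared above, frequency estimation, can be sketched directly: count how many values of a property each subject in the graph has, and read candidate minimum/maximum cardinality restrictions off those counts (a toy sketch over (subject, property, object) triples, not the paper's pipeline):

```python
from collections import Counter

def mine_cardinality_bounds(triples, prop):
    """Frequency-based mining of candidate cardinality restrictions:
    returns (min, max) of the per-subject count of `prop` values.
    Subjects appearing in the graph but never using `prop` count as 0."""
    per_subject = Counter()
    subjects = set()
    for s, p, o in triples:
        subjects.add(s)
        if p == prop:
            per_subject[s] += 1
    counts = [per_subject[s] for s in subjects]
    return min(counts), max(counts)
```

A minimum of 0 here signals that a min-cardinality axiom is not supported by the data, which is where the paper's filtering strategies and the kernel-density alternative come in.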
17

Qi, Kaiyang, Jiong Yu, and Zhenzhen He. "A Cardinality Estimator in Complex Database Systems Based on TreeLSTM". Sensors 23, no. 17 (August 23, 2023): 7364. http://dx.doi.org/10.3390/s23177364.

Abstract:
Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accurately capture the correlation between multiple tables. Several recent studies have revealed that learning-based cardinality estimation methods can address the shortcomings of traditional methods and provide more accurate estimates. However, the learning-based cardinality estimation methods still have large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates and considers the join relationship between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees using operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that Q-error cannot distinguish between large and small cardinalities. Extensive experimental results from real-world datasets show that our proposed model improves the estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods in three evaluation metrics: Q-error, MAE, and SMAPE.
18

Varagnolo, Damiano, Gianluigi Pillonetto, and Luca Schenato. "Distributed Cardinality Estimation in Anonymous Networks". IEEE Transactions on Automatic Control 59, no. 3 (March 2014): 645–59. http://dx.doi.org/10.1109/tac.2013.2287113.
19

Bozkus, Cem, and Basilio B. Fraguela. "Accelerating the HyperLogLog Cardinality Estimation Algorithm". Scientific Programming 2017 (2017): 1–8. http://dx.doi.org/10.1155/2017/2040865.

Abstract:
In recent years, vast amounts of data of different kinds, from pictures and videos from our cameras to software logs from sensor networks and Internet routers operating day and night, are being generated. This has led to new big data problems, which require new algorithms to handle these large volumes of data and as a result are very computationally demanding because of the volumes to process. In this paper, we parallelize one of these new algorithms, namely, the HyperLogLog algorithm, which estimates the number of different items in a large data set with minimal memory usage, as it lowers the typical memory usage of this type of calculation from O(n) to O(1). We have implemented parallelizations based on OpenMP and OpenCL and evaluated them in a standard multicore system, an Intel Xeon Phi, and two GPUs from different vendors. The results obtained in our experiments, in which we reach a speedup of 88.6 with respect to an optimized sequential implementation, are very positive, particularly taking into account the need to run this kind of algorithm on large amounts of data.
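For orientation, here is a sequential reference sketch of the estimator the paper parallelizes: HyperLogLog keeps m = 2^p small registers, each recording the maximum leading-zero rank observed, which is what yields the O(1) memory footprint mentioned above. This is a textbook-style sketch, not the paper's OpenMP/OpenCL implementation.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog: hash each item to 64 bits, route it to one of
    m = 2^p registers by its first p bits, and keep the maximum rank
    (position of the leftmost 1-bit) of the remaining bits."""
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leftmost 1-bit position
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:             # small-range correction
            return self.m * math.log(self.m / zeros)  # linear counting
        return raw
```

Duplicates only re-set a register to the same maximum, so the estimate depends on the number of *distinct* items; the per-register `max` is also what makes the parallel merge in the paper a simple element-wise maximum.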
20

Ré, Christopher, and D. Suciu. "Understanding cardinality estimation using entropy maximization". ACM Transactions on Database Systems 37, no. 1 (February 2012): 1–31. http://dx.doi.org/10.1145/2109196.2109202.
21

Adam, H., E. Yanmaz, and C. Bettstetter. "Contention-Based Estimation of Neighbor Cardinality". IEEE Transactions on Mobile Computing 12, no. 3 (March 2013): 542–55. http://dx.doi.org/10.1109/tmc.2012.19.
22

Hirata, Kohei, Daichi Amagata, and Takahiro Hara. "Cardinality Estimation in Inner Product Space". IEEE Open Journal of the Computer Society 3 (2022): 208–16. http://dx.doi.org/10.1109/ojcs.2022.3215206.
23

Wang, Jiayi, Chengliang Chai, Jiabin Liu, and Guoliang Li. "FACE". Proceedings of the VLDB Endowment 15, no. 1 (September 2021): 72–84. http://dx.doi.org/10.14778/3485450.3485458.

Abstract:
Cardinality estimation is one of the most important problems in query optimization. Recently, machine learning based techniques have been proposed to effectively estimate cardinality, which can be broadly classified into query-driven and data-driven approaches. Query-driven approaches learn a regression model from a query to its cardinality; while data-driven approaches learn a distribution of tuples, select some samples that satisfy a SQL query, and use the data distributions of these selected tuples to estimate the cardinality of the SQL query. As query-driven methods rely on training queries, the estimation quality is not reliable when there are no high-quality training queries; while data-driven methods have no such limitation and have high adaptivity. In this work, we focus on data-driven methods. A good data-driven model should achieve three optimization goals. First, the model needs to capture data dependencies between columns and support large domain sizes (achieving high accuracy). Second, the model should achieve high inference efficiency, because many data samples are needed to estimate the cardinality (achieving low inference latency). Third, the model should not be too large (achieving a small model size). However, existing data-driven methods cannot simultaneously optimize the three goals. To address the limitations, we propose a novel cardinality estimator FACE, which leverages the Normalizing Flow based model to learn a continuous joint distribution for relational data. FACE can transform a complex distribution over continuous random variables into a simple distribution (e.g., multivariate normal distribution), and use the probability density to estimate the cardinality. First, we design a dequantization method to make data more "continuous". Second, we propose encoding and indexing techniques to handle Like predicates for string data. Third, we propose a Monte Carlo method to efficiently estimate the cardinality. 
Experimental results show that our method significantly outperforms existing approaches in terms of estimation accuracy while keeping similar latency and model size.
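The final Monte Carlo step this abstract describes can be illustrated in miniature: sample from the learned density and scale the fraction of samples satisfying the predicate by the table size. A standard normal stands in for the normalizing-flow model FACE actually learns; the function name and parameters are illustrative only.

```python
import random

def monte_carlo_cardinality(n_rows, lo, hi, samples=50000, seed=7):
    """Estimate the cardinality of `lo <= x <= hi` by Monte Carlo over a
    learned density. Here a standard normal N(0, 1) plays the role of the
    normalizing-flow model (purely a stand-in for illustration)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if lo <= rng.gauss(0.0, 1.0) <= hi)
    return n_rows * hits / samples
```

With the stand-in normal, a predicate covering one standard deviation on each side of the mean should capture roughly 68% of the rows, and the estimate converges as the sample budget grows, which is why inference latency is one of the paper's three optimization goals.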
24

Woltmann, Lucas, Dominik Olwig, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. "PostCENN". Proceedings of the VLDB Endowment 14, no. 12 (July 2021): 2715–18. http://dx.doi.org/10.14778/3476311.3476327.

Abstract:
In this demo, we present PostCENN, an enhanced PostgreSQL database system with an end-to-end integration of machine learning (ML) models for cardinality estimation. In general, cardinality estimation is a topic with a long history in the database community. While traditional models like histograms are extensively used, recent works mainly focus on developing new approaches using ML models. However, traditional as well as ML models have their own advantages and disadvantages. With PostCENN, we aim to combine both to maximize their potentials for cardinality estimation by introducing ML models as a novel means to increase the accuracy of the cardinality estimation for certain parts of the database schema. To achieve this, we integrate ML models as first-class citizens in PostgreSQL with a well-defined end-to-end life cycle. This life cycle consists of creating ML models for different sub-parts of the database schema, triggering the training, using ML models within the query optimizer in a transparent way, and deleting ML models.
25

Woltmann, Lucas, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. "Aggregate-based Training Phase for ML-based Cardinality Estimation". Datenbank-Spektrum 22, no. 1 (January 10, 2022): 45–57. http://dx.doi.org/10.1007/s13222-021-00400-z.

Abstract:
Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model, making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our aggregate-based training phase and thus outperform indexes.
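The pre-aggregation idea above can be sketched in a few lines: training queries that differ only in their predicates are answered from a single group-by aggregate instead of re-scanning the base table (a toy sketch under the assumption of single-column predicates; column and function names are illustrative):

```python
from collections import Counter

def pre_aggregate(rows, column):
    """Predicate-independent pre-aggregation: group-by counts on the
    column the training queries filter on. Computed once."""
    return Counter(r[column] for r in rows)

def training_label(agg, predicate):
    """Answer one training query's true cardinality from the aggregate,
    without touching the base table again."""
    return sum(count for value, count in agg.items() if predicate(value))
```

Scanning the aggregate costs time proportional to the number of distinct values rather than the number of rows, which is the source of the speedup the abstract reports.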
26

Lee, Kukjin, Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. "Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server". Proceedings of the VLDB Endowment 16, no. 11 (July 2023): 2871–83. http://dx.doi.org/10.14778/3611479.3611494.

Abstract:
Cardinality estimation is widely believed to be one of the most important causes of poor query plans. Prior studies evaluate the impact of cardinality estimation on plan quality on a set of Select-Project-Join queries on PostgreSQL DBMS. Our empirical study broadens the scope of prior studies in significant ways. First, we include complex SQL queries containing group-by, aggregation, outer joins and sub-queries from real-world workloads and industry benchmarks. We evaluate on both row-oriented and column-oriented physical designs. Our empirical study uses Microsoft SQL Server, an industry-strength DBMS with a state-of-the-art query optimizer that is equipped with techniques to optimize such complex queries. Second, we analyze the sensitivity of plan quality to cardinality errors in two ways by: (a) varying the subset of query sub-expressions for which accurate cardinalities are used, and (b) introducing progressively larger cardinality errors. Third, query processing techniques such as bitmap filtering and adaptive join have the potential to mitigate the impact of cardinality estimation errors by reducing the latency of bad plans. We evaluate the importance of accurate cardinalities in the presence of these techniques.
27

Wang, Xiaoying, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. "Are we ready for learned cardinality estimation?" Proceedings of the VLDB Endowment 14, no. 9 (May 2021): 1640–54. http://dx.doi.org/10.14778/3461535.3461552.

Abstract:
Cardinality estimation is a fundamental but long unresolved problem in query optimization. Recently, multiple papers from different research groups consistently report that learned models have the potential to replace existing cardinality estimators. In this paper, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. Firstly, we focus on the static environment (i.e., no data updates) and compare five new learned methods with nine traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they cannot catch up with fast data updates and return large errors for different reasons. For less frequent updates, they can perform better but there is no clear winner among themselves. Thirdly, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size. More importantly, their behaviors are much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (control the cost of learned models and make learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems.
28

Qian, Chen, Hoilun Ngan, Yunhao Liu, and Lionel M. Ni. "Cardinality Estimation for Large-Scale RFID Systems". IEEE Transactions on Parallel and Distributed Systems 22, no. 9 (September 2011): 1441–54. http://dx.doi.org/10.1109/tpds.2011.36.
29

Järvelin, Kalervo. "Cardinality estimation in numeric on-line databases". Information Processing & Management 22, nr 6 (styczeń 1986): 523–48. http://dx.doi.org/10.1016/0306-4573(86)90103-2.

30

Bruschi, Valerio, Pedro Reviriego, Salvatore Pontarelli, Daniel Ting, and Giuseppe Bianchi. "More Accurate Streaming Cardinality Estimation With Vectorized Counters". IEEE Networking Letters 3, no. 2 (June 2021): 75–79. http://dx.doi.org/10.1109/lnet.2021.3076048.

31

Dao, DinhNguyen, DaeHun Nyang, and KyungHee Lee. "Effect of Sampling for Multi-set Cardinality Estimation". KIPS Transactions on Computer and Communication Systems 4, no. 1 (January 31, 2015): 15–22. http://dx.doi.org/10.3745/ktccs.2015.4.1.15.

32

Shah-Mansouri, Vahid, and Vincent W. S. Wong. "Cardinality Estimation in RFID Systems with Multiple Readers". IEEE Transactions on Wireless Communications 10, no. 5 (May 2011): 1458–69. http://dx.doi.org/10.1109/twc.2011.030411.100390.

33

Yoon, Myungkeun, and Young Jae Kim. "Address Block Counting Using Two-Tier Cardinality Estimation". IEEE Access 7 (2019): 125754–61. http://dx.doi.org/10.1109/access.2019.2938977.

34

Luo, Cheng, Zhewei Jiang, Wen-Chi Hou, Shan He, and Qiang Zhu. "A sampling approach for skyline query cardinality estimation". Knowledge and Information Systems 32, no. 2 (September 16, 2011): 281–301. http://dx.doi.org/10.1007/s10115-011-0441-1.

35

Negi, Parimarjan, Ziniu Wu, Andreas Kipf, Nesime Tatbul, Ryan Marcus, Sam Madden, Tim Kraska, and Mohammad Alizadeh. "Robust Query Driven Cardinality Estimation under Changing Workloads". Proceedings of the VLDB Endowment 16, no. 6 (February 2023): 1520–33. http://dx.doi.org/10.14778/3583140.3583164.

Abstract:
Query driven cardinality estimation models learn from a historical log of queries. They are lightweight, with low storage requirements and fast inference and training, and are easily adaptable to any kind of query. Unfortunately, such models can suffer unpredictably bad performance under workload drift, i.e., if the query pattern or data changes. This makes them unreliable and hard to deploy. We analyze the reasons why models become unpredictable due to workload drift, and introduce modifications to the query representation and neural network training techniques to make query-driven models robust to the effects of workload drift. First, we emulate workload drift in queries involving some unseen tables or columns by randomly masking out some table or column features during training. This forces the model to make predictions with missing query information, relying more on robust features based on up-to-date DBMS statistics that are useful even when query or data drift happens. Second, we introduce join bitmaps, which extend sampling-based features to be consistent across joins using ideas from sideways information passing. Finally, we show how both of these ideas can be adapted to handle data updates. We show significantly greater generalization than past works across different workloads and databases. For instance, a model trained with our techniques on a simple workload (JOBLight-train), with 40k synthetically generated queries of at most 3 tables each, is able to generalize to the much more complex Join Order Benchmark, which includes queries with up to 16 tables, and improve query runtimes by 2× over PostgreSQL. We show similar robustness results with data updates, and across other workloads. We discuss the situations where we expect, and see, improvements, as well as more challenging workload drift scenarios where these techniques do not improve much over PostgreSQL.
However, even in the most challenging scenarios, our models never perform worse than PostgreSQL, while standard query driven models can get much worse than PostgreSQL.
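The training-time masking described in the abstract above can be illustrated with a minimal sketch. The function name, masking granularity, and probability below are hypothetical illustrations of the general idea, not the paper's exact scheme:

```python
import random

def mask_features(feature_vector, mask_prob=0.2, rng=random):
    """Randomly zero out feature entries during training, emulating queries
    with unseen tables or columns so the model learns to fall back on
    features that survive workload drift."""
    return [0.0 if rng.random() < mask_prob else f for f in feature_vector]

# Example: a hypothetical per-table feature vector for one training query.
random.seed(42)
features = [0.8, 1.0, 0.3, 0.6, 0.9]
print(mask_features(features))
```

During training, the masked vector would replace the original for a fraction of batches, forcing the model to predict under missing query information.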
36

Guo, Wenhua, Kaixuan Ye, Yiyan Qi, Peng Jia, and Pinghui Wang. "Generalized Sketches for Streaming Sets". Applied Sciences 12, no. 15 (July 22, 2022): 7362. http://dx.doi.org/10.3390/app12157362.

Abstract:
Many real-world datasets are given as a stream of user–interest pairs, where a user–interest pair represents a link from a user (e.g., a network host) to an interest (e.g., a website), and may appear more than once in the stream. Monitoring and mining statistics, including cardinality, intersection cardinality, and Jaccard similarity of users' interest sets on high-speed streams, are widely employed by applications such as network anomaly detection. Although estimating set cardinality, set intersection cardinality, and set Jaccard similarity individually is well studied, there is no effective method that provides a one-shot solution for estimating all three statistics. To solve this challenge, we develop a novel framework, SimCar. SimCar builds an order-hashing (OH) sketch online for each user occurring in the data stream of interest. At any time of interest, one can query the cardinalities, intersection cardinalities, and Jaccard similarities of users' interest sets. Specifically, using OH sketches, we develop maximum likelihood estimation (MLE) methods to estimate cardinalities and intersection cardinalities of users' interest sets. In addition, we use OH sketches to estimate Jaccard similarities of users' interest sets and build locality-sensitive hashing tables to search for users with similar interests in sub-linear time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate the superiority of our methods.
37

Si, Weijian, Hongfan Zhu, and Zhiyu Qu. "A Novel Structure for a Multi-Bernoulli Filter without a Cardinality Bias". Electronics 8, no. 12 (December 5, 2019): 1484. http://dx.doi.org/10.3390/electronics8121484.

Abstract:
The original multi-target multi-Bernoulli (MeMBer) filter for multi-target tracking (MTT) is shown analytically to have a significant bias in its cardinality estimation. The cardinality-balanced multi-Bernoulli (CBMeMBer) filter reduces the cardinality bias by calculating the exact cardinality of the posterior probability generating functional (PGFl) without the second assumption of the original MeMBer filter. However, the CBMeMBer filter performs well only under a high detection probability, and it retains the first assumption of the MeMBer filter, which requires measurements that are well separated in the surveillance region. An improved MeMBer filter proposed by Baser et al. alleviates the cardinality bias by modifying the legacy tracks. Although the cardinality is balanced, the improved algorithm employs a low clutter density approximation. In this paper, we propose a novel structure for a multi-Bernoulli filter without a cardinality bias, termed the novel multi-Bernoulli (N-MB) filter. We remove the approximations employed in the original MeMBer filter, and consequently, the N-MB filter performs well in high clutter intensity and low signal-to-noise environments. Numerical simulations highlight the improved tracking performance of the proposed filter.
38

Lan, Hai, Zhifeng Bao, and Yuwei Peng. "A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration". Data Science and Engineering 6, no. 1 (January 15, 2021): 86–101. http://dx.doi.org/10.1007/s41019-020-00149-7.

Abstract:
The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, then applies a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples passing through an operator, plays a crucial role. Due to inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind these limitations. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, the cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.
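As background to the role cardinality plays in the cost model described above, the textbook baseline multiplies per-predicate selectivities under an independence assumption. A minimal sketch, with hypothetical statistics and numbers (not the survey's own method):

```python
def estimate_cardinality(table_rows, selectivities):
    """Estimate the output rows of a conjunctive filter by multiplying
    per-predicate selectivities (assumes predicates are independent)."""
    est = float(table_rows)
    for sel in selectivities:
        est *= sel
    return est

# Example: 1,000,000-row table with two predicates whose selectivities come
# from simple statistics (1/distinct_count for an equality predicate, the
# covered fraction of the value range for a range predicate).
rows = 1_000_000
sel_eq = 1 / 50      # col_a = const, assuming 50 distinct values
sel_range = 0.25     # col_b < q, assuming it covers 25% of the range
print(estimate_cardinality(rows, [sel_eq, sel_range]))  # roughly 5000 rows
```

When predicates are correlated, these per-predicate errors multiply, which is exactly the weakness that the learned estimators surveyed here target.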
39

Ngo, Hung Q. "RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES". Journal of Computer Science and Cybernetics 37, no. 3 (October 7, 2021): 223–38. http://dx.doi.org/10.15625/1813-9663/37/3/16129.

Abstract:
I would like to dedicate this little exposition to Prof. Phan Dinh Dieu, one of the giants and pioneers of Mathematics in Computer Science in Vietnam. In the past 15 years or so, new and exciting connections between fundamental problems in database theory and information theory have emerged. There are several angles one can take to describe this connection. This paper takes one such angle, influenced by the author's own bias and research results. In particular, we describe how the cardinality estimation problem, a cornerstone problem for query optimizers, is deeply connected to information theoretic inequalities. Furthermore, we explain how these inequalities can also be used to derive a couple of classic geometric inequalities such as the Loomis-Whitney inequality. One purpose of the article is to introduce the reader to these new connections, where theory and practice meet in a wonderful way. Another objective is to point the reader to a research area with many new open questions.
40

Reviriego, Pedro, and Daniel Ting. "Security of HyperLogLog (HLL) Cardinality Estimation: Vulnerabilities and Protection". IEEE Communications Letters 24, no. 5 (May 2020): 976–80. http://dx.doi.org/10.1109/lcomm.2020.2972895.

41

Reviriego, Pedro, Valerio Bruschi, Salvatore Pontarelli, Daniel Ting, and Giuseppe Bianchi. "Fast Updates for Line-Rate HyperLogLog-Based Cardinality Estimation". IEEE Communications Letters 24, no. 12 (December 2020): 2737–41. http://dx.doi.org/10.1109/lcomm.2020.3018336.

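Entries 40 and 41 concern HyperLogLog (HLL). As general background, a minimal HLL estimator can be sketched as follows; the register count, the hash choice, and the omission of large-range corrections are simplifications for illustration, not the papers' designs:

```python
import hashlib
import math

def _hash64(item):
    # 64-bit hash from SHA-1 (illustrative; production HLLs use faster hashes)
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")

class HyperLogLog:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                  # m = 2^p registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        h = _hash64(item)
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:           # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"item-{i}")
print(round(hll.estimate()))  # close to 100000 (std. error ~1.04/sqrt(m), ~1.6%)
```

The fixed-size register array is what makes HLL attractive for line-rate streaming: each update touches a single register, which is also the property the fast-update paper above builds on.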
42

Shironoshita, E. Patrick, Michael T. Ryan, and Mansur R. Kabuka. "Cardinality estimation for the optimization of queries on ontologies". ACM SIGMOD Record 36, no. 2 (June 2007): 13–18. http://dx.doi.org/10.1145/1328854.1328856.

43

Zhou, Zhongbao, Qianying Jin, Helu Xiao, Qian Wu, and Wenbin Liu. "Estimation of cardinality constrained portfolio efficiency via segmented DEA". Omega 76 (April 2018): 28–37. http://dx.doi.org/10.1016/j.omega.2017.03.006.

44

Papapetrou, Odysseas, Wolf Siberski, and Wolfgang Nejdl. "Cardinality estimation and dynamic length adaptation for Bloom filters". Distributed and Parallel Databases 28, no. 2-3 (September 2, 2010): 119–56. http://dx.doi.org/10.1007/s10619-010-7067-2.

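The Papapetrou et al. entry above concerns estimating cardinality from a Bloom filter. The widely known fill-ratio estimator n ≈ -(m/k)·ln(1 - X/m), where X is the number of set bits, can be sketched as follows; the filter size, hash derivation, and data are illustrative, not the paper's construction:

```python
import hashlib
import math

M = 1 << 16   # bits in the filter (illustrative size)
K = 4         # number of hash functions

def _positions(item):
    # Derive K bit positions from independent 4-byte slices of a SHA-256 digest
    digest = hashlib.sha256(str(item).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M for i in range(K)]

def build_filter(items):
    bits = [False] * M
    for item in items:
        for pos in _positions(item):
            bits[pos] = True
    return bits

def estimate_cardinality(bits):
    """Estimate distinct insertions from the fill ratio:
    n ≈ -(m/k) * ln(1 - X/m), with X the number of set bits."""
    x = sum(bits)
    return -(M / K) * math.log(1 - x / M)

bf = build_filter(f"key-{i}" for i in range(5000))
print(round(estimate_cardinality(bf)))  # close to 5000
```

This works because, under uniform hashing, the expected fraction of set bits after n insertions is 1 - e^(-kn/m); the estimator simply inverts that relation.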
45

Borovica-Gajic, Renata, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. "Smooth Scan: robust access path selection without cardinality estimation". VLDB Journal 27, no. 4 (May 29, 2018): 521–45. http://dx.doi.org/10.1007/s00778-018-0507-8.

46

Tiakas, Eleftherios, Apostolos N. Papadopoulos, and Yannis Manolopoulos. "On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets". International Journal of Knowledge-Based Organizations 3, no. 4 (October 2013): 61–83. http://dx.doi.org/10.4018/ijkbo.2013100104.

Abstract:
In recent years there has been increasing interest in query processing techniques that take into consideration the dominance relationship between items to select the most promising ones, based on user preferences. Skyline and top-k dominating queries are examples of such techniques. A skyline query computes the items that are not dominated, whereas a top-k dominating query returns the k items with the highest domination score. To enable query optimization, it is important to estimate the expected number of skyline items as well as the maximum domination value of an item. In this article, the authors provide an estimation of the maximum domination value under the distinct-values and attribute-independence assumptions. The authors provide three different methodologies for estimating and calculating the maximum domination value and test their performance and accuracy. Among the proposed estimation methods, their method Estimation with Roots outperforms all others and returns the most accurate results. They also introduce the eliminating dimension, i.e., the dimension beyond which all domination values become zero, and provide an efficient estimation of that dimension. Moreover, the authors provide an accurate estimation of the skyline cardinality of a data set.
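Under the distinct-values and attribute-independence assumptions discussed above, the expected skyline cardinality of n points in d dimensions grows roughly as (ln n)^(d-1)/(d-1)!. A small Monte Carlo check of that leading-order approximation (an illustration only, not the authors' Estimation with Roots method; n, d, and the seed are arbitrary):

```python
import math
import random

def skyline_count(points):
    """Count points not dominated by any other point
    (minimization in every dimension)."""
    count = 0
    for p in points:
        dominated = any(
            q != p and all(q[i] <= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            count += 1
    return count

random.seed(7)
n, d = 1000, 3
points = [tuple(random.random() for _ in range(d)) for _ in range(n)]

# Leading-order approximation of the expected skyline size
# under independence: (ln n)^(d-1) / (d-1)!
approx = math.log(n) ** (d - 1) / math.factorial(d - 1)
print(skyline_count(points), round(approx, 1))  # counts of similar magnitude
```

The naive O(n^2) domination check is fine for a sanity check at this scale; the point is that the observed skyline size stays in the ballpark of the analytic approximation.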
47

Negi, Parimarjan, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. "Flow-loss". Proceedings of the VLDB Endowment 14, no. 11 (July 2021): 2019–32. http://dx.doi.org/10.14778/3476249.3476259.

Abstract:
Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB), which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4-8× better 99th percentile runtimes on unseen templates with the same model architecture and training data.
48

Kadam, Sachin, Sesha Vivek Yenduri, Potharaju Hari Prasad, Rajesh Kumar, and Gaurav S. Kasbekar. "Rapid Node Cardinality Estimation in Heterogeneous Machine-to-Machine Networks". IEEE Transactions on Vehicular Technology 70, no. 2 (February 2021): 1836–50. http://dx.doi.org/10.1109/tvt.2021.3054594.

49

Lu, Yao, Srikanth Kandula, Arnd Christian König, and Surajit Chaudhuri. "Pre-training summarization models of structured datasets for cardinality estimation". Proceedings of the VLDB Endowment 15, no. 3 (November 2021): 414–26. http://dx.doi.org/10.14778/3494124.3494127.

Abstract:
We consider the problem of pre-training models which convert structured datasets into succinct summaries that can be used to answer cardinality estimation queries. Doing so avoids per-dataset training and, in our experiments, reduces the time to construct summaries by up to 100×. When datasets change, our summaries are incrementally updateable. Our key insights are to use multiple summaries per dataset, use learned summaries for columnsets for which other simpler techniques do not achieve high accuracy, and that analogous to similar pre-trained models for images and text, structured datasets have some common frequency and correlation patterns which our models learn to capture by pre-training on a large and diverse corpus of datasets.
50

Zheng, Yuanqing, and Mo Li. "Towards More Efficient Cardinality Estimation for Large-Scale RFID Systems". IEEE/ACM Transactions on Networking 22, no. 6 (December 2014): 1886–96. http://dx.doi.org/10.1109/tnet.2013.2288352.
