Journal articles: 'Cardinality Estimation'

1

Harmouch, Hazar, and Felix Naumann. "Cardinality estimation." Proceedings of the VLDB Endowment 11, no. 4 (December 2017): 499–512. http://dx.doi.org/10.1145/3186728.3164145.

2

Kwon, Suyong, Woohwan Jung, and Kyuseok Shim. "Cardinality estimation of approximate substring queries using deep learning." Proceedings of the VLDB Endowment 15, no. 11 (July 2022): 3145–57. http://dx.doi.org/10.14778/3551793.3551859.

Abstract:

Cardinality estimation of an approximate substring query is an important problem in database systems. Traditional approaches build a summary from the text data and estimate the cardinality using the summary with some statistical assumptions. Since deep learning models can learn underlying complex data patterns effectively, they have been successfully applied and shown to outperform traditional methods for cardinality estimation of queries in database systems. However, since they have not yet been applied to approximate substring queries, we investigate a deep learning approach for cardinality estimation of such queries. Although the accuracy of deep learning models tends to improve as the training data size increases, producing a large training set is computationally expensive for cardinality estimation of approximate substring queries. Thus, we develop efficient training data generation algorithms by avoiding unnecessary computations and sharing common computations. We also propose a deep learning model as well as a novel learning method to quickly obtain an accurate deep learning-based estimator. Extensive experiments confirm the superiority of our data generation algorithms and deep learning model with the novel learning method.

3

Liu, Jie, Wenqian Dong, Qingqing Zhou, and Dong Li. "Fauce." Proceedings of the VLDB Endowment 14, no. 11 (July 2021): 1950–63. http://dx.doi.org/10.14778/3476249.3476254.

Abstract:

Cardinality estimation is a fundamental and critical problem in databases. Recently, many estimators based on deep learning have been proposed to solve this problem and they have achieved promising results. However, these estimators struggle to provide accurate results for complex queries because they do not capture real inter-column and inter-table correlations. Furthermore, none of these estimators provides uncertainty information about its estimations. In this paper, we present a join cardinality estimator called Fauce. Fauce learns the correlations across all columns and all tables in the database. It also provides the uncertainty information of each estimation. Among all studied learned estimators, our results are promising: (1) Fauce is a lightweight estimator with 10× faster inference than the state-of-the-art estimator; (2) Fauce is robust to complex queries, providing 1.3×–6.7× smaller estimation errors on them than the state-of-the-art estimator; (3) to the best of our knowledge, Fauce is the first estimator that incorporates uncertainty information for cardinality estimation into a deep learning model.

4

Sun, Ji, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. "Learned cardinality estimation." Proceedings of the VLDB Endowment 15, no. 1 (September 2021): 85–97. http://dx.doi.org/10.14778/3485450.3485459.

Abstract:

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially based on histograms and samplings, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samplings can only be used to summarize one or few columns, which fall short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of histograms and samplings over the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple columns, with multiple predicates, and joins between multiple tables. Recently, learned cardinality estimators have been widely studied. Because these learned estimators can better capture the data distribution and query characteristics, empowered by the recent advance of (deep learning) models, they outperform non-learned methods on many cases. The goals of this paper are to provide a design space exploration of learned cardinality estimators and to have a comprehensive comparison of the SOTA learned approaches so as to provide a guidance for practitioners to decide what method to use under various practical scenarios.
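
The shortfall of per-column summaries described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes the estimator multiplies per-column selectivities (the attribute-independence assumption commonly paired with single-column histograms), and the toy table and the names `rows` and `selectivity` are hypothetical.

```python
# Toy table with two perfectly correlated columns (a, b); illustrative data only.
rows = [(i % 10, i % 10) for i in range(1000)]

def selectivity(rows, column, value):
    """Fraction of rows whose given column equals value (a one-column statistic)."""
    return sum(1 for r in rows if r[column] == value) / len(rows)

# Estimate for "a = 3 AND b = 3", assuming a and b are independent.
independent_estimate = selectivity(rows, 0, 3) * selectivity(rows, 1, 3) * len(rows)

# True cardinality, obtained by scanning the table.
true_cardinality = sum(1 for a, b in rows if a == 3 and b == 3)

print(f"independence estimate: {independent_estimate:.0f}, true: {true_cardinality}")
```

Because the two columns always hold the same value, the per-column product underestimates the true count by a factor of ten here; a model of the joint distribution would not have to make that assumption.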

5

Han, Yuxing, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, et al. "Cardinality estimation in DBMS." Proceedings of the VLDB Endowment 15, no. 4 (December 2021): 752–65. http://dx.doi.org/10.14778/3503585.3503586.

Abstract:

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of a query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate several of the most representative CardEst methods into the open-source DBMS PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, and other important aspects affecting their applicability. We obtain a number of key findings under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the generated query plan quality. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. It could serve as a better optimization objective for future CardEst methods.
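
For readers unfamiliar with the Q-Error metric mentioned above, a minimal Python sketch of its usual definition follows: the larger of estimate/actual and actual/estimate. The function name `q_error` and the `floor` parameter (which clamps values below 1 to avoid division by zero) are assumptions made for this illustration, not details from the paper. P-Error is not sketched because it depends on plan costs from a real optimizer.

```python
def q_error(estimate: float, actual: float, floor: float = 1.0) -> float:
    """Multiplicative error factor between an estimated and a true cardinality.

    Both inputs are clamped to `floor` (an assumption of this sketch) so that
    empty results do not cause a division by zero.
    """
    est = max(estimate, floor)
    act = max(actual, floor)
    return max(est / act, act / est)

# Over- and underestimation by the same factor yield the same Q-Error,
# regardless of which sub-plan the estimate belongs to.
print(q_error(100, 1000), q_error(1000, 100))
```

The metric is symmetric and looks at each estimate in isolation, which is consistent with the abstract's point that it cannot weigh sub-plan queries by their effect on the final plan.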

6

Yang, Zongheng, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. "Deep unsupervised cardinality estimation." Proceedings of the VLDB Endowment 13, no. 3 (November 2019): 279–92. http://dx.doi.org/10.14778/3368289.3368294.

7

Chen, Jeremy, Yuqing Huang, Mushi Wang, Semih Salihoglu, and Ken Salem. "Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs." Proceedings of the VLDB Endowment 15, no. 8 (April 2022): 1533–45. http://dx.doi.org/10.14778/3529337.3529339.

Abstract:

This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show these estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains sub-queries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. While on acyclic queries and queries with small-size cycles, using the maximum-weight path is effective to address the well known underestimation problem, on queries with larger cycles these estimates tend to overestimate, which can be addressed by using minimum weight paths. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.

8

Chen, Jeremy, Yuqing Huang, Mushi Wang, Semih Salihoglu, and Kenneth Salem. "Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs." ACM SIGMOD Record 52, no. 1 (June 7, 2023): 94–102. http://dx.doi.org/10.1145/3604437.3604458.

Abstract:

We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains subqueries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.

9

Jie, Xu, Lan Haoliang, Ding Wei, and Ju Ao. "Network Host Cardinality Estimation Based on Artificial Neural Network." Security and Communication Networks 2022 (March 24, 2022): 1–14. http://dx.doi.org/10.1155/2022/1258482.

Abstract:

Cardinality estimation plays an important role in network security. It is widely used to calculate host cardinality in high-speed networks. However, cardinality estimation algorithms are easily disturbed by random factors, which produces estimation errors. Eliminating the influence of these random factors is the key to further improving estimation accuracy. To solve this problem, this paper proposes an algorithm that uses an artificial neural network to predict the estimation bias and adjusts the estimated cardinality according to the prediction. Building on existing algorithms, the novel algorithm reduces the interference of random factors on the estimation results and improves accuracy by adding steps for cardinality estimation sampling, artificial neural network training, and error prediction. The experimental results show that the proposed algorithm can reduce the average absolute deviation of cardinality estimation by more than 20%.

10

Gao, Jintao, Zhanhuai Li, and Wenjie Liu. "A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 4 (August 2018): 768–77. http://dx.doi.org/10.1051/jnwpu/20183640768.

Abstract:

Cardinality estimation is an important component of query optimization. Its accuracy and efficiency directly determine the effectiveness of query optimization. Traditional cardinality estimation strategies collect statistics from the original table or a sample and then infer cardinality from those statistics. This approach has three problems: it is inefficient on big data; the statistics suffer from update latency and are inferred, so their correctness cannot be guaranteed; and some strategies obtain the actual cardinality by executing subqueries but do not keep the results, which makes fetching statistics inefficient. To address these problems, this paper proposes a novel cardinality estimation strategy called cardinality estimation based on query result (CEQR). To keep cardinalities correct, CEQR obtains statistics directly from query results, independent of data size. We build a cardinality table to store the statistics of base tables and intermediate results under specific predicates. The cardinality table provides cardinality services for subsequent queries, and we build a set of rules to maintain it. To improve the efficiency of fetching statistics, we introduce a source-aware strategy, which hashes each cardinality item to an appropriate cache. This paper gives an adaptability and deviation analysis of CEQR and shows experimentally that CEQR is more efficient than traditional cardinality estimation strategies.

11

Grigorev, Y. A., and O. Yu Pluzhnikova. "ESTIMATION OF ATTRIBUTE VALUES IN JOIN TABLES WHILE OPTIMIZING RELATIONAL DATABASE QUERY." Informatika i sistemy upravleniya, no. 1 (2021): 3–18. http://dx.doi.org/10.22250/isu.2021.67.3-18.

Abstract:

The article analyzes the problem of estimating the cardinality of joined tables when calculating the cost of a relational database query plan. A new algorithm for estimating the distinct values of attributes is proposed. The algorithm reduces inaccuracy in cardinality estimation. The consistency of the proposed algorithm is proved.

12

Sakr, Sherif. "Algebra‐based XQuery cardinality estimation." International Journal of Web Information Systems 4, no. 1 (April 4, 2008): 6–47. http://dx.doi.org/10.1108/17440080810865611.

13

Rusu, Florin, Zixuan Zhuang, Mingxi Wu, and Chris Jermaine. "Workload-Driven Antijoin Cardinality Estimation." ACM Transactions on Database Systems 40, no. 3 (October 23, 2015): 1–41. http://dx.doi.org/10.1145/2818178.

14

Cohen, Reuven, Liran Katzir, and Aviv Yehezkel. "Cardinality Estimation Meets Good-Turing." Big Data Research 9 (September 2017): 1–8. http://dx.doi.org/10.1016/j.bdr.2017.04.002.

15

Suciu, Dan. "Technical Perspective: Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs." ACM SIGMOD Record 52, no. 1 (June 7, 2023): 93. http://dx.doi.org/10.1145/3604437.3604457.

Abstract:

Query engines are really good at choosing an efficient query plan. Users don't need to worry about how they write their query, since the optimizer makes all the right choices for executing the query, while taking into account all aspects of data, such as its size, the characteristics of the storage device, the distribution pattern, the availability of indexes, and so on. The query optimizer always makes the best choice, no matter how complex the query is, or how contrived it was written. Or, this is what we expect today from a modern query optimizer. Unfortunately, reality is not as nice.

16

Potoniec, Jedrzej. "Mining Cardinality Restrictions in OWL." Foundations of Computing and Decision Sciences 45, no. 3 (September 1, 2020): 195–216. http://dx.doi.org/10.2478/fcds-2020-0011.

Abstract:

We present an approach to mine cardinality restriction axioms from an existing knowledge graph, in order to extend an ontology describing the graph. We compare frequency estimation with kernel density estimation as approaches to obtain the cardinalities in restrictions. We also propose numerous strategies for filtering obtained axioms in order to make them more available for the ontology engineer. We report the results of experimental evaluation on DBpedia 2016-10 and show that using kernel density estimation to compute the cardinalities in cardinality restrictions yields more robust results than using frequency estimation. We also show that while filtering is of limited usability for minimum cardinality restrictions, it is much more important for maximum cardinality restrictions. The presented findings can be used to extend existing ontology engineering tools in order to support ontology construction and enable more efficient creation of knowledge-intensive artificial intelligence systems.

17

Qi, Kaiyang, Jiong Yu, and Zhenzhen He. "A Cardinality Estimator in Complex Database Systems Based on TreeLSTM." Sensors 23, no. 17 (August 23, 2023): 7364. http://dx.doi.org/10.3390/s23177364.

Abstract:

Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accurately capture the correlation between multiple tables. Several recent studies have revealed that learning-based cardinality estimation methods can address the shortcomings of traditional methods and provide more accurate estimates. However, the learning-based cardinality estimation methods still have large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates and considers the join relationship between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees using operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that Q-error cannot distinguish between large and small cardinalities. Extensive experimental results from real-world datasets show that our proposed model improves the estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods in three evaluation metrics: Q-error, MAE, and SMAPE.

18

Varagnolo, Damiano, Gianluigi Pillonetto, and Luca Schenato. "Distributed Cardinality Estimation in Anonymous Networks." IEEE Transactions on Automatic Control 59, no. 3 (March 2014): 645–59. http://dx.doi.org/10.1109/tac.2013.2287113.

19

Bozkus, Cem, and Basilio B. Fraguela. "Accelerating the HyperLogLog Cardinality Estimation Algorithm." Scientific Programming 2017 (2017): 1–8. http://dx.doi.org/10.1155/2017/2040865.

Abstract:

In recent years, vast amounts of data of different kinds, from pictures and videos from our cameras to software logs from sensor networks and Internet routers operating day and night, are being generated. This has led to new big data problems, which require new algorithms to handle these large volumes of data and as a result are very computationally demanding because of the volumes to process. In this paper, we parallelize one of these new algorithms, namely, the HyperLogLog algorithm, which estimates the number of different items in a large data set with minimal memory usage, as it lowers the typical memory usage of this type of calculation from O(n) to O(1). We have implemented parallelizations based on OpenMP and OpenCL and evaluated them in a standard multicore system, an Intel Xeon Phi, and two GPUs from different vendors. The results obtained in our experiments, in which we reach a speedup of 88.6 with respect to an optimized sequential implementation, are very positive, particularly taking into account the need to run this kind of algorithm on large amounts of data.
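
As background for this entry, the following is a minimal sequential Python sketch of the HyperLogLog idea: a fixed array of registers, independent of the number of items seen, is enough to estimate the distinct count. It is not the authors' parallel OpenMP/OpenCL implementation. The class name, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions made for this illustration; the bias constant and the small-range correction follow the commonly published form of the algorithm.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p: int = 12):
        if p < 7:
            raise ValueError("this sketch uses a bias constant valid for p >= 7")
        self.p = p
        self.m = 1 << p                      # number of registers (fixed memory)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash: the first p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit among the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Harmonic-mean based raw estimate.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(20_000):
    hll.add(f"user-{i % 5_000}")   # 5,000 distinct values, each seen four times
print(round(hll.count()))
```

Each `add` touches a single register and the registers of two sketches could be merged with an element-wise maximum, which is the kind of structure that lends itself to the parallelization the abstract describes.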

20

Ré, Christopher, and D. Suciu. "Understanding cardinality estimation using entropy maximization." ACM Transactions on Database Systems 37, no. 1 (February 2012): 1–31. http://dx.doi.org/10.1145/2109196.2109202.

21

Adam, H., E. Yanmaz, and C. Bettstetter. "Contention-Based Estimation of Neighbor Cardinality." IEEE Transactions on Mobile Computing 12, no. 3 (March 2013): 542–55. http://dx.doi.org/10.1109/tmc.2012.19.

22

Hirata, Kohei, Daichi Amagata, and Takahiro Hara. "Cardinality Estimation in Inner Product Space." IEEE Open Journal of the Computer Society 3 (2022): 208–16. http://dx.doi.org/10.1109/ojcs.2022.3215206.

23

Wang, Jiayi, Chengliang Chai, Jiabin Liu, and Guoliang Li. "FACE." Proceedings of the VLDB Endowment 15, no. 1 (September 2021): 72–84. http://dx.doi.org/10.14778/3485450.3485458.

Abstract:

Cardinality estimation is one of the most important problems in query optimization. Recently, machine learning based techniques have been proposed to effectively estimate cardinality, which can be broadly classified into query-driven and data-driven approaches. Query-driven approaches learn a regression model from a query to its cardinality; while data-driven approaches learn a distribution of tuples, select some samples that satisfy a SQL query, and use the data distributions of these selected tuples to estimate the cardinality of the SQL query. As query-driven methods rely on training queries, the estimation quality is not reliable when there are no high-quality training queries; while data-driven methods have no such limitation and have high adaptivity. In this work, we focus on data-driven methods. A good data-driven model should achieve three optimization goals. First, the model needs to capture data dependencies between columns and support large domain sizes (achieving high accuracy). Second, the model should achieve high inference efficiency, because many data samples are needed to estimate the cardinality (achieving low inference latency). Third, the model should not be too large (achieving a small model size). However, existing data-driven methods cannot simultaneously optimize the three goals. To address the limitations, we propose a novel cardinality estimator FACE, which leverages the Normalizing Flow based model to learn a continuous joint distribution for relational data. FACE can transform a complex distribution over continuous random variables into a simple distribution (e.g., multivariate normal distribution), and use the probability density to estimate the cardinality. First, we design a dequantization method to make data more "continuous". Second, we propose encoding and indexing techniques to handle Like predicates for string data. Third, we propose a Monte Carlo method to efficiently estimate the cardinality. Experimental results show that our method significantly outperforms existing approaches in terms of estimation accuracy while keeping similar latency and model size.

24

Woltmann, Lucas, Dominik Olwig, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. "PostCENN." Proceedings of the VLDB Endowment 14, no. 12 (July 2021): 2715–18. http://dx.doi.org/10.14778/3476311.3476327.

Abstract:

In this demo, we present PostCENN, an enhanced PostgreSQL database system with an end-to-end integration of machine learning (ML) models for cardinality estimation. In general, cardinality estimation is a topic with a long history in the database community. While traditional models like histograms are extensively used, recent works mainly focus on developing new approaches using ML models. However, traditional as well as ML models have their own advantages and disadvantages. With PostCENN, we aim to combine both to maximize their potentials for cardinality estimation by introducing ML models as a novel means to increase the accuracy of the cardinality estimation for certain parts of the database schema. To achieve this, we integrate ML models as first-class citizens in PostgreSQL with a well-defined end-to-end life cycle. This life cycle consists of creating ML models for different sub-parts of the database schema, triggering the training, using ML models within the query optimizer in a transparent way, and deleting ML models.

25

Woltmann, Lucas, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. "Aggregate-based Training Phase for ML-based Cardinality Estimation." Datenbank-Spektrum 22, no. 1 (January 10, 2022): 45–57. http://dx.doi.org/10.1007/s13222-021-00400-z.

Abstract:

Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model, making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our aggregate-based training phase and thus outperform indexes.
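
The core idea in this abstract, answering many count queries that differ only in their predicates from one predicate-independent pre-aggregation, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the authors' implementation: the helper names `pre_aggregate` and `count_matching`, the dictionary-based rows, and the toy columns `year` and `rating` are all hypothetical.

```python
from collections import Counter

def pre_aggregate(rows, columns):
    """Group the base data once by the predicate columns and count each group."""
    return Counter(tuple(row[c] for c in columns) for row in rows)

def count_matching(aggregate, predicate):
    """Answer a COUNT(*) query by summing the counts of the groups that qualify."""
    return sum(n for key, n in aggregate.items() if predicate(key))

# Toy base table (illustrative data only).
rows = [{"year": 2000 + i % 5, "rating": i % 3} for i in range(10_000)]

# One pass over the base data, independent of any particular predicate.
agg = pre_aggregate(rows, ("year", "rating"))

# Each training query now scans the small aggregate instead of all rows.
label = count_matching(agg, lambda key: key[0] >= 2003 and key[1] == 1)
direct = sum(1 for r in rows if r["year"] >= 2003 and r["rating"] == 1)
print(label, direct, len(agg))
```

The aggregate has one entry per distinct combination of predicate-column values, so it pays off when that number is much smaller than the row count, as in this toy table.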

26

Lee, Kukjin, Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. "Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server." Proceedings of the VLDB Endowment 16, no. 11 (July 2023): 2871–83. http://dx.doi.org/10.14778/3611479.3611494.

Abstract:

Cardinality estimation is widely believed to be one of the most important causes of poor query plans. Prior studies evaluate the impact of cardinality estimation on plan quality on a set of Select-Project-Join queries on PostgreSQL DBMS. Our empirical study broadens the scope of prior studies in significant ways. First, we include complex SQL queries containing group-by, aggregation, outer joins and sub-queries from real-world workloads and industry benchmarks. We evaluate on both row-oriented and column-oriented physical designs. Our empirical study uses Microsoft SQL Server, an industry-strength DBMS with a state-of-the-art query optimizer that is equipped with techniques to optimize such complex queries. Second, we analyze the sensitivity of plan quality to cardinality errors in two ways by: (a) varying the subset of query sub-expressions for which accurate cardinalities are used, and (b) introducing progressively larger cardinality errors. Third, query processing techniques such as bitmap filtering and adaptive join have the potential to mitigate the impact of cardinality estimation errors by reducing the latency of bad plans. We evaluate the importance of accurate cardinalities in the presence of these techniques.

27

Wang, Xiaoying, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. "Are we ready for learned cardinality estimation?" Proceedings of the VLDB Endowment 14, no. 9 (May 2021): 1640–54. http://dx.doi.org/10.14778/3461535.3461552.

Abstract:

Cardinality estimation is a fundamental but long unresolved problem in query optimization. Recently, multiple papers from different research groups consistently report that learned models have the potential to replace existing cardinality estimators. In this paper, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. Firstly, we focus on the static environment (i.e., no data updates) and compare five new learned methods with nine traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they cannot catch up with fast data updates and return large errors for different reasons. For less frequent updates, they can perform better but there is no clear winner among themselves. Thirdly, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size. More importantly, their behaviors are much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (control the cost of learned models and make learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems.

28

Qian, Chen, Hoilun Ngan, Yunhao Liu, and Lionel M. Ni. "Cardinality Estimation for Large-Scale RFID Systems." IEEE Transactions on Parallel and Distributed Systems 22, no. 9 (September 2011): 1441–54. http://dx.doi.org/10.1109/tpds.2011.36.

29

Järvelin, Kalervo. "Cardinality estimation in numeric on-line databases." Information Processing & Management 22, no. 6 (January 1986): 523–48. http://dx.doi.org/10.1016/0306-4573(86)90103-2.

30

Bruschi, Valerio, Pedro Reviriego, Salvatore Pontarelli, Daniel Ting, and Giuseppe Bianchi. "More Accurate Streaming Cardinality Estimation With Vectorized Counters." IEEE Networking Letters 3, no. 2 (June 2021): 75–79. http://dx.doi.org/10.1109/lnet.2021.3076048.

31

Dao, DinhNguyen, DaeHun Nyang, and KyungHee Lee. "Effect of Sampling for Multi-set Cardinality Estimation." KIPS Transactions on Computer and Communication Systems 4, no. 1 (January 31, 2015): 15–22. http://dx.doi.org/10.3745/ktccs.2015.4.1.15.

32

Shah-Mansouri, Vahid, and Vincent W. S. Wong. "Cardinality Estimation in RFID Systems with Multiple Readers." IEEE Transactions on Wireless Communications 10, no. 5 (May 2011): 1458–69. http://dx.doi.org/10.1109/twc.2011.030411.100390.

33

Yoon, Myungkeun, and Young Jae Kim. "Address Block Counting Using Two-Tier Cardinality Estimation." IEEE Access 7 (2019): 125754–61. http://dx.doi.org/10.1109/access.2019.2938977.

34

Luo, Cheng, Zhewei Jiang, Wen-Chi Hou, Shan He, and Qiang Zhu. "A sampling approach for skyline query cardinality estimation." Knowledge and Information Systems 32, no. 2 (September 16, 2011): 281–301. http://dx.doi.org/10.1007/s10115-011-0441-1.

35

Negi, Parimarjan, Ziniu Wu, Andreas Kipf, Nesime Tatbul, Ryan Marcus, Sam Madden, Tim Kraska, and Mohammad Alizadeh. "Robust Query Driven Cardinality Estimation under Changing Workloads." Proceedings of the VLDB Endowment 16, no. 6 (February 2023): 1520–33. http://dx.doi.org/10.14778/3583140.3583164.

Abstract:

Query driven cardinality estimation models learn from a historical log of queries. They are lightweight, having low storage requirements, fast inference and training, and are easily adaptable for any kind of query. Unfortunately, such models can suffer unpredictably bad performance under workload drift, i.e., if the query pattern or data changes. This makes them unreliable and hard to deploy. We analyze the reasons why models become unpredictable due to workload drift, and introduce modifications to the query representation and neural network training techniques to make query-driven models robust to the effects of workload drift. First, we emulate workload drift in queries involving some unseen tables or columns by randomly masking out some table or column features during training. This forces the model to make predictions with missing query information, relying more on robust features based on up-to-date DBMS statistics that are useful even when query or data drift happens. Second, we introduce join bitmaps, which extend sampling-based features to be consistent across joins using ideas from sideways information passing. Finally, we show how both of these ideas can be adapted to handle data updates. We show significantly greater generalization than past works across different workloads and databases. For instance, a model trained with our techniques on a simple workload (JOBLight-train), with 40k synthetically generated queries of at most 3 tables each, is able to generalize to the much more complex Join Order Benchmark, which includes queries with up to 16 tables, and improve query runtimes by 2× over PostgreSQL. We show similar robustness results with data updates, and across other workloads. We discuss the situations where we expect, and see, improvements, as well as more challenging workload drift scenarios where these techniques do not improve much over PostgreSQL. However, even in the most challenging scenarios, our models never perform worse than PostgreSQL, while standard query driven models can get much worse than PostgreSQL.

36

Guo, Wenhua, Kaixuan Ye, Yiyan Qi, Peng Jia, and Pinghui Wang. "Generalized Sketches for Streaming Sets." Applied Sciences 12, no. 15 (July 22, 2022): 7362. http://dx.doi.org/10.3390/app12157362.

Abstract:

Many real-world datasets are given as a stream of user–interest pairs, where a user–interest pair represents a link from a user (e.g., a network host) to an interest (e.g., a website), and may appear more than once in the stream. Monitoring and mining statistics, including cardinality, intersection cardinality, and Jaccard similarity of users’ interest sets on high-speed streams, are widely employed by applications such as network anomaly detection. Although estimating set cardinality, set intersection cardinality, and set Jaccard similarity, respectively, is well studied, there is no effective method that provides a one-shot solution for estimating all three of these statistics. To solve the above challenge, we develop a novel framework, SimCar. SimCar builds an order-hashing (OH) sketch online for each user occurring in the data stream of interest. At any time of interest, one can query the cardinalities, intersection cardinalities, and Jaccard similarities of users’ interest sets. Specifically, using OH sketches, we develop maximum likelihood estimation (MLE) methods to estimate cardinalities and intersection cardinalities of users’ interest sets. In addition, we use OH sketches to estimate Jaccard similarities of users’ interest sets and build locality-sensitive hashing tables to search for users with similar interests in sub-linear time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate the superiority of our methods.

37

Si, Weijian, Hongfan Zhu, and Zhiyu Qu. "A Novel Structure for a Multi-Bernoulli Filter without a Cardinality Bias." Electronics 8, no. 12 (December 5, 2019): 1484. http://dx.doi.org/10.3390/electronics8121484.

Abstract:

The original multi-target multi-Bernoulli (MeMBer) filter for multi-target tracking (MTT) is shown analytically to have a significant bias in its cardinality estimation. A novel cardinality balance multi-Bernoulli (CBMeMBer) filter reduces the cardinality bias by calculating the exact cardinality of the posterior probability generating functional (PGFl) without the second assumption of the original MeMBer filter. However, the CBMeMBer filter can only have a good performance under a high detection probability, and retains the first assumption of the MeMBer filter, which requires measurements that are well separated in the surveillance region. An improved MeMBer filter proposed by Baser et al. alleviates the cardinality bias by modifying the legacy tracks. Although the cardinality is balanced, the improved algorithm employs a low clutter density approximation. In this paper, we propose a novel structure for a multi-Bernoulli filter without a cardinality bias, termed as a novel multi-Bernoulli (N-MB) filter. We remove the approximations employed in the original MeMBer filter, and consequently, the N-MB filter performs well in a high clutter intensity and low signal-to-noise environment. Numerical simulations highlight the improved tracking performance of the proposed filter.

38

Lan, Hai, Zhifeng Bao, and Yuwei Peng. "A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration." Data Science and Engineering 6, no. 1 (January 15, 2021): 86–101. http://dx.doi.org/10.1007/s41019-020-00149-7.

Abstract:

Query optimizer is at the heart of the database systems. Cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer introduces a plan enumeration algorithm to find a (sub)plan, and then uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples through an operator, plays a crucial role. Due to the inaccuracy in cardinality estimation, errors in cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first deeply study the causes behind the limitations above. Next, we review the techniques used to improve the quality of the three key components in the cost-based optimizer, cardinality estimation, cost model, and plan enumeration. We also provide our insights on the future directions for each of the above aspects.

39

Ngo, Hung Q. "RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES." Journal of Computer Science and Cybernetics 37, no. 3 (October 7, 2021): 223–38. http://dx.doi.org/10.15625/1813-9663/37/3/16129.

Abstract:

I would like to dedicate this little exposition to Prof. Phan Dinh Dieu, one of the giants and pioneers of Mathematics in Computer Science in Vietnam. In the past 15 years or so, new and exciting connections between fundamental problems in database theory and information theory have emerged. There are several angles one can take to describe this connection. This paper takes one such angle, influenced by the author's own bias and research results. In particular, we will describe how the cardinality estimation problem -- a corner-stone problem for query optimizers -- is deeply connected to information theoretic inequalities. Furthermore, we explain how inequalities can also be used to derive a couple of classic geometric inequalities such as the Loomis-Whitney inequality. A purpose of the article is to introduce the reader to these new connections, where theory and practice meet in a wonderful way. Another objective is to point the reader to a research area with many new open questions.

40

Reviriego, Pedro, and Daniel Ting. "Security of HyperLogLog (HLL) Cardinality Estimation: Vulnerabilities and Protection." IEEE Communications Letters 24, no. 5 (May 2020): 976–80. http://dx.doi.org/10.1109/lcomm.2020.2972895.

41

Reviriego, Pedro, Valerio Bruschi, Salvatore Pontarelli, Daniel Ting, and Giuseppe Bianchi. "Fast Updates for Line-Rate HyperLogLog-Based Cardinality Estimation." IEEE Communications Letters 24, no. 12 (December 2020): 2737–41. http://dx.doi.org/10.1109/lcomm.2020.3018336.

42

Shironoshita, E. Patrick, Michael T. Ryan, and Mansur R. Kabuka. "Cardinality estimation for the optimization of queries on ontologies." ACM SIGMOD Record 36, no. 2 (June 2007): 13–18. http://dx.doi.org/10.1145/1328854.1328856.

43

Zhou, Zhongbao, Qianying Jin, Helu Xiao, Qian Wu, and Wenbin Liu. "Estimation of cardinality constrained portfolio efficiency via segmented DEA." Omega 76 (April 2018): 28–37. http://dx.doi.org/10.1016/j.omega.2017.03.006.

44

Papapetrou, Odysseas, Wolf Siberski, and Wolfgang Nejdl. "Cardinality estimation and dynamic length adaptation for Bloom filters." Distributed and Parallel Databases 28, no. 2-3 (September 2, 2010): 119–56. http://dx.doi.org/10.1007/s10619-010-7067-2.

45

Borovica-Gajic, Renata, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. "Smooth Scan: robust access path selection without cardinality estimation." VLDB Journal 27, no. 4 (May 29, 2018): 521–45. http://dx.doi.org/10.1007/s00778-018-0507-8.

46

Tiakas, Eleftherios, Apostolos N. Papadopoulos, and Yannis Manolopoulos. "On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets." International Journal of Knowledge-Based Organizations 3, no. 4 (October 2013): 61–83. http://dx.doi.org/10.4018/ijkbo.2013100104.

Abstract:

In recent years there has been increasing interest in query processing techniques that take into consideration the dominance relationship between items to select the most promising ones, based on user preferences. Skyline and top-k dominating queries are examples of such techniques. A skyline query computes the items that are not dominated, whereas a top-k dominating query returns the k items with the highest domination score. To enable query optimization, it is important to estimate the expected number of skyline items as well as the maximum domination value of an item. In this article, the authors provide an estimation for the maximum domination value under the distinct values and attribute independence assumptions. The authors provide three different methodologies for estimating and calculating the maximum domination value and test their performance and accuracy. Among the proposed estimation methods, their method Estimation with Roots outperforms all others and returns the most accurate results. They also introduce the eliminating dimension, i.e., the dimension beyond which all domination values become zero, and provide an efficient estimation of that dimension. Moreover, the authors provide an accurate estimation of the skyline cardinality of a data set.

47

Negi, Parimarjan, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. "Flow-loss." Proceedings of the VLDB Endowment 14, no. 11 (July 2021): 2019–32. http://dx.doi.org/10.14778/3476249.3476259.

Abstract:

Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB) which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4–8× better 99th percentile runtimes on unseen templates with the same model architecture and training data.

48

Kadam, Sachin, Sesha Vivek Yenduri, Potharaju Hari Prasad, Rajesh Kumar, and Gaurav S. Kasbekar. "Rapid Node Cardinality Estimation in Heterogeneous Machine-to-Machine Networks." IEEE Transactions on Vehicular Technology 70, no. 2 (February 2021): 1836–50. http://dx.doi.org/10.1109/tvt.2021.3054594.

49

Lu, Yao, Srikanth Kandula, Arnd Christian König, and Surajit Chaudhuri. "Pre-training summarization models of structured datasets for cardinality estimation." Proceedings of the VLDB Endowment 15, no. 3 (November 2021): 414–26. http://dx.doi.org/10.14778/3494124.3494127.

Abstract:

We consider the problem of pre-training models which convert structured datasets into succinct summaries that can be used to answer cardinality estimation queries. Doing so avoids per-dataset training and, in our experiments, reduces the time to construct summaries by up to 100×. When datasets change, our summaries are incrementally updateable. Our key insights are to use multiple summaries per dataset, use learned summaries for columnsets for which other simpler techniques do not achieve high accuracy, and that analogous to similar pre-trained models for images and text, structured datasets have some common frequency and correlation patterns which our models learn to capture by pre-training on a large and diverse corpus of datasets.

50

Zheng, Yuanqing, and Mo Li. "Towards More Efficient Cardinality Estimation for Large-Scale RFID Systems." IEEE/ACM Transactions on Networking 22, no. 6 (December 2014): 1886–96. http://dx.doi.org/10.1109/tnet.2013.2288352.
