Exploiting data-distribution patterns in modeling tuple selectivities in a database (Q1803861)

The paper concerns an estimation of the number of tuples satisfying a database query. The model is based on data distribution and is independent of an used database system. Queries are considered over one relation modelled as multidimensional bit map. A partial-range query specifies a hyperrectangle within a bit map. The main idea of the paper is the following: if the 1's in the bit map are distributed not homogeneously, then the map is divided into cells in which the 1's are distributed homogeneously. Much of the paper is devoted to obtaining the best trade-off between storage requirements for bit map representation and accuracy of estimates of the number of 1's is any hyperrectangle. The authors discuss in the first four chapters a number of other approaches related to the problem. In Chapter 5, they define a new notion of homogeneity and compare it with related notions occurred in the literature. Chapter 6 contains a new method based on sampling. It is shown that it requires a sample size of 100 to produce accurate estimates 95\% of the time. The paper is written in detail but not homogeneously. All new ideas are presented rather briefly in comparing to the parts explaining other approaches etc. Any discussion concerning an implementation of the method in a database environment is not presented here.

0 references

zbMATH Keywords

tuple selectivity

0 references

query evaluation

0 references

data distribution

0 references

reviewed by

Jaroslav Pokorný

0 references

MaRDI profile type