Notes of "Quotient Cube: How to Summarize the Semantics of a Data Cube"
Roll up:向上综合
Drill down:向下细化或者向下钻取
w.r.t. :with respect to 关于
lattice: A lattice is partially ordered set (L, ≼) such that every pair of elements in has a least upper bound (lub) and a greatest lower bound (glb)
A convex partition retains semantics:
C1 roll up C2, C2 roll up C3, C1 C3 ∈CLS, and we get C2∈CLS
Convexity means "holes" cannot exist in classes.
The equivalence relation defined solely on the basis of equality of count values is always convex. Suppose the domain of the measure attribute contain only non-negative (or only non-positive) values. Then equivalence defined solely on the basis of equality of sum values is convex.
The proposition follows from the observation that whenever there is a cell c'' in between c and c', i.e. c≼c''≼c', c'' must contain all tuples that c' has, and for COUNT and SUM on positive measure, it cannot form a hole.
We say an equivalence class is connected if its local internal structure is a connected DAG. DAG is short for Directed Acyclic Graphic ( 有向无圈图 ).
Specifically, say that two cells c and c' are cover equivalent, c ≡ Cov c', provided the tuples contained in those cells is the same.
Let Pcov be the partition associated with the cover equivalence relation ≡Cov. Then Pcov is necessarily convex.
For a cell c, a tuple t in base table is in c's cover if can be rolled up to c.
All cells having the same cover are in a class.
Cells c1 and c2 have the same cover -> there must be some common ancestor c3 of c1 and c2 st c3 has the same cover.
Cells c1 and c2 are connected if a series of rollup/drilldown operation starting from c1 can touch c2.
Intuitively, (each class of ) a partition should be connected.
All cells in a cover partition carry the same aggregate value w.r.t. any aggregate function. But cells in a class of MIN() may have different covers.
For COUNT() and SUM() (Positive), cover equivalence coincides with aggregate equivalence.
We say that ≡ is a congruence provided for every c, c', d, d' ∈ L, whenever we have c≡c', d≡d', and c ≼ d, we have c' ≼ d.
Let (L, ≼) be any cube lattice and ≡ any equivalence relation on its cells. We say that ≡ is a weak congruence provided for every c, c', d, d' ∈ L, whenever we have c≡c', d≡d', c ≼ d, and d' ≼ c, we also have c ≡ d
Convex ó no "hole" in the class ó weak Congruence
They preserve the rollup/drilldown semantics.
Quotient cube lattice is the lattice of convex classes.
Input: base table B, monotone aggregate function f;
Output: Quotient cube
Method:
Step 1: let b = (all, ..., all); call DFS(b, B, 0);
Step 2: merge those temp classes sharing some comm. Upper bounds; if C1 and C2 sharing a same upper bound c, then merge them;
//e.g. if MIN((a, b)) = MIN((a, all)) = MIN((all, b)); the temp classes of the two cells (a, all) and (all, b) should be merged, since they share the upper bound(a, b)
Step 3: output the classes, and their bounds, but only output true lower bounds, by removing lower bounds that have descendants in the merged class;
//e.g. when we process DFS on (all, b, all), it may in turn call a DFS on (all, b, c) and then form a temp class C1 = {(all, b, c), (d, b, c)}. Later, when the search branches to (all, all, c), it may form another temp class {(all, all, c),(d, b, c)}. The two classes share a common upper bound and so are merged. In the merged class, (all, b, c) is not a lower bound anymore and hence should be removed.
Function DFS(c, Bc, k)
//c is a cell and Bc is the corresponding partition of the base table;
Step 1: Compute aggregate of cell c by one scan of Bc, in the same scan, collect dimension-value statistics info for CountSort;
//similarly to the BUC algorithm;
Step 2: compute the set of upper bounds UB(c) of the class of c, by "jumping" to the appropriate upper bounds depending on the aggregate function used;
//see text for details
Step 3: Record a temp class with lower bound c and upper bound(s) in UB(c);
Step 4: for each upper bound in UB(c), do
[4.1] if there is some j<k s.t. c[j]=all and d[j]!=all then such a bound has been examined before;do nothing;
//e.g. suppose when searching (all, all, c, all), we find that (a, all, c, d) is an upper bound. Then it must have been explored in the (a, all, all, all) branch.
[4.2] else for each k<j≤n, s.t. d[j] = all do
For each value x in dimension j of base table
Let d[j] = x ; form Bd;
If Bd is not empty, call DFS(d, Bd, j);
Step 5: Return
输入:基本表 B,单一的聚合函数f
输出:Quotient Cube;
方法:
//如果MIN((a, b))=MIN( (a, all))=MIN((all, b)),临时类的两个方格(a, all) 和 (all, b)应该合并,因为它们具有相同的上确界(a, b)
//e.g.当我们在(all, b, all)调用DFS时,它可能在(all, b, c)上轮流调用DFS,并生成一个临时类C1={ (all, b, c), (d, b, c)}。然后,当它查找到分支(all, all, c), 他可能生成成另外一个临时类{ (all, all ,c), (d, b, c)}。这两个临时类具有相同的上确界,因此合并他们。在这两个被合并的类中,(all, b, c)不在是一个下确界了,因此必须移除它。
函数 DFS(c, Bc, k)
//c 是一个数据方格,Bc 是相对应的基本表的一部分
//e.g. 当我们要查找(all ,all ,c ,all)的时候,我们发现(a, all ,c ,d ) 是一个上确界。于是它一定已在(a, all, all, all)分支查找过了。
对于每一个在基本表的维度j中的x,让d[j]=x; 生成Bd;
如果Bb是非空的,调用DFS(d, Bd, j);
Input: Base table B, non-monotone aggregate function f;
Output: Quotient cube w.r.t a maximal convex partition;
Method:
For each parent class P of C at the next higher level with the same measure value as C {