approximate-top-n-query-processing-18c

Approximate Top-N Query Processing (APPROX_RANK, APPROX_SUM, APPROX_COUNT) In
Oracle Database 18c

The APPROX_RANK, APPROX_SUM and APPROX_COUNT functions were introduced
in Oracle 18c to allow approximate top-n query processing. This extends
the approximate query processing that was introduced in the previous
two releases ([46]12cR1 and [47]12cR2).
* [48]Setup
* [49]Introduction
* [50]APPROX_SUM and APPROX_RANK
* [51]APPROX_COUNT and APPROX_RANK

Related articles.
* [52]Top-N Queries
* [53]APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database
12cR1 (12.1.0.2)
* [54]Approximate Query Processing in Oracle Database 12c Release 2
(12.2)
* [55]Analytic Functions : All Articles

Setup

The examples in this article require the following table. The table is
populated with random data, so your results may look a little different
than those presented here.
DROP TABLE t1 PURGE;

CREATE TABLE t1 AS
SELECT level AS id,
TRUNC(DBMS_RANDOM.value(1, 5))*10 department,
UPPER(TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.value(1, 6)),'j'), 'jsp')) AS rec
ord_type,
ROUND(DBMS_RANDOM.value(1,51),2) AS record_value
FROM dual
CONNECT BY level <= 100000;

You can get an idea of your data set using the following query.
COLUMN record_type FORMAT A12

SELECT department,
record_type,
COUNT(*) AS record_count,
SUM(record_value) AS sum_record_val
FROM t1
GROUP BY department, record_type
ORDER BY department, record_type;

DEPARTMENT RECORD_TYPE RECORD_COUNT SUM_RECORD_VAL
---------- ------------ ------------ --------------
10 FIVE 5012 130044.4
10 FOUR 5042 131431.59
10 ONE 4912 128534.48
10 THREE 4922 126185.08
10 TWO 5160 133586.67
20 FIVE 4816 124278.87
20 FOUR 4983 130158.06
20 ONE 4961 128156.44
20 THREE 5049 131219.07
20 TWO 5018 129587.6
30 FIVE 5059 131168.33
30 FOUR 4969 128337.92
30 ONE 5042 131607.88
30 THREE 5115 131455.11
30 TWO 4918 127673.23
40 FIVE 4965 128567.84
40 FOUR 5007 128968.23
40 ONE 5017 130735.96
40 THREE 4944 130084.71
40 TWO 5089 131310.33

SQL>

Introduction

In this section I'll keep showing the same query, highlighting
different parts of the statement to illustrate the point being made.
It's not important what the query actually does at this point. Just
focus on the syntax. The examples are all based around the APPROX_SUM
function, but the points apply equally to the APPROX_COUNT function.

The approximate top-n query processing currently focuses on scenarios
where you are ranking groups of data based on a SUM of values or a
COUNT of the number of rows in the group, hence the APPROX_SUM and
APPROX_COUNT functions. These functions can only be used in combination
with the APPROX_RANK function, which must be present in the HAVING
clause of the query, but can also appear in the SELECT list, if you
want to display the rank of the data.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 4;

The same approximate function used in the SELECT list must also appear
in the ORDER BY clause of the APPROX_RANK function.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 4;

The ORDER BY clause is mandatory and must always include the DESC
keyword, so this can't be used to perform bottom-n queries like most
other methods.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 4;

The PARTITION BY clause must include all or a subset of the columns in
the GROUP BY clause. If there is more than one column in the GROUP BY
clause the PARTITION BY clause is mandatory. The PARTITION BY clause
can be omitted if there is only a single column in the GROUP BY clause,
as this will automatically be the partition.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 4;

The HAVING clause limits the ranks returned by the query. This is the
top-n.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 4;

The APPROX_SUM and APPROX_COUNT functions can accept a second parameter
of 'MAX_ERROR', which causes them to display the maximum error between
the actual and approximate sum or count respectively. I've not seen
this return anything but "0" in the data sets I've tried.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 5;

The following sections will provide some examples of the usage.

APPROX_SUM and APPROX_RANK

In this section we will show some examples of using the APPROX_SUM and
APPROX_RANK functions together.

The following example displays the top-2 record types, based on the
approximate sum of the values per record type across all departments.
SELECT record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

RECORD_TYPE APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
------------ -------------- ----------------- ---------------
ONE 519034.76 0 2
TWO 522157.83 0 1

SQL>

The following example displays the top-2 record types in each
department, based on the approximate sum of the values per record type
for each department.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DES
C) <= 2
ORDER BY 1, 5;

DEPARTMENT RECORD_TYPE APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
---------- ------------ -------------- ----------------- ---------------
10 TWO 133631 0 1
10 FOUR 131452 0 2
20 THREE 131247 0 1
20 FOUR 130162 0 2
30 ONE 131646 0 1
30 THREE 131482 0 2
40 TWO 131315 0 1
40 ONE 130750 0 2

SQL>

The following example displays the top department, based on the
approximate sum of the values per department for all record types.
SELECT department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 4;

DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
---------- -------------- ----------------- ---------------
30 650242.47 0 1

SQL>

The following example displays the top department for each record type,
based on the approximate sum of the values per department for each
record type.
SELECT record_type,
department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DE
SC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DE
SC) <= 1
ORDER BY 1, 5;

RECORD_TYPE DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
------------ ---------- -------------- ----------------- ---------------
FIVE 30 131171 0 1
FOUR 10 131452 0 1
ONE 30 131646 0 1
THREE 30 131482 0 1
TWO 10 133631 0 1

SQL>

APPROX_COUNT and APPROX_RANK

In this section we will show some examples of using the APPROX_COUNT
and APPROX_RANK functions together.

The following example displays the top-2 record types, based on the
approximate number of rows per record type across all departments.
SELECT record_type,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 4;

RECORD_TYPE APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
------------ ---------------- ------------------- ---------------
THREE 20030 0 2
TWO 20185 0 1

SQL>

The following example displays the top-2 record types in each
department, based on the approximate number of rows per record type in
each department.
SELECT department,
record_type,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) AS app
rox_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 5;

DEPARTMENT RECORD_TYPE APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
---------- ------------ ---------------- ------------------- ---------------
10 TWO 5160 0 1
10 FOUR 5042 0 2
20 THREE 5049 0 1
20 TWO 5018 0 2
30 THREE 5115 0 1
30 FIVE 5059 0 2
40 TWO 5089 0 1
40 ONE 5017 0 2

SQL>

The following example displays the top department, based on the
approximate number of rows of all record types in the department
SELECT department,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 4;

DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
---------- ---------------- ------------------- ---------------
30 25103 0 1

SQL>

The following example displays the top department for each record type,
based on the approximate number of rows of that record type in the
department
SELECT record_type,
department,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) AS ap
prox_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 5;

RECORD_TYPE DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
------------ ---------- ---------------- ------------------- ---------------
FIVE 30 5059 0 1
FOUR 10 5042 0 1
ONE 30 5042 0 1
THREE 30 5115 0 1
TWO 10 5160 0 1

SQL>

For more information see:
* [56]APPROX_RANK
* [57]APPROX_COUNT
* [58]APPROX_SUM
* [59]Top-N Queries
* [60]APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database
12cR1 (12.1.0.2)
* [61]Approximate Query Processing in Oracle Database 12c Release 2
(12.2)
* [62]Analytic Functions : All Articles

Hope this helps. Regards Tim...

[63]Back to the Top.
[INS: :INS]
--------------------------
Source .... : https://oracle-base.com/articles/18c/approximate-top-n-query-processing-18c
--------------------------

Social Buttons

Breaking

Thursday, March 22, 2018

approximate-top-n-query-processing-18c

No comments:

Post a Comment

Total Pageviews

Follow Us

Facebook

Blog Archive

Search This Blog

About Me

Tags

Categories

LABELS

Pages

Labels

Recent

Comments

Technology

Blog Archive

Recent Post

Recent In Internet

Popular