Monday, August 18, 2014

Zone Maps

Zone Maps is a new feature that got officially introduced in 12.1.0.2.0 so I've decided to take a closer look.

From the Oracle Documentation:
For full table access, zone maps allow I/O pruning of data based on the physical location of the data on disk, acting like an anti-index.

Let's start by creating a test table:
SQL> create table t pctfree 95 clustering by linear order (n) as
  2   select level n, rpad('x', 200, 'x') v
  3    from dual
  4    connect by level <= 5000;
 
Table created
I've used a high pctfree setting to make sure the table gets spread out on disk -- each row will occupy it's own block:
SQL> select count(*) num_rows,
  2    count(distinct dbms_rowid.rowid_block_number(rowid)) num_blocks
  3   from t;
 
  NUM_ROWS NUM_BLOCKS
---------- ----------
      5000       5000
Zone Maps do not require attribute clustering, however, I did use clustering in this particular case to make sure that value ranges for column N do not overlap when physically stored on disk to make it easier to see what's happening when a Zone Map is created.

Now let's create a Zone Map:
SQL> create materialized zonemap zm$t on t (n);
 
Done
The first thing to notice is we now got additional table in our schema with the same name as a Zone Map:
SQL> select * from zm$t order by zone_id$;
 
    ZONE_ID$    MIN_1_N    MAX_1_N ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
------------ ---------- ---------- ----------- ----------- ----------
324421046272          1        748           0           0        748
324421046273        749       1756           0           0       1008
324421046274       1757       2764           0           0       1008
324421046275       2765       3772           0           0       1008
324421046276       3773       4780           0           0       1008
324421046277       4781       5000           0           0        220
 
6 rows selected
As you can see we've got six zones defined with most of them covering a range of about thousand rows with the exception of the first and the last ones. I can now map each ZONE_ID$ to it's respective block range on disk:
SQL> select zone_id$, min_block_id, max_block_id, zone_rows$
  2  from (
  3  select zm$t.zone_id$,
  4    min(dbms_rowid.rowid_block_number(t.rowid)) min_block_id,
  5    max(dbms_rowid.rowid_block_number(t.rowid)) max_block_id,
  6    max(zone_rows$) zone_rows$
  7   from t, zm$t
  8   where t.n between zm$t.min_1_n and zm$t.max_1_n
  9   group by zm$t.zone_id$
 10  ) order by zone_id$;
 
  ZONE_ID$   MIN_BLOCK_ID MAX_BLOCK_ID ZONE_ROWS$
------------ ------------ ------------ ----------
324421046272          179         1023        748
324421046273         1026         2047       1008
324421046274         2050         3071       1008
324421046275         3074         4095       1008
324421046276         4098         5119       1008
324421046277         5122         5343        220
 
6 rows selected
Based on a couple more tests I've done the algorithm appears to work itself out until the fist block number in a segment which divides evenly by 1024 after which all subsequent blocks get mapped to 8MB regions. The last map has 221 blocks because that's where the end of the last table block happens to be.

Zone Maps seems to be completely independent from the actual extents on disk. In my tests I was able to get multiple Zone Maps pointing at the same extent and in the case above we have multiple extents covered by the same Zone Map. In all cases zones were 8MB in size (more on that a little bit later). By the way if you're wondering why do we have 1008 rows and a little bit strange starting block_ids (i.e. 1026 instead of 1024 and so on) remember that the fist couple blocks are used for first level bmb and do not store actual table data.

So how is this all pieced together?

SYS_OP_ZONE_ID

SYS_OP_ZONE_ID function computes ZONE_ID$ for a given ROWID value. The calculation appears to be a straightforward math based on the ROWID value and assumes 8MB Zone Map chunks -- this explains why Zone Maps are created adjacent to 8MB boundaries since ZONE_ID$ values in the Zone Map are computed using the same function:
SQL> select rowid, SYS_OP_ZONE_ID(rowid)
  2   from t
  3   where rownum=1;
 
ROWID              SYS_OP_ZONE_ID(ROWID)
------------------ ---------------------
AAAS4kAAFAAAACzAAA          324421046272

Essentially if we imagine the entire possible database space divided into 8MB regions this function would return into which region a particular ROWID value would belong.

SYS_ZMAP_FILTER

Let's look at the explain plan for the following query:
SQL> select * from t where n=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 2931408918

--------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%C
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     1 |   205 |  1380
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| T    |     1 |   205 |  1380
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("N"=1)
       filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
              BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_N" > :1 OR zm."MAX_1_N"
              < :2) THEN 3 ELSE 2 END END FROM "ROOT"."ZM$T" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY
              zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),1,1)<3 AND "N"=1)
We can see that SYS_ZMAP_FILTER appears to be the function involved in figuring out which Zone Maps needs to be accessed in order to execute the query. The condition inside the query (zm."MIN_1_N" > :1 OR zm."MAX_1_N" < :2) will be used to eliminate Zone Maps which do not have a value we're looking for and is dynamically constructed based on the predicate(-s) we have in the query. From here ZONE_ID$ can be mapped back to ROWID ranges (a reverse operation relative to SYS_OP_ZONE_ID) in order to scan only required portions of the data on disk.

This looks to be a very exciting feature and I can't help but think that it's a little bit disappointing that it's limited to Exadata storage only as it could have been very useful on other systems due to lack of storage indexes support.

3 comments:

  1. Hi Alex,

    why do you think ZM is limited to Exadata only? I couldn't find anything in the official docs that suggest that.
    Also 8MB zones come from the default parameter scale=10

    ReplyDelete
    Replies
    1. I was a little bit puzzled how SYS_OP_ZONE_ID would be able to tell which scale was used when creating a Zone Map. In my tests the function could accept an arbitrary ROWID (even the ones which did not belong to any tables in the database or were outside of possible ROWID range for a given database). It turned out SYS_OP_ZONE_ID accepts a second parameter (which I initially did not realize it had) to tell it what scale to use for slicing. During the parse time the scale is looked up and, in case it was non-default one, it will be passed into SYS_OP_ZONE_ID.

      Delete
  2. HI Timur,

    I could not create a Zone Map with any non-Exadata storage as it results in an error:

    ORA-31969: ZONEMAP not supported for table stored in tablespace of this storage type

    Somebody over at Kerry's blog pointed out that licensing guide mentions that the feature is limited to Exa/SC and here it is:

    http://docs.oracle.com/database/121/DBLIC/options.htm#DBLIC152

    Zone Maps (Available starting with Oracle Database 12c Release 1 (12.1.0.2); Requires Exadata or Supercluster). There seems to be no technical reason for things to be this way as I verified that Zone Maps work even when storage servers are in block I/O mode.

    Good find about the scale parameter! So 2^10 (=1024) is the default Zone Map size.

    ReplyDelete