Tuesday, April 28, 2009

Oracle Column Selectivity – An Performance Impact

Lot of times I hear developer saying what is the big difference it is going to make when we execute “select * from tab1 where c1=1” instead of “select c2 from tab1 where c1=1”. (Even though they need only c2).  I bet, performance impact will  be big.

Benefits of selecting only required columns are listed below (Below list may not exclusive):
1. Network round trip between application server and database server.
2. Application server’s record set size will consume more memory which adds overhead to JVM’s GC engine.
3. Database server IO (Quite obvious IO will be having big impact on OLTP systems).

First 2 points are simple, straight and it is easy to understand. Regarding 3rd point, see my example below which shows IO impact it has because of bad query(selecting all columns).

SQL> create table tab1 as select * from all_objects;

Table created.

SQL> desc tab1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 OBJECT_NAME                               NOT NULL VARCHAR2(30)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                   NOT NULL DATE
 LAST_DDL_TIME                             NOT NULL DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                 NOT NULL NUMBER
 EDITION_NAME                                       VARCHAR2(30)

SQL> create index ind1 on tab1(object_id,object_name);

Index created.


SQL> exec dbms_stats.gather_table_stats( user, 'TAB1');

PL/SQL procedure successfully completed.

SQL> select * from tab1;

53762 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2211052296

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 53762 |  4882K|   228   (1)| 00:00:03 |
|   1 |  TABLE ACCESS FULL| TAB1 | 53762 |  4882K|   228   (1)| 00:00:03 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       4352  consistent gets
          0  physical reads
          0  redo size
    6269625  bytes sent via SQL*Net to client
      39840  bytes received via SQL*Net from client
       3586  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      53762  rows processed

SQL> select object_id,object_name from tab1;

53762 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4126477464

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 53762 |  1522K|    87   (0)| 00:00:02 |
|   1 |  INDEX FAST FULL SCAN| IND1 | 53762 |  1522K|    87   (0)| 00:00:02 |
-----------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3887  consistent gets
          0  physical reads
          0  redo size
    1845080  bytes sent via SQL*Net to client
      39840  bytes received via SQL*Net from client
       3586  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      53762  rows processed

SQL> select * from tab1 where object_id=2001
  2  ;


Execution Plan
----------------------------------------------------------
Plan hash value: 2086140937

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time   |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    93 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB1 |     1 |    93 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND1 |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=2001)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       1349  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select object_name from tab1 where object_id=2001;


Execution Plan
----------------------------------------------------------
Plan hash value: 3743434205

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    29 |     2   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND1 |     1 |    29 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"=2001)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        428  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 


If we carefully look in our above example, query "select * from tab1" requires 4882K bytes whereas "select object_id,object_name from tab1" requires only 1522K bytes.

Likewise "select * from tab1 where object_id=2001" requires 93bytes whereas "select object_name from tab1 where object_id=2001" requires just 29 bytes.

All of this IO optimization is because of 
  • "INDEX FAST FULL SCAN" ==>Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. A fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized
  • "INDEX RANGE SCAN" ==> An index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by rowid.

During any index access, if index has all column required to satisfy an SQL, table will not be accessed at all(avoids extra IO/buffer gets).

Note: Bytes in above examples means "Estimate by the cost-based approach of the number of bytes accessed by the operation".

As a single user, we might not see much performance impact; but at high load this will have huge impact. Whoever developing code, please mark this valuable performance tuning hints.

As usual, your comments are always welcome.

-Thiru