Lot of times I hear developer saying what is the big difference it is going to make when we execute “select * from tab1 where c1=1” instead of “select c2 from tab1 where c1=1”. (Even though they need only c2). I bet, performance impact will be big.
Benefits of selecting only required columns are listed below (Below list may not exclusive):
1. Network round trip between application server and database server.
2. Application server’s record set size will consume more memory which adds overhead to JVM’s GC engine.
3. Database server IO (Quite obvious IO will be having big impact on OLTP systems).
First 2 points are simple, straight and it is easy to understand. Regarding 3rd point, see my example below which shows IO impact it has because of bad query(selecting all columns).
SQL> create table tab1 as select * from all_objects;
Table created.
SQL> desc tab1
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED NOT NULL DATE
LAST_DDL_TIME NOT NULL DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
NAMESPACE NOT NULL NUMBER
EDITION_NAME VARCHAR2(30)
SQL> create index ind1 on tab1(object_id,object_name);
Index created.
SQL> exec dbms_stats.gather_table_stats( user, 'TAB1');
PL/SQL procedure successfully completed.
SQL> select * from tab1;
53762 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2211052296
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 53762 | 4882K| 228 (1)| 00:00:03 |
| 1 | TABLE ACCESS FULL| TAB1 | 53762 | 4882K| 228 (1)| 00:00:03 |
--------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4352 consistent gets
0 physical reads
0 redo size
6269625 bytes sent via SQL*Net to client
39840 bytes received via SQL*Net from client
3586 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
53762 rows processed
SQL> select object_id,object_name from tab1;
53762 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4126477464
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 53762 | 1522K| 87 (0)| 00:00:02 |
| 1 | INDEX FAST FULL SCAN| IND1 | 53762 | 1522K| 87 (0)| 00:00:02 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3887 consistent gets
0 physical reads
0 redo size
1845080 bytes sent via SQL*Net to client
39840 bytes received via SQL*Net from client
3586 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
53762 rows processed
SQL> select * from tab1 where object_id=2001
2 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 2086140937
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 93 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 93 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND1 | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=2001)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1349 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select object_name from tab1 where object_id=2001;
Execution Plan
----------------------------------------------------------
Plan hash value: 3743434205
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| IND1 | 1 | 29 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_ID"=2001)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
428 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
If we carefully look in our above example, query "select * from tab1" requires 4882K bytes whereas "select object_id,object_name from tab1" requires only 1522K bytes.
Likewise "select * from tab1 where object_id=2001" requires 93bytes whereas "select object_name from tab1 where object_id=2001" requires just 29 bytes.
All of this IO optimization is because of
- "INDEX FAST FULL SCAN" ==>Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. A fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized
- "INDEX RANGE SCAN" ==> An index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by rowid.
During any index access, if index has all column required to satisfy an SQL, table will not be accessed at all(avoids extra IO/buffer gets).
Note: Bytes in above examples means "Estimate by the cost-based approach of the number of bytes accessed by the operation".
As a single user, we might not see much performance impact; but at high load this will have huge impact. Whoever developing code, please mark this valuable performance tuning hints.
As usual, your comments are always welcome.
-Thiru