Tuesday, October 7, 2008

Response-Time Analysis Made Easy in Oracle Database 10g

Response-Time Analysis Made Easy in Oracle Database 10g

 

With Oracle Database 10g, many previously difficult-to-attain response-time metrics are now at your fingertips.

Downloads for this article:

 Oracle Database 10g

 

Historically, in trying to achieve maximum database performance, Oracle DBAs and performance analysts have fought an uphill battle to obtain solid response time metrics for system as well as user session activity. The problem facing DBAs has always had two facets: first, determining exactly "where" the database or user sessions have been spending their time; and second, determining the objective nature of the user experience.

Given all the possible activities and interactions within the database, these tasks are far from trivial. The Oracle Wait Interface, which was introduced many versions ago, has been a great jump-start for admins who know how to use it, but even it has lacked the ideal ability to tell a DBA how well the system or user sessions have efficiently processed transactions or queries. Enabling and poring over trace files can garner this level of detail, but for most overworked DBAs who have to manage large database farms, this exercise is a luxury.

Fortunately, those Oracle DBAs who have upgraded to Oracle Database 10g will find major response-time enhancements that allow a much better picture to be presented about system and session-level response-time metrics. Most significantly, the Oracle Automatic Database Diagnostic Monitor (ADDM) provides insights into response times and much more by automatically analyzing collected statistics, identifying problem areas, and even providing recommendations through the Oracle Enterprise Manager Grid Control GUI.

Furthermore, and most relevant to our discussion here, historical mechanisms in Oracle Database 10g allow DBAs to look back in time to perform their own response-time trend analysis, which helps them determine peak and nonpeak transaction/system times as well as locate rogue processes and SQL that elongate batch cycles or ETL jobs.

In this article, I'll explore the use of some of these historical mechanisms at the system, session, and SQL levels. For more information about ADDM, see the Oracle documentation as well as the "ADDM and SQL Tuning Advisor" installment of Arup Nanda's "Oracle Database 10g: Top 20 Features for DBAs."

System-Level Response-Time Analysis

Starting at the global or system level, DBAs typically want answers to these questions:

•           In general, how well is my database running? What defines efficiency?

•           What average response time are my users experiencing?

•           Which activities affect overall response time the most?

The answers to these questions have been fairly elusive for DBAs before Oracle Database 10g, but now such metrics can be somewhat easy to capture if you happen to be using the latest and greatest Oracle database.

First of all, part of the answer to how well, in general, a database is running can be obtained by issuing this query in Oracle Database 10g:

select  METRIC_NAME,

        VALUE

from    SYS.V_$SYSMETRIC

where   METRIC_NAME IN ('Database CPU Time Ratio',

                        'Database Wait Time Ratio') AND

        INTSIZE_CSEC =

        (select max(INTSIZE_CSEC) from SYS.V_$SYSMETRIC);

 

METRIC_NAME                         VALUE

------------------------------ ----------

Database Wait Time Ratio                6

Database CPU Time Ratio                94

The Oracle Database 10g V$SYSMETRIC view contains several excellent response-time metrics, two of which are the Database Wait Time Ratio and Database CPU Time Ratio. The query above shows the latest snapshot of these two statistics, which help you determine if your database is currently experiencing a high percentage of waits/bottlenecks vs. smoothly running operations. The Database CPU Time Ratio is calculated by dividing the amount of CPU expended in the database by the amount of "database time," which is defined as the time spent by the database on user-level calls (with instance background process activity being excluded). High values (90-95+ percent) are good and indicate few wait/bottleneck actions, but take this threshold only as a general rule of thumb because every system is different.

You can also take a quick look over the last hour to see if the database has experienced any dips in overall performance by using this query:

select  end_time,

        value

from    sys.v_$sysmetric_history

where   metric_name = 'Database CPU Time Ratio'

order by 1;

 

END_TIME                  VALUE

-------------------- ----------

22-NOV-2004 10:00:38         98

22-NOV-2004 10:01:39         96

22-NOV-2004 10:02:37         99

22-NOV-2004 10:03:38        100

22-NOV-2004 10:04:37         99

22-NOV-2004 10:05:38         77

22-NOV-2004 10:06:36        100

22-NOV-2004 10:07:37         96

22-NOV-2004 10:08:39        100

.

.

And, you can get a good idea of the minimum, maximum, and average values of overall database efficiency by querying the V$SYSMETRIC_SUMMARY view with a query such as this:

select  CASE METRIC_NAME

            WHEN 'SQL Service Response Time' then 'SQL Service Response Time (secs)'

            WHEN 'Response Time Per Txn' then 'Response Time Per Txn (secs)'

            ELSE METRIC_NAME

            END METRIC_NAME,

                        CASE METRIC_NAME

            WHEN 'SQL Service Response Time' then ROUND((MINVAL / 100),2)

            WHEN 'Response Time Per Txn' then ROUND((MINVAL / 100),2)

            ELSE MINVAL

            END MININUM,

                        CASE METRIC_NAME

            WHEN 'SQL Service Response Time' then ROUND((MAXVAL / 100),2)

            WHEN 'Response Time Per Txn' then ROUND((MAXVAL / 100),2)

            ELSE MAXVAL

            END MAXIMUM,

                        CASE METRIC_NAME

            WHEN 'SQL Service Response Time' then ROUND((AVERAGE / 100),2)

            WHEN 'Response Time Per Txn' then ROUND((AVERAGE / 100),2)

            ELSE AVERAGE

            END AVERAGE

from    SYS.V_$SYSMETRIC_SUMMARY

where   METRIC_NAME in ('CPU Usage Per Sec',

                      'CPU Usage Per Txn',

                      'Database CPU Time Ratio',

                      'Database Wait Time Ratio',

                      'Executions Per Sec',

                      'Executions Per Txn',

                      'Response Time Per Txn',

                      'SQL Service Response Time',

                      'User Transaction Per Sec')

ORDER BY 1

 

METRIC_NAME                       MINIMUM    MAXIMUM    AVERAGE

------------------------------ ---------- ---------- ----------

CPU Usage Per Sec                       0          7          1

CPU Usage Per Txn                       1         29          8

Database CPU Time Ratio                61        100         94

Database Wait Time Ratio                0         39          5

Executions Per Sec                      2         60          8

Executions Per Txn                     16        164         41

Response Time Per Txn (secs)            0        .28        .08

SQL Service Response Time (sec          0          0          0

User Transaction Per Sec                0          1          0

The query above contains more response-time metrics than simply the Database CPU and Wait Time Ratios (we'll cover those later), but you can see the benefit in being able to acquire this information. For this particular instance, the average Database CPU Time Ratio is 94, which is well within our acceptable limits.

The next question DBAs pose at the system level involves the average level of response time that their user community is experiencing. (Prior to Oracle Database 10g this type of data was difficult to capture, but not anymore.) The query shown above that interrogates the V$SYSMETRIC_SUMMARY view tells us what we need to know. If complaints of unacceptable response times are mounting from users, the DBA can check the Response Time Per Txn and SQL Service Response Time metrics to see if a database issue exists. For example, the statistics shown above report that the maximum response time per user transaction has been only .28 second, with the average response time being a blazing .08 second. Oracle certainly wouldn't be to blame in this case.

If, however, response times are longer than desired, the DBA will then want to know what types of user activities are responsible for making the database work so hard. Again, before Oracle Database 10g, this information was more difficult to acquire, but now the answer is only a query away:

select  case db_stat_name

            when 'parse time elapsed' then

                'soft parse time'

            else db_stat_name

            end db_stat_name,

        case db_stat_name

            when 'sql execute elapsed time' then

                time_secs - plsql_time

            when 'parse time elapsed' then

                time_secs - hard_parse_time

            else time_secs

            end time_secs,

        case db_stat_name

            when 'sql execute elapsed time' then

                round(100 * (time_secs - plsql_time) / db_time,2)

            when 'parse time elapsed' then

                round(100 * (time_secs - hard_parse_time) / db_time,2) 

            else round(100 * time_secs / db_time,2) 

            end pct_time

from

(select stat_name db_stat_name,

        round((value / 1000000),3) time_secs

    from sys.v_$sys_time_model

    where stat_name not in('DB time','background elapsed time',

                            'background cpu time','DB CPU')),

(select round((value / 1000000),3) db_time

    from sys.v_$sys_time_model

    where stat_name = 'DB time'),

(select round((value / 1000000),3) plsql_time

    from sys.v_$sys_time_model

    where stat_name = 'PL/SQL execution elapsed time'),

(select round((value / 1000000),3) hard_parse_time

    from sys.v_$sys_time_model

    where stat_name = 'hard parse elapsed time')

order by 2 desc;

 

 

DB_STAT                          TIME_SECS       PCT_TIME

-----------------------------    ---------       --------

sql execute elapsed time         13263.707       45.84                                

PL/SQL execution elapsed time    13234.738       45.74                                

hard parse elapsed time           1943.687        6.72                                 

soft parse time                    520.584         1.8

.

.

The example output above shows a database that has spent the vast majority of its time handling SQL and PL/SQL requests. Complete descriptions of all the statistics supported by the V$SYS_TIME_MODEL view can be found here.

In addition to active time, a DBA will want to know the global wait times as well. Prior to Oracle Database 10g, a DBA had to view individual wait events to understand waits and bottlenecks, but now Oracle provides a summary/rollup mechanism for waits via wait classes:

select  WAIT_CLASS,

        TOTAL_WAITS,

        round(100 * (TOTAL_WAITS / SUM_WAITS),2) PCT_WAITS,

        ROUND((TIME_WAITED / 100),2) TIME_WAITED_SECS,

        round(100 * (TIME_WAITED / SUM_TIME),2) PCT_TIME

from

(select WAIT_CLASS,

        TOTAL_WAITS,

        TIME_WAITED

from    V$SYSTEM_WAIT_CLASS

where   WAIT_CLASS != 'Idle'),

(select  sum(TOTAL_WAITS) SUM_WAITS,

        sum(TIME_WAITED) SUM_TIME

from    V$SYSTEM_WAIT_CLASS

where   WAIT_CLASS != 'Idle')

order by 5 desc;

 

WAIT_CLASS      TOTAL_WAITS  PCT_WAITS TIME_WAITED_SECS   PCT_TIME

--------------- ----------- ---------- ---------------- ----------

User I/O            2245204       7.48          4839.43      54.39

System I/O          2438387       8.12          2486.21      27.94

Application          920385       3.07           513.56       5.77

Other                 39962        .13           422.36       4.75

Commit               200872        .67           284.76        3.2

Network            24133213      80.38           162.26       1.82

Concurrency            6867        .02           102.63       1.15

Configuration         39377        .13            86.21        .97

It's much easier to tell now that the bulk of overall wait time is due, for example, to user I/O waits than to try to tally individual wait events to get a global picture. As with response-time metrics, you can also look back in time over the last hour with a query like this one:

select  to_char(a.end_time,'DD-MON-YYYY HH:MI:SS') end_time,

        b.wait_class,

        round((a.time_waited / 100),2) time_waited

from    sys.v_$waitclassmetric_history a,

        sys.v_$system_wait_class b

where   a.wait_class# = b.wait_class# and

        b.wait_class != 'Idle'

order by 1,2;

 

END_TIME             WAIT_CLASS      TIME_WAITED

-------------------- --------------- -----------

22-NOV-2004 11:28:37 Application               0

22-NOV-2004 11:28:37 Commit                  .02

22-NOV-2004 11:28:37 Concurrency               0

22-NOV-2004 11:28:37 Configuration             0

22-NOV-2004 11:28:37 Network                 .01

22-NOV-2004 11:28:37 Other                     0

22-NOV-2004 11:28:37 System I/O              .05

22-NOV-2004 11:28:37 User I/O                  0

.

.

You can, of course, just focus on a single SID with the V$SESS_TIME_MODEL view and obtain data for all statistical areas of a session. You can also view current session wait activity using the new wait classes using the following query:

select  a.sid,

        b.username,

        a.wait_class,

        a.total_waits,

        round((a.time_waited / 100),2) time_waited_secs

from    sys.v_$session_wait_class a,

        sys.v_$session b

where   b.sid = a.sid and

        b.username is not null and

        a.wait_class != 'Idle'

order by 5 desc;

 

SID USERNAME   WAIT_CLASS      TOTAL_WAITS TIME_WAITED_SECS

--- ---------- --------------- ----------- ----------------

257 SYSMAN     Application          356104            75.22

255 SYSMAN     Commit                14508            25.76

257 SYSMAN     Commit                25026            22.02

257 SYSMAN     User I/O              11924            19.98

.

.

.

After this stage, you can check the standard individual wait events as you've been able to do in earlier versions of Oracle with V$SESSION_WAIT and V$SESSION_EVENT. You'll also find the new wait classes in these two modified views with Oracle Database 10g.

If you need to look back in time to discover what sessions were logged on and consuming the most resources, you can use the following query. In the example below, we're looking at activity from midnight to 5 a.m. on November 21, 2004, that involved user I/O waits:

select  sess_id,

        username,

        program,

        wait_event,

        sess_time,

        round(100 * (sess_time / total_time),2) pct_time_waited

from

(select a.session_id sess_id,

        decode(session_type,'background',session_type,c.username) username,

        a.program program,

        b.name wait_event,

        sum(a.time_waited) sess_time

from    sys.v_$active_session_history a,

        sys.v_$event_name b,

        sys.dba_users c

where   a.event# = b.event# and

        a.user_id = c.user_id and

        sample_time > '21-NOV-04 12:00:00 AM' and

        sample_time < '21-NOV-04 05:00:00 AM' and

        b.wait_class = 'User I/O'

group by a.session_id,

        decode(session_type,'background',session_type,c.username),

        a.program,

        b.name),

(select sum(a.time_waited) total_time

from    sys.v_$active_session_history a,

        sys.v_$event_name b

where   a.event# = b.event# and

        sample_time > '21-NOV-04 12:00:00 AM' and

        sample_time < '21-NOV-04 05:00:00 AM' and

        b.wait_class = 'User I/O')

order by 6 desc;

 

SESS_ID USERNAME PROGRAM    WAIT_EVENT                SESS_TIME PCT_TIME_WAITED

------- -------- ---------- ------------------------- ---------- -------------

    242 SYS      exp@RHAT9K db file scattered read       3502978         33.49

    242 SYS      oracle@RHA db file sequential read      2368153         22.64

    242 SYS      oracle@RHA db file scattered read       1113896         10.65

    243 SYS      oracle@RHA db file sequential read       992168          9.49

The Oracle Database 10g V$ACTIVE_SESSION_HISTORY view comes into play here to provide an insightful look back in time at session experiences for a given time period. This view gives you a lot of excellent information without the need for laborious tracing functions. We'll see more use of it in the next section, which deals with analyzing the response times of SQL statements.

SQL Response-Time Analysis

Examining the response time of SQL statements became easier in Oracle9i, and with Oracle Database 10g, DBAs have many tools at their disposal to help them track inefficient database code.

Historically the applicable V$ view here has been V$SQLAREA. Starting with Oracle9i, Oracle added the ELAPSED_TIME and CPU_TIME columns, which have been a huge help in determining the actual end user experience of a SQL statement execution (at least, when dividing them by the EXECUTIONS column, which produces the average amount of time per execution).

In Oracle Database 10g, six new wait-related and timing columns have been added to V$SQLAREA:

•           APPLICATION_WAIT_TIME

•           CONCURRENCY_WAIT_TIME

•           CLUSTER_WAIT_TIME

•           USER_IO_WAIT_TIME

•           PLSQL_EXEC_TIME

•           JAVA_EXEC_TIME

The new columns are helpful in determining, for example, the amount of time that a procedure spends in PL/SQL code vs. standard SQL execution, and if a SQL statement has experienced any particular user I/O waits. For example, a query you could use to find the top five SQL statements with the highest user I/O waits would be:

select *

from

(select sql_text,

        sql_id,

        elapsed_time,

        cpu_time,

        user_io_wait_time

from    sys.v_$sqlarea

order by 5 desc)

where rownum < 6;

 

SQL_TEXT                  SQL_ID       ELAPSED_TIME CPU_TIME  USER_IO_WAIT_TIME

------------------------- ------------ ------------ ---------- ---------------

select /*+ rule */ bucket db78fxqxwxt7     47815369   19000939            3423

SELECT :"SYS_B_0" FROM SY agdpzr94rf6v     36182205   10170226            2649

select obj#,type#,ctime,m 04xtrk7uyhkn     28815527   16768040            1345

select grantee#,privilege 2q93zsrvbdw4     28565755   19619114             803

select /*+ rule */ bucket 96g93hntrzjt      9411028    3754542             606

Of course, getting the SQL statements with the highest elapsed time or wait time is good, but you need more detail to get to the heart of the matter—which is where the V$ACTIVE_SESSION_HISTORY view again comes into play. With this view, you can find out what actual wait events delayed the SQL statement along with the actual files, objects, and object blocks that caused the waits (where applicable).

For example, let's say you've found a particular SQL statement that appears to be extremely deficient in terms of user I/O wait time. You can issue the following query to get the individual wait events associated with the query along with the corresponding wait times, files, and objects that were the source of those waits:

select event,

        time_waited,

        owner,

        object_name,

        current_file#,

        current_block#

from    sys.v_$active_session_history a,

        sys.dba_objects b

where   sql_id = '6gvch1xu9ca3g' and

        a.current_obj# = b.object_id and

        time_waited <> 0;

 

EVENT                     TIME_WAITED OWNER  OBJECT_NAME           file  block

------------------------- ----------- ------ --------------------- ---- ------

db file sequential read         27665 SYSMAN MGMT_METRICS_1HOUR_PK    3  29438

db file sequential read          3985 SYSMAN SEVERITY_PRIMARY_KEY     3  52877

Of course, you can use V$ACTIVE_SESSION_HISTORY in a historical fashion to narrow down unoptimized SQL statements for a particular time period. The point is that Oracle Database 10g makes it a lot easier to conduct response-time analysis on SQL statements with simplified data dictionary views, as opposed to the time-consuming trace-and-digest method.

Conclusion

DBAs and performance analysts who manage the performance of Oracle Database 10g will find many of the response-time metrics they've yearned for over the years now at their fingertips in the latest release of Oracle's flagship database. Such statistics will help accelerate the process of finding the proverbial needle in the haystack of large and complex database performance situations.

________________________________________

Robin Schumacher (Robin.Schumacher@embarcadero.com) is vice president of product management at Embarcadero Technologies, a leading supplier of database software tools. He has 16 years' experience with database engines and has been a feature writer and software reviewer for many database-related publications. Robin is also the author of Oracle Performance Troubleshooting (Rampant TechPress), which has been updated to include Oracle Database 10g diagnostics.