Checkpoint Tuning and Troubleshooting Guide
--------------------------------------------------------------------------------
Purpose:
incremental checkpoint and a description of four initialization parameters used for checkpoint tuning:
- FAST_START_MTTR_TARGET
- LOG_CHECKPOINT_INTERVAL
- LOG_CHECKPOINT_TIMEOUT
- LOG_CHECKPOINTS_TO_ALERT
It also explains how to interpret and handle checkpoint errors: 'Checkpoint not Complete' and 'Cannot
Allocate New Log' reported in the ALERT<sid>.LOG file.
Contents:
1. What is a Checkpoint?
2. Checkpoints and Performance
5. Understanding Checkpoint Error messages ("Cannot allocate new log" and "Checkpoint not
complete")
1. What is a Checkpoint?
A Checkpoint is a database event which synchronizes the modified data blocks in memory with the
datafiles on disk. It offers Oracle the means for ensuring the consistency of data modified by
transactions. The mechanism of writing modified blocks to disk in Oracle is not synchronized with the
commit of the corresponding transactions.
A checkpoint has two purposes: (1) to establish data consistency, and (2) to enable faster database
recovery. How is recovery faster? Because all database changes up to the checkpoint have been
recorded in the datafiles, it is unnecessary to apply redo log entries prior to the checkpoint. The
checkpoint must ensure that all the modified buffers in the cache are really written to the corresponding
datafiles to avoid the loss of data
occurred (SCN)
2. Checkpoints and Performance
highly resource intensive operation, since all datafile headers are frozen
after a crash. This is why some customer sites which have a very low
tolerance for unscheduled system downtime will often choose this option.
this philosophy in many cases. Let's assume the database is up and running 95%
of the time, and unavailable 5% of the time from infrequent instance crashes
makes more sense to tune for the 95% case rather than the rare 5% downtime.
This bulletin assumes that performance is your number one priority and so
recommendations are made accordingly. Therefore, your goal is to minimize the frequency
- FAST_START_MTTR_TARGET
- LOG_CHECKPOINT_INTERVAL
- LOG_CHECKPOINT_TIMEOUT
- LOG_CHECKPOINTS_TO_ALERT
Recommendations are also given for handling "checkpoint not complete" messages
found in the alert log, which indicate a need to tune redo logs and
checkpoints.
Note: Log file switches will always override checkpoints caused by the following parameters.
FAST_START_MTTR_TARGET
to specify the number of seconds the database takes to perform crash recovery
FAST_START_MTTR_TARGET.
is not specified.
under the current MTTR setting and the estimated number of I/Os that would
result from the current workload under other MTTR settings. This view helps
the user to assess the trade-off between runtime performance and setting
LOG_CHECKPOINT_INTERVAL
the incremental checkpoint target should lag the current log tail.
be set or set to 0.
On most Unix systems the operating system block size is 512 bytes.
mean the incremental checkpoint target should not lag the current log tail
by more than 5,120,000 (5M) bytes. If the size of your redo log is 20M, you are taking 4
updated as the size of the redo log files is changed. The checkpoint
frequency is one of the factors which impacts the time required for the
checkpoints mean that if the system crashes, more time will be needed for the
database to recover. Shorter checkpoint intervals mean that the database will
This parameter also impacts the time required to complete a database recovery
operation during the roll forward phase of recovery. The actual recovery time
is dependent upon this time, and other factors, such as the type of failure
(instance or system crash, media failure, etc.), and the number of archived
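The arithmetic behind the 512-byte block size and the 5,120,000-byte figure in this section can be sketched in a few lines of Python. This is only an illustration: the interval value of 10,000 blocks is an assumption inferred from those two numbers, "20M" is taken to mean 20 * 1024 * 1024 bytes, and the function names are hypothetical.

```python
OS_BLOCK_SIZE = 512  # bytes; the typical Unix value quoted in this section


def checkpoint_lag_bytes(interval_blocks, os_block_size=OS_BLOCK_SIZE):
    """Maximum lag in bytes implied by an interval given in OS blocks."""
    return interval_blocks * os_block_size


def checkpoints_per_log(redo_log_bytes, interval_blocks, os_block_size=OS_BLOCK_SIZE):
    """Roughly how many times the interval fits into one redo log file."""
    return redo_log_bytes // checkpoint_lag_bytes(interval_blocks, os_block_size)


# Assumed interval of 10,000 OS blocks (not stated in the surviving text).
lag = checkpoint_lag_bytes(10_000)
# A "20M" redo log, taken here as 20 * 1024 * 1024 bytes.
per_log = checkpoints_per_log(20 * 1024 * 1024, 10_000)
print(lag, per_log)  # prints: 5120000 4
```

Because the byte value depends on the redo log size and the OS block size, the same calculation has to be repeated whenever either of them changes.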
LOG_CHECKPOINT_TIMEOUT
the incremental checkpoint target should lag the current log tail.
In other words, it specifies how long a dirty buffer in the buffer cache can
remain dirty.
checkpoints mean that more time will be required during database recovery.
every "n" seconds, regardless of the transaction frequency. This can cause
window used for a stand-by database configuration. Log switches cause a checkpoint, but a checkpoint
does not cause a log switch. The only way to cause a log switch is manually with
Sizing of the online redo logs is critical for performance and recovery.
LOG_CHECKPOINTS_TO_ALERT
See Note:76713.1 for more detail on how these instance parameters can influence the checkpoint.
4. Redo logs and Checkpoint
in progress, the checkpoint forced by the log switch will override the current
checkpoint.
The lag between the incremental checkpoint target and the log tail is
also limited by 90% of the smallest online log file size. This makes sure
that in most cases log switch would not need to wait for checkpoint.
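As a rough illustration of the 90% rule described above, the following Python sketch computes the largest lag allowed for a given set of online log file sizes. The function name and the sample sizes are hypothetical; only the 90% factor and the use of the smallest log file come from the text.

```python
def max_checkpoint_lag_bytes(online_log_sizes_bytes):
    """Upper bound on the checkpoint lag: 90% of the smallest online log file."""
    if not online_log_sizes_bytes:
        raise ValueError("at least one online log file size is required")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return min(online_log_sizes_bytes) * 9 // 10


MB = 1024 * 1024

# Hypothetical configuration with one undersized log file.
mixed = max_checkpoint_lag_bytes([50 * MB, 100 * MB, 100 * MB])
# Hypothetical configuration with equally sized log files.
uniform = max_checkpoint_lag_bytes([100 * MB, 100 * MB, 100 * MB])
print(mixed, uniform)  # prints: 47185920 94371840
```

The first result shows that a single small log file governs the limit for the whole configuration, which is one reason to keep all online log files the same size.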
Having your log files too small can increase checkpoint activity and reduce performance.
Oracle recommends that all online log files be the same size, and that there be
at least two log groups per thread. The alert log is a valuable tool for
monitoring the rate at which log switches, and therefore checkpoints,
occur.
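One way to monitor the log switch rate is to collect the timestamps of the log switch entries from the alert log and look at the gaps between them. The Python sketch below assumes the timestamps have already been extracted (parsing the alert log is not shown); the function name, the sample timestamps, and the three-minute default threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta


def short_switch_intervals(switch_times, threshold=timedelta(minutes=3)):
    """Return (earlier, later, gap) for consecutive log switches
    that are no more than `threshold` apart."""
    ordered = sorted(switch_times)
    flagged = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= threshold:
            flagged.append((earlier, later, gap))
    return flagged


# Hypothetical log switch times taken from an alert log.
switches = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 2),
    datetime(2024, 1, 1, 10, 30),
]
for earlier, later, gap in short_switch_intervals(switches):
    print(f"{earlier} -> {later}: {gap}")
```

With the sample data only the first pair of switches, two minutes apart, is reported.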
If redo logs switch every 3 minutes, you will see performance degradation.
This indicates the redo logs are not sized large enough to efficiently handle
5. Understanding Checkpoint Error messages (“Cannot allocate new log” and “Checkpoint not
complete”)
Sometimes you will see the following messages in your alert.log
file:
This message indicates that Oracle wants to reuse a redo log file, but
the current checkpoint position is still in that log. In this case, Oracle must
wait until the checkpoint position passes that log. Because the
incremental checkpoint target never lags the current log tail by more than 90%
of the smallest log file size, this situation may be encountered if DBWR writes
too slowly, or if a log switch happens before the log is completely full,
This parameter has been deprecated since Oracle 9i in favor of parameter FAST_START_MTTR_TARGET.
7. Using Statspack to determine Checkpointing problems
Statspack snapshots can be taken every 15 minutes or so. The resulting reports gather useful
information about the number of checkpoints started, the number of checkpoints completed, and the
number of database buffers written during checkpointing for that window of time. They also contain
statistics about redo activity. Gathering and comparing these snapshot reports gives you
Another important thing to watch in the Statspack report is the following wait events, which
can be a good indication of problems with redo log throughput and checkpointing:
log switch/archive
If one or more of the above wait events occurs frequently with significant
wait times, consider adding more online redo log files, increasing their
size, and/or modifying the checkpointing parameters.
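Comparing two snapshots comes down to subtracting cumulative counters. The Python sketch below shows the idea for checkpoints started and completed. It assumes the counters are available as dictionaries keyed by the statistic names "background checkpoints started" and "background checkpoints completed"; the function name and the sample values are hypothetical.

```python
STARTED = "background checkpoints started"      # assumed statistic name
COMPLETED = "background checkpoints completed"  # assumed statistic name


def checkpoint_deltas(earlier, later):
    """Checkpoints started and completed between two snapshots,
    plus how many were started but not completed in that window."""
    started = later[STARTED] - earlier[STARTED]
    completed = later[COMPLETED] - earlier[COMPLETED]
    return {
        "started": started,
        "completed": completed,
        "outstanding": started - completed,
    }


# Hypothetical cumulative counter values from two snapshots 15 minutes apart.
snap_1 = {STARTED: 120, COMPLETED: 119}
snap_2 = {STARTED: 131, COMPLETED: 127}
print(checkpoint_deltas(snap_1, snap_2))
# prints: {'started': 11, 'completed': 8, 'outstanding': 3}
```

A window in which noticeably more checkpoints are started than completed suggests that checkpoints are being overridden or are not finishing in time, and is worth comparing against the log switch rate for the same period.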