Oracle Redo and Undo
目前成都创新互联已为上1000+的企业提供了网站建设、域名、虚拟主机、网站托管、企业网站设计、中宁网站维护等服务,公司将坚持客户导向、应用为本的策略,正道将秉承"和谐、参与、激情"的文化,与客户和合作伙伴齐心协力一起成长,共同发展。
《 Oracle Core Essential Internals for DBAs and Developers 》

在我看来,Oracle 最重要的特性,是 Oracle 6 版本出现的 change vector , 描述数据块改变的机制,也是Redo 和 undo 的核心。
but in my opinion the single most important feature of Oracle is one that first appeared in version 6: the change vector, a mechanism for describing changes to data blocks, the heart of redo and undo.
一 Basic Data Change 基础数据改变
Oracle 有趣的一个特点是会记录两次数据,第一次是将最近的数据写入到 data files( 为了提高效率,最新的数据记录在内存中,批量刷新到 data files) ,第二次是将数据写入到 redo log files( 用来描述如何重建 data files)
One of the strangest features of an Oracle database is that it records your data twice. One copy of the data exists in a set of data files which hold something that is nearly the latest, up-to-date version of your data (although the newest version of some of the data will be in memory, waiting to be copied to disc); the other copy of the data exists as a set of instructions—the redo log files—telling you how to re-create the content of the data files from scratch.
1.1 The Approach 方法
在数据库更改时,你发出了更改数据的命令,Oracle并不是马上在data file(如果数据在内存中,去data buffer cache查找数据)里找到对应的数据,然后去修改对应的数据。而是通过如下四个关键步骤来完成数据的修改:
1.创建如何修改数据的描述。
2.创建如何回退数据的描述。
3.创建如何产生回退数据描述的描述。
4.修改数据。
Under the Oracle approach to data change, when you issue an instruction to change an item of data, Oracle doesn’t just go to a data file (or the in-memory copy if the item happens to be buffered), find the item, and change it. Instead, Oracle works through four critical steps to make the change happen.
Stripped to the bare minimum of detail, these are
1. Create a description of how to change the data item.
2. Create a description of how to re-create the original data item if needed.
3. Create a description of how to create the description of how to re-create the original data item.
4. Change the data item.
其中第三步看上去不好理解,下面换一种方式描述这四个步骤:
1.创建生成data block对应的Redo change vector 。
2.创建Undo record,在Undo表空间生成Undo Block,用于数据回退。
3.创建生成undo record对应的Redo change vector 。
4.修改数据 。
The tongue-twisting nature of the third step gives you some idea of how convoluted the mechanism is, but all will become clear. With the substitution of a few technical labels in these steps, here’s another way of describing the actions of changing a data block:
1. Create a redo change vector describing the change to the data block.
2. Create an undo record for insertion into an undo block in the undo tablespace.
3. Create a redo change vector describing the change to the undo block.
4. Change the data block.
具体技术细节、执行顺序和Oracle版本、事务的性质、执行变更命令前各数据块的状态等有关。
The exact sequence of steps and the various technicalities around the edges vary depending on the version of Oracle, the nature of the transaction, how much work has been done so far in the transaction, what the states of the various database blocks were before you executed the instruction, whether or not you’re looking at the first change of a transaction, and so on.
1.2 An Example 例:
从一个最简单的数据更改示例开始,更新OLTP事务中间的一行,该事务已经更新了一组分散的行。事实上,在历史(以及最普遍的)案例中,步骤的顺序与我在前一节中列出的顺序不同。
I’m going to start with the simplest example of a data change, which you might expect to see as you updated a single row in the middle of an OLTP transaction that had already updated a scattered set of rows. In fact, the order of the steps in the historic (and most general) case is not the order I’ve listed in the preceding section.
这些步骤实际上是按照3、1、2、4的顺序进行的,在修改undo block和data block之前,将这两个重做更改向量组合成一个重做更改记录并复制到redo log (buffer)中。这意味着更准确的版本是:
The steps actually go in the order 3, 1, 2, 4, and the two redo change vectors are combined into a single redo change record and copied into the redo log (buffer) before the undo block and data block are modified (in that order). This means a slightly more accurate version of my list of actions would be:
1.创建生成undo record对应的Redo change vector (描述undo block的改变)。
2.创建生成data block对应的Redo change vector (描述redo block的改变)。
3.创建生成undo record和data block的Redo change vector 合并成一个Redo record写入log buffer(便于重做等)。
4.创建Undo record写入Undo block(便于事务回退等)。
5.更改数据。
1. Create a redo change vector describing how to insert an undo record into an undo block.
2. Create a redo change vector for the data block change.
3. Combine the redo change vectors into a redo record and write it to the log buffer.
4. Insert the undo record into the undo block.
5. Change the data block.
下面是一个小示例,取自一个运行Oracle 9.2.0.8的系统(在上一个版本中,很容易创建该机制的最一般示例)。
Here’s a little sample, taken from a system running Oracle 9.2.0.8 (the last version in which it’s easy to create the most generic example of the mechanism).
我们将执行一个update语句,通过在两个表块之间来回跳转来更新五行,并在更新前后将各种信息位转储到流程跟踪文件中。
We’re going to execute an update statement that updates five rows by jumping back and forth between two table blocks, dumping various bits of information into our process trace file before and after the update.
我需要使我的更新有点复杂,因为我希望示例尽可能简单,同时避免一些特殊情况。
I need to make my update a little bit complicated because I want the example to be as simple as possible while avoiding a few “special case” details.
我编写的代码将更新表的第一个块中的第三、第四和第五行 数据 , 并且在每更新一条后 更新第二个块中的一行 ( core_demo_02.sql ),它会更改每个记录的第三列(varchar2类型的字段),将其由xxxxxx(小写6个字符)更改为YYYYYYYYYY(大写10个字符)。
The code I’ve written will update the third, fourth, and fifth rows in the first block of a table but will update a row in the second block of the table between each of these three updates (see core_demo_02.sql in the code library on www.apress.com), and it’ll change the third column of each row—a varchar2() column—from xxxxxx (lowercase, six characters) to YYYYYYYYYY (uppercase, ten characters).
https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs
--- 使用 core_demo_02.sql 脚本如下:
---core_demo_02.sql
---1 准备
start setenv
set timing off
execute dbms_random.seed ( 0 )
drop table t1 ;
begin
execute immediate 'purge recyclebin';
exception
when others then
null;
end;
begin
dbms_stats.set_system_stats ( 'MBRC' , 8 );
dbms_stats.set_system_stats ( 'MREADTIM' , 26 );
dbms_stats.set_system_stats ( 'SREADTIM' , 12 );
dbms_stats.set_system_stats ( 'CPUSPEED' , 800 );
exception
when others then
null ;
end ;
begin
execute immediate 'begin dbms_stats.delete_system_stats; end;' ;
exception
when others then
null ;
end ;
begin
execute immediate 'alter session set "_optimizer_cost_model"=io' ;
exception
when others then
null ;
end ;
/
---2 创建表和索引
create table t1
as
select
2 * rownum - 1 id ,
rownum n1 ,
cast ( 'xxxxxx' as varchar2 ( 10 )) v1 ,
rpad ( '0' , 100 , '0' ) padding
from
all_objects
where
rownum <= 60
union all
select
2 * rownum id ,
rownum n1 ,
cast ( 'xxxxxx' as varchar2 ( 10 )) v1 ,
rpad ( '0' , 100 , '0' ) padding
from
all_objects
where
rownum <= 60
;
create index t1_i1 on t1 ( id );
---3 收集统计信息
begin
dbms_stats.gather_table_stats ( ownname => user ,
tabname => 'T1' ,
method_opt => 'for all columns size 1' );
end ;
---4 查看表占用的块情况,和每一个块有多少条数据 每一个块都有60条记录
select
dbms_rowid.rowid_block_number ( rowid ) block_number ,
count (*) rows_per_block
from
t1
group by
dbms_rowid.rowid_block_number ( rowid )
order by
block_number
;
BLOCK_NUMBER ROWS_PER_BLOCK
------------ --------------
114153 60
114154 60
---5 转储数据块
alter system switch logfile ;
execute dbms_lock.sleep ( 2 )
spool core_demo_02.lst
---dump_seg存储过程需要执行c_dump_seg.sql生成
---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_03/c_dump_seg.sql
execute dump_seg ( 't1' )
---6 更新数据
update
/*+ index(t1 t1_i1) */
t1
set
v1 = 'YYYYYYYYYY'
where
id between 5 and 9
;
---7 转储更新块之后的数据块和undo块
alter system checkpoint ;
execute dump_seg ( 't1' )
---dump_undo_block存储过程
---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_02/c_dump_undo_block.sql
execute dump_undo_block
---8 转储redo块
rollback ;
commit ;
---dump_log存储过程
---https://github.com/Apress/oracle-core-esntl-internals-for-dbas-devs/blob/master/ch_02/c_dump_log.sql
execute dump_log
spool off
[oracle@cjcos trace]$ pwd
/u01/app/oracle/diag/rdbms/cjcdb/cjcdb/trace
[oracle@cjcos trace]$ vim cjcdb_ora_21778.trc

下面是更新前后块中第五行的转储信息:
Here’s a symbolic dump of the fifth row in the block before and after the update:
更新前 第三列(col 2)长度是6,数据是78(x的16进制ASCII码是78)
tab 0, row 4, @0x1d3f
tl: 117 fb: --H-FL-- lb: 0x0 cc: 4
col 0: [ 2] c1 0a
col 1: [ 2] c1 06
col 2: [ 6] 78 78 78 78 78 78
col 3: [100]
30 30 30 30 30 30 30 30 … 30 30 30 30 30 (for 100 characters)
更新后 第三列(col 2)长度是10,数据是59(Y的16进制ASCII码是59)
tab 0, row 4, @0x2a7
tl: 121 fb: --H-FL-- lb: 0x2 cc: 4
col 0: [ 2] c1 0a
col 1: [ 2] c1 06
col 2: [10] 59 59 59 59 59 59 59 59 59 59
col 3: [100]
30 30 30 30 30 30 30 30 … 30 30 30 30 30 (for 100 characters)
我们可以看到第三列(col2)长度变成了10,是10个59(Y的十六进制ASCII码是59),同时行地址由@0x1d3f变成了@0x2a7,说明这一行的空间容不下新增的数据,换了新地址。
As you can see, the third column (col 2:) of the table has changed from a string of 78s (x) to a longer string of 59s (Y). Since the update increased the length of the row, Oracle had to copy it into the block’s free space to make the change, which is why its starting byte position has moved from @0x1d3f to @0x2a7. It is still row 4 (the fifth row) in the block, though; if we were to check the block’s row directory, we would see that the fifth entry has been updated to point to this new row location.
同时,我们能看到lb(lock byte)由0x0变成了0x2,表明这条记录被该块事务槽列表中的第二个事务槽所标识的事务锁定。事务槽可以在块首部看到。我们将在第三章更深入地讨论它。
I dumped the block before committing the change, which is why you can see that the lock byte (lb:) has changed from 0x0 to 0x2—the row is locked by a transaction identified by the second slot in the block’s interested transaction list (ITL). We will be discussing ITLs in more depth in Chapter 3.
我们来看看不同的变化向量。首先,从current redo log file转储中,我们可以检查变化矢量,描述我们对表做了什么:
So let’s look at the various change vectors. First, from a symbolic dump of the current redo log file, we can examine the change vector describing what we did to the table:
TYP:0 CLS: 1 AFN:11 DBA:0x02c0018a SCN:0x0000.03ee485a SEQ: 2 OP:11.5
KTB Redo
op: 0x02 ver: 0x01
op: C uba: 0x0080009a.09d4.0f
KDO Op code: URP row dependencies Disabled
xtype: XA bdba: 0x02c0018a hdba: 0x02c00189
itli: 2 ispac: 0 maxfr: 4863
tabn: 0 slot: 4(0x4) flag: 0x2c lock: 2 ckix: 16
ncol: 4 nnew: 1 size: 4
col 2: [10] 59 59 59 59 59 59 59 59 59 59
可以看到,第五行URP(更新行块),第六行告诉我们正在更新块的块地址(bdba)和对象的段头块(hdba)
I’ll pick out just the most significant bits of this change vector. You can see that the Op code: in line 5 is URP (update row piece). Line 6 tells us the block address of the block we are updating (bdba:) and the segment header block for that object (hdba:).
在第7行,我们看到执行这个更新的事务使用的是ITL条目2 (itli:),它是对tabn: 0 slot: 4的更新(第一个表中的第5行;请记住,集群中的块可以容纳来自许多表的数据,因此每个块必须包含一个列表,用于标识块中有行的表)。
In line 7 we see that the transaction doing this update is using ITL entry 2 (itli:), which confirms
what we saw in the block dump: it’s an update to tabn: 0 slot: 4 (fifth row in the first table; remember that blocks in a cluster can hold data from many tables, so each block has to include a list identifying the tables that have rows in the block).
最后,在最后两行中,我们看到该行有四列(ncol:),其中我们更改了一列(nnew:),将行长度(size:)增加了4个字节,并且我们将第2列更改为YYYYYYYYYY。
Finally, in the last two lines, we see that the row has four columns (ncol:), of which we are changing one (nnew:), increasing the row length (size:) by 4 bytes, and that we are changing column 2 to YYYYYYYYYY.
接下来我们需要看到的是如何 撤销我们的变更 。这将以undo record撤销记录的形式出现,从相关的undo block块中转储。在第3章中将介绍寻找正确的undo block的方法。下面的文本显示了来自块转储的相关记录:
The next thing we need to see is a description of how to put back the old data. This appears in the form of an undo record, dumped from the relevant undo block. The methods for finding the correct undo block will be covered in Chapter 3. The following text shows the relevant record from the symbolic block dump:
*-----------------------------
* Rec #0xf slt: 0x1a objn: 45810(0x0000b2f2) objd: 45810 tblspc: 12(0x0000000c)
* Layer: 11 (Row) opc: 1 rci 0x0e
Undo type: Regular undo Last buffer split: No
Temp Object: No
Tablespace Undo: No
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo
op: 0x02 ver: 0x01
op: C uba: 0x0080009a.09d4.0d
KDO Op code: URP row dependencies Disabled
xtype: XA bdba: 0x02c0018a hdba: 0x02c00189
itli: 2 ispac: 0 maxfr: 4863
tabn: 0 slot: 4(0x4) flag: 0x2c lock: 0 ckix: 16
ncol: 4 nnew: 1 size: -4
col 2: [ 6] 78 78 78 78 78 78
再次,我将忽略一些细节,只是指出这个undo记录的重要组成部分(对我们来说) , 我们看到的行大小减少4个字节,第2列是xxxxxx。
Again, I’m going to ignore a number of details and simply point out that the significant part of this undo record (for our purposes) appears in the last five lines and comes close to repeating the content of the redo change vector, except that we see the row size decreasing by 4 bytes as column 2 becomes xxxxxx.
但是这是一个undo记录,写在undo块中,存储在其中一个数据文件的undo表空间中,而且,正如我前面指出的,Oracle保存所有内容的两个副本,一个在数据文件中,一个在重做日志文件中。因为我们已经将一些内容放入了数据文件中(即使它在undo表空间中),所以我们需要创建一个关于我们所做事情的描述,并将该描述写入重做日志文件中。我们需要另一个重做改变矢量,它是这样的:
But this is an undo record, written into an undo block and stored in the undo tablespace in one of the data files, and, as I pointed out earlier, Oracle keeps two copies of everything, one in the data files and one in the redo log files. Since we’ve put something into a data file (even though it’s in the undo tablespace), we need to create a description of what we’ve done and write that description into the redo log file. We need another redo change vector, which looks like this:
TYP:0 CLS:36 AFN:2 DBA:0x0080009a SCN:0x0000.03ee485a SEQ: 4 OP:5.1
ktudb redo: siz: 92 spc: 6786 flg: 0x0022 seq: 0x09d4 rec: 0x0f
xid: 0x000a.01a.0000255b
ktubu redo: slt: 26 rci: 14 opc: 11.1 objn: 45810 objd: 45810 tsn: 12
Undo type: Regular undo Undo type: Last buffer split: No
Tablespace Undo: No
0x00000000
KDO undo record:
KTB Redo
op: 0x02 ver: 0x01
op: C uba: 0x0080009a.09d4.0d
KDO Op code: URP row dependencies Disabled
xtype: XA bdba: 0x02c0018a hdba: 0x02c00189
itli: 2 ispac: 0 maxfr: 4863
tabn: 0 slot: 4(0x4) flag: 0x2c lock: 0 ckix: 16
ncol: 4 nnew: 1 size: -4
col 2: [ 6] 78 78 78 78 78 78
重做更改向量的下半部分看起来非常像撤消记录,这并不奇怪,因为它毕竟是我们希望放入撤消块的内容的描述。
The bottom half of the redo change vector looks remarkably like the undo record, which shouldn’t be a surprise as it is, after all, a description of what we want to put into the undo block.
重做改变向量的上半部分告诉我们下半部分去了哪里,并包含了一些关于它要 写入 块的块头信息。
The top half of the redo change vector tells us where the bottom half goes, and includes some information about the block header information of the block it’s going into.
最重要的细节,就我们的目的而言,是 第一行的 DBA(数据块地址),该标识块0 x0080009a:如果你知道Oracle块在十六进制数字,你就会认识到这是 2号数据文件 154 号 块(新创建的数据库中撤消表空间的文件号)。
The most significant detail, for our purposes, is the DBA: (data block address) in line 1, which identifies block 0x0080009a: if you know your Oracle block numbers in hex, you’ll recognize that this is block 154 of data file 2 (the file number of the undo tablespace in a newly created database).
1.3 Debriefing 总结
那么,到目前为止,我们取得了什么进展?当我们更改一个数据块时,Oracle会在undo块中插入一个undo记录,告诉我们如何逆转这个更改。但是,对于数据库中发生的每一个块更改,Oracle都会创建一个redo change向量来描述如何进行更改,并且在进行更改之前创建这些向量。历史上,它在创建“正向”更改向量之前创建了撤销更改向量,因此,我前面描述的事件序列(参见图2-1)如下:
So where have we got to so far? When we change a data block, Oracle inserts an undo record into an undo block to tell us how to reverse that change. But for every change that happens to a block in the database, Oracle creates a redo change vector describing how to make that change, and it creates the vectors before it makes the changes. Historically, it created the undo change vector before it created the “forward” change vector, hence, the following sequence of events (see Figure 2-1) that I described earlier occurs:

图2 - 1。在事务中间进行 小 更新的事件序列
1.为撤销记录创建更改向量。
2.为数据块创建更改向量。
3.合并更改向量并将重做记录写入重做日志(缓冲区)。
4.将undo记录插入undo块中。
5.对数据块进行更改。
Figure 2-1. Sequence of events for a small update in the middle of a transaction
1. Create the change vector for the undo record.
2. Create the change vector for the data block.
3. Combine the change vectors and write the redo record into the redo log (buffer).
4. Insert the undo record into the undo block.
5. Make the change to the data block.
当你看到这里的前两个步骤,当然,没有理由相信我把它们按正确的顺序排列。我所描述或抛弃的任何东西都不能说明这些行为一定是按这个顺序发生的。但有一个小细节我现在可以告诉你,我省略了转储的改变向量,部分是因为 从Oracle 10g以后情况会有不同 ,另一部分是因为如果你一开始就以错误的顺序思考,对活动的描述会更容易理解。
When you look at the first two steps here, of course, there’s no reason to believe that I’ve got them in the right order. Nothing I’ve described or dumped shows that the actions must be happening in that order. But there is one little detail I can now show you that I omitted from the dumps of the change vectors, partly because things are different from 10g onwards and partly because the description of the activity is easier to comprehend if you first think about it in the wrong order.
到目前为止,我已经向你们展示了我们的两个变化矢量作为单独的实体;如果我向你展示了这些变化矢量进入重做日志的完整图片,你就会看到它们是如何组合成一个重做记录的:
So far I’ve shown you our two change vectors only as individual entities; if I had shown you the complete picture of the way these change vectors went into the redo log, you would have seen how they were combined into a single redo record:
REDO RECORD - Thread:1 RBA: 0x00036f.00000005.008c LEN: 0x00f8 VLD: 0x01
SCN: 0x0000.03ee485a SUBSCN: 1 03/13/2011 17:43:01
CHANGE #1 TYP:0 CLS:36 AFN:2 DBA:0x0080009a SCN:0x0000.03ee485a SEQ: 4 OP:5.1
…
CHANGE #2 TYP:0 CLS: 1 AFN:11 DBA:0x02c0018a SCN:0x0000.03ee485a SEQ: 2 OP:11.5
…
在重做日志中,更改向量是成对出现的,撤销记录的更改向量出现在相应的正向更改的更改向量之前。
It is a common (though far from universal) pattern in the redo log that change vectors come in matching pairs, with the change vector for an undo record appearing before the change vector for the corresponding forward change.
在查看前面的重做记录的基本内容时,值得注意的是第一行中的LEN:—这是重做记录的长度:0x00f8 = 248字节。
While we’re looking at the bare bones of the preceding redo record, it’s worth noting the LEN: figure in the first line—this is the length of the redo record: 0x00f8 = 248 bytes.
我们所做的只是在一行中将xxxxxx更改为yyyyyyyy,这将花费我们248字节的日志信息。
All we did was change xxxxxx to YYYYYYYYYY in one row and it cost us 248 bytes of logging information.
实际上,考虑到最终结果,这似乎是一个非常昂贵的操作:我们必须生成两个重做更改向量并更新两个数据库块来进行一个很小的更改,这看起来是我们需要执行的步骤的四倍。我们希望所有这些额外的工作都能得到合理的回报。
In fact, it seems to have been a very expensive operation given the net result: we had to generate two redo change vectors and update two database blocks to make a tiny little change, which looks like four times as many steps as we need to do. Let’s hope we get a decent payback for all that extra work.
1.4 Summary of Observations 总结
在继续之前,我们可以总结一下我们的观察结果如下:在数据文件中,我们对自己的数据所做的每一个更改都与Oracle创建的undo记录相匹配(这也是对数据文件的更改);与此同时,Oracle在redo日志中放入了如何进行更改和如何进行自身更改的描述。
Before we continue, we can summarize our observations as follows: in the data files, every change we make to our own data is matched by Oracle with the creation of an undo record (which is also a change to a data file); at the same time Oracle puts into the redo log a description of how to make our change and how to make its own change.
您可能会注意到,由于数据可以“就地”更改,因此我们可以创建“无限”(即。,任意大)对单行数据的更改,但是如果不增加undo表空间的数据文件,显然不能记录无限多的undo记录;如果不不断添加更多的redo日志文件,也不能在redo日志中记录无限多的更改。为了简单起见,我们将推迟无限更改的问题,暂时假装我们可以记录尽可能多的撤消和重做记录。
You might note that since data can be changed “in place,” we could make an “infinite” (i.e.,
arbitrarily large) number of changes to our single row of data, but we clearly can’t record an infinite number of undo records without growing the data files of the undo tablespace, nor can we record an infinite number of changes in the redo log without constantly adding more redo log files. For the sake of simplicity, we’ll postpone the issue of infinite changes and simply pretend for the moment that we can record as many undo and redo records as we need.
二:ACID
尽管我们在本章中不会讨论事务,但值得一提的是事务系统的ACID原则,以及Oracle如何实现undo和redo,从而使Oracle能够满足这些需求。表2-1列出了ACID。
Although we’re not going to look at transactions in this chapter, it is, at this point, worth mentioning the ACID requirements of a transactional system and how Oracle’s implementation of undo and redo gives Oracle the capability of meeting those requirements.
Table 2-1. The ACID Requirements

下面的列表详细介绍了表2-1中的各项内容:
The following list goes into more detail about each of the requirements in Table 2-1:
原子性:当我们进行变更数据时,我们创建一个undo记录描述如何回退修改的数据。这意味着当我们在交易过程中,如果另一个用户试图查看我们修改过的任何数据,可以指示他使用撤消记录以查看该数据的旧版本,从而使我们的工作不可见直到我们决定发布(提交)它的那一刻。我们可以保证对方用户要么什么都看不到,要么什么都看到。
Atomicity: As we make a change, we create an undo record that describes how to
reverse the change. This means that when we are in the middle of a transaction,
another user trying to view any data we have modified can be instructed to use the
undo records to see an older version of that data, thus making our work invisible
until the moment we decide to publish (commit) it. We can ensure that the other
user either sees nothing of what we’ve done or sees everything.
一致性:这个要求实际上是关于定义合法状态的约束的数据库。
我们可以认为undo records撤销记录的存在意味着 其他用户被阻止查看事务的增量情况,因此无法看到事务通过临时的非一致性状态到另一个一致性状态,他们看到的要么是开始事务之前的状态,要么是事务完成后的状态,不会看到事务的中间状态。
Consistency: This requirement is really about constraints defining the legal states
of the database; but we could also argue that the presence of undo records means
that other users can be blocked from seeing the incremental application of our
transaction and therefore cannot see the database moving from one legal state to
another by way of a temporarily illegal state—what they see is either the old state
or the new state and nothing in between. (The internal code, of course, can see all
the intermediate states—and take advantage of being able to see them—but the
end-user code never sees inconsistent data.)
隔离 性 :我们可以再次看到,undo records撤消记录的可用性阻止了其他用户看到我们如何更改数据,直到我们决定事务已经完成并提交 事务 。实际上,我们做得更好:undo records撤消的可用性意味着其他用户不必在他们的事务的整个持续期间看到我们的事务的影响,即使我们在他们的事务的开始和结束之间开始和结束我们的事务。 (这不是Oracle中的默认隔离级别,但它是可用的隔离级别;请参阅“隔离级别”)当然,当两个用户试图同时更改相同的数据时,我们确实会遇到令人困惑的情况;在有限时间的事务里,完美的隔离是不可能的。
Isolation: Yet again we can see that the availability of undo records stops other
users from seeing how we are changing the data until the moment we decide that
our transaction is complete and commit it. In fact, we do better than that: the
availability of undo means that other users need not see the effects of our
transactions for the entire duration of their transactions, even if we start and end
our transaction between the start and end of their transaction. (This is not the
default isolation level in Oracle, but it is an available isolation level; see the
“Isolation Levels” sidebar.) Of course, we do run into confusing situations when
two users try to change the same data at the same time; perfect isolation is not
possible in a world where transactions have to take a finite amount of time.
持久性: 此特性归功于 redo log重做日志。 如何确保已完成的事务在系统故障后仍然存在?最直接的策略是在磁盘发生变化或“完成”事务的最后一步时,继续向磁盘写入任何更改。如果没有重做日志,这可能意味着在更改数据时要写入大量随机数据块。想象一下,将10行插入到包含三个索引的order_lines表中;这可能需要31个随机分布的磁盘写来对1个表块和30个索引块进行持久的更改。但是Oracle有redo重做机制,不需要在更改时写入整个数据块,而是准备一个小的更改描述,31个小的描述可能只会在需要确保有整个事务的永久记录时(相对而言)写入到日志文件的末尾。
Durability: This is the requirement that highlights the benefit of the redo log. How
do you ensure that a completed transaction will survive a system failure? The
obvious strategy is to keep writing any changes to disc, either as they happen or as
the final step that “completes” the transaction. If you didn’t have the redo log, this
could mean writing a lot of random data blocks to disc as you change them.
Imagine inserting ten rows into an order_lines table with three indexes; this could
require 31 randomly distributed disk writes to make changes to 1 table block and
30 index blocks durable. But Oracle has the redo mechanism. Instead of writing an
entire data block as you change it, you prepare a small description of the change,
and 31 small descriptions could end up as just one (relatively) small write to the
end of the log file when you need to make sure that you’ve got a permanent record
of the entire transaction. (We’ll discuss in Chapter 6 what happens to the 31
changed data blocks, and the associated undo blocks, and how recovery might
take place.)
2.1 ISOLATION LEVELS 隔离级别
Oracle提供了三种不同的隔离级别:read committed (默认), read only, 和serializable作为差异的简要说明,请考虑以下场景:表t1有1条数据,表t2和t1有相同的表结构,我们有两个会话按照下面顺序执行:
Oracle offers three isolation levels: read committed (the default), read only, and serializable. As a brief sketch of the differences, consider the following scenario: table t1 holds one row, and table t2 is identical to t1 in structure. We have two sessions that go through the following steps in order:
1. Session 1: select from t1;
2. Session 2: insert into t1 select * from t1;
3. Session 2: commit;
4. Session 1: select from t1;
5. Session 1: insert into t2 select * from t1;
如果会话1的隔离级别是read committed,将会在第一次查询到1条数据,第二次查询到2条数据,最后插入两条数据。
如果会话1的隔离级别是 read only ,将会在第一次查询到1条数据,第二次查询到1条数据,插入数据时报错 “ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction.” 。
如果会话1的隔离级别是 serializable ,将会在第一次查询到1条数据,第二次查询到1条数据,最后插入1条数据。
If session 1 is operating at isolation level read committed, it will select one row on the first select, select two rows on the second select, and insert two rows.
If session 1 is operating at isolation level read only, it will select one row on the first select, select one row on the second select, and fail with Oracle error “ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction.”
If session 1 is operating at isolation level serializable, it will select one row on the first select, select one row on the second select, and insert one row.
REDO 和 UNDO 机制不仅实现ACID的基本要求,而且在性能和可恢复性方面也具有优势。
在讨论 持久性 时 已经 介绍了redo 重做 对 性能 带来的 好处 ,如果您想要一个有关UNDO撤消的性能优势的示例,请考虑隔离-如果您的用户需要同时更新数据,那么如何运行需要几分钟才能完成的报告?在缺少UNDO撤消机制的情况下,您必须在允许错误结果和锁定更改数据两种情况之间进行选择。这是您必须与其他数据库产品一起做出的选择。UNDO撤销机制允许非常程度的并发,因为根据Oracle的营销宣传,“读者不会阻塞写,写也不会阻塞读者。”
就可恢复性而言(我们将在第6章中更详细地检查可恢复性),如果我们记录对数据库所做更改的完整列表,那么原则上,我们可以从一个全新的数据库开始,然后简单地重新应用每个更改描述来复制原始数据库的最新副本。实际上,当然,我们不(通常)从一个新的数据库开始;而是对数据文件进行定期备份,这样我们只需要重播生成的总重做的一小部分,就可以更新副本数据库。
Not only are the mechanisms for undo and redo sufficient to implement the basic requirements of ACID, they also offer advantages in performance and recoverability.
The performance benefit of redo has already been covered in the comments on durability; if you want an example of the performance benefits of undo, think about isolation—how can you run a report that takes minutes to complete if you have users who need to update data at the same time? In the absence of something like the undo mechanism, you would have to choose between allowing wrong results and locking out everyone who wants to change the data. This is a choice that you have to make with some other database products. The undo mechanism allows for an extraordinary degree of concurrency because, per Oracle’s marketing sound bite, “readers don’t block writers, writers don’t block readers.”
就可恢复性而言(我们将在第6章中更详细地检查可恢复性),如果我们记录下我们对数据库所做的全部更改,那么我们可以从一个全新的数据库开始,然后简单地重新应用每个更改描述来复制原始数据库的最新副本 。当然,实际上我们(通常)不会从一个新数据库开始;相反,我们采取数据文件的常规备份副本,这样我们只需要重播重做的一小部分就可以使副本数据库保持最新。
As far as recoverability is concerned (and we will examine recoverability in more detail in Chapter 6), if we record a complete list of changes we have made to the database, then we could, in principle, start with a brand-new database and simply reapply every single change description to reproduce an upto-date copy of the original database. Practically, of course, we don’t (usually) start with a new database; instead we take regular backup copies of the data files so that we need only replay a small fraction of the total redo generated to bring the copy database up to date.
三: Redo Simplicity --- Redo的简单性
我们处理redo的方法很简单:就是不断地生成redo records的stream,并以最快的速度将它们写入到redo log。最初进入共享内存的一个区域,称为 redo log buffer 重做日志缓冲区。之后在从 redo log buffer 写入到磁盘,即online redo log files。在线重做日志文件的数量是有限的,因此我们必须以循环方式不断地重用它们。
The way we handle redo is quite simple: we just keep generating a continuous stream of redo records and pumping them as fast as we can into the redo log, initially into an area of shared memory known as the redo log buffer. Eventually, of course, Oracle has to deal with writing the buffer to disk and, for operational reasons, actually writes the “continuous” stream to a small set of predefined files—the online redo log files. The number of online redo log files is limited, so we have to reuse them constantly in a round-robin fashion.
为了让online redo log files里的信息保存更长时间,大多数系统会对online redo log files保留1个或多个副本,即归档日志。但是,就redo而言,采用写入忘记机制,一旦 redo record 写入到 redo log (buffer) ,我们(通常)不希望实例重新读取它。这种“写和忘记”的方法使重做成为一种非常简单的机制。
To protect the information stored in the online redo log files over a longer time period, most systems are configured to make a copy, or possibly many copies, of each file as it becomes full before allowing Oracle to reuse it: the copies are referred to as the archived redo log files. As far as redo is concerned, though, it’s essentially write it and forget it—once a redo record has gone into the redo log (buffer), we don’t (normally) expect the instance to reread it. At the basic level, this “write and forget” approach makes redo a very simple mechanism.
注意:
虽然我们只希望对online redo log files进行写和忘记,但是还有一些特殊情况需要读取online redo log files,例如检查内存坏块、从磁盘恢复坏块等,还有一些特性需要读取online redo log files,例如: Log Miner, Streams , asynchronous Change Data Capture 等。近年来还有一些新的特性需要读取online redo log files,如standby database,我们将在第六章讨论这些特性。
Note
Although we don’t usually expect to do anything with the online redo log files except write them and forget them, there is a special case where a session can read the online redo log files when it discovers the inmemory version of a block to be corrupt and attempts to recover from the disk copy of the block. Of course, some
features, such as Log Miner, Streams, and asynchronous Change Data Capture, have been created in recent years to take advantage of the redo log files, and some of the newer mechanisms for dealing with Standby databases have become real-time and are bound into the process that writes the online redo. We will look at such features in Chapter 6.
然后,还有一个复杂的问题,将 redo record 写入到 redo log buffer 过程会出现性能瓶颈。在10g版本之前,oracle会将每个会话下数据改变产生的 redo record (包含一对重做更改向量)写入到 redo log buffer 。但是单个会话在短时间内可能会发生很多变化,也可能有多个会话同时并发操作,但是要访问的redo log buffer只有一个。
There is, however, one complication. There is a critical bottleneck in redo generation, the moment when a redo record has to be copied into the redo log buffer. Prior to 10g, Oracle would insert a redo record (typically consisting of just one pair of redo change vectors) into the redo log buffer for each change a session made to user data. But a single session might make many changes in a very short period of time, and there could be many sessions operating concurrently—and there’s only one redo log buffer that everyone wants to access.
为了解决这个问题,创建一种机制来控制对共享内存每个部分的访问,Oracle使用redo allocation latch来保护 redo log buffer 的使用。当要给进程需要使用log buffer里的部分空间时,需要先去申请获取 redo allocation latch ,一旦获取到对于的latch,就可以把相关信息写入到buffer。这样就避免了多个进程重写log buffer相同块的风险。但是如果有大量的进程申请获取redo allocation latch ,会消耗大量资源(主要时CPU用来 latch spin),或者在第一次spin失败后,离开队列,进入 sleep time 。
It’s relatively easy to create a mechanism to control access to a piece of shared memory, and Oracle’s use of the redo allocation latch to protect the redo log buffer is fairly well known. A process that needs some space in the log buffer tries to acquire (get) the redo allocation latch, and once it has exclusive ownership of that latch, it can reserve some space in the buffer for the information it wants to write into the buffer. This avoids the threat of having multiple processes overwrite the same piece of memory in the log buffer, but if there are lots of processes constantly competing for the redo allocation latch, then the level of competition could end up “invisibly” consuming lots of resources (typically CPU spent on latch spinning) or even lots of sleep time as sessions take themselves off the run queue after failing to get the latch on the first spin.
在Oracle老版本中,当数据库空闲时,redo生成的会很少, “one change = one record = one allocation” 策略对大多数系统时够用的,当随着系统更庞大更繁忙,需要处理更大的并发请求(特别对于OLTP系统),需要更加优化的策略,因此在10g新策略里出现了一种结合私有重做private redo和内存撤消 in-memory undo 的新机制。