TSM problem analysis: ANR8779E Unable to open drive, error number=16

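The TSM client backup processes had stopped and were making no progress. According to their timestamps, the most recent log entries had been written three days earlier and recorded an unfinished backup, yet the logs contained no errors.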

Several other client nodes showed the same symptom.

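I logged into the TSM console with dsmadmc and ran q session to list the connections. I had to press the space bar several times (each press scrolls down one page) to get through them all. Ugh. It looked like yet another damn client node had locked the drives and then died.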

I ran q drive f=d to find out which damn client node had grabbed the drives this time. The output was:

Library Name: 3584LIB
Drive Name: DRIVE01
Device Type: LTO
On-Line: Yes
Read Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2,ULTRIUMC,ULTRIUM
Write Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2
Element: 258
Drive State: LOADED
Volume Name: A00054L3
Allocated to: AGENT_MHYWDB_A
……

There are four drives in total, and all of them were allocated to AGENT_MHYWDB_A, so the client software on that node had evidently failed again.
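With four drives, paging through q drive f=d is manageable, but on a larger library a small filter makes the culprit obvious. The following is a minimal Python sketch, assuming the output has been saved as text and that each drive record starts with a "Library Name:" line, as in the sample above. The function name drives_by_owner is hypothetical, and the DRIVE02 record in the sample is invented to stand in for the records elided with "……".

```python
from collections import defaultdict

# Sample: DRIVE01 is copied from the output above; DRIVE02 is a hypothetical
# record standing in for the part of the output that was elided.
SAMPLE = """\
Library Name: 3584LIB
Drive Name: DRIVE01
Device Type: LTO
Drive State: LOADED
Volume Name: A00054L3
Allocated to: AGENT_MHYWDB_A
Library Name: 3584LIB
Drive Name: DRIVE02
Device Type: LTO
Drive State: LOADED
Volume Name: A00055L3
Allocated to: AGENT_MHYWDB_A
"""


def drives_by_owner(qdrive_output):
    """Group drive names by the 'Allocated to' field of q drive f=d text."""
    owners = defaultdict(list)
    record = {}

    def flush():
        # An empty or missing 'Allocated to' value is grouped under "".
        if "Drive Name" in record:
            owners[record.get("Allocated to", "")].append(record["Drive Name"])

    for line in qdrive_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or elided lines
        key, value = key.strip(), value.strip()
        if key == "Library Name" and record:
            flush()  # a new drive record begins
            record.clear()
        record[key] = value
    flush()  # the last record has no following "Library Name" line
    return dict(owners)


print(drives_by_owner(SAMPLE))
# {'AGENT_MHYWDB_A': ['DRIVE01', 'DRIVE02']}
```

If every drive lands under a single node name, as happened here, that node is the first place to look.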

A look at q actlog confirmed it: a long run of errors saying the drives could not be opened:

06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt4, error number=16.
06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt3, error number=16.
06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt1, error number=16.
06/13/2007 15:48:36 ANR8779E Unable to open drive /dev/rmt2, error number=16.
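Error number 16 is most likely the operating-system errno returned when the server opens the tape device; on AIX and Linux, 16 is EBUSY ("device busy"), which is consistent with drives still held by another process. The Python sketch below illustrates that reading under this assumption: it pulls each device out of the ANR8779E lines and translates the number with the errno table of the machine running the script, so the result is only meaningful on a platform that numbers errors the same way as the TSM server. The pattern and the function name drive_open_errors are assumptions based on the four lines above.

```python
import errno
import re

# Pattern assumed from the four ANR8779E lines quoted above.
ANR8779E = re.compile(
    r"ANR8779E Unable to open drive (?P<device>[^,\s]+), error number=(?P<code>\d+)"
)

LOG = """\
06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt4, error number=16.
06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt3, error number=16.
06/13/2007 15:48:26 ANR8779E Unable to open drive /dev/rmt1, error number=16.
06/13/2007 15:48:36 ANR8779E Unable to open drive /dev/rmt2, error number=16.
"""


def drive_open_errors(actlog_text):
    """Map each device named in an ANR8779E line to a symbolic errno name."""
    result = {}
    for m in ANR8779E.finditer(actlog_text):
        code = int(m.group("code"))
        # errno.errorcode uses the numbering of the host running this script.
        result[m.group("device")] = errno.errorcode.get(code, str(code))
    return result


print(drive_open_errors(LOG))
# On Linux every device maps to 'EBUSY'.
```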

At the same time there were many errors about failed or refused connections to this client node:

06/13/2007 15:48:26 ANR0454E Session rejected by server AGENT_MHYWDB_A, reason: Communication Failure.
06/13/2007 15:48:26 ANR0454E Session rejected by server AGENT_MHYWDB_A, reason: Communication Failure.
06/13/2007 15:48:26 ANR8390W Failure connecting to library client AGENT_MHYWDB_A to manage volume A00055L3.
06/13/2007 15:48:26 ANR8390W Failure connecting to library client AGENT_MHYWDB_A to manage volume A00054L3.

06/13/2007 15:48:26 ANR8214E Session open with 150.100.16.103 failed due to connection refusal.

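Given these results I was almost certain the client program on AGENT_MHYWDB_A had failed, since the same thing had happened several times before. Without thinking twice I logged into 150.100.16.103 and ran: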

ps -ef | grep dsm
kill -9 <PID>    # PID of the hung dsm process, taken from the ps output
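ps -ef | grep dsm also lists the grep command itself, so that PID must be skipped when choosing what to kill. Below is a minimal Python sketch of the filtering step, assuming the ps -ef column layout in which the PID is the second field. The sample lines, the PIDs and the process name dsmsta are invented for illustration, dsm_pids is a hypothetical helper, and the script only prints kill commands rather than sending any signal.

```python
# Hypothetical `ps -ef | grep dsm` output; process names and PIDs are invented.
PS_OUTPUT = """\
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root  274586       1   0   Jun 10      -  0:05 dsmsta
    root  303204  290816   0 15:50:01  pts/0  0:00 grep dsm
"""


def dsm_pids(ps_text, pattern="dsm"):
    """Return PIDs of lines containing `pattern`, excluding the grep itself."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Only the 2nd column is trusted: STIME may contain a space on some
        # platforms (e.g. "Jun 10"), which shifts every later column.
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header or blank line
        if pattern in line and "grep" not in line:
            pids.append(int(fields[1]))
    return pids


for pid in dsm_pids(PS_OUTPUT):
    # Print instead of killing, so the PID can be checked before acting.
    print(f"kill -9 {pid}")
```

Printing first is deliberate: killing the wrong dsm process on a shared host would interrupt an unrelated backup.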

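After killing the process I waited about twenty seconds and heard the tape library click as it started swapping tapes, a sign that the drives had been released. q mount confirmed it: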

tsm:TSM>q mount

ANR8380I LTO volume A00055L3 is mounted R/W in drive DRIVE02 (/dev/rmt2), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00054L3 is mounted R/W in drive DRIVE01 (/dev/rmt1), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00052L3 is mounted R/W in drive DRIVE03 (/dev/rmt3), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00053L3 is mounted R/W in drive DRIVE04 (/dev/rmt4), status: RETRY DISMOUNT FAILURE.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8334I 7 matches found.
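The q mount output above reads more easily as a count per status: four drives still hold volumes in RETRY DISMOUNT FAILURE, and three more mount points are WAITING FOR VOLUME, which adds up to the 7 matches the server reports. The Python sketch below performs that count. Its regular expressions are assumptions based only on the ANR8380I and ANR8379I lines shown here, and summarize_mounts is a hypothetical name.

```python
import re
from collections import Counter

# Patterns assumed from the ANR8380I / ANR8379I lines above only.
MOUNTED = re.compile(
    r"ANR8380I \S+ volume (?P<volume>\S+) is mounted \S+ in drive "
    r"(?P<drive>\S+) \((?P<device>[^)]+)\), status: (?P<status>[^.]+)\."
)
WAITING = re.compile(r"ANR8379I .*status: (?P<status>[^.]+)\.")

QMOUNT = """\
ANR8380I LTO volume A00055L3 is mounted R/W in drive DRIVE02 (/dev/rmt2), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00054L3 is mounted R/W in drive DRIVE01 (/dev/rmt1), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00052L3 is mounted R/W in drive DRIVE03 (/dev/rmt3), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume A00053L3 is mounted R/W in drive DRIVE04 (/dev/rmt4), status: RETRY DISMOUNT FAILURE.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8379I Mount point in device class 3584CLASS is waiting for the volume mount to complete, status: WAITING FOR VOLUME.
ANR8334I 7 matches found.
"""


def summarize_mounts(text):
    """Return ({drive: (volume, device, status)}, Counter of all statuses)."""
    drives, statuses = {}, Counter()
    for line in text.splitlines():
        m = MOUNTED.search(line)
        if m:
            drives[m["drive"]] = (m["volume"], m["device"], m["status"])
            statuses[m["status"]] += 1
            continue
        w = WAITING.search(line)
        if w:
            statuses[w["status"]] += 1
    return drives, statuses


drives, statuses = summarize_mounts(QMOUNT)
print(statuses)
# Counter({'RETRY DISMOUNT FAILURE': 4, 'WAITING FOR VOLUME': 3})
```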

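After that came a long stretch of clicking as tapes were swapped in and out. Running q session again showed that the sessions were finally back to normal. On each client node, lev0.log and lev1.log had now been terminated with an error record, so things looked more or less normal again: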

released channel: t4
released channel: t1
released channel: t2
released channel: t3
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 06/13/2007 15:02:49
ORA-19502: write error on file "0aijql6t_1_1", blockno 2049 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS0278S (RC157) The transaction will be aborted.
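To check that every lev0.log and lev1.log really ended with this kind of error stack, the message codes can be pulled out in order of appearance. The sketch below is a hypothetical helper (error_codes) that assumes Oracle codes of the form ORA-nnnnn and RMAN-nnnnn and TSM client codes of the form ANSnnnnX; reading the log files themselves is left out. In the stack above, the last code, ANS0278S, is the one passed up from the media manager layer, that is, from the TSM side.

```python
import re

# Code formats assumed from the error stack above.
CODE = re.compile(r"\b(?:RMAN-\d{5}|ORA-\d{5}|ANS\d{4}[A-Z])\b")

STACK = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 06/13/2007 15:02:49
ORA-19502: write error on file "0aijql6t_1_1", blockno 2049 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS0278S (RC157) The transaction will be aborted.
"""


def error_codes(log_text):
    """Return message codes in order of first appearance, without duplicates."""
    # dict.fromkeys keeps insertion order while dropping repeats.
    return list(dict.fromkeys(CODE.findall(log_text)))


print(error_codes(STACK))
# ['RMAN-00571', 'RMAN-00569', 'RMAN-03002', 'ORA-19502', 'ORA-27030', 'ORA-19511', 'ANS0278S']
```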

Author: admin
