- Community Home
- >
- Storage
- >
- Legacy
- >
- Automated Backup
- >
- Re: VBDA inactivity timeouts & IPC EVENT TIMEOUTS
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Discussions
Discussions
Forums
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО03-22-2005 07:35 AM
тАО03-22-2005 07:35 AM
VBDA inactivity timeouts & IPC EVENT TIMEOUTS
We have a LTO Tape Library attached to a (fully patched) DP5.0 cell server running under Windows, and run two datalists (one for the Linux client, the other for the Windows clients). We are seeing this happen on both datalists.
I have had a pretty good look at cases of this on the Linux systems and have found that at a particular time all of the running disk agents (vbda's) running on the clients go idle... they still exist, but are no longer using any CPU or network traffic. An lsof shows that these agents still have a connection to the cell server, and a parallel running datalist continues to run. This implies that it is the media agent on the cell server which is no longer responding, or the network is causing an issue to upset its connections. It should be noted that other netwok traffic (such as continuous pings, established SSH connections, and other DP datalist backups) are not being upset, so it does not look network related (all systems are attached to 1GB switches).
Another interesting point is that when debugging is enabled on the Linux client VBDA's the following "IPC EVENT ==> TIMEOUT" messages occur 50-60 times a second (per vbda). While these do not seem to be affecting the backup in any way I would like to know what the cause of them is, as it could be related to the connections going idle. I get these even when running very simple, small backups at low network load.
This is snip of a VBDA debug log (anyone have access to bdabuf.c?):
03/23/05 03:13:37 ("da/dacmn/bdabuf.c /main/r50_fix/1":934) A.05.00 bPHSS_32331/DPSOL_00132
IPC EVENT ==> TIMEOUT
03/23/05 03:13:37 ("da/dacmn/bdabuf.c /main/r50_fix/1":934) A.05.00 bPHSS_32331/DPSOL_00132
IPC EVENT ==> TIMEOUT
03/23/05 03:13:37 ("da/dacmn/bdabuf.c /main/r50_fix/1":934) A.05.00 bPHSS_32331/DPSOL_00132
IPC EVENT ==> TIMEOUT
03/23/05 03:13:37 ("da/dacmn/bdabuf.c /main/r50_fix/1":934) A.05.00 bPHSS_32331/DPSOL_00132
IPC EVENT ==> TIMEOUT
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО04-10-2005 07:56 AM
тАО04-10-2005 07:56 AM
Re: VBDA inactivity timeouts & IPC EVENT TIMEOUTS
The following is what is displayed in the log of the client (VBDA) when the connection times out. The same message is displayed for multiple disk agents, on multiple clients that were active at the time. However another datalist running on a different BMA continues just fine. This suggests that it is the specific BMA (on the cell master) for this datalist which is failing.
04/09/05 01:57:36 ("da/dacmn/breconn.c /main/16":269) A.05.00 bPHSS_32331/DPSOL_00132
RcIpcReceiveData failed!
04/09/05 01:57:36 ("da/dacmn/bdabuf.c /main/r50_fix/1":979) A.05.00 bPHSS_32331/DPSOL_00132
IPC EVENT (BSM) ==> SOCKET CLOSED
04/09/05 01:57:36 ("da/bda/tw_unix.c /main/r50_fix/1":Line:423 in 'TwDumpFile' )
Marked: ("":Line:0 in '')
Code is:0
From: VBDA@machine.domain.co.nz "machine " Time: 04/09/05 01:57:36
Connection to Media Agent broken => aborting.
04/09/05 01:57:36 ("da/bda/vbda.c /main/r50_fix/2":1145) A.05.00 bPHSS_32331/DPSOL_00132
===>> (0) VbdaExit {
Args: cc=4
IpcReleaseIpcHandle: 1 (1, 1)
04/09/05 01:57:36 ("da/bda/vbda.c /main/r50_fix/2":Line:1289 in 'VbdaExit' )
Marked: ("":Line:0 in '')
Code is:0
From: VBDA@machine.domain.co.nz "machine " Time: 04/09/05 01:57:36
ABORTED Disk Agent for machine.domain.co.nz:/data1 "machine ".
04/09/05 01:57:36 ("da/bda/vbda.c /main/r50_fix/2":1312) A.05.00 bPHSS_32331/DPSOL_00132
VbdaExit: OB2VBDASLEEPBEFORECLOSE = 0
04/09/05 01:57:36 ("lib/ipc/ipc.c /main/r50_fix/10":766) A.05.00 bPHSS_32331/DPSOL_00132
IpcExit()
04/09/05 01:57:36 ("lib/ipc/ipc.c /main/r50_fix/10":2749) A.05.00 bPHSS_32331/DPSOL_00132
IpcCloseConnection(0)
IpcReleaseIpcHandle: 0 (0, 0)
04/09/05 01:57:36 ("da/bda/vbda.c /main/r50_fix/2":1341) A.05.00 bPHSS_32331/DPSOL_00132
VbdaExit: will exit with exit code 4
04/09/05 01:57:36 ("lib/cmn/common.c /main/r50_fix/1":393) A.05.00 bPHSS_32331/DPSOL_00132
CmnExit()
04/09/05 01:57:36 ("lib/cmn/trouble.c /main/5":121) A.05.00 bPHSS_32331/DPSOL_00132
===>> (1) TrbExit {
04/09/05 01:57:36 ("lib/cmn/trouble.c /main/5":126) A.05.00 bPHSS_32331/DPSOL_00132
<<=== (1) } /* TrbExit */
04/09/05 01:57:36 ("lib/cmn/debug.c /main/r50_fix/7":433) A.05.00 bPHSS_32331/DPSOL_00132
==================================================================
DEBUGGING ENDED PROPERLY
==================================================================
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО12-04-2005 06:03 AM
тАО12-04-2005 06:03 AM
Re: VBDA inactivity timeouts & IPC EVENT TIMEOUTS
You can actually predict when the failure is going to occur as the VBDA process on the client will sit there using zero CPU time... whereas a working VBDA process will start chewing up CPU time within a minute or two.
The issue seems to be that the disk agent goes to backup a file and then times out with a "RcIpcReceiveData failed" (the timeouts have been reduced to 30 minutes as it makes no difference what this setting is, and 30 minutes allow us to know about the problem sooner in the evening).
from NET: MSG_LISTEN
shmemKey = 1783634327
shmemSize = 2097168
shmemSection = 0
shmemBasePtr = 0x(nil)
bufNum =32
bufSize =65536
bufOffset =160
Attached shmemBasePtr = 0x40230000
boName (addr) = 0xbfffea60, boNameSize = 67
boDesc (addr) = 0x40431008, boDescSize = 1
appName (addr) = 0x8074ce0, appNameSize = 5
appDesc (addr) = 0xbffff004, appDescSize = 32
2005-12-03 23:17:13 ("da/dacmn/bdarec.c /main/32":167) A.05.50 bPHSS_33629_33630_DPSOL_00168
TwBackupObj: Dir /data3
TwBackupObj: objProcessed=
TwBackupObj: Dir /data3/lost+found
TwBackupObj: objProcessed=
TwBackupObj: Dir /data3/dbs
TwBackupObj: objProcessed=
TwBackupObj: Dir /data3/dbs/ren_nth
TwBackupObj: objProcessed=
TwTraverse: Processing file object /data3/dbs/ren_nth/ren_nth_rollback1.dbf
2005-12-03 23:54:25 ("da/dacmn/breconn.c /main/dp55/2":275) A.05.50 bPHSS_33629_33630_DPSOL_00168
RcIpcReceiveData failed!
2005-12-03 23:54:25 ("da/dacmn/bdabuf.c /main/dp55/dp55_fix/4":1537) A.05.50 bPHSS_33629_33630_DPSOL_00168
IPC EVENT (BSM) ==> SOCKET CLOSED
2005-12-03 23:54:25 ("da/bda/tw_unix.c /main/dp55/dp55_fix/4":506) A.05.50 bPHSS_33629_33630_DPSOL_00168
in TwDumpFile
Marked: ("":Line:0 in '')
Code is:0
From: VBDA@husky3.hnz.co.nz "husky3 " Time: 12/03/2005 11:54:25 PM
Connection to Media Agent broken => aborting.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО11-14-2006 11:15 PM
тАО11-14-2006 11:15 PM
Re: VBDA inactivity timeouts & IPC EVENT TIMEOUTS
What did you do to solve this issue? I'am seeing the same stuff on some client machines...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО11-22-2006 08:20 AM
тАО11-22-2006 08:20 AM
Re: VBDA inactivity timeouts & IPC EVENT TIMEOUTS
By spacing out the Linux system backups and manually regiging the order of the objects within the datalists to spread the load it is possible to drastically reduce the occurrence of this.
It's also possible to try and reduce the load by reducing the level of logging (eg just directories rather than file versions) to see if that helps.