I have an issue with the C7438A DAT 72 tape drive in one of my servers. The machine in question is one of 13 HP DL380 G4 servers, each running Red Hat Enterprise Linux 4. Backups run to the internal tape drive every 1st and 3rd Thursday of the month using Storix. The problem is that the backup often fails on this one box, seemingly because the tape drive goes offline. The only way to bring the drive back is to reboot the machine (which is NOT preferable since it is a production box). The drive itself has been replaced twice, and I want to stress that the problem is not occurring on any of the other 12 servers. When I run dmesg, I see the following messages.

cciss: cp 39df5280 timedout
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi0 (0:0): rejecting I/O to offline device

When I do an mt on the drive it acts like it's just not there.

# mt -f /dev/st0 status
/dev/st0: No such device or address

I tried running kudzu to re-detect the drive, but nothing seems to work aside from rebooting the box.

Has anyone else experienced this problem? If there are any other details I can provide, please let me know. I'm pretty new to the world of Unix/Linux system administration.

I've run into exactly the same issue. Same OS version, same symptoms; only the server and tape drive are slightly different models. I see it's been some time since this post was created. Have you managed to solve the problem? I expect the drive would recover after a system reboot, but that is not a real solution since the box is a production one, same as in your case.


Sorry for the delay. I pretty much gave up on this issue a while back, and on a whim tried to solve it one more time recently. I didn't eliminate the root cause, but I did find a way to bring the drive back online without a reboot. My tape drive was: st0 at scsi0, channel 0, id 0, lun 0 (found that line when I did a dmesg). Those four numbers (host, channel, id, lun) are what go after the command name below. To bring the drive back online I had to run the following two commands:


echo "scsi remove-single-device 0 0 0 0" > /proc/scsi/scsi


to remove the drive from the configuration and


echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi


to bring it back in. I once tried doing just the "add" portion and it did not work. The remove was not very well documented on my servers, and I only found this solution in one place in all of my years of Google searching. The problem is not a very big deal for the powers that be, so a fix or hardware replacement was never ordered, but I'm thrilled to at least be able to avoid rebooting production servers on a somewhat regular basis.
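
For anyone who wants to script this, here is a rough sketch of the two steps above wrapped in shell functions. The function names, the PROC_SCSI variable and the one-second pause are made up for illustration; they are not from Storix or any HP tool. The parsing helper assumes the dmesg line looks like the one quoted above ("st0 at scsi0, channel 0, id 0, lun 0"), so check your own output before relying on it.

```shell
#!/bin/sh
# Sketch only: cycle a SCSI device out of and back into the kernel's list.
# PROC_SCSI and all function names here are hypothetical, chosen for this example.
PROC_SCSI="${PROC_SCSI:-/proc/scsi/scsi}"

# Read dmesg-style text on stdin and print "host channel id lun" for the
# device named in $1 (e.g. st0). Assumes the line format quoted in the post.
scsi_addr_from_dmesg() {
    sed -n "s/.*$1 at scsi\([0-9]*\), channel \([0-9]*\), id \([0-9]*\), lun \([0-9]*\).*/\1 \2 \3 \4/p" | tail -n 1
}

# Print the control string for one action ("remove" or "add").
# Arguments after the action: host channel id lun.
scsi_cmd() {
    printf 'scsi %s-single-device %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Remove the device, pause, then add it back (remove first, as described above).
# The pause length is a guess, not something tested on the affected server.
rescan_scsi_device() {
    scsi_cmd remove "$@" > "$PROC_SCSI" || return 1
    sleep 1
    scsi_cmd add "$@" > "$PROC_SCSI" || return 1
}
```

Run as root, something like `rescan_scsi_device 0 0 0 0 && mt -f /dev/st0 status` should show whether the drive came back. To pick up the numbers automatically, `rescan_scsi_device $(dmesg | scsi_addr_from_dmesg st0)` is one option, but only if that line is still in the dmesg buffer.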