System Administration

Server is getting hanged during backup


Dear All,
My server is getting hanged during backup on one file system. Please help me in analysing the issue asap.

Steven Schweda
Honored Contributor

Re: Server is getting hanged during backup

You might start by providing even a little
useful information. For example:

> [...] My server [...]

Not a useful description of the hardware
(computer, backup-destination device
(tape?), etc.).

Not a useful description of the software
(operating system, backup program, etc.).

> [...] during backup [...]

Not a useful description of what you did.

> [...] is getting hanged [...]

Not a useful description of what happened
when you did whatever you did. Did the
backup operation itself hang, or did the
whole system hang? What, exactly, does that
mean to you? How did you decide that "My
server" was hung?
Honored Contributor

Re: Server is getting hanged during backup

"Analysis" based on almost no hard facts at all:

If the server hangs when the backup is trying to access an NFS-mounted disk, it usually indicates network connection problems.

If the disk is a local SCSI disk, the most common reason would be a disk failure.

If there are any error messages on the system console when the server is "getting hanged", they would probably be useful in understanding what is going on. If you need to reboot the server to access it after it "gets hanged", check the /var/adm/syslog/OLDsyslog.log file.
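For what it's worth, here is a rough Python sketch of how one might skim the old syslog for likely culprits after a forced reboot. The keyword list is purely my assumption (real messages differ between drivers and OS releases), and the function name is made up for illustration, so adjust both to whatever your system actually logs.

```python
import re

# Assumed keywords only; real kernel messages vary by driver and OS release.
SUSPECT = re.compile(r"scsi|i/o error|timed? ?out|not responding", re.IGNORECASE)

def suspicious_lines(log_path, pattern=SUSPECT):
    """Return (line_number, text) pairs for log lines matching the pattern."""
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling suspicious_lines("/var/adm/syslog/OLDsyslog.log") would return the matching lines with their line numbers, so the surrounding context can be read in the file itself.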

If the timestamps of /var/tombstones/ts99 and/or the most recent file in /var/adm/crash match the time the server has been "getting hanged", these files might contain some useful information too.
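The timestamp comparison could be sketched in Python roughly as follows. The 30-minute window and both function names are my own assumptions, not anything the system defines; pick a window that fits how precisely you know when the hang happened.

```python
import os
from datetime import datetime, timedelta

def newest_entry(directory):
    """Return the most recently modified entry in directory, or None."""
    newest, newest_mtime = None, None
    try:
        names = os.listdir(directory)
    except OSError:
        return None  # directory missing or unreadable
    for name in names:
        path = os.path.join(directory, name)
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue
        if newest_mtime is None or mtime > newest_mtime:
            newest, newest_mtime = path, mtime
    return newest

def files_near(paths, hang_time, window=timedelta(minutes=30)):
    """Return (path, mtime) for files modified within window of hang_time."""
    matches = []
    for path in paths:
        try:
            mtime = datetime.fromtimestamp(os.stat(path).st_mtime)
        except OSError:
            continue  # missing file: nothing to compare
        if abs(mtime - hang_time) <= window:
            matches.append((path, mtime))
    return matches
```

One would pass in "/var/tombstones/ts99" plus whatever newest_entry("/var/adm/crash") returns, together with the local time at which the hang was noticed.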

My recommendation:
To be safe, assume that you have at least one disk that has at least partially failed and might fail totally at any moment.

Stop any unnecessary writing to the system. Any further writes may expand the damaged area and may make it more difficult to recover any damaged files.

If it's important to get the server back to normal production as fast as possible, have the disk replaced and connect the failed disk to some test machine as an extra disk.

Find your latest successful backup and get a listing of files included in that backup (including file timestamps and sizes). Try to find any files on the system that have been changed or created since the latest successful backup, and copy them to some other storage manually. Start with those files that would be hardest to recreate in any other way.
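If the backup listing is not convenient to compare by hand, a minimal Python sketch like this can list regular files modified after the last good backup. It assumes you know the backup time as a Unix epoch value, and the function name is hypothetical. Note that it compares modification times only, and that os.walk will happily cross into other mounted file systems under the starting directory.

```python
import os
import stat

def changed_since(root, cutoff_epoch):
    """Yield (path, size, mtime) for regular files under root newer than cutoff."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or cannot be examined
            if stat.S_ISREG(info.st_mode) and info.st_mtime > cutoff_epoch:
                yield path, info.st_size, info.st_mtime
```

Sorting the result by how hard each file would be to recreate is still a job for a human. Keep in mind that even reading directory entries touches the suspect disk, so this may hang just like the backup did.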

If your system hangs again when attempting to copy a certain file, it will indicate that the file is located in the damaged area of the disk and most likely has been lost. Make a note of the pathname of the file and stop trying to copy that file for now: try to get any other files that have not been backed up.
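A sketch of the copy loop, in Python, under some loud assumptions: the journal file and the destination both live on a different, healthy disk, and all names here are invented for illustration. A real hang cannot be caught from inside the script, so the idea is to write the pathname to the journal (and force it to disk) before each copy attempt; after a reboot, the last "TRY" line without a matching "OK" or "FAIL" names the file to skip next time.

```python
import os
import shutil

def rescue(paths, dest_root, journal_path, skip=()):
    """Copy files under dest_root; return the paths that raised read errors."""
    unreadable = []
    with open(journal_path, "a") as journal:
        for src in paths:
            if src in skip:
                continue  # known-bad file noted after an earlier hang
            # Record the attempt first so it survives a hang and reboot.
            journal.write(f"TRY {src}\n")
            journal.flush()
            os.fsync(journal.fileno())
            dst = os.path.join(dest_root, src.lstrip(os.sep))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps
            except OSError as err:
                unreadable.append(src)
                journal.write(f"FAIL {src}: {err}\n")
            else:
                journal.write(f"OK {src}\n")
    return unreadable
```

Files that fail with an ordinary I/O error end up in the returned list; files that hang the machine have to be added to skip by hand before the next run.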

Eventually, you should have:
- your latest successful backup
- some (hopefully all) of the files changed since the latest successful backup
- a list of files that are now unreadable

At that point, you can decide what to do: if the number of unreadable files is small and the data that went into those files is available in any other form, you may find it easiest to re-create those files.

If the lost data is important and cannot be recreated in any other way, you might consider commercial data recovery services (it might be expensive, but now you should be able to judge whether it's worth it or not).

Restore the data to a new disk and clearly mark the old disk as failed (so that nobody will attempt to re-use it).

When you've recovered everything that can be recovered and successfully re-created the rest, do whatever you're supposed to do to old disks that are removed from use. A failed disk cannot be reliably wiped programmatically, so if wiping the old disks is a requirement, you may have to demagnetize or physically destroy the failed disk.