
Re: Hot backup.

 
Rushank
Super Advisor

Hot backup.

Hello,
Env : N-class,HP11, Phy Mem: 8G
DB : Oracle 8i, one instance running, table space size more than 275 GB

The scenario:

A script runs every night to perform a hot backup.
The script compresses the tablespace files using the compress command and copies them to a different file system on the same system.
The entire process usually takes 5 hours, but for the last week it has taken more than 11 hours. In the past, rebooting the server eliminated the problem for about a month; then it gradually became slow again and the hot backup ran longer.
I would like to find out why this is happening, what is causing it, and why a reboot fixes it for the first few days.

I tried replacing compress with gzip without any success. I'm collecting ps data during the hot backup period, but I cannot make out what is causing this delay.

Any help/suggestions appreciated.

Thanks
22 REPLIES
harry d brown jr
Honored Contributor

Re: Hot backup.

What does your memory look like when you run the hot-backup? Is it possible oracle has a memory leak?

live free or die
harry
Live Free or Die
Steven Gillard_2
Honored Contributor

Re: Hot backup.

The first step is to determine whether you have a CPU, disk or memory bottleneck. The fact that it gets slower after a month and is OK after a system reboot suggests that you may have a memory issue.

Use glance or vmstat while the backup is running and monitor the paging activity (page out rate in particular). Also keep an eye on your free memory over time. What does glance's memory report tell you?
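As a minimal sketch of the vmstat monitoring suggested above: the function below pulls the free-memory and page-out columns out of vmstat output so they can be logged across the backup window. It assumes the vmstat header names those columns "free" and "po" (as HP-UX vmstat does); the function name, log path and sampling interval are made up for illustration.

```shell
#!/bin/sh
# Print the "free" and "po" (page-out) columns from vmstat output on stdin.
# Column positions are looked up from the header line rather than hard-coded,
# so lines before the header (e.g. the "procs memory page ..." banner) are skipped.
extract_paging() {
    awk '
        !found {
            for (i = 1; i <= NF; i++) {
                if ($i == "free") free_col = i
                if ($i == "po")   po_col = i
            }
            if (free_col && po_col) found = 1
            next
        }
        # Data lines start with a number; repeated header lines do not.
        $1 ~ /^[0-9]+$/ { print "free=" $free_col, "po=" $po_col }
    '
}

# Hypothetical usage during the backup: one sample a minute for 12 hours.
# vmstat 60 720 | extract_paging >> /var/tmp/hotbackup_vmstat.log
```

Comparing a log taken shortly after a reboot with one taken when the backup is slow should show whether the page-out rate or free memory differs between the two.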

Also have a read of:

http://us-support3.external.hp.com/cki/bin/doc.pl/sid=b2adc1b00705cc33c8/screen=ckiDisplayDocument?docId=200000050018417

for information about performance troubleshooting.

Regards,
Steve
Alan Casey
Trusted Contributor

Re: Hot backup.

I would suspect that this is an Oracle issue. Maybe restarting the database would determine this; does that temporarily solve the problem?

David Lodge
Trusted Contributor

Re: Hot backup.

Could we have some more information? The way I read this, it sounds like you're copying the physical dbf files and compressing them. If you are doing this, I really think you should shut down Oracle first :-)

This shouldn't as such cause oracle to leak much memory; but it's worth checking a few things:
1) dangling processes, may be taking up extra memory
2) disk bottlenecks
3) use of virtual memory

The copying of files itself uses mainly disk access - are your disks mirrored/striped? Are they internal/external?

Compress uses a significant amount of disk I/O and a *large* amount of CPU. When this is running, is the priority very high (241 - 255)?

Have you got mwa on the box? (to show ps listings and disk bottleneck areas)

Which bit of the batch takes the time - is it the copy or the compress?

dave
John Palmer
Honored Contributor

Re: Hot backup.

Initial thoughts are CPU and memory.

Compress is very CPU intensive - have you got any 'rogue' processes that are using up CPU time? A reboot will kill these.

Run that useful one-liner:
UNIX95= ps -e -o ruser,vsz,pid,args | sort -rnk2 | more
to check if you have any processes with a memory leak.

If it's one of the above then it will be affecting the system now, not just when you run the backup.
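One way to use that one-liner for leak hunting is to save its output on two different days and compare. The sketch below assumes both snapshots were produced with the same `ruser,vsz,pid,args` column order as above; the function name and file names are hypothetical.

```shell
#!/bin/sh
# Compare two saved snapshots of "ps -e -o ruser,vsz,pid,args" and list
# processes (matched by PID) whose virtual size grew between them.
# Output columns: pid ruser old_vsz new_vsz growth, largest growth first.
vsz_growth() {
    awk '
        # First file: remember vsz per PID (header line is skipped).
        NR == FNR { if ($2 ~ /^[0-9]+$/) old[$3] = $2; next }
        # Second file: report PIDs seen before whose vsz is now larger.
        ($3 in old) && $2 ~ /^[0-9]+$/ && $2 + 0 > old[$3] + 0 {
            print $3, $1, old[$3], $2, $2 - old[$3]
        }
    ' "$1" "$2" | sort -rn -k5,5
}

# Hypothetical usage:
# UNIX95= ps -e -o ruser,vsz,pid,args > /var/tmp/ps.day1
# ... some days later ...
# UNIX95= ps -e -o ruser,vsz,pid,args > /var/tmp/ps.day2
# vsz_growth /var/tmp/ps.day1 /var/tmp/ps.day2 | head
```

A PID can be reused by a different process between snapshots, so any hit is worth confirming against the args column.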

Regards,
John

Rushank
Super Advisor

Re: Hot backup.

Hi,

Thanks for the responses.

Compress takes a long time and it takes the entire CPU during compression. I'm compressing the data first and then copying it to a different file system within the server.

Here is the collective data during hotbackup for the last 10 days.
This is the output of
UNIX95= ps -e -o "pcpu vsz ruser pid stime time state args" | sort -rn |head -10
along with the output of uptime and swapinfo -tam.

Krishna Prasad
Trusted Contributor

Re: Hot backup.

It looks like you might have some old compress processes running. Check out process ID 488. It seems to have been running for longer than one day.

I also saw that it did eventually stop. It might have stopped because of the reboot. It looked like the uptime in days went back down towards the bottom, indicating that you rebooted the box and it took about another 35 days before you had bad performance?

Either way, do a ps -ef | grep compress and check the time stamp on the box.
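A rough sketch of that check, assuming the usual `ps -ef` layout in which the fifth field (STIME) is a clock time for recently started processes and a month/day for ones started more than a day ago. The function name is made up.

```shell
#!/bin/sh
# Read "ps -ef" output on stdin and print only the compress processes whose
# start time is shown as a date rather than a clock time, i.e. processes
# that are likely left over from an earlier night's backup.
stale_compress() {
    awk '/compress/ && $5 !~ /^[0-9][0-9]*:[0-9][0-9]/ { print }'
}

# Hypothetical usage:
# ps -ef | stale_compress
```

No output means no compress process has a start time shown as a date.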

Also, if you have old compress commands running, your restore program probably does an uncompress on the files, which will fail if the compress never succeeded.

I would check a backup and test a restore if you are finding old compress processes.
Positive Results requires Positive Thinking
Rushank
Super Advisor

Re: Hot backup.

The ps data is for the last 10 days. If you check the data, the same processes are running during the hot backup. The system has now been up for 46 days. The problem started sometime in the last week.
No old compress processes are running.
I tried manually compressing over 1 GB of data before and after we last rebooted the box. It took almost half the time to compress the same amount of data before the reboot.
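A manual test like this can be made repeatable by timing the same fixed file on different days and logging the result. This is only a sketch: the function name and file paths are hypothetical, and it assumes `date` supports `+%s` (epoch seconds), which may not hold on older systems.

```shell
#!/bin/sh
# Time how long a compressor takes on a fixed test file and print one log
# line, so runs taken at different uptimes can be compared.
# $1 = test file, $2 = compressor command (defaults to compress).
time_compress() {
    file=$1
    cmd=${2:-compress}
    start=$(date +%s)               # assumption: date understands +%s
    $cmd -c "$file" > /dev/null     # compress to stdout, discard the result
    end=$(date +%s)
    size=$(wc -c < "$file" | tr -d ' ')
    echo "$(date '+%Y-%m-%d %H:%M') $cmd $size bytes $((end - start)) s"
}

# Hypothetical usage, run once after a reboot and again when backups are slow:
# time_compress /backup/test/sample_1gb.dbf >> /var/tmp/compress_timing.log
# uptime >> /var/tmp/compress_timing.log
```

Writing to /dev/null keeps the destination file system out of the measurement, so the timing reflects reading and compressing only.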
Rushank
Super Advisor

Re: Hot backup.

Here is the output for two different dates, taken at the same time of day during the backup. The hot backup was running smoothly when the system had been up for 27 days, and it is terribly slow now after 45 days!

12:00am up 27 days, 16:55, 0 users, load average: 0.32, 0.33, 0.36

              Mb      Mb      Mb   PCT  START/      Mb
TYPE       AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev         1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
reserve        -     453    -453
memory      6349    1724    4625   27%
total       7373    2177    5196   30%       -       0    -

98.91    488 oracle  26616 23:58:24    01:35 R compress
74.45    488 oracle  26630 23:59:33    00:26 R compress
6.50   28132 oracle  23776 Dec 14      12:17 S oracleprod (LOCAL=NO)
1.23       0 root       34 Nov 18      48:48 R vxfsd
0.99   18400 root    15045 Dec 14      17:34 R /opt/perf/bin/midaemon
0.71      32 root      634 Nov 18   03:42:48 S /usr/sbin/syncer
0.61   32036 oracle   6876 Dec 11      59:58 S oracleprod (LOCAL=NO)
0.56   25676 root    15047 Dec 14      01:58 S /opt/perf/bin/scopeux
0.49   28148 oracle  26642 00:00:00    00:00 R oracleprod (LOCAL=NO)
0.26   29860 oracle  26986 03:10:30    02:41 S oracleprod (LOCAL=NO

------------------------------------------------
12:00am up 45 days, 16:56, 1 user, load average: 0.28, 0.31, 0.35

              Mb      Mb      Mb   PCT  START/      Mb
TYPE       AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev         1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
reserve        -     733    -733
memory      6349    1719    4630   27%
total       7373    2452    4921   33%       -       0    -

57.22    488 oracle  14902 23:55:52    02:21 S compress
56.35    488 oracle  14899 23:55:52    02:21 S compress
1.20       0 root       34 Nov 18   01:16:21 R vxfsd
1.08   28564 oracle  14969 23:56:18    00:03 S oracleprod (LOCAL=NO)
0.87   18384 root      723 08:44:13    12:17 R /opt/perf/bin/midaemon
0.64      32 root      634 Nov 18   06:42:35 S /usr/sbin/syncer
0.56   14028 root      732 08:44:15    01:09 S /opt/perf/bin/scopeux
0.48    1920 precise 27239 Nov 19   04:29:36 S pss_pcs.64 -k prod
0.36   28564 oracle  10999 22:55:04    00:00 S oracleprod (LOCAL=NO)
0.34    4688 precise 27240 Nov 19   01:53:01 S pss_rx -k prod
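
With ten days of snapshots, it may be easier to compare just the compress rows. The sketch below assumes the column order of the ps command used for this data (pcpu vsz ruser pid stime time state args), so for a compress row the state is the second-to-last field. The function name is made up.

```shell
#!/bin/sh
# Summarise the compress rows of one ps snapshot read from stdin:
# how many there are, their combined pcpu, and their process states.
compress_summary() {
    awk '
        $NF == "compress" { n++; cpu += $1; states = states $(NF - 1) }
        END { printf "procs=%d total_pcpu=%.2f states=%s\n", n, cpu, states }
    '
}

# Hypothetical usage, with one snapshot per file:
# compress_summary < /var/tmp/ps.day27
# compress_summary < /var/tmp/ps.day45
```

For the two snapshots above this gives a combined 173.36 with both processes in state R at 27 days, against 113.57 with both in state S at 45 days. That would be consistent with compress spending more of its time waiting rather than computing, though two samples do not establish it.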