Hot backup.

Rushank
Super Advisor

Hot backup.

Hello,
Env: N-class, HP-UX 11, physical memory: 8 GB
DB: Oracle 8i, one instance running, tablespace size more than 275 GB

The scenario:

A script runs every night to perform a hot backup.
The script compresses the tablespace files with the compress command and copies them to a different file system on the same server.
The entire process usually takes 5 hours, but for the last week it has taken more than 11 hours. In the past, rebooting the server eliminated the problem for about a month; then it gradually slowed down again and the hot backup ran longer.
I would like to find out what is causing this and why a reboot fixes it for the first few days.

I tried replacing compress with gzip without any success. I'm collecting ps data during the hot backup period, but I cannot make out what is causing the delay.

Any help/suggestions appreciated.

Thanks
22 REPLIES
harry d brown jr
Honored Contributor

Re: Hot backup.

What does your memory look like when you run the hot-backup? Is it possible oracle has a memory leak?

live free or die
harry
Live Free or Die
Steven Gillard_2
Honored Contributor

Re: Hot backup.

The first step is to determine whether you have a CPU, disk or memory bottleneck. The fact that it gets slower after a month and is OK after a system reboot suggests that you may have a memory issue.

Use glance or vmstat while the backup is running and monitor the paging activity (page out rate in particular). Also keep an eye on your free memory over time. What does glance's memory report tell you?
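As a rough sketch of the vmstat check described above, the filter below pulls the free-memory and page-out figures out of vmstat output and flags any interval with page-outs. The function name `vmstat_paging` is made up for illustration, and it assumes the vmstat header contains columns literally named `free` and `po`; it finds them by name and does not hard-code positions, and the units are whatever the local vmstat reports.

```shell
# Read vmstat output on stdin and print free memory and page-outs per sample.
# Assumption: the header has columns named "free" and "po".
vmstat_paging() {
    awk '
        # Until both columns are known, look for them in header lines.
        !(free_col && po_col) {
            for (i = 1; i <= NF; i++) {
                if ($i == "free") free_col = i
                if ($i == "po")   po_col = i
            }
            next
        }
        # Data lines start with a number; repeated headers are skipped.
        $1 ~ /^[0-9]+$/ {
            printf "free=%s po=%s%s\n", $free_col, $po_col, ($po_col + 0 > 0 ? " PAGING" : "")
        }
    '
}
```

It could be run for the length of the backup window with something like `vmstat 60 | vmstat_paging > /tmp/paging.log`, where the interval and log path are only examples.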

Also have a read of:

http://us-support3.external.hp.com/cki/bin/doc.pl/sid=b2adc1b00705cc33c8/screen=ckiDisplayDocument?docId=200000050018417

for information about performance troubleshooting.

Regards,
Steve
Alan Casey
Trusted Contributor

Re: Hot backup.

I would suspect that this is an Oracle issue. Restarting the database might confirm this: does that temporarily solve the problem?

David Lodge
Trusted Contributor

Re: Hot backup.

Could we have some more information? The way I read this, it sounds like you're copying the physical dbf files and compressing them. If you are doing this I really think you should shut down Oracle first :-)

This shouldn't as such cause oracle to leak much memory; but it's worth checking a few things:
1) dangling processes, may be taking up extra memory
2) disk bottlenecks
3) use of virtual memory

The copying of files itself uses mainly disk access - are your disks mirrored/striped? Are they internal/external?

Compress uses a significant amount of disk I/O and a *large* amount of CPU. When this is running, is the priority very high (241 - 255)?

Have you got mwa on the box? (to show ps listings and disk bottleneck areas)

Which bit of the batch takes the time - is it the copy or the compress?
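One way to answer that question is to time each step separately. The wrapper below is only a sketch: `timed_step` is a hypothetical helper, and it assumes a `date` that supports `+%s` (not every older system has it; the shell's own `SECONDS` counter or `perl -e 'print time'` would be possible substitutes).

```shell
# Run a command, then report its wall-clock time and exit status on stderr.
# Usage: timed_step LABEL COMMAND [ARGS...]
# Assumption: date supports +%s (seconds since the epoch).
timed_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit $status)" >&2
    return $status
}
```

For example, `timed_step compress compress /path/to/file` followed by `timed_step copy cp /path/to/file.Z /backup/dir` (both paths are placeholders) would log how long the compress and the copy each took, so the slow step shows up directly in the backup log.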

dave
John Palmer
Honored Contributor

Re: Hot backup.

Initial thoughts are CPU and memory.

Compress is very CPU intensive - have you got any 'rogue' processes that are using up CPU time? A reboot will kill these.

Run that useful one-liner:
UNIX95= ps -e -o ruser,vsz,pid,args | sort -rnk2 | more
to check if you have any processes with a memory leak.

If it's one of the above then it will be affecting the system now, not just when you run the backup.
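A single listing only shows who is big right now; a leak shows up as growth between two listings. As a sketch, if the output of the one-liner's ps command (without the sort) is saved to a file on two different days, something like the following could list the processes whose VSZ grew. The name `vsz_growth` is made up for this example, and it assumes the column order ruser, vsz, pid, args used above, with vsz reported in KB.

```shell
# Usage: vsz_growth OLD_SNAPSHOT NEW_SNAPSHOT
# Each snapshot is the saved output of: UNIX95= ps -e -o ruser,vsz,pid,args
# Prints every PID present in both snapshots whose VSZ has increased.
vsz_growth() {
    awk '
        # First file: remember the VSZ of every PID (header line is skipped).
        NR == FNR { if ($2 ~ /^[0-9]+$/) old[$3] = $2 + 0; next }
        # Second file: report PIDs seen before whose VSZ is now larger.
        ($3 in old) && $2 ~ /^[0-9]+$/ && ($2 + 0 > old[$3]) {
            printf "%s %d -> %d KB (+%d) %s\n", $3, old[$3], $2 + 0, $2 - old[$3], $4
        }
    ' "$1" "$2"
}
```

PIDs get reused over time, so a match between two snapshots taken weeks apart should be checked against the command name before drawing conclusions.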

Regards,
John

Rushank
Super Advisor

Re: Hot backup.

Hi,

Thanks for the responses.

Compress takes a long time and uses an entire CPU during compression. I'm compressing the data first and then copying it to a different file system on the same server.

Here is the data collected during the hot backup over the last 10 days.
This is the output of
UNIX95= ps -e -o "pcpu vsz ruser pid stime time state args" | sort -rn |head -10
along with uptime and swapinfo -tam.

Krishna Prasad
Trusted Contributor

Re: Hot backup.

It looks like you might have an old compress process running. Check out process id 488. It seems to have been running for longer than one day.

I also saw that it did eventually stop. It might have stopped because of the reboot. It looks like the number of days of uptime went back down towards the bottom, indicating you rebooted the box, and it took about another 35 days before you had bad performance?

Either way, do a ps -ef | grep compress and check the time stamps.

Also, if you have old compress commands running, your restore program probably does an uncompress on the files, which will fail if the compress never succeeded.

I would check a backup and test a restore if you are finding old compress processes.
Positive Results requires Positive Thinking
Rushank
Super Advisor

Re: Hot backup.

The ps data is for the last 10 days. If you check the data, the same processes are running during the hot backup. The system has now been up for 46 days. The problem started sometime in the last week.
There are no old compress processes running.
I tried manually compressing over 1 GB of data before and after we last rebooted the box. After the reboot it took almost half the time it had taken to compress the same amount of data before the reboot.
Rushank
Super Advisor

Re: Hot backup.

Here is the output for two different dates at the same time of night during the backup. The hot backup was running smoothly when the system had been up for 27 days, and it is terribly slow now after 45 days!

12:00am up 27 days, 16:55, 0 users, load average: 0.32, 0.33, 0.36
              Mb      Mb      Mb     PCT  START/      Mb
TYPE       AVAIL    USED    FREE    USED   LIMIT RESERVE     PRI  NAME
dev         1024       0    1024      0%       0       -       1  /dev/vg00/lvol2
reserve        -     453    -453
memory      6349    1724    4625     27%
total       7373    2177    5196     30%       -       0       -

98.91    488 oracle   26616 23:58:24     01:35 R compress
74.45    488 oracle   26630 23:59:33     00:26 R compress
 6.50  28132 oracle   23776 Dec 14       12:17 S oracleprod (LOCAL=NO)
 1.23      0 root        34 Nov 18       48:48 R vxfsd
 0.99  18400 root     15045 Dec 14       17:34 R /opt/perf/bin/midaemon
 0.71     32 root       634 Nov 18    03:42:48 S /usr/sbin/syncer
 0.61  32036 oracle    6876 Dec 11       59:58 S oracleprod (LOCAL=NO)
 0.56  25676 root     15047 Dec 14       01:58 S /opt/perf/bin/scopeux
 0.49  28148 oracle   26642 00:00:00     00:00 R oracleprod (LOCAL=NO)
 0.26  29860 oracle   26986 03:10:30     02:41 S oracleprod (LOCAL=NO

------------------------------------------------
12:00am up 45 days, 16:56, 1 user, load average: 0.28, 0.31, 0.35
              Mb      Mb      Mb     PCT  START/      Mb
TYPE       AVAIL    USED    FREE    USED   LIMIT RESERVE     PRI  NAME
dev         1024       0    1024      0%       0       -       1  /dev/vg00/lvol2
reserve        -     733    -733
memory      6349    1719    4630     27%
total       7373    2452    4921     33%       -       0       -

57.22    488 oracle   14902 23:55:52     02:21 S compress
56.35    488 oracle   14899 23:55:52     02:21 S compress
 1.20      0 root        34 Nov 18    01:16:21 R vxfsd
 1.08  28564 oracle   14969 23:56:18     00:03 S oracleprod (LOCAL=NO)
 0.87  18384 root       723 08:44:13     12:17 R /opt/perf/bin/midaemon
 0.64     32 root       634 Nov 18    06:42:35 S /usr/sbin/syncer
 0.56  14028 root       732 08:44:15     01:09 S /opt/perf/bin/scopeux
 0.48   1920 precise  27239 Nov 19    04:29:36 S pss_pcs.64 -k prod
 0.36  28564 oracle   10999 22:55:04     00:00 S oracleprod (LOCAL=NO)
 0.34   4688 precise  27240 Nov 19    01:53:01 S pss_rx -k prod



David Lodge
Trusted Contributor

Re: Hot backup.

Silly question - I've noticed that you are running several compresses at once, some of them without parameters (as if compressing stdin->stdout). Are you trying to compress all the files at the same time?
(Any chance of seeing that part of the script?)

Another thing - do you get slowdowns for anything else? With a memory leak I'd expect the whole system to gradually slow down, rather than just one process.

How busy is the system generally (bespoke applications etc.)?

Have you got any other applications grabbing large amounts of memory? (From memory, mib2agt caused a few problems here because it had a big memory leak.)

dave
Rushank
Super Advisor

Re: Hot backup.


There is no mib2agt memory leak; I've already checked that. Sometimes two compresses are in the run state at the same time.
This is a Perl script: it compresses, then does a cp. Here is that part of the script:

`compress $ARCH_DIR/$ARCHFILE`;
`cp $ARCH_DIR/$ARCHFILE.Z $BACKUP_DATA`;
`mv $ARCH_DIR/$ARCHFILE.Z $OLDARCH_DIR`;
print REST "cp $BACKUP_DATA/$ARCHFILE.Z $ARCH_DIR\n";
David Lodge
Trusted Contributor

Re: Hot backup.

I can tell you for certain that it isn't a memory leak - from the vmstat and ps listing you've provided, memory isn't growing.

I can't see any correlation between time since reboot and CPU usage (one of my first thoughts)...

This leads me to think it's looking more and more like your disc subsystem - why this is refreshed by a reboot is a mystery!

Are you getting any strange errors in dmesg/stm from your disk devices? I notice you're running MeasureWare Agent - if you have PerfView, are your disk access rates climbing?

dave (clutching at straws!)
John Palmer
Honored Contributor

Re: Hot backup.

I also now suspect the disk subsystem may be an issue. What is it?

Notice that in the second ps list that you posted, the two compresses were only using 57% and 56% CPU, whereas the first pair were using 98% and 74%. Not definite, but it could be that they are waiting for I/O.

Regards,
John
Christopher Caldwell
Honored Contributor

Re: Hot backup.

Coupla things
1) I doubt you have an N with just 1 CPU, but if you do, I've found that compute intensive apps on HP-UX single processor boxes run better sequentially (no context switching) than in parallel.

2) check your dbc_max_pct, dbc_min_pct tunings. If they're default, your filesystem cache could be consuming up to 50 percent of available memory over time, which might explain why the compress performs well after reboot, but not if the box has been running for a while.
Rushank
Super Advisor

Re: Hot backup.


It's really a mystery and it's driving me nuts.
This is an N-Class with 8 CPUs. dbc_min_pct and dbc_max_pct are 5 and 8 respectively.

During the backup, sometimes both compresses are in the running state and sometimes only one is. I really don't know how to pinpoint this problem or why a reboot fixes it for the first few days.




Magdi KAMAL
Respected Contributor

Re: Hot backup.

Hi Rushank,

The post title is "Hot backup", but it seems to me that you are doing a cold backup when you compress the tablespace files with the HP-UX "compress" command!

You can use RMAN (Oracle Recovery Manager) for a full hot backup (level 0) during the weekend and one incremental hot backup each day (level 1).

This way you can reduce both the backup time and the amount of data backed up.

Magdi
Steven Gillard_2
Honored Contributor

Re: Hot backup.

Are you running the measureware agent on your system? If so then it is collecting historical data which will be very useful. If you've got perfview that would make it easier, otherwise I would run an "extract" as follows:

1. Make a copy of /var/opt/perf/reptall somewhere. Edit this file and uncomment the DATA TYPE GLOBAL metrics you would like to look at. I suggest as a start:
DATE
TIME
GBL_MEM_CACHE_HIT_PCT
GBL_CPU_TOTAL_UTIL
GBL_CPU_SYS_MODE_UTIL
GBL_CPU_USER_MODE_UTIL
GBL_CPU_NICE_UTIL
GBL_DISK_UTIL_PEAK
GBL_MEM_UTIL
GBL_MEM_USER_UTIL
GBL_MEM_SYS_UTIL
GBL_MEM_PAGEOUT_RATE
GBL_RUN_QUEUE

2. Run "extract" as follows:

# extract -xp -g -l /var/opt/perf/datafiles/logglob -r reptall -b "today-10" -f perf.txt

That will give you the last 10 days worth of data in the perf.txt file. You can import that file into Excel then graph some of the metrics if you like. You should be able to at least confirm from the above whether you have a disk or CPU bottleneck (it already looks like memory pressure is not the culprit).
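If a spreadsheet is not handy, the exported file can also be summarized on the command line. The sketch below averages one metric per hour and keeps only the overnight window when the backup runs. Everything about the file layout here is an assumption to check against the real perf.txt: fields separated by `|`, the date in field 1, and a zero-padded 24-hour HH:MM time in field 2. The function name `night_avg` is made up for the example.

```shell
# Usage: night_avg COLUMN [SEPARATOR] < perf.txt
# Prints the hourly average of field COLUMN for rows timed 21:00 to 06:59.
# Assumptions: field 1 = date, field 2 = 24-hour HH:MM time, "|" separator.
night_avg() {
    awk -F"${2:-|}" -v col="$1" '
        {
            day = $1; gsub(/ /, "", day)
            t = $2;   gsub(/ /, "", t)
            if (t !~ /^[0-9][0-9]:[0-9][0-9]/) next    # headings, blank lines
            hh = substr(t, 1, 2) + 0
            if (hh >= 7 && hh < 21) next               # outside 21:00-06:59
            key = day " " substr(t, 1, 2) ":00"
            if (!(key in n)) order[++k] = key
            sum[key] += $col
            n[key]++
        }
        END {
            for (i = 1; i <= k; i++)
                printf "%s %.1f\n", order[i], sum[order[i]] / n[order[i]]
        }
    '
}
```

With the metric list above, `night_avg 8 < perf.txt` would average the eighth field (GBL_DISK_UTIL_PEAK, if the columns come out in the listed order), which makes it easy to compare a good night against a slow one.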

Also, it wouldn't hurt for you to run a patch analysis on the system. If you have a support contract you can use the "custom patch manager" on ITRC to generate a system specific patch bundle. There are loads of patches available that fix performance problems.

Regards,
Steve
Rushank
Super Advisor

Re: Hot backup.

Hello,

I 'extracted' data for the last 15 days and am analyzing it. Thanks (Steve) for the tip. The data I collected is at 5-minute intervals; how do I change the interval to every 30 minutes or every hour?
Also, is there any way I can extract only the data between 21:00 and 07:00?

Thanks in Advance

Steven Gillard_2
Honored Contributor

Re: Hot backup.

You can change the extraction period by using the -b and -e options to specify the start and end time. This information is in the extract man page:

-b
Rushank
Super Advisor

Re: Hot backup.

Hi,

I've already found the shift option in extract, but no luck with the summary option.
Now that I have the data, what is the best way to analyze it and find the culprit?

SVTGanesh
Occasional Advisor

Re: Hot backup.

Hi

Try cleaning up all Oracle connections marked LOCAL=NO that are older than one or two days. These may be causing problems when the tablespaces are marked begin and end backup before the compress starts. Also check the disk system you are copying the backup files to, and whether both the actual and mirror disks are OK.
Also try compressing the file to the backup device directly; this will eliminate the mv after compressing.

compress<$FILE >$BACKUP_DIR/$FILE

SVTGanesh
SVTGanesh
SVTGanesh
Occasional Advisor

Re: Hot backup.

Hi

Try cleaning up all Oracle connections marked LOCAL=NO that are older than one or two days. These may be causing problems when the tablespaces are marked begin and end backup before the compress starts. Also check the disk system you are copying the backup files to, and whether both the actual and mirror disks are OK.
Also try compressing the file to the backup device directly; this will eliminate the mv after compressing.

compress -c $FILE >$BACKUP_DIR/$FILE
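A slightly more defensive version of the same idea is sketched below. It writes to a temporary name, checks that the result expands cleanly, and only then renames it, so an interrupted compress never leaves something that looks like a finished backup. The function name `compress_to`, the `.partial` suffix and the `COMPRESS_CMD` / `EXPAND_CMD` variables are all made up for this example; they default to compress and uncompress.

```shell
# Usage: compress_to SOURCE_FILE DEST_DIR
# Writes DEST_DIR/<name>.Z and leaves SOURCE_FILE untouched.
# COMPRESS_CMD / EXPAND_CMD are hypothetical overrides (e.g. gzip / gunzip).
compress_to() {
    src=$1
    dest_dir=$2
    out="$dest_dir/$(basename "$src").Z"
    tmp="$out.partial"
    # Compress straight into the destination file system.
    ${COMPRESS_CMD:-compress} -c "$src" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Make sure the result can be expanded before calling it a backup.
    ${EXPAND_CMD:-uncompress} -c < "$tmp" > /dev/null || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$out"
}
```

Unlike a plain `compress file`, this leaves the original in place, so any later step that expects the `.Z` next to the source would need adjusting. Some compress implementations return a nonzero status when a file does not get smaller, so check the local man page before relying on the status test.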

SVTGanesh
SVTGanesh