Operating System - Tru64 Unix

Re: Tru64 5.1b error

 
Ivan Ferreira
Honored Contributor

Re: Tru64 5.1b error

The ipcs output was cut off, so I could not see the size of the shared memory. I would also like to know the total amount of RAM; use vmstat -P to check (is it 5 GB?). If so, the kernel memory parameters are fine.

Your disk I/O is too high. No kernel parameter (in /etc/sysconfigtab) will reduce it significantly; you need to tune the database and the applications.

Anyway, you can tune these parameters:

Create a file called /tmp/sysconfig.stanza, and in the file put:

rt:
aio_task_max_num = 8193

vfs:
fifo_do_adaptive = 0

Then run:

sysconfigdb -m -f /tmp/sysconfig.stanza

You need to reboot the server after that.
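
The steps above can be scripted roughly as follows. This is only a sketch for a POSIX shell: the names STANZA and write_stanza are made up for illustration, and sysconfigdb is called only when it is found on the PATH, so the script does nothing harmful on a non-Tru64 machine.

```shell
# Sketch: write the stanza file from this post, then merge it.
# STANZA and write_stanza are illustrative names, not part of Tru64.
STANZA=${STANZA:-/tmp/sysconfig.stanza}

write_stanza() {
    # $1 = destination file
    cat > "$1" <<'EOF'
rt:
        aio_task_max_num = 8193

vfs:
        fifo_do_adaptive = 0
EOF
}

write_stanza "$STANZA"

if command -v sysconfigdb >/dev/null 2>&1; then
    # Merge the stanza into /etc/sysconfigtab (same command as above).
    sysconfigdb -m -f "$STANZA"
    echo "Merged $STANZA; reboot for the change to take effect."
else
    echo "sysconfigdb not found; wrote $STANZA only."
fi
```

After the reboot, sysconfig -q rt aio_task_max_num should report the new value, assuming the rt subsystem is configured on your system.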
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Adam Strobel
Frequent Advisor

Re: Tru64 5.1b error

thanks!

Attached is the vmstat -P


--Adam
Ivan Ferreira
Honored Contributor

Re: Tru64 5.1b error

Look, the collect data shows the device names. You can find out which domain uses a given device with ls /etc/fdmns/* or with showfdmn domain_name.

For example:

sample_domain#sample 125829120 74867088 50962032 60% /sample

showfdmn sample_domain

Id                 Date Created             LogPgs  Version  Domain Name
41637d45.0106fc56  Wed Oct 6 09:06:13 2004  512     4        sample_domain

Vol  512-Blks   Free      % Used  Cmode  Rblks  Wblks  Vol Name
1L   125829120  52343200  58%     on     256    256    /dev/disk/dsk31c
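
If you have many domains, the ls /etc/fdmns/* lookup can be wrapped in a small function. This is a sketch for a POSIX shell; domain_for_device and FDMNS_ROOT are names invented here, and it assumes the usual AdvFS layout where each domain is a directory under /etc/fdmns containing links named after its volumes.

```shell
# Sketch: find which AdvFS domain contains a given device.
# domain_for_device and FDMNS_ROOT are illustrative names.
FDMNS_ROOT=${FDMNS_ROOT:-/etc/fdmns}

domain_for_device() {
    # $1 = device name as shown by collect, e.g. dsk31
    # [a-h] matches the partition letter, so dsk3 does not match dsk31c.
    for link in "$FDMNS_ROOT"/*/"$1"[a-h]; do
        # The glob stays literal when nothing matches; skip that case.
        [ -e "$link" ] || [ -h "$link" ] || continue
        basename "$(dirname "$link")"
    done | sort -u
}

# Prints the domain name, or nothing if no domain uses the device.
domain_for_device dsk31
```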


Another thing: once you have matched devices to domains, find out why only two of the disks are so busy. Try to balance the I/O across the other disks, for example by moving datafiles or log files to them.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
David_854
Frequent Advisor

Re: Tru64 5.1b error

Ivan,
Good point and tips.


Adam,
Let us know how it went after you look at the drives.

David
Robert Walker_8
Valued Contributor

Re: Tru64 5.1b error

Adam,

To find out which disks you have, run showfdmn on the domain in question. The output should list a /dev/disk/dsknn volume:

E.g.:
# df -k
(note: output trimmed for this example)
dd5#temp 49306800 65104 38522960 1% /data/temp
#
# showfdmn dd5
Id                 Date Created              LogPgs  Version  Domain Name
42150e88.000ec089  Thu Feb 17 21:37:12 2005  512     4        dd5

Vol  512-Blks  Free      % Used  Cmode  Rblks  Wblks  Vol Name
1L   98613600  77045920  22%     on     256    256    /dev/disk/dsk5c

This shows which disk it is; in this example, dsk5c.
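
To pull just the volume names out of that output, something like the following should work. It is a sketch for a POSIX shell; fdmn_volumes is a made-up name, and it assumes that every volume line ends in a /dev/ path, as in the sample.

```shell
# Sketch: print the device names listed in showfdmn output.
fdmn_volumes() {
    # Reads showfdmn output on stdin; volume lines end in a /dev/ path.
    awk '$NF ~ /^\/dev\// { print $NF }'
}

# Sample copied from the output above. On a real system use:
#   showfdmn dd5 | fdmn_volumes
fdmn_volumes <<'EOF'
Id Date Created LogPgs Version Domain Name
42150e88.000ec089 Thu Feb 17 21:37:12 2005 512 4 dd5

Vol 512-Blks Free % Used Cmode Rblks Wblks Vol Name
1L 98613600 77045920 22% on 256 256 /dev/disk/dsk5c
EOF
```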

# hwmgr show scsi
       SCSI                DEVICE  DEVICE   DRIVER  NUM   DEVICE  FIRST
HWID:  DEVICEID  HOSTNAME  TYPE    SUBTYPE  OWNER   PATH  FILE    VALID PATH
-------------------------------------------------------------------------
78:    10        fred      disk    none     2       2     dsk5    [2/6/0]

This disk, dsk5c, is on the LUN at path 2/6/0, which in our case is on an EMC SAN.
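
The same lookup can be done for any device name. This is a sketch for a POSIX shell; scsi_path_for is an invented name, and it assumes the layout shown above, where the device file is the second-to-last field and the first valid path is the last.

```shell
# Sketch: look up the first valid path for a device in
# "hwmgr show scsi" output.
scsi_path_for() {
    # $1 = device file name (e.g. dsk5); reads the hwmgr output on stdin.
    # NF >= 2 skips blank lines and the dashed separator.
    awk -v dev="$1" 'NF >= 2 && $(NF-1) == dev { print $NF }'
}

# Sample row copied from the output above. On a real system use:
#   hwmgr show scsi | scsi_path_for dsk5
scsi_path_for dsk5 <<'EOF'
HWID: DEVICEID HOSTNAME TYPE SUBTYPE OWNER PATH FILE VALID PATH
-------------------------------------------------------------------------
78: 10 fred disk none 2 2 dsk5 [2/6/0]
EOF
```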

Hope this helps.

One more area to check, if your database is doing lots of lookups: an index may have been dropped. The usual practice for large uploads is to remove the indexes and recreate them afterwards. If someone forgot to recreate an index, everything can slow down considerably once users get access to the system.

Regards,

Robert.
Adam Strobel
Frequent Advisor

Re: Tru64 5.1b error

OK, I checked the disks and as far as I can tell they are fine. I know a few disks are getting hit hard, but overall CPU usage is very low. Could I have a memory problem?

Avenger:root #usage
UID     PID    PPID   C    STIME     TTY  TIME      CMD
root    0      0      0.5  Oct 26    ??   01:58:08  [kernel idle]
root    1032   947    0.6  Oct 26    ??   13:52.16  /usr/bin/X11/o
oracle  26935  1      0.1  Oct 27    ??   06:44:31  oracleRPT (LO)
oracle  52053  1      0.1  07:21:58  ??   3:32.46   oracleRPT (LO)
oracle  52211  52201  1.3  07:30:17  ??   14:52.19  oracleRPT (DE)
oracle  52251  1      9.1  09:17:25  ??   1:15.79   oracleRPT (LO)
oracle  53479  1      6.4  08:49:28  ??   15:19.19  ora_j000_RPT
oracle  53858  1      0.8  08:57:56  ??   0:16.55   oracleRPT (LO)
oracle  53878  1      0.5  09:03:45  ??   0:26.70   oracleRPT (LO)
oracle  53885  1      1.8  08:51:09  ??   5:44.20   oracleRPT (LO)
oracle  53976  1      1.5  09:19:41  ??   0:34.63   oracleRPT (LO)
oracle  54697  1      2.6  09:40:48  ??   0:12.94   oracleRPT (LO)
Total CPU usage: 25.7
Avenger:root #
David_854
Frequent Advisor

Re: Tru64 5.1b error

Adam,
Again, what type of environment are you running? Are these NFS servers?
If so, are they using TCP or UDP?

etc,

David
Ivan Ferreira
Honored Contributor

Re: Tru64 5.1b error

CPU usage is low, but the processes run slowly because they must wait for I/O to complete. That's what iowait means.

It is not a memory problem. If the collect output showed page-ins and page-outs, the system would be paging or swapping, and that is when you have a memory problem. In your case there are no page-outs and the swap area is not in use.
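
If you want to check for page-outs yourself, here is a rough sketch for a POSIX shell. It reads vmstat output rather than collect output, which is a substitution on my part; paging_seen is a made-up name, and it assumes the header line labels the page-out column "pout" and that each data line has one value per header label.

```shell
# Sketch: report whether any vmstat sample shows page-outs.
paging_seen() {
    # Reads vmstat output on stdin.
    awk '
        # Until the header is found, look for the column labelled "pout".
        !col { for (i = 1; i <= NF; i++) if ($i == "pout") col = i; next }
        # Data lines: a non-zero value (e.g. 12 or 3K) means pages went out.
        NF >= col && $col ~ /^[0-9]+[KMG]?$/ && $col + 0 > 0 { hit = 1 }
        END { print (hit ? "page-outs seen" : "no page-outs") }
    '
}

# Usage on a live system (five samples, five seconds apart):
#   vmstat 5 5 | paging_seen
```

The first data line from vmstat typically reflects activity since boot, so a non-zero value there does not prove the system is paging right now.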
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
David_854
Frequent Advisor

Re: Tru64 5.1b error

Ivan,
Yes, but it is good to know what the environment is. In this case, it could be something that is already fixed in a patch.

David
Adam Strobel
Frequent Advisor

Re: Tru64 5.1b error

thanks Ivan and David

We do have one NFS share on this unix box but we are not using it. Basically this server is used for Oracle jobs and data warehouse storage.

Is this what you were looking for David?

Right now I'm running "sys_check -escalate" on the box and I will let you guys know what I find when this is done.

thanks again!!
David_854
Frequent Advisor

Re: Tru64 5.1b error

Adam,
That sounds good. You can also look at the sys_check options and run it in performance mode:
sys_check -perf
See the sys_check man page; example 4 there reads:
4. The following command outputs only performance information:
# sys_check -perf > file.html



David
Adam Strobel
Frequent Advisor

Re: Tru64 5.1b error

Very cool.

Thanks David.