
Re: mailq

 
Fred Ruffet
Honored Contributor

Re: mailq

So, a load of 36, an idle percentage of nearly 100%, and the server spending all its time in I/O. Sendmail is right: your server is a little bit overloaded. The running processes seem to be waiting on I/O. You may have I/O that is too slow on this box, or too many concurrent processes. Have a look at what processes are running and what they are doing. Once you resolve this, sendmail will do its job.

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Muthukumar_5
Honored Contributor

Re: mailq

What do you get from mailq -v?
It will show load details and message failures too.

Check whether the mail domains are being resolved, for example:

nslookup gmail.com

etc.
Easy to suggest when don't know about the problem!
Ricardo Macedo
Occasional Advisor

Re: mailq


Hi all!

Fred is right! My server is having I/O problems and sendmail is waiting to process messages!

I will look into the potential I/O problem.

Thanks!
Ricardo Macedo
Bill Hassell
Honored Contributor

Re: mailq

A load average of 36 is extremely high unless you have a 32 processor system! The 99% system overhead indicates massive kernel activity, either due to I/O errors or possibly memory errors. Look at the end of /var/adm/syslog/syslog.log for error messages. If you did not load the diagnostics or they are misconfigured, there may not be any error messages.

A normal load is 1x-2x the number of processors, so if you have a 2 processor system, a normal load is 2 to 4, a high load is 10, and 36 is a serious condition.
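To put numbers on that rule of thumb, here is a minimal sketch. The function names load_per_cpu and load_status are made up for illustration, and the processor count is passed in by hand because how you obtain it depends on the system. Only the 2x boundary comes from the rule above; anything finer is left out on purpose.

```shell
# load_per_cpu LOAD NCPU: print the load average divided by the processor count.
# (hypothetical helper, not a system command)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.1f\n", load / ncpu }'
}

# load_status LOAD NCPU: "normal" up to 2x the processor count, otherwise "high".
load_status() {
    awk -v load="$1" -v ncpu="$2" \
        'BEGIN { print ((load <= 2 * ncpu) ? "normal" : "high") }'
}

load_per_cpu 36 2     # load per processor on a 2 processor box
load_status 36 2      # well past the 2x boundary
load_status 36 32     # the same load on a 32 processor box
```

On a 2 processor system a load of 36 works out to 18 per processor, which is why it counts as serious there and not on a 32 processor machine.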

sendmail is doing exactly what it is supposed to do: stop when load averages are too high. Depending on the version of sendmail you have, this limit may be 8 or 12 but can be adjusted in /etc/mail/sendmail.cf. Look for "load" in the sendmail.cf file.
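A quick way to find those settings, as a sketch only. It assumes your sendmail.cf uses the long option names QueueLA and RefuseLA (older files may use short single-letter options instead), and show_load_options is a made-up helper name.

```shell
# show_load_options FILE: print the load-average options found in a sendmail.cf,
# including ones that are commented out. (hypothetical helper)
# Assumes the long option names QueueLA and RefuseLA.
show_load_options() {
    grep -E '^#?[[:space:]]*O[[:space:]]+(QueueLA|RefuseLA)' "$1"
}

# On the server itself you would run:
#   show_load_options /etc/mail/sendmail.cf
```

If nothing is printed, a plain case-insensitive grep for "load" in the same file is the fallback.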


Bill Hassell, sysadmin
Ricardo Macedo
Occasional Advisor

Re: mailq


Hi everyone,

My syslog.log file is logging the messages below:

Sep 9 16:52:44 sede3 vmunix: SCSI: Abort Tag -- lbolt: 9333319, dev: 1f041200, io_id: 400252c
Sep 9 16:52:44 sede3 vmunix: SCSI: Request Timeout -- lbolt: 9333319, dev: 1f041000
Sep 9 16:52:44 sede3 vmunix: lbp->state: 4060
Sep 9 16:52:44 sede3 vmunix: lbp->offset: ffffffff
Sep 9 16:52:44 sede3 vmunix:
Sep 9 16:52:44 sede3 vmunix: lbp->uPhysScript: fa7fd000
Sep 9 16:52:44 sede3 vmunix: From most recent interrupt:
Sep 9 16:52:44 sede3 vmunix: ISTAT: 22, SIST0: 04, SIST1: 00, DSTAT: 80, DSPS: 00000006
Sep 9 16:52:44 sede3 vmunix: lsp: be9900
Sep 9 16:52:44 sede3 vmunix: bp->b_dev: 1f041000
Sep 9 16:52:44 sede3 vmunix: scb->io_id: 400252a
Sep 9 16:52:44 sede3 vmunix: scb->cdb: 28 00 00 00 00 10 00 00 04 00
Sep 9 16:52:44 sede3 vmunix: lbolt_at_timeout: 9333119, lbolt_at_start: 9330119
Sep 9 16:52:44 sede3 vmunix: lsp->state: 205
Sep 9 16:52:44 sede3 vmunix: lbp->owner: 1119c00
Sep 9 16:52:44 sede3 vmunix: bp->b_dev: 1f041100
Sep 9 16:52:44 sede3 vmunix: scb->io_id: 400252b
Sep 9 16:52:44 sede3 vmunix: scb->cdb: 28 00 00 00 00 10 00 00 04 00
Sep 9 16:52:44 sede3 vmunix: lbolt_at_timeout: 9330119, lbolt_at_start: 9330119
Sep 9 16:52:44 sede3 vmunix: lsp->state: d
Sep 9 16:52:44 sede3 vmunix:
Sep 9 16:52:44 sede3 vmunix: SCSI: Abort Tag -- lbolt: 9333319, dev: 1f041100, io_id: 400252b
Sep 9 16:52:44 sede3 vmunix: SCSI: Abort Tag -- lbolt: 9333319, dev: 1f041000, io_id: 400252a

I think I'm having a disk problem.

Any suggestions?

Thanks in advance
Ricardo Macedo
Bill Hassell
Honored Contributor

Re: mailq

Big disk problems. Use:

grep lbolt /var/adm/syslog/syslog.log

to see how many disks are affected. The "dev" value is the address of the disk(s). To fix the problem, you first need to identify where this disk is used. Start with ioscan -kfnCdisk to see all your disks. Match the hardware address from the lbolt message to a drive, then look below that line to see the device file, like /dev/dsk/c1t2d0 or similar. Now take that device file and find it in vgdisplay -v to see the volume group using that disk. If the disk is mirrored, you can use lvreduce and vgreduce to remove the bad disk from the volume group (several steps).
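To get a count per device instead of reading through the raw grep output, something like the following should work. It is a sketch that assumes the messages look like the ones posted earlier in this thread ("dev: " followed by a hex value); affected_devs is a made-up helper name.

```shell
# affected_devs FILE: print each "dev" value seen in lbolt messages together
# with the number of messages that mention it. (hypothetical helper)
affected_devs() {
    grep lbolt "$1" |
        sed -n 's/.*dev: \([0-9a-f]*\).*/\1/p' |   # keep only the dev value
        sort | uniq -c |
        awk '{ print $2, $1 }'                      # "dev count"
}

# On the server itself you would run:
#   affected_devs /var/adm/syslog/syslog.log
```

Lines that contain lbolt but no dev field (for example the lbolt_at_timeout lines) are skipped.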

If the disk is not mirrored, replacement is very dependent on whether the disk is the boot disk or a peripheral disk.


Bill Hassell, sysadmin
Tom Danzig
Honored Contributor

Re: mailq

Looks like a problem with disk /dev/dsk/c4t1d0. Better have it replaced soon!
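For reference, a rough sketch of how a dev value such as 1f041000 can be read as c4t1d0. It assumes the legacy HP-UX layout in which the first two hex digits are the major number, the next two the controller instance, and then one digit each for target and LUN. That layout is an assumption here, not something stated in the log, so confirm the result against the ioscan output. dev_to_ctd is a made-up helper name.

```shell
# dev_to_ctd DEV: turn an 8-digit hex dev value into a cXtYdZ name.
# (hypothetical helper; assumes the legacy layout MM II T L OO:
#  major, controller instance, target, LUN, options)
dev_to_ctd() {
    inst=$(printf '%s' "$1" | cut -c3-4)   # controller instance, 2 hex digits
    tgt=$(printf '%s' "$1" | cut -c5)      # target, 1 hex digit
    lun=$(printf '%s' "$1" | cut -c6)      # LUN, 1 hex digit
    printf 'c%dt%dd%d\n' "0x$inst" "0x$tgt" "0x$lun"
}

dev_to_ctd 1f041000
```

Under the same assumption, the other two dev values in the log (1f041100 and 1f041200) would decode to d1 and d2 on the same controller and target, so it is worth checking all three in ioscan.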

BTW, you can adjust at what load average sendmail will stop sending messages. In the sendmail.cf file:

# load average at which we just queue messages
O QueueLA=8


You could change this to a higher value if desired. In this case, however, you should reduce the load on the system instead.