Operating System - HP-UX

No Login at System Console

 
Ralph Grothe
Honored Contributor


Hi,

at first I thought this matter was due to network issues,
viz. that the listen() queue as defined by SOMAXCONN was set too low in the TCP stack, as it usually defaults to 20

# ndd -get /dev/tcp tcp_conn_request_max
20


But this morning, when the clients again complained that they could not log in via telnet (in fact every terminal service was affected, SSH too), I went to the server and tried to log in through the system console (i.e. CO in GSP).

But even there I didn't get a login prompt.
Thus the only remedy, although processes that already had established sockets could still work, was to initiate a TOC (i.e. TC in GSP).

From the perfmon data logs I could tell that the server wasn't overloaded.

So I think the only thing that could have prevented me from getting a login was either the inability to spawn a getty, maybe because of a corrupt or missing /etc/inittab
(which wasn't the case):

# diff /etc/inittab /usr/newconfig/etc/inittab
18,22d17
< ems1::bootwait:/sbin/rm -f /etc/opt/resmon/persistence/runlevel4_flag
< ems2::bootwait:/sbin/cat /etc/opt/resmon/persistence/reboot_flag
< ems3:3456:wait:/usr/bin/touch /etc/opt/resmon/persistence/runlevel4_flag
< ems4:3456:respawn:/etc/opt/resmon/lbin/p_client
< #ups::respawn:rtprio 0 /usr/lbin/ups_mond -f /etc/ups_conf


or a massive lack of (pseudo) terminals.

Now having the last thought in mind and having a look at the only terminal related kernel tunables

# /usr/sbin/kmtune -q npty -q nstrpty
Parameter Value
===============================================================================
npty 60
nstrpty 60


this raised my suspicion.

Could it be that the maximum of 60 pseudo terminals and 60 streams-based pseudo terminals is too low and should be increased?


What else could have been the reason?

Rgds.
Ralph

Madness, thy name is system administration
32 REPLIES
Jeff Schussele
Honored Contributor

Re: No Login at System Console

Hi Ralph,

Yes, 60 is *usually* low. Remember, now that telnet connections are streams-based, they may have to compete with applications that may create connections via streams as well.
We set ours to 512 or 1024 based on volume estimates.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Michael Steele_2
Honored Contributor

Re: No Login at System Console

If you've got another network inetd process running away with the system then this will cause your problem. So start monitoring process activity with top, glance and 'sar'. Verify all your network configs, etc.
Support Fatherhood - Stop Family Law
Dietmar Konermann
Honored Contributor

Re: No Login at System Console

Ralph,

a lack of pseudo ttys shouldn't prevent you from logging into the console. So I believe that another problem caused the symptom. A closer look at your TOC dump could shed some light on this. Maybe you should open a call with Ratingen? :-)

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Jeff,

I also think that 60 ptys is a bit low, and I will try to push through an increase to at least double that value in the next maintenance window.
However, as Dietmar stressed, I now also think that this cannot have been what prevented me from logging in at the system console.

Michael,

this is a good hint to specifically monitor those servers that get fired up by inetd.
But since we've got the MWA suite on the boxes I'd preferably do this through perfmon by defining a new application group of processes in /var/opt/perf/parm.
Nonetheless I'm currently struggling with the weird adviser syntax, and am about to start a new thread on this matter in ITRC.
So if you're interested in acquiring easy points watch the new threads.

Dietmar,

you hit it again.
I realized that when the machine booted up it had written a dump in /var/adm/crash from the dump device.
Thus I already issued a Support Call.
However I'm not sure how to transfer the - though zipped - quite large chunks of the thrown image:

# ll /var/adm/crash/crash.0/
total 62390
-rw-r--r-- 1 root root 672 Oct 6 10:12 INDEX
-rw-r--r-- 1 root root 25990901 Oct 6 10:12 image.1.1.gz
-rw-r--r-- 1 root root 5951414 Oct 6 10:11 vmunix.gz


But maybe I will be given a DIY Q4-Analysis Howto to scrutinize the dump myself.

Btw, the tombstone ts99 has nothing from the registers, and just shows zeros.
Madness, thy name is system administration
Ralph Grothe
Honored Contributor

Re: No Login at System Console

To whom it may be of interest,

yesterday I tried, together with HP support, to analyze the crash dump.
Unfortunately, we had to conclude that the dump wasn't written to /var/adm/crash completely because there was too little space in there.
Even a later attempt to rerun savecrash to another filesystem with more free space led to the error message that the dump's header was useless.
This was probably due to paging or swapping activity that must have overwritten the initial dump segments on the swap devices since the machine's reboot after my TOC.
The lesson learned from this was to create an extra filesystem of the same size as the dump space on swap, which I mounted at /var/adm/crash.
Thus, at least the next panic should give us a more fruitful post mortem.

Thanks for your participation
Ralph
Madness, thy name is system administration
Francis_12
Trusted Contributor

Re: No Login at System Console

Hello Ralph,

For the crash dump issue,

You need to check the space free for /var/adm/crash against the 'crashconf' output especially the field 'total pages included in dump'.
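
A rough sketch of that check, in case it helps. The helper names are made up for this example, the 4 kB page size is an assumption, and free_kb assumes POSIX-style "df -Pk" output with the available kB in column 4:

```shell
#!/bin/sh
# Sketch only: dump_kb, free_kb and dump_fits are hypothetical helper names.

# kB needed for a dump of $1 pages, assuming 4 kB pages.
dump_kb() {
    echo $(( $1 * 4 ))
}

# Free kB in the file system that holds directory $1.
# Assumes POSIX-style "df -Pk" output (available kB in column 4).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds when $2 (free kB) is enough for a dump of $1 pages.
dump_fits() {
    [ "$2" -ge "$(dump_kb "$1")" ]
}
```

Usage would be something like: dump_fits 290967 "$(free_kb /var/adm/crash)" || echo "crash directory too small", with the page count taken from the 'total pages included in dump' field.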

Hope this helps, Bye.

Francis DERDEYN - HP-UX ASCE.
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Thanks Francis,

I checked it against the output of crashconf, which shows me that currently

# echo $(crashconf -v|awk '/pages included/{print$NF}')/256|bc -l
1098.40234375000000000000

about 1 GB will be dumped, provided the information I got regarding a page size of 4 KB is valid.
Madness, thy name is system administration
Michael Steele_2
Honored Contributor

Re: No Login at System Console

1 GB is probably too big. 400 MB is what the O/S dumps. You have the option to get other non-O/S information that's NOT application specific, but I've never heard of anyone ever using it. And this will be > 400 MB.

This is an OLD myth that's still around: "you need to dump the entire contents of the RAM."

Not true. In fact, even if you come up and bypass saving the dump after a panic, which many S.A.s do because in a production environment some users can't wait the 20 extra minutes a dump sometimes takes, you can still save the dump from run level 3.

- From run level 3 to save a dump use:

# savecrash -rf /tmp/file

I've always accepted the defaults, so that's what I'm recommending; in fact, I believe HP also recommends this unless an application dump is needed.

Refer to /usr/newconfig/etc/rc.config.d/savecrash for the default. Copy it into /etc/rc.config.d/savecrash.
Support Fatherhood - Stop Family Law
Michael Steele_2
Honored Contributor

Re: No Login at System Console

Sorry,

that's NOT application

...should read...

that IS application
Support Fatherhood - Stop Family Law
RolandH
Honored Contributor

Re: No Login at System Console

Hi Ralph,


if you increase the npty and nstrpty values, then as far as I remember the device files will not be created automatically.


Read also this document.


http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000067698199
DocID: HONCIKKBRC00000172

HTH
Roland
Sometimes you lose and sometimes the others win
Massimo Bianchi
Honored Contributor

Re: No Login at System Console

Hi,
this may be trivial, but were there any other messages in syslog.log, like nfile or proc table full?


Since you are going to consider retuning the kernel: these parameters can also affect the ability to log in at the console.

Massimo
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Michael,

I'd fully agree that 4 Gig is probably far too big a dump area.

But whom should I trust in the first place,
the suppliers of HP-UX, or your more sensible statement ;-)

# sed -n '/CRASH_DIR:/,/CRASH_DIR=/p' /etc/rc.config.d/savecrash
# SAVECRASH_DIR:Directory name for system crash dumps. Note: the filesystem
# in which this directory is located should have as much free
# space as your system has RAM.
# SAVECRASH_DIR=/var/adm/crash
Madness, thy name is system administration
Michael Steele_2
Honored Contributor

Re: No Login at System Console

4 GB is fine. Fill it up with historical data like OLDsyslog.

For this example only ~260 MB is needed.

# crashconf -v
CLASS          PAGES  INCLUDED IN DUMP  DESCRIPTION
--------  ----------  ----------------  -------------------------------------
UNUSED        319144  no,  by default   unused pages
USERPG         17664  no,  by default   user process pages
BCACHE        119167  no,  by default   buffer cache pages
KCODE           1762  no,  by default   kernel code pages
USTACK           766  yes, by default   user process stacks
FSDATA           226  yes, by default   file system metadata
KDDATA         34309  yes, by default   kernel dynamic data
KSDATA         31246  yes, by default   kernel static data

( 766 + 226 + 34,309 + 31,246 ) pages = 66,547 pages

( 66,547 pages ) * ( 4096 bytes / page ) = 272,576,512 bytes

272,576,512 bytes / ( 1,024 * 1,024 ) bytes = ~260 MB

NOTE 1: ( 1,024 * 1,024 ) = 1 mb
NOTE 2: memory page size = 4,096 bytes or 4 kb.
NOTE 3: Refer to /usr/share/docs/mem_mgt.txt
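
The same three steps can be scripted. This is only a sketch: the sample rows are copied from the listing above, dump_size is a made-up helper name, and the 4096-byte page size is the assumption from NOTE 2.

```shell
#!/bin/sh
# Sample rows copied from the crashconf -v listing in this post.
sample='UNUSED 319144 no, by default unused pages
USERPG 17664 no, by default user process pages
BCACHE 119167 no, by default buffer cache pages
KCODE 1762 no, by default kernel code pages
USTACK 766 yes, by default user process stacks
FSDATA 226 yes, by default file system metadata
KDDATA 34309 yes, by default kernel dynamic data
KSDATA 31246 yes, by default kernel static data'

# Sum the page counts of the classes marked "yes" and convert them,
# assuming 4096-byte pages (hypothetical helper, reads rows on stdin).
dump_size() {
    awk '$3 ~ /^yes/ { pages += $2 }
         END { printf "%d pages, %d bytes, %.0f MB\n",
                      pages, pages * 4096, pages * 4096 / (1024 * 1024) }'
}

printf '%s\n' "$sample" | dump_size
# prints: 66547 pages, 272576512 bytes, 260 MB
```

On a live system the rows would come from crashconf -v itself rather than from a pasted sample.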
Support Fatherhood - Stop Family Law
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Massimo,

there doesn't seem to be a match on nfile or proctable


# grep -Ei -e nfile -e 'proc(cess)?[[:space:]]*table' /var/adm/syslog/OLDsyslog.log || echo nill
nill


but I had excessive logging from an I/O subsystem reported which must have overwhelmed diaglogd


# grep vmunix /var/adm/syslog/OLDsyslog.log |tail
Oct 6 09:55:53 saturn vmunix: The diagnostic logging facility is no longer receiving excessive
Oct 6 09:55:53 saturn vmunix: errors from the I/O subsystem. 14 I/O error entries were lost.
Oct 6 09:56:03 saturn vmunix: The diagnostic logging facility has started receiving excessive
Oct 6 09:56:03 saturn vmunix: errors from the I/O subsystem. I/O error entries will be lost
Oct 6 09:56:03 saturn vmunix: until the cause of the excessive I/O logging is corrected.
Oct 6 09:56:03 saturn vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Oct 6 09:56:03 saturn vmunix: in stm to start it.
Oct 6 09:56:03 saturn vmunix: DIAGNOSTIC SYSTEM WARNING:
Oct 6 09:56:03 saturn vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Oct 6 09:56:03 saturn vmunix: to determine which I/O subsystem is logging excessive errors.



Any idea who could be the culprit?

Sorry for coming up with this so late.
I should have cited syslog.log entries right at the beginning of this thread.

Madness, thy name is system administration
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Roland,

many thanks for the knowledge base link.
Madness, thy name is system administration
Kent Ostby
Honored Contributor

Re: No Login at System Console

Ralph --

You would need to run the diagnostics or open a hardware call with HP to have them diagnose the problem.

Usually if you get a system hang with the excessive I/O messages, you are looking at a problem with a disk, I/O card, or memory simm (i.e. NOT a cpu).

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Michael,

I'm still laboring over the crash dump.
I think that my assumptions and calculations based thereon were right.
Summing up all those pages that are marked with "yes" in the 3rd column yields the same sum that crashconf comes up with as being the total included.

# crashconf -v|awk '$3~/yes/{s+=$2};END{printf"%10.2f\n",s}';crashconf -v|grep pages\ included
488340.00
Total pages included in dump: 488340

However, I noticed that this comparison needs to be evaluated in one go because the page count seems to change quickly between successive invocations.

As you confirmed that the page size is 4 KB, I only need to divide the total by 256 to get the included pages total in MB, as I did in my awk one-liner above.

# crashconf -v|awk '$3~/yes/{s+=$2};END{printf"%10.2f\n",s/256}'
1908.01


Kent,

could you be a bit more specific about what tool/command I might use for diagnostics?
See, I've got this many files in the run-time fileset of OnlineDiag alone

# swlist -l file OnlineDiag.Sup-Tool-Mgr.STM\*RUN|wc -l
503


Madness, thy name is system administration
Michael Steele_2
Honored Contributor

Re: No Login at System Console

Can you attach 'crashconf -v'?
Support Fatherhood - Stop Family Law
T G Manikandan
Honored Contributor

Re: No Login at System Console

Did you check your /var/adm/syslog/syslog.log and OLDsyslog.log files?

I assume there are POWERFAILED messages for a particular disk.

Revert
Ralph Grothe
Honored Contributor

Re: No Login at System Console

T.G.,


damn, I missed it; of course you're right


# grep -i power /var/adm/syslog/OLDsyslog.log
Oct 6 07:51:57 saturn vmunix: LVM: vg[8]: pvnum=1 (dev_t=0x1f052700) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[8]: pvnum=2 (dev_t=0x1f057500) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[9]: pvnum=1 (dev_t=0x1f051500) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[10]: pvnum=0 (dev_t=0x1f056100) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[10]: pvnum=1 (dev_t=0x1f056200) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[10]: pvnum=2 (dev_t=0x1f056300) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[10]: pvnum=3 (dev_t=0x1f056400) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[10]: pvnum=4 (dev_t=0x1f056500) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=0 (dev_t=0x1f054100) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=1 (dev_t=0x1f054200) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=2 (dev_t=0x1f054300) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=3 (dev_t=0x1f054400) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=4 (dev_t=0x1f054500) is POWERFAILED
Oct 6 07:51:57 saturn vmunix: LVM: vg[11]: pvnum=5 (dev_t=0x1f054600) is POWERFAILED


T.G., how is the VG indexing to be read?
Purely chronological order?
You see, I have VGs with names like vgb??, such as this

# vgcfgrestore -l -f /etc/lvmconf/vgb01.conf
Volume Group Configuration information in "/etc/lvmconf/vgb01.conf"
VG Name /dev/vgb01
---- Physical volumes : 3 ----
/dev/rdsk/c4t5d0 (Non-bootable)
/dev/rdsk/c5t2d7 (Non-bootable)
/dev/rdsk/c5t7d5 (Non-bootable)



How can I map the hex addresses in the dev_t field to devices?
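
One possible reading of those dev_t values, as far as I know: for legacy sdisk device files the upper 8 bits are the major number (0x1f = 31) and the minor number is laid out as card instance, target and LUN, which maps onto the cXtYdZ name. A minimal sketch under that assumption (decode_dev_t is a made-up helper name, not an HP-UX tool):

```shell
#!/bin/sh
# Decode an HP-UX style dev_t such as 0x1f052700.
# Assumed layout: bits 31-24 major, 23-16 card instance,
# 15-12 target, 11-8 LUN (legacy sdisk minor numbers).
decode_dev_t() {
    hex=${1#0x}                        # strip the 0x prefix if present
    val=$(( 0x$hex ))
    major=$(( (val >> 24) & 0xff ))
    card=$(( (val >> 16) & 0xff ))
    target=$(( (val >> 12) & 0xf ))
    lun=$(( (val >> 8) & 0xf ))
    echo "major=$major c${card}t${target}d${lun}"
}

decode_dev_t 0x1f052700
# prints: major=31 c5t2d7
```

If the assumption holds, 0x1f052700 and 0x1f057500 are c5t2d7 and c5t7d5, which are two of the three vgb01 disks in the vgcfgrestore listing above.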


Michael,


this is my crashconf after I had included the kernel code pages:

# crashconf -v
Crash dump configuration has been changed since boot.

CLASS          PAGES  INCLUDED IN DUMP  DESCRIPTION
--------  ----------  ----------------  -------------------------------------
UNUSED        712900  no,  by default   unused pages
USERPG       1425206  no,  by default   user process pages
BCACHE        192367  no,  by default   buffer cache pages
KCODE           1888  yes, forced       kernel code pages
USTACK         60172  yes, by default   user process stacks
FSDATA           671  yes, by default   file system metadata
KDDATA        110950  yes, by default   kernel dynamic data
KSDATA        117286  yes, by default   kernel static data

Total pages on system: 2621440
Total pages included in dump: 290967

DEVICE        OFFSET(kB)   SIZE (kB)  LOGICAL VOL.  NAME
------------  ----------  ----------  ------------  -------------------------
31:0x016000        88928     1048576  64:0x000002   /dev/vg00/lvol2
31:0x016000      1137504     3145728  64:0x00000b   /dev/vg00/lvol11
                          ----------
                             4194304

roughly one Gig

# echo $((290967/256))
113

Madness, thy name is system administration
Ralph Grothe
Honored Contributor

Re: No Login at System Console

Oops, beware: the strange shell arithmetic result is a copy'n'paste mishap.

Btw, do you also always have to click the submit button 3-4 times before the HTTP post request is acknowledged?
Madness, thy name is system administration
Michael Steele_2
Honored Contributor

Re: No Login at System Console

Small loss of one digit, otherwise I agree: 1.136 GB or 1,136 MB.

Also, you are collecting an additional 1,888 pages, which is insignificant in the calculation but not included by default. Refer to 'forced':

KCODE 1888 yes, forced kernel code pages

Are you forcing "...kernel code pages..." by request from HP? A vendor? Coder?
Support Fatherhood - Stop Family Law
T G Manikandan
Honored Contributor

Re: No Login at System Console

1. Check your SCSI cables and connections.

Do you have a terminator fitted?

2. Make sure that you load the latest SCSI patches.

3. How about the I/O on the disks?

4. If it is still erroring out, check the timeout values of the disks.

pvchange -t 180

By default they are 30 seconds.
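
To apply the timeout to several disks, something like the following dry run could be used. It is only a sketch: pvchange_cmds is a made-up helper that prints the commands instead of running them, and the device names are the vgb01 disks from earlier in the thread, written as /dev/dsk block devices on the assumption that this is what pvchange expects.

```shell
#!/bin/sh
# Print (do not run) one pvchange command per physical volume.
# $1 = I/O timeout in seconds, remaining arguments = PV device paths.
pvchange_cmds() {
    timeout=$1
    shift
    for pv in "$@"; do
        echo "pvchange -t $timeout $pv"
    done
}

# Example: two of the vgb01 disks, assumed to be affected.
pvchange_cmds 180 /dev/dsk/c5t2d7 /dev/dsk/c5t7d5
# prints:
# pvchange -t 180 /dev/dsk/c5t2d7
# pvchange -t 180 /dev/dsk/c5t7d5
```

Review the printed lines first, then run them by hand or pipe them to sh once you are sure which disks are meant.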

There are lot of postings on this issue.
Just search the forums for this.

Revert on further help!