Operating System - HP-UX

Re: strange error after upgrading to 11i from 11.00

 
SOLVED
Ken Penland_1
Trusted Contributor

strange error after upgrading to 11i from 11.00

Hey all:

I am not sure if anyone will be able to help me with this problem, as the only issues we have are with home-grown software, but here goes.

We recently moved one of our systems to a faster machine and at the same time upgraded from 11.00 to 11i. We moved from our old V-class server to an rp7420. Anyway, we have some old programs written in C which we no longer have the source code for. Ever since the upgrade, some weird things have been happening when users try to run these programs. If you log in to the machine about 10 times, 6 of those 10 times work just fine; the other times the program errors out. It doesn't look like a problem with the ports themselves, because if it doesn't work on a certain port one time, you can log out and come back in on the same port and it works. So I am not sure what is causing the problem. The reason I think someone here may be able to help is that I get this error message when it fails:

getreq: unable to determine TASO user ID: Error 0

Now, taso is the group that the user needs to be in to run the script, so it looks as if, under some unknown circumstance, the program is unable to determine the groups the user is in, or something like that. The groups command works, and as far as I can tell everything is the same between the environments when it works and when it doesn't. I guess I am hoping that someone knows of a possible bug with 11i and the "getreq" process?

Any thoughts on what may be causing this will be appreciated!

Ken
Jeff Schussele
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

Hi Ken,

Well one possibility could be the primary group the user belongs to IF they reside in multiple groups.
Check the user's /etc/passwd entry to see if they belong to the TASO group as primary & if not then check the TASO /etc/group entry & make sure the user is listed on that line.
Then you can solve this by linking /etc/logingroup to /etc/group (the existing file goes first, the new link name second):
ln /etc/group /etc/logingroup
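
The two checks described above (primary group in the passwd entry, member list in the group entry) can be sketched as a small shell function. This is only a sketch: the function name `group_membership` and its output labels are made up for illustration, and it reads the flat files directly, so it would not see entries served by a name service.

```shell
# Sketch: report how a user is tied to a group, reading passwd- and
# group-format files directly. Name and output labels are illustrative.
group_membership() {
    user=$1
    group=$2
    passwd_file=${3:-/etc/passwd}
    group_file=${4:-/etc/group}

    # GID of the named group (field 3 of the group file).
    gid=$(awk -F: -v g="$group" '$1 == g { print $3; exit }' "$group_file")
    [ -n "$gid" ] || { echo "no-such-group"; return 1; }

    # Primary group: field 4 of the user's passwd entry.
    primary=$(awk -F: -v u="$user" '$1 == u { print $4; exit }' "$passwd_file")
    if [ "$primary" = "$gid" ]; then
        echo "primary"
        return 0
    fi

    # Supplementary: user appears in the comma-separated member list (field 4).
    if awk -F: -v g="$group" -v u="$user" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) if (members[i] == u) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$group_file"; then
        echo "supplementary"
        return 0
    fi

    echo "not-a-member"
    return 1
}
```

Called as `group_membership someuser taso`, it prints `primary`, `supplementary`, `not-a-member` or `no-such-group`; the last two arguments let you point it at copies of the files.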

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Steven E. Protter
Exalted Contributor

Re: strange error after upgrading to 11i from 11.00

Looks like your passwd or group file has been changed in a way your application does not like.

pwck

grpck

This will let you know where the problem is. You will then either have to manually edit the file in question or use useradd or groupadd to get the group you need put back.

Further, your application may have used numeric user id and group and may not function properly until the numeric user id and groups match the way the system was prior to upgrade.
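
One way to check whether numeric IDs still match the pre-upgrade system is to compare a saved copy of the old passwd file with the current one. The sketch below assumes such a copy exists (the thread does not say one was kept); the function name `uid_changes` is hypothetical.

```shell
# Sketch: list accounts whose numeric ID (field 3) differs between two
# passwd-format files, e.g. a copy from the old host and the current file.
uid_changes() {
    old=$1
    new=$2
    # First file: remember name -> id. Second file: print any name whose
    # id no longer matches. Assumes the first file is not empty.
    awk -F: '
        NR == FNR { id[$1] = $3; next }
        ($1 in id) && id[$1] != $3 { printf "%s %s -> %s\n", $1, id[$1], $3 }
    ' "$old" "$new"
}
```

Because the GID is also field 3 of a group entry, the same function can be pointed at two group files. No output means no ID changed for names present in both files.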

Note that after an upgrade like this many applications require relinking or recompile.

Congrats on the new hardware.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

No joy; /etc/logingroup IS linked to /etc/group. As for checking the primary group, the tasos group is no one's primary group. I made it my primary but still have the same problem.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

Steven:

Thanks for the reply. grpck and pwck -s both come back clean (we actually run them every day to make sure nothing gets hosed up).

As for group IDs or user IDs changing, I don't see that happening either. All our data is stored on an EMC, so when we upgraded, all we did was link the data up to the new system. (For the custom stuff in vg00, we tarred up everything and moved it, stuff like /tcb/auth and /etc/passwd, etc.)
A. Clay Stephenson
Acclaimed Contributor

Re: strange error after upgrading to 11i from 11.00

One possible answer is stale/invalid pwgrd (passwd and group caching daemon) entries. I would kill the pwgrd process (your system will run just fine without it) and see if the behavior improves.
If it ain't broke, I can fix that.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

Wow, I got all excited with that one. I tried about 6 times and it worked every time... the 7th time bombed out.

As for what was said before about recompiling, I have given it some thought and that should not be the case either.

The reason we don't have the source code is that we have been using this program since '94 or so and had no reason to change it. That was before my time, so I don't know if it was originally on a 9.X box, but I do know it survived upgrading from 10.20 to 11.0 with no problems.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

I don't know if this will help or not, but the only thing that is consistent with this problem is that if a user logs on and it works, it will ALWAYS work from that login session. If they log in and it doesn't work, it will ALWAYS fail from that session. That led me to believe it was possibly a port problem, but I ruled that out because if it fails on, say, /dev/pts/0 and I log out and log back in, it could work the next time on /dev/pts/0.
So it isn't "completely" random.
A. Clay Stephenson
Acclaimed Contributor

Re: strange error after upgrading to 11i from 11.00

Unfortunately getreq() is not a standard function and generally error 0 (errno ?) indicates no error. Because you say the behavior seems to be tied to a pty/tty session, I would compare stty -a outputs between good and bad sessions. That may point you in the right direction.
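
The stty comparison suggested above can be sketched as two small helpers: one run in each session to save its settings, one to diff the two captures afterwards. The function names and file labels are illustrative only.

```shell
# Run in each login session; writes that session's terminal settings to
# the file you name (e.g. /tmp/stty.good or /tmp/stty.bad).
save_stty() {
    stty -a > "$1" 2>&1
}

# Print the differing lines between two captures, or say there are none.
diff_stty() {
    if diff "$1" "$2"; then
        echo "identical"
    fi
}
```

For example, run `save_stty /tmp/stty.good` in a working session and `save_stty /tmp/stty.bad` in a failing one, then `diff_stty /tmp/stty.good /tmp/stty.bad` from either.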

I am actually leaning towards some sort of timing loop in your code that almost works perfectly with the faster hardware.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

Nope, stty -a shows the same for both good and bad sessions, so no clues there. I don't know C, so I was hoping getreq was some function that could be looked at. It could be that the new hardware is too fast for the program running on it, so it acts flaky, but something is not clicking in my mind, because it always works for certain logins. I am stumped...

Well, the way I look at it, shame on them for nuking the source code.

Thanks everyone for your thoughts. If you think of anything else to try, feel free to chime in! ;)
Jeff Schussele
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

Hi (again) Ken,

Well IF it *always* works for certain logins then you have to concentrate on account-specific features like .profile or .login files. Check the actual environments themselves for these users, including any & all sourced files. Go so far as to back up the "offending" user's home dir & copy in a "good" user's home dir in its place.
Also I would verify that it's not workstation-specific, i.e. log in with a working ID on a station that's problematic.
I bet it'll boil down to an ID- or station-specific anomaly. Maybe even something as simple as what term type the station reports.

Rgds,
Jeff
A. Clay Stephenson
Acclaimed Contributor

Re: strange error after upgrading to 11i from 11.00

By any chance are your users' home directories automounted? It's possible that the system is unable to mount the home directories. Are you running NIS+? LDAP? Trusted or vanilla passwds? It's also possible that you have a corrupt utmp or wtmp file.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

When I said it ALWAYS works for a login, I did not mean it always works for certain users. When I log in as myself, if it works the first time, it will always work within that login session; if it doesn't work the first time, it will never work for that login session. So it isn't something to do with profiles, etc.

As for the other suggestion: no, we do not use automount, NIS, or LDAP. It is, however, a trusted system, utilizing /tcb/auth directories.
wtmp is cleared every night. I can't find where anything is done with utmp, but it looks like that is cleared as well.
Stephen Keane
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

Just to recap (as a latecomer to the party)

1. The program you are running is a C executable for which you don't have the source code.

2. The program is not actually involved in the login process itself.

3. When a user is logged in, the program will either work, or not work. If it doesn't work it will never work for that login session. If it does work, it will always work for that login session.

If the above summary is correct ...

If a user is in a login session and the program doesn't run, does doing any of the following allow the program to run?

a) su to another user and run the program

b) . the user's own .profile (or equivalent for the shell they are running) and run the program.

c) su to another user and back to yourself and run the program.

Also, has the executable been stripped?



Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

Yes, your summary was correct, and I tried all your suggestions:

a) su to another user and run the program

If it doesn't work for me, it doesn't work when I become another user either.

b) . the user's own .profile (or equivalent for the shell they are running) and run the program.

Sourcing in the profile doesn't change anything; it still doesn't work.

c) su to another user and back to yourself and run the program.

Going back to myself from the other user makes no change; it still doesn't work if it didn't work the first time.

Also, has the executable been stripped?
I am not familiar with that term. What do you mean by stripped?
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

A new development to add to this problem: it isn't only this one program. We have another program with the same symptoms, though this one does not generate any visible errors. When you run it, it shows a menu of several items to select, and if a user is in a certain group, it displays a certain option. About 75% of the time users see all the options they need; the other times, menu options are missing, as if they were not in the group they are supposed to be in. Logging out and back in resolves it, just like with the first script. This too is a C binary that we no longer have the source code for.
Stephen Keane
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

I mean stripped of all debugging information. If you do

# file foo

where foo is the name of your executable you will see "-not stripped" if the file hasn't been stripped. It would be interesting to see the output of the command regardless, also chatr output too.
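
If you need to run that check over several binaries, a tiny filter on the file output is enough. This is a sketch; `has_symbols` is a made-up name, and it relies only on the "not stripped" wording quoted above.

```shell
# Succeeds when the file(1) line on stdin reports "not stripped"
# (with or without the leading hyphen).
has_symbols() {
    grep -q 'not stripped'
}
```

Typical use would be `file foo | has_symbols && echo "foo still has its symbol table"`.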
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

$ file /prod/appls/aura/aura
/prod/appls/aura/aura: PA-RISC1.1 shared executable dynamically linked -not stripped
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: strange error after upgrading to 11i from 11.00

Stripped means that the symbol table has been removed from the executable. You can test for this by running "nm myexe". Man nm for details. Nm can give you some idea of the functions being called.

Given the additional data, there is still a fairly high probability that utmp is corrupt. You could use fwtmp to rebuild it. Man fwtmp for details. You may simply want to null the file and reboot.

Now, do you have ANY login or group names longer than 8 characters? And no, it doesn't matter if these are the problem logins or not. The question is: do you have ANY? If this is legacy code, you must play by legacy rules. Have you made sure that all filesystems have at least some headroom?
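
A quick way to answer the name-length question is to scan the first field of the passwd and group files. The sketch below is illustrative (the function name `long_names` is made up) and only reads the flat files you hand it.

```shell
# Sketch: print every name (field 1) longer than the given limit from one
# or more passwd- or group-format files.
long_names() {
    limit=$1
    shift
    awk -F: -v max="$limit" 'length($1) > max { print FILENAME ": " $1 }' "$@"
}
```

For example, `long_names 8 /etc/passwd /etc/group` prints nothing when every login and group name is 8 characters or shorter.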


Stephen Keane
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

As A Clay Stephenson says, you can nm the executable to see the functions within it. I guess we are interested in any function name containing the strings 'grp', 'pwd', 'passwd' or 'group'.

Also, what does chatr give you? I'm interested in which libraries it's using.

Also, a brief summary of WHAT the executable actually does would be nice!
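
The nm search described above can be sketched as a filter. Note that libc's own lookup routines (getpwnam, getgrnam, endpwent and so on) do not contain any of the four strings listed, so this sketch adds them explicitly; the extra pattern list is an assumption about which calls matter, and `lookup_symbols` is a made-up name.

```shell
# Sketch: filter nm output (on stdin) for symbols related to passwd, group
# and login-name lookups.
lookup_symbols() {
    grep -E 'grp|pwd|passwd|group|getpw|setpw|endpw|getgr|setgr|endgr|getlogin|getuid|setuid|getgid|setgid'
}
```

It would be used as `nm myexe | lookup_symbols`, where myexe is the executable in question.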
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

I will null out utmp and see about getting a time to reboot. As for groups and users, most are 7 chars long; we have a couple that are 8 chars, but none longer than that.
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

To be honest, I am not sure WHAT the application does; they just told me it was broken. I know that overall the script is used to track who has access to what systems, and that it is used by our security group, but as for what it DOES, I don't know.

Attached is the output from nm. I grepped for grp, pwd, passwd and group, and only found:
generate_group_opt_msg| 30156|extern|code |$CODE$
but I attached the entire output... Greek to me.

Here is the output from chatr:
/prod/appls/aura/aura:
    shared executable
    shared library dynamic path search:
        SHLIB_PATH     disabled  second
        embedded path  disabled  first  Not Defined
    shared library list:
        dynamic   /usr/lib/libcur_colr.1
        dynamic   /usr/lib/libc.1
    shared library binding:
        deferred
    global hash table disabled
    plabel caching disabled
    global hash array size:1103
    global hash array nbuckets:3
    shared vtable support disabled
    static branch prediction disabled
    executable from stack: D (default)
    kernel assisted branch prediction enabled
    lazy swap allocation disabled
    text segment locking disabled
    data segment locking disabled
    third quadrant private data space disabled
    fourth quadrant private data space disabled
    third quadrant global data space disabled
    data page size: D (default)
    instruction page size: D (default)
    nulptr references disabled
    shared library private mapping disabled
    shared library text merging disabled
Stephen Keane
Honored Contributor

Re: strange error after upgrading to 11i from 11.00

The application uses libc.1 (the standard C library) and libcur_colr.1 (the curses library).

It calls getlogin() (see man 3C getlogin), which uses the file /etc/utmpx.

It calls getpwnam (search the password file by user name) and endpwent (but not setpwent).

It calls getgrnam (search the group file by group name) and endgrent (but not setgrent).

It calls getuid and setuid.

It doesn't appear to access the password or group files by uid or gid.

So it looks like A Clay Stephenson is on the right lines, but I'd null /etc/utmpx as well.
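
Since the program appears to start from getlogin(), one way to test a "bad" session without the source is to compare the login name recorded for the terminal with the name that belongs to the process UID. The sketch below takes both names as arguments so the logic can be tried anywhere; the function name and messages are illustrative, and whether a mismatch is really what trips the program is an assumption.

```shell
# Sketch: compare the login name recorded for this terminal with the name
# derived from the process UID.
check_login_name() {
    utmp_name=$1   # e.g. "$(logname 2>/dev/null)", which reports getlogin()'s answer
    uid_name=$2    # e.g. "$(id -un)"
    if [ -z "$utmp_name" ]; then
        echo "no utmp entry for this terminal"
    elif [ "$utmp_name" != "$uid_name" ]; then
        echo "utmp says $utmp_name, uid says $uid_name"
    else
        echo "consistent"
    fi
}
```

In a fresh login, `check_login_name "$(logname 2>/dev/null)" "$(id -un)"` would be expected to print `consistent`. After an su the two names legitimately differ, so the comparison only means something right after logging in.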

Re: strange error after upgrading to 11i from 11.00

Ken,

Here's an off-the-wall suggestion - move the /var/adm/ps_data file and run 'ps -ef' to generate a new one.

The ps_data file is used by 'ps' to reference TTYs and associated users. I'm thinking that there might be something strange in that file that's affecting the application...

Good luck...

Steve Hamilton
Ken Penland_1
Trusted Contributor

Re: strange error after upgrading to 11i from 11.00

I nulled out utmp and utmpx and rebooted the box, and this resolved the problem, thanks to all that helped with this stumper!
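
For anyone repeating this, a cautious way to null a file is to keep a copy first and truncate in place, which leaves the owner and permissions of the original untouched. This is a sketch of that step only, with a made-up function name; it was not posted by Ken, and the reboot he describes is still needed afterwards.

```shell
# Sketch: keep a timestamped copy of a file, then truncate it in place.
null_with_backup() {
    f=$1
    cp -p "$f" "$f.$(date +%Y%m%d%H%M%S).bak" || return 1
    : > "$f"
}
```

Run as root, that would be `null_with_backup /etc/utmp` followed by `null_with_backup /etc/utmpx`.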