Operating System - Tru64 Unix

Tru64 5.1B crash Process_tempid_list

 
Kevin Shannon_1
New Member

Tru64 5.1B crash Process_tempid_list

I've been messing around with a Tru64 install on a couple of ES40s, so I thought I'd post this question here rather than burden support with a call, because I'm just playing. I moved a single-member cluster install from one ES40 to another. When it booted I didn't like the numbers that the devices were coming up with; for example, the first kgpsa came up as emx0, but the second one came up as emx4. I'm far too anal to let that fly; that should be emx2 and I'm going to make it so...

Well, after deleting some files that I know are more important than clean underwear, and restoring them from the files created in memory when the CD boots, I get the following error message and a crash.

process_tempid_list: New hardware ID is not unique.

This one stumps me, I can boot into single user with a kernel parameter change (hwc_auto_fix), but when I run /sbin/mountroot I get the error message. What I want to know is how does this process know that the HWID is not unique, and how do I get it not to care so that I can mount my boot disk where it's supposed to be and continue on my merry way?

I know that I can rebuild the cluster; that's not a problem. I'm doing this more as an exercise, so that if I ever corrupt my production hwmgr database I might be able to fix it and continue on, rather than rebuild the cluster as I've had to in the past.

Thanks in advance
14 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

I suppose you mucked with /etc/ddr.db and /etc/ddr.dbase?
Have you checked out 'man ddr_config' and 'man ddr.dbase'?

Tru64, from V5.0 onwards, really tries hard to keep device numbers persistent across reboots and rebuilds. Once it has seen a given WWID and handed it an HWID, it wants to keep it just so... for cluster support.
Yes, it will look around at 'other disks' while installing to pick up old names/numbers. So you really need to install on a clean disk, amongst clean (or disconnected) disks to renumber from 1.

Now, I'm pretty anal myself, and always wanted my disks to have 'nice' numbers, even though that does not really make a difference (dsk1 is not much faster than dsk27 :-).
For that, I use dsfmgr -e and -m.
In doing so I would, for example, have:
dsk10..19 Oracle Redo and Archive
dsk20..29 Oracle data
dsk90..99 Online backups
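
A numbering scheme like the one above is easy to check mechanically. The following is only a minimal sketch in Python: the ranges are taken from this post, while the names DISK_ROLES and role_for are made up for illustration and have nothing to do with dsfmgr itself.

```python
# Hypothetical helper: the ranges mirror the scheme in the post,
# the names DISK_ROLES and role_for are assumptions for illustration.
DISK_ROLES = [
    (range(10, 20), "Oracle redo and archive"),
    (range(20, 30), "Oracle data"),
    (range(90, 100), "Online backups"),
]


def role_for(disk_name):
    """Return the role for a name like 'dsk12', or None if it is outside the scheme."""
    suffix = disk_name[3:]
    if not disk_name.startswith("dsk") or not suffix.isdigit():
        raise ValueError("expected a name of the form dsk<number>: %r" % disk_name)
    number = int(suffix)
    for numbers, role in DISK_ROLES:
        if number in numbers:
            return role
    return None
```

Feeding it the disk names from your own inventory would show which disks fall outside the plan; those are the candidates for renaming.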

We never went so low as to bother with emx numbering :-).

Cheers,
Hein.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

System cloning is not supported on V5.x systems, so moving a disk to another system will not work and leaves you with a machine that HP considers unsupported. Have a look at the best practice on how to clone Tru64 systems.

Why move disks to other cluster members if you can simply add/remove cluster members on the fly?
Help() { FirstReadManual(urgently); Go_to_it;; }
Kevin Shannon_1
New Member

Re: Tru64 5.1B crash Process_tempid_list

Well, actually it was all the .dat files and all the dec_ files in the /etc directory, as well as the ddr files. Basically everything in the /etc directory that starts with a 'd'. I just moved them out of the way and then replaced them with the files created by the CD-ROM. I'm familiar with moving the SCSI devices around; it's great for tape drives. I thought that there might be a way to do that with other devices, but I have not found it.

Then I thought that since I've had my hwmgr database creamed twice, it might be a nice exercise to try to restore it or replace it. Restoring it is no problem, but recreating it seems to be an issue.

Since I knew from the start that this was not really the way HP wanted to see its software installed, I felt that opening a case with support was the wrong way to go about finding help. I thought that this forum might be able to help; maybe someone had run into this before and could share that experience. I understand that for this cluster to be supported it will have to be installed from scratch. But I also have supported clusters that I have had to reinstall from scratch because of hwmgr database corruptions. It's my biggest fear and I wanted to know if there is a way around it.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

That's the reason for a valid backup ;-)

There are enough best practice cookbooks out there to protect your cluster, so why not read and use them?

Why not use the method of connecting/creating a new member and removing the old one prior to moving the disks around?

Btw, if a support engineer detects unsupported ways/methods on your machine, support will be rejected or changed to "best effort". This is critical for a production system.


Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

BTW, the supported way to change the naming of the adapters would be to use the "hwmgr -remove name" and "hwmgr -add name" commands, e.g.:

hwmgr -remove name -entry mchan1
hwmgr -add name -component_name mchan \
-component_num 0 -component_typ CONTROLLER \
-parent_name pci -parent_num 1 -slot 2
hwmgr -show name
.
.
.
n/a: mchan0 stocli3 CONTROLLER pci1 slot 2
.
.
.
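
To see quickly which instance numbers are in use per adapter type (the emx0/emx4 situation from the original question), the name listing can be summarised with a few lines of Python. This is only a sketch: the line layout is an assumption inferred from the single sample line above, and the function name instances_by_component is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, inferred only from the one sample line in the post:
#   <hwid>: <name><number> <hostname> <type> <parent> slot <slot>
NAME_LINE = re.compile(r"^\s*\S+:\s+([A-Za-z_]+)(\d+)\s+\S+\s+(\S+)")


def instances_by_component(lines, component_type="CONTROLLER"):
    """Group instance numbers by base name, e.g. {'mchan': [0]}."""
    found = defaultdict(list)
    for line in lines:
        match = NAME_LINE.match(line)
        # Lines that do not fit the assumed layout (such as "." rows) are skipped.
        if match and match.group(3) == component_type:
            found[match.group(1)].append(int(match.group(2)))
    return {name: sorted(numbers) for name, numbers in found.items()}
```

Given the saved output of "hwmgr -show name" as a list of lines, the result makes gaps in the numbering easy to spot before deciding which names to remove and re-add.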
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Han,

but booting a system that was not generated for the hardware is absolutely not supported and doesn't make sense, because it is easier to create a new cluster than to fix the problems caused by the unsupported way...
Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

Where does it state that? I agree that doing this is not a best practice.

However, I was trying to point out that there are supported commands that can be used to fix the adapter naming. There are valid reasons for wanting to do that.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Cloning is only supported as described in the best practice article. Engineering has stated many times that moving a disk to a totally different system is unsupported.

Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

You can not state it that way. "Totally different system" is not an absolute definition.

The original author of this topic replaced an ES40 with another ES40. This definitely falls outside the scope of "totally different system".