Tru64 5.1B crash Process_tempid_list

Kevin Shannon_1
Occasional Visitor

I've been messing around with a Tru64 install on a couple of ES40s, so I thought I'd post this question here rather than burden support with a call, since I'm only playing. I moved a single-member cluster install from one ES40 to another. When it booted I didn't like the numbers the devices came up with; for example, the first kgpsa came up as emx0, but the second one came up as emx4. I'm far too anal to let that fly: that should be emx2, and I'm going to make it so...

Well, after deleting some files that I know are more important than clean underwear, and restoring them from the files created in memory when the CD boots, I get the following error message and a crash.

process_tempid_list: New hardware ID is not unique.

This one stumps me. I can boot into single-user mode with a kernel parameter change (hwc_auto_fix), but when I run /sbin/mountroot I get the error message. What I want to know is: how does this process know that the HWID is not unique, and how do I get it not to care, so that I can mount my boot disk where it's supposed to be and continue on my merry way?

I know that I can rebuild the cluster; that's not a problem. I'm doing this more as an exercise, so that if I ever corrupt my production hwmgr database I might be able to fix it and carry on rather than rebuild the cluster, as I've had to in the past.

Thanks in advance
14 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

I suppose you mucked with /etc/ddr.db and /etc/ddr.dbase?
Have you checked out 'man ddr_config' and 'man ddr.dbase'?

Tru64, from V5.0 onwards, tries really hard to keep device numbers persistent across reboots and rebuilds. Once it has seen a given WWID and handed it an HWID, it wants to keep it just so... for cluster support.
Yes, during installation it will look around at 'other disks' to pick up old names and numbers. So you really need to install on a clean disk, amongst clean (or disconnected) disks, to renumber from 1.

Now, I'm pretty anal myself, and always wanted my disks to have 'nice' numbers, even though that does not really make a difference (dsk1 is not much faster than dsk27 :-).
For that, I use dsfmgr -e and -m.
In doing so I would for example have dsk10..19 be Oracle Redo and Archive.
dsk20..29 Oracle data
dsk90..99 Online backups
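
A numbering scheme like the one above can be planned as a dry run before anything is touched. The sketch below only prints commands; it assumes the `dsfmgr -m <old> <new>` form for moving a device name to an unused one, and the source disk names are hypothetical. If a target name is already taken, the exchange option (`-e`) would be the one to look at instead.

```shell
#!/bin/sh
# Dry-run sketch: print the dsfmgr commands that would move a list of
# disks into a block of ten names starting at a base number.
# Nothing is executed; review the output before running any of it.
plan_moves() {
    base=$1; shift
    n=0
    for old in "$@"; do
        if [ "$n" -ge 10 ]; then
            echo "plan_moves: more than 10 disks for the range starting at dsk$base" >&2
            return 1
        fi
        echo "dsfmgr -m $old dsk$((base + n))"
        n=$((n + 1))
    done
}

# Hypothetical current names for the Oracle redo/archive disks (dsk10..19).
plan_moves 10 dsk3 dsk27 dsk5
```

Keeping the plan as printed text makes it easy to compare against the current device list before any name is changed.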

We never went so low as to bother with emx numbering :-).

Cheers,
Hein.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

System cloning is not supported on V5.x systems, so moving a disk to another system will not work and leaves you with a machine that HP does not support. Have a look at the best-practice article on how to clone Tru64 systems.

Why move disks to other cluster members when you can simply add and remove cluster members on the fly?
Help() { FirstReadManual(urgently); Go_to_it;; }
Kevin Shannon_1
Occasional Visitor

Re: Tru64 5.1B crash Process_tempid_list

Well, actually it was all the .dat files in the /etc directory and all the dec_ files in the /etc directory, as well as the ddr files. Basically everything in /etc that starts with a 'd'. I just moved them out of the way and then replaced them with the files created by the CD-ROM. I'm familiar with moving the SCSI devices around. It's great for tape drives. I thought that there might be a way to do that with other devices, but I have not found it.

Then I thought that since I've had my hwmgr database creamed twice, it might be a nice exercise to try to restore it or replace it. Restoring it is no problem, but recreating it seems to be an issue.

Since I knew from the start that this was not really the way HP wanted to see its software installed, I felt that opening a case with support was the wrong way to go about finding help. I thought that this forum might be able to help; maybe someone had run into this before and could share that experience. I understand that for this cluster to be supported it will have to have been installed from scratch. But I also have supported clusters that I have had to reinstall from scratch because of hwmgr database corruption. It's my biggest fear, and I wanted to know if there is a way around it.
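
For the restore case, a copy of the files named in this post has to exist first. The following is a minimal sketch that archives everything in a given directory matching `dec_*`, `*.dat`, or `ddr*`; the function name and both paths are hypothetical, and it is not an HP-documented procedure. If any of these files are symbolic links (as member-specific files on a cluster may be), tar may record the link rather than its contents, so check the archive listing afterwards.

```shell
#!/bin/sh
# Sketch: archive the device-database files named in this thread
# (dec_*, *.dat, ddr* under the etc directory) so they can be restored later.
# Arguments: the etc directory, and an ABSOLUTE path for the tar archive.
backup_hw_files() {
    etc_dir=$1
    archive=$2
    (
        cd "$etc_dir" || exit 1
        # Build the file list from names that match the three patterns.
        set --
        for f in *; do
            case $f in
                dec_*|*.dat|ddr*) set -- "$@" "$f" ;;
            esac
        done
        if [ "$#" -eq 0 ]; then
            echo "backup_hw_files: no matching files in $etc_dir" >&2
            exit 1
        fi
        tar -cf "$archive" "$@"
    )
}

# Example call (paths are placeholders):
#   backup_hw_files /etc /var/tmp/hwdb-backup.tar
```

Taking the directory as a parameter means the function can be tried on a scratch copy of /etc before it is pointed at a live system.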
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

That's the reason for a valid backup ;-)

There are enough best-practice cookbooks out there for protecting your cluster, so why not read and use them?

Why not create a new member and remove the old one before moving the disks around?

By the way, if a support engineer detects unsupported methods on your machine, they will reject support or downgrade it to "best effort". This is critical for a production system.


Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

BTW, the supported way to change the naming of the adapters would be to use the "hwmgr -remove name" and "hwmgr -add name" commands, e.g.:

hwmgr -remove name -entry mchan1
hwmgr -add name -component_name mchan \
-component_num 0 -component_typ CONTROLLER \
-parent_name pci -parent_num 1 -slot 2
hwmgr -show name
.
.
.
n/a: mchan0 stocli3 CONTROLLER pci1 slot 2
.
.
.
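
Applied to the original question (a second kgpsa that shows up as emx4 but should be emx2), the same command pair could be generated as a dry run. This sketch uses only the options shown in the example above; the PCI bus and slot numbers are placeholders that are not given anywhere in the thread and would have to be read from "hwmgr -show name" on the real machine.

```shell
#!/bin/sh
# Dry-run sketch: print the hwmgr command pair that renames a controller,
# using only the options that appear in the example above.
# Arguments: old entry, new base name, new number, parent PCI bus, slot.
print_rename() {
    old=$1; name=$2; num=$3; parent_num=$4; slot=$5
    echo "hwmgr -remove name -entry $old"
    echo "hwmgr -add name -component_name $name -component_num $num -component_typ CONTROLLER -parent_name pci -parent_num $parent_num -slot $slot"
}

# emx4 -> emx2; bus 1 and slot 5 are placeholder values.
print_rename emx4 emx 2 1 5
```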
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Han,

but booting a system that was not generated for the hardware is absolutely not supported, and it doesn't make sense either, because it is easier to create a new cluster than to fix the problems caused by doing it the unsupported way...
Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

Where does it state that? I agree that doing this is not a best practice.

However, I was trying to point out that there are supported commands that can be used to fix the adapter naming. There are valid reasons for wanting to do that.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Cloning is only supported as described in the best-practice article. Engineering has stated many times that moving a disk to a totally different system is unsupported.

Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

You cannot state it that way. "Totally different system" is not an absolute definition.

The original author of this topic replaced an ES40 with another ES40. This definitely falls outside the scope of "totally different system".
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Han,

a totally different system means:
cards plugged into different slots, or a different configuration of the machine.
So an ES40 is not equal to another ES40 unless you check the firmware, move the cards to the same slots, and use the same cards!

Why go through all that nonsense when it is much faster to create a new cluster member? If you do the right thing, using and modifying the cluster creation disk before installing the cluster, you need only this disk to recreate the whole cluster. Easy!
Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

Your solution simply will not work on a single node cluster. Note that single node clusters are fully supported!

I also disagree with your statement. This is not a totally different system. Why else would we have implemented these commands in the first place?
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Han,

As an HP member I cannot give you any unsupported or unofficial method for doing what you want, because it is not officially supported by our engineering team. There are well-documented methods for backing up a cluster for disaster recovery or for creating "supported" clones, but moving a disk to another system is not one of them. There are restrictions!

In a single node cluster it is also practical to create a new cluster member. You only need additional disks nothing else.
But taking one disk and use it on another system is not supported by HP. Period.

If you want to play around, you are welcome to, but don't open a call with an HP support center if a problem comes up.
Help() { FirstReadManual(urgently); Go_to_it;; }
Han Pilmeyer
Esteemed Contributor

Re: Tru64 5.1B crash Process_tempid_list

Ralf,

I'm part of HP engineering and what you say is just NOT true.
Ralf Puchner
Honored Contributor

Re: Tru64 5.1B crash Process_tempid_list

Han,

as I can see within peoplefinder you are not part of Tru64 Engineering.

But as I've seen, the best practice titled "moving system disks" has been removed and the engineering search database switched off.
I have had a lot of discussions with Tru64 engineering, and have seen many customer reinstalls of "cloned" systems. There are many restrictions on moving disks between systems!

From a technical point of view: booting a disk on a technically different system (e.g. different slots, different firmware, different cards) leads to new device numbering and new major/minor codes, and the boot will fail. So a cleanup of the hardware databases will be necessary. There is an unsupported script around that does this job, but it is an unofficial script and not supported by HP!

Help() { FirstReadManual(urgently); Go_to_it;; }