Operating System - Tru64 Unix

msfs_mount error...

 
Derek Haining
Advisor

msfs_mount error...

Well, here goes my first question. During boot, an ES40 running Tru64 V5.1A issues the following messages:

Mounting / (root)
msfs_mount: The mount device does not match the linked device.
Check linked device in /etc/fdmns/domain
msfs_mount: Setting root device name to root_device RW

A "df" command shows that, as indicated, the root filesystem is not root_domain#root, but is root_device.

A check of /etc/fdmns/root_domain shows that the only link for root_domain is dsk0a. The scu utility shows that dsk0 is on bus 0, target 0, and LUN 0. This is known as DKA0 by the SRM console. So, what is the problem?
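
For anyone repeating this check, here is a minimal sketch of the domain-link lookup. It assumes only the layout mentioned above, an /etc/fdmns/<domain> directory holding symbolic links to the domain's devices. The function name is made up for illustration, and the directory is a parameter so it can also be pointed at a copy.

```shell
# list_domain_links DIR: show every symbolic link in an AdvFS domain
# directory along with the device it points to.
# Hypothetical helper; DIR defaults to the root domain named in this post.
list_domain_links() {
    dir=${1:-/etc/fdmns/root_domain}
    for link in "$dir"/*; do
        # Skip anything that is not a symbolic link.
        [ -h "$link" ] || continue
        # ls -l prints the link followed by "-> target".
        ls -l "$link"
    done
}
```

Calling list_domain_links with no argument inspects /etc/fdmns/root_domain; pass another directory to check a different domain.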

The curious thing is that hwmgr shows the disks off of SCSI buses 0 and 1 (two disks on each bus), but it doesn't know what the device names are! That is, "hwmgr -view hierarchy" shows something like:


63: disk bus-0-targ-0-lun-0 WWID:gobbldy-gook

On another system, the output is:

39: disk bus-1-targ-0-lun-0 dsk0

Curiously, "hwmgr -view devices" outputs nothing at all! I've tried to look at the
hwmgr NAME subsystem, but that isn't there either.

The history on this that I have is that a KGPSA card was recently replaced. The hardware folks had some difficulty making the new card work, and the system manager noticed this problem after the system was "fixed".

No devices are currently in use off of the HSG80, so we don't think that the KGPSA is the problem. I think that the device database files are messed up somehow, but I can't seem to get them fixed.

I've booted off of the V5.1A CD-ROM, and exited the installation. From there hwmgr shows the devices just fine, and knows the dsk names. I tried copying the data files from /var/etc onto dsk0/etc, but that didn't fix the problem.

Oh, yes. Even more fun. dsfmgr core dumps
leaving some lock in place. I've been rebooting to clear that problem.

Any ideas?

Thanks,
-Derek

BTW, Where is Mike G. when I need him? :)
4 REPLIES
Ralf Puchner
Honored Contributor

Re: msfs_mount error...

Derek,

Please read the device management section of the administration guide. With that kind of troubleshooting and copying, the databases are now badly damaged.

First, hwmgr uses a dynamic database that is updated every time the system boots. Booting the OS from CD therefore builds a new database that is not identical to the database on the real boot device.

By the way, it is not a good idea to copy /dev/* to the boot disk, because the device special files do not represent the database!

Please try the following:

1. >>> boot -fl s (to boot to single user mode)
2. mount -u /
3. dn_setup -init
4. dsfmgr -K

After running this procedure, check with the command

# hwmgr -show scsi

whether the devices have the same IDs as before (dsk0 is the root device, etc.). If not, use hwmgr and dsfmgr to change the device names.

If this doesn't help, you must restore /etc and /dev from a backup.
Help() { FirstReadManual(urgently); Go_to_it;; }
Derek Haining
Advisor

Re: msfs_mount error...

Ralf,

Thank you for your suggestions, but they
don't solve the problem.

I've tried dn_setup -init, as well as some of the other dn_setup options. Also, as I indicated in my first post, dsfmgr core dumps.

That is, it did not work.

The "hwmgr -show scsi" command, if I recall correctly, shows essentially the same data as "scu show edt". I've just compared the output and, as I thought, the two match. The one major difference is that "hwmgr -show scsi" shows the device names associated with each BTL. However, as I also indicated, there was no NAMES database! Thus, the device names were always blank, and a "hwmgr -view devices" returned NOTHING.

What we've found at this point is that
"dsfmgr -K" core dumps in single user
mode and leaves a session lock in place.
If, however, rather than trying to create
the device special files from single user
mode you simply exit and come up to
multi-user mode, dsfmgr is run (with the
-K flag, according to the documentation)
and it successfully creates the device
special files.

The original problem was eventually tracked
down to a setting in /etc/sysconfigtab. I see
that I didn't include some data in the original message. Here that is:

After getting back the system from the hardware folks who had replaced the KGPSA card, it was discovered that several of the persistent device database files had become corrupted. (Specifically, they had been overwritten with e-mail messages. How? You got me.) Thus started the odyssey of trying
to recreate these files. In addition, it was learned earlier today that a TZ89 on a local SCSI bus had been connected "improperly". Exactly what this means I don't know, but we could not see the tape drive as a result. That problem has been corrected.

Now, the system was using LSM to mirror the boot device. (/ and /usr) Because of the damage to the operating system, the mirror was
forcibly broken by disabling LSM. The original mount problem was caused by a missed setting in sysconfigtab. That was:

lsm_rootdev_is_volume=1

When this variable was set to 0, the msfs_mount problem went away. I believe that msfs_mount was looking for a device name of
root_vol (or /dev/vol/rootdg/rootvol). Not finding that, it issued the message.
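
A quick way to spot this leftover setting is to pull the value straight out of the file. The sketch below is only that, a sketch: it assumes the attribute sits on a line of its own in name=value form, possibly with whitespace around the equals sign. The function name is made up, and the file is a parameter so it can be run against a copy of sysconfigtab.

```shell
# lsm_rootdev_value FILE: print the numeric value assigned to
# lsm_rootdev_is_volume, or nothing if the attribute is absent.
# Hypothetical helper; FILE defaults to /etc/sysconfigtab.
lsm_rootdev_value() {
    file=${1:-/etc/sysconfigtab}
    # Accept optional whitespace before the name and around "=".
    sed -n 's/^[[:space:]]*lsm_rootdev_is_volume[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$file"
}
```

If this prints 1 on a system whose root is no longer under LSM, that is the situation described above.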

Correcting this problem, however, did not fix the dsfmgr -K problem. It still dumps core if run in single user mode. Although your instructions were very similar to those I had used earlier, we tried to follow them as given. As I expected, the dsfmgr -K failed.
I don't know what is causing this problem.

Oh, the tape drive was being seen as an "unknown" device. It had device special
files in /dev/none. After quite a bit of futzing around, we now have /dev/tape entries for the TZ89.

Anyway, thanks for your help.
Ralf Puchner
Honored Contributor

Re: msfs_mount error...

Sorry, but you told us only that you had copied over several files and used the "hwmgr" command. We found no hint that you were using LSM or had changed kernel parameters.

Have you tried dsfmgr -s and dsfmgr -v -F to verify and fix device special file problems?

Help() { FirstReadManual(urgently); Go_to_it;; }
Derek Haining
Advisor

Re: msfs_mount error...

Ralf,

Sorry I didn't get back on this topic sooner. I thought that I had indicated that
the original problem had been solved.

Out there on the HP web site (I couldn't
tell you exactly where at this point) is a
document that describes how to rebuild the
entire persistent hardware database. Roughly
it says to delete all of the /etc/dec*.db
files, as well as /etc/dccd*, /etc/dcdd*,
and all of the device-special files. Of
course, this is easiest to do if you boot
from the operating system CD-ROM.
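
Before deleting anything, it may help to list what those patterns would actually match. This is a rough sketch, not the HP procedure: it assumes the hard disk's root file system is mounted somewhere such as /mnt (my assumption, as is the function name). It only prints file names, deletes nothing, and leaves the device special files alone.

```shell
# list_hwdb_candidates ROOT: dry run that prints the files under ROOT/etc
# matching the patterns quoted above (dec*.db, dccd*, dcdd*).
# Hypothetical helper; ROOT defaults to /mnt, which is an assumption.
list_hwdb_candidates() {
    root=${1:-/mnt}
    for f in "$root"/etc/dec*.db "$root"/etc/dccd* "$root"/etc/dcdd*; do
        # An unmatched pattern stays as literal text, so test for a real file.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}
```

Review the list by hand before removing anything, and keep a copy of the files somewhere safe first.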

This we tried. The copy I mentioned in the
first note was that I copied all of the
device database files that were generated
by the OS installation CD-ROM onto the hard
drive. I knew that these were in a consistent
state. I seem to remember that part of the
hardware database is rebuilt on every boot,
and part of it is persistent.

(To "ignore" the persistent portion when
booting the OS CD-ROM, you clear the SRM
environment variable BOOT_DEFDEV. Then the
procedure doesn't know what disk to look
at to find the persistent database, so it
doesn't look.)

The LSM bit was not revealed to me at first.
The system administrator who presented the
problem to me told me (at some point) that
they had been using LSM to mirror the boot
disk (/, /usr, /var, and swap), but that
the mirror had been broken, and the system was
not using LSM any longer.

However, as I wrote in the last note, the
system administrator had missed one piece
of the LSM puzzle -- the /etc/sysconfigtab
entry. Once that was corrected, the mount
error went away.

This, however, did not correct the dsfmgr
problems. I did try dsfmgr -s as well as
dsfmgr -F -v, but this didn't work. At this
point I do not remember why.

What did work was to allow the system to
come up to multi-user mode. My interpretation
of these events is that >something< goes on
during the change from single-user mode to
multi-user mode that allows dsfmgr -K to work.
As a result, dsfmgr correctly created all of
the missing device special files, and we were
all set.

At this point, from my perspective anyway,
this is a dead horse. :) I only mention
the dsfmgr problems because I suspect that
other people could have similar problems
trying to execute dsfmgr in single-user mode.
The "dangling lock" problem is rather nasty.
I suggest filing a QAR on this problem.

Thanks very much,

-Derek