Operating System - Tru64 Unix

Re: DS20E filesystems missing

 
mastsrl
Frequent Advisor

DS20E filesystems missing

DS20E with: Digital UNIX V4.0F (Rev. 1229); Fri Jun 2 16:25:05 GMT-0300 2000

Mylex controller (cannot remember the code) with external disks and internal cage

Several filesystems belonging to the array have disappeared:

previously i had:
> df -k
Filesystem            1024-blocks      Used  Available  Capacity  Mounted on
root_domain#root          1000480     75755     917816        8%  /
/proc                           0         0          0      100%  /proc
usr_domain#usr            4718592   1573870    3114656       34%  /usr
baan_domain#baan          5678384    921497    4737384       17%  /baan
dati2_domain#dati2       17781760  15188823    2464720       87%  /dati2
indici_domain#indici     34168032  28536913    5615056       84%  /indici
mdii_domain#mdii         11668808   6325466    5329280       55%  /mdii

now i have:
Filesystem            1024-blocks      Used  Available  Capacity  Mounted on
root_domain#root          1000480     93629     899944       10%  /
/proc                           0         0          0      100%  /proc
usr_domain#usr            4718592   1649837    3038920       36%  /usr
baan_domain#baan          5678384    921497    4737384       17%  /baan
dati_domain#dati         34167520  26706026    7444408       79%  /dati
/dev/rz5c                34507762         1   31056984        1%  /discobackup


discobackup is a newly added drive with UFS

fstab is OK with everything listed. Upon booting it crashed asking for a manual fsck, which detected tons of errors. Using the Mylex utility, the arrays check OK and all drives are online.

the drives missing are re1c, re4c and re2c, all AdvFS

running a verify in INDICI and MDII results in:
Can't get set info for domain
error E-VD_DMNATTR_DIFF (-1079)
unable to get info for domain
error: -1079

DATI2 verify returns error -1067

trying to mount INDICI and MDII returns an I/O error

trying to mount DATI2 returns:
domain is not activated
inconsistency detected

Running a test at the SRM prompt runs OK on all drives and controllers

any ideas on what to do?
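
To make the change between the two df listings explicit, here is a minimal sketch in Python that diffs them by mount point. The data is copied from the listings above (with the thousands separators dropped); the function names `mounts` and `diff_mounts` are made up for illustration and are not part of any Tru64 tool.

```python
# Rows copied from the two df -k listings in this post (header line omitted,
# thousands separators removed from the earlier listing).
BEFORE = """\
root_domain#root 1000480 75755 917816 8% /
/proc 0 0 0 100% /proc
usr_domain#usr 4718592 1573870 3114656 34% /usr
baan_domain#baan 5678384 921497 4737384 17% /baan
dati2_domain#dati2 17781760 15188823 2464720 87% /dati2
indici_domain#indici 34168032 28536913 5615056 84% /indici
mdii_domain#mdii 11668808 6325466 5329280 55% /mdii
"""

AFTER = """\
root_domain#root 1000480 93629 899944 10% /
/proc 0 0 0 100% /proc
usr_domain#usr 4718592 1649837 3038920 36% /usr
baan_domain#baan 5678384 921497 4737384 17% /baan
dati_domain#dati 34167520 26706026 7444408 79% /dati
/dev/rz5c 34507762 1 31056984 1% /discobackup
"""


def mounts(df_text):
    """Return {mount point: filesystem} for every six-column df row."""
    table = {}
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) == 6:
            table[fields[5]] = fields[0]
    return table


def diff_mounts(before, after):
    """Return (mount points that vanished, mount points that are new)."""
    old, new = mounts(before), mounts(after)
    return sorted(set(old) - set(new)), sorted(set(new) - set(old))


missing, added = diff_mounts(BEFORE, AFTER)
print("missing:", missing)
print("added:  ", added)
```

On this data it reports /dati2, /indici and /mdii as missing, and /dati and /discobackup as new.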
15 REPLIES
Rob Leadbeater
Honored Contributor

Re: DS20E filesystems missing

Hi,

The first question to ask is what changed when you added discobackup ? What hardware was altered ?

> fstab is ok with everything listed, upon
> booting it crashed asking for manual FSCK
> and detected tons of errors

Knowing what those errors were would probably be useful...

> Running a test on the SRM prompt runs ok
> in all drives and controllers

What test ? Please can you show the actual output rather than what you think is OK...

The output of the SRM commands show dev and show config would be useful to see, as would the console output as the system boots, so we can see what re devices are being detected...

Cheers,

Rob
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

Rob,
there were no hardware changes when adding the disk; it was added to the PCI controller for the drive cages

the test "test" ran and ran; there wasn't time to let it finish, but all drives were advancing their read counters in unison.

i'll see what i can do about the show commands, not having telnet to copypaste the results makes it a hassle.

where can i find the boot logs?
Rob Leadbeater
Honored Contributor

Re: DS20E filesystems missing

Hi,

> there where no hardware changes adding the
> disk, it was added to the pci controller
> for the drive cages

You said earlier that the drive cage was attached to the Mylex RAID controller...

It now appears to be on a standard SCSI controller based on the device /dev/rz5.

The output of "uerf -R | more" will give you the binary error logs, including the most recent boot entry. You might have to scroll down a bit to get to the bit you need. You should also be able to see previous boots from where you can work out what the hardware used to be...

Cheers,

Rob
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

Rob, a couple of things I forgot to reply to:
1) how can I output "show config" and "show dev" to a file so that I can upload it easily?
2) no, by drive cage I meant the internal hot-plug SCSI cage. That's not on the Mylex controller; it's on the PCI SCSI controller that came with that option.
The Mylex is connected to two external disk towers.
3) I'll try to run uerf today and see what comes out.
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

here's what a tech sent me as output of the command, i think it's missing a lot of data....
Rob Leadbeater
Honored Contributor

Re: DS20E filesystems missing

Hi,

The uerf output shows some errors with re0 although from your original post, that doesn't appear to be one of the disks you were having issues with.

re0 at xcr0 unit 0 (unit status = CRITICAL, raid level = 1)

There's also no sign of re4 in the startup messages. The output of "ls -lR /etc/fdmns" would be useful to see, as that will give the details of which devices are in which domains.

You said earlier that the "mylex utility" says the drives are OK, which obviously conflicts with the Critical status. I'm guessing that as re0 has failed, that could be having a knock on effect on the other devices. You might also want to get the physical connections checked to the external disk shelves.

Can you confirm which mylex utility was run...? If memory serves, you'll have to run ra200rcu from the SRM (typically from floppy) to get to the RAID configuration utility.

Cheers,

Rob
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

Rob,
I'm attaching the output.
Don't worry about re0; it's been like that for a long time (a failed, missing disk), since before this happened.
The external tower connections appear to be fine, judging from the OK light on all drives and the utility.

Correct, the ra200rcu utility was run from floppy on the SRM and shows all RAIDs optimal except for the missing one.
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

any other suggestions/ideas?
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

anyone else?
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

we still have this problem; any other insight would be awesome
Rob Leadbeater
Honored Contributor

Re: DS20E filesystems missing

Hi,

I've still not seen the output of "ls -lR /etc/fdmns". Note that's a lower-case L, not the number 1. That will help to determine which physical devices are missing.

If possible, the output of "show dev" and "show config" from the SRM prompt (P00>>>) would be very useful, as that will show the full hardware details of the machine.

This will require the OS to be shut down.

Cheers,

Rob
cnb
Honored Contributor

Re: DS20E filesystems missing

This doesn't look good, for one:

PCI device at bus 0, slot 6, function
_0 could not be configured:
Vendor ID 0x9004, Device ID 0x7895,
_Base class 0x1, Sub class 0x0
_Sub-VID 0x9004 Sub-DID 0x7895
has no matching entry in the PCI
_option table
PCI device at bus 0, slot 6, function
_1 could not be configured:
Vendor ID 0x9004, Device ID 0x7895,
_Base class 0x1, Sub class 0x0
_Sub-VID 0x9004 Sub-DID 0x7895
has no matching entry in the PCI
_option table

As Rob pointed out, show config and show dev from SRM would reveal more hardware details.

What's in slot 6?
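
As a rough aid for that question, here is a minimal Python sketch that pulls the bus/slot/function and the PCI IDs out of the message quoted above. It assumes the leading "_" marks a wrapped continuation line in the uerf output. The one-entry lookup table is hand-made for illustration: 0x9004 is the PCI vendor ID assigned to Adaptec, and to my knowledge device 0x7895 is the AIC-7895 SCSI chip, but please verify against a proper PCI ID list. The function names are hypothetical.

```python
import re

# Excerpt as quoted in this post. ASSUMPTION: a leading "_" is a
# continuation marker for a wrapped line.
LOG = """\
PCI device at bus 0, slot 6, function
_0 could not be configured:
Vendor ID 0x9004, Device ID 0x7895,
_Base class 0x1, Sub class 0x0
_Sub-VID 0x9004 Sub-DID 0x7895
has no matching entry in the PCI
_option table
PCI device at bus 0, slot 6, function
_1 could not be configured:
Vendor ID 0x9004, Device ID 0x7895,
_Base class 0x1, Sub class 0x0
_Sub-VID 0x9004 Sub-DID 0x7895
has no matching entry in the PCI
_option table
"""

# Tiny illustrative lookup, NOT a complete PCI ID database.
KNOWN = {(0x9004, 0x7895): "Adaptec AIC-7895 SCSI controller (unverified)"}

PATTERN = re.compile(
    r"bus (\d+), slot (\d+), function (\d+) could not be configured: "
    r"Vendor ID (0x[0-9a-fA-F]+), Device ID (0x[0-9a-fA-F]+)"
)


def unwrap(text):
    """Join all lines with spaces, dropping the assumed '_' marker."""
    parts = [ln[1:] if ln.startswith("_") else ln for ln in text.splitlines()]
    return " ".join(p.strip() for p in parts)


def unconfigured_devices(text):
    """Return (bus, slot, function, vendor, device) per reported device."""
    found = []
    for m in PATTERN.finditer(unwrap(text)):
        bus, slot, func = (int(g) for g in m.group(1, 2, 3))
        found.append((bus, slot, func, int(m.group(4), 16), int(m.group(5), 16)))
    return found


for bus, slot, func, vendor, device in unconfigured_devices(LOG):
    name = KNOWN.get((vendor, device), "unknown")
    print(f"bus {bus} slot {slot} function {func}: {vendor:#06x}:{device:#06x} {name}")
```

Both functions of slot 6 carry the same IDs, which would be consistent with a dual-channel controller, though show config from SRM is still the authoritative answer.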

Include more of the uerf log file if you can.

Might be useful to also run sys_check.

IMHO I would also suspect the Fuji drive at rz5, which apparently has non-supported firmware, and at some point replace it with one that is supported. It may work, but when things break down the cause could be anything from defective hardware to incompatible configuration issues.

Here's what's *officially* supported for DS20e systems: http://www.compaq.com/alphaserver/options/asds20e/asds20e_options.html


hth,
mastsrl
Frequent Advisor

Re: DS20E filesystems missing

Rob, how do I output "show dev" and "show config" to a text file from the SRM (and how do I access it afterwards)? Their outputs are too large to type manually...

mastsrl
Frequent Advisor

Re: DS20E filesystems missing

here's the output of ls -lR /etc/fdmns
total 56
-r-------- 1 root system 0 Aug 7 1999 .advfslock_baan_domain
-r-------- 1 root system 0 Jul 23 2003 .advfslock_dati2_domain
-r-------- 1 root system 0 Jun 5 2008 .advfslock_dati_domain
-r-------- 1 root system 0 Aug 7 1999 .advfslock_fdmns
-r-------- 1 root system 0 Mar 23 2002 .advfslock_indici_domain
-r-------- 1 root system 0 Aug 7 1999 .advfslock_mdii_domain
-r-------- 1 root system 0 Aug 7 1999 .advfslock_root_domain
-r-------- 1 root system 0 Feb 22 2007 .advfslock_root_nuevo
-r-------- 1 root system 0 Aug 7 1999 .advfslock_usr_domain
drwxr-xr-x 2 root system 8192 Jun 5 2008 baan_domain
drwxr-xr-x 2 root system 8192 Jun 5 2008 dati2_domain
drwxr-xr-x 2 root system 8192 Jun 5 2008 dati_domain
drwxr-xr-x 2 root system 8192 Feb 22 2007 indici_domain
drwxr-xr-x 2 root system 8192 Feb 22 2007 mdii_domain
drwxr-xr-x 2 root system 8192 Feb 22 2007 root_domain
drwxr-xr-x 2 root system 8192 Apr 1 11:33 usr_domain

/etc/fdmns/baan_domain:
total 0
lrwxrwxrwx 1 root system 24 Feb 22 2007 vol-rz0h -> /dev/vol/rootdg/vol-rz0h

/etc/fdmns/dati2_domain:
total 0
lrwxr-xr-x 1 root system 9 Feb 22 2007 re5c -> /dev/re5c

/etc/fdmns/dati_domain:
total 0
lrwxr-xr-x 1 root system 9 Jun 5 2008 re0c -> /dev/re0c
lrwxr-xr-x 1 root system 9 Jun 5 2008 re3c -> /dev/re3c
lrwxr-xr-x 1 root system 24 Jun 5 2008 rootdg.vol-rz2d -> /dev/vol/rootdg/vol-rz2d

/etc/fdmns/indici_domain:
total 0
lrwxr-xr-x 1 root system 9 Feb 22 2007 re1c -> /dev/re1c
lrwxr-xr-x 1 root system 9 Feb 22 2007 re4c -> /dev/re4c
lrwxr-xr-x 1 root system 24 Feb 22 2007 rootdg.vol-rz2e -> /dev/vol/rootdg/vol-rz2e

/etc/fdmns/mdii_domain:
total 0
lrwxrwxrwx 1 root system 9 Feb 22 2007 re2c -> /dev/re2c
lrwxr-xr-x 1 root system 24 Feb 22 2007 rootdg.vol-rz2f -> /dev/vol/rootdg/vol-rz2f

/etc/fdmns/root_domain:
total 0
lrwxrwxrwx 1 root system 23 Feb 22 2007 rootvol -> /dev/vol/rootdg/rootvol

/etc/fdmns/usr_domain:
total 0
lrwxrwxrwx 1 root system 24 Feb 22 2007 vol-rz0g -> /dev/vol/rootdg/vol-rz0g
#


i don't know how to output from SRM to a text file and they're impossible to type by hand
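
In case it helps to cross-check the listing above against the re devices seen at boot, here is a minimal Python sketch that turns the per-domain part of the ls -lR output into a domain-to-volume map and picks out the volumes on re devices. The data is copied from the listing (with the wrapped symlink targets rejoined); `domain_volumes` and `raid_volumes` are hypothetical helper names, and the pattern for an re device name is an assumption based on the names in this thread.

```python
import re

# Per-domain sections copied from the ls -lR /etc/fdmns output in this post,
# with symlink targets that were wrapped by the terminal rejoined.
LISTING = """\
/etc/fdmns/baan_domain:
lrwxrwxrwx 1 root system 24 Feb 22 2007 vol-rz0h -> /dev/vol/rootdg/vol-rz0h

/etc/fdmns/dati2_domain:
lrwxr-xr-x 1 root system 9 Feb 22 2007 re5c -> /dev/re5c

/etc/fdmns/dati_domain:
lrwxr-xr-x 1 root system 9 Jun 5 2008 re0c -> /dev/re0c
lrwxr-xr-x 1 root system 9 Jun 5 2008 re3c -> /dev/re3c
lrwxr-xr-x 1 root system 24 Jun 5 2008 rootdg.vol-rz2d -> /dev/vol/rootdg/vol-rz2d

/etc/fdmns/indici_domain:
lrwxr-xr-x 1 root system 9 Feb 22 2007 re1c -> /dev/re1c
lrwxr-xr-x 1 root system 9 Feb 22 2007 re4c -> /dev/re4c
lrwxr-xr-x 1 root system 24 Feb 22 2007 rootdg.vol-rz2e -> /dev/vol/rootdg/vol-rz2e

/etc/fdmns/mdii_domain:
lrwxrwxrwx 1 root system 9 Feb 22 2007 re2c -> /dev/re2c
lrwxr-xr-x 1 root system 24 Feb 22 2007 rootdg.vol-rz2f -> /dev/vol/rootdg/vol-rz2f

/etc/fdmns/root_domain:
lrwxrwxrwx 1 root system 23 Feb 22 2007 rootvol -> /dev/vol/rootdg/rootvol

/etc/fdmns/usr_domain:
lrwxrwxrwx 1 root system 24 Feb 22 2007 vol-rz0g -> /dev/vol/rootdg/vol-rz0g
"""

PREFIX = "/etc/fdmns/"
# ASSUMPTION: re devices look like /dev/re<N><partition letter a-h>.
RE_DEVICE = re.compile(r"^/dev/re\d+[a-h]$")


def domain_volumes(listing):
    """Return {domain: [symlink targets]} from an ls -lR style listing."""
    domains = {}
    current = None
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith(PREFIX) and line.endswith(":"):
            current = line[len(PREFIX):-1]
            domains[current] = []
        elif " -> " in line and current is not None:
            domains[current].append(line.split(" -> ", 1)[1])
    return domains


def raid_volumes(domains):
    """Keep only domains that have at least one volume on an re device."""
    result = {}
    for name, vols in domains.items():
        hits = [v for v in vols if RE_DEVICE.match(v)]
        if hits:
            result[name] = hits
    return result


for name, vols in sorted(raid_volumes(domain_volumes(LISTING)).items()):
    print(name, "->", ", ".join(vols))
```

Any re device named here that does not show up in the boot messages would leave its whole domain unable to activate, since an AdvFS domain needs all of its volumes present.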
Rob Leadbeater
Honored Contributor

Re: DS20E filesystems missing

Hi,

> i don't know how to output from SRM to a
> text file and they're impossible to type by hand

I assume you're using a graphics console. You could hook up a serial console, and connect using Hyperterminal or similar and cut and paste the output from there...

Cheers,

Rob