Mike Smith_2
Advisor

Weird LVM behavior

 
Stefan Farrelly
Honored Contributor

Re: Weird LVM behavior


Looks like a serious connection problem with your EMC. Call EMC and get them to check their EMC frame for any errors. If you reboot your server, run an ioscan, and don't get the volumes you expect, then you've got a problem connecting to the EMC, or an EMC bin file problem.
I'm from Palmerston North, New Zealand, but somehow ended up in London...
John Palmer
Honored Contributor

Re: Weird LVM behavior

It's definitely a problem with the EMC frame. All your data volumes appear to have been lost.
Mike Smith_2
Advisor

Re: Weird LVM behavior

That's the interesting part. I can reboot the machine all I want, the devices never are shown in ioscan or sam, but the machine comes up and uses those data volumes like nothing is wrong.

Those data volumes constitute a database, and it's working normally.

I suppose the question is not only what's wrong, but also why it works regardless of what I see.

The /dev/dsk and /dev/rdsk files are there for the missing volumes. Is that and /etc/lvmtab all that's needed to make it work?
Tim Malnati
Honored Contributor

Re: Weird LVM behavior

I'm in agreement that the problem probably resides within the EMC. I'm a bit curious about some other indications that you have not included in your post. The two in particular are the output from inq and the indications you are seeing in SymmConsole. Does inq see everything? Are mappings displaying correctly? Any errors or other strange behavior coming up on the EMC side? Are these straight volumes or BCVs? Are you using PowerPath? Is this K box just another server, or does it have a different function than the rest of the machines?

I'm also a bit confused about the quantity of gatekeeper devices that are defined. Twelve devices for two interfaces just seems like a lot to me. I have to claim a little ignorance here; I don't know all the considerations involved, but EMC techs that I've worked with have not suggested creating gatekeepers like this.

I know this is more questions than answers, but I'm trying to expand my knowledge here a little bit too. The problem that you are dealing with is very strange; totally different than other EMC issues that I've encountered.
Mike Smith_2
Advisor

Re: Weird LVM behavior

 
Tim Malnati
Honored Contributor

Re: Weird LVM behavior

Thanks for the response. I'm trying to do some learning here too.

This is bizarre! Inq is normal but ioscan is confused. And the machine boots, fscks, and mounts everything like there is nothing wrong. Like I said, bizarre. Another curious thing is the 'Invalid argument' error message from vgscan. This may be normal with a vgscan error, though; I don't see enough vgscan errors to know any better. I'm starting to suspect a possible error in ioscan processing (other than the obvious), but I really don't have much of a clue as to what could be causing it at this point. I'm assuming that things responded normally at some point in the past. Has anything changed since then from a configuration or patch point of view?

I realize you have alternate paths to the gatekeepers. My confusion comes from EMC never suggesting more than two to me. I'm no EMC whiz kid, and I have no formal EMC training either. I'm just trying to understand why there are so many more with only two SCSI interfaces and twelve total partitions defined.
Mike Smith_2
Advisor

Re: Weird LVM behavior

The 12 total partitions aren't 12 at all. They're six, and they're the gatekeepers. INQ shows ALL the volumes, 12 data volumes, and 6 gatekeepers as seen by each SCSI interface.

It would be represented by having device files of both c0t1d2 and c4t1d2. These files are the same volume as seen by each interface.

All the machines get the 'invalid argument' after trying to do c5t2d0. It's the empty CDROM.
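For readers following the cXtYdZ naming used above, here is a minimal shell sketch that splits such a name into controller, target, and LUN. `parse_dsf` is a hypothetical helper written for illustration, not an HP-UX command. In this thread's layout, two names with the same target and LUN on c0 and c4 refer to the same volume; that is an assumption about this particular setup, not a general rule.

```shell
# Split an HP-UX style device file name (cXtYdZ) into its parts.
# parse_dsf is a hypothetical helper, not an HP-UX command.
parse_dsf() {
    name=${1##*/}                      # drop any /dev/dsk/ or /dev/rdsk/ prefix
    case $name in
        c[0-9]*t[0-9]*d[0-9]*) ;;      # looks like cXtYdZ, carry on
        *) echo "not a cXtYdZ name: $1" >&2; return 1 ;;
    esac
    rest=${name#c}                     # e.g. 0t1d2
    ctrl=${rest%%t*}                   # text before the first t
    rest=${rest#*t}                    # e.g. 1d2
    target=${rest%%d*}                 # text before the first d
    lun=${rest#*d}                     # text after the d
    echo "controller=$ctrl target=$target lun=$lun"
}

parse_dsf /dev/dsk/c0t1d2    # controller=0 target=1 lun=2
parse_dsf /dev/rdsk/c4t1d2   # controller=4 target=1 lun=2
```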

Mike Smith_2
Advisor

Re: Weird LVM behavior

The INQ vs. 'insert any HPUX tool here' conflict is really why I posted it here under 'LVM'. I'm not convinced it has much to do with the EMC. I could be wrong, but it just doesn't seem like it.
John Palmer
Honored Contributor

Re: Weird LVM behavior

It sounds very much as if all the 'query' commands such as those used by ioscan and vgscan are not getting any response from the EMC.

Where LVM is specifically driving a SCSI device that it knows about then all is well.

I presume that commands like 'vgdisplay' and 'pvdisplay' are ok.

I'm no expert on EMC. Does it have any facility to stop hosts doing the sort of 'what's on this bus' query commands?

I still think that this is an EMC issue.
Mike Smith_2
Advisor

Re: Weird LVM behavior

I can't say vgscan and ioscan get no response from the EMC, as the gatekeepers show up every time.

I can't block bus scans on the EMC, nor would that explain the gatekeepers showing up, but not the data volumes.

vgdisplay and pvdisplay do come up normal, including their alternate links.

I'll run it by EMC and see what they say.
Tim Malnati
Honored Contributor

Re: Weird LVM behavior

I was referring to the 12 partitions thinking in terms of data partitions from the EMC side; the same thing that you called 12 data volumes. When I read through your inq output this became obvious and it's also obvious that you have six gatekeepers.

I seem to recall that gatekeepers are not mission critical to the continued operation of the data volumes. I was told once that there was no requirement to have gatekeepers redundantly defined on both SCSI buses (although it is preferred). If a SCSI bus were lost during operation, the alternate links would pick up, but there was no specific need for a gatekeeper at that point. I'm working from memory here, so this could be very wrong, either on my part or on the part of the EMC rep who told me. There is a possibility that something is wrong with ioscan, but I'm certainly not convinced that this process is at fault when it is stable in broad terms. I'm also thinking that the gatekeepers may have something to do with this, since I think they are part of the mix that reports what devices are there to the ioscan process. In other words, is it possible that one or more of the gatekeepers is corrupted somehow?

Sorry for missing the CDROM thing. It went right over my head at the time. As I said before, I'm no EMC expert. My target is your problem, but gaining some additional knowledge along the way is a big reason I participate in the forums.
Mike Smith_2
Advisor

Re: Weird LVM behavior

You're right, gatekeepers need not be redundant, but it could potentially help if they are.

There are 12 volumes @4.3GB apiece for ~50GB.

I know 4.3GB volumes on an EMC is considered excessively granular (at least, it is to me), but I had a limited amount of disks we could purchase at the time, and a broad range to apply them to.
Wodisch
Honored Contributor

Re: Weird LVM behavior

Hi Mike,
while "ioscan" does not report all drives/targets, does "diskinfo" work on those
"missing devices"? That could be a clue, at least...
What is the timeout (pvdisplay) for those drives?
...still looking for more hints what is going on...
Wodisch

Re: Weird LVM behavior

Couple of things I'd check...

Does 'ioscan -kfC disk' show the same as 'ioscan -fC disk', i.e. does what the kernel thinks is there match what a physical scan of the bus suggests is there?
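One way to make that comparison less error-prone is to save both listings to files and diff them. The sketch below assumes the two outputs have already been captured on the HP-UX host; `compare_scans` and the file names in the comments are hypothetical, chosen only for illustration.

```shell
# compare_scans is a hypothetical helper: it diffs two saved ioscan listings.
# On the HP-UX host the listings could be captured first, for example:
#   ioscan -kfC disk > /tmp/ioscan.kernel
#   ioscan -fC disk  > /tmp/ioscan.live
compare_scans() {
    if diff "$1" "$2" >/dev/null; then
        echo "kernel view and live scan match"
    else
        echo "kernel view and live scan differ:"
        diff "$1" "$2"     # show the differing lines
        return 1
    fi
}
```

Calling `compare_scans /tmp/ioscan.kernel /tmp/ioscan.live` returns non-zero when the two views disagree, so it can also be used in a script.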

You say all the device files are present in /dev/dsk. What does 'lssf /dev/dsk/*' return?

Is EMC's Volume Logix software installed on this, or any other machine attached to the same EMC rig? Volume Logix controls which hosts can see certain devices in a rig. I've never come across any problems with it, but...

Have EMC actually dialled into the rig and checked everything out?



I am an HPE Employee
Mike Smith_2
Advisor

Re: Weird LVM behavior

Diskinfo results: (Character filename came from INQ output.)

diskinfo -v /dev/rdsk/c4t0d1
io_search failed: No match found.

ioscan -kfC disk output matches ioscan -fC disk output.

lssf output:

lssf /dev/dsk/c4t0d1
sdisk card instance 4 SCSI target 0 SCSI LUN 1 section 0 at address ??? /dev/dsk/c4t0d1
Abbott Vascular
Occasional Advisor

Re: Weird LVM behavior

Have you tried insf -e? This sometimes works when ioscan -fnC disk doesn't return the expected volumes when dealing with EMC.
IE Admins
Advisor

Re: Weird LVM behavior

Hi,

I think that you are seeing a common problem with HPUX & EMC, something which I have seen before, but why you see the gatekeepers is strange.
I don't feel that you have a problem with the EMC per se. The number of gatekeepers is OK. In fact you need gatekeepers to run EMC software such as SymmManager etc.

The solution I have used in the past is as follows.

DON'T REMOVE YOUR BOOT/ROOT device files!!

I have done the following online several times with large Oracle DBs and haven't run into any problems. However, you should check out the solution before you proceed.

1. Remove both /dev/dsk & /dev/rdsk device files with:
rmsf -H your_hardware_path - abbreviate your hardware path so that you remove all device files on the path.

2. check that the device files you expected to remove have in fact gone from both /dev/dsk & /dev/rdsk.

3. Re-Create the device files
insf -e -H your_hardware_path (as above)

4. Look at your hardware again
ioscan -fC disk (you won't see the EMC)

5. Re-Create the device files AGAIN
insf -e -H your_hardware_path

6. Look at your hardware again!!
ioscan -fC disk - The EMC devices SHOULD NOW be there!!
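The six steps above can be sketched as a dry run that only prints the commands in order and executes nothing, which makes it easier to review the sequence before touching a live system. `print_rescan_plan` is a hypothetical helper, and the `ls` line is just one assumed way of doing the check in step 2.

```shell
# Dry run only: prints the commands from the steps above, executes nothing.
# print_rescan_plan is a hypothetical helper; hw_path is whatever you pass in.
print_rescan_plan() {
    hw_path=$1
    # Refuse an empty path so a typo cannot expand to "everything".
    [ -n "$hw_path" ] || { echo "usage: print_rescan_plan hw_path" >&2; return 1; }
    echo "rmsf -H $hw_path"        # step 1: remove the device files
    echo "ls /dev/dsk /dev/rdsk"   # step 2: confirm they are gone (assumed check)
    echo "insf -e -H $hw_path"     # step 3: re-create the device files
    echo "ioscan -fC disk"         # step 4: look at the hardware
    echo "insf -e -H $hw_path"     # step 5: re-create the device files again
    echo "ioscan -fC disk"         # step 6: look at the hardware again
}

print_rescan_plan 8/8.0.1
```

Review the printed list against your own boot/root device paths before running any of it by hand.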

I hope your problem is solved.

Regards
Mike Smith_2
Advisor

Re: Weird LVM behavior

Not meaning to sound like a bonehead, but what do you mean by 'abbreviate your hardware path so that you remove all device files on the path.'?

For example, the missing devices have HW paths of 8/8.0.1 thru 8/8.12.1 with the alternate link being 10/8.0.1 thru 10/8.12.1.
IE Admins
Advisor

Re: Weird LVM behavior

Mike,

Sorry for any confusion. The EMC layout you have is quite simple, as you only have one logical disk per target. So if you wanted to remove a "single" device file, e.g. c0t15d0, the hardware path would be 8/8.15.0

To remove "all" gatekeepers, c0t15d0-5, the hardware path would be 8/8.15.

To remove "all" devices on the controller, say c0, the hardware path would be 8/8

So limiting the hardware path allows for controlled removal of one, some or all device files attached at a controller in a single pass.
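To illustrate the idea of an abbreviated path, here is a small shell sketch that reports which full hardware paths fall under a shorter one. It assumes paths are matched on whole components, as the examples above suggest; `under_hw_path` is a hypothetical helper and does not call rmsf or insf.

```shell
# under_hw_path is a hypothetical helper: succeeds when full path $2 is the
# abbreviated path $1 itself or lies below it on a component boundary.
under_hw_path() {
    case $2 in
        "$1"|"$1".*|"$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Which of these paths would an abbreviated path of 8/8.15 cover?
for p in 8/8.15.0 8/8.15.5 8/8.1.0 10/8.15.0; do
    if under_hw_path 8/8.15 "$p"; then
        echo "$p would be affected"
    else
        echo "$p would be left alone"
    fi
done
```

Matching on component boundaries means 8/8.1 does not accidentally cover 8/8.15.0.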

To get the hang of it, pick a single logical disk, say c0t0d1, follow the instructions, and see what happens. If this is successful, then do the lot at one time.

Should you need advice my email is weaver@integral.com.au

Good luck.
Mike Smith_2
Advisor

Re: Weird LVM behavior

Just for everyone's information, I figured out why I was seeing what I was seeing.

Patch PHKL_21607 fixes the inability of HPUX 11.0 to support devices that have skipped LUN IDs. My missing volumes had LUN IDs of 1, targets 0-12. There were no volumes with a LUN ID of 0. The gatekeepers were still listed because they started at LUN 0 and went to LUN 5.

So, if you assign LUNs and you skip a number in sequence of LUN IDs, they'll vanish unless this patch is present.
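As a rough way to spot this situation ahead of time, the sketch below reads cXtYdZ names and flags any controller/target whose LUN IDs do not run 0, 1, 2, ... without a gap. It models only the description above, not the kernel's actual scan logic, and `find_lun_gaps` is a hypothetical helper.

```shell
# find_lun_gaps is a hypothetical helper. It reads cXtYdZ names on stdin and
# prints each controller/target whose LUN IDs are not 0,1,2,... without a gap.
find_lun_gaps() {
    # Turn each name into "controller target lun", then sort numerically.
    sed -n 's/.*c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)$/\1 \2 \3/p' |
    sort -k1,1n -k2,2n -k3,3n -u |
    awk '
        { key = "c" $1 "t" $2
          if (key != prev) { expect = 0; prev = key; flagged = 0 }
          # Report only the first missing LUN for each target.
          if ($3 != expect && !flagged) {
              print key ": LUN " expect " is missing"
              flagged = 1
          }
          expect = $3 + 1
        }'
}

# Data volumes at LUN 1 only, gatekeepers starting at LUN 0 (as in this thread).
printf '%s\n' c0t0d1 c0t1d1 c0t15d0 c0t15d1 c0t15d2 | find_lun_gaps
```

With the sample names, targets 0 and 1 are flagged because they have no LUN 0, while target 15 is not.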

FYI....

Mike