Operating System - HP-UX

LLT error on ServiceGuard CFS or oracle RAC

 
David Islas González_1
Frequent Advisor

LLT error on ServiceGuard CFS or oracle RAC

Hi, I'm getting the following errors in the syslog of my HP-UX 11i v2 system:

May 2 10:44:36 erp05 vmunix: LLT INFO V-14-1-10019 delayed hb 160501 ticks from 2 link 1(lan5)

May 2 10:44:36 erp05 vmunix: LLT INFO V-14-1-10023 lost 3209 hb seq 494498 from 2 link 1(lan5)

May 2 11:14:28 erp05 vmunix: LLT INFO V-14-1-10019 delayed hb 179203 ticks from 2 link 1(lan5)

May 2 11:14:28 erp05 vmunix: LLT INFO V-14-1-10023 lost 3583 hb seq 498083 from 2 link 1(lan5)

May 2 11:14:28 erp05 vmunix: LLT INFO V-14-1-10019 delayed hb 179203 ticks from 2 link 0(lan1)

May 2 11:14:28 erp05 vmunix: LLT INFO V-14-1-10023 lost 3583 hb seq 498083 from 2 link 0(lan1)

May 2 11:17:44 erp05 vmunix: LLT INFO V-14-1-10019 delayed hb 19550 ticks from 2 link 0 (lan1)

May 2 11:17:44 erp05 vmunix: LLT INFO V-14-1-10023 lost 390 hb seq 498475 from 2 link 0 (lan1)
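
For anyone comparing links, here is a minimal sketch that summarizes LLT messages like the ones above per interface. The function name `llt_summary` is made up for illustration, and the conversion to seconds assumes one LLT tick is 1/100 of a second, which should be checked against the LLT documentation for your release.

```shell
# Summarize LLT "delayed hb" / "lost hb" messages per interface.
# Reads syslog text on stdin, prints one line per interface.
# Assumption: 1 LLT tick = 1/100 s (verify for your LLT version).
llt_summary() {
  awk '
    {
      # Only look at LLT lines that name an interface, e.g. "(lan5)".
      if ($0 !~ /LLT INFO/ || !match($0, /\([a-z]+[0-9]+\)/)) next
      nic = substr($0, RSTART + 1, RLENGTH - 2)
      for (i = 1; i < NF; i++) {
        if ($i == "delayed") {          # "delayed hb <ticks> ticks ..."
          ticks = $(i + 2) + 0
          delayed[nic]++
          if (ticks > maxticks[nic]) maxticks[nic] = ticks
          seen[nic] = 1
        } else if ($i == "lost") {      # "lost <count> hb seq ..."
          lost[nic] += $(i + 1)
          seen[nic] = 1
        }
      }
    }
    END {
      for (nic in seen)
        printf "%s delayed=%d max_delay_s=%.2f lost_hb=%d\n",
               nic, delayed[nic], maxticks[nic] / 100, lost[nic]
    }
  ' | sort
}

# Usage (path is the usual HP-UX syslog location):
#   llt_summary < /var/adm/syslog/syslog.log
```

If both interfaces show the same delays at the same timestamps, that would suggest the two links are not failing independently.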

This is on my private LAN: two 1000BaseT switches connected to each other with a crossover cable for HA, and two 1000BaseT cards on each node, each connected to a different switch.

I've seen these threads, but neither has a solution :S

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1049918

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1035555

I appreciate any suggestion.

Regards
Steven E. Protter
Exalted Contributor

Re: LLT error on ServiceGuard CFS or oracle RAC

Shalom,

Heartbeat delays are generally network issues.

Oracle RAC specifically warns against using a crossover cable for heartbeat. The solution is to get a gigabit switch, even a low-end one, and start using that.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
David Islas González_1
Frequent Advisor

Re: LLT error on ServiceGuard CFS or oracle RAC

Hi Steven,

I think my post was confusing.

I am working with two gigabit ethernet switches.

lan1 connected to switch1
lan2 connected to switch2

switch1 connected to switch2 with a crossover cable.

That way the gigabit switch is not a SPOF.

Re: LLT error on ServiceGuard CFS or oracle RAC

But these are heartbeat networks, not public networks. The way we prevent SPOFs is by having 2 separate HB networks, not 1 HB network with 2 NICs on it. So you need to separate the ports that lan1 and lan5 are on into separate VLANs (or plug them into entirely separate infrastructure).

HTH

Duncan

I am an HPE Employee
David Islas González_1
Frequent Advisor

Re: LLT error on ServiceGuard CFS or oracle RAC

OK, so I should separate these 2 NICs into 2 different networks. But Oracle RAC only supports 1 private network, so I have a SPOF again, don't I?

Does Veritas support 2 private networks? How can I configure that? Via ServiceGuard?
Armin Kunaschik
Esteemed Contributor

Re: LLT error on ServiceGuard CFS or oracle RAC

I'm not sure about the reason for that.
I had some VCS clusters in the past and one had a similar problem.
It looks like every once in a while heartbeat packets get lost, but the cluster still works fine. All clusters had the same patch level, by the way...
The network guys couldn't find any errors, so maybe a support call with Veritas is necessary.

First suggestion: Update to the latest VCS/LLT/GAB patch level. This decreased the errors but did not remove them.
If possible, put every heartbeat LAN into a separate VLAN! There should be no traffic on the heartbeat LAN except heartbeat.

You can define more than 2 heartbeat LANs either in VCS or in MC/SG.
Just add more interfaces and modify /etc/llttab or /etc/cmcluster/cmclconfig.ascii manually.
MC/SG needs to be shut down while doing this.
VCS needs to be stopped with hastop -all -force and gab/llt must be unloaded:
gabconfig -U
lltconfig -U
then restart with
/sbin/init.d/llt start
/sbin/init.d/gab start
/sbin/init.d/vcs start
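
As a rough illustration of what two heartbeat links look like in /etc/llttab, here is a small sketch that prints such a file to stdout. The helper name `print_llttab`, the cluster ID and the argument order are hypothetical, and the `link` line layout follows the HP-UX form as I remember it from the Veritas documentation, so check it against the llttab man page before using it. With Serviceguard CFS the LLT configuration may be generated by the Serviceguard tools, in which case hand edits are probably not the right approach.

```shell
# Print a two-link llttab sketch to stdout (does not touch /etc/llttab).
# Arguments (all illustrative): node name, cluster id, first NIC, second NIC.
print_llttab() {
  node=$1
  cluster_id=$2
  nic_a=$3
  nic_b=$4
  cat <<EOF
set-node $node
set-cluster $cluster_id
link $nic_a /dev/lan:${nic_a#lan} - ether - -
link $nic_b /dev/lan:${nic_b#lan} - ether - -
EOF
}

# Usage: review the output first, then merge it into /etc/llttab by hand.
#   print_llttab erp05 1 lan1 lan5
# After restarting llt/gab/vcs, link state per node can be checked with:
#   lltstat -nvv
```

Each `link` line should point at a NIC on a different heartbeat network or VLAN, otherwise the two links still share a failure domain.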

My 2 cents,
Armin
And now for something completely different...
Armin Kunaschik
Esteemed Contributor

Re: LLT error on ServiceGuard CFS or oracle RAC

Oops. Sorry, I got an error message after reviewing my reply, which obviously does not mean that nothing was posted :-(
And now for something completely different...

Re: LLT error on ServiceGuard CFS or oracle RAC

David,

Actually a little more investigation turned up a patch that resolves this with Serviceguard CFS - PHNE_35353:

http://www8.itrc.hp.com/service/patch/patchDetail.do?BC=main|search|patchDetail{PHNE_35353,{hpux:11.23,}}|&patchid=PHNE_35353&sel={hpux:11.23,}
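
To see whether the patch is already on a node, something like the sketch below should work. The helper name `has_patch` is made up; it only searches whatever patch listing is piped into it, here the output of `swlist -l patch`.

```shell
# Succeeds if the patch ID given as $1 appears in the listing on stdin.
has_patch() {
  grep -q "$1"
}

# Usage on each cluster node:
#   if swlist -l patch | has_patch PHNE_35353; then
#     echo "PHNE_35353 is installed"
#   else
#     echo "PHNE_35353 is missing"
#   fi
```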

HTH

Duncan

I am an HPE Employee