Re: Abnormalities

Wim Van den Wyngaert · ‎09-05-2004

Thanks for all the comments.

In my opinion, everyone who touches the VMS systems should be certified for what they do.

A dummy operator who just types things must be certified in recognizing when things go wrong. To recognize that, you need a lot of VMS knowledge. Thus dummy operators are not allowed!

E.g. a dummy operator has the task of rebooting. If the reboot fails for one of the 10 reasons, the system is down until the expert arrives. This should be avoided. (I set shutdown$decnet_minutes to -1.)

Wim

Willem,

We have already had system crashes during unattended weekends. The node rebooted, all applications (mainly Sybase) restarted, and on Monday only 1 failed job was found. If the applications had not been started automatically, it would have been a mess.

But isn't your real problem that the VMS shutdown is lousy? DECnet is shut down before you can get your hands on the system. I have bypassed all this and do the shutdown completely myself, DECnet included, and only after the applications have stopped.

Wim

Matt West · ‎09-05-2004

After skimming through the responses, I don't recall seeing any mention of monitoring rules. The simplest thing is to script your health checks using a tool such as Robomon. If written at the right level, this should provide a quick and easy system check without spending untold time chasing your tail.

Wim Van den Wyngaert · ‎09-05-2004

Matt,

We go one step further in some of our applications. The application "asks" to be monitored. This way you can stop and start the application without getting alarms (notice the alarms on the screens of most monitoring systems: the users have to know which alarms to ignore). As an extra, a restart command can specify multiple nodes. Thus, if a node fails, the monitoring system can restart the application on another one.
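To make the idea concrete, here is a rough sketch in Python, not our actual implementation. The names `Monitor`, `register`, `unregister`, `report_down` and the `start_on` callback are all made up for illustration; the real thing would sit on top of whatever start procedures and monitoring tool you already have.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Monitor:
    # Hypothetical callback: tries to start `app` on `node`,
    # returns True if the start succeeded.
    start_on: Callable[[str, str], bool]
    # app name -> nodes it may be restarted on, in order of preference
    registered: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, app: str, restart_nodes: List[str]) -> None:
        """The application asks to be monitored."""
        self.registered[app] = list(restart_nodes)

    def unregister(self, app: str) -> None:
        """Called before a planned stop, so no alarm is raised."""
        self.registered.pop(app, None)

    def report_down(self, app: str) -> Optional[str]:
        """Handle a 'down' observation; return the node used for the restart."""
        nodes = self.registered.get(app)
        if nodes is None:
            return None  # not registered: planned stop, nothing to do
        for node in nodes:
            if self.start_on(app, node):
                return node
        raise RuntimeError(f"{app}: restart failed on all nodes {nodes}")


# Example: NODEA is dead, so the restart lands on NODEB.
alive = {"NODEB"}
monitor = Monitor(start_on=lambda app, node: node in alive)
monitor.register("sybase", ["NODEA", "NODEB"])
restarted_on = monitor.report_down("sybase")
```

The point of the design is that the alarm decision lives with the application: an application that has unregistered itself is simply ignored, so nobody has to remember which alarms do not count.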

Wim


Wim Van den Wyngaert · ‎06-06-2006

An old thread, but here is something else that can go wrong.

We had a tape drive giving errors. The node crashed, restarted, and crashed again. This repeated itself and was resolved without intervention after 2 hours (SCSI timeout?). I have seen this before with CPU problems.

Wim

