Re: Reboot Schedule for HP-UX 11iv3 system.

Pete Randall · ‎08-24-2010

Tip: Bob's DRD link won't work because of the parentheses. If you want to put parentheses around a link, use spaces to separate the parentheses from the link. In this case, just click the link, then delete the trailing parenthesis from the resulting 404 URL.

Pete

Candace Pettit · ‎08-24-2010

Steven, I do not have all the details on the why. My boss has explained that it was 17 years ago, when systems were much smaller and did not have internal hardware monitoring, operating systems, or application programs as efficient as what we have now. Their IT professional at that time recommended nightly reboots and gave reasoning sufficient for them to feel it was the best choice.

And yes, Bob, our /var has plenty of space now. We have not had any issues relating to running out of room on any of our filesystems. We've only been on our new system since April.

Candace Pettit · ‎08-24-2010

And Bob, one of the things I was looking for was suggestions of log files to keep an eye on. I can write shell scripts to do any necessary trimming. I just need some guidelines on what to look at there, plus the regular "cleanup" that sys admins normally schedule or perform.

Bob E Campbell · ‎08-24-2010

There are many answers to that, and many have already been discussed in detail in other threads by folks who know more. My short answer is:

1. /var/adm/syslog/*log
2. Make sure root email is seen by someone
3. Depends on what you are using the system for

Spend some time searching for "HP-UX log files" in Google and I bet several threads in these forums will show up.
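
As a starting point for item 1, here is a minimal sketch that lists log files above a size threshold so they can be reviewed before any trimming. It is written in Python purely for illustration (the same check could be done in a shell script); the function name `oversized_logs` and the 10 MB threshold are assumptions, not anything recommended in this thread, and it assumes a Python 3 interpreter is available on the host.

```python
import glob
import os


def oversized_logs(pattern, max_bytes):
    """Return (path, size) pairs for regular files matching pattern
    whose size exceeds max_bytes, sorted by path."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > max_bytes:
                hits.append((path, size))
    return hits


if __name__ == "__main__":
    # The pattern comes from item 1 above; the 10 MB limit is an
    # arbitrary illustrative threshold (assumption).
    for path, size in oversized_logs("/var/adm/syslog/*log", 10 * 1024 * 1024):
        print(f"{path}: {size} bytes")
```

The sketch only reports; what to do with a large file (archive, truncate, rotate) depends on what the system is used for, as item 3 says.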

James R. Ferguson · ‎08-24-2010

Hi Candace:

My bet is that your boss worked on systems with poorly coded, poorly debugged application code that leaked memory.

Once a process allocates memory (as with the 'malloc()' library call), memory is added to the process's "heap". A subsequent 'free()' deallocates the memory *but* only returns it to the heap, not to the operating system. This is designed as a performance tweak: when a subsequent 'malloc()' wants memory, it has the chance to take it from the heap.

Should a program have been poorly written such that it fails to 'free()' memory from its heap, eventually the program will terminate when it can't get any more memory or cause *other* programs to swap as the available pool of *system* memory reaches critically low levels. Programs that misbehave by failing to appropriately allocate and deallocate memory are said to have "memory leaks".

Since a leaking program returns all of the memory it acquired to the operating system at large when the program terminates, a short-running program isn't usually a problem. Long-lived processes, however, may slowly consume more and more memory and, as your boss may have found, can be "fixed" by restarting them, a reboot being one blind choice.

Regards!

...JRF...
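
The heap behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyProcess` and `run` are hypothetical names, the model ignores fragmentation and everything else a real allocator does, and it is written in Python rather than C just to keep the bookkeeping visible.

```python
class ToyProcess:
    """Toy model of a process heap; not a real allocator."""

    def __init__(self):
        self.from_os = 0  # bytes the process has obtained from the OS
        self.in_use = 0   # bytes currently handed out by malloc()

    def malloc(self, n):
        free_in_heap = self.from_os - self.in_use
        if n > free_in_heap:
            # The heap cannot satisfy the request, so grow it via the OS.
            self.from_os += n - free_in_heap
        self.in_use += n

    def free(self, n):
        # Memory goes back to the heap for reuse, not back to the OS.
        self.in_use -= n


def run(requests, leak):
    """Handle `requests` work items of 1000 bytes each.
    If `leak` is true, the matching free() is never called."""
    p = ToyProcess()
    for _ in range(requests):
        p.malloc(1000)
        if not leak:
            p.free(1000)
    return p.from_os


print(run(500, leak=False))  # 1000: the same heap block is reused
print(run(500, leak=True))   # 500000: footprint grows with every request
```

In the well-behaved case the footprint stops growing after the first request; in the leaking case it grows for as long as the process lives, which is why only long-running processes show the problem.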

Steven Schweda · ‎08-24-2010

> [...] a reboot being one blind choice.

Blind, because even in that case, restarting
the resource-hungry application should be all
that was needed, not restarting the whole
system.

> [...] Their IT professional at that time
> recommended nightly reboots and gave
> sufficient reasoning for them that they
> felt it was the best choice.

Next time, get it in writing.

> Based on your past experience why not
> schedule a monthly patch window. [...]

Or try running for two days. Then, if that
seems to work, try it for four days. Then
eight. And so on. Quit extending the
interval when you can't stand the strain, or
when you find some actual reason for doing a
reboot.
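
The doubling approach above can be written out as a short schedule. This is a minimal sketch; the function name `doubling_intervals` and the 30-day target are assumptions chosen only to show how quickly the interval reaches a monthly cadence.

```python
def doubling_intervals(start_days, target_days):
    """Return reboot intervals in days, doubling from start_days
    until the interval reaches or exceeds target_days."""
    intervals = []
    days = start_days
    while days < target_days:
        intervals.append(days)
        days *= 2
    intervals.append(days)
    return intervals


schedule = doubling_intervals(2, 30)
print(schedule)       # [2, 4, 8, 16, 32]
print(sum(schedule))  # 62 days elapsed by the end of the last interval
```

Starting from two days, five intervals are enough to pass the one-month mark.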

Candace Pettit · ‎08-24-2010

The recommendation I'm going to make to my boss is that we schedule it for once a week and monitor our resource usage for a week or two.

If there are no adverse effects, then we'll try a month. Matching it to the patch schedule is a good idea and if we don't experience any adverse effects I think that's a good option for us.

Thanks for all your input.

doug hosking · ‎08-24-2010

It's hard to make absolute statements about what's best without knowing much more about your environment, but I'll make a few general comments.

I generally agree with the idea of extending the reboot interval to far longer than the daily schedule you have now. HP-UX itself is easily capable of going many months or years between reboots. Although occasionally a defect like a memory leak will slip through the testing processes, those get serious attention from HP (and most major application vendors) when they are noticed, so if you're keeping current with patches, that should rarely be a problem.

I worked for many years at a company that was a major user of HP-UX systems and was fortunate enough to have very stable power. We had a number of systems that had not been rebooted in over 3 years and were still happily running (though they were obviously not current on all patches).

I can only think of a few reasons that you might benefit from a more frequent reboot schedule.

1) I have heard of cases where some HP-UX systems were so reliable that nobody on site remembered the proper processes for rebooting. This is more likely to be a problem as sites migrate to different generations of hardware, such as only having a few PA-RISC systems left after a migration to Integrity servers, or when there is staff turnover. Going through the process manually frequently enough to stay comfortable with it seems like good practice, though that doesn't mean it has to be done on all production machines.

2) I have run across cases where an admin did something stupid (like deleting /stand/vmunix or some other critical file) and it wasn't noticed for so long that it became a problem later. Problems of this type tend to be noticed during reboots. Learning about them before the old backups get overwritten can sometimes make the problems easier to deal with.

3) There is marginal benefit in running fsck and similar integrity checkers occasionally, to help keep small problems from becoming big ones. Given the reliability of modern hardware and software, "extra" fsck runs are probably of less value than in years past.

Many other factors, such as security needs, redundancy, tolerance for down time, maturity of key applications, stability of power, availability of staff, etc. can all influence the best practices for any given environment. Generally speaking, I think you could likely get away with moving from your daily reboot schedule for HP-UX to a monthly or quarterly one, depending on your patching needs, if you can't identify specific reasons to do more frequent reboots.
Don't let (mis)behavior of other operating systems cloud your judgment about HP-UX.

Viktor Balogh · ‎08-25-2010

Hi,

We have some systems with a year of uptime (we patch regularly, every year). With uptimes this long, you want to watch out for one thing: the majority of hardware problems show up at the next reboot after a long enough uptime. (By "long enough uptime" I mean more than 500 days, even around 1000 or more days.)

I know about a bug in the SCSI driver of 11.11, where at around 480 days an overflow occurs and the syslog fills up with error messages. So maybe some bugs are uptime related, but I think that the latest 11.31 should run stable for years.

****
Unix operates with beer.
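
To show how an uptime-related overflow of this kind can arise, here is a worked arithmetic sketch. The thread does not say which counter overflows in the 11.11 driver, so the counter widths and the 100 ticks-per-second rate below are assumptions for illustration only; the function name `wrap_days` is hypothetical.

```python
def wrap_days(bits, ticks_per_second):
    """Days until a counter with `bits` usable bits wraps around
    when incremented ticks_per_second times per second."""
    return 2 ** bits / ticks_per_second / 86400


# Assumed values, not taken from the thread:
print(round(wrap_days(32, 100), 1))  # 497.1 (unsigned 32-bit counter)
print(round(wrap_days(31, 100), 1))  # 248.6 (signed 32-bit counter)
```

Under those assumptions an unsigned 32-bit tick counter wraps after roughly 497 days, which is in the same range as the uptime mentioned above but should not be read as a diagnosis of that specific bug.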

doug hosking · ‎08-25-2010

Good point, Viktor. I was thinking about mentioning that in my reply, though that gets into disaster recovery and other areas not well suited to short replies. That's a "pay me now or pay me later" situation. Hitting those types of failures on a staggered basis may be much less painful than hitting them simultaneously on multiple machines, such as after a site-wide power failure. Some of those types of failures show up on simple reboots, while others are more likely after power cycling, when devices may go through more exhaustive self-tests, more thermal changes, etc.
