Operating System - HP-UX

Quality Control on a Server rollout.

 
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

I've uploaded the results of a 2-hour performance data collection run for analysis. Same point rules.

If I've gone lower on point assignments, it's because of one of two issues:

1) The original suggestion should have included more detail
2) I felt the suggestion duplicated other suggestions or info available in my original or subsequent posts.

vx_ninode is way too big. Stefan's going to get a bunny for that.

If I've gone higher, it's because I'm in a good mood and/or have extra respect for Salad Heads.

I am analyzing those performance results. Something about the disk performance is bothering me.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

A. Clay,

No, we've not fully followed all your suggestions.

Power tests have been run, and equipment is under maintenance.

Good stuff.

Stefan,

Thanks for the pvlinks suggestion, that got a bunny on its own merit.

SEP
Steve Lewis
Honored Contributor

Re: Quality Control on a Server rollout.

1. I heartily recommend a DETAILED checklist, containing everything you can think of double-checking on the day, with a formal signoff form for a tester/user at the end, so when they come back afterwards wanting something else, you are covered.

2. Network config. IP routing table. Interfaces to all other systems up OK.

3. How long is the data migration going to take?

4. Will the new IP be the same? If not, are the users pre-configured to point to the new server?

5. traceroute -i on 11i.

6. Is it part of a NIS(+) structure?

7. host and user keys for SSH?
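A detailed checklist with a signoff gate can be kept as a tiny script rather than a paper form. This is only a minimal sketch under our own assumptions: the class name `RolloutChecklist` and its methods are hypothetical and not part of any HP-UX tool; the point is simply that signoff is refused while any item is still unchecked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RolloutChecklist:
    """Hypothetical day-of-rollout checklist with a formal signoff gate."""

    items: dict[str, bool] = field(default_factory=dict)  # item -> verified?
    signed_off_by: str | None = None

    def add(self, item: str) -> None:
        # New items start out unverified; re-adding an item keeps its state.
        self.items.setdefault(item, False)

    def verify(self, item: str) -> None:
        if item not in self.items:
            raise KeyError(f"unknown checklist item: {item}")
        self.items[item] = True

    def outstanding(self) -> list[str]:
        # Items still waiting to be double-checked, in the order they were added.
        return [name for name, done in self.items.items() if not done]

    def sign_off(self, tester: str) -> None:
        # Refuse the formal signoff while anything remains unchecked.
        missing = self.outstanding()
        if missing:
            raise RuntimeError("cannot sign off; unchecked: " + ", ".join(missing))
        self.signed_off_by = tester


if __name__ == "__main__":
    checklist = RolloutChecklist()
    for item in ("IP routing table", "interfaces to other systems", "SSH host and user keys"):
        checklist.add(item)
    checklist.verify("IP routing table")
    print(checklist.outstanding())
```

The tester's name is only recorded once every item has been verified, which is what gives you the "you are covered" paper trail.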




Pete Randall
Outstanding Contributor

Re: Quality Control on a Server rollout.

Steve,

FYI - I can't open your attachment. I'm not sure whether the problem is on my end or not - I get a 404.


Pete


Massimo Bianchi
Honored Contributor

Re: Quality Control on a Server rollout.

Hi,
a word about Stefan's hint.

I know that LVM can use PVLINKs to balance the load, but there are situations in which this is not possible unless you have specific software.

I'm searching for the thread, but the search engine is horrible. The title was "do i need securepath".

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x87a944f56197d711abdc0090277a778c,00.html



This thread is enlightening for me as an example.
There should be a consultation with the HW vendor.

Massimo
James R. Ferguson
Acclaimed Contributor

Re: Quality Control on a Server rollout.

Hi Steven:

You are correct with regard to LVM pvlinks. LVM does *not* load-balance between/among pvlinks. The primary link handles the I/O unless/until demoted because of failure or a manual 'pvchange -s' or a 'vgreduce'/'vgextend'.

As noted, LVM pv (alternate) links are for high availability: when you have two paths to a particular disk via two controllers, you put the primary path on controller-A and the secondary path on controller-B.
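To make the "primary carries the I/O, alternate stands by" behaviour concrete, here is a toy model in Python. It is a sketch of the behaviour described in this post, not LVM's implementation; the device names `c4t0d1`/`c6t0d1` are hypothetical, and the model deliberately ignores how real LVM decides when to switch back after a repair.

```python
class PhysicalVolume:
    """Toy model of one disk reachable over several pvlinks.

    paths[0] is the primary link; the rest are alternates that stand by.
    """

    def __init__(self, name, paths):
        self.name = name
        self.paths = list(paths)
        self.failed = set()

    def active_path(self):
        # The first healthy path in order carries all I/O; no load balancing.
        for path in self.paths:
            if path not in self.failed:
                return path
        raise RuntimeError(f"{self.name}: no working path")

    def fail(self, path):
        # Simulate a controller or cable failure on one path.
        self.failed.add(path)

    def switch_primary(self, path):
        # Manually promote an alternate to primary (the role 'pvchange -s' plays above).
        if path not in self.paths:
            raise ValueError(f"{path} is not a path to {self.name}")
        self.paths.remove(path)
        self.paths.insert(0, path)


if __name__ == "__main__":
    disk = PhysicalVolume("disk1", ["c4t0d1", "c6t0d1"])  # hypothetical device names
    print(disk.active_path())  # c4t0d1, the primary
    disk.fail("c4t0d1")
    print(disk.active_path())  # c6t0d1, the alternate takes over
```

With a single disk in the volume group, only one path is ever busy at a time, which is the degenerate case discussed further down the thread.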

Regards!

...JRF...
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

ITRC has a problem.

The perf data is at:

http://www.isnamerica.com/perf.tar.gz

This data will be pulled if too much bandwidth gets sucked up. It will eventually be pulled anyway.

SEP
Chris Wilshaw
Honored Contributor

Re: Quality Control on a Server rollout.

Steve,

I notice you mention telnet users.

One thing to check is that the telnetd line in inetd.conf is set up suitably.

There can be issues with the buffer size and/or timeout that lead to performance problems even on fast servers (slow screen drawing, etc.).

You should try either

telnet stream tcp nowait root /usr/lbin/telnetd telnetd -b /etc/issue -s400 -z5

or

telnet stream tcp nowait root /usr/lbin/telnetd telnetd -b /etc/issue -TCP_DELAY
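Before and after changing it, it helps to confirm which telnetd options are actually live in /etc/inetd.conf. A minimal Python sketch, assuming the usual inetd.conf field order (service, socket type, protocol, wait flag, user, server path, then the server's argv); `telnetd_args` and `has_option` are hypothetical helper names, and the sketch does not interpret what `-s`, `-z` or `-TCP_DELAY` do (see telnetd(1M) for that).

```python
def telnetd_args(inetd_conf_text):
    """Return the argv of the active telnet entry in an inetd.conf text, or None."""
    for raw in inetd_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out services
        fields = line.split()
        # Fields: service socket-type protocol wait-flag user server-path argv...
        if fields[0] == "telnet" and len(fields) >= 7:
            return fields[6:]  # argv[0] (e.g. "telnetd") followed by its options
    return None


def has_option(args, option):
    # True if the exact option token (e.g. "-s400" or "-TCP_DELAY") is present.
    return args is not None and option in args[1:]


if __name__ == "__main__":
    with open("/etc/inetd.conf") as conf:
        args = telnetd_args(conf.read())
    print(args)
```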

Chris
Stefan Farrelly
Honored Contributor

Re: Quality Control on a Server rollout.

I'm not sure what the big debate is about here with pvlinks using both channels, A and B.

When I was at HP we used this 'crude' form of increasing available I/O throughput (perhaps load balancing is the wrong word) by alternating primary and alternate (pvlink) paths when creating VGs, and we use it at my current site on all servers. It really helps throughput considerably (potentially doubling it at peak load). We really notice the increased I/O throughput at month end. My boss loves it. It's reliable: we tested it by pulling paths and replacing them, and the application continues while LVM self-corrects.
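The alternating layout described here is a simple round-robin: each successive disk gets its primary link on the next controller, so both controllers carry I/O under normal operation. A small Python sketch of the planning step, assuming every disk is visible through every controller; the function names are hypothetical. As we understand pvlinks, the first device file you add for a disk becomes its primary and later ones become alternates, so the returned order is the order in which you would add the paths, but check the vgcreate/vgextend man pages on your release.

```python
from itertools import cycle


def alternate_primaries(disks, controllers=("A", "B")):
    """Round-robin the primary link of each disk across the controllers.

    Returns {disk: [primary_controller, alternate_controller, ...]}.
    """
    ctrl_list = list(controllers)
    layout = {}
    for disk, first in zip(disks, cycle(range(len(ctrl_list)))):
        # Rotate the controller list so successive disks start on a different controller.
        layout[disk] = ctrl_list[first:] + ctrl_list[:first]
    return layout


def primaries_per_controller(layout):
    # How many disks use each controller as their primary path.
    counts = {}
    for order in layout.values():
        counts[order[0]] = counts.get(order[0], 0) + 1
    return counts


if __name__ == "__main__":
    plan = alternate_primaries(["disk1", "disk2", "disk3", "disk4"])
    for disk, order in plan.items():
        print(disk, "primary:", order[0], "alternates:", order[1:])
    print(primaries_per_controller(plan))
```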

James, so why the comment about using disks on path A for primary and path B for secondary? You simply do not need to do it this way. Don't you use alternate pvlink paths like this to increase I/O?

I'm from Palmerston North, New Zealand, but somehow ended up in London...
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Chris,

Good one. I will look into it.

The wait time figures in HP_perf_info.disk are 2.5 times what they are on our developers' server. We are looking into the issue on the disk array.

If anyone notices something missing from the performance data that my usual script collects, it's because I pulled the info section due to agency policy. It reveals too much about our security practices.

Notice how we're not using any swap.

SEP
Massimo Bianchi
Honored Contributor

Re: Quality Control on a Server rollout.

Hi James and Stefan.

I currently use pvlinks to balance the load, with no problem whatsoever.

But reading the thread I linked above, I found that there may be limitations with some storage.

Alternating the paths is excellent for performance, I can confirm that too, but for a couple of days now I've been thinking that some additional checks may be worthwhile.

Massimo



[please no points]
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Steve Lewis,



1. I heartily recommend a DETAILED checklist, containing everything you can think of double-checking on the day, with a formal signoff form for a tester/user at the end, so when they come back afterwards wanting something else, you are covered.

This thread is helping me develop the checklist.

2. Network config. IP routing table. Interfaces to all other systems up OK.

Done, cool.


3. How long is the data migration going to take?

6-8 hours. The window is 9 p.m. Saturday to 6 a.m. Monday, August 4.

4. Will the new IP be the same? If not, are the users pre-configured to point to the new server?

No. Several applications will not migrate due to a lack of certification on 11.11. We are pushing new icons for the Oracle apps. Once you connect to the right web server, you are on the new box. We have tested this.

5. traceroute -i on 11i.

Got the tee-shirt

6. Is it part of a NIS(+) structure?
No, Thank G-d.

7. host and user keys for SSH?
Public keys have been exchanged for root and the adabas database owner, and possibly for the oracle user.

SEP
James R. Ferguson
Acclaimed Contributor

Re: Quality Control on a Server rollout.

Hi Stefan!

You and I are saying the same thing, only you said it better. Your diagram in your bunny post expresses the idea best. The goal is to get I/O going to the volume group from two controllers. In the degenerate case where a volume group has only one physical disk, my point was that an alternate (secondary) link would not do I/O but would only "stand by".

Warmest regards!

...JRF...
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Stefan,
et al

Don't these two figures seem a little out of whack?

ninode 1024 - 1024
vx_ninode 16000 - 16000

I'm thinking of dropping vx_ninode to 4000 this weekend.

SEP
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Just because Stefan got a bunny is no reason to stop posting.

I'm adding something I just found to my own list.

If cron jobs use mailx -s to send email, it's a good idea to verify the SMTP mailboxes for your new boxes, in case operations tries to be helpful and renames a box.

Or in case the sysadmin makes assumptions without verifying them.
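One way to build the list of mailboxes to verify is to pull the recipients straight out of the crontabs (for example, the output of `crontab -l` for each user). A minimal Python sketch, assuming entries of the form `... | mailx -s "subject" addr1 addr2` with `-s` immediately after `mailx`; the function name is hypothetical, and lines that don't follow that shape (or that shlex cannot parse) are skipped rather than guessed at.

```python
import shlex

SHELL_OPERATORS = {"|", ";", "&&", "||", "<", ">", ">>", "2>&1"}


def mailx_recipients(crontab_text):
    """Collect addresses that crontab entries pass to 'mailx -s <subject> <addr>...'."""
    recipients = set()
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out entry
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes; inspect this entry by hand
        for i, tok in enumerate(tokens):
            if tok.endswith("mailx") and i + 2 < len(tokens) and tokens[i + 1] == "-s":
                # tokens[i + 2] is the subject; addresses follow until a shell operator.
                for addr in tokens[i + 3:]:
                    if addr in SHELL_OPERATORS or addr.startswith(("<", ">")):
                        break
                    recipients.add(addr)
    return recipients
```

Each address it returns can then be test-mailed from the new box before cutover.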

SEP
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Wondering if anyone had time to look at the kernel.

Over the weekend I lowered vx_ninode to 4000.

SEP
Stefan Farrelly
Honored Contributor

Re: Quality Control on a Server rollout.

Hi Steve,

we run ninode at 10,000 and vx_ninode at 9000 (as per HP's recommendation to us). I note on our servers (7400s) that sar -v shows only around 2500 of ninode being used, so we could lower ours to, say, 3000 and vx_ninode to 90% of that. What does sar -v show on yours? (Only once the app has been up for a while with lots of users on.)

I find that occasionally we, or users, run jobs which search filesystems for many files, and these run much better with ninode set larger; that's why we leave ours at 10,000.
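The rule of thumb above (about 2500 in use, so ninode around 3000 and vx_ninode at 90% of that) works out to roughly 20% headroom over the observed sar -v peak. A small Python sketch of that arithmetic; the 20% and 90% defaults are assumptions read off this post, not HP-published formulas, and integer arithmetic is used so the results are exact.

```python
def suggest_inode_tables(peak_inodes_used, headroom_pct=20, vx_pct=90):
    """Rough ninode / vx_ninode sizing from an observed 'sar -v' peak.

    headroom_pct and vx_pct are assumptions from the rule of thumb in this thread.
    """
    if peak_inodes_used <= 0:
        raise ValueError("peak_inodes_used must be positive")
    # Ceiling division: round ninode up so the headroom is never short.
    ninode = -(-peak_inodes_used * (100 + headroom_pct) // 100)
    vx_ninode = ninode * vx_pct // 100
    return ninode, vx_ninode


if __name__ == "__main__":
    print(suggest_inode_tables(2500))
```

If the observed usage already equals the table size, the reading only shows the cap, not the real demand, so treat the result as a lower bound in that case.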
Massimo Bianchi
Honored Contributor

Re: Quality Control on a Server rollout.

Hi,
another nice couple of checks:

- autonegotiation for the NIC (100FD/100HD)

- existence of a support contract with a proper intervention time (we found out too late that one of our customers didn't buy one, and now opening a support call is a nightmare...) and a corresponding list of contact persons

- in the event of disaster recovery onto a slightly different server, enough drivers in the kernel to properly recognize the hardware

Massimo
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Here is the inode info Stefan requested.

These readings were taken before my kernel changes. vx_ninode was 16000; now it's 4000.
ninode was 1024 and has not been changed.

Isn't it interesting, Stefan, how HP gives the two of us totally different recommendations?

This was taken during an Oracle-to-disk backup test, a pretty high load for us.

22:05:15 text-sz ov proc-sz ov inod-sz ov file-sz ov
22:05:16 N/A N/A 175/4096 0 1024/1024 0 921/18442 0
22:05:17 N/A N/A 176/4096 0 1024/1024 0 921/18442 0
22:05:18 N/A N/A 177/4096 0 1024/1024 0 938/18442 0
22:05:19 N/A N/A 178/4096 0 1024/1024 0 953/18442 0
22:05:20 N/A N/A 178/4096 0 1024/1024 0 953/18442 0
22:05:21 N/A N/A 177/4096 0 1024/1024 0 946/18442 0
22:05:22 N/A N/A 177/4096 0 1024/1024 0 911/18442 0
22:05:23 N/A N/A 175/4096 0 1024/1024 0 898/18442 0
22:05:24 N/A N/A 175/4096 0 1024/1024 0 898/18442 0
22:05:25 N/A N/A 174/4096 0 1024/1024 0 890/18442 0
22:05:26 N/A N/A 168/4096 0 1024/1024 0 830/18442 0
22:05:27 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:28 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:29 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:30 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:31 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:32 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:33 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:34 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:35 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:36 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
22:05:37 N/A N/A 167/4096 0 1024/1024 0 817/18442 0
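Scanning a long sar -v capture by eye is tedious, so here is a minimal Python sketch that parses rows like the ones above and reports the peak usage and overflow count per table. It assumes the column layout shown here (time, then used/size and ov pairs for text, proc, inode and file, with text-sz reported as N/A); the function names are hypothetical.

```python
def parse_sar_v(text):
    """Parse 'sar -v' rows into dicts mapping table -> (used, size, overflow)."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: time, then four "used/size overflow" pairs; the header has no "/".
        if len(fields) != 9 or "/" not in fields[3]:
            continue
        row = {"time": fields[0]}
        pairs = zip(fields[1::2], fields[2::2])
        for name, (usage, ov) in zip(("text", "proc", "inode", "file"), pairs):
            if usage == "N/A":
                row[name] = None  # table not reported on this system
                continue
            used, size = (int(x) for x in usage.split("/"))
            row[name] = (used, size, int(ov))
        samples.append(row)
    return samples


def peaks(samples):
    """Highest 'used' value and largest overflow count seen for each table."""
    result = {}
    for row in samples:
        for name in ("text", "proc", "inode", "file"):
            if row[name] is None:
                continue
            used, size, ov = row[name]
            best_used, _, best_ov = result.get(name, (0, size, 0))
            result[name] = (max(best_used, used), size, max(best_ov, ov))
    return result
```

Applied to the capture above, it would show the inode table at 1024 of 1024 in every sample, while the process and file tables peak at 178 of 4096 and 953 of 18442, with no overflows anywhere.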

Massimo,

We have had our battles with auto-negotiate. We are manual on the Cisco switch. We are manual with hard coded configuration in the /etc/rc.config.d/hpbtlanconf file.

Machine configuration is frozen unless we find a good reason to play.

Keep it coming though, I'm working on a load test right now.

SEP
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

To further answer Massimo's good questions:

Service contracts are good. We have three duplicate machines with Ignite tapes off site. If one machine goes down, it should be a trivial matter of bringing up our sandbox as the production machine.

If the whole site goes down, the equipment can be ordered for the DR site on an emergency basis, at which point we should just have to stick the Ignite tape in and boot.

Of course we might not get a shared disk array that fast, so we'd have to order more local disk, but the Ignite configuration should be able to adapt to having all data local, even if we had to run unmirrored. I expect I could have a working system at the DR site about two hours after it gets powered up.

I'd rather have a hot DR site, and we have a D380 tasked for that job.

It will have a hot, online copy of our Oracle app running 24/6 and a daily copy of our legacy app, ready to take over service on my instructions.

Appropriate documentation exists should I become deceased (G-d forbid).

Note: A bunny is offered to A. Clay. He suggested some stuff that I didn't cover. Come and get it.

SEP
Mark Greene_1
Honored Contributor

Re: Quality Control on a Server rollout.

Steve:

Because you didn't explicitly say, I have the following suggestions:

Have you set up DNS and tested it via nslookup in debug mode?

Set up lpsched (if needed)?

Modified the sendmail.conf to do domain masquerading if required by your SMTP mail filter?

If ftp is to be used, have you modified /etc/inetd.conf to have it log?

Have you loaded the patch for the fiber card driver?

Have you tested the fiber card with fcmsutil to ensure the cards are good?

Have you configured and tested ntp?

Have you updated /etc/motd with some disclaimer about unauthorized access in horribly intimidating legalese?

Have you downloaded, installed, and run security_patch_check?

Have you verified /etc/shells contains the shells you need to use for the client connections, as well as /bin/false?

Have you updated /etc/securetty to limit direct root login to the console only?
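The /etc/shells and /etc/securetty checks are easy to script as part of the checklist. A minimal Python sketch, assuming one entry per line with `#` comments; the function names are hypothetical, and the list of required shells is whatever your client connections need (the example values are illustrations only).

```python
def missing_shells(shells_text, required):
    """Return the required shells that are not listed in an /etc/shells text."""
    listed = {line.strip() for line in shells_text.splitlines()
              if line.strip() and not line.strip().startswith("#")}
    return [shell for shell in required if shell not in listed]


def securetty_console_only(securetty_text):
    """True if /etc/securetty allows direct root login on the console only."""
    entries = [line.strip() for line in securetty_text.splitlines()
               if line.strip() and not line.strip().startswith("#")]
    return entries == ["console"]


if __name__ == "__main__":
    with open("/etc/shells") as f:
        print(missing_shells(f.read(), ["/usr/bin/sh", "/bin/false"]))  # example list
    with open("/etc/securetty") as f:
        print(securetty_console_only(f.read()))
```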

HTH
mark
the future will be a lot like now, only later
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Mark,

Great questions, I'll answer in insert mode.

Because you didn't explicitly say, I have the following suggestions:

Have you set up DNS and tested it via nslookup in debug mode?

DNS is fully tested. Good one.


Set up lpsched (if needed)?

No lpsched. The easyspooler print spooler continues to confound us, but it is working well with everything but the HR package, which is NOT migrating.

Modified the sendmail.conf to do domain masquerading if required by your SMTP mail filter?

We don't do that here. But at home, I'm working on that as an education project on an old D320.

If ftp is to be used, have you modified /etc/inetd.conf to have it log?

Yes. I wish we could get rid of ftp. In about 18 months we should be able to do it.


Have you loaded the patch for the fiber card driver?

Yes, Thank G-d.

Have you tested the fiber card with fcmsutil to ensure the cards are good?

Yes, we are going to test pvlinks on our sandbox so we can do it at a later date in production. Box is frozen so we won't roll out with pvlinks.

Have you configured and tested ntp?

Yes, the lousy clocks in these servers were the first to be noticeably off when the ntp server went down. Our ntp server is Windows-based and non-compliant, and does not respond properly to the ntpq -p command.

Have you updated /etc/motd with some disclaimer about unauthorized access in horribly intimidating legalese?

We used /etc/issue.

Have you downloaded, installed, and run security_patch_check?

We are a full Bastille shop, all security patches in except those that have come out in the past two weeks.

Have you verified /etc/shells contains the shells you need to use for the client connections, as well as /bin/false?

Yes, all is well there. Though this one is making me think....

Have you updated /etc/securetty to limit direct root login to the console only?

We still allow root login from my desktop. When I have a decent ssh client on my PC, we'll use /etc/securetty to disable root login via telnet.

We are aware of the dangers, but with a trusted system and three second delay, the only people we fear can crack our complicated passwords for root are the ones with access to the safe with the password book in it.

SEP
Keith Bevan_1
Trusted Contributor

Re: Quality Control on a Server rollout.

Steve,

Do you have an ftp daemon/process running? If so, have you granted/prevented access (ftpusers/ftpaccess)?

Have you reviewed permissions for running last and lastb?

Have you reviewed permissions on the principal config files (inetd.conf, hosts, services, passwd; not an exhaustive list)?

Have you tested a backup/restore?

Have you checked that the /tmp directory is not mounted on the root volume group?

Do you have a UPS attached to the server, and have you tried dropping the supply to see if the battery is used and the OS conducts a graceful shutdown?

Finally, configuration documentation and procedures need to be reviewed/updated to reflect all the changes, just in case you decide to run off with your hard-earned money and become a tax exile.
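The permissions review on config files can be started with a quick scan for group- or world-writable files. A minimal Python sketch using only `os.stat` and the `stat` mode bits; the function name is hypothetical, and which files belong in the list (and whether group-write is acceptable for any of them) is site policy, so the caller supplies the paths.

```python
import os
import stat


def risky_permissions(paths):
    """Report files that are world- or group-writable, or missing.

    Returns {path: description}; files with neither problem are omitted.
    """
    findings = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            findings[path] = "missing"
            continue
        problems = []
        if mode & stat.S_IWOTH:
            problems.append("world-writable")
        if mode & stat.S_IWGRP:
            problems.append("group-writable")
        if problems:
            findings[path] = ", ".join(problems)
    return findings


if __name__ == "__main__":
    config_files = ["/etc/inetd.conf", "/etc/hosts", "/etc/services", "/etc/passwd"]
    for path, problem in risky_permissions(config_files).items():
        print(path, problem)
```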

Keith

You are either part of the solution or part of the problem
Steven E. Protter
Exalted Contributor

Re: Quality Control on a Server rollout.

Yes Keith,

ftpusers has been updated.

More important, I tested it and found it didn't work. You have to get a new ftpd binary from hp for that functionality to work.

Clay suggested the power test and a few other details; if he posts back in, he'll get a bunny for it.

Documentation is being updated as we speak. I'm trying to have a meeting with the operators so they know how to handle the transition. As of today their manager has declined the meeting for them. I guess he likes his people to get blindsided.

SEP
Khalid A. Al-Tayaran
Valued Contributor

Re: Quality Control on a Server rollout.


Hi,

A few things to look at:

1- sendmail + security fixes..

2- sam -r

3- ioscan -fn | more (all devices should be claimed)

4- kmtune (kernel parameters for Oracle..)

5- Documentation

6- Run CFG2HTML and print the HTML file as reference. A must have. Also run nickel if you have it.

7- Remember Rock Ridge extensions for 11i.
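For item 3, paging through `ioscan -fn` works, but on a box with many devices it is easy to miss one line. A minimal Python sketch that flags any hardware row whose software state is not CLAIMED. The set of state names is an assumption from memory of ioscan(1M), so check the man page on your release; the sample in the usage below is made up, and the device-file lines that `-n` adds are ignored because they carry no state.

```python
# Software states ioscan can report (assumed list; verify against ioscan(1M)).
KNOWN_STATES = {"CLAIMED", "UNCLAIMED", "SUSPENDED", "DIFF_HW", "NO_HW", "ERROR", "SCAN"}


def unclaimed_devices(ioscan_text):
    """Return (class, state, line) for every row whose S/W State is not CLAIMED."""
    problems = []
    for line in ioscan_text.splitlines():
        tokens = line.split()
        # The state is searched for by name because the Driver column can be empty
        # on unclaimed rows, which shifts the column positions.
        state = next((t for t in tokens if t in KNOWN_STATES), None)
        if state is not None and state != "CLAIMED":
            problems.append((tokens[0], state, line.strip()))
    return problems


if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["/usr/sbin/ioscan", "-fn"], capture_output=True, text=True).stdout
    for cls, state, line in unclaimed_devices(out):
        print(state, line)
```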