westb
Advisor

Re: How can I gather performance history for redhat linux

I tried ksar, but it chokes when importing large text files - at about two weeks' worth of sar -A data. I think it remains useful if one needs to look at a particular day or even a week.

For my systems, I have a script that runs just before midnight that grabs the sar data for that day and keeps appending it to another text file, which I was importing into ksar. That worked okay until the file got too large.
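
The nightly append itself is only a couple of lines; a minimal sketch, assuming sysstat's daily binary files under /var/log/sa and a made-up share path:

#!/bin/bash
# Hypothetical sketch of the just-before-midnight job: render today's sar -A
# output as text and append it to a rolling file on the shared area.
SA=/var/log/sa/sa$(date +%d)              # sysstat's binary file for today
OUT=/nfs/perf/$(hostname -s)-sar.txt      # made-up NFS/Samba share path
LC_ALL=C sar -A -f "$SA" >> "$OUT"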

The same script also generates another text file with daily averages for CPU and I/O. This gets imported into Excel and works well. All systems write their sar data to an NFS share, which is also a Samba share.
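
The daily-average extraction can be as simple as pulling sar's "Average:" lines; a minimal sketch (the share path and CSV layout are made up for illustration):

#!/bin/bash
# Hypothetical sketch: append today's sar CPU and I/O averages as one CSV row.
SA=/var/log/sa/sa$(date +%d)                  # sysstat's binary file for today
OUT=/nfs/perf/$(hostname -s)-daily.csv        # made-up NFS/Samba share path
cpu=$(LC_ALL=C sar -u -f "$SA" | awk '/^Average/ {print 100 - $NF}')  # busy = 100 - %idle
tps=$(LC_ALL=C sar -b -f "$SA" | awk '/^Average/ {print $2}')         # transfers/sec
echo "$(date +%F),$cpu,$tps" >> "$OUT"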

The goal here is to be able to generate reports that show performance over a long period of time, say a year or more. The daily average provides a nice summary of things, and if one needs to drill in further, the sar -A text file can be viewed.

ksar is nice for generating reports, but in my opinion there is too much information, which will generate a lot of questions. I tried adding specific graphs, and also tried generating PDFs for only specific metrics, but it seems only two can be used(?).

Ideally, something that can auto-generate the graphs would be nice; I guess I will have to look at gnuplot or something for that.
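
gnuplot can certainly be scripted for that; a minimal sketch (file names and column layout invented), assuming a CSV of date and daily CPU-busy average like the one above:

#!/bin/bash
# Hypothetical sketch: turn a date,value CSV into a PNG of the daily CPU trend.
gnuplot <<'EOF'
set terminal png size 1024,480
set output 'cpu-daily.png'
set datafile separator ','
set xdata time
set timefmt '%Y-%m-%d'
set format x '%b %y'
set ylabel 'CPU busy %'
plot 'host-daily.csv' using 1:2 with lines title 'daily avg CPU'
EOF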

I'm also going to further explore collectl.
MarkSeger
Frequent Advisor

Re: How can I gather performance history for redhat linux

Trying to keep/report performance data over a long period of time is something I'm not personally interested in, as my (and collectl's) focus is fine-grained, very accurate data. At the very least, I think anyone running sar should change their sampling interval to 10 seconds; otherwise it's not telling you anything useful.
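
On Red Hat-style systems that means editing sysstat's cron entry; a sketch, assuming the stock /etc/cron.d/sysstat layout (the sa1 path varies by distribution):

# /etc/cron.d/sysstat -- the stock entry runs "sa1 1 1" every 10 minutes.
# This variant instead collects 60 samples at 10-second intervals:
*/10 * * * * root /usr/lib64/sa/sa1 10 60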

That said, if you want longer-term data, my assumption is you don't care about accuracy. After all, if you look at a 10-minute or 1-hour sample (which I think you'd have to do if you want a year's worth of data) and see an average network load of 30%, you'd never realize it might have been pegged at 100% for multiple minutes and that you're bandwidth starved. However, if that's what you want, maybe you want to load your data into RRD. That's a database technology which ages/aggregates data so that the more recent data is finer grained while the older data is coarser. It also has a nice plotting tool - ever seen Ganglia?
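
A sketch of that idea with rrdtool (names and sizes invented), keeping a MAX archive alongside the averages so a pegged interface still shows up after the data has been aged:

#!/bin/bash
# Hypothetical sketch: an RRD with 10-second samples for a week, then hourly
# averages plus hourly maxima for a year.
rrdtool create net.rrd --step 10 \
    DS:netpct:GAUGE:30:0:100 \
    RRA:AVERAGE:0.5:1:60480 \
    RRA:AVERAGE:0.5:360:8760 \
    RRA:MAX:0.5:360:8760
# 60480 x 10s = 1 week raw; 360 x 10s = 1-hour buckets; 8760 rows = 1 year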

Of course, my vote would be to use collectl to gather the data rather than sar, but I'm also admittedly a little biased. ;-)

Hope this helps.

-mark

Re: How can I gather performance history for redhat linux

My name is Bill Parkinson and I have been at this for a long time; see my web site, analinux.com. I use sar, iostat, and ps at 20-second intervals. I am starting to look at the work Mark has done, especially in the ps area. I have done OpenVMS work for many years, and I use csvpng for web or HTML serving of graphs, and TLViz for drilling down on complicated issues.

I recently completed graphs-on-demand from the fancy new admin menu but have not updated the web site to reflect the added feature yet. With the menu-driven feature, all graphs are available within the minute for sar and iostat, and the rule-driven graphs are also available. Tar and gzip will give you 14-to-1 compression, so you can keep reference days; it's nice to have a baseline.

While my web site is still a work in progress, it is already packed with a lot of information. I want to make it shareware, but I will do one free evaluation for you after I send you the collection bash script I use. Fill out the "contact us" section at analinux.com and you'll get one free evaluation. If you're really interested, read the case studies section.
MarkSeger
Frequent Advisor

Re: How can I gather performance history for redhat linux

Just thought I'd let anyone who is still following this thread know that there is a new blog called HP Cluster Edge, where you can read about many of the things HP is doing in High Performance Computing. Check it out at http://www.communities.hp.com/online/blogs/hpcclusteredge/default.aspx
-mark
John McNulty_2
Frequent Advisor

Re: How can I gather performance history for redhat linux

I use Zabbix. Sample intervals are tunable per item, with the default being to keep fine-grained data for 90 days and coarse-grained data for one year. After the 90 days it drops the fine-grained data and retains only hourly stats.

It's a sledgehammer to crack a walnut, though, if you're only interested in a couple of systems, as it's designed to collate data from hundreds or thousands of systems/devices and to alert/trigger on events.

Its I/O stats collection on Linux sucks a bit, but I've worked around that by adding a small script that takes named devices from /dev/mapper and then (a rough sketch follows the list):

- get the major/minor numbers

- grep the major/minor numbers from /sys/block/*/dev to find the dm device name

- process the associated dm data from /proc/diskstats and graph what I'm interested in.
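
Something like this minimal sketch captures the idea; it's a rough illustration, assuming the friendly names live directly under /dev/mapper:

#!/bin/bash
# Rough sketch (illustrative, not production): map each friendly /dev/mapper
# name to its dm-N device and print its /proc/diskstats line under that name.
for name in /dev/mapper/*; do
    [ -b "$name" ] || continue                  # skip /dev/mapper/control etc.
    majmin=$(stat -c '%t:%T' "$name")           # major:minor, in hex
    majmin=$(( 0x${majmin%%:*} )):$(( 0x${majmin##*:} ))
    dm=$(grep -l "^${majmin}\$" /sys/block/dm-*/dev 2>/dev/null)
    dm=${dm#/sys/block/}; dm=${dm%/dev}         # e.g. dm-4
    [ -n "$dm" ] || continue
    awk -v dm="$dm" -v n="${name#/dev/mapper/}" \
        '$3 == dm { print n, $0 }' /proc/diskstats
done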

I find that being able to graph user-friendly multipath device names like these:

asm-db, asm-dbp1, asm-fra, asm-frap1, home, ocr1, system-oracle, system-root, system-swap, usr-local, vote1

...is a damn sight more meaningful than

dm-0, dm-1, dm-2, dm-3, dm-4, dm-5, dm-6, dm-7, dm-8, dm-9, dm-10.

Not to mention that dm device names are not guaranteed to stay the same; with an iSCSI cluster they change between reboots more often than they should, making them useless for collecting trend data.

That was the only thing that stopped me from using collectl ... hint, hint ;-). Coming from a Tru64 background, I'm a big fan of collect.
MarkSeger
Frequent Advisor

Re: How can I gather performance history for redhat linux

When you say I/O reporting sucks, are you specifically talking about device-mapper naming, or do you have something else in mind?

As for DM naming, I think I follow what you're doing, but I'm not sure where you're getting names like "asm-db, asm-dbp1, asm-fra, asm-frap1" and so on.

Perhaps I can add what you're looking for to collectl, if it's not too tough and is general enough...

-mark
John McNulty_2
Frequent Advisor

Re: How can I gather performance history for redhat linux

Well, the "suck" statement was really referring to the default stats collection that comes with Zabbix: a small blot on an otherwise superb product. The rest of what I said, though, referred to my frustration with tools in general that report stats using dm device names, and my workaround to get something more meaningful.

The names "asm-db, asm-dbp1, asm-fra, asm-frap1, etc." come from multipathd. By default it aggregates the device names that present different paths to the same device (/dev/sdc, /dev/sdd, etc.) into a single device name, e.g. /dev/mapper/mpath0. The fancy names come from multipathd's user-friendly names feature, where you bind the WWID of a device you know to a name you want to use and declare it in /etc/multipath.conf.

So, for example, if I know (from the LUN and from using the command "multipath -ll") that /dev/mapper/mpath0 is a disk I'm using for Oracle ASM data, and that it has a WWID of 16465616462656166353a3500000000000000000000000000, then I can rename mpath0 to something more meaningful by adding this to multipath.conf:

multipaths {
    multipath {
        wwid 16465616462656166353a3500000000000000000000000000
        alias asm-db
    }
}

Then /dev/mapper/mpath0 becomes /dev/mapper/asm-db. I can then treat that device just like a regular disk block device, with the added benefit that the new name appears in a "multipath -ll" listing. Unfortunately it doesn't appear in /proc/diskstats, so I have to find its associated dm device first.
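
A quick hypothetical walk-through (device numbers invented) tying the alias above to its diskstats line:

ls -l /dev/mapper/asm-db             # shows the major/minor pair, e.g. 253, 4
grep -l '^253:4$' /sys/block/*/dev   # matches /sys/block/dm-4/dev
grep ' dm-4 ' /proc/diskstats        # the stats line to report as "asm-db"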

Does that help?

John


MarkSeger
Frequent Advisor

Re: How can I gather performance history for redhat linux

So if I understand you correctly, I can add my own aliases to multipath.conf - something I've never done before, but that doesn't mean I can't try. When I look at my conf file, most of the entries are commented out, including the one for multipaths. Is the wwid listed there the real one? Do I just uncomment it and add an alias as you did? Here's what's in that stanza on my system:

#multipaths {
#    multipath {
#        wwid 3600508b4000156d700012000000b0000
#        alias yellow
#        path_grouping_policy multibus
#        path_checker readsector0
#        path_selector "round-robin 0"
#        failback manual
#        rr_weight priorities
#        no_path_retry 5
#    }
#    multipath {
#        wwid 1DEC_____321816758474
#        alias red
#    }
#}

Tell me what it should look like and where the alias will appear, and I'll see what I can do - maybe even get it into the next release of collectl!

-mark
John McNulty_2
Frequent Advisor

Re: How can I gather performance history for redhat linux

Ah, well, you can't use this for direct-attached disks that just hang off a local SCSI/SATA bus and have only one path to them. multipathd can only be used to manage disks that have multiple I/O paths. For example, if you had two Fibre Channel HBA cards linked via dual FC switches to the four ports of an EVA over a SAN, you would get four physical paths to each device. Or, as another example, if you're presenting a disk from one system to another via iSCSI and you have dual NICs and present the disk over both of them, you'd get two paths.

Tell you what, forget about multipathd for a minute and look at LVM, because this applies to LVM Logical Volumes just as well as multipath devices, and that's probably going to be more accessible to you without dedicated hardware.

On my systems I like to name what would normally be the default LVM volume group, VolGroup00, as "system", and the two LVs that sit on it as "root" and "swap". device-mapper then creates the following two devices for me:

/dev/mapper/system-root
/dev/mapper/system-swap

I think Ubuntu does this by default anyway, but I mostly use Red Hat or CentOS.

But these devices don't appear in /proc/diskstats either, so I treat them just like user-friendly multipath device names: I pick them out of /dev/mapper, use the major/minor number pair to find the dm device files, and then record the disk stats against system-root instead of dm-0 and system-swap instead of dm-1. If you can solve that, then you'll automatically solve the multipath user-friendly names scenario as well :)
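
If it helps, the name-to-number mapping can also be read straight from the tools themselves; a sketch using two commands that ship with device-mapper and LVM:

# Each mapped device's name with its major/minor pair:
dmsetup info -c --noheadings -o name,major,minor
# LVM can print the kernel device numbers directly:
lvs --noheadings -o vg_name,lv_name,lv_kernel_major,lv_kernel_minor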

John