StoreVirtual Storage

How to monitor Thin Provisioned LUNs

 
kghammond
Frequent Advisor


We are starting to deploy our vSphere 4 environment using some thin provisioned VMs and thin provisioned LUNs.

We have alerts set up in vSphere to notify us when VMFS volumes are running out of space.

I have been looking for an alert that will trigger when a thin provisioned LUN starts running out of space, but I have not found that yet.

Does anyone know of a way, using CMC or third-party tools, to trigger email warnings and alerts when a thin provisioned LUN exceeds 85% of its provisioned capacity?

Thank You,
Kevin
teledata
Respected Contributor

Re: How to monitor Thin Provisioned LUNs


If you have a Nagios server, I've built a script that returns volume and cluster utilization metrics. These can then be graphed and alerted on based on percentage utilization.

I can post the scripts here; I just have to go find them.
http://www.tdonline.com
teledata
Respected Contributor

Re: How to monitor Thin Provisioned LUNs

Please be kind. This script is still a work in progress and I appreciate any community input. I suspect it could certainly be simplified.
Apologies for the lack of comprehensive commenting in the shell script. It assumes you have your Nagios server running and SNMP (with the LeftHand MIBs) configured properly.

Nagios Check Command: $USER1$/check_nsmvolume2 $HOSTADDRESS$ $USER7$ $ARG1$ "$ARG2$" "Space Used" $ARG3$ $ARG4$

ARG1 = cluster number within the management group (numbering starts at 1)
ARG2 = Name of the volume (this is case sensitive)
ARG3 = warning threshold (% of allocated space that has been provisioned)
ARG4 = critical threshold (% of allocated space that has been provisioned)
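
To illustrate how ARG3 and ARG4 are meant to behave, here is a minimal sketch of the threshold comparison in integer shell arithmetic. The function name classify_usage and the sample numbers are hypothetical and not part of the plugin; the plugin itself compares sizes rather than percentages, which should give roughly the same result.

```shell
#!/bin/sh
# classify_usage USED ALLOCATED WARN_PCT CRIT_PCT   (hypothetical helper)
# Prints the Nagios state name and returns the matching exit code.
classify_usage() {
    used=$1 allocated=$2 warn=$3 crit=$4
    # Guard against division by zero (for example an unsized volume).
    if [ "$allocated" -le 0 ]; then
        echo UNKNOWN; return 3
    fi
    # Integer percentage, rounded down.
    pct=$(( used * 100 / allocated ))
    if [ "$pct" -ge "$crit" ]; then
        echo CRITICAL; return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo WARNING; return 1
    fi
    echo OK; return 0
}

# Made-up numbers: 850 units used out of 1000, thresholds 85 and 95.
state=$(classify_usage 850 1000 85 95) || rc=$?
echo "$state"   # prints WARNING
```

Used and allocated only need to be in the same unit, since only their ratio matters.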

Command Line Example: check_nsm_volume2!2!SQL_Backup!95!99

/usr/local/groundwork/nagios/libexec/check_nsmvolume2 10.0.1.250 public 2 "SQL_Backup" "Space Used" 95 99

OK - 509 GB (67% used)| SQL_Backup=509 snapshots=28 total=509
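
The part of that output before the "|" is the status text and the part after it is the performance data that gets graphed. As a rough sketch, the two can be separated in plain shell like this; the helper name perf_value is made up for illustration and is not part of the plugin.

```shell
#!/bin/sh
# Sample plugin output, copied from the example above.
line='OK - 509 GB (67% used)| SQL_Backup=509 snapshots=28 total=509'

# Everything before the first "|" is the human-readable status text.
status_text=${line%%|*}
# Everything after the first "|" is the performance data.
perf_data=${line#*|}
perf_data=${perf_data# }    # drop the single leading space

# perf_value LABEL   (hypothetical helper)
# Prints the value stored under LABEL in $perf_data, or returns 1.
perf_value() {
    for pair in $perf_data; do
        case $pair in
            "$1"=*) printf '%s\n' "${pair#*=}"; return 0 ;;
        esac
    done
    return 1
}

echo "status: $status_text"
echo "total:  $(perf_value total)"
```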


Nagios Performance Graph settings:

Graph Label: Lefthand Volume Utilization
Service: nsm_volumesize_
Use Service as a Regular Expression: ON
Host: *
Status Text Parsing Regular Expression: (left blank)
Use Status Text Parsing instead of Performance Data: OFF
RRD Name: /usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd
RRD Create Command: $RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr DS:MB:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480
RRD Update Command: $RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$ 2>&1
Custom RRDtool Graph Command: '/nagios/cgi-bin/label_graph.cgi'
Enable: ON



The shell script used on the Nagios server (check_nsmvolume2).
On my Nagios (GroundWork) install it lives in /usr/local/groundwork/nagios/libexec/check_nsmvolume2


#!/bin/sh
#
# Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

volumecount=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeCount.0|cut -d" " -f4)
volumecount=$(echo "$volumecount+1" |bc)

findvol=1

while [ "$findvol" -lt "$volumecount" ]

do

volname=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeName.$findvol|cut -d" " -f4|cut -d'"' -f2)
#echo "Checking $volname"
if [ "$volname" = "$4" ] ; then
volinstance=$(echo "$findvol" |bc)
break
# else
# echo "I guess $volname is not $4"
fi
findvol=$(($findvol+1))
done

if [ "$volinstance" = "" ] ; then
echo "problem - volume '$4' is not a volume known to the management group."
exit $STATE_UNKNOWN
fi

volspace=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeProvisionedSpace.$volinstance|cut -d" " -f4)

allocated=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSize.$volinstance|cut -d" " -f4)

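# $? would only report the exit status of cut, the last command in the
# pipeline, so check that both queries returned a number instead.
for value in "$volspace" "$allocated"
do
case "$value" in
''|*[!0-9.]*)
echo "query problem - No data received from host"
exit $STATE_UNKNOWN
;;
esac
done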

snapcount=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSnapshotCount.$volinstance|cut -d" " -f4)

repllevel=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeReplicaCount.$volinstance|cut -d" " -f4)
repllevel=$(echo "$repllevel / 1" | bc)
#echo "Got a rep level of $repllevel"

if [ $snapcount -ge 1 ] ; then
#echo "WE GOT $snapcount SNAPSHOTS"
varsnap=0
totalspace=$volspace
while [ "$varsnap" -lt $snapcount ]
do
varsnap=$(($varsnap+1))

snapspace=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSnapshotProvisionedSpace.$volinstance.$varsnap|cut -d" " -f4)

totalspace=$(expr $totalspace + $snapspace)
volonly=$(echo "$snapspace + $volspace" | bc)
snaponly=$(echo "($totalspace - $snapspace) - $volspace" |bc)

done

else
volonly=$(echo "$volspace /1" |bc)
snaponly=$(echo "0" |bc)
totalspace=$(echo "$volspace /1" |bc)
fi

volonly=$(echo "$volonly / $repllevel" | bc)
snaponly=$(echo "$repllevel * $snaponly" |bc)
totalspace=$(echo "$repllevel * $totalspace" |bc)
totalspace=$(echo "$totalspace / $repllevel" |bc)

mbspace=$(echo "$totalspace / 1024" |bc)
gigspace=$(echo "$mbspace / 1024" |bc)
gigspace=$(echo "$gigspace / $repllevel" |bc)

volonly=$(echo "$volonly / 1024" |bc)
volonly=$(echo "$volonly / 1024" |bc)

if [ $snaponly -gt 0 ] ; then
snaponly=$(echo "$snaponly / 1024" |bc)
snaponly=$(echo "$snaponly / 1024" |bc)
snaponly=$(echo "$snaponly/ $repllevel" |bc)
fi

if [ "$allocated" = 0 ] ; then
perct="N/A"
echo "OK - (remote snapshot) $gigspace GB | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_OK
else
warning=$(echo "scale=2; $6 * .01" | bc)
critical=$(echo "scale=2; $7 * .01" |bc)

warning=$(echo " ($warning * $allocated)/1024" |bc)
critical=$(echo " ($critical * $allocated)/1024" |bc)
allocated=$(echo "$allocated / 1024" |bc)
# allocated=$(echo "$allocated * $repllevel" |bc)
checksize=$(echo "$volonly * 1024" |bc)


perct=$(echo "scale=2; ($volonly*1024) / $allocated" |bc)
perct=$(echo "($perct * 100)/1" |bc)

if [ $checksize -ge $critical ] ; then
echo "CRITICAL - *$gigspace GB ($perct% used)* | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_CRITICAL

elif [ $checksize -ge $warning ] ; then
echo "WARNING - *$gigspace GB ($perct% used)* | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_WARNING

elif [ $checksize -lt $warning ] ; then
echo "OK - $gigspace GB ($perct% used)| $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_OK

else
echo "problem - No data received from host"
exit $STATE_UNKNOWN
fi

fi
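
One possible simplification, offered as a sketch rather than a tested replacement: the repeated check_snmp pipelines could be wrapped in a single helper that also catches a failed query, since testing $? after a pipeline only sees the exit status of cut. The names snmp_get, CHECK_SNMP and MIB are hypothetical; the path, options, and the fourth-field parsing are taken from the script above.

```shell
#!/bin/sh
# Path and MIB name as used in the script above; CHECK_SNMP can be
# overridden from the environment.
CHECK_SNMP=${CHECK_SNMP:-/usr/local/groundwork/nagios/libexec/check_snmp}
MIB=LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB

# snmp_get HOST COMMUNITY OBJECT   (hypothetical helper)
# Prints the fourth space-separated field of the check_snmp output,
# or returns 3 (UNKNOWN) if check_snmp itself reports a failure.
snmp_get() {
    # Run check_snmp on its own first so its exit status can be tested.
    out=$("$CHECK_SNMP" -P 2c -H "$1" -C "$2" -o "${MIB}::$3") || return 3
    printf '%s\n' "$out" | cut -d' ' -f4
}

# Example use inside the plugin (needs a reachable node, so it is left
# commented out here):
# allocated=$(snmp_get "$1" "$2" "clusVolumeSize.$volinstance") || exit 3
```

With a helper like this, each query in the script becomes one line and an SNMP failure is caught where it happens.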