
How to monitor Thin Provisioned LUNs

kghammond
Frequent Advisor

We are starting to deploy our vSphere 4 environment using some thin-provisioned VMs and thin-provisioned LUNs.

We have alerts set up in vSphere to warn us when VMFS volumes are running out of space.

I have been looking for an alert that triggers when a thin-provisioned LUN starts running out of space, but I have not found one yet.

Does anyone know of a way, using CMC or third-party tools, to trigger email warnings and alerts when a thin-provisioned LUN exceeds 85% of its provisioned capacity?

Thank You,
Kevin
2 REPLIES
teledata
Respected Contributor

Re: How to monitor Thin Provisioned LUNs


If you have a Nagios server, I've built a script that returns volume and cluster utilization performance metrics.

These can then be graphed and alerted on based on percentage utilization.

I can post the scripts here; I just have to go find them.
http://www.tdonline.com
teledata
Respected Contributor

Re: How to monitor Thin Provisioned LUNs

Please be kind. This script is still a work in progress and I appreciate any community input. I suspect it could certainly be simplified.
Apologies for the lack of comprehensive commenting in the shell script. This assumes you have your Nagios server running and SNMP (with LeftHand MIBs) configured properly.

Nagios Check Command: $USER1$/check_nsmvolume2 $HOSTADDRESS$ $USER7$ $ARG1$ "$ARG2$" "Space Used" $ARG3$ $ARG4$

ARG1 = # (which cluster in the mgmt group, starts with 1)
ARG2 = Name of the volume (this is case sensitive)
ARG3 = warning threshold (% of allocated space that has been provisioned)
ARG4 = critical threshold (% of allocated space that has been provisioned)
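
As a rough illustration of how ARG3 and ARG4 are meant to work, the sketch below maps a usage figure and the two percentage thresholds to the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function name threshold_state and its argument order are made up for this example; the script itself does similar arithmetic with bc and may round slightly differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_nsmvolume2: map usage to a Nagios state.
# Arguments: used_mb allocated_mb warn_pct crit_pct (all whole numbers).
threshold_state() {
    used=$1 alloc=$2 warn=$3 crit=$4
    # Compare used*100 against alloc*pct so everything stays in integer arithmetic.
    if [ $((used * 100)) -ge $((alloc * crit)) ]; then
        echo 2   # CRITICAL
    elif [ $((used * 100)) -ge $((alloc * warn)) ]; then
        echo 1   # WARNING
    else
        echo 0   # OK
    fi
}

# 509 GB used of an assumed 760 GB volume, thresholds 95 and 99
threshold_state 521216 778240 95 99
```

With those numbers the volume is about 67% used, so the call prints 0.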

Command Line Example: check_nsm_volume2!2!SQL_Backup!95!99

/usr/local/groundwork/nagios/libexec/check_nsmvolume2 10.0.1.250 public 2 "SQL_Backup" "Space Used" 95 99

OK - 509 GB (67% used)| SQL_Backup=509 snapshots=28 total=509
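
Nagios splits plugin output at the first "|": the text before it is the status message and the text after it is performance data in label=value pairs. Here is a minimal sketch for pulling one value back out of a line like the one above when testing the plugin by hand. The function name perf_value is hypothetical, and it assumes labels contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one perfdata label from a plugin output line.
# Usage: perf_value "<plugin output>" <label>
perf_value() {
    perfdata=${1#*"|"}          # keep everything after the first "|"
    for pair in $perfdata; do   # pairs are separated by spaces
        case $pair in
            "$2"=*) echo "${pair#*=}"; return 0 ;;
        esac
    done
    return 1                    # label not found
}

line='OK - 509 GB (67% used)| SQL_Backup=509 snapshots=28 total=509'
perf_value "$line" snapshots
```

Run against the sample line, this prints 28.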


Nagios Performance Graph settings:

Graph Label: Lefthand Volume Utilization
Service: nsm_volumesize_
Use Service as a Regular Expression ON
Host: *
Status Text Parsing Regular Expression:
Use Status Text Parsing instead of Performance Data OFF
RRD Name /usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd
RRD Create Command $RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr DS:MB:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480
RRD Update Command $RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$ 2>&1
Custom RRDtool Graph Command '/nagios/cgi-bin/label_graph.cgi'
Enable ON



The shell script used on the Nagios server (check_nsmvolume2).
On my Nagios (GroundWork) install it lives in /usr/local/groundwork/nagios/libexec/check_nsmvolume2


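#!/bin/sh
#
# Arguments (from the check command above):
# $1 host address, $2 SNMP community, $3 cluster number, $4 volume name,
# $5 label ("Space Used"), $6 warning %, $7 critical %

# Standard Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3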

# Number of volumes in the cluster; add 1 so the loop below can test with -lt.
volumecount=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeCount.0|cut -d" " -f4)
volumecount=$(echo "$volumecount+1" |bc)

# Walk the volume table until the name matches $4, then remember its index.
findvol=1

while [ "$findvol" -lt "$volumecount" ]
do
    volname=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeName.$findvol|cut -d" " -f4|cut -d'"' -f2)
    #echo "Checking $volname"
    if [ "$volname" = "$4" ] ; then
        volinstance=$findvol
        break
    # else
    # echo "I guess $volname is not $4"
    fi
    findvol=$(($findvol+1))
done

if [ "$volinstance" = "" ] ; then
echo "problem - volume '$4' is not a volume known to the management group."
exit $STATE_UNKNOWN
fi

volspace=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeProvisionedSpace.$volinstance|cut -d" " -f4)

allocated=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSize.$volinstance|cut -d" " -f4)

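# $? would only report the exit status of the last cut in the pipeline,
# so check that both queries returned whole numbers instead.
case "$volspace" in ''|*[!0-9]*) RET=1 ;; *) RET=0 ;; esac
case "$allocated" in ''|*[!0-9]*) RET=1 ;; esac
if [ "$RET" -ne 0 ]
then
echo "query problem - No data received from host"
exit $STATE_UNKNOWN
fi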

snapcount=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSnapshotCount.$volinstance|cut -d" " -f4)

repllevel=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeReplicaCount.$volinstance|cut -d" " -f4)
repllevel=$(echo "$repllevel / 1" | bc)
#echo "Got a rep level of $repllevel"

if [ "$snapcount" -ge 1 ] ; then
    #echo "WE GOT $snapcount SNAPSHOTS"
    varsnap=0
    totalspace=$volspace
    # Add up the provisioned space of every snapshot of this volume.
    while [ "$varsnap" -lt "$snapcount" ]
    do
        varsnap=$(($varsnap+1))

        snapspace=$(/usr/local/groundwork/nagios/libexec/check_snmp -P 2c -H $1 -C $2 -o LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB::clusVolumeSnapshotProvisionedSpace.$volinstance.$varsnap|cut -d" " -f4)

        totalspace=$(expr $totalspace + $snapspace)
        volonly=$(echo "$snapspace + $volspace" | bc)
        # The original had "-volspace" without the "$"; bc treats a bare name as 0.
        snaponly=$(echo "($totalspace - $snapspace) - $volspace" |bc)

    done

else
    volonly=$(echo "$volspace /1" |bc)
    snaponly=$(echo "0" |bc)
    totalspace=$(echo "$volspace /1" |bc)
fi

volonly=$(echo "$volonly / $repllevel" | bc)
snaponly=$(echo "$repllevel * $snaponly" |bc)
totalspace=$(echo "$repllevel * $totalspace" |bc)
totalspace=$(echo "$totalspace / $repllevel" |bc)

mbspace=$(echo "$totalspace / 1024" |bc)
gigspace=$(echo "$mbspace / 1024" |bc)
gigspace=$(echo "$gigspace / $repllevel" |bc)

volonly=$(echo "$volonly / 1024" |bc)
volonly=$(echo "$volonly / 1024" |bc)

if [ $snaponly -gt 0 ] ; then
snaponly=$(echo "$snaponly / 1024" |bc)
snaponly=$(echo "$snaponly / 1024" |bc)
snaponly=$(echo "$snaponly/ $repllevel" |bc)
fi

if [ "$allocated" = 0 ] ; then
perct="N/A"
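# totalspace is still in KB at this point, so report the GB figure and exit explicitly.
echo "OK - (remote snapshot) $gigspace GB | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_OK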
else
warning=$(echo "scale=2; $6 * .01" | bc)
critical=$(echo "scale=2; $7 * .01" |bc)

warning=$(echo " ($warning * $allocated)/1024" |bc)
critical=$(echo " ($critical * $allocated)/1024" |bc)
allocated=$(echo "$allocated / 1024" |bc)
# allocated=$(echo "$allocated * $repllevel" |bc)
checksize=$(echo "$volonly * 1024" |bc)


perct=$(echo "scale=2; ($volonly*1024) / $allocated" |bc)
perct=$(echo "($perct * 100)/1" |bc)

if [ $checksize -ge $critical ] ; then
echo "CRITICAL - *$gigspace GB ($perct% used)* | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_CRITICAL

elif [ $checksize -ge $warning ] ; then
echo "WARNING - *$gigspace GB ($perct% used)* | $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_WARNING

elif [ $checksize -lt $warning ] ; then
echo "OK - $gigspace GB ($perct% used)| $volname=$volonly snapshots=$snaponly total=$gigspace"
exit $STATE_OK

else
echo "problem - No data received from host"
exit $STATE_UNKNOWN
fi

fi
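
One possible simplification, offered as a sketch rather than a tested replacement: the repeated check_snmp pipeline could be wrapped in a small function. The names CHECK_SNMP, MIB and snmp_value are invented here. The field position is copied from the script above and assumes check_snmp prints its value as the fourth space-separated field.

```shell
#!/bin/sh
# Path to check_snmp as used in the script above; override it for other installs.
: "${CHECK_SNMP:=/usr/local/groundwork/nagios/libexec/check_snmp}"
MIB=LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB

# Hypothetical wrapper: snmp_value <host> <community> <object.instance>
# Prints the 4th space-separated field of check_snmp's output with any
# double quotes stripped (so it also works for volume names without spaces).
snmp_value() {
    "$CHECK_SNMP" -P 2c -H "$1" -C "$2" -o "$MIB::$3" | cut -d" " -f4 | tr -d '"'
}

# Example call (commented out because it needs a reachable array):
# volspace=$(snmp_value 10.0.1.250 public clusVolumeProvisionedSpace.1)
```

Each query in the script would then shrink to one short line, which also makes it easier to change the SNMP version or plugin path in a single place.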
http://www.tdonline.com