
Bash Scripting

 
Frank_A
New Member

Bash Scripting

Hello Folks,

I am a newbie in Linux and working as a junior admin.
I need some help writing a script for the SMART monitoring tools.
I have to write a script that runs the "smartctl" command every morning on around 500 systems to scan the hard drives for errors and mails the results to the admins. The script should mail only errors and show which server each one is associated with.
Currently we have smartd running on all systems and receive an e-mail from each individual system if there is any error. Management wants to receive only one e-mail every morning that contains all disk errors.

I have written a script which scans drives in all systems and save the result in a file.
But I am not sure how to extract the date,
eg:
System Name = ?, Disk=?, Error.

or

if there is any other way to grep for errors generated by the SMART monitoring tools.

Thank you so much in advance.

Frank
Steven Schweda
Honored Contributor

Re: Bash Scripting

> [...] I am not sure how to extract the
> date,

Do you mean "the date" or "the data"?

> I have written a script [...]

With my weak psychic powers, I can't see it
from here.

> [...] and save the result in a file.

I can't see its result file, either, so I
also have no idea how to extract anything
from it. Perhaps if you could post a sample
of the data in the file, you might get some
useful suggestions.
Frank_A
New Member

Re: Bash Scripting

Thanks Steven for looking into it.

I would like to grep for the following lines, along with the name of the system and the drive that is causing the problem.

- The overall health result, e.g.:
SMART overall-health self-assessment test result: PASSED

- Any line showing "FAILING_NOW", e.g.:
190 Airflow_Temperature_Cel 0x0022 041 036 045 Old_age Always FAILING_NOW 59 (255 255 61 58)

- The Offline_Uncorrectable line, if its value is > 0, e.g.:
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
----------------------------------------
Script

#!/bin/bash
echo
StatusLog=/home/aeronaut/smart/StatusLog
ErrorLog=/home/aeronaut/smart/ErrorLog
DashLine="---------------------------"

# Start each run with empty logs (rm -f does not complain if a file is missing).
rm -f "$StatusLog" "$ErrorLog"

SYSTEMS_LIST=$1

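# Loop over every host named in the list file passed as the first argument.
for SYSTEM in $(cat "$SYSTEMS_LIST")
do
    SystemName=$SYSTEM
    echo "$DashLine $SystemName $DashLine" >> "$StatusLog"
    Drives=/home/aeronaut/smart/Drives
    # Column 4 of /proc/partitions is the device name (sda, sda1, ...).
    /usr/bin/ssh -x "root@$SYSTEM" "awk '{ print \$4 }' /proc/partitions" > "$Drives"
    for drive in $(cat "$Drives")
    do
        # Three-character names such as sda are treated as whole disks; longer names are skipped as partitions.
        if [ "${#drive}" -eq 3 ]
        then
            echo "Disk /dev/$drive in $SystemName" >> "$StatusLog"
            /usr/bin/ssh -x "root@$SYSTEM" "smartctl -H -A /dev/$drive" >> "$StatusLog"
        fi
    done

    rm -f "$Drives"
done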

#egrep 'FAILING_NOW|Offline_Uncorrectable' "$StatusLog" >> "$ErrorLog"
#echo "$SystemName" >> "$ErrorLog"
#grep PASSED "$StatusLog" >> "$ErrorLog"
#echo "$DashLine" >> "$ErrorLog"

-------------------------------------------
Output of the command, saved in a file:

[root@Server smart]# cat StatusLog
---------------------------Hostname---------------------------
Disk /dev/sda in Hostname
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 105 100 006 Pre-fail Always - 93059077
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 30
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 78288284
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 23298
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 30
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 043 039 045 Old_age Always FAILING_NOW 57 (255 255 58 56)
194 Temperature_Celsius 0x0022 057 061 000 Old_age Always - 57 (0 22 0 0)
195 Hardware_ECC_Recovered 0x001a 058 056 000 Old_age Always - 83729161
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

Disk /dev/sdb in Hostname
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 093 078 006 Pre-fail Always - 202368475
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 34
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 5
7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 190767494
9 Power_On_Hours 0x0032 054 054 000 Old_age Always - 40480
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 65
187 Reported_Uncorrect 0x0032 028 028 000 Old_age Always - 72
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 041 036 045 Old_age Always FAILING_NOW 59 (255 255 61 58)
194 Temperature_Celsius 0x0022 059 064 000 Old_age Always - 59 (0 28 0 0)
195 Hardware_ECC_Recovered 0x001a 048 045 000 Old_age Always - 1951677
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
Kurt Boyack
Occasional Advisor

Re: Bash Scripting

Something like this might do it:

 

# grep -A20 FAILING_NOW <file> | egrep 'FAILING_NOW|Disk'

190 Airflow_Temperature_Cel 0x0022 043 039 045 Old_age Always FAILING_NOW 57 (255 255 58 56)

Disk /dev/sdb in Hostname1

190 Airflow_Temperature_Cel 0x0022 041 036 045 Old_age Always FAILING_NOW 59 (255 255 61 58)

Disk /dev/sdf in Hostname2

 

Depending on the data, it could pick up extra lines.
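
Another option is to track the most recent "Disk ... in ..." header while scanning the file, so that every reported line carries the system and drive it belongs to. This matters because grep -A prints lines that follow a match, so a "Disk" line picked up that way is the header of the next drive in the file, not the one that reported the error. Below is a minimal sketch in awk, assuming the StatusLog layout shown earlier in the thread (header lines written as "Disk /dev/sdX in Hostname", attribute rows with WHEN_FAILED in column 9 and RAW_VALUE in column 10). The function name smart_errors and the "System=, Disk=, Error=" output format are made up for illustration.

```shell
#!/bin/bash
# smart_errors FILE
# Hypothetical helper: prints one line per problem found in a StatusLog-style file.
smart_errors() {
    awk '
        # Header written by the collection script: "Disk /dev/sda in Hostname".
        # Remember it so later lines can be attributed to this drive.
        $1 == "Disk" && $3 == "in" { disk = $2; host = $4; next }

        # Overall health line: report anything other than PASSED.
        /overall-health self-assessment test result:/ {
            if ($NF != "PASSED")
                printf "System=%s, Disk=%s, Error=health %s\n", host, disk, $NF
            next
        }

        # Attribute rows start with a numeric ID.
        # Column 9 is WHEN_FAILED, column 10 is RAW_VALUE.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($9 == "FAILING_NOW")
                printf "System=%s, Disk=%s, Error=%s FAILING_NOW (raw %s)\n", host, disk, $2, $10
            else if ($2 == "Offline_Uncorrectable" && $10 + 0 > 0)
                printf "System=%s, Disk=%s, Error=%s raw value %s\n", host, disk, $2, $10
        }
    ' "$1"
}
```

Called as smart_errors "$StatusLog" > "$ErrorLog", this would leave ErrorLog empty when nothing is wrong, so the script could test it with [ -s "$ErrorLog" ] and send the single morning e-mail only when there is something to report. Other attributes (for example Current_Pending_Sector) could be added to the last rule in the same way.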