06-13-2011 05:19 PM
Bash Scripting
I am a newbie in Linux, working as a junior admin.
I need some help writing a script for the SMART monitoring tool.
I have to write a script that runs the "smartctl" command every morning on around 500 systems to scan the hard drives for errors and mails the results to the admins. The script should mail only errors and show the server each error is associated with.
Currently we have smartd running on all systems and receive an e-mail from each individual system if there is any error. Management wants to receive only one e-mail every morning that contains all disk errors.
I have written a script which scans drives in all systems and save the result in a file.
But I am not sure how to extract the date,
eg:
System Name = ?, Disk=?, Error.
or
if there is any other way to grep for the errors generated by the SMART monitoring tools.
Thank you so much in advance.
Frank
- Tags:
- bash
06-13-2011 08:35 PM
Re: Bash Scripting
> date,
Do you mean "the date" or "the data"?
> I have written a script [...]
With my weak psychic powers, I can't see it
from here.
> [...] and save the result in a file.
I can't see its result file, either, so I
also have no idea how to extract anything
from it. Perhaps if you could post a sample
of the data in the file, you might get some
useful suggestions.
06-14-2011 06:55 AM
Re: Bash Scripting
I would like to grep for the following lines, along with the name of the system and the drive that is causing the problem:
- The overall health result line:
SMART overall-health self-assessment test result: PASSED
- Any line showing "FAILING_NOW", e.g.:
190 Airflow_Temperature_Cel 0x0022 041 036 045 Old_age Always FAILING_NOW 59 (255 255 61 58)
- Any Offline_Uncorrectable line whose value is > 0, e.g.:
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
----------------------------------------
Script
#!/bin/bash
# Collect "smartctl -H -A" output from every system listed in the file given as $1.
echo
StatusLog=/home/aeronaut/smart/StatusLog
ErrorLog=/home/aeronaut/smart/ErrorLog
Drives=/home/aeronaut/smart/Drives
DashLine="---------------------------"
# Start each run with empty logs.
rm -f "$StatusLog" "$ErrorLog"
SYSTEMS_LIST=$1
for SYSTEM in `cat "$SYSTEMS_LIST"`
do
    SystemName=$SYSTEM
    echo "$DashLine $SystemName $DashLine" >> "$StatusLog"
    # Column 4 of /proc/partitions holds the device names.
    /usr/bin/ssh -x root@"$SYSTEM" "awk '{ print \$4 }' /proc/partitions" > "$Drives"
    for drive in `cat "$Drives"`
    do
        # Three-character names (sda, sdb, ...) are whole disks, not partitions.
        if expr length "$drive" = 3 &> /dev/null
        then
            echo "Disk /dev/$drive in $SystemName" >> "$StatusLog"
            /usr/bin/ssh -x root@"$SYSTEM" "smartctl -H -A /dev/$drive" >> "$StatusLog"
        fi
    done
    rm "$Drives"
done
#cat $StatusLog | egrep 'FAILING_NOW|Uncorrectable_Sector' >> $ErrorLog
#echo $SystemName >> $ErrorLog
#cat $StatusLog | grep PASSED >> $ErrorLog
#echo $DashLine >> $ErrorLog
-------------------------------------------
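The script above picks whole disks by checking that the name is exactly three characters long, which would also let through names such as sr0 or md0 and would skip a disk named sdaa. A minimal sketch of a pattern-based filter follows; it assumes the disks are named sdX or hdX, and whole_disks is a made-up helper name, not an existing tool.

```shell
# Sketch: read /proc/partitions-style text on stdin and print whole disks only.
# whole_disks is a hypothetical name; the sd/hd naming is an assumption.
whole_disks() {
    # Columns are: major minor #blocks name.
    # Names ending in a digit (sda1, sdb2, ...) are partitions and are skipped.
    awk '$4 ~ /^(sd|hd)[a-z]+$/ { print "/dev/" $4 }'
}
```

It could be fed from the remote side with something like `/usr/bin/ssh -x root@$SYSTEM cat /proc/partitions | whole_disks`.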
Output of the command, saved in a file:
[root@Server smart]# cat StatusLog
---------------------------Hostname---------------------------
Disk /dev/sda in Hostname
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   100   006    Pre-fail  Always   -           93059077
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           30
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always   -           78288284
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always   -           23298
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           30
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   043   039   045    Old_age   Always   FAILING_NOW 57 (255 255 58 56)
194 Temperature_Celsius     0x0022   057   061   000    Old_age   Always   -           57 (0 22 0 0)
195 Hardware_ECC_Recovered  0x001a   058   056   000    Old_age   Always   -           83729161
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
Disk /dev/sdb in Hostname
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   093   078   006    Pre-fail  Always   -           202368475
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           34
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           5
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always   -           190767494
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always   -           40480
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           65
187 Reported_Uncorrect      0x0032   028   028   000    Old_age   Always   -           72
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   041   036   045    Old_age   Always   FAILING_NOW 59 (255 255 61 58)
194 Temperature_Celsius     0x0022   059   064   000    Old_age   Always   -           59 (0 28 0 0)
195 Hardware_ECC_Recovered  0x001a   048   045   000    Old_age   Always   -           1951677
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
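Given a StatusLog in this layout, one way to get "System Name, Disk, Error" lines is to let awk remember the most recent "Disk /dev/X in HOST" header and print it next to each problem row. This is only a sketch under that assumption: smart_errors is a made-up function name, the output wording is arbitrary, and the column positions (9 for WHEN_FAILED, 10 for RAW_VALUE) are taken from the attribute table above.

```shell
# Sketch: summarize SMART problems from a StatusLog-style file.
# smart_errors is a hypothetical helper, not part of smartmontools.
smart_errors() {
    awk '
        # Header written by the collection script: "Disk /dev/sda in Hostname"
        $1 == "Disk" && $3 == "in" { disk = $2; host = $4; next }

        # Overall health line: report anything other than PASSED
        /overall-health self-assessment test result:/ {
            if ($NF != "PASSED")
                printf "System=%s, Disk=%s, Error=health %s\n", host, disk, $NF
            next
        }

        # Attribute rows: column 2 is the name, column 9 is WHEN_FAILED
        $9 == "FAILING_NOW" {
            printf "System=%s, Disk=%s, Error=%s FAILING_NOW\n", host, disk, $2
        }

        # Column 10 is RAW_VALUE; report a non-zero uncorrectable count
        $2 == "Offline_Uncorrectable" && ($10 + 0) > 0 {
            printf "System=%s, Disk=%s, Error=Offline_Uncorrectable=%s\n", host, disk, $10
        }
    ' "$1"
}
```

Something like `smart_errors /home/aeronaut/smart/StatusLog > /home/aeronaut/smart/ErrorLog` would then leave ErrorLog empty when every disk is clean.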
07-07-2011 01:13 PM
Re: Bash Scripting
Something like this might do it:
# grep -A20 FAILING_NOW <file> | egrep 'FAILING_NOW|Disk'
190 Airflow_Temperature_Cel 0x0022 043 039 045 Old_age Always FAILING_NOW 57 (255 255 58 56)
Disk /dev/sdb in Hostname1
190 Airflow_Temperature_Cel 0x0022 041 036 045 Old_age Always FAILING_NOW 59 (255 255 61 58)
Disk /dev/sdf in Hostname2
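Depending on the data, it could pick up extra lines. Also note that -A20 prints the lines after each match, so the "Disk" line shown under each FAILING_NOW line is the header of the next disk in the file, not of the failing one. Because the script writes the "Disk" header before a disk's attributes, -B20 (lines before the match) pairs each failure with its own disk.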
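For the "only one e-mail every morning, and only errors" part of the original question, the last step could be as small as the sketch below. It assumes the extracted error lines are already in a file and that a mailx-style `mail` command is installed; mail_if_errors and the MAILER override are made-up names for illustration.

```shell
# Sketch: send one summary mail, and only when the error file has content.
# mail_if_errors and MAILER are hypothetical names; "mail" is assumed to be
# a mailx-compatible command that takes -s SUBJECT and reads the body on stdin.
mail_if_errors() {
    # $1 = file holding the error lines, $2 = recipient address
    # -s is true only if the file exists and is not empty,
    # so a morning with no errors sends nothing.
    if [ -s "$1" ]; then
        "${MAILER:-mail}" -s "SMART disk errors $(date +%F)" "$2" < "$1"
    fi
}
```

Called at the end of the collection script, e.g. `mail_if_errors /home/aeronaut/smart/ErrorLog admins@example.com` (the address is a placeholder), and scheduled once a day from cron, this gives a single message per morning.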