Monitoring Hard Disks with SMART on CentOS

In this article I explain how to use the smartctl utility and the smartd daemon from smartmontools to monitor the health of a system's disks. My HP server came with an HP (Seagate) DG072BB975 hard drive, which unfortunately is not yet supported by hddtemp.
The server uses HP SAS (Serial Attached SCSI) drives behind an HP Smart Array controller, so the disks are accessed through the device node /dev/cciss/c0d0.

First install smartmontools.
$ sudo yum install smartmontools

Next, look at the partition table:
$ cat /proc/partitions
major minor #blocks name

104 0 71652960 cciss/c0d0
104 1 104391 cciss/c0d0p1
104 2 71545477 cciss/c0d0p2
253 0 69500928 dm-0
253 1 2031616 dm-1
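The cciss entries show that the disks sit behind a cciss (Compaq/HP Smart Array) controller: cciss/c0d0 is the logical drive and c0d0p1 and c0d0p2 are its partitions, while dm-0 and dm-1 are device-mapper volumes (typically LVM on CentOS).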
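On a machine with several controllers it can help to pick out just the whole logical drives from this listing. The sketch below is a small filter over /proc/partitions-style text read from stdin; the function name cciss_drives is a hypothetical helper, not part of smartmontools.

```shell
#!/bin/sh
# Sketch: print the whole cciss logical drives found in /proc/partitions-style
# input on stdin. Partitions (the "pN" suffix) and device-mapper volumes
# (dm-N) are skipped. cciss_drives is a hypothetical helper name.
cciss_drives() {
    # Column 4 is the device name, e.g. "cciss/c0d0" or "cciss/c0d0p1".
    awk '$4 ~ /^cciss\/c[0-9]+d[0-9]+$/ { print "/dev/" $4 }'
}
```

For the listing above, `cciss_drives < /proc/partitions` prints /dev/cciss/c0d0. Keep in mind that a Smart Array logical drive can span several physical disks, which is why smartctl still needs -d cciss,N to reach each disk individually.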

The server came with two disks, which by default are attached to controller ports 0 and 1. The first disk is addressed with -d cciss,0 and the second with -d cciss,1.

To get smartctl working, you need to specify the device type and the disk number, like this:
$ sudo /usr/sbin/smartctl -H -d cciss,0 /dev/cciss/c0d0

smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Health Status: OK

This output shows the result of the health status inquiry, a one-line executive summary of the disk's health; the disk shown here has passed. If your disk's health status is FAILING, back up your data immediately.
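Since the server has two disks, it is convenient to check both at once. The sketch below assumes the layout described above (controller /dev/cciss/c0d0, disks on ports 0 and 1) and introduces two hypothetical helper functions, smart_output_healthy and check_cciss_disks. Any output without a passing health line, including a smartctl error such as an unreadable device, is treated as a problem.

```shell
#!/bin/sh
# Sketch: check the SMART health of every disk behind an HP Smart Array
# (cciss) controller. Assumptions: controller at /dev/cciss/c0d0, disks on
# ports 0..N-1, run as root. SMARTCTL may be overridden (e.g. with a stub).

# Succeeds if `smartctl -H` output on stdin reports a healthy disk.
# SAS/SCSI disks print "SMART Health Status: OK"; ATA disks print
# "SMART overall-health self-assessment test result: PASSED".
smart_output_healthy() {
    grep -Eq 'SMART Health Status: OK|self-assessment test result: PASSED'
}

# Usage: check_cciss_disks [controller] [number_of_disks]
# Prints one line per disk; returns 1 if any disk is not reported healthy.
check_cciss_disks() {
    controller=${1:-/dev/cciss/c0d0}
    ndisks=${2:-2}
    smartctl_bin=${SMARTCTL:-/usr/sbin/smartctl}
    rc=0
    i=0
    while [ "$i" -lt "$ndisks" ]; do
        if "$smartctl_bin" -H -d "cciss,$i" "$controller" | smart_output_healthy; then
            echo "disk $i: OK"
        else
            echo "disk $i: NOT OK (failing or unreadable), inspect this disk"
            rc=1
        fi
        i=$((i + 1))
    done
    return "$rc"
}
```

Run as root, `check_cciss_disks /dev/cciss/c0d0 2` prints one status line per disk and returns non-zero if any disk needs attention, which makes it easy to call from cron.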
For a full report, use the --all option:
$ sudo /usr/sbin/smartctl --all -d cciss,0 /dev/cciss/c0d0

smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP DG072BB975 Version: HPDC
Serial number: 3NP3FWC200009917UFB3
Device type: disk
Transport protocol: SAS
Local Time is: Fri Aug 14 02:30:17 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 26 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 3368064691
Blocks received from initiator = 4000083366
Blocks read from cache and sent to initiator = 1897459483
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 5259.38
number of minutes until next internal SMART test = 46

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 1070 seconds [17.8 minutes]
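hddtemp cannot read this drive, but the --all report already contains the temperature. The sketch below pulls a few fields out of the report read on stdin: the current and trip temperatures, and the total of the "uncorrected errors" column of the Error counter log. The function names are hypothetical, and the parsing assumes the SAS report layout shown above; ATA disks report temperature and errors differently.

```shell
#!/bin/sh
# Sketch: extract fields from `smartctl --all` output of a SAS disk (stdin).
# Assumes the layout of the report above; function names are hypothetical.

# Prints "current trip headroom" in degrees C; fails if either line is missing.
smart_temperature() {
    awk '
        /^Current Drive Temperature:/ { cur = $4 }
        /^Drive Trip Temperature:/    { trip = $4 }
        END {
            if (cur == "" || trip == "") exit 1
            print cur, trip, trip - cur
        }'
}

# Prints the sum of the last column (total uncorrected errors) over the
# read/write/verify rows of the error counter log.
smart_uncorrected_errors() {
    awk '/^(read|write|verify):/ { total += $NF } END { print total + 0 }'
}
```

With these functions sourced into your shell, piping the report above through smart_temperature would print 26 68 42 (26 C now, trip at 68 C, 42 C of headroom), and smart_uncorrected_errors would print 0.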

Configuring smartd:
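Add the disks you want monitored to /etc/smartd.conf, one line per disk. -H tells smartd to check each disk's health status and -m gives the address to mail when it detects a problem. Because the file is owned by root, append to it with sudo tee rather than a plain shell redirect:
$ sudo tee -a /etc/smartd.conf <<'EOF'
/dev/cciss/c0d0 -d cciss,0 -H -m root@domain.tld
/dev/cciss/c0d0 -d cciss,1 -H -m root@domain.tld
EOF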

Replace root@domain.tld with your own address so that smartd emails you directly when a disk starts failing. Then start the smartd daemon:
$ sudo /etc/init.d/smartd start
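smartd can do more than poll the health status. The report above says "No self-tests have been logged" and that a long self-test takes about 18 minutes, so scheduling tests is a natural next step. The fragment below is a sketch assuming the same controller and address as above: -s schedules a short self-test every day at 02:00 and a long one every Saturday at 03:00, and -M test sends a single test email when smartd starts, so you can confirm mail delivery works. Check the smartd.conf man page of your smartmontools version before relying on these directives.

```text
# /etc/smartd.conf (sketch; schedule and address are assumptions)
/dev/cciss/c0d0 -d cciss,0 -H -s (S/../.././02|L/../../6/03) -m root@domain.tld -M test
/dev/cciss/c0d0 -d cciss,1 -H -s (S/../.././02|L/../../6/03) -m root@domain.tld -M test
```

Once the test email has arrived, remove -M test so it is not sent on every restart, and run sudo /sbin/chkconfig smartd on so that smartd starts at boot.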

If you work in a group that runs a large computing cluster with many nodes and many disk drives, SMART monitoring becomes especially valuable, because catching failing disks early can reduce downtime and keep the cluster running more reliably.
