Hotswap a SCSI, SAS, or SATA drive in Linux

by Phil Dufault on August 4, 2008 · 9 comments

in CentOS, Gentoo, Linux, Redhat, Ubuntu

There doesn’t seem to be much information on Google about this, so I’m posting it here.

At my work, the majority of our servers have hot-swappable drive bays. However, Linux doesn’t usually notice on its own that a drive is gone. Worse, sometimes it doesn’t even notice new drives that have been hooked up.

Now, SCSI and SAS both support hot-plugging at the protocol level, and SATA II does as well. If your chassis is equipped with hotswap drive bays, then that’s all you really need. I haven’t hit any outright problems with SATA I hotswaps, but they appear to be less reliable.

A common task I need to do is either:

  • upgrade a drive in a server
  • replace a dying drive with a new one

If the drive is in a RAID, or if it’s a differently sized drive for a disk upgrade, it’s a good idea to trigger the kernel driver to rescan information about the new disk.

In /sys/bus/scsi/devices, you’ll find a number of entries, named in host:channel:target:LUN form, that correspond to your disk drives:
server devices # ls -1
0:0:0:0@
1:0:0:0@
2:0:0:0@

You can find out more about a drive by cat’ing its model file:
server devices # cat "0:0:0:0/model"
ST3250410AS
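
To see every drive at a glance instead of cat’ing one model file at a time, the two steps above can be combined into a loop. This is only a minimal sketch: the SCSI_SYSFS variable and the list_scsi_models function name are made up for illustration and are not part of any existing tool.

```shell
#!/bin/sh
# SCSI_SYSFS is a hypothetical override so the function can be pointed at a
# test directory; by default it uses the sysfs path shown above.
SCSI_SYSFS="${SCSI_SYSFS:-/sys/bus/scsi/devices}"

# Print "<address><TAB><model>" for every entry that has a model file.
list_scsi_models() {
    for dev in "$SCSI_SYSFS"/*; do
        # Entries without a readable model file are skipped.
        [ -r "$dev/model" ] || continue
        printf '%s\t%s\n' "${dev##*/}" "$(cat "$dev/model")"
    done
}
```

Calling list_scsi_models after defining it should print one line per drive, with the address first and the model string after it.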

Now, to cause the kernel to rescan the drive attached to the port, do this:
echo > "0:0:0:0/rescan"

Check dmesg now:
server devices # dmesg
---snip---
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
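
The rescan step can be wrapped in a small function that checks the address first and reports a clear error if it doesn’t exist. Again a sketch under assumptions: SCSI_SYSFS and rescan_scsi_device are hypothetical names, and on a real system it needs to run as root.

```shell
#!/bin/sh
# Hypothetical override for testing; defaults to the real sysfs path.
SCSI_SYSFS="${SCSI_SYSFS:-/sys/bus/scsi/devices}"

# Usage: rescan_scsi_device 0:0:0:0
rescan_scsi_device() {
    node="$SCSI_SYSFS/$1/rescan"
    if [ ! -e "$node" ]; then
        echo "rescan: no such SCSI device: $1" >&2
        return 1
    fi
    # The command above writes an empty line; writing 1 is another
    # commonly used form.
    echo 1 > "$node"
}
```

After a successful call, dmesg should show fresh lines for the drive, like the output above.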

Now, that’s pretty exciting, as this can be used to skip a reboot.  If you switch the drive, and trigger the rescan, it’ll update the drive information including the partition layout.  Make sure you don’t swap out your main system drive, otherwise your computer will freeze, and there will be data loss and possibly corruption.
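
To guard against pulling the wrong drive, a quick check can confirm that nothing on the disk is mounted. The sketch below makes several assumptions: the function names, SCSI_SYSFS, and MOUNTS_FILE are invented for illustration, and it expects the block device to appear as a block/<name> subdirectory under the SCSI address, which is the layout on newer kernels (older ones may differ). It only looks at mount sources, so it will not notice a disk used through LVM, software RAID, or swap.

```shell
#!/bin/sh
# Hypothetical overrides for testing; defaults point at the real files.
SCSI_SYSFS="${SCSI_SYSFS:-/sys/bus/scsi/devices}"
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Print the block device name (e.g. sda) behind a SCSI address.
# Assumes the layout <address>/block/<name>.
scsi_block_name() {
    for entry in "$SCSI_SYSFS/$1"/block/*; do
        [ -e "$entry" ] || continue
        echo "${entry##*/}"
        return 0
    done
    return 1
}

# Succeed if the disk or one of its partitions is listed as a mount source.
disk_is_mounted() {
    while read -r source _rest; do
        case "$source" in
            "/dev/$1"|"/dev/$1"[0-9]*) return 0 ;;
        esac
    done < "$MOUNTS_FILE"
    return 1
}

# Usage: safe_to_pull 0:0:0:0
# Returns 0 if nothing is mounted, 1 if something is, 2 if the lookup failed.
safe_to_pull() {
    name=$(scsi_block_name "$1") || { echo "no block device for $1" >&2; return 2; }
    if disk_is_mounted "$name"; then
        echo "$name is mounted; do not pull it" >&2
        return 1
    fi
    echo "$name has no mounted partitions"
}
```

A non-zero return from safe_to_pull means the drive should stay where it is until its filesystems are unmounted.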

Sometimes this strategy fails to work, and I don’t know why, so I resort to rebooting. I’m currently researching how to do this a bit better. I have a buggy script that I use to improve this method, but it’s not ready for public release yet. Does anyone have a better method? Please comment!
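
One avenue that may be worth trying when a per-device rescan is not enough: the kernel’s SCSI sysfs interface also exposes a delete attribute on each device and a scan attribute on each host under /sys/class/scsi_host. Writing "- - -" to scan asks that host to probe all channels, targets, and LUNs. The sketch below wraps both. The function names and the SYSFS_ROOT variable are hypothetical, it needs root, and whether it helps will likely depend on the controller driver.

```shell
#!/bin/sh
# Hypothetical override for testing; defaults to the real sysfs mount.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Tell the kernel to drop a device before the drive is physically pulled.
# Usage: remove_scsi_device 0:0:0:0
remove_scsi_device() {
    node="$SYSFS_ROOT/bus/scsi/devices/$1/delete"
    [ -e "$node" ] || { echo "no such SCSI device: $1" >&2; return 1; }
    echo 1 > "$node"
}

# Ask every SCSI host to look for newly attached drives.
# "- - -" is a wildcard for channel, target, and LUN.
scan_all_scsi_hosts() {
    for host in "$SYSFS_ROOT"/class/scsi_host/host*; do
        [ -e "$host/scan" ] || continue
        echo "- - -" > "$host/scan"
    done
}
```

A swap would then run remove_scsi_device on the old drive’s address, exchange the drive, and call scan_all_scsi_hosts before checking dmesg.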

{ 8 comments }

Bob Mitchell August 4, 2008 at 2:32 pm

I don’t have a better method right now, but I’m going to note this down as I run a small number of Sun boxen with SAS drives.

None have failed yet, but there has to be a first.

Phil Dufault August 4, 2008 at 2:46 pm

Not sure how well this’ll work on a Sun box, since this is using the Linux-only /sys, but let me know how you fare.

Cheers

Bob Mitchell August 7, 2008 at 3:13 pm

For reasons I won’t go into, the Sun boxes are just as likely to be running Linux. Don’t ask.

Phil Dufault August 7, 2008 at 3:51 pm

Cool, enjoy then, Bob. I’ll release a shell script one day that’ll reprobe the SCSI, SAS, or SATA controller for newly plugged-in drives.

One day. :)

Roman Tarnavski August 27, 2008 at 10:36 pm

I’m surprised that, given how often Linux is used in server installations, a drive replacement requires a reboot. Damn, even Windows copes with it fine.

Can’t wait for these scripts, and I hope they get integrated into the likes of Ubuntu quickly.

Phil Dufault August 28, 2008 at 2:30 pm

Depends. Sometimes it’ll handle it nicely. Other times, it requires convincing. However, don’t even get me started on an anti-Windows rant. :P

kemcho September 4, 2008 at 1:28 pm

The majority of my servers run SLES, and I’ve only had two instances where a drive failed. Only once did I have to reboot, and that was because of a bug in the controller card.
Linux Rocks!

Phil Dufault September 4, 2008 at 10:52 pm

And you were running the drives in RAID, right? ;)
