Wumple.com

2007/01/23

Replacing a failed disk in a Linux software RAID 5 array

Filed under: — Stormwind @ 11:28 pm

I recently had a drive in my Linux software RAID 5 array begin to die.  When I went to install the replacement, I could not find any straightforward instructions on how to put the new drive in the array, so I thought I’d toss what I did here in case others find it useful.

My machine is running Fedora Core 6 using mdadm with a RAID 5 array (so one disk in the array can die without data loss).  For these directions, let's say the RAID array is /dev/md0, the bad disk is /dev/sdc, and the RAID partition on the bad disk was /dev/sdc1.

1. I shut down the machine and removed the bad drive, because my SATA controller and the libata driver for it do not support drive hot-swap.
2. I installed the new drive in the same bay and SATA controller port.
3. I restarted the machine.  The RAID array came up in the degraded condition since it was missing an active drive.
4. I used "/sbin/fdisk /dev/sdb" to look at the partition table of another disk in the array, so I would know what to create on the new drive.  Alternatively, if the old drive is still functional enough, its partition table could be used as an example.
5. I used "/sbin/fdisk /dev/sdc" to create the RAID partition on the new drive:

‘n’ to create the new partition.  I created primary partition #1 and sized it to use the whole disk.
‘t’ to change the partition id.  I chose type fd (that is in hexadecimal, aka 0xfd), "Linux raid autodetect".
‘a’ to toggle the bootable flag since the Fedora install set all RAID partitions bootable on my disks during the original install/upgrade process.
‘w’ to write the new partition table and exit fdisk.

6. "/sbin/mdadm /dev/md0 -a /dev/sdc1" to add the new RAID partition on the new disk to the array.
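
As a rough sketch, steps 4 through 6 could be collected into a small script like the one below. It reuses the example device names from this post. The variables GOOD_DISK, NEW_DISK, ARRAY and DRY_RUN and the run() helper are made-up names for illustration, not part of fdisk or mdadm. By default it only prints the commands, and the final "mdadm --detail" call is an extra check that is not part of the original steps.

```shell
#!/bin/sh
# Sketch of steps 4-6 using the example device names from this post.
# GOOD_DISK, NEW_DISK, ARRAY, DRY_RUN and run() are illustrative names,
# not anything provided by fdisk or mdadm.
GOOD_DISK=${GOOD_DISK:-/dev/sdb}   # a healthy member of the array
NEW_DISK=${NEW_DISK:-/dev/sdc}     # the replacement drive
ARRAY=${ARRAY:-/dev/md0}           # the degraded array

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

replace_steps() {
    # Step 4: list the partition table of a healthy disk for reference.
    run /sbin/fdisk -l "$GOOD_DISK" || return 1
    # Step 5: partition the new disk interactively (n, t -> fd, a, w).
    run /sbin/fdisk "$NEW_DISK" || return 1
    # Step 6: add the new RAID partition to the array.
    run /sbin/mdadm "$ARRAY" -a "${NEW_DISK}1" || return 1
    # Extra check (not in the original steps): show the array state.
    run /sbin/mdadm --detail "$ARRAY"
}

replace_steps
```

Run as-is, this just lists what it would do. Running it as root with DRY_RUN=0 would execute the commands for real, so double-check the device names first: pointing NEW_DISK at the wrong drive would repartition a disk that is still in use.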

The kernel then rebuilt the disk automatically over the next few hours.  The progress of the rebuild can be checked by "cat /proc/mdstat" or continually via "watch -n .1 cat /proc/mdstat".  "dmesg" will also display a message at the start and the completion of the rebuild.
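
If you would rather have just the percentage than the whole /proc/mdstat output, something like the following might do. It is a minimal sketch that assumes the progress line looks like "recovery = 8.5% (...)" or "resync = 8.5% (...)", which may vary between kernel versions. The function name rebuild_percent is made up for this example.

```shell
#!/bin/sh
# Print the rebuild percentage found in an mdstat-format file.
# $1 is optional and defaults to the live /proc/mdstat.
# Prints nothing when no recovery or resync line is present.
rebuild_percent() {
    sed -En 's/.*(recovery|resync) *= *([0-9.]+%).*/\2/p' "${1:-/proc/mdstat}"
}

# Show the current value once, if this machine has md support at all.
if [ -r /proc/mdstat ]; then
    rebuild_percent
fi
```

A simple way to wait for the rebuild would be a loop such as: while [ -n "$(rebuild_percent)" ]; do sleep 60; done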

Useful references:

  1. mdadm: A New Tool For Linux Software RAID Management
  2. mdadm man page
  3. The Software-RAID HOWTO
  4. HOWTO Install on Software RAID (Gentoo)

7 responses to “Replacing a failed disk in a Linux software RAID 5 array”

  1. Julumaniya says:

    Love the advice. Thank you.

  2. Shaun says:

    I had run into the same dearth of information as you had, and finally found your page.  I just used your instructions for rebuilding a RAID5 with LVM on Ubuntu 9.10.
    In case anyone reads this who is also running LVM, there is no need to do anything with the LVM when partitioning the drive or rebuilding the array.  Just follow the instructions as shown.

  3. Shaun says:

    Thank you VERY much for posting this.  It’s helped me out twice now.

  4. Gavin C says:

    Also want to give thanks. My array has disk failures just far enough apart that I can’t recall the exact sequence of repair – this and the link Pauli gave are what I find myself coming back to.
    Thanks for taking the time to give back.
    G

  5. […] found this page describing what I need to do: http://wumple.com/blog/2007/01/23/re…-raid-5-array/ Specifically, he lists the steps as: […]

  6. Richard says:

    Thanks so much for this helpful post. In a clear series of steps, we’ve resurrected my degraded Raid5 array perfectly! Cheers, and also a massive thanks to whoever wrote MDADM – fantastic!
