The server I am using is new, and I added a second drive to it before I brought it online. My feeling was that the /home partition would likely grow very large, so why not give it its very own drive? It was a good idea, and how was I to know that the drive I purchased would begin to fail in less than 30 days?
The first step in replacing a drive gone bad is to purchase a new drive. Drives are very cheap these days, so I always look at performance rather than storage capacity. Even though this drive is going into a web server, I know that I do not need 1TB of storage.
I settled on a 500GB Seagate drive with a 16MB cache that runs at 7200RPM. This drive set me back about $100. It would have been cheaper if I had purchased it online, but I was under pressure, so I paid extra for the convenience.
When you have a bad drive, it is important to replace it as soon as you can. As long as the drive can be read, most of your data should be recoverable. In my case the errors I was getting had to do with writing to the drive, not with reading from it.
One of the first things I did was run fsck to check both drives. While the primary drive checked out fine, the drive I used for /home was full of errors.
With the new drive in hand, I shut down the server and opened up the case. I pulled the old drive out and mounted the new one in its place.
After rebooting, I went into single-user mode and ran fdisk to get the disk ready. I set it to use the entire disk and created one slice on it. After fdisk, I ran disklabel and set the slice to be used as /home. I then shut down the server so that I could reconnect the old /home drive.
After rebooting, I once again entered single-user mode and mounted the old drive as /oldhome. From here it was a simple matter of issuing the following cp command:
cp -r -p /oldhome/* /home
We use -r so that cp will copy directories recursively, and we use -p to ensure that all the permissions and ownerships stay the same. If I had not used -p, all the copied files would have become owned by whatever account I used to copy them. Since I was logged in as root, all of those files would have become owned by root. On a web server this could prevent those files from being usable by Apache, mail, MySQL, etc.
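If you want to see what -p changes without touching a real disk, the following is a small sandbox sketch. It only works inside a temporary directory, and the layout and the "alice" user directory are made up for illustration. Ownership cannot be shown without root, so the sketch compares file permissions instead.

```shell
#!/bin/sh
# Sandbox demo of cp -r with and without -p. Nothing here touches a real /home;
# the directory layout and the "alice" user directory are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/oldhome/alice" "$work/plain" "$work/preserved"
echo "hello" > "$work/oldhome/alice/notes.txt"
chmod 664 "$work/oldhome/alice/notes.txt"

# A strict umask makes the difference visible: without -p, cp applies the
# umask to the files it creates.
umask 077

cp -r    "$work/oldhome/alice" "$work/plain/"
cp -r -p "$work/oldhome/alice" "$work/preserved/"

# Take the permission string (first 10 characters) from ls -l.
src_mode=$(ls -l "$work/oldhome/alice/notes.txt" | cut -c1-10)
plain_mode=$(ls -l "$work/plain/alice/notes.txt" | cut -c1-10)
kept_mode=$(ls -l "$work/preserved/alice/notes.txt" | cut -c1-10)

echo "source:          $src_mode"
echo "copy without -p: $plain_mode"
echo "copy with -p:    $kept_mode"

rm -rf "$work"
```

The copy made without -p loses the group and other permission bits because of the umask, while the copy made with -p matches the source. When the copy is run as root on a real system, -p carries the owner and group across in the same way.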
After all the files were copied, I rebooted the server and made sure everything came up. With everything working, I then quickly took the server down to remove the bad drive.
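One way to double-check a copy like this before pulling the old drive is to compare the two trees with diff -r. This is not something I ran at the time, just a sketch in a temporary sandbox with made-up file names. It also shows a caveat of the /oldhome/* form: the shell's * does not match names that begin with a dot, so a dotfile sitting directly in the top-level directory is skipped, while copying "oldhome/." picks it up.

```shell
#!/bin/sh
# Sandbox sketch: verify a copy with diff -r. All paths and file names are
# hypothetical and live under a temporary directory.
set -eu

work=$(mktemp -d)
mkdir -p "$work/oldhome/alice" "$work/home"
echo "site"  > "$work/oldhome/alice/index.html"
echo "notes" > "$work/oldhome/.toplevel-dotfile"

# Same shape as the command in the post: the * glob skips top-level dotfiles.
cp -r -p "$work/oldhome"/* "$work/home"

# diff -r exits non-zero when the two trees differ.
if diff -r "$work/oldhome" "$work/home" > "$work/report.txt"; then
    first_check=same
else
    first_check=different
fi

# Copying "oldhome/." copies the directory contents, dotfiles included.
cp -r -p "$work/oldhome/." "$work/home"

if diff -r "$work/oldhome" "$work/home" > "$work/report.txt"; then
    second_check=same
else
    second_check=different
fi

echo "after copying with *:       $first_check"
echo "after copying oldhome/. :   $second_check"

rm -rf "$work"
```

Note that diff -r only compares names and contents. It says nothing about ownership or permissions, so it complements -p rather than replacing it.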
I have had no problems since.