Hard drive RAID problem


Time for everyone to put on their thinking caps. I'm stuck. It's strange, I've never been this kind of stuck before...

Here's the short of it (also, sorry for the use of present and past tenses mixed...I'm in the middle of "fixing" so things might not be as described...but the relevant information is there):
My server has 5 hard drives. 2 were strictly for movies, 2 were RAID1, 1 was the OS drive (where HS sits).

The RAID 1 uses the built-in Intel RAID. Due to my recent move to Win7, my backup has NOT been running (I just have not set it up yet). Why would I need it IMMEDIATELY? Important stuff is on the RAID1, everything else is on a nearly new drive (less than 1 year old), S.M.A.R.T. is enabled and shows no error flags. Two directories are the MOST MOST important: video of the kids, and pictures of the kids. Everything else...well, important, but I don't really care TOO TOO much.

In my experience, RAID cards (even cheaper ones) tend to put configuration data ON the drives. Good idea: if you move the drives over to another controller (same brand of course), it just picks up the array as if nothing happened. This built-in INTEL one apparently can be overridden by the motherboard.

The week before Father's Day, something happened where the BIOS was reset. So, the BIOS decided that all SATA drives would be presented to the OS as separate, emulated IDE drives. During boot, the INTEL RAID controller STILL reported the drives as RAID (I saw it say so as the PC was booting). What I missed was the first POST screen, where it configured all drives as IDE drives. NOT a big deal for the SINGLE drives. BIG deal for the RAID array.

So, for that week, until about Thursday, we ran the server. It would ACCEPT new files, and hold them there, hell, even let you read them (so long as they were in cache). The wife never noticed, as she dumps pictures / movies of the kids, then goes about her business. She didn't go back to check anything. I didn't notice until I was working with some video. I wrote a file, then tried to read it back. NO GO. When I realized what had happened (through a reboot and a comb through the BIOS settings), I changed it back to RAID. After I did this, I lost access to folders and say 20%-ish of all the files on the ARRAY.

First step I did was unplug the array, i.e. incur no MORE damage.
Then I bought 2 more 1.5TB drives. Booted a second PC with Acronis, with one blank drive and one original drive attached. Did a BINARY image of the original onto the blank.

21 hours later, powered down, unplugged the original and inspected. Looked like 80% of the files were there. Most things I noticed that were missing, were new stuff.

Great! In my mind, the OTHER drive was being written to, and this one read from. Explains the missing files, but no errors when being written to.

So, I turned off the system, swapped the blank for the second original. Re-imaged the drive.
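Since the theory is that the two mirror halves drifted apart while the BIOS treated them as separate IDE drives, one way to test it is to compare the two images directly. Here's a rough Python sketch, assuming both halves have been saved as raw image files (the file names in the usage comment are made up, and this is not something Acronis does for you). It only reports which chunks differ, not which half is "right".

```python
import io


def diff_ranges(img_a, img_b, chunk_size=1024 * 1024):
    """Yield (start, end) byte ranges where two binary streams differ.

    Granularity is one chunk: a single differing byte marks the whole
    chunk as different. Adjacent differing chunks are merged into one range.
    """
    offset = 0
    run_start = None
    while True:
        a = img_a.read(chunk_size)
        b = img_b.read(chunk_size)
        if not a and not b:
            break
        if a != b:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            yield (run_start, offset)
            run_start = None
        offset += max(len(a), len(b))
    if run_start is not None:
        yield (run_start, offset)


# Hypothetical usage (file names are assumptions, not from the thread):
# with open("original1.img", "rb") as a, open("original2.img", "rb") as b:
#     for start, end in diff_ranges(a, b):
#         print(f"differs: bytes {start}..{end}")
```

If the differing ranges are few and clustered, that would fit the "one drive got the writes, the other got the reads" idea. If they are scattered everywhere, something else (like the nightly defrag) has been moving data around.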

Examined. Windows saw "stuff" (under Disk Management), but didn't know what that partition was. So, I reset the PC, to allow chkdsk to try to fix things. It spent a good 3 hours going through that drive. Deleting things, recovering things. I felt good! After the PC finished booting, the drive was still showing up in the same way. Something was there, but Windows didn't know how to deal with it.
GREAT...wasted 21 hours (as I overwrote the first spare drive).

Re-imaged the first original.
Set up xcopy to copy the files off the imaged drive.
xcopy e:\ f:\ /e /v /c /i /f /g /h /r /y >c:\copylog.txt
(where e is the imaged drive and f is the second spare)

Done. Checked the result; that's where I get the 80% from. I figure I couldn't copy 10% of the files. And there's (guessing) 10% not even showing up.
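To put real numbers on the "couldn't copy 10%" guess instead of reading through the xcopy log, something like the following Python sketch could walk both trees and list what didn't make it across. It's only a sketch: it compares by existence and file size, not content, and it can only count files that still show up in the directory listing (the 10% that doesn't show at all is invisible to it). The drive letters in the usage comment are the ones from the xcopy line.

```python
import os


def compare_trees(src_root, dst_root):
    """Return (missing, size_mismatch): sorted lists of paths relative to src_root.

    missing       -> file exists under src_root but not under dst_root
    size_mismatch -> file exists in both but sizes differ (or size unreadable)
    Directories that os.walk cannot read are silently skipped.
    """
    missing, size_mismatch = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                missing.append(rel)
                continue
            try:
                if os.path.getsize(src) != os.path.getsize(dst):
                    size_mismatch.append(rel)
            except OSError:
                # Treat an unreadable size as a mismatch so it gets looked at.
                size_mismatch.append(rel)
    return sorted(missing), sorted(size_mismatch)


# Usage with the drive letters from the xcopy command above:
# missing, mismatched = compare_trees("e:\\", "f:\\")
# print(len(missing), "files missing,", len(mismatched), "with a different size")
```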

At this point, I set up chkdsk on my "image" spare drive before I left for work.
chkdsk e: /f /v /x >c:\chkdsk.txt

Based on that, it looks like the SAME stuff I saw during the boot check disk on the second original drive.

What other tools can I use? I can continue to re-image the originals over and over (21 hours, but hell, if I can get my files off). So, I can make mistakes, or just "try" stuff.

My fear is, the MFT is goofed up, so, with that and goofed up files, I won't be able to get anything off. My hope is the drive that doesn't show as anything, will be able to be reconstructed (i.e. I hope the MFT is broke, and that drive holds 100% of my missing files...so how to reconstruct the MFT??).
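Before trying to rebuild anything, it may help to check whether the drive that "doesn't show as anything" still has a recognizable NTFS boot sector, and where that sector says the MFT lives. Below is a minimal Python sketch that parses the standard NTFS boot sector fields from 512 raw bytes. Assumptions: you have already read the first sector of the partition (not of the whole disk) from an image, and the cluster size is the ordinary kind where the sectors-per-cluster byte is a plain count. The function name and the returned keys are made up for this example.

```python
import struct


def parse_ntfs_boot_sector(sector):
    """Parse the fields of an NTFS boot sector needed to locate the MFT.

    Returns None if the OEM ID is not "NTFS    ". Offsets are relative to
    the start of the partition, not the start of the disk.
    """
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    if sector[3:11] != b"NTFS    ":
        return None
    bytes_per_sector = struct.unpack_from("<H", sector, 0x0B)[0]
    sectors_per_cluster = sector[0x0D]  # assumed to be a plain count
    mft_cluster = struct.unpack_from("<Q", sector, 0x30)[0]
    mft_mirror_cluster = struct.unpack_from("<Q", sector, 0x38)[0]
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "cluster_size": cluster_size,
        "mft_offset": mft_cluster * cluster_size,
        "mft_mirror_offset": mft_mirror_cluster * cluster_size,
    }


# Hypothetical usage (image name and partition offset are assumptions):
# with open("original2.img", "rb") as img:
#     img.seek(partition_start_in_bytes)
#     print(parse_ntfs_boot_sector(img.read(512)))
```

If this returns None, the boot sector itself is damaged, which would explain Windows not knowing what the partition is. If it returns sane offsets, the next thing to look at is whether the data at `mft_offset` starts with the `FILE` record signature.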

Last tidbit, my explanation for why there are OLD files that are goofed up (mostly my "fun" stuff and SW tools and such...so I can't even USE my own SW tools to fix this!): the server defragments itself once a night. It appears any file that has been TOUCHED is broken. All the files that have not been touched in ages (up to the day the array got goofed up) seem fine.

Thank you for reading through my VERY verbose message.


P.S. HELP!! I've not been in this kind of position in a LONG time.

I personally wouldn't be accessing the drives at all unless you have reconfigured the drives in the RAID controller. Any other access method may well be incompatible with the controller's on-disk layout. Your only saviour here might be the fact that it was RAID 1, which is a plain mirror (no striping), so each drive may hold a layout that W7 can read on its own. Remember that when using a RAID controller, it is a logical volume that is presented to the OS, not the physical drives.

Since you've already made binary images of the drives, I would go back into the RAID config and rebuild the RAID 1 setup (without doing a format). If the drive was not severely modified, you may get most of your data back to being accessible.

Thanks for the reply.

I made the assumption (yeah I know), that since it was RAID1, it was like other RAID1 setups that I've had, where once the drive is divorced from the system (by controller release, or by just removing it), the drive acts as a normal stand alone. Raid0, 5, etc. don't act this way.

As you suggested though, I might be able to put them back in and try to tell it to re-init. the drive array...WITHOUT format. I know the last time I booted the array, the controller did NOT seem to see anything wrong with the drives...which is strange since one doesn't seem to want to read back.

However, as you said, I've got images, so maybe I can try that now, without worry of damaging the array (as I've got backup images).

RAID controllers don't always store all the info on the drive. They may have the config info in flash memory. That's why you need to re-create the RAID config.

I've had to do this on servers before where I needed to move a set of drives from one server to another (each with the same RAID controller). When I put the drives in, they were recognized as physical devices, but their actual RAID configuration was not recognized. I then had to set the controller to the proper config before the logical drive could be seen. Yeah, it can be a stressful process, because there is the chance that the data will go "poof" and the controller will create a new RAID container. Fortunately, I've never had that happen.

Edit... hey look at that - I just became an "Advanced Cocooner"!!
Over the last few days (while doing my fuse panel upgrade stuff) I bounced my FreeNAS box a few times. In the process I trashed the two RAID 1 arrays and the other two SATA drives in it. FreeNAS wouldn't boot.

Last night decided to rebuild it trying to keep the data on the drives intact. It worked.

I re-installed FreeNAS from scratch. I also kept a configuration backup in case I needed to use it (I didn't end up using it).

Installed each drive again (didn't format them) and created the two RAID 1 arrays. All of the drives came up corrupt. The RAID 1 wouldn't come up. (All 1TB drives right now, six in all.) The messages stated something about unmounting the drives, running FSCK on them, then remounting them. I did. All the data remained intact and I was able to access the drives once again.

A short step by step

1 - Boot from FreeNAS Live ISO
2 - Set up IP configuration
3 - Add one drive at a time (making sure that you do not format the drives)
4 - Configure one RAID 1 set at a time (this is where I had to unmount / run FSCK on each of the drives)
5 - Enable the RAID 1 arrays one at a time
6 - Create mount points
7 - Create shares (same as before) to root partitions and to separate directories

Next test - network NIC aggregation (teaming - ethernet bonding) wondering what kind of performance improvements I will see.
If your switch doesn't support aggregation, you won't see any performance benefit.