My computer is dead. Windows refuses to start. The computer keeps rebooting itself at the XP logo screen. I was horrified, worried about losing all my data. Hardware failure is not a big deal; you can always replace the broken parts. The data inside the computer, however, is irreplaceable. Luckily all my hard drives are intact; only Windows itself is corrupted. It will only take me a few days to reinstall Windows and all my usual programs. However, I feel a bit uneasy about formatting my C: drive. I want to keep all the data in case there is something important on it. So I have to buy an extra hard disk to copy the data over.
Since I am already buying more hard disks, why not fix it once and for all, so that I can have peace of mind? I upgraded to a motherboard that supports RAID and bought more hard disk space. RAID stands for redundant array of inexpensive disks. The idea, in the mirroring setup known as RAID 1, is to have two hard drives running in parallel, mirroring each other. In case of a disk failure, you still have a complete set of data. It is a hardware solution and works much better than backup software. Now I know my photos, my personal records and my MP3 collection of every Chinese CD released in the past 20 years are safe inside the hard disks, which have over 2TB of capacity in total.
Your data is safe as long as your file system’s data structure is not corrupted; RAID does not protect you against that.
What if the hard drive fails physically?
Even if the file system data structure is corrupted, it is not the end of the world. The data is still stored as 0s and 1s on the magnetic disk. There are software tools to salvage what’s left on the disk. I did it once a long time ago.
However, if the hard drive fails, say the motor is dead or the read head is broken, then you are really doomed.
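To give a feel for how those salvage tools can work when the file system structure is gone, here is a minimal sketch of signature-based "carving": scan the raw bytes for the marker that starts a JPEG and the marker that ends one, and cut out whatever lies between. The function name `carve_jpegs` is made up for illustration, and this is a toy under heavy assumptions (files stored contiguously, no embedded thumbnails); real recovery software does far more.

```python
from typing import List

# Byte signatures that mark the start and end of a JPEG stream.
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"


def carve_jpegs(raw: bytes) -> List[bytes]:
    """Return byte runs in a raw dump that look like JPEG files.

    Hypothetical helper: assumes each file is stored contiguously,
    which is often not true on a fragmented disk.
    """
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break  # no more start markers
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # start marker without an end marker: give up
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found
```

Note that this recovers contents only, not file names or folders, which is exactly why sorting through the results afterwards is so slow.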
With today’s HDs reaching the TB range, when your file system metadata is corrupted, the process of locating and identifying those thousands of files is not only tedious, it’s also very, very time consuming. With a broken motor or read head, you can spend $ and get it fixed by a data recovery company. Time is harder to come by than $. You can give up a nice dinner to save $; it’s much harder to find, say, 3 days of concentrated free time. (Even if you do have the time, you’d probably much rather do something else than sit in front of a computer identifying files.)
Don’t get me wrong, having a mirroring RAID is a good step toward ensuring your data’s safety, just don’t be overconfident about it. If the data is important enough, make sure you have more than 1 copy of it.
Currently there is no really good automatic way of dealing with this. I have seen a few products that may help.
1) Drobo. It’s a USB storage server that automatically takes care of mirroring data across different drives. But I’m not sure how it handles the metadata, so I am not sure if it’s a safe enough solution.
2) Windows Home Server. It is supposed to let you tag files as important, and it will automatically mirror them to different physical HDs. Again, not sure how the metadata is handled.
Otherwise, you can always have a 2nd computer and rsync between them. Amazon’s S3 is another interesting option, which gives you the further benefit of being distributed geographically.
I am very interested in the field of data storage/archival and the safety of such data, as I take a lot of pictures and have pretty big storage needs. And I’d be interested in doing so without breaking the bank. 🙂
Mm… here is the question: is it safer to have two hard drives running in RAID 1, or to copy the files from one drive to another every night?
There is plenty of software to automatically back up files from one drive to another. Actually I have two computers running side by side, and I am copying the data from one computer to the other.
Backup in a different geographical location is a bit tricky: where can I find the space and bandwidth to upload my backup?
2 HDs running RAID 1 means that for every write, you immediately get 2 copies. The good thing is, you get 2 copies right away. However, if the write is bad (corrupted data), you have 2 copies of garbage.
Having 2 independent HDs, which you manually copy over every night, means you have a “delayed” copy. The good thing is, if one of them is corrupted due to a bad write, the other is likely to be OK. But there is a delay of up to 24 hrs, so you may lose 1 day of changes.
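For what it’s worth, the nightly copy does not have to be manual. Below is a rough sketch, using only Python’s standard library, of a one-way copy that picks up new or changed files. The name `mirror` and the rule for deciding what has changed (different size, or a newer modification time) are my own assumptions for illustration, not how any particular backup product works.

```python
import shutil
from pathlib import Path
from typing import List


def mirror(src: Path, dst: Path) -> List[str]:
    """Copy new or changed files from src to dst.

    Returns the relative paths that were copied. Files deleted from
    src are deliberately left alone in dst, so an accidental delete
    is not repeated on the backup drive.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            # Assumed rule: same size and not newer means unchanged.
            if s.st_size == t.st_size and int(s.st_mtime) <= int(t.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the timestamp
        copied.append(rel.as_posix())
    return copied
```

You would point it at your own folders, for example `mirror(Path("D:/photos"), Path("E:/backup/photos"))` (made-up paths), and let the operating system’s task scheduler run it every night. Keep in mind that a file already corrupted on the source will still be copied over the next time it changes, so this narrows the window for trouble rather than closing it.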
Amazon S3 is a pretty interesting distributed online storage cloud that’s spread geographically. But it costs money. 🙂
I’m supporting this idea all the way! I can not imagine who would disagree with it. On the whole – make posts like this more often.