3.3 Storage and Backup

Data Management: The 3-2-1 Rule

3-2-1 Rule

VIDEO: https://www.youtube.com/watch?v=_F_r56dkq2I

There is a saying about storage that goes “lots of copies keeps stuff safe”. The idea behind the principle is that even if your main storage system fails, you still have access to your data.

If you have very important data, you may want to keep many copies, but most scientists should follow the 3-2-1 Rule and keep three copies of their files. This rule states that you should have 3 copies of your data in 2 locations on more than 1 type of storage media.

The off-site copy is particularly critical. Many people keep their data and a backup copy on-site, but this doesn’t account for scenarios where the building floods or burns down (as can happen in a chemistry building) or a natural disaster occurs. Storing a copy of your data off-site can make the recovery process easier if everything local is lost.

While the 3-2-1 Rule mainly concerns redundancy, it’s also a recommendation for variety, in that data should not all be stored on one type of hardware. Computer hard drives fail, cloud storage can be disrupted, and CDs will go bad over time; each storage type has its own strengths and weaknesses, so using several types of storage spreads your risk around. So if the first copy of your data is on your computer, look for other options for your backups, like external hard drives, cloud storage, a local server, CDs/DVDs, tape backup, etc. Finally, always keep a local copy of your data if its main storage is in the cloud. Accidents happen, even with well-run cloud storage, so it’s always best to have a copy of your data in your direct control, just in case.

Here’s an example of following the 3-2-1 Rule using resources a researcher typically has available:

  • a copy on my computer (onsite)
  • a copy backed up weekly to the office shared drive (onsite)
  • a copy backed up automatically to the cloud (offsite)
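
If you want to check a backup plan against the rule, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Copy class, the check_321 function, and the location and media labels are all made-up names, and the thresholds simply restate the rule as worded above (3 copies, 2 locations, more than 1 type of storage).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical structure, for illustration only)."""
    name: str
    location: str  # where the copy physically lives, e.g. "office"
    media: str     # what kind of storage holds it, e.g. "cloud storage"


def check_321(copies):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in one location; need at least 2")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies use one type of storage; need more than 1")
    return problems


# The example plan from the list above, with assumed labels.
plan = [
    Copy("working copy", location="office", media="laptop drive"),
    Copy("weekly backup", location="office", media="shared network drive"),
    Copy("automatic backup", location="offsite", media="cloud storage"),
]

print(check_321(plan) or "plan satisfies the 3-2-1 Rule")
```

Dropping the cloud copy from the plan would leave two copies in one location, and the check would report both problems.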

The 3-2-1 Rule is simply an interpretation of the old expression, ‘don’t put all of your eggs in one basket.’ This applies not only to the number of copies of your data but also to the technology on which they are stored. With a little bit of planning, it is very easy to ensure that your data are backed up in a way that dramatically reduces the risk of total loss.

 

Adapted from “Rule of 3” by Kristin Briney (http://dataabinitio.com/?p=320), CC-BY.

 

Backups

Part of following the 3-2-1 Rule means having backups in place. When looking for good backup options, consider the following:

  • Any backup is better than none
  • Automatic backup is better than manual
  • Your work is only as safe as your backup plan
  • Check your backups periodically

You should check your backups for two reasons. First, you need to know that they are working properly. A backup that is not working is not a backup at all. You should test your backups once or twice a year and any time you make changes to your backup system. If your data are particularly complex to back up or particularly valuable, consider testing your backups more frequently.
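
One possible way to check that a backup is working is to compare checksums of the original files against the backup copies. The sketch below assumes the backup is a plain mirrored folder with the same layout as the original; the function names sha256_of and verify_backup are invented for this example, and backup tools that store data in their own format will need their own verification step.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return files that are missing from, or differ in, the backup folder."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems
```

Calling verify_backup with your data folder and your backup folder returns an empty list when every file in the original has an identical copy in the backup. Files changed since the last backup run will show up as differences, so it makes sense to run the check right after a backup completes.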

The second reason to test your backups is to know how to restore from backup. You don’t want to be learning how to restore from backup when you’re already in a panic over losing the main copy of your data. Knowing how to restore from backup ahead of time will make the data recovery process go much more smoothly.
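
A practice restore can be as simple as unpacking the backup into a scratch folder and confirming that what comes out matches the original. The following is a minimal sketch, assuming the backup is a zip archive (for example one made with Python's shutil.make_archive); same_tree and restore_drill are hypothetical helper names, and a dedicated backup tool will have its own restore procedure that you should rehearse instead.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def same_tree(left, right):
    """True if both folders hold the same files with identical contents."""
    cmp = filecmp.dircmp(left, right, ignore=[])
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # Compare file contents byte by byte rather than trusting size and timestamps.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    return all(
        same_tree(Path(left) / name, Path(right) / name)
        for name in cmp.common_dirs
    )


def restore_drill(data_dir, archive_path):
    """Unpack a zip backup into a scratch folder and compare it to data_dir."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(archive_path), scratch, "zip")
        return same_tree(data_dir, scratch)
```

The scratch folder is deleted automatically when the drill finishes, so the rehearsal never touches the main copy of the data.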

It’s a small thing to periodically test restoring from backup, but it will give you peace of mind that your data are being properly backed up and that you will be able to recover everything if something happens to your main copy.

 

Adapted from “Test Your Backups” by Kristin Briney (http://dataabinitio.com/?p=399), CC-BY, and from the data management guide by UWM Libraries (http://guides.library.uwm.edu/data), CC-BY.

Comments 3

Ashley Cox (not verified) | Tue, 09/22/2015 - 13:15
I really wish that more people followed the 3-2-1 Rule. Store your data off-site!

Dr. Briney | Tue, 09/22/2015 - 14:14
I agree! I've seen WAY TOO MANY horror stories about students losing copies of their theses and don't want anyone else to fall into the trap. Do realize that off-site storage, particularly cloud storage, also isn't foolproof; companies fold, have technical issues, and lose data. That's why it's so important to keep both on-site (i.e. in your direct control) and off-site copies of your data.

Herman Bergwerf | Tue, 09/22/2015 - 17:13
I work at a company that also uses cloud storage, and of course data integrity is a very important issue. Some (big) companies, including Google, store tapes so they have a backup in case of emergency. We are quite a small company and we don't do any on-site storage. Instead we use advanced replication technologies that copy our data to multiple data centers. Actually this is also what companies like Google do with their data. If one copy of the data is lost, there are still a number of copies left that are actively updating each other and adding new copies if required. (So data can be inserted in any copy: in a master/slave system the data is routed to the master and then copied to all slaves, and if the master goes down a new one is elected; in a master-to-master system each copy can add data and send it to the other members of the cluster.) The PDB (Protein Data Bank) is also an example of a very important academic dataset that is mirrored by different and, in this case, separate databases. These technologies actually make cloud storage extremely reliable. Still, our company almost suffered serious data loss after firmware updates on our hardware, which is why using multiple data centers is very important.
