File Organization
Good organization helps you find and sort through your data and makes the data easier to reuse later. Save yourself time by organizing your files as you create them.
The most important thing for organization is to have a system and use it consistently. This will help you track down files when you need them and not waste time combing through useless information.
Here are several options for organization:
- By project
- By analysis type
- By date
- By researcher
- By thesis chapter
- By site or data source
You can also use these systems in combination.
In short, choose a system that works for your data (it does not have to be one listed here) and stick to it. You should also document your organization system in your lab notebook or another prominent place.
Here are several examples of file organization systems:
Experimental data:
- By experiment
- By file type (raw data, analyzed data, figures, etc.)
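The experimental data layout above could be set up with a short script. This is only a sketch: the folder names in FILE_TYPE_FOLDERS and the function name create_experiment_folders are assumptions made for illustration, not part of the guide.

```python
from pathlib import Path

# Assumed folder names for the file types the guide mentions
# (raw data, analyzed data, figures); rename to suit your own system.
FILE_TYPE_FOLDERS = ["raw_data", "analyzed_data", "figures"]


def create_experiment_folders(root, experiments):
    """Create one folder per experiment, each with one subfolder per file type.

    Returns the list of folders, in creation order. Folders that already
    exist are left untouched.
    """
    created = []
    for experiment in experiments:
        for folder in FILE_TYPE_FOLDERS:
            path = Path(root) / experiment / folder
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling create_experiment_folders("my_project", ["KMnO4", "HCl"]) would produce my_project/KMnO4/raw_data, my_project/KMnO4/analyzed_data, and so on. Running it again is harmless because existing folders are not overwritten.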
Collaborative project data:
Adapted from the data management guide by UWM Libraries (http://guides.library.uwm.edu/data), CC-BY.
File Naming
HORROR STORIES:
A file naming convention adds standardization to your files, making them much easier to organize and locate. It will also help your colleagues sort through your files should you fall ill or leave the lab. For this reason, document your naming scheme in your laboratory notebook (preferably at the front or back for easy access) or in another prominent place.
There are existing conventions to choose from, though you will probably want to customize one for your own purposes. A few general tips apply to any file naming system.
First, pick a group of files that you wish to name consistently and decide on the key information that will distinguish one file from another. Pick 2-3 things that will tell you a file's contents. Examples are:
- Date
- Site
- Analysis
- Sample
- Short description
Once you pick your key pieces of information, arrange them into a pattern using the following rules:
- Files should be named consistently
- File names should be descriptive but short (fewer than 25 characters)
- Use underscores instead of spaces
- Avoid these characters: " / \ : * ? ' < > [ ] & $
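The rules above can be checked automatically. The following is a minimal sketch, assuming the name is passed without its file extension; the function names check_file_name and clean_file_name are hypothetical, and the 25-character limit is taken directly from the rule above.

```python
# Characters the guide says to avoid, plus the space
# (the guide recommends underscores instead of spaces).
FORBIDDEN = set('"/\\:*?\'<>[]&$ ')
MAX_LENGTH = 25  # names should be shorter than this


def check_file_name(stem):
    """Return a list of rule violations for a file name (extension excluded)."""
    problems = []
    if len(stem) >= MAX_LENGTH:
        problems.append(f"too long ({len(stem)} characters)")
    bad = sorted(set(stem) & FORBIDDEN)
    if bad:
        problems.append(
            "contains forbidden characters: " + " ".join(repr(c) for c in bad)
        )
    return problems


def clean_file_name(stem):
    """Replace spaces with underscores and drop the other forbidden characters."""
    stem = stem.strip().replace(" ", "_")
    return "".join(c for c in stem if c not in FORBIDDEN)
```

For example, check_file_name("20140422_PikeLake_03") returns an empty list, and clean_file_name("Pike Lake: sample 3?") returns "Pike_Lake_sample_3". The check does not test for consistency across files, which still depends on following your documented pattern.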
You can also add version information, as necessary. Versioning can be immensely helpful when you are analyzing data. If you make a change to your data that you don’t want to keep, it’s simple to go back to an earlier version of the file. The same is true if a file gets corrupted or if you simply want to change your analysis method. The keys to making versioning work are using consistent version names, periodically saving to new versions, and documenting the differences between versions.
- For analyzed data, use version numbers
- Save files often to a new version
- Label the final version FINAL
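Consistent version names are easier to keep up if the next name is generated rather than typed by hand. The sketch below assumes versions are written as a trailing _v followed by a number, as in the example names later in this guide; the function name next_version and the choice to start unversioned names at _v1 are assumptions.

```python
import re

# Matches a trailing version suffix such as "_v2" or "_v15".
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def next_version(stem):
    """Return the file name (extension excluded) with its version bumped by one.

    Names without a version suffix get "_v1" appended.
    """
    match = VERSION_SUFFIX.search(stem)
    if match is None:
        return f"{stem}_v1"
    number = int(match.group(1)) + 1
    return f"{stem[:match.start()]}_v{number}"
```

With this sketch, next_version("KMnO4_FirstOrder_v2") gives "KMnO4_FirstOrder_v3". It only produces the name; recording what changed between versions remains a manual step.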
Using these guidelines, here are some example naming conventions, each followed by example file names. The first convention is particularly useful for organizing PDFs of journal articles.
- AuthorLastName-Year-Title
  - Smith-2010-ImpactOfStressOnSeaMonkeys
  - Hailey-1999-VeryImportantDNAStudy
- YYYYMMDD_site_sampleNumber
  - 20140422_PikeLake_03
  - 20140424_EastLake_12
- Experiment_Analysis_Version
  - KMnO4_FirstOrder_v2
  - HCl_ZeroOrder_v5
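A pattern such as YYYYMMDD_site_sampleNumber can be turned into a small helper so that every name is built the same way. This is a sketch under assumptions: the function name sample_file_name is hypothetical, and the two-digit zero padding of the sample number is inferred from the examples above.

```python
from datetime import date


def sample_file_name(collected, site, sample_number):
    """Build a name following the YYYYMMDD_site_sampleNumber pattern.

    Spaces in the site name are removed and each word is capitalized,
    so "Pike Lake" becomes "PikeLake".
    """
    site_part = "".join(word[:1].upper() + word[1:] for word in site.split())
    return f"{collected:%Y%m%d}_{site_part}_{sample_number:02d}"
```

For example, sample_file_name(date(2014, 4, 22), "Pike Lake", 3) reproduces the first example name, 20140422_PikeLake_03.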
Adapted from “Starting Small: File Naming Conventions” by Kristin Briney (http://dataabinitio.com/?p=14/), CC-BY, and the data management guide by UWM Libraries (http://guides.library.uwm.edu/data), CC-BY.
Dates
The ISO 8601 standard is incredibly useful for data management. It specifies how to write dates, a common type of information in data and documentation. To understand why this standard is important, consider the following dates:
- March 5, 2014
- 2014-03-05
- 3/5/14
- 05/03/2014
- 5 Mar 2014
All of these represent the same date but are expressed in different formats. The problem is that if someone uses all of these formats in her notes, how will you ever find everything that happened on March 5th? It’s simply too much work to search for all the possible variations. The answer to this problem is ISO 8601.
ISO 8601 dictates that all dates should use the format “YYYYMMDD” or “YYYY-MM-DD”. So the example above becomes “20140305” or “2014-03-05”. This provides you with a consistent format for all of your dates. Such consistency allows you to more easily find and organize your data, the hallmark of good data management.
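Existing dates can be converted to ISO 8601 with Python's standard datetime module. In this sketch the function name to_iso_date is hypothetical, and you must supply the format the date was originally written in, because a string like 3/5/14 cannot be told apart from 5 March or 3 May without that knowledge. Month names such as "March" are parsed according to the current locale, which is assumed here to be English.

```python
from datetime import datetime


def to_iso_date(text, input_format, compact=False):
    """Parse a date written in a known format and return it in ISO 8601 form.

    input_format uses strptime codes, for example "%m/%d/%y" for 3/5/14.
    compact=True returns YYYYMMDD instead of YYYY-MM-DD.
    """
    parsed = datetime.strptime(text, input_format).date()
    return parsed.strftime("%Y%m%d") if compact else parsed.isoformat()
```

For the dates listed earlier, to_iso_date("3/5/14", "%m/%d/%y") and to_iso_date("05/03/2014", "%d/%m/%Y") both return "2014-03-05", and passing compact=True returns "20140305".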
ISO 8601’s consistency is useful everywhere, but it is particularly useful at the beginning of file names. This is because dates written in this format sort chronologically: by year, then by month, and then by day. So if you date all of your file names using ISO 8601, an ordinary alphabetical sort gives you an easy way to find and sort through information.
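The sorting behavior is easy to verify. The sketch below sorts a few file names alphabetically, first with ISO 8601 dates and then with the same dates written as MMDDYYYY for contrast. The 2013 file name is a made-up addition to the examples used elsewhere in this guide.

```python
# File names that begin with ISO 8601 dates (YYYYMMDD).
iso_names = [
    "20140424_EastLake_12",
    "20131105_PikeLake_07",  # hypothetical extra file from 2013
    "20140422_PikeLake_03",
]

# The same files with the dates written as MMDDYYYY instead.
mdy_names = [
    "04242014_EastLake_12",
    "11052013_PikeLake_07",
    "04222014_PikeLake_03",
]

# A plain alphabetical sort, as a file browser would do.
sorted_iso = sorted(iso_names)
sorted_mdy = sorted(mdy_names)

print(sorted_iso)  # oldest file first: 2013, then April 22, then April 24
print(sorted_mdy)  # grouped by month, so the 2013 file lands last
```

With ISO 8601 dates the alphabetical order matches the chronological order. With MMDDYYYY the November 2013 file sorts after both April 2014 files.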
Adapted from “Dating Your Data (or How I Learned to Stop Worrying and Love the Standard)” by Kristin Briney (http://dataabinitio.com/?p=449), CC-BY.