3.5 Documentation

Taking Better Notes

Having sufficient documentation is central to making your data usable and reusable. If you don’t write things down, you’re likely to forget important details over time and not be able to interpret a dataset. This is most apparent for data that needs to be used a year or more after collection, but can also impact the usability of data you acquired last week. In short, you need to know the context of your research data – such as sample information, protocol used, collection method, etc. – in order to use it properly.

All of this context starts with the information you record while collecting data. And for most researchers, this means taking better notes.

Most scientists learn to take good notes in school, but it’s always worth having a refresher on this important skill. Good research notes are the following:

Clear and concise
Legible
Well organized
Easy to follow
Reproducible by someone “skilled in the art”
Transparent

Basically, someone should be able to pick up your notes and tell what you did without asking you for more information.

The problem a lot of people run into is not recording enough information. If you read laboratory notebook guidelines (which were established to help prove patents), they actually say that you should record any and all information relating to your research in your notebook. That includes research ideas, data, when and where you spoke about your research, references to the literature, etc. The more you record in your notebook, the easier it is to follow your train of thought.

It’s also recommended to employ headers, tables, and any other tool that helps you avoid having a solid block of text. These methods not only help you better organize your information, but also make it easier for you to scan through everything later. And don’t forget to record the units on any measurements!

Overall, there is no silver bullet to make your notes better. Rather, you should focus on taking thorough notes and practicing good note-taking skills. It also helps to have another person look over your notes and give you feedback on clarity. Use whatever methods work best for you so long as you are taking complete notes.

Adapted from “Taking Better Notes” by Kristin Briney (http://dataabinitio.com/?p=542), CC-BY.

Data Dictionaries

VIDEO: https://www.youtube.com/watch?v=Fe3i9qyqPjo

Best practices say that spreadsheets should contain only one large data table with short variable names at the top of each column, which doesn’t leave room to describe the formatting and meaning of the spreadsheet’s contents. This information is important, especially if you are trying to use someone else’s data, but it honestly doesn’t belong in the spreadsheet.

So how do you give context to a spreadsheet’s contents? The answer is a data dictionary.

So what is a data dictionary? A data dictionary is an external document that gives necessary context to a dataset. Generally, a data dictionary includes an overall description of the data along with more detailed descriptions of each variable, such as:

Variable name
Variable meaning
Variable units
Variable format
Variable coding values and meanings
Known issues with the data (systematic errors, missing values, etc.)
Relationship to other variables
Null value indicator
Anything else someone needs to know to better understand the data

This list represents the types of things you would want to know when faced with an unknown dataset. A data dictionary repeats this list (or a variation of this list) for every variable in the dataset to give a full picture of the data.
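As a rough illustration, a data dictionary can be kept as a small table of its own, with one row per variable. The sketch below is only one possible layout: the variable names, units, and codes are invented for an imaginary dataset, and the helper names `undocumented_columns` and `dictionary_to_csv` are hypothetical.

```python
import csv
import io

# Hypothetical entries for an imaginary dataset. The variable names, units,
# and codes below are invented for illustration only.
DATA_DICTIONARY = [
    {
        "name": "sample_id",
        "meaning": "Unique identifier assigned to each sample",
        "units": "none",
        "format": "text",
        "codes": "",
        "null_value": "NA",
    },
    {
        "name": "conc_mM",
        "meaning": "Sample concentration",
        "units": "millimolar",
        "format": "decimal number",
        "codes": "",
        "null_value": "NA",
    },
    {
        "name": "solvent",
        "meaning": "Solvent used to prepare the sample",
        "units": "none",
        "format": "integer code",
        "codes": "1 = hexane; 2 = water",
        "null_value": "-9",
    },
]


def undocumented_columns(header, dictionary):
    """Return the column names in `header` that have no dictionary entry."""
    documented = {entry["name"] for entry in dictionary}
    return [column for column in header if column not in documented]


def dictionary_to_csv(dictionary):
    """Serialize the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(dictionary[0].keys()))
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()
```

Calling `undocumented_columns` with the column headers of a spreadsheet gives a quick list of variables that still need a description, and the CSV text from `dictionary_to_csv` can be saved next to the dataset as its external documentation.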

Not only is a data dictionary incredibly useful if you’re sharing a dataset, but it’s also useful if you plan to reuse a dataset in the future or you are working with a very large dataset. Basically, if there’s a chance you won’t remember the details about a spreadsheet or never knew them in the first place, a data dictionary is needed.

Adapted from “Data Dictionaries” by Kristin Briney (http://dataabinitio.com/?p=454), CC-BY.

Templates

Templates are a great way to add structure to research notes and make sure that you’ve recorded all of the necessary information. This will help you find information later and ensure that no important details are missing from your notes.

So how do templates work? Basically, you sit down at the start of data collection and make a list of all the information that you have to record each time you acquire a particular dataset. Then you use this as a checklist whenever you collect that type of data. That’s it.

You can use templates as a worksheet or just keep a printout by your computer or in the front of your research notebook, whatever works best for you. Basically, you just want to have the template around to remind you of what to record about your data.

Let’s look at an example. Here’s a template that a spectroscopist may use when recording her data:

Date
Experiment
Scan number
Laser beam powers
Laser beam wavelengths
Sample concentration
Calibration factors, like timing and beam size

Using this list as a template may result in notes like the following:

2010-06-05
UV pump/visible probe transient absorption spectroscopy
Scan #3
5 mW UV, visible beam is too weak to measure accurately
266 nm UV, ~400-1000 nm visible
5 mMol trans-stilbene in hexane
UV beam is 4 microns, visible beam is 3 microns

Remembering to record the necessary details is the biggest benefit of using a template, as this is an easy mistake to make in documentation. Templates can also help you sort through handwritten notes if you always put the same information in the same place on a notebook page. Basically, templates are a way to add consistency to often chaotic research notes.
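For notes kept digitally, the same checklist idea can be sketched in a few lines of code. This is a minimal example, assuming each set of notes is stored as a simple mapping from field name to text; the field list comes from the spectroscopy template above, while the function name `missing_fields` is hypothetical.

```python
# Fields taken from the example spectroscopy template.
TEMPLATE_FIELDS = [
    "Date",
    "Experiment",
    "Scan number",
    "Laser beam powers",
    "Laser beam wavelengths",
    "Sample concentration",
    "Calibration factors",
]

# The example notes from the text, keyed by template field.
EXAMPLE_ENTRY = {
    "Date": "2010-06-05",
    "Experiment": "UV pump/visible probe transient absorption spectroscopy",
    "Scan number": "Scan #3",
    "Laser beam powers": "5 mW UV, visible beam is too weak to measure accurately",
    "Laser beam wavelengths": "266 nm UV, ~400-1000 nm visible",
    "Sample concentration": "5 mMol trans-stilbene in hexane",
    "Calibration factors": "UV beam is 4 microns, visible beam is 3 microns",
}


def missing_fields(entry, template=TEMPLATE_FIELDS):
    """Return the template fields that are absent or blank in a notes entry."""
    return [
        field
        for field in template
        if not str(entry.get(field, "")).strip()
    ]
```

An empty list from `missing_fields` means every item on the checklist was recorded; anything it returns is a detail to fill in before moving on to the next scan.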

Adapted from “Templates” by Kristin Briney (http://dataabinitio.com/?p=531), CC-BY.

README.txt Files

README.txt files are one of the best data management tools. The reason is that many of us keep notes separate from our digital data files, so our digital data is not always well documented or understandable at a glance. README.txt files cover this gap and allow you to add notes about the organization and content of your digital files and folders. This helps collaborators and your future self navigate through your data.

README.txt files originated with computer code, where the README is the first file someone should look at in order to understand the code (as implied by the name). Being a .txt file makes this information readable on a number of systems because of the simple file type. The simplicity and portability make READMEs a great tool to co-opt for data management.

It’s strongly recommended to use a README.txt file at the top level of your project folder to explain the purpose of the project, the relevant summary and contact details, and general organization of your files. This is equivalent to using the first page of your laboratory notebook to give a general description of your project.

Here is an example of a top-level README.txt file for an imaginary chemistry project:

Project: Beth’s important chemistry project
Date: June 2013-April 2014
Description: Description of my awesome project here
Funder: Department of Energy, grant no: XXXXXX
Contact: Beth Smith, beth@myemail.com

ORGANIZATION

All files live in the ‘ImportantProject’ folder, with content organized into subfolders as follows:
‘RawData’: All raw data goes into this folder, with subfolders organized by date
‘AnalyzedData’: Data analysis files
‘PaperDrafts’: Draft of paper, including text, figures, outlines, reference library, etc.
‘Documentation’: Scanned copies of my written research notes and other research notes
‘Miscellaneous’: Other information that relates to this project

NAMING

Raw data files will be named as follows:

“YYYYMMDD_experiment_sample_ExpNum” (ex: “20140224_UVVis_KMnO4_2.csv”)

STORAGE

All files will be stored on my computer and backed up daily to the shared department server. I will also keep a backup copy in the cloud.

If you hand someone this project folder, the README.txt contains enough information to understand the project and do basic navigation through the subfolders. Plus, the file tells you where all of the copies of the data live if one should accidentally be lost. While not extensive, this information is invaluable to someone unfamiliar with this work trying to find and use the files, such as a boss or coworker.
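Once a naming convention is written down in the README, it can also be checked automatically. The sketch below is one possible check for the “YYYYMMDD_experiment_sample_ExpNum” pattern from the example. It assumes that the experiment and sample parts contain only letters and digits and that the file extension is optional; the source does not state either rule. The function name `parse_raw_data_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed reading of "YYYYMMDD_experiment_sample_ExpNum": four parts joined
# by underscores, with an optional file extension such as ".csv".
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<experiment>[A-Za-z0-9]+)"
    r"_(?P<sample>[A-Za-z0-9]+)_(?P<exp_num>\d+)(?:\.\w+)?$"
)


def parse_raw_data_name(filename):
    """Split a raw data file name into its parts, or return None if it
    does not follow the naming convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        # Reject impossible dates such as month 13 or February 30.
        date = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return {
        "date": date.isoformat(),
        "experiment": match["experiment"],
        "sample": match["sample"],
        "exp_num": int(match["exp_num"]),
    }
```

Running this over the file names in the ‘RawData’ folder would flag any file for which the function returns None, which is a hint that the file needs renaming or that the convention in the README needs updating.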

Besides having one top-level README.txt file, it’s also good to use these text files throughout your digital file structure whenever you need them. If you cannot tell, at a glance, what all of the files and subfolders contain, you should create a README.txt (and possibly rename your files and folders!).

Here is an example of a low-level README.txt, which documents the differences between several versions of an analyzed dataset:

Description of files in the “Analysis/ReactionTime/KMnO4” folder

KMnO4rxn_v01: Organizing raw data into one spreadsheet
KMnO4rxn_v02: Trying out first-order reaction rate
KMnO4rxn_v03: Trying out second-order reaction rate
KMnO4rxn_v04: Revert back to v02/first-order fitting and refining analysis
KMnO4rxn_FINAL: Final fit and numbers for reaction rate

The graphs corresponding to each file version are in the ‘Graphs’ subfolder, with correspondence explained by the README.txt contained therein.

You can see that READMEs don’t have to be large files. Instead, they just need to contain enough information to know what you’re looking at.
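To find places where a short README might help, a small script can list the folders in a project that do not have one yet. This is only a sketch: the function name `folders_missing_readme` is hypothetical, and a folder on its list is just a candidate, since a folder whose contents are obvious at a glance may not need a README at all.

```python
from pathlib import Path


def folders_missing_readme(root, readme_name="README.txt"):
    """Return the folders under `root` (including `root` itself, shown
    as ".") that do not contain a file named `readme_name`.

    Paths are relative to `root`, use forward slashes, and are sorted.
    """
    root = Path(root)
    subfolders = sorted(path for path in root.rglob("*") if path.is_dir())
    missing = []
    for folder in [root, *subfolders]:
        if not (folder / readme_name).is_file():
            missing.append(folder.relative_to(root).as_posix())
    return missing
```

For example, `folders_missing_readme("ImportantProject")` would walk the imaginary project folder from the earlier example and report each subfolder that still lacks a README.txt.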

README.txt files are ostensibly for other people who might use your data, but they are also useful for you, the data creator, if and when you come back to an older set of data. We tend to forget small details over time, and a good README.txt serves as a reminder of those details and an easy way to reacquaint ourselves with our older data.

Adapted from “README.txt” by Kristin Briney (http://dataabinitio.com/?p=378), CC-BY.
