3.4 Data Quality Control

Data Management: Spreadsheet Best Practices

Spreadsheet Best Practice

VIDEO: https://www.youtube.com/watch?v=f11-0Ce1i3I


With so many researchers using spreadsheets, spreadsheet best practices are important for managing data well. These best practices emphasize computability over human readability (this may differ from the way you are currently using spreadsheets). The reason is that a computable spreadsheet can be easily reused in many different analysis programs. With many different software programs available for analysis, having a spreadsheet that is portable and reusable allows researchers to do more with their data.

To make spreadsheets computable, the first thing to do is streamline them. This means getting rid of extra formatting like highlighted cells, merged cells, and excess font variations. This formatting does not encode information in a computable way and, as with merged cells, can sometimes even hinder computability. If you are using formatting to convey important information, find a different way to add this information to the spreadsheet or add it to the spreadsheet’s documentation.

You should also collapse all of your data down to one large table wherever possible. This maximizes computability. Collapse smaller tables and place charts on a separate page. Spreadsheet pages should only contain data with one row at the top for column labels. You can include some documentation in the spreadsheet itself, but large amounts of documentation should be moved externally, such as in a data dictionary.

Another good practice is to choose good null values. You use a null value where you have no data to report, such as when you could not record a particular measurement for some reason. Null can be represented with a blank space, “null”, “NAN”, etc. The important thing is to choose a representation and be consistent. Also note that null is not the same as zero. Use “0” in your spreadsheet to denote when you took a measurement and that measurement was actually zero.

Finally, it’s a good idea to keep a backup copy of your data in its raw form. That way, if you mess up your analysis, it’s easy to revert back to the original data and start a new analysis. It’s also a good idea to keep a copy of your spreadsheet in .csv format if you plan use the data in multiple software programs.


No votes yet