Data Management: Collecting and Storing

This guide serves as a starting point for UNR faculty, students, and staff interested in research data management.

File Naming and Versioning

Versioning is the practice of naming and saving successive versions of your files so that earlier versions of your work remain accessible later on. Save each version as a separate file with a unique name that indicates the version number (e.g., v2 at the end of the file name). Be sure to document any changes made; some data collection tools will do this for you automatically.
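As a minimal sketch of this practice, the Python snippet below builds the next versioned file name from an existing one. The `_vN` suffix pattern and the helper name `next_version_path` are assumptions made for illustration; adapt them to whatever convention your project documents.

```python
import re
from pathlib import Path


def next_version_path(path: Path) -> Path:
    """Return the path with its _vN suffix incremented, or _v2 added if absent."""
    # Look for a trailing "_v<number>" in the file name (extension excluded).
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    if match:
        base, number = match.group(1), int(match.group(2)) + 1
    else:
        # An unversioned file is treated as version 1, so the next one is v2.
        base, number = path.stem, 2
    return path.with_name(f"{base}_v{number}{path.suffix}")


print(next_version_path(Path("DataValidation.xlsx")))     # DataValidation_v2.xlsx
print(next_version_path(Path("DataValidation_v2.xlsx")))  # DataValidation_v3.xlsx
```

The function only computes a name; saving the new version under that name, and noting what changed, is still up to you or your tools.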

It is best to use non-cryptic, intuitive names whenever possible: DataValidation rather than DatVal or DV. Remember that what seems like a clear shorthand at the time of an experiment may seem less so when you look back at the files at some future date.

File names should also be extensible. If you estimate ahead of time how many files there will be, you can choose the number of digits for any numbered element of the file name and pad it with leading zeros, so that the files sort properly.

Bad File Names

RawData1.xlsx

RawData10.xlsx

RawData2.xlsx

Good File Names

RawData01.xlsx

RawData02.xlsx

...

RawData10.xlsx
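The sorting difference between the two lists can be checked with a short Python sketch. The helper `padded_names` is a hypothetical name used for illustration; it picks the padding width from the expected number of files.

```python
def padded_names(prefix: str, count: int, extension: str) -> list[str]:
    """Build file names numbered 1..count, zero-padded to a common width."""
    width = len(str(count))  # e.g. 10 files -> 2 digits, 250 files -> 3 digits
    return [f"{prefix}{i:0{width}d}{extension}" for i in range(1, count + 1)]


unpadded = [f"RawData{i}.xlsx" for i in range(1, 11)]
padded = padded_names("RawData", 10, ".xlsx")

# Plain alphabetical sorting puts file 10 before file 2 when names are unpadded.
print(sorted(unpadded)[:3])  # ['RawData1.xlsx', 'RawData10.xlsx', 'RawData2.xlsx']
print(sorted(padded)[:3])    # ['RawData01.xlsx', 'RawData02.xlsx', 'RawData03.xlsx']
```

If there is any chance the file count will grow past the estimate, choosing one extra digit up front avoids renaming files later.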

Finally, if you use consistent, documented names, it will be easy to tell what is in each file, and easier for others to decipher. In this example, the file name contains the experiment name, experiment number, sample number, stain, image coordinates, and stage of data processing.

AtherRat_012_056_mb_0423_raw.csv

AtherRat = experiment name

012   = experiment number

056   = sample number

mb   = stain used, methylene blue

0423  = image coordinates, 2 digits each (04 across, 23 down)

raw   = stage of data processing
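A consistent convention like this one can also be read by a program. The following Python sketch splits the example name into its parts; the class name, field names, and the assumption that every name has exactly six underscore-separated elements are illustrative, not part of the guide.

```python
from pathlib import Path
from typing import NamedTuple


class ImageFileName(NamedTuple):
    experiment: str
    experiment_number: int
    sample_number: int
    stain: str
    across: int
    down: int
    stage: str


def parse_file_name(file_name: str) -> ImageFileName:
    """Split a name such as AtherRat_012_056_mb_0423_raw.csv into its elements."""
    parts = Path(file_name).stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"Expected 6 elements, got {len(parts)}: {file_name}")
    experiment, exp_no, sample_no, stain, coords, stage = parts
    if len(coords) != 4 or not coords.isdigit():
        raise ValueError(f"Coordinates must be 4 digits: {coords}")
    # The first two digits are the position across, the last two the position down.
    return ImageFileName(experiment, int(exp_no), int(sample_no), stain,
                         int(coords[:2]), int(coords[2:]), stage)


info = parse_file_name("AtherRat_012_056_mb_0423_raw.csv")
print(info.stain, info.across, info.down, info.stage)  # mb 4 23 raw
```

Names that do not follow the convention raise an error, which makes inconsistently named files easy to spot.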

Data Storage Best Practices

Throughout your research process, it is crucial to track and document where your data will be stored at each stage. Plan this out ahead of time so you always know what data is located where. NOTE: Be sure to check cloud storage ownership policies; avoid storing data with a cloud provider that can claim ownership of your data.

Some data storage best practices include:

  • Make 3 copies (original + external/local + external/remote)
  • Have them geographically distributed (local vs. remote)
  • Use a hard drive backup (e.g., Windows Backup and Restore, Mac Time Machine, UNIX rsync) or a tape backup system
  • Use cloud storage; some examples of private sector storage resources include Amazon S3, Elephant Drive, Jungle Disk, Mozy, and Carbonite
  • Store your data unencrypted when possible, because that makes it most easily read by you and others in the future. If you do need to encrypt your data, for example because it involves human subjects, keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file
  • Store your data uncompressed when possible; if you need to compress it to conserve space, limit compression to your 3rd backup copy
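The "3 copies" practice above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `back_up` is hypothetical, the "remote" location is assumed to be reachable as an ordinary directory (for example a mounted network drive), and the byte-for-byte comparison after each copy is an extra check that the guide does not require. Dedicated backup tools such as those listed above are usually the better choice for real data.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def back_up(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy a file into each backup directory and confirm every copy matches."""
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 keeps timestamps and returns the path of the new file.
        copy = Path(shutil.copy2(original, directory))
        # Compare contents byte for byte, not just size and modification time.
        if not filecmp.cmp(original, copy, shallow=False):
            raise OSError(f"Copy at {copy} differs from {original}")
        copies.append(copy)
    return copies


if __name__ == "__main__":
    # Demonstration in a temporary folder; real backups belong on separate
    # devices in separate places.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "RawData01.csv"
        original.write_text("sample,value\n1,0.5\n")
        copies = back_up(original, [root / "external_local", root / "external_remote"])
        print(len(copies) + 1, "copies in total")  # 3 copies in total
```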

Data Management: The 3-2-1 Rule

Here's another short data management video from the University of Wisconsin-Milwaukee Data Libraries' Services. This one outlines a few data storage tips.

Electronic Lab Notebooks

Electronic lab notebooks (ELNs) are another tool to consider using for proper data documentation. For help choosing an ELN that works best for your project, check out this article published in Nature (August 2018) as well as some of the resources below.