Data Labs Tutorial at WHOI

Temperature profiles from 2 Pioneer Array Profilers, showing data from May 2019.

Today I had the opportunity to virtually present an introduction to the Ocean Data Labs project, along with a short tutorial on working with OOI Profiler data, to the WHOI Ocean Informatics Working Group. We had over 40 participants attend, including undergraduate students, faculty and career scientists.

Weather Radar image as Tropical Storm Isaias passes over NJ.

(Oh, and Tropical Storm Isaias was barreling down on me here in NJ. Luckily the power held out at my house… but if I had actually been in the office, I wouldn’t have been so lucky. Go figure.)

It was a fun session to put together, though I’m sure it would have been far easier in person. I covered a lot of ground (and in retrospect, probably way too much). But the content was partly driven by the diverse audience we had. The participants ranged not only from undergrads to tenured faculty, but also from Python novices to experts, and from OOI newbies to old hands.

Here are just a few of the topics we tried to cover…

  • An introduction to the Data Labs project and our Community of Practice
  • A brief overview of the power and benefits of python notebooks for reproducible research
  • A nickel tour of the OOI
  • The Google Colab interface
  • Tricks to loading OOI datasets in Python
    • Including working with THREDDS, NetCDF files, loading single vs. multiple files, and the #fillmismatch hack
  • The differences between Pandas and Xarray, and the advantages of each, for loading and working with datasets
  • And finally, actually visualizing some fun data from the Pioneer Array Profilers
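
To give a flavor of the loading tricks in the list above, here is a minimal sketch. It is an illustration only, not the code from the tutorial notebook. The function names and the example.org URL are made up, and it assumes the files sit on a THREDDS server that exposes both the standard fileServer and dodsC (OPeNDAP) paths. The one piece taken from the session is the #fillmismatch suffix on the URL.

```python
def to_opendap_url(file_url):
    """Turn a THREDDS fileServer URL into an OPeNDAP URL ending in #fillmismatch.

    Hypothetical helper: assumes the server uses the standard THREDDS
    /thredds/fileServer/ and /thredds/dodsC/ service paths.
    """
    url = file_url.replace("/thredds/fileServer/", "/thredds/dodsC/")
    # The #fillmismatch suffix asks the netCDF library to tolerate fill
    # values whose type does not match the variable's type.
    if not url.endswith("#fillmismatch"):
        url += "#fillmismatch"
    return url


def open_profiler_files(file_urls):
    """Open one or several NetCDF files as a single xarray Dataset (a sketch)."""
    import xarray as xr  # imported here so the URL helper works without xarray

    urls = [to_opendap_url(u) for u in file_urls]
    if len(urls) == 1:
        return xr.open_dataset(urls[0])   # single file
    return xr.open_mfdataset(urls)        # multiple files, combined lazily


# Placeholder URL, not a real OOI dataset
example = "https://example.org/thredds/fileServer/demo/profiler_a.nc"
print(to_opendap_url(example))
```

From there, calling to_dataframe() on the resulting Dataset gives you a Pandas DataFrame if you prefer working with a table, while staying in Xarray keeps the dimensions and metadata attached to the data.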

In the end we didn’t get too far through the notebook I created. But that’s the beauty of notebooks. All of the text, code and practice exercises I prepared are there for participants to look at at their leisure.

The reality is that there is a lot to learn: the OOI and observatories like it produce large, complex datasets, the processes they are designed to measure are themselves complex, and tools like Python are new to many people. In a 1-hour tutorial, you can barely scratch the surface. To really delve into these topics, you need a full course (at a minimum) to practice all the skills that help you discover the stories hidden in the data. One day, I’d love to lead one. If you’re interested in making this happen, please ping me.

But in an hour, you can definitely excite people about what’s possible. And that’s always my goal. So if you attended today, I hope you found some benefit, and are eager to learn more.

For those who are interested in diving into what was presented today, you can check out the slides I presented (ppt, pdf), as well as the webinar recording, and the full notebook below. If you would like to open the notebook in Colab to play with it yourself, you can use this link.

Special thanks to Stace Beaulieu at WHOI for organizing the event and virtually hosting me!