We now live in an ocean of data.

And of course, that is literally true for those of us who study the ocean.

We’ve come a long way from the early days of oceanography, when scientists like Nansen, Ekman, and Bjerknes might collect a few dozen data points while on a ship, or from their calculations, or from a lab experiment, and then painstakingly draw graphs of their data by hand. (Ekman’s classic 1905 paper is a great reminder of what science was like decades before the first computer, and how much awesome stuff they still could do.)

But now, with today’s modern instruments and ocean observatories, we can collect thousands of data points every day from dozens of instruments at the same location or spread across the world. This is both a blessing and a curse. Thanks to these new tools, we can study the ocean in more detail and at larger and longer scales than ever before. But on the downside, there is no way human hands or minds can make sense of all of this data without help. That is why learning how to program is now a skill that all oceanographers need. While most students don’t have to become expert programmers, they do need to learn enough to process, analyze, and visualize the datasets they hope to use in their research.

A Virtual REU

Timeseries plot of ocean and air temperatures at NDBC Station 44025 off the coast of NJ.
This past summer, we put together our first Virtual REU (Research Experience for Undergraduates) in response to the cancellation of many traditional REUs due to the pandemic. Because we couldn’t take our students out to sea, we focused on teaching them how to utilize datasets we already have in hand, like the treasure-trove of data from the OOI. Of course, there’s not much you can do with the raw OOI dataset using a tool like Excel, let alone pencil and paper, so we decided it was important to provide students with a basic primer on oceanographic programming before they dove into their research projects.

Below is the first of 4 Python notebooks I developed this summer to support our students during the 2-week mini-workshop we ran prior to students’ 6-week research experience.

In the end, we only used 2 of the notebooks. (Developing 2x more than I need tends to be my style.) But I hope to share all of these notebooks with our Data Labs community over the next few weeks, in the hopes that you might find them helpful for developing your own classes or courses for introducing basic data processing skills using Python to your students.

Activity 1 – Python Basics & NDBC Weather Data

This first notebook (below) ended up requiring 2 sessions to cover.

In the first session, we highlighted why learning a programming tool like Python is important to becoming an oceanographer. (Here are the slides I used.)

Specifically, we covered:

  • A quick introduction to Google Colab
  • The importance of Reproducible Research (check out The Scientific Paper is Obsolete from the Atlantic)
  • How programming notebooks help with reproducible research and collaboration
  • And some Python programming basics.

The second session was far more fun. We focused on the bottom half of the notebook, which demonstrates how, with a few lines of code, students can quickly access and plot data from NDBC. After a quick demo, we broke students up into small groups (using Zoom’s breakout rooms feature) and asked them to make a plot or two to show the full class at the end. A few students had some familiarity with programming, and we made sure they were dispersed throughout the small groups, so each group had a “ringer” to help.

More importantly, we focused on using the NDBC dataset for two key reasons.

  1. NDBC moorings are primarily supported by the National Weather Service, and thus focus on weather measurements like air/water temperatures, barometric pressure, and winds that should be familiar to students.
  2. The NDBC data portal, and specifically their DODS interface, makes it easy to access data from hundreds of buoys around the world. This allowed students to choose a research question that was of interest to them, and have plenty of options to choose from.

To my mind, NDBC is the best data center available that a) is easy to access, b) has a wide geographic reach, and c) has datasets that are easy to interpret. While trying to introduce students to programming, data processing and data visualization, I feel it’s better to keep the data as simple as possible to keep the cognitive load down. Plus being able to understand and interpret the results can help students increase their confidence as they build all of these skills.

Teaching is hard enough. Introducing students to programming, data visualization and interpreting messy real-world data requires a lot of flexibility. (And that’s before we even get into the challenges of remote learning.) The NDBC dataset, which we continued to use for a mini-research project as part of the 2-week workshop, made this easier and more fun.

This was my first attempt at teaching all these skills at once, and I learned a lot myself. So while this notebook is far from perfect, I hope you still find it helpful.

This post is part of our 2020 Summer REU Intro to Python series. See also Part 2, Part 3 and Part 4.

Activity 1 - Python Basics & NDBC Weather Data

2020 Data Labs REU

Written by Sage Lichtenwalner, Rutgers University, June 9, 2020

Welcome to Python! In this notebook, we will demonstrate how you can quickly get started programming in Python, using Google's cool Colaboratory platform. Colab is basically a free service that can run Python/Jupyter notebooks in the cloud.

In this notebook, we will demonstrate some of the basics of programming in Python. If you want to learn more, there are lots of other resources and training sessions out there, including the official Python Tutorial. But as an oceanographer, you don't really need to know all the ins-and-outs of programming (though it helps), especially when just starting out.

Over the next few sessions we will cover many of the basic recipes you need to:

  • Quickly load some data
  • Make some quick plots, and make them look good
  • Calculate a few basic statistics and averages
  • And save the data to a new file you can use elsewhere.

Getting Started

Jupyter notebooks have two kinds of cells: "Markdown" cells, like this one, which contain formatted text, and "Code" cells, which contain the code you will run.

To execute the code in a cell, you can either:

  • click the Play icon on the left
  • type Cmd (or Ctrl) + Enter to run the cell in place
  • or type Shift + Enter to run the cell and move the focus to the next cell.

You can try all these options on our first very elaborate piece of code in the next cell.

After you execute the cell, the result will automatically display underneath the cell.

In [0]:
2+2
In [0]:
print("Hello, world!")
In [0]:
# This is a comment

As we go through the notebooks, you can add your own comments or text blocks to save your notes.

In [0]:
# Your Turn: Create your own print() command here with your name
print()

A note about print()

  • By default, a Colab/Jupyter notebook will print out the output from the last line, so you don't have to specify the print() command.
  • However, if we want to output the results from additional lines (as we do below), we need to use print() on each line.
  • You can also suppress the output from the last line by adding a semicolon (;) at the end.
In [0]:
3
4
5
In [0]:
print(3)
print(4)
print(5)

Some Basics

Let's review a few basic features of programming.

First, it's great for math. You can use addition (+), subtraction (-), multiplication (*), division (/) and exponents (**).

In [0]:
# Your Turn: Try some math here
5*2

The order of operations is also important.

In [0]:
print(5 * 2 + 3)
print(5 * (2+3))
print((5 * 2) + 3)

Variables

In [0]:
# We can easily assign variables, just like in other languages
x = 4
y = 2.5
In [0]:
# And we can use them in our formulas
print(x + y)
print(x/y)
In [0]:
# What kind of objects are these?
print(type(x))
print(type(y))

Strings

In [0]:
# A string needs to be in quotes (single or double)
z = 'Python is great'
z
In [0]:
# You can't concatenate (add) strings and integers
# (running this cell raises a TypeError)
print( z + x )
In [0]:
# But you can multiply them!
print( z * x )
In [0]:
# If you convert an integer into a string, you can then concatenate them
print( z + ' ' + str(x) + ' you!' )
In [0]:
# A better way
print( 'Python is great %s you!' % x )

Fun with Lists

Remember, Python uses 0-based indexes, so to grab the first element in a list you actually use "0". The last element is n-1, or just "-1" for short. In Matlab this would be 1 to n, or 1:end.

In [0]:
my_list = [3, 4, 5, 9, 12, 13]
In [0]:
# The first item
my_list[0]
In [0]:
# The last item
my_list[-1]
In [0]:
# Extract a subset
my_list[2:5]
In [0]:
# A subset from the end
my_list[-3:]
In [0]:
# Update a value
my_list[3] = 99
my_list
In [0]:
# Warning, Python variables are object references and not copies by default
my_second_list = my_list
print( my_second_list )

my_second_list[0] = 66

print( my_second_list )
print( my_list ) # The original list has changed too
In [0]:
# To avoid this, create a copy of the list, which keeps the original intact
my_list = [3, 4, 5, 9, 12]

my_second_list = list(my_list) # You can also use copy.copy() or my_list[:]

my_second_list[0] = 66

print( my_second_list )
print( my_list )

Arrays

Note, a list is not an array by default. But we can turn it into an array using the NumPy library.

NumPy is an essential library for working with scientific data. It provides an array object that is very similar to Matlab's array functionality, allowing you to perform mathematical calculations or run linear algebra routines.

In [0]:
# Multiplying a list by an integer repeats it, rather than doing element-wise math
my_list * x
In [0]:
import numpy as np
In [0]:
a = np.array(my_list)
a * x

Note, we won't be explicitly creating NumPy arrays much in this course. But later on, when we load datasets using Pandas or Xarray, the actual arrays under the hood will be NumPy arrays.
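As a quick illustration of that point, you can peek under the hood of a Pandas Series (a minimal sketch; we cover Pandas properly in a later session) and see the NumPy array inside:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 4, 5])

# The underlying storage of a Pandas Series is a NumPy array
print(type(s.to_numpy()))  # → <class 'numpy.ndarray'>
```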

Dictionaries

These are a great way to store structured data of different types. You'll often find metadata information inside dictionaries.

In [0]:
my_dict = {'temperature': 21, 'salinity':35, 'sensor':'CTD 23'}
my_dict
In [0]:
# Grab a list of dictionary keys
my_dict.keys()
In [0]:
# Accessing a key/value pair
my_dict['sensor']

Functions, Conditions and Loops

If you're familiar with how to do these in Matlab or R, it's all very similar, just with a different syntax.

Remember, Python uses indentation to group together sub-elements, rather than parentheses, curly braces, or end statements. Conventionally, you indent lines with 2 or 4 spaces.

In [0]:
def times_two(num):
  return num * 2
In [0]:
times_two(3)
In [0]:
def my_name(name='Sage'):
  return name
In [0]:
my_name()

Here's one quick example that demonstrates how to define a function, use a conditional, and iterate over a for loop all at once.

In [0]:
# A more complicated function
def my_func(number):
  print('Running my_func')
  if type(number)==int:
    for i in range(number):
      print(i)
  else:
    print("Not a number")
In [0]:
my_func('Test')
In [0]:
my_func(4)

Fun with NDBC Data

Now that we've covered some basics, let's start having some fun with actual ocean data.

The National Data Buoy Center (NDBC) provides a great dataset to start with. And for this example, we'll use my favorite buoy Station 44025.

NDBC Mid-Atlantic Station Map

To load datasets like this, there are 2 popular libraries we can use.

  • Pandas
    • Great for working with "spreadsheet-like" tables that have headers and rows, like Excel or CSV files
    • Can easily load text or CSV files
  • Xarray
    • Supports multidimensional arrays (e.g. x,y,z,t)
    • Can open NetCDF files or data from Thredds servers which are common in Oceanography
    • If you're using a Thredds server, you don't have to load all the data to use it

NDBC actually makes their data available in a variety of ways. Text files are often more intuitive. However, the NDBC text files require a few hoops to load and use (each file is a separate year, dates are in multiple columns, etc.).

Luckily, NDBC also provides a THREDDS server (their DODS interface), which we can use to quickly load some data to play with.

In [0]:
!pip install netcdf4
import xarray as xr
In [0]:
data = xr.open_dataset('https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/44025/44025.ncml')
In [0]:
# The Dataset
data
In [0]:
# Let's look at one variable
data.air_temperature
In [0]:
# And one piece of metadata
data.air_temperature.long_name
In [0]:
# Now let's make a quick plot
data.air_temperature.plot();
In [0]:
# Let's subset the data in time
data2 = data.sel(time=slice('2019-01-01','2020-01-01'))
In [0]:
# Let's make that quick plot again
data2.air_temperature.plot();
In [0]:
import matplotlib.pyplot as plt
In [0]:
# We can even plot 2 variables on one graph
data2.air_temperature.plot(label="Air Temperature")
data2.sea_surface_temperature.plot(label="Sea Surface Temperature")
plt.legend();

Tomorrow, we'll delve a lot more into data visualization and many of the other plotting commands you can use. But now, it's your turn to create your own plots.

Try plotting different:

  • Variables (see options above)
  • Time ranges (you will need to re-run the time slice above)
  • Different stations (you will need to change the dataset URL). Check out the NDBC homepage for available stations
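To make swapping stations easier, you could wrap the URL pattern from the cell above in a small helper function. This is just a sketch: the helper name is made up, and it simply substitutes the station ID into the same DODS path we used for 44025.

```python
# Build the NDBC DODS URL for a standard meteorological station,
# following the same pattern as the 44025 example above
def ndbc_stdmet_url(station_id):
    return ('https://dods.ndbc.noaa.gov/thredds/dodsC/'
            f'data/stdmet/{station_id}/{station_id}.ncml')

print(ndbc_stdmet_url('44025'))
# → https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/44025/44025.ncml
```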

As you create your graphs, try to write figure captions that describe what you think is going on.
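A caption is much easier to write when the figure itself is well labeled. Here is a minimal sketch of adding a title, axis labels, and a legend; it uses made-up stand-in data so it runs anywhere, but your real plots would use the xarray dataset loaded above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (replace with variables from your dataset)
days = np.arange(30)
temp = 20 + 3 * np.sin(2 * np.pi * days / 30)

fig, ax = plt.subplots()
ax.plot(days, temp, label='Air Temperature')
ax.set_xlabel('Day')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Station 44025 (example with synthetic data)')
ax.legend()
```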

In [0]:
# Your Turn: Create some plots 
In [0]:
 
In [0]:
 
In [0]:
 
In [0]:
 