We now live in an ocean of data.

And of course, that is literally true for those of us who study the ocean.

We’ve come a long way from the early days of oceanography, when scientists like Nansen, Ekman, and Bjerknes might collect a few dozen data points while on a ship, or from their calculations, or from a lab experiment, and then painstakingly draw graphs of their data by hand. (Ekman’s classic 1905 paper is a great reminder of what science was like decades before the first computer, and how much awesome stuff they still could do.)

But now, with today’s modern instruments and ocean observatories, we can collect thousands of data points every day from dozens of instruments at the same location or spread across the world. This is both a blessing and a curse. Thanks to these new tools, we can study the ocean in more detail and at larger and longer scales than ever before. But on the downside, there is no way human hands or minds can make sense of all of this data without help. That is why learning how to program is now a skill that all oceanographers need. While most students don’t have to become expert programmers, they do need to learn enough to process, analyze, and visualize the datasets they hope to use in their research.

A Virtual REU

Timeseries plot of ocean and air temperatures at NDBC Station 44025 off the coast of NJ.
This past summer, we put together our first Virtual REU (Research Experience for Undergraduates) in response to the cancellation of many traditional REUs due to the pandemic. Because we couldn’t take our students out to sea, we focused on teaching them how to utilize datasets we already have in hand, like the treasure-trove of data from the OOI. Of course, there’s not much you can do with the raw OOI dataset using a tool like Excel, let alone pencil and paper, so we decided it was important to provide students with a basic primer on oceanographic programming before they dove into their research projects.

Below is the first of 4 Python notebooks I developed this summer to support our students during the 2-week mini-workshop we ran prior to students’ 6-week research experience.

In the end, we only used 2 of the notebooks. (Developing 2x more than I need tends to be my style.) But I hope to share all of these notebooks with our Data Labs community over the next few weeks, in the hopes that you might find them helpful for developing your own classes or courses for introducing basic data processing skills using Python to your students.

Activity 1 – Python Basics & NDBC Weather Data

This first notebook (below) ended up requiring 2 sessions to cover.

In the first session, we highlighted why learning a programming tool like Python is important to becoming an oceanographer. (Here are the slides I used.)

Specifically, we covered:

  • A quick introduction to Google Colab
  • The importance of Reproducible Research (check out The Scientific Paper is Obsolete from the Atlantic)
  • How programming notebooks help with reproducible research and collaboration
  • And some Python programming basics.

The second session was far more fun. We focused on the bottom half of the notebook, which demonstrates how, with a few lines of code, students can quickly access and plot data from NDBC. After a quick demo, we broke students up into small groups (using Zoom’s breakout rooms feature) and asked them to make a plot or two to show the full class at the end. A few students had some familiarity with programming, and we made sure they were dispersed throughout the small groups, so each group had a “ringer” to help.

More importantly, we focused on using the NDBC dataset for two key reasons.

  1. NDBC moorings are primarily supported by the National Weather Service, and thus focus on weather measurements like air/water temperatures, barometric pressure, and winds that should be familiar to students.
  2. The NDBC data portal, and specifically their DODS interface, makes it easy to access data from hundreds of buoys around the world. This allowed students to choose a research question that was of interest to them, and have plenty of options to choose from.

To my mind, NDBC is the best data center available that a) is easy to access, b) has a wide geographic reach, and c) has datasets that are easy to interpret. While trying to introduce students to programming, data processing and data visualization, I feel it’s better to keep the data as simple as possible to keep the cognitive load down. Plus being able to understand and interpret the results can help students increase their confidence as they build all of these skills.

Teaching is hard enough. Introducing students to programming, data visualization and interpreting messy real-world data requires a lot of flexibility. (And that’s before we even get into the challenges of remote learning.) The NDBC dataset, which we continued to use for a mini-research project as part of the 2-week workshop, made this easier and more fun.

This was my first attempt at teaching all these skills at once, and I learned a lot myself. So while this notebook is far from perfect, I hope you still find it helpful.

This post is part of our 2020 Summer REU Intro to Python series. See also Part 2, Part 3 and Part 4.

Activity 1 - Python Basics & NDBC Weather Data

2020 Data Labs REU

Written by Sage Lichtenwalner, Rutgers University, June 9, 2020

Welcome to Python! In this notebook, we will demonstrate how you can quickly get started programming in Python, using Google's cool Colaboratory platform. Colab is basically a free service that can run Python/Jupyter notebooks in the cloud.

In this notebook, we will demonstrate some of the basics of programming in Python. If you want to learn more, there are lots of other resources and training sessions out there, including the official Python Tutorial. But as an oceanographer, you don't really need to know all the ins-and-outs of programming (though it helps), especially when just starting out.

Over the next few sessions we will cover many of the basic recipes you need to:

  • Quickly load some data
  • Make some quick plots, and make them look good
  • Calculate a few basic statistics and averages
  • And save the data to a new file you can use elsewhere.

Getting Started

Jupyter notebooks have two kinds of cells: "Markdown" cells, like this one, which contain formatted text, and "Code" cells, which contain the code you will run.

To execute the code in a cell, you can either:

  • click the Play icon on the left
  • type Cmd (or Ctrl) + Enter to run the cell in place
  • or type Shift + Enter to run the cell and move the focus to the next cell.

You can try all these options on our first very elaborate piece of code in the next cell.

After you execute the cell, the result will automatically display underneath the cell.

In [0]:
2+2
In [0]:
print("Hello, world!")
In [0]:
# This is a comment

As we go through the notebooks, you can add your own comments or text blocks to save your notes.

In [0]:
# Your Turn: Create your own print() command here with your name
print()

A note about print()

  • By default, a Colab/Jupyter notebook will print out the output from the last line, so you don't have to specify the print() command.
  • However, if we want to output the results from additional lines (as we do below), we need to use print() on each line.
  • You can also suppress the output from the last line by adding a semicolon (;) at the end.
In [0]:
3
4
5
In [0]:
print(3)
print(4)
print(5)

Some Basics

Let's review a few basic features of programming.

First, it's great for math. You can use addition (+), subtraction (-), multiplication (*), division (/) and exponents (**).

In [0]:
# Your Turn: Try some math here
5*2

The order of operations is also important.

In [0]:
print(5 * 2 + 3)
print(5 * (2+3))
print((5 * 2) + 3)

Variables

In [0]:
# We can easily assign variables, just like in other languages
x = 4
y = 2.5
In [0]:
# And we can use them in our formulas
print(x + y)
print(x/y)
In [0]:
# What kind of objects are these?
print(type(x))
print(type(y))

Strings

In [0]:
# A string needs to be in quotes (single or double)
z = 'Python is great'
z
In [0]:
# You can't concatenate (add) strings and integers
# (running this cell raises a TypeError)
print( z + x )
In [0]:
# But you can multiply them!
print( z * x )
In [0]:
# If you convert an integer into a string, you can then concatenate them
print( z + ' ' + str(x) + ' you!' )
In [0]:
# A better way
print( 'Python is great %s you!' % x )

Fun with Lists

Remember, Python uses 0-based indexes, so to grab the first element in a list you actually use "0". The last element is n-1, or just "-1" for short. In Matlab this would be 1 to n, or 1:end.

In [0]:
my_list = [3, 4, 5, 9, 12, 13]
In [0]:
# The first item
my_list[0]
In [0]:
# The last item
my_list[-1]
In [0]:
# Extract a subset
my_list[2:5]
In [0]:
# A subset from the end
my_list[-3:]
In [0]:
# Update a value
my_list[3] = 99
my_list
In [0]:
# Warning, Python variables are object references and not copies by default
my_second_list = my_list
print( my_second_list )

my_second_list[0] = 66

print( my_second_list )
print( my_list ) # The original list has changed too
In [0]:
# To avoid this, create a copy of the list, which keeps the original intact
my_list = [3, 4, 5, 9, 12]

my_second_list = list(my_list) # You can also use copy.copy() or my_list[:]

my_second_list[0] = 66

print( my_second_list )
print( my_list )

Arrays

Note, a list is not an array by default. But we can turn it into an array using the NumPy library.

NumPy is an essential library for working with scientific data. It provides an array object that is very similar to Matlab's array functionality, allowing you to perform mathematical calculations or run linear algebra routines.

In [0]:
# Multiplying a list by an integer repeats it, rather than doing element-wise math
my_list * x
In [0]:
import numpy as np
In [0]:
a = np.array(my_list)
a * x

Note, we won't be explicitly creating NumPy arrays much in this course. But later on, when we load datasets using Pandas or Xarray, the actual arrays under the hood will be NumPy arrays.
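As a quick illustration of that point, you can peek under the hood of a Pandas Series (a minimal sketch; we cover Pandas properly in a later session) and see the NumPy array inside:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 4, 5])

# The underlying storage of a Pandas Series is a NumPy array
print(type(s.to_numpy()))  # → <class 'numpy.ndarray'>
```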

Dictionaries

These are a great way to store structured data of different types. You'll often find metadata information inside dictionaries.

In [0]:
my_dict = {'temperature': 21, 'salinity':35, 'sensor':'CTD 23'}
my_dict
In [0]:
# Grab a list of dictionary keys
my_dict.keys()
In [0]:
# Accessing a key/value pair
my_dict['sensor']

Functions, Conditions and Loops

If you're familiar with how to do these in Matlab or R, it's all very similar, just with a different syntax.

Remember, Python uses indentation to group together sub-elements, rather than parentheses, curly braces, or end statements. Conventionally, you indent lines with 2 or 4 spaces.

In [0]:
def times_two(num):
  return num * 2
In [0]:
times_two(3)
In [0]:
def my_name(name='Sage'):
  return name
In [0]:
my_name()

Here's one quick example that demonstrates how to define a function, use a conditional, and iterate over a for loop all at once.

In [0]:
# A more complicated function
def my_func(number):
  print('Running my_func')
  if type(number)==int:
    for i in range(number):
      print(i)
  else:
    print("Not a number")
In [0]:
my_func('Test')
In [0]:
my_func(4)

Fun with NDBC Data

Now that we've covered some basics, let's start having some fun with actual ocean data.

The National Data Buoy Center (NDBC) provides a great dataset to start with. And for this example, we'll use my favorite buoy Station 44025.

NDBC Mid-Atlantic Station Map

To load datasets like this, there are 2 popular libraries we can use.

  • Pandas
    • Great for working with "spreadsheet-like" tables that have headers and rows, like Excel or CSV files
    • Can easily load text or CSV files
  • Xarray
    • Supports multidimensional arrays (e.g. x,y,z,t)
    • Can open NetCDF files or data from Thredds servers which are common in Oceanography
    • If you're using a Thredds server, you don't have to load all the data to use it

NDBC actually makes their data available in a variety of ways. Text files are often more intuitive. However, the NDBC text files require a few hoops to load and use (each file is a separate year, dates are in multiple columns, etc.).

Luckily, NDBC also provides a THREDDS server (their DODS interface), which we can use to quickly load some data to play with.

In [0]:
!pip install netcdf4
import xarray as xr
In [0]:
data = xr.open_dataset('https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/44025/44025.ncml')
In [0]:
# The Dataset
data
In [0]:
# Let's look at one variable
data.air_temperature
In [0]:
# And one piece of metadata
data.air_temperature.long_name
In [0]:
# Now let's make a quick plot
data.air_temperature.plot();
In [0]:
# Let's subset the data in time
data2 = data.sel(time=slice('2019-01-01','2020-01-01'))
In [0]:
# Let's make that quick plot again
data2.air_temperature.plot();
In [0]:
import matplotlib.pyplot as plt
In [0]:
# We can even plot 2 variables on one graph
data2.air_temperature.plot(label="Air Temperature")
data2.sea_surface_temperature.plot(label="Sea Surface Temperature")
plt.legend();

Tomorrow, we'll delve a lot more into data visualization and many of the other plotting commands you can use. But now, it's your turn to create your own plots.

Try plotting different:

  • Variables (see options above)
  • Time ranges (you will need to re-run the time slice above)
  • Different stations (you will need to change the dataset URL). Check out the NDBC homepage for available stations
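To make swapping stations easier, you could wrap the URL pattern from the cell above in a small helper function. This is just a sketch: the helper name is made up, and it simply substitutes the station ID into the same DODS path we used for 44025.

```python
# Build the NDBC DODS URL for a standard meteorological station,
# following the same pattern as the 44025 example above
def ndbc_stdmet_url(station_id):
    return ('https://dods.ndbc.noaa.gov/thredds/dodsC/'
            f'data/stdmet/{station_id}/{station_id}.ncml')

print(ndbc_stdmet_url('44025'))
# → https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/44025/44025.ncml
```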

As you create your graphs, try to write figure captions that describe what you think is going on.
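A caption is much easier to write when the figure itself is well labeled. Here is a minimal sketch of adding a title, axis labels, and a legend; it uses made-up stand-in data so it runs anywhere, but your real plots would use the xarray dataset loaded above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (replace with variables from your dataset)
days = np.arange(30)
temp = 20 + 3 * np.sin(2 * np.pi * days / 30)

fig, ax = plt.subplots()
ax.plot(days, temp, label='Air Temperature')
ax.set_xlabel('Day')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Station 44025 (example with synthetic data)')
ax.legend()
```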

In [0]:
# Your Turn: Create some plots 
In [0]:
 
In [0]:
 
In [0]:
 
In [0]:
 