I love creating data visualizations. I always feel like I’m on an adventure when I take some data, visualize it in a number of different ways, and try to make sense of what secrets are hidden inside. Oftentimes you have a good idea of what you hope to find, but many times you’re surprised by something you didn’t know or hadn’t thought of. And that’s the power of visualizations. They help us see a lot more inside the dataset than just looking at the numbers themselves.
In oceanography, we usually have a lot of data to look at, and it’s rare that we can look at the raw numbers by themselves (though we should always give them a good glance to make sure they’re reasonable). If we have an instrument that takes one measurement every hour, then in one year we’ll already have 8,760 data points to digest. If we then have multiple instruments, or multiple years of data, to say nothing of data measured at higher resolutions, we’ll have far more to deal with. Thus, data visualizations that can show hundreds, thousands, or even hundreds of thousands of data points in a way that makes sense are an essential component of an oceanographer’s toolkit.
Over the next year on this blog, I hope to regularly share cool datasets and visualization techniques that highlight some of the interesting data stories hidden inside the OOI dataset. My goal is not to delve into cutting-edge or esoteric research topics. Rather, it is to demystify some of the basic oceanographic processes and analysis techniques that undergraduate faculty and students can use with OOI data as they’re just getting started on their adventure into oceanography.
To start off this series, let’s dive into one of the most basic oceanographic datasets: air and seawater temperatures.
The Annual Dance
The image above shows the hourly air and seawater (ocean) temperatures from the Coastal Pioneer Offshore Surface Mooring over all of 2018. The air temperature sensor sits on the top of the buoy, about 3 m above the ocean’s surface, while the seawater temperature is recorded from a CTD about 2 m below the surface.
At first glance, we can easily see that there is a clear annual cycle in both datasets, which should match up with the experiences most students have had (at least for those in the mid-latitudes, which is where this buoy is located).
Certainly, showing only one of these datasets (either air or seawater) would accomplish this same goal. And if you wanted to see if students were really paying attention, you could also show them the annual cycle from a buoy in the southern hemisphere, to see if they notice the 6-month offset.
But what I love about this dataset, comparing both air and seawater temps, is that there are so many opportunities for a deeper analysis.
Here are just a few examples:
- The peak temperatures are in August. This is 6-8 weeks after the summer solstice, when you might intuitively expect the highest temperatures to occur because insolation (incoming solar radiation) is greatest. (In some places that is true, but not in the mid-Atlantic.)
- Likewise, the minimum temps appear to be well after the winter solstice, though this offset is not as obvious from this dataset. (Thank you winter storms for adding in some noise.)
- Air and seawater temperatures track well over the course of the year. Or to put it another way, in statistical terms, they appear to be well correlated, as one might expect.
- That said, the temps align much more closely in the summer than they do in the winter. This provides a great opportunity for students to debate why this might be the case. (Again, thank you storms and ocean currents for adding in some fun.)
- We can also easily see that the overall temperature range for seawater is much less than that of the air. (Here we start to see that heat capacity might play a role.)
- And if students have a sense for variability, they might be able to notice how the variability of the air seems to be larger than that of the seawater over the course of the year. (Again, thanks to storms and heat capacity effects.)
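If you want to put numbers on a couple of these observations, here is a minimal sketch in plain Python (standard library only). It computes a Pearson correlation coefficient and the overall temperature range for two series. The function names `pearson_r` and `temp_range` are my own, and the six values in each list are made-up placeholders, not measurements from the mooring.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def temp_range(values):
    """Overall range (maximum minus minimum) of a series."""
    return max(values) - min(values)


# Made-up illustrative values in degrees C, NOT data from the buoy
air = [2.0, 8.0, 20.0, 24.0, 14.0, 4.0]
sea = [6.0, 9.0, 18.0, 23.0, 16.0, 9.0]

print("correlation:", round(pearson_r(air, sea), 3))
print("air range:", temp_range(air), "sea range:", temp_range(sea))
```

A coefficient near 1 backs up the "track well" impression, and comparing the two ranges gives a quick number for the heat capacity discussion. With the real hourly data you would pass in the two full timeseries, after making sure they share the same timestamps.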
Day by Day
Using some Python magic, we can simplify this dataset a bit by calculating daily averages. The following plot shows the daily means (top graph) and standard deviations (bottom graph) for the temperature timeseries over the course of the year.
By averaging out some of the hour-to-hour variability, as well as the day-night heating-cooling cycles, we can more easily see a stronger correlation between the mean temperatures. We can also see that the temperatures match very well in the summer. However, in the winter, the air temperatures at the buoy are generally several degrees colder than the ocean.
Another interesting story comes out of the timeseries of the standard deviations, which essentially tell us how much variation in temperature there is over the course of a given day, and how that changes over the course of the year.
Here we note that the variability is low in the summer, and generally much higher in the winter and spring. If you are familiar with storms and daily temperature swings (as most students are), this should make sense. In the mid-Atlantic, we typically have more storms in the winter and spring, and we also have large daily temperature swings in the winter (which can range from 20-30 degrees). However, in the summer we go from a hot day to a slightly-less-hot night, especially near the coast, so there isn’t as much change over the course of a day.
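For the curious, the daily averaging step can be sketched in a few lines of plain Python. This is not necessarily how the notebook does it; it is a minimal illustration that groups hourly samples by calendar day and computes the mean and sample standard deviation of each group. The function name `daily_stats` is hypothetical, and the two days of input values are synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def daily_stats(times, values):
    """Group samples by calendar day and return {date: (mean, std)}."""
    by_day = defaultdict(list)
    for t, v in zip(times, values):
        by_day[t.date()].append(v)
    # The sample standard deviation needs at least two values per day
    return {
        day: (mean(vals), stdev(vals))
        for day, vals in sorted(by_day.items())
        if len(vals) >= 2
    }


# Synthetic example: a perfectly steady day followed by a day that
# swings between 9 and 11 degrees C every hour (not real buoy data)
times = [datetime(2018, 7, 1) + timedelta(hours=h) for h in range(48)]
temps = [10.0] * 24 + [9.0, 11.0] * 12

daily = daily_stats(times, temps)
for day, (avg, sd) in daily.items():
    print(day, round(avg, 2), round(sd, 2))
```

Both days have the same mean, but only the second has a non-zero standard deviation, which is exactly the kind of difference the bottom graph is designed to reveal. If you work in pandas, `resample` offers the same kind of grouping on a time-indexed series.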
But let’s take this analysis one step further…
The Monthly View
Again, using some Python magic, let’s calculate the monthly mean and standard deviation.
Now we can clearly see that the mean temperatures for both the air and the ocean are essentially the same between May and August, while in the winter they often differ by 5-10 °C.
Also, like we saw above, we can note that the standard deviation of the temperature is much higher in the winter than in the summer. But we can also note that it is much higher in the air than in the ocean in every month. From this, we can see how the greater heat capacity of water tends to moderate the daily and monthly temperature swings, relative to those we see in the air.
To put it another way, here we can see that the day-to-day or month-to-month “shakiness” that we saw in the first graph (i.e. the short-term variability) is much greater in the air temperature dataset than in the seawater temperature dataset. Notably, we can use the standard deviation as a way to calculate a metric for the visual pattern we can intuitively see in the graph of the raw hourly data.
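Here is a minimal plain-Python sketch of that idea: group the hourly samples by month, take the standard deviation of each group, and compare air against seawater. I am assuming the monthly standard deviation is computed from all hourly samples within each month, which the post does not spell out. The function name `monthly_stats` is hypothetical and both series are synthetic, built so the air swings more than the sea.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def monthly_stats(times, values):
    """Group samples by (year, month) and return {(year, month): (mean, std)}."""
    by_month = defaultdict(list)
    for t, v in zip(times, values):
        by_month[(t.year, t.month)].append(v)
    # The sample standard deviation needs at least two values per month
    return {
        month: (mean(vals), stdev(vals))
        for month, vals in sorted(by_month.items())
        if len(vals) >= 2
    }


# Synthetic hourly series covering January and February 2018 (59 days).
# Air swings by +/- 4 degrees C each day, sea by only +/- 0.5 degrees C.
start = datetime(2018, 1, 1)
times = [start + timedelta(hours=h) for h in range(24 * 59)]
air = [5.0 + (4.0 if h % 24 >= 12 else -4.0) for h in range(len(times))]
sea = [8.0 + (0.5 if h % 24 >= 12 else -0.5) for h in range(len(times))]

air_stats = monthly_stats(times, air)
sea_stats = monthly_stats(times, sea)
for month in air_stats:
    print(month, "air std:", round(air_stats[month][1], 2),
          "sea std:", round(sea_stats[month][1], 2))
```

The larger standard deviation for the air series is the numerical counterpart of the extra "shakiness" you can see by eye in the hourly graph.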
Such a simple dataset, and yet so much to see.
If you’d like to continue playing with this dataset, you can download the Python notebook.