The Mid-Atlantic has one of the largest summer-to-winter swings in sea surface temperature in the world. But you don’t have to take my word for it: we can use real data to see just how large the change is!
In last week’s example notebook, we created a dummy dataset to demonstrate how one could calculate a long-term averaged seasonal cycle. This week, let’s replace the dummy data with some real data from NDBC Buoy 44025 in the Mid-Atlantic.
Personally, I love using NDBC buoy data for educational applications. In particular, it has a number of advantages:
- NDBC data is free and (relatively) easy to use. Well, a lot of data is free, but NDBC also provides data in easy-to-use standard formats, like text files, NetCDF, and OPeNDAP, which are perfect for use in Python (see the quick sketch after this list).
- Datasets from hundreds of buoys and stations around the world are available, which allows students to investigate location-related questions.
- The datasets feature a lot of meteorological parameters, which students are generally more familiar with. Familiarity helps when students are also trying to develop programming and data analysis skills.
- But it’s not all meteorological data. You can also find waves, sea surface temperatures, and tides at a lot of stations, and a smaller subset also includes salinity, dissolved oxygen (DO), and pH data. Plus, wind and barometric pressure data are helpful for identifying storms and understanding current movements, which impact the ocean.
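To show just how easy those text files are, here's a quick sketch that reads a station's real-time standard meteorological feed with pandas. (The URL pattern and file layout here are assumptions about NDBC's real-time text feed; the rest of this notebook uses the OPeNDAP service instead.)
import pandas as pd
# Assumed URL pattern for NDBC's real-time (last ~45 days) text feed for a station
rt_url = 'https://www.ndbc.noaa.gov/data/realtime2/44025.txt'
# The file is whitespace-delimited; the first row holds column names,
# the second row holds units, and 'MM' marks missing values
rt = pd.read_csv(rt_url, sep=r'\s+', skiprows=[1], na_values='MM')
rt.head()
Of course, a text file only gets you one station and a limited time range at a time, which is why the OPeNDAP route we use below is so handy.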
So, let’s dive into this dataset and grab some data to calculate a seasonal sea surface temperature (SST) average. Then, just for fun, we’ll also calculate the recent SST anomaly to see where we are today relative to the last 10 years.
Accessing NDBC Data
By Sage Lichtenwalner, March 31, 2020
In this notebook, we will demonstrate how to easily retrieve meteorological data from the National Data Buoy Center and calculate a daily average and anomaly.
# Notebook setup
# Install the netCDF4 library, which xarray needs to open NDBC's OPeNDAP datasets
!pip install netcdf4
import xarray as xr
import matplotlib.pyplot as plt
# This makes the plots prettier
import seaborn as sns
sns.set()
sns.set(rc={'figure.figsize':(8,6)})
# This removes an annoying warning
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
Retrieving NDBC Data
NDBC's chief mission is to collect weather data to support the National Weather Service and its operational forecast models. NDBC maintains a fleet of buoys and shore stations that collect a variety of atmospheric and oceanographic measurements. They also serve as a data repository for a number of partner observing systems, like IOOS. With over 1,000 stations around the world, NDBC is a great resource for anyone looking to play with some ocean data.
The NDBC website provides some basic displays of recent data. It also includes downloadable text files of archived data by year, which can be fun to play with in Excel (though not if you want to aggregate a number of different years or stations).
Thankfully, NDBC also provides a DODS service (aka THREDDS or OPeNDAP) that makes it easy to access their full archive of netCDF data files, which is perfect for programming.
In general, the Standard Meteorological data files are the best place to start, as they include air and sea surface temperatures, winds, barometric pressure, and waves. The Oceanographic data files are more relevant to oceanographers, but they are only available for a small number of stations.
How to find the right dataset URL
- From the NDBC DODS page, select the link for stdmet
- Click on the station you are interested in. The NDBC homepage has a station map and search feature to help you find the right id.
- Next you will see a list of files for each year the buoy/station was deployed. If you only want one year of data, you can select that year. If you want real-time data, select the 9999 file. If you would like the full aggregated archive, select the .ncml file.
- Click on the OPENDAP link.
- Copy the "Data URL" link and paste it into your notebook (or build it from the pattern sketched below).
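If you follow these steps for a few different stations, you'll notice the Data URLs follow a predictable pattern. Assuming that pattern holds for your station (it's worth double-checking on the DODS page), you can build the URL directly from a station ID:
# Build the (assumed) aggregated-archive URL from a station ID
station_id = '44025'
data_url = f'https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/{station_id}/{station_id}.ncml'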
For this example, I'll use my favorite buoy 44025. (Doesn't everyone have a favorite buoy?)
# The OPENDAP URL for the station we want
url = 'https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/44025/44025.ncml'
# Open the dataset using xarray, look how simple!
ds = xr.open_dataset(url)
# Quick list of available variables
ds.data_vars
# Limit to the last decade (2010 through 2019)
ds = ds.sel(time=slice('2010-01-01', '2019-12-31'))
# Quickplot of SST
ds.sea_surface_temperature.plot()
plt.title('NDBC Buoy 44025');
# Calculate and Plot Daily Average
# .load() pulls the remote data into memory first, which speeds up the resample
daily_sst = ds.sea_surface_temperature.load().resample(time='1D').mean()
daily_sst.plot()
plt.title('Daily Average SST at NDBC Buoy 44025')
plt.xlabel('')
plt.ylabel('Sea Surface Temperature (C)');
Calculating the Seasonal Cycle
Now that we have a decade of data, let's calculate a seasonal cycle which we can then use to calculate a daily anomaly measurement.
While xarray Datasets are great for accessing data, Pandas Dataframes provide a bit more functionality for this, so our first step will be to convert our Dataset to a Dataframe.
# Convert Dataset to Dataframe
df = ds.to_dataframe()
df = df.droplevel(['latitude','longitude']) # Drop extra indices that we don't need
df.head()
# Add a yearday column
df['yearday'] = df.index.dayofyear
# Calculate Annual Cycle
avg_sst = df.sea_surface_temperature.groupby(df.yearday).mean()
# Plot data by Yearday
plt.plot(df.yearday,df.sea_surface_temperature,'.',markersize=1,label='Raw Hourly Data');
avg_sst.plot(linewidth=3,label='10 Year Average')
plt.legend()
plt.xlabel('Day of Year')
plt.ylabel('Sea Surface Temperature (C)')
plt.title('Seasonal Cycle of Sea Surface Temperature at NDBC 44025 from 2010-2019');
plt.savefig('NDBC_44025_Seasonal_SST.png');
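You may notice that the day-of-year average is still a little noisy, since each yearday is averaged independently of its neighbors. If you'd like a smoother climatology, one option (just a sketch, not part of the workflow above) is a centered rolling mean:
# Optional: smooth the day-of-year climatology with a centered 7-day rolling mean
# (the 7-day window is an arbitrary choice)
avg_sst_smooth = avg_sst.rolling(7, center=True, min_periods=1).mean()
avg_sst_smooth.plot(linewidth=3, label='Smoothed 10-Year Average');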
Calculating the Anomaly
Now that we have calculated the 10-year average, we can use it to calculate a daily anomaly.
# Calculate daily average
df_daily = df.resample('1D').mean()
df_daily['yearday'] = df_daily.index.dayofyear
df_daily.head()
# Calculate the SST anomaly relative to the 10-year average
# Indexing avg_sst by yearday looks up the climatological value for each day
df_daily['sst_climate'] = avg_sst[df_daily.yearday].values
df_daily['sst_anomaly'] = df_daily['sea_surface_temperature'] - df_daily['sst_climate']
# Anomaly Plot
df_daily['sea_surface_temperature'].plot(label='Raw Data')
df_daily['sst_climate'].plot(label='Climatological Average')
df_daily['sst_anomaly'].plot(label='Anomaly')
plt.legend(loc='upper left');
plt.xlabel('')
plt.ylabel('Sea Surface Temperature (C)')
plt.title('Sea Surface Temperature and Anomaly at NDBC 44025');
Recent SST Anomaly
Finally, let's go back to the beginning. Instead of pulling 10 years of data, this time we will pull just the last year or so and use the 10-year seasonal cycle to see what the recent anomaly looks like.
# Open the dataset again and subset to the last year or so
realtime = xr.open_dataset(url)
realtime = realtime.sel(time=slice('2019-01-01', '2020-04-01'))
# Convert to Dataframe
realtime = realtime.to_dataframe()
realtime = realtime.droplevel(['latitude','longitude']) # Drop extra indices
# Calculate daily average
realtime = realtime.resample('1D').mean()
realtime['yearday'] = realtime.index.dayofyear
# Add the SST anomaly based on the 10-year average
realtime['sst_climate'] = avg_sst[realtime.yearday].values
realtime['sst_anomaly'] = realtime['sea_surface_temperature'] - realtime['sst_climate']
# Finally, let's plot it!
fig = plt.figure(constrained_layout=True)
# Rearrange the subplots so we have 1 big and 1 small graph
gs = plt.GridSpec(nrows=3,ncols=1, figure=fig)
ax1 = fig.add_subplot(gs[0:2])
ax2 = fig.add_subplot(gs[2:3])
ax1.plot(realtime['sea_surface_temperature'],linewidth=2,label='Measured Data')
ax1.plot(realtime['sst_climate'],linewidth=2,label='Climatological Average')
ax2.plot(realtime['sst_anomaly'],linewidth=2,label='Anomaly')
ax1.legend(loc='upper left');
ax1.set_ylabel('Sea Surface Temperature (C)')
ax1.set_title('Sea Surface Temperature and Anomaly at NDBC 44025');
ax2.set_ylabel('Temperature (C)')
ax2.set_title('Anomaly')
plt.savefig('NDBC_44025_SST_Anomaly.png');
Based on this, it looks like the ocean in the Mid-Atlantic was quite a bit warmer than average (around 1.5 degrees!) in February and March of 2020. Last year at this time, temperatures were about 0.5 to 1 degree below the recent normal, though there was a lot of variability the rest of the year.