Using OOI Data to Teach Data Analysis and Scientific Computing Skills in Upper-Level Courses
Data Labs in the Classroom:
Teaching Tips from the Community
Dr. Hilary Palevsky, OOI Data Lab Fellow 2020
I am a faculty member in the Department of Earth and Environmental Sciences at Boston College. My research is in marine biogeochemistry, and much of my work over the past five years has used OOI data from the Irminger Sea Array.
I have incorporated OOI data into my teaching in introductory oceanography courses, and have advised undergraduate and graduate students using OOI data in their research.
Together with a team of colleagues interested in using OOI data in undergraduate education, I found that there were as-yet-untapped opportunities to incorporate OOI data into teaching activities for the upper-level curriculum, especially in support of data analysis and skills at the higher cognitive levels of evaluating and creating (Greengrove et al. 2020).
The course for which I designed this new activity is titled “Environmental Data Exploration and Analysis” and centers on developing students’ data analysis and scientific computing skills. It is cross-listed at the advanced undergraduate and graduate levels, and students enter with a wide range of prior backgrounds in programming and scientific research, as well as a wide range of disciplinary interests within the geosciences.
I teach the course in MATLAB because it is the programming language I am most experienced with and the one most commonly used in my department, but I aim to emphasize scientific computing skills that can be transferred to any programming language.
I structure the class around three multi-week data-focused labs completed in pair programming teams, an approach frequently used in the tech industry and computer science education, followed by team final research projects at the end of the semester. Students build collaborative skills by pair programming with three different peers in the data labs while maintaining individual accountability for their learning by submitting their own lab writeups. The data labs each build new programming, data analysis, and visualization skills.
The first two labs start with curated, previously published data sets. After my first year teaching the class, I identified a need for an additional data lab that introduces skills for working with “messy” raw data and comparing data from multiple sources.
In this third lab, “The Blob Lab,” students investigate a recurring marine heatwave in the North Pacific Ocean using temperature data collected at the OOI Global Station Papa Array in combination with World Ocean Atlas and satellite sea surface temperature data.
The overall course learning goals center on eight transferrable skills: programming in a scripting language, statistical analyses of data, visualizing data, critical consumption of publicly available data, reading scientific papers, research using scientific data, synthesizing and presenting research findings, and collaborating productively with groups.
The Blob Lab particularly focuses on preparing students to:
- Read in and explore netCDF data files in MATLAB to identify relevant variables and metadata
- Find and download data from online data repositories
- Read documentation to understand how publicly available data were collected and processed
- Plot and interpret raw data, including identifying and excluding outliers
- Combine and compare data from multiple data sources
- Evaluate the strengths and weaknesses of different data sources that could be used to approach the same question or calculation
Student surveys completed after this data lab in Spring 2021 showed that students overall found the data labs in this course effective at achieving the transferrable skill learning goals, and that the Blob data lab was especially effective at helping students develop skills in critical consumption of publicly available data.
The Blob Data Lab – Overview
The data analysis portion of the lab is divided into two parts, each taking one week. Students work through both parts by writing MATLAB code together with their pair programming partner. Each part of the lab is accompanied by a detailed handout and starter code with step-by-step instructions:
In Part 1, students acquire, extract, and begin working with the data from the OOI Global Station Papa array. I chose to provide netCDF files I had already downloaded from the OOI Data Portal directly to the students. They write code to plot the raw time series of Flanking Mooring A 30 m temperature data from 2013 through the end of 2019, and develop and apply their own filter to remove data during periods of high variability, which occur during periods of strong stratification when the 30 m sensor is no longer within the surface mixed layer.
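One way such a variability filter might look is sketched below. The lab itself is written in MATLAB and students design their own filter, so this Python version is only an illustration: the function name, the centered rolling window, the 0.5 °C threshold, and the synthetic temperatures are all assumptions, not values from the lab handout.

```python
from statistics import pstdev

def filter_high_variability(temps, window=3, max_std=0.5):
    """Return a copy of temps with None wherever the centered rolling
    standard deviation exceeds max_std (hypothetical threshold, deg C)."""
    half = window // 2
    filtered = []
    for i, value in enumerate(temps):
        # Centered window, truncated at the ends of the series.
        neighborhood = temps[max(0, i - half): i + half + 1]
        if pstdev(neighborhood) > max_std:
            filtered.append(None)  # flag the point as excluded
        else:
            filtered.append(value)
    return filtered

# Synthetic 30 m temperatures: quiet, a burst of variability, quiet again.
temps = [6.0, 6.1, 6.0, 6.1, 9.5, 6.2, 10.1, 6.0, 6.1, 6.0, 6.1]
cleaned = filter_high_variability(temps, window=3, max_std=0.5)
print(cleaned)
```

Flagging whole windows rather than single spikes removes every point recorded during a variable period, including values that look ordinary on their own. That matches the goal of excluding times when the sensor is likely below the mixed layer.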
Part 1 also guides students to directly download satellite SST anomaly data from a NOAA ERDDAP server, providing another source of temperature data for the eastern subarctic North Pacific in the region around the OOI Global Station Papa array.
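For readers unfamiliar with ERDDAP, a gridded-data request is a URL that names a dataset, a variable, and a bracketed range for each dimension. The Python sketch below builds such a URL. The server address, dataset ID, and variable name are hypothetical placeholders, not the ones used in the lab, and the sketch assumes the dataset's dimensions are time, latitude, and longitude, in that order, with longitudes in degrees east from -180 to 180.

```python
def build_griddap_url(server, dataset_id, variable,
                      time_range, lat_range, lon_range, file_type="nc"):
    """Build an ERDDAP griddap request URL for one variable.

    Each range is a (start, stop) pair. Parentheses tell ERDDAP to read
    the bounds as coordinate values rather than array indices.
    """
    def constraint(bounds):
        start, stop = bounds
        return f"[({start}):({stop})]"

    query = (variable + constraint(time_range)
             + constraint(lat_range) + constraint(lon_range))
    return f"{server}/griddap/{dataset_id}.{file_type}?{query}"

# Illustrative box around Station Papa (about 50 N, 145 W); all names
# below are placeholders.
url = build_griddap_url(
    server="https://example.org/erddap",
    dataset_id="hypothetical_sst_anomaly",
    variable="sst_anomaly",
    time_range=("2013-01-01T00:00:00Z", "2019-12-31T00:00:00Z"),
    lat_range=(45, 55),
    lon_range=(-150, -140),
)
print(url)
```

The dimension order and longitude convention vary by dataset, so check the dataset's ERDDAP page before adapting the ranges.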
In Part 2, students a) combine the time series of OOI temperature data extracted in Part 1 with World Ocean Atlas climatology data to calculate the temperature anomaly, and b) compare the OOI-based temperature anomaly data with the satellite SST anomaly data downloaded in Part 1.
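The two Part 2 steps can be illustrated with a short Python sketch. It assumes a monthly climatology keyed by calendar month and uses made-up numbers throughout. The function names and the choice of mean difference as the comparison metric are illustrative assumptions, not the lab's prescribed method.

```python
from statistics import mean

def temperature_anomaly(observations, climatology):
    """Subtract the climatological value for each observation's month.

    observations: list of (month, temperature) pairs
    climatology:  dict mapping month number -> long-term mean temperature
    """
    return [temp - climatology[month] for month, temp in observations]

def mean_difference(series_a, series_b):
    """Average of (a - b) over paired values, a simple offset measure."""
    return mean(a - b for a, b in zip(series_a, series_b))

# Synthetic values for illustration only (deg C).
mooring_obs = [(1, 6.5), (2, 6.0), (3, 7.0)]  # (month, 30 m temperature)
climatology = {1: 5.5, 2: 5.2, 3: 5.4}        # monthly climatology
satellite_anomaly = [1.2, 1.0, 1.5]           # SST anomaly, same months

ooi_anomaly = temperature_anomaly(mooring_obs, climatology)
offset = mean_difference(ooi_anomaly, satellite_anomaly)
print(ooi_anomaly, offset)
```

With real data the two series must first be matched in time, and the mooring sensor at 30 m will not always sample the same water as a satellite surface measurement.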
After students complete the data analysis and visualization portion of the lab together with their pair programming partners, they each complete an individual writeup of the lab in the style of a scientific paper, following guidelines in this detailed handout.
I find that assessing the individual writeup ensures that both members of a pair programming team are accountable for their work, including not just successfully getting their code to work but also understanding why they conducted each step of the process and what their results mean.
Engage students in driving their own learning
I structure this class explicitly around prioritizing students’ own motivations and goals, which I’ve found is especially effective at helping students who come in with widely varying backgrounds to develop their programming skills, self-confidence, and intrinsic motivation.
Students begin the semester by reflecting on their own goals for the course, drawing on the set of transferrable skills I highlight in the course syllabus. They then submit a reflection paper with every major assignment throughout the semester, in which they track their progress towards those goals.
They also have the opportunity to select from a set of extensions to each lab tailored towards different learning goals. I have found that this enables students with highly disparate backgrounds to all grow and learn relevant new skills throughout the course, as well as take pride in developing skills that they had not initially prioritized or felt confident they could develop.
Creatively address group-work challenges
Many students are initially resistant to group work. I find that including collaboration as an explicit learning goal in the syllabus and spending time at the beginning of the semester emphasizing the value of collaborative skills both within academic science and in the professional world helps overcome that resistance.
I also ask students to fill out a survey self-assessing their skill level and sharing preferences around what they’re looking for in a pair programming partner to help facilitate effective matches. After each data lab, I ask students to reflect on how their pair programming partnership worked, and use that information to help make subsequent pair assignments.
Enhance collaboration using code repositories
To facilitate collaboration among students working in pair programming teams, I have each pair create a GitHub code repository that they share with each other, with me, and with the course TA. I use GitHub Classroom to manage the process of providing starter code to all pairs of students in the class. The student handouts provided here include the GitHub Classroom links I used with my students in Spring 2021 as an example.
- The instructions for this lab and the starter code are written in MATLAB, but could easily be adapted to the programming language of your choice, especially since the starter code is primarily instructions rather than actual code.
- Similarly, for a less involved activity, other instructors could choose to use only the OOI piece of the activity from Part 1 or make other modifications to the data lab design.
- This lab could also be adapted for courses more focused on oceanography content, adding greater emphasis on the interpretation of marine heatwaves.