Teaching Concepts in Data Analysis with Live Coding
Data Labs in the Classroom:
Teaching Tips from the Community
Dr. Tom Connolly, OOI Data Lab Fellow 2020
I am a faculty member at San José State University and I teach at Moss Landing Marine Laboratories (MLML). I am a physical oceanographer, but as a faculty member at a small marine lab, I work with students from a wide range of disciplines in marine science.
In the Spring, I teach a course on Data Analysis Techniques in Marine Science. The students in this course are typically in their first year of the Master’s degree program at MLML, in disciplines that span the breadth of the various labs at MLML, including oceanography and marine biology. The students have widely varying levels of prior experience in statistics and programming, so one of my primary goals is to keep the class engaging for everyone without leaving any student behind.
One of my teaching strategies is to emphasize concepts through practical experience working with real oceanographic data. The practical experience often starts in class during live coding sessions with Python notebooks. My live coding instruction is influenced by Greg Wilson’s Teaching Tech Together book and the Software Carpentry program.
In the past year, I have developed two notebooks using data that show different aspects of the coastal upwelling process. Along the US west coast, upwelling occurs when winds blow from the north. These winds push the surface water offshore, while deeper water moves onshore and upwells near the coast.
Python Notebook Learning Goals
I had two primary goals for these notebooks. The first was to reinforce concepts presented in lecture on time series analysis.
Another major goal was to demonstrate common techniques for working with vector data, which have both a magnitude and a direction. This goal comes from my experience working with undergraduate and graduate researchers, many of whom I have guided through these same steps.
The two lessons use wind data from an NDBC buoy offshore of Monterey Bay, CA and ocean current data from the Ocean Observatories Initiative Endurance Array off the coasts of Washington and Oregon. The notebooks are available online in a GitHub repository.
- Tip: In addition to viewing the notebooks on GitHub in a static format, you can also click the “Open in Colab” badge at the top of each notebook to open it in an interactive environment in the cloud.
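As a rough illustration of the kind of vector technique meant here (a minimal sketch, not code taken from the notebooks), the example below converts wind speed and direction into eastward and northward components. It assumes the meteorological convention used for buoy winds, where direction is the compass bearing the wind blows from, in degrees clockwise from true north. The function names and sample values are hypothetical.

```python
import numpy as np

def wind_components(speed, direction_from_deg):
    """Convert speed and 'blowing from' direction to (u, v).

    Assumes direction is in degrees clockwise from true north and gives
    the direction the wind comes FROM. u is eastward, v is northward.
    """
    theta = np.deg2rad(direction_from_deg)
    u = -speed * np.sin(theta)
    v = -speed * np.cos(theta)
    return u, v

def direction_from(u, v):
    """Inverse of wind_components: 'blowing from' direction in [0, 360)."""
    return np.rad2deg(np.arctan2(-u, -v)) % 360.0

# Hypothetical sample values (m/s, degrees), not real buoy data.
speed = np.array([10.0, 5.0, 8.0])
direction = np.array([0.0, 90.0, 315.0])
u, v = wind_components(speed, direction)

# A wind from the north (0 degrees) has a negative northward component,
# which is the upwelling-favorable case along the US west coast.
upwelling_favorable = v < 0

# Averaging directions directly can fail near north: 350 and 10 degrees
# average to 180, which points the wrong way. Averaging the components
# first gives a direction close to north.
pair = np.array([350.0, 10.0])
naive_mean = pair.mean()
u_p, v_p = wind_components(10.0, pair)
vector_mean_dir = direction_from(u_p.mean(), v_p.mean())
```

Working in components rather than angles is what makes averaging, filtering, and rotating the coordinate system behave sensibly for winds and currents.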
These two notebooks are designed so that they could be used as standalone tutorials to get started working with NDBC wind data or OOI Acoustic Doppler Current Profiler (ADCP) data.
The notebooks were first used during live coding sessions over Zoom. Before starting, the students download starter code and data from a GitHub repository. The notebooks contain explanatory text, some pre-written code, and prompts for activities.
- Explanatory text is included to allow students to focus on understanding what is happening at each step of the tutorial, without having to switch between writing code and taking notes (although they can expand on the text or write comments if they wish). This makes the process less overwhelming than a blank slate. In an online environment, I also used the text boxes to include background material and images that might have been shown on a separate slide presentation when in person.
- Pre-written code is included to allow students to focus solely on the new concepts and their implementation. This might include preliminary steps like importing libraries or loading data. The pre-written code is also sometimes designed to be modified by the students, allowing them to experiment with different parameters or options in an interactive way.
- Exercises are a key part of the notebooks. Regular exercises allow students to check their understanding of the material. These exercises range from quick check-ins, where students answer on a Zoom poll or chat, to longer sets of exercises completed in breakout rooms.
- Accessing custom OOI data through the portal can be daunting for a first-time user. I generated data files with my own account, which the students can then access in two ways: (1) they can access the data remotely with an OPeNDAP link that I provide, and (2) I also include a separate data file in the GitHub repository containing the starter code. The second method was helpful for students with less internet bandwidth. It is important to give them enough time to download the repository (over a break, or starting at the beginning of class).
- The size of the data sets can present technology issues, whether teaching online or in person. Pseudocolor plots of the ADCP data (as in the figure above) can take a long time to generate on personal computers with less processing power. Shifting the computation to the cloud using Google Colab is one way to avoid this problem. This is the approach I take in classes where the programming itself is not a major focus.
- For courses with a programming focus, the OOI ADCP data is helpful for introducing students to the xarray package in Python. This package is amazing for working with data sets that have multiple dimensions (time and depth in this case).
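To give a sense of what working with time and depth dimensions in xarray looks like, here is a minimal sketch. It uses synthetic data in place of the real ADCP files, and the variable names (`u`, `v`, `depth`) are assumptions chosen for illustration rather than the names in the OOI data. With a real OPeNDAP link, the dataset would typically be opened by passing the URL to `xr.open_dataset` instead of building it by hand.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ADCP output: 48 hourly profiles, 8 depth bins.
time = np.arange("2020-01-01", "2020-01-03", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")
depth = np.arange(10.0, 90.0, 10.0)  # bin centers in meters (hypothetical)

rng = np.random.default_rng(0)
shape = (time.size, depth.size)
ds = xr.Dataset(
    {
        "u": (("time", "depth"), rng.normal(0.0, 0.1, shape)),  # eastward, m/s
        "v": (("time", "depth"), rng.normal(0.0, 0.1, shape)),  # northward, m/s
    },
    coords={"time": time, "depth": depth},
)

# Select by coordinate label rather than by array index.
v_30m = ds["v"].sel(depth=30.0)

# Reduce over a named dimension: depth-averaged northward velocity.
v_depth_avg = ds["v"].mean(dim="depth")

# Daily means at every depth, using the time coordinate.
v_daily = ds["v"].resample(time="1D").mean()

# Current speed from the two vector components, still labeled by time and depth.
speed = (ds["u"] ** 2 + ds["v"] ** 2) ** 0.5
```

Because every operation refers to dimensions by name, students do not need to remember which array axis is time and which is depth.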
In the future, I am excited about using the new OOI Data Explorer tool to allow students to quickly build customized data sets.