
This is why you'll often find weather-related data used in data analysis courses. Of course, as oceanographers, we find ocean data far more relevant to our research goals, but it's also useful to start with more accessible weather or "ocean weather" examples, as those will be more familiar to students, before diving into more niche oceanographic datasets.

That's why I love the NDBC dataset: it makes weather data easily accessible. This allows students to visualize data and look for patterns they are hopefully familiar with (i.e., the weather near them) while they are also learning new data analysis techniques and developing their programming skills. With larger datasets like NDBC, which has stations all over the world, students can compare the patterns they've identified and are familiar with to patterns they may not know as well. This could include weather in other areas of the world, or new processes they are learning about, such as wind/wave correlations, sea breezes (land/air interactions), or heat capacity (air/sea temperature relationships).

Here is just one example, showing the annual cycle of Sea Surface Temperature in the Mid-Atlantic at NDBC Station 44025 (my favorite station – everyone should have one). With 10 years of data plotted at once, you can quickly see what the mean and variability look like over the course of the year, as well as the impact of the occasional extreme event (read: storm).

As part of our 2020 Virtual REU, I created the following notebook to demonstrate some basic data analysis techniques using a few years of data from NDBC Station 44025. This notebook is not comprehensive (that would require a longer course or a textbook), but you can consider it a hodgepodge sampling of some of the most common techniques one might use on this kind of timeseries dataset. Plus, what I think makes this notebook so cool is that it demonstrates that it doesn't take much code to make these sorts of graphs, thanks to the awesome pandas library in Python.

Here are a few of the data analysis techniques I highlight in the notebook:

Basic Statistics – Including count, mean, std, min, max and percentile calculations, as well as identifying extreme values.

Histograms – An important first step in understanding the shape of a dataset. Along with the basic statistics, a histogram provides a pictorial representation that is quick to interpret.

Running Averages – Often "raw" data is too "noisy" for how you want to use it. You can use .resample() in pandas to calculate hourly, daily or monthly averages (or indeed, any interval you like) to smooth things out.

Daily Cycle – Many processes repeat regularly over the course of a day. You can use .groupby() in pandas to average data by the hour of the day to see if there is a diurnal cycle. However, you need to be careful to look at both the mean and the variability about that mean to see if the cycle is meaningful. In addition, this cycle may differ over the course of a year, so you may also want to look at these averages by season, month, or year.

Annual Cycle by Month or Day – Is there an annual pattern? You can also use .groupby() to average your dataset by month or yearday. This will calculate an annual cycle for one year or many years.

Inter-annual Variation and Variability – Understanding the daily or annual pattern is a great first step, but if you have a longer dataset, you will probably want to investigate how this pattern changes from year to year. This is when things start to get fun. Using .groupby() commands or box plots, you can analyze questions like these: Is the mean or variability dependent on the year, month, or season? Is there a seasonal trend that is increasing or decreasing? How much variability is there?

Plot the Raw Data! – When calculating averages, it's always important to plot the raw data as well as a check, so you can see the averages (and standard deviations or other percentiles about them) in context. It's important to make sure your data makes sense, without any outliers biasing the averages.

Wind Roses – If your data is 2-dimensional, as wind data is (speed and direction), wind roses are a great way to show the directional relationship in your data. Basically, they're a 2-dimensional histogram, common in meteorology, but a bit tricky to create. Thankfully, there's a library for that too!

Anomaly Plots – Once you have a long-term timeseries dataset (which NDBC has at many locations), you can calculate a climatology, which is a long-term average.
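For the Basic Statistics item, here is a minimal pandas sketch. It is not taken from the notebook, and it uses a synthetic hourly series in place of a real NDBC record. The series name `sst`, the injected spike, and the 3-standard-deviation cutoff for "extreme" are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly sea surface temperature (deg C) for one year, standing in
# for a real NDBC record; the series name "sst" is hypothetical.
rng = np.random.default_rng(0)
times = pd.date_range("2019-01-01", periods=24 * 365, freq="h")
doy = times.dayofyear.to_numpy()
sst = pd.Series(
    15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25) + rng.normal(0, 0.5, len(times)),
    index=times,
    name="sst",
)
sst.iloc[1000] = 40.0  # deliberately injected spike to act as an "extreme" value

# count, mean, std, min, max, plus the percentiles requested here
stats = sst.describe(percentiles=[0.01, 0.25, 0.5, 0.75, 0.99])
print(stats)

# Flag values more than 3 standard deviations from the mean as candidate extremes
z = (sst - sst.mean()) / sst.std()
extremes = sst[z.abs() > 3]
print(extremes)
```

The 3-sigma rule is only one convention. Percentile limits such as the 1% and 99% values from `describe()` work just as well for flagging.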
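The Histograms item can be sketched by counting values in fixed bins. The sketch below assumes synthetic hourly SST data and 1 °C bins, both chosen only for illustration. It uses `np.histogram` so that it runs without a plotting library.

```python
import numpy as np
import pandas as pd

# Synthetic hourly SST (deg C); values and name are illustrative only
rng = np.random.default_rng(0)
times = pd.date_range("2019-01-01", periods=24 * 365, freq="h")
doy = times.dayofyear.to_numpy()
sst = pd.Series(15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
                + rng.normal(0, 0.5, len(times)), index=times, name="sst")

# Count observations in 1 deg C bins from 0 to 30 deg C
edges = np.arange(0, 31, 1)
counts, edges = np.histogram(sst.dropna(), bins=edges)

# Label each count by the lower edge of its bin
hist = pd.Series(counts, index=edges[:-1], name="count")
print(hist[hist > 0])

# With matplotlib installed, sst.plot.hist(bins=edges) draws the same histogram.
```

A strong seasonal cycle tends to give a two-humped shape, because the series spends much of its time near the winter and summer plateaus. The summary statistics alone would hide this.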
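The Running Averages item contrasts two things: `.resample()` produces block averages, and `.rolling()` produces a true moving average that keeps the original spacing. The sketch below assumes a synthetic 10-minute wind speed series. The 24-hour window of 144 samples is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute wind speed record (m/s), 30 days long
rng = np.random.default_rng(1)
times = pd.date_range("2019-01-01", periods=6 * 24 * 30, freq="10min")
wspd = pd.Series(7 + rng.normal(0, 2, len(times)), index=times, name="wspd")

# Block averages over fixed intervals
hourly = wspd.resample("h").mean()       # 720 hourly means
daily = wspd.resample("D").mean()        # 30 daily means
six_hourly = wspd.resample("6h").mean()  # any interval works

# A moving mean keeps the 10-minute spacing:
# 144 samples = 24 hours, centered on each timestamp
running = wspd.rolling(window=144, center=True).mean()

# Longer averaging windows leave less noise behind
print(wspd.std(), hourly.std(), daily.std())
```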
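For the Daily Cycle item, this sketch groups by hour of day and reports the mean, the spread, and the sample count side by side. That way the variability about the mean is visible alongside the mean itself. The air temperature series, its built-in 15:00 peak, and the name `atmp` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly air temperature (deg C) for 90 summer days, built with a
# diurnal cycle that peaks at 15:00 plus random noise
rng = np.random.default_rng(2)
times = pd.date_range("2019-06-01", periods=24 * 90, freq="h")
hour = times.hour.to_numpy()
atmp = pd.Series(20 + 3 * np.sin(2 * np.pi * (hour - 9) / 24)
                 + rng.normal(0, 1, len(times)), index=times, name="atmp")

# Mean, spread, and sample size for each hour of the day
diurnal = atmp.groupby(atmp.index.hour).agg(["mean", "std", "count"])
print(diurnal)

# Same idea split by month, to see whether the daily cycle changes over the season
by_month = atmp.groupby([atmp.index.month, atmp.index.hour]).mean().unstack()
print(by_month.round(1))
```

One rough check is whether the swing in hourly means is large compared with the hourly standard deviations. If it is not, the apparent cycle may be mostly noise.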
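The Annual Cycle item works the same way as the daily cycle, but groups by calendar month or by yearday. The sketch assumes ten years of synthetic hourly SST. The phase of the seasonal cycle is made up for illustration and is not taken from Station 44025.

```python
import numpy as np
import pandas as pd

# Ten years of hypothetical hourly SST (deg C) with a seasonal cycle
rng = np.random.default_rng(3)
times = pd.date_range("2010-01-01", "2019-12-31 23:00", freq="h")
doy = times.dayofyear.to_numpy()
sst = pd.Series(15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
                + rng.normal(0, 0.5, len(times)), index=times, name="sst")

# Annual cycle by month: one row per calendar month, pooled over all years
monthly_cycle = sst.groupby(sst.index.month).agg(["mean", "std", "min", "max"])

# Annual cycle by yearday (1-366); day 366 only exists in leap years
yearday_cycle = sst.groupby(sst.index.dayofyear).agg(["mean", "std", "count"])
print(monthly_cycle.round(2))
```

Restricting `sst` to a single year before grouping gives that year's cycle. Grouping the full record gives the multi-year version.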
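For the Inter-annual Variation and Variability item, a year-by-month table and per-year statistics address the questions listed there. A least-squares line through the annual means gives one simple way to check for a trend. The synthetic data has a 0.05 °C per year trend deliberately built in, which is purely an assumption for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical daily SST (deg C), 2010-2019, with a seasonal cycle, a small
# warming trend of 0.05 deg C per year built in, and random noise
rng = np.random.default_rng(4)
times = pd.date_range("2010-01-01", "2019-12-31", freq="D")
doy = times.dayofyear.to_numpy()
years_elapsed = (times - times[0]).days.to_numpy() / 365.25
sst = pd.Series(15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
                + 0.05 * years_elapsed + rng.normal(0, 1, len(times)),
                index=times, name="sst")

# Year x month table of means: read down a column to compare one month across years
table = sst.groupby([sst.index.year, sst.index.month]).mean().unstack()

# Does the mean or the variability change from year to year?
by_year = sst.groupby(sst.index.year).agg(["mean", "std"])

# Least-squares linear trend in the annual means (deg C per year)
slope, intercept = np.polyfit(by_year.index.to_numpy(), by_year["mean"].to_numpy(), 1)
print(by_year.round(2))
print(f"trend: {slope:.3f} deg C per year")

# With matplotlib installed, one box per year:
# sst.to_frame().assign(year=sst.index.year).boxplot(column="sst", by="year")
```

A straight-line fit to ten annual means is a rough indicator, not a significance test.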
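The Plot the Raw Data! item warns about outliers biasing averages. This sketch shows how a single leftover fill value distorts a daily mean while barely moving the median. It then masks the bad value before averaging. The fill value of 999, the 40 °C plausibility limit, and the helper `plot_raw_with_band` are all hypothetical. The helper needs matplotlib only when it is called.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly water temperature (deg C) for two weeks
rng = np.random.default_rng(5)
times = pd.date_range("2019-07-01", periods=24 * 14, freq="h")
wtmp = pd.Series(24 + rng.normal(0, 0.4, len(times)), index=times, name="wtmp")
wtmp.iloc[100] = 999.0  # hypothetical fill value left in the record (2019-07-05)

# One bad value drags that day's mean far off, while the median barely moves
daily = wtmp.resample("D").agg(["mean", "median", "std"])
print(daily.loc[pd.Timestamp("2019-07-05")])

# Mask implausible values before averaging (the 40 deg C limit is an assumption)
clean = wtmp.where(wtmp < 40)
clean_daily = clean.resample("D").agg(["mean", "std", "count"])


def plot_raw_with_band(raw, stats):
    """Overlay daily mean +/- 1 std on the raw series (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(raw.index, raw.to_numpy(), lw=0.5, color="0.6", label="raw")
    mid_day = stats.index + pd.Timedelta(hours=12)  # center markers on each day
    ax.errorbar(mid_day, stats["mean"], yerr=stats["std"], fmt="o",
                label="daily mean +/- 1 std")
    ax.legend()
    return fig, ax
```

Calling `plot_raw_with_band(clean, clean_daily)` puts the averages in context against every observation behind them.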
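The Wind Roses item describes a wind rose as a 2-dimensional histogram. The text does not name the library the notebook uses, so this sketch computes only the underlying table with `np.histogram2d`: percent of observations per direction sector and speed bin. The synthetic winds, the 16 sectors, and the speed bins are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical wind observations: directions (degrees the wind blows FROM)
# clustered around 225 (SW), and gamma-distributed speeds (m/s)
rng = np.random.default_rng(6)
n = 5000
wdir = np.rad2deg(rng.vonmises(np.deg2rad(-135), 2.0, n)) % 360
wspd = rng.gamma(2.0, 3.0, n)

sectors = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

# Rotate by half a sector so the "N" bin spans 348.75-11.25 degrees
rotated = (wdir + 11.25) % 360
dir_edges = np.linspace(0, 360, 17)   # 17 edges -> 16 sectors of 22.5 degrees
spd_edges = [0, 2, 4, 6, 8, 10, 100]  # last bin catches everything >= 10 m/s

# The 2-D histogram behind a wind rose: counts per (direction, speed) cell
counts, _, _ = np.histogram2d(rotated, wspd, bins=[dir_edges, spd_edges])
rose = pd.DataFrame(100 * counts / n, index=sectors,
                    columns=["0-2", "2-4", "4-6", "6-8", "8-10", "10+"])
print(rose.round(1))
print("most frequent direction:", rose.sum(axis=1).idxmax())
```

A dedicated plotting library then draws each row as a stacked wedge. The tricky part it handles is the polar layout, not the binning.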
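For the Anomaly Plots item, one common approach is to compute a climatology as the long-term mean for each yearday. The anomaly is then each observation minus that value. This sketch is not necessarily how the notebook does it. It assumes ten years of synthetic daily SST with an artificial cold event added so that the anomaly has something to show.

```python
import numpy as np
import pandas as pd

# Hypothetical daily SST (deg C), 2010-2019, with a seasonal cycle, noise,
# and an artificial 10-day cold event in August 2019
rng = np.random.default_rng(7)
times = pd.date_range("2010-01-01", "2019-12-31", freq="D")
doy = times.dayofyear.to_numpy()
sst = pd.Series(15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
                + rng.normal(0, 1, len(times)), index=times, name="sst")
sst.loc["2019-08-01":"2019-08-10"] -= 5

# Climatology: long-term mean for each yearday (simplification: after Feb 28,
# leap years shift yearday numbers by one)
climatology = sst.groupby(sst.index.dayofyear).mean()

# Anomaly: each observation minus the climatological value for its yearday
anomaly = sst - sst.groupby(sst.index.dayofyear).transform("mean")
print(anomaly.loc["2019-07-25":"2019-08-15"].round(1))
```

With only ten samples per yearday, the raw climatology is noisy. Smoothing it, for example with a centered rolling mean, is a common refinement.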