A millennial approach to data integration

Tracy Grimes

The millennial generation is often characterized as valuing experiences over material possessions. Minimalism has had a resurgence (hello, Marie Kondo) and repurposing things has become a way of life (mason jars have so many uses). We are generally frugal, perhaps because we entered adulthood during a recession, but that is a discussion for a different blog. The point is, as a millennial, I find great joy in repurposing existing stuff and reducing costs.

During my California Sea Grant State Fellowship with the Delta Stewardship Council’s Science Program, I have been working with a group of researchers from the Department of Water Resources, NOAA, Cramer Fish Sciences, and UC Davis to integrate their seemingly disparate Chinook salmon acoustic telemetry datasets into a collective whole. These data were originally collected to measure the survival of juvenile Chinook salmon as they migrate from the Sacramento-San Joaquin River system to the ocean. However, there is an increasing understanding that life-history diversity is critical to the resilience of the population and to salmon conservation. So how can we use these existing data to measure life-history diversity?

Collaboration is key

As other fellows before me have written, the Delta is a complex system where many scientists continue to work and collect volumes of data. Within a collaborative framework, scientists can bring these various datasets together to generate new insight. Two common practices that make this possible are data sharing and data reuse.

Data reuse, in this case, means that we used data for a purpose other than the one for which they were originally collected. Reusing data is beneficial because:

  • it’s economical,
  • it provides a novel combination of data for new research,
  • it enhances scientific progress, and
  • it offers opportunities for co-authorship.

With these integrated data, which span a decade, we were able to capture all four runs of Chinook salmon that occur in the Sacramento-San Joaquin River system! We have been able to characterize the diversity of routes juvenile salmon take as they migrate downstream, as well as the time it takes them to do so. Because individual fish use a variety of routes and travel speeds, the population can take advantage of the varied conditions encountered in the river. In other words, the population does not put all its eggs in one basket. And, by reusing these data, we are creating a framework that fisheries managers can use to repurpose existing data to better describe life-history diversity, an important aspect of salmon conservation.

Lessons learned

  1. It takes time to integrate data. Lots of it. Standardizing and cleaning others’ data is no easy feat.
  2. Communication helps. Reusing data that originally had a different purpose is not painless. It pays to be able to ask the data originators questions about an odd-looking data point or an unfamiliar abbreviation!
  3. Keep an open mind. Not all data are comparable. You may need to cut data you thought would make a cool comparison because they aren’t consistent with the other datasets for one reason or another.
  4. Lastly, sometimes all you need is in front of you. It’s not always feasible or cost-effective to start a new field study, but by collaborating with others you may find a way to answer your questions with existing data.
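
For readers curious what the "standardizing and cleaning" in lesson 1 can look like in practice, here is a minimal Python sketch. It is an illustration only, not the workflow used in this project: the column names, timestamp formats, station names, and function names are all hypothetical, and it assumes every source records time in the same time zone.

```python
from datetime import datetime

# Hypothetical column names from two sources, mapped onto one shared schema.
COLUMN_MAP = {
    "TagID": "tag_id", "tag": "tag_id",
    "DetectTime": "detected_at", "timestamp": "detected_at",
    "Station": "receiver", "rcvr": "receiver",
}

# Hypothetical timestamp formats found in the source files.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M")


def parse_time(text):
    """Try each known format; fail loudly so odd values get asked about."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def standardize(record, source):
    """Rename columns, trim whitespace, and normalize tag IDs and times."""
    out = {"source": source}
    for key, value in record.items():
        name = COLUMN_MAP.get(key)
        if name is None:
            continue  # drop columns that have no shared meaning
        out[name] = value.strip() if isinstance(value, str) else value
    out["tag_id"] = out["tag_id"].upper()
    out["detected_at"] = parse_time(out["detected_at"])
    return out


def integrate(datasets):
    """Merge {source: [records]} into one list sorted by tag, then time."""
    merged = [
        standardize(record, source)
        for source, records in datasets.items()
        for record in records
    ]
    merged.sort(key=lambda r: (r["tag_id"], r["detected_at"]))
    return merged
```

Raising an error on an unrecognized timestamp, rather than silently skipping the row, is a deliberate choice here: it surfaces exactly the kind of odd-looking value that lesson 2 suggests taking back to the data originators.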

“The secret of happiness, you see, is not found in seeking more, but in developing the capacity to enjoy less.” —Socrates
