Adventures in Natural Science Data Curation
Published: May 14, 2026 by Dr. James A. Hodges
Information takes the shape of its container. In other words, the technologies we use to store information have a significant impact on the character of the information they carry. This has been a refrain in my research since I started comparing natural science data curation efforts in 2022. I’m pleased to report that a new journal article discussing these topics, “Coastal water information orders: Reverse-engineering New York City’s data dashboards,” recently came out in print in Social Studies of Science, volume 56, issue 2.
The study compares two different data curation efforts from the New York City area aimed at tracking coastal pollution levels so that users can determine whether it is safe to go swimming or fishing in their local waterbodies: one from the city government and one from a nonprofit organization. My coauthor and I examined the web source code and data structures for each initiative using an approach informed by digital forensics, and we turned up some pretty interesting results. Perhaps most notably, we found that the city’s official water quality tracking dashboard appears to have been adapted from a tool originally designed to track restaurant health inspection grades. This suggests something interesting about the primacy of commerce in city government.
I’ve been looking at source code and other computational texts to understand the production and circulation of knowledge for years now, but in the past I largely focused on software and file forensics. This time, I’ve turned my attention towards the structure of scientific datasets. I was inspired partly by my teaching. Since joining San José State University in 2022, I’ve been teaching a master’s-level course on database design that got me thinking about the ways that data structures influence what counts as knowledge. In other words, I’ve been interested in how the contours of data affect the contours of everyday life. The work was impacted by my role at SJSU in other ways too. My coauthor, Rachel Paprocki, was a graduate student in our MLIS program when we wrote the paper together.
I have always considered myself a scholar of digital objects’ value as evidence, construed in the broadest possible sense. The digital objects associated with water quality have proven exceptionally generative. Where else can we see so clearly the propensity for digital systems to discretize and stabilize inherently unwieldy and changing phenomena? Bodies of water are constantly moving and changing, such that any data collected about their composition is always already outdated. I suspect most scholars’ research outputs are similar—constantly evolving, but captured in static representations like an article from time to time. You can read the latest such snapshot from me now in Social Studies of Science: https://doi.org/10.1177/03063127251372629