"The group was split in two. One half was setting web crawlers upon NOAA web pages that could be easily copied and sent to the Internet Archive. The other was working their way through the harder-to-crack data sets—the ones that fuel pages like the EPA’s incredibly detailed interactive map of greenhouse gas emissions, zoomable down to each high-emitting factory and power plant. “In that case, you have to find a back door,” said Michelle Murphy, a technoscience scholar at the University of Toronto.
Murphy had traveled to Philly from Toronto, where another data-rescuing hackathon had taken place a month prior. Murphy brought with her a list of all the data sets that were too tough for the Toronto volunteers to crack before their event ended. “Part of the work is finding where the data set is downloadable—and then sometimes that data set is hooked up to many other data sets,” she said, making a tree-like motion with her hands...
But data, no matter how expertly it is harvested, isn’t useful divorced from its meaning. “It no longer has the beautiful context of being a website, it’s just a data set,” Allen says.
That’s where the librarians came in. In order to be used by future researchers—or possibly used to repopulate the data libraries of a future, more science-friendly administration—the data would have to be untainted by suspicions of meddling. So the data must be meticulously kept under a “secure chain of provenance.” In one corner of the room, volunteers were busy matching data to descriptors like which agency the data came from, when it was retrieved, and who was handling it. Later, they hope, scientists can properly input a finer explanation of what the data actually describes."