Evaluating the Metadata of NYC's Open Data Portal

Purpose and Background

This project evaluates the metadata of datasets published on NYC Open Data. Government data portals are intended to expand public access to information, but is the published data contextualized well enough for non-experts to use?

Data

The data used for this evaluation was sourced by querying the Socrata Discovery API.
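As a rough illustration of what such a query might look like, the sketch below builds a paginated catalog request and pulls a few fields out of one result. The endpoint, the domain, and the response field names (`columns_name`, `columns_description`, `download_count`, `domain_category`) reflect my understanding of the Discovery API and are assumptions; this project's actual query is not shown here, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and domain; the text above only names the API.
CATALOG_URL = "https://api.us.socrata.com/api/catalog/v1"
NYC_DOMAIN = "data.cityofnewyork.us"


def build_catalog_url(offset=0, limit=100):
    """Build the URL for one page of NYC dataset catalog results."""
    params = {
        "domains": NYC_DOMAIN,
        "only": "datasets",
        "limit": limit,
        "offset": offset,
    }
    return f"{CATALOG_URL}?{urlencode(params)}"


def extract_record(result):
    """Keep only the fields relevant to a metadata evaluation.

    The key names are assumptions about the response layout.
    """
    resource = result.get("resource", {})
    classification = result.get("classification", {})
    return {
        "id": resource.get("id"),
        "name": resource.get("name"),
        "category": classification.get("domain_category"),
        "column_names": resource.get("columns_name", []),
        "column_descriptions": resource.get("columns_description", []),
        "download_count": resource.get("download_count"),
    }
```

Fetching each URL with any HTTP client and passing every entry of the response's `results` list through `extract_record` would yield one flat record per dataset.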

Results

Most of the datasets on NYC Open Data fall into the Education category.

The datasets in the Education category are also the least well described: on average, datasets in this category have only 1% of their variables described in the human-readable metadata. This may make it difficult for users of the data portal to contextualize these datasets, or to understand how they were collected and how they should and shouldn’t be interpreted.
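One way to compute a "share of variables described" figure is sketched below. It assumes each dataset is a dict with a `category` string and a `column_descriptions` list (hypothetical names), and it treats empty or whitespace-only descriptions as missing. Datasets with no columns are skipped rather than counted as zero. Both choices are my assumptions, not necessarily the ones used in this evaluation.

```python
def description_coverage(column_descriptions):
    """Fraction of columns with a non-empty description.

    Returns None when the dataset lists no columns at all.
    """
    if not column_descriptions:
        return None
    described = sum(1 for d in column_descriptions if d and d.strip())
    return described / len(column_descriptions)


def mean_coverage_by_category(records):
    """Average per-dataset coverage within each category."""
    sums = {}
    counts = {}
    for rec in records:
        coverage = description_coverage(rec["column_descriptions"])
        if coverage is None:
            continue  # no columns: nothing to describe, so skip
        cat = rec["category"]
        sums[cat] = sums.get(cat, 0.0) + coverage
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: sums[cat] / counts[cat] for cat in sums}
```

Averaging per dataset, as here, weights every dataset equally; pooling all columns in a category before dividing would instead weight wide datasets more heavily.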

Most of the datasets in the Education category come from the Department of Education (DOE), which is also the agency driving the category's low documentation rate.

Most of the DOE’s datasets have none of their variables described in the portal.

Finally, there is no correlation between how well-documented a dataset is and the number of times it gets downloaded. It seems that users don’t necessarily prefer to work with datasets that are well described, even though variable descriptions provide important context for working with data.
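A check of this kind could be run as in the sketch below, which computes a Pearson correlation coefficient between per-dataset documentation coverage and download counts. Pearson is my assumption; the evaluation does not say which measure was used, and a rank-based measure may suit skewed download counts better. The function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0 indicates no linear relationship between the two quantities, while values near 1 or -1 indicate a strong positive or negative one.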

lubov mckone
mlis candidate