Behind the scenes at the Archive: the Data Services team

Article dated: 20 September 2011

The Archive has been curating data for more than 40 years. The Data Services team play an integral part in ensuring that our digital data are enriched, preserved, and accessible.

The Data Services team are highly specialised data professionals who work at the core of the Archive to validate and enhance ESDS social science data deposits. The aim of the team is to prepare high-quality datasets for curation, giving users access to all of the data and documentation they need for their research.

The data lifecycle

Here's how it works: Once data are acquired, they are passed to the Data Services team, who are responsible for the Archive's ingest activities. They carry out data validation and quality control and convert the data into a form suitable for both long-term preservation and immediate access. Each data collection is unique and requires specific research and data conditioning, so that the resulting preservation and dissemination formats are of the highest quality and secondary analysts can make informed use of the data.

Once the Submission Information Package (SIP) is ready for ingest, the Data Services team's role in the data lifecycle begins. Data deposits, both quantitative and qualitative, are carefully validated and checked to ensure data integrity, anonymisation and confidentiality. Any anomalies found are resolved in collaboration with the data depositor. The team also carry out data enrichment tasks, such as improving labelling, creating additional metadata, grouping survey variables, creating data lists and transferring data to a preferred dissemination format. They then assign keyword search terms that link to the HASSET thesaurus, and create and update online study descriptions for each dataset.

When checks are complete, data files are migrated into several preferred standard dissemination/preservation formats, depending on their original deposit condition. Some quantitative studies are mounted in Nesstar, the Archive’s premier online data browsing tool. Nesstar allows users to explore a range of social survey data, view variable frequencies and question text, and produce online tabulations and graphs.

The team also ensure that each study includes sufficient documentation. This may include questionnaires, methodological information, interview schedules and research reports to accompany the data, forming the complete Dissemination Information Package (DIP).

Version control is then applied to each data revision to ensure all information is retained. It is crucial to the Archive that deposited data are securely preserved and that persistent data access is maintained, allowing re-use and citation of every major version of the data.

When data conditioning and curation are complete, the Archival Information Package (AIP) is submitted to the Archive's preservation system for long-term management. As new data are added – or the required data formats migrate forward in response to changes in the needs of the user community – the files from the preservation system form the basis for the generation and update of studies into new access formats.