SDS to move from pilot to full service

Article dated: 12 December 2009

Data services are now at a very exciting crossroads. While new techniques such as data merging, mashing, mining and mapping are rapidly expanding the possibilities for innovative research, and e-science Web 2.0 worlds clamour for open data, these same techniques raise new concerns about data protection, since they increase the possibility of identifying individuals. Following some high-profile data disclosure scandals, and fearing public and respondent backlash, many data providers are seeking to place even greater restrictions on research access to detailed data.

How can research access to data be retained (and indeed expanded) whilst ensuring data security? This has been the dilemma facing the ESRC as a major funder of research resources in the economic and social sciences. It has responded by funding the UK Data Archive to develop a new Secure Data Service (SDS), to offer safe and secure remote access by approved researchers to data hitherto deemed too sensitive, detailed, confidential or potentially disclosive to be made available under standard licensing and dissemination arrangements. Initially funded as a two-year pilot, the service has been in development for a year and the pilot will be launched formally on 14 December 2009 at the Royal Statistical Society in London. In November 2009 the ESRC’s Research Priorities Board approved significant additional funding over three years to develop the pilot into a full service.

In its pilot phase, the SDS has focused on working towards ISO 27001 compliance (in preparation for future certification), developing the secure technology for remote access, and integrating the service into the UK Data Archive family of services, including the ESDS. To use the service, users will have to provide information about their bona fides and their proposed research in order to become ‘Approved / Accredited Researchers’. They will also have to attend an SDS training session, where they will be introduced to the system, to the principles of statistical disclosure control, and to their rights and responsibilities under the law, including the significant penalties for failing to meet them. Once they have completed this process, they will gain direct access to highly detailed, potentially disclosive data that have either never been available before, or have only been available at on-site data enclaves, possibly at considerable distance.

The system is built around Citrix technology, the de facto standard security technology for remote access employed by the banking and military sectors. Users will log into the system from their own JANET-connected computer, or for some datasets from a designated secure machine at their home institution, and will be presented with a familiar desktop, including their favourite statistical analysis tools (Stata, SPSS) and office software. Users will perform all their analyses on the secure server, and will be encouraged to work collaboratively there, drafting papers and sharing outputs with other approved project members in shared project areas. In addition to the secure data, they will be allowed to bring in any data from the standard UK Data Archive collection and, with appropriate vetting, their own data, to use alongside them. All data and working outputs stay on the secure server; final publications can be removed only after careful vetting for statistical disclosure issues.

Data in the pilot launch include fully geographically grid-referenced data from the British Household Panel Survey. Under the full service (to be launched in 2010), the SDS will acquire the major holdings of the Virtual Microdata Laboratory (VML) of the Office for National Statistics, including highly detailed versions of social surveys already disseminated by ESDS (e.g. the Annual Population Survey, the Labour Force Survey, and the General Household Survey) as well as a large collection of business microdata. In addition, the SDS has been in discussion with all the major ESRC-funded longitudinal studies, and will provide access to linked data (such as the Pupil Level Annual School Census (PLASC)-linked Millennium Cohort Study education data) and other detailed, potentially disclosive data, such as verbatim text responses and qualitative data. The SDS is also working closely with its sister service, the Administrative Data Liaison Service (ADLS), to encourage the use of administrative data in research, by providing a safe environment for data matching and merging, particularly for those whose home institutions cannot meet the requisite security standards.