As a researcher, once you have satisfied the licence and access conditions, you can download all data and derived data products that you create in a variety of formats. If your focus is more on software development, the platform will offer multiple endpoints where you can retrieve RDF data through SPARQL queries, or JSON and XML from a RESTful API.
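As a rough illustration of what calling such endpoints could look like, here is a minimal Python sketch. The endpoint URLs and the helper names `build_sparql_request` and `build_rest_request` are hypothetical, as the actual DSaaP addresses and API paths are not given here. It only assumes that the SPARQL endpoint follows the standard SPARQL 1.1 Protocol (query passed as a `query` parameter) and that the REST API selects JSON or XML through the `Accept` header. The requests are prepared but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URLs: placeholders, not real DSaaP addresses.
SPARQL_ENDPOINT = "https://example.org/sparql"
REST_ENDPOINT = "https://example.org/api/datasets"


def build_sparql_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Prepare a SPARQL 1.1 Protocol query-via-GET request (not sent)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SELECT results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def build_rest_request(fmt: str = "json", endpoint: str = REST_ENDPOINT) -> Request:
    """Prepare a REST request asking for JSON or XML (assumed content negotiation)."""
    media_types = {"json": "application/json", "xml": "application/xml"}
    return Request(endpoint, headers={"Accept": media_types[fmt]})


# Generic query that lists ten arbitrary triples.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
sparql_req = build_sparql_request(query)
rest_req = build_rest_request("xml")

print(sparql_req.full_url)
print(rest_req.get_header("Accept"))
```

Against a real endpoint, either request could be passed to `urllib.request.urlopen`, subject to whatever authentication and access conditions the platform enforces.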
The DataFrame API is available in Scala, Java, Python, and R. We also use Apache Zeppelin, which provides an intuitive, developer-friendly web-based environment for data ingestion, wrangling, visualization and support.
Yes. The core platform focuses on the creation of derived linked data products, but approved researchers working on impactful projects will be able to undertake more complex research using a quota of cluster resources in a secure containerised environment. We have invested in generalised GPUs to help enable deep learning on our core Hadoop stack. Over the next 18 months, we plan to use YARN in Hadoop 3.0 to run all types of clusters and to mix CPU-intensive and GPU-intensive processes.
Training will be delivered through multiple channels – webinars and online tutorials will guide you through the core functionality of discovery and linkage. More specialised training for advanced tools such as Spark and Zeppelin will be provided by our Training and Support team.
Being able to cite data and reproduce research is essential to research credibility. In the current UK Data Service system, you can only cite the metadata. On DSaaP, each data product that you create is assigned a unique URL, which can be retrieved over the web provided that you have first satisfied any necessary security checks and access conditions.
The DSaaP infrastructure will be in beta testing in early 2019 and in production by the end of 2019, as a requirement of delivering the Smart Meter Research Portal. A number of components will be available for early adopter testing in late 2018. A detailed roadmap with the milestones and deliverables will be available on the site soon.
Data has to be cleaned, transformed, and analyzed to unlock its hidden potential through linking to other data sources. Once ‘tamed’ through organizing and integrating processes, large volumes of unstructured, semi-structured, and structured data are turned into “smart data” that reflect the research priorities of a particular discipline or field. We can then consider linking this data. Smart data inquiries can then be used to provide comprehensive analyses and generate new products and services.
DSaaP makes it possible to uncover new insights that drive societal and economic impact by linking data that has traditionally been difficult to co-locate. Any data linkages are strictly and ethically controlled and are performed on anonymised data. We have strict data governance and information security management procedures. Potential disclosure risks are identified and mitigated at all stages of the data lifecycle, and new privacy engineering techniques make it infeasible to identify individuals from datasets.
Firstly, we take a completely new approach to modelling data using linked data techniques and a scalable big data infrastructure. In this “universal format”, we can manipulate and secure data of all sizes and of all kinds down to the cell level. This unlocks much more powerful opportunities for linkage than are currently possible. Secondly, we overlay this with a simple drag-and-drop web interface which allows all data holdings of whatever type to be queried in one place. Lastly, the whole data lifecycle is managed in a trustworthy, standards-based repository, combining the best of a classic data repository with leading-edge tools and data models.
We enable more cross-disciplinary linkages. We are not just a social science repository or an energy data repository; we are a national data repository. Real policy and research value comes when we can cross-reference knowledge from different domains. That collective intelligence is what DSaaP enables.
We have fifty years’ experience of being entrusted with data at all levels of sensitivity. Although it uses new technology stacks such as Hadoop, Apache Spark and Elasticsearch, DSaaP is architected within an ISO-compliant Trusted Digital Repository Framework and adheres to strict governance frameworks such as the Five Safes.
It’s free for researchers.