National Data Infrastructure: Data as a strategic asset to the public service
Paul Morrin, Assistant Director General, Central Statistics Office (CSO), talks to eolas Magazine about how the National Data Infrastructure (NDI) informs the formulation of public policy and what role the next generation of open, researcher, and operational data will play in doing so.
Ireland’s NDI, Morrin explains, is built on three identification pillars: businesses, based on the unique business identifier (UBI) as provided by the Revenue Commissioners; household locations, based on Eircodes; and people, based on PPSNs. “Without this structure, we would always be working in silos, even at times within our own organisations,” Morrin says. “Looking at the future of data linkage, we need to have National Data Infrastructure at the core of everything we do.”
Across 45 national data sets reported by the CSO, there are 43 million records per year, many of which are now updated in real time. “65 per cent of new records coming into the public service had an Eircode in 2021,” Morrin says. “When we started measuring this in 2016, it was 15 per cent, so things are moving along rapidly. The PPSNs are high and always have been; it is at around 80 per cent at the moment and we expect it to go higher.”
‘An important enabler for nationwide renewal and transformation’
The CSO’s “number one reason for existence” is the production of statistics which are used as “evidence to inform decision-making in Ireland”. Traditionally, 80 per cent of the statistics produced are mandated by the EU, “but that includes surveys like the Labour Force Survey which are widely used for national purposes”. In recent years, the split between EU-mandated statistics and those produced to meet national demands has moved closer to 50/50. “We believe that people in Ireland have a right to live in an informed society, which includes how we present the statistics,” he elaborates. “We believe in the intrinsic value of statistics; we are not using them as part of a communications plan and so we try to present them in an unvarnished but clear and understandable way. We think that builds up trust among citizens in statistics produced by the CSO. Our figures must reflect the lived reality of the people of Ireland, based on the highest quality standards.”
Morrin further explains that, through its work in providing both statistics and data services to government and in seconding teams to public and civil service departments and agencies, the CSO’s role is as “an important enabler for delivery of nationwide renewal and transformation”. Renewal and transformation come about, Morrin says, when the NDI facilitates policy and operational analytics within organisations.
“You can integrate data well within your own organisation if you are collecting these standard identifiers and that is the first step of doing data analytics in any organisation, making sure that your customers are clearly identified,” he says. “The next step in the public service is data linkage across different silos where it is legally facilitated. That is going to support much deeper analytics over time, when you can see your data in context and check it against other data sets. It will really support policy development in the public service, because it will allow people to see where their data fits into the overall population, allowing all kinds of analysis on policies.” Common standards are essential to data linkage in this future environment, and the CSO is happy to engage with public bodies that wish to implement future-proofed standards in their systems and processes.
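For readers who want to see what linkage on a standard identifier looks like in practice, the following is a minimal sketch rather than a description of any CSO system. It assumes two invented data sets that both record an Eircode; the field names, the sample values, and the helper functions `normalise`, `link_on_eircode`, and `coverage` are all hypothetical.

```python
# Invented sample records from two hypothetical public bodies.
# Field names and values are illustrative only.
housing_records = [
    {"eircode": "A65 F4E2", "dwelling_type": "apartment"},
    {"eircode": "d02x285", "dwelling_type": "house"},
    {"eircode": None, "dwelling_type": "house"},  # identifier not collected
]
energy_records = [
    {"eircode": "A65F4E2", "energy_rating": "B2"},
    {"eircode": "T12 AB34", "energy_rating": "C1"},
]


def normalise(eircode):
    """Uppercase and remove spaces so differently typed Eircodes compare equal."""
    if not eircode:
        return None
    return eircode.replace(" ", "").upper()


def link_on_eircode(left, right):
    """Keep only the rows whose normalised Eircode appears in both data sets."""
    index = {}
    for row in right:
        key = normalise(row["eircode"])
        if key is not None:
            index[key] = row
    linked = []
    for row in left:
        key = normalise(row["eircode"])
        if key in index:
            # Merge the two rows and store the identifier in normalised form.
            linked.append({**row, **index[key], "eircode": key})
    return linked


def coverage(records):
    """Share of records that carry an Eircode at all."""
    with_id = sum(1 for row in records if normalise(row["eircode"]))
    return with_id / len(records)


linked = link_on_eircode(housing_records, energy_records)
```

Only records that carry the shared identifier on both sides can be linked, which is why the coverage rate of identifiers such as the Eircode matters as much as the linkage step itself.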
Analytics “is not a black box” and is “very much a human activity”, Morrin states, meaning that organisations will want expert advice on interpretation of the results. As such, the CSO currently has 40 statisticians on secondment in various government departments and agencies. The traditional role of these statisticians was production of statistical reports for their departments. However, many are now involved in business intelligence, the “use of statistical analysis and data visualisation to provide self-service reports” which makes “data available in a way that is understandable for people who are not experts”.
Recently, data modelling has become the main growth area for seconded statisticians: “This is the development of machine learning models and the use of those models to support the business divisions to meet their goals. This is particularly where you do not want the business division trying to interpret a black box, you need someone explaining this to them.”
Privacy and open data
The question, as Morrin states, on anyone’s mind when discussing data use is the respect of privacy while enabling public organisations to become data driven. “What is crucial here is differential privacy,” he says. “You apply the maximum level of privacy to the data while being able to execute the purpose you want to use it for. Open data is useful for monitoring where your customers are in relation to the national picture for different sectors of employment based on survey results. If you want to do policy analysis, you need to go down a level into researcher data, for which we have a secure service, where the only data you can take out is anonymous data from tables. For operations, it is justifiable to exchange ‘raw’ data for defined purposes and that is done through the Data Sharing and Governance Act.”
Ireland is “very successful” with open data, which is a “big focus” for the CSO, Morrin says: “We publish statistics for about 20 government departments and agencies in an open format which meets the requirements of EU Directives. The next generation of open data allows the interrogation of the data, meaning that the tabulations can be ‘sliced and diced’, which will improve the usability of the data.”
From a technology point of view, the CSO and many of its seconded statistical teams are moving towards open-source analysis products. “Open-source software and data analytics are not free; this has to be supported from technology teams and involves data engineering as well to ensure the data flows well to statisticians and other analysts, such as economists.”
Morrin says that the CSO has learned from recent crises such as the Russian invasion of Ukraine and Covid-19, and has organised a team that can deploy at short notice in response and can also support short-term analytical needs in public bodies. He concludes: “Looking forward, we hope to service the broader public and civil service. Data as a product is our traditional role, but now we are embracing data as a service as well.
“We need to evolve and scale up in a targeted way; that is the goal. Innovation, collaborations, and partnerships are critical. We gain so much from the partnerships we have with various public bodies, and it is the way to go. We need to leverage each other’s strengths on this journey.”