In September 2020, the European Commission adopted the Digital Finance Strategy to support innovation in the European financial sector and build a single market for digital financial services. The EU Digital Finance Platform, a collaborative space that connects innovative financial firms with national supervisors and features, among other things, the new Data Hub, is part of this effort.
Data Hub: In fall 2023, the Data Hub was added to the platform. The project, which will complement national innovation hubs, regulatory sandboxes and private-sector initiatives, is a genuine novelty: for the first time, innovative firms will be able to access supervisory data to test new applications or train artificial intelligence (AI) and machine learning (ML) models.
But given the EU's strict data privacy requirements, how can public sector data be shared with innovators at all? To comply with these requirements, the Data Hub will host only synthetic data sets and will therefore rely on data synthetisation. But...
…what is data synthetisation? Synthetic data generation is a technique for creating artificial ("new") data that closely resemble the original data without exposing sensitive or confidential information. They serve as a substitute for actual data, allowing firms to experiment, test use cases, develop algorithms and perform analyses while the underlying data remain safe and private. Synthetic data generation is designed to provide full anonymisation while preserving the statistical characteristics of the original data. As a result, analyses run on synthetic data should deliver results very similar to those run on the original data, which makes synthetic data highly relevant for testing.
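To make the idea more concrete, below is a minimal sketch of one of the simplest synthetisation approaches: fit a multivariate Gaussian (column means plus covariance matrix) to numeric records, then sample entirely new records from it. This is an illustrative assumption, not the method used by the Data Hub or DG FISMA, which the source does not describe; the function names (`fit_gaussian`, `cholesky`, `synthesize`) and the toy records are hypothetical. Such a naive model preserves only means and correlations and by itself offers no formal privacy guarantee; real-world generators typically use richer models and add dedicated privacy checks.

```python
import math
import random
from statistics import mean

# Hypothetical illustration: a naive Gaussian synthesiser, NOT the Data Hub's method.


def fit_gaussian(rows):
    """Estimate column means and the sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def synthesize(rows, n_samples, seed=0):
    """Draw new rows x = mu + L z, z ~ N(0, I); no original row is copied."""
    rng = random.Random(seed)
    mu, cov = fit_gaussian(rows)
    L = cholesky(cov)
    d = len(mu)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


# Hypothetical toy records: (loan amount in kEUR, interest rate in %).
original = [[120, 3.1], [250, 3.6], [90, 2.9], [310, 4.0], [180, 3.3], [220, 3.4]]
synthetic = synthesize(original, n_samples=5000, seed=42)

print(fit_gaussian(original)[0])   # column means of the original records
print(fit_gaussian(synthetic)[0])  # should lie close to the original means
```

The synthetic sample reproduces the means and the strong positive correlation between the two toy columns, which is what makes it usable for testing, while none of its rows is a copy of an original record.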
For the Data Hub, this means that real data will never leave the authorities' premises and no external user will access actual data. National supervisors can therefore participate in the project without compromising confidentiality, while innovators still gain access to meaningful information. More broadly, we expect synthetic data to gain traction in AI and ML, as they can help train algorithms that require vast amounts of training data, which can be expensive to obtain or subject to usage restrictions.
Outlook: The Directorate-General for Financial Stability, Financial Services and Capital Markets Union (DG FISMA), the European Commission directorate responsible for this project, is engaged in an intensive dialogue with European supervisors to bring as many of them as possible into the initiative. Following a successful synthetic data pilot with the Bank of Spain, the first data sets are expected to become available as early as the beginning of 2024. While the exact types of data that will be available are not yet public, you can expect them to be relevant, as the industry was consulted earlier this year on potential use cases and on the types of data sets it would like to access for testing.
 https://digital-finance-platform.ec.europa.eu/data-hub; https://www.youtube.com/watch?v=e6xVgV0G8xY.