The Data-Centric Manifesto: A Paradigm Shift in Development
Data is ubiquitous in the modern digital landscape and integral to many aspects of our lives. However, the current approach to application development often places the application, rather than the data, at the center. The video “Data-Centric Manifesto” argues for a shift in mindset, advocating a data-centric approach that empowers users rather than application owners.
The Problem with the App-Centric Mindset
The video criticizes the app-centric mindset in application development. This approach often leads to the creation of “super apps” that aim to do everything and, in the process, hoard all user data. This is problematic because it disempowers users and gives undue power to application owners.
Data-Centricity: A Better Approach
The speaker argues for a data-centric approach, where data is treated as a “first-class citizen.” In this paradigm, data is considered an asset and an economic currency. The focus is on making data
These are the key principles of the data-centric manifesto:
- Data is a key asset of any person, organization, and society.
- Data is self-describing and does not rely on an application for interpretation and meaning.
- Data is expressed in open, non-proprietary formats.
- Access to and security of the data is a responsibility of the enterprise data layer or the personal data vault, and not managed by applications.
- Applications are allowed to visit the data, perform their magic and express the results of their process back into the data layer.
FAIR Data
FAIR stands for Findable, Accessible, Interoperable, and Reusable. The FAIR principles advocate the use of open standards and protocols, and they make data self-describing through semantic layers.
Findable
F1: (Meta)data are assigned a globally unique identifier
- Ensures each dataset can be uniquely distinguished.
- Example: Using a DOI (Digital Object Identifier).
F2: Data are described with rich metadata
- Metadata should be detailed enough for others to find the data.
- Example: Using standardized metadata schemas like Dublin Core.
F3: Metadata include the identifier of the data they describe
- Makes it easier to link the metadata and the data.
- Example: Including the DOI in the metadata description.
F4: (Meta)data are registered in a searchable resource
- Ensures that the data can be found by others.
- Example: Registering the data in a public repository like Zenodo.
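The Findable principles above can be sketched with a small metadata record and a toy keyword index. This is only an illustration: the field names follow Dublin Core element names, but the DOI, the dataset, and the `build_index` helper are hypothetical placeholders, and the in-memory index stands in for a real repository such as Zenodo.

```python
# Hypothetical metadata record; the DOI is a placeholder, not a registered identifier.
record = {
    "identifier": "doi:10.5555/example-dataset",  # F1 and F3: unique ID stored in the metadata
    "title": "Air quality measurements, example city, 2023",  # F2: descriptive metadata
    "creator": "Example Research Group",
    "subject": ["air quality", "PM2.5", "sensors"],
    "date": "2023-12-31",
    "format": "text/csv",
}


def build_index(records):
    """F4: a toy searchable resource mapping each keyword to dataset identifiers."""
    index = {}
    for rec in records:
        for keyword in rec["subject"]:
            index.setdefault(keyword.lower(), []).append(rec["identifier"])
    return index


index = build_index([record])
found = index.get("pm2.5", [])  # look up datasets by keyword
```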
Accessible
A1: (Meta)data are retrievable by their identifier
- Ensures that the data can be programmatically accessed.
- Example: Providing an API endpoint for data retrieval.
A1.1: The protocol is open, free, and universally implementable
- Ensures that there are no barriers to access.
- Example: Using HTTP/HTTPS as the protocol.
A1.2: The protocol allows for authentication and authorization
- Ensures that only authorized individuals can access the data.
- Example: Using OAuth for secure access.
A2: Metadata are accessible, even when the data are no longer available
- Ensures that the context of the data is preserved.
- Example: Archiving metadata separately from the data.
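A minimal sketch of the Accessible principles, assuming a hypothetical endpoint at `data.example.org` and a bearer token obtained elsewhere (for example through an OAuth flow). The request is only constructed, never sent, and `resolve` is an invented helper that shows metadata outliving the data it describes.

```python
import urllib.parse
import urllib.request


def build_request(identifier, token, base_url="https://data.example.org/records/"):
    """A1, A1.1, A1.2: build (but do not send) an HTTPS GET for a record by its identifier."""
    url = base_url + urllib.parse.quote(identifier, safe="")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {token}")  # token value is an assumption
    req.add_header("Accept", "application/json")
    return req


def resolve(identifier, metadata_store, data_store):
    """A2: metadata stay retrievable even if the data themselves are gone."""
    data = data_store.get(identifier)
    return {
        "metadata": metadata_store[identifier],
        "data": data,
        "available": data is not None,
    }


req = build_request("10.5555/example-dataset", token="placeholder-token")
metadata_store = {"10.5555/example-dataset": {"title": "Example dataset"}}
data_store = {}  # the data were withdrawn, the metadata were archived separately
result = resolve("10.5555/example-dataset", metadata_store, data_store)
```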
Interoperable
I1: (Meta)data use a formal, accessible, shared language
- Ensures that the data can be integrated with other data.
- Example: Using RDF (Resource Description Framework).
I2: (Meta)data use FAIR vocabularies
- Ensures that the terms used are understandable and reusable.
- Example: Using controlled vocabularies or ontologies.
I3: (Meta)data include references to other (meta)data
- Ensures that relationships between datasets are clear.
- Example: Using identifiers to link related datasets.
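One way to illustrate the Interoperable principles is a JSON-LD style document that uses the schema.org vocabulary and links to a related dataset by identifier. The URLs under `data.example.org` and the `linked_ids` helper are hypothetical; a real pipeline would more likely use a dedicated RDF or JSON-LD library.

```python
import json

# I1 and I2: a shared, formal representation using a public vocabulary (schema.org).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "https://data.example.org/records/air-quality-2023",
    "name": "Air quality measurements, example city, 2023",
    # I3: a qualified reference to other (meta)data
    "isBasedOn": {"@id": "https://data.example.org/records/sensor-calibration-2022"},
}


def linked_ids(doc):
    """Collect the identifiers of other resources that this document references."""
    links = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @id and @type
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                links.append(item["@id"])
    return links


serialized = json.dumps(dataset, indent=2)  # plain JSON, readable by any consumer
links = linked_ids(json.loads(serialized))
```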
Reusable
R1: (Meta)data are richly described
- Ensures that the data can be reused effectively.
- Example: Providing detailed methods sections in metadata.
R1.1: (Meta)data have a clear data usage license
- Ensures that others know how they can use the data.
- Example: Using a Creative Commons license.
R1.2: (Meta)data are associated with detailed provenance
- Ensures that the history of the data is clear.
- Example: Providing a version history.
R1.3: (Meta)data meet community standards
- Ensures that the data meets quality standards.
- Example: Following community-specific metadata standards.
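The Reusable checklist can be approximated by a simple completeness check over a metadata record. The required field names and the `missing_reuse_fields` helper are assumptions made for this sketch, not part of any FAIR tooling; the license URL points to Creative Commons Attribution 4.0.

```python
# Fields this sketch treats as necessary for reuse (an assumption, not a standard).
REQUIRED_FOR_REUSE = ("description", "license", "provenance")


def missing_reuse_fields(metadata):
    """Return the reuse-related fields that are absent or empty."""
    return [field for field in REQUIRED_FOR_REUSE if not metadata.get(field)]


metadata = {
    "description": "Hourly PM2.5 readings collected with calibrated sensors.",  # R1
    "license": "https://creativecommons.org/licenses/by/4.0/",  # R1.1
    "provenance": [  # R1.2: a simple version history
        {"version": "1.0", "date": "2023-06-01", "change": "Initial release"},
        {"version": "1.1", "date": "2023-12-31", "change": "Added Q4 readings"},
    ],
}

complete = missing_reuse_fields(metadata)
incomplete = missing_reuse_fields({"license": "https://creativecommons.org/licenses/by/4.0/"})
```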
User Control and Open Protocols
In a data-centric model, control over data and its access is given back to the user. Applications can request access to user data, generate insights, and then return these insights to the user. This allows for a more dynamic and user-empowering ecosystem where data is not locked within a single application.
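The flow described above (an application requests access, computes an insight, and writes it back) might look like the following sketch. `PersonalDataVault`, its methods, and the fitness app are all hypothetical and not taken from any real vault implementation.

```python
class PersonalDataVault:
    """A toy user-controlled data layer: the vault, not the app, enforces access."""

    def __init__(self, owner):
        self.owner = owner
        self._records = {}
        self._grants = set()  # (app_id, record_key, action) triples

    def put(self, key, value):
        self._records[key] = value

    def grant(self, app_id, key, action):
        self._grants.add((app_id, key, action))

    def revoke(self, app_id, key, action):
        self._grants.discard((app_id, key, action))

    def read(self, app_id, key):
        if (app_id, key, "read") not in self._grants:
            raise PermissionError(f"{app_id} may not read {key}")
        return self._records[key]

    def write(self, app_id, key, value):
        if (app_id, key, "write") not in self._grants:
            raise PermissionError(f"{app_id} may not write {key}")
        self._records[key] = value


vault = PersonalDataVault(owner="alice")
vault.put("daily_steps", [4000, 6000, 8000])
vault.grant("fitness-app", "daily_steps", "read")
vault.grant("fitness-app", "average_steps", "write")

# The app visits the data, computes an insight, and returns it to the vault.
steps = vault.read("fitness-app", "daily_steps")
vault.write("fitness-app", "average_steps", sum(steps) / len(steps))
```

Because grants live in the vault, the user can revoke the app's access at any time without losing the data or the insights already written back.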
A Case Study
Verifiable Credentials & SSI
Let’s use verifiable credentials to demonstrate how a data-centric approach can work. In this model, data is accompanied by a semantic layer that explains how it links with other data, making it understandable for both humans and machines. SSI (self-sovereign identity) exchange protocols are open and transparent, which allows applications to be built on top of them.
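As a rough illustration, the sketch below builds an unsigned credential shaped after the W3C Verifiable Credentials Data Model 1.1 and checks only its structure. The issuer and subject DIDs, the `ExampleDegreeCredential` type, and the `structural_problems` helper are hypothetical. A real credential would also carry a cryptographic proof and an additional context defining the custom terms, and neither is shown here.

```python
VC_CONTEXT_V1 = "https://www.w3.org/2018/credentials/v1"
REQUIRED_FIELDS = ("@context", "type", "issuer", "issuanceDate", "credentialSubject")

# Unsigned, illustrative credential; all identifiers are placeholders.
credential = {
    "@context": [VC_CONTEXT_V1],  # the semantic layer: terms are defined by the context
    "type": ["VerifiableCredential", "ExampleDegreeCredential"],
    "issuer": "did:example:university",
    "issuanceDate": "2024-01-15T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:alice",
        "degree": "BSc Computer Science",
    },
}


def structural_problems(cred):
    """Structural check only; this does NOT verify any signature or proof."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in cred]
    contexts = cred.get("@context", [])
    if isinstance(contexts, str):
        contexts = [contexts]
    if "@context" in cred and (not contexts or contexts[0] != VC_CONTEXT_V1):
        problems.append("first @context entry must be the VC v1 context")
    if "type" in cred and "VerifiableCredential" not in cred["type"]:
        problems.append("type must include VerifiableCredential")
    return problems


ok = structural_problems(credential)
broken = structural_problems({k: v for k, v in credential.items() if k != "issuer"})
```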
NOSTR
NOSTR is a protocol for social media apps built on a relay-based architecture. All data can be shared and reused by multiple apps and platforms built on top of the protocol. Every piece of data is a simple JSON payload whose structure is defined by the protocol. It targets social media and publishing use cases specifically.
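To make the "simple JSON payload" concrete, here is a sketch of an event following the field layout of the protocol's basic event format (NIP-01), where the event id is the SHA-256 hash of a compact JSON array. The public key is a placeholder, not a real key, and the Schnorr signature (`sig`) is left out because it needs a secp256k1 library. The serialization shown should match the specification for simple content like this, but treat it as an approximation rather than a conformant client.

```python
import hashlib
import json

PLACEHOLDER_PUBKEY = "ab" * 32  # 64 hex characters; NOT a real key


def event_id(pubkey, created_at, kind, tags, content):
    """Hash the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    payload = [0, pubkey, created_at, kind, tags, content]
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


event = {
    "pubkey": PLACEHOLDER_PUBKEY,
    "created_at": 1700000000,  # Unix timestamp in seconds
    "kind": 1,                 # kind 1 is a short text note
    "tags": [],
    "content": "Hello from a data-centric world",
}
event["id"] = event_id(
    event["pubkey"], event["created_at"], event["kind"], event["tags"], event["content"]
)

# Any app or relay can parse the same payload; nothing is tied to one client.
wire_format = json.dumps(event)
```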
DWN
DWN (Decentralized Web Node) is a more flexible architecture that separates structured data, data access, and data management, with a permission layer controlled by the user. Access and data management are based on an open protocol and an open specification. In addition, users get a tool for designing their own data- and permission-driven protocols for the good of all. It is a more generic architecture than NOSTR.
Data-Oriented Programming
Data-oriented development separates data from behavior. This contrasts with object-oriented programming, where data and behavior are often mixed, leading to complications such as the object-relational impedance mismatch. Keeping data separate makes it easier to query, aggregate, and manipulate.
Key Concepts in Data-Oriented Programming
Data Layout
One of the primary concerns in DOP is how data is laid out in memory. The idea is to organize data in a way that is cache-friendly, thereby reducing cache misses and improving performance.
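The layout idea can be sketched by contrasting an "array of structures" with a "structure of arrays". The particle fields and the `step` function are invented for illustration. In CPython, interpreter overhead will likely dominate any cache effect, so this only shows the shape of the technique; the performance payoff is usually realized in languages like C, C++, or Rust, or through array libraries.

```python
from array import array

# Array of structures: each particle is a separate object scattered on the heap.
particles_aos = [
    {"x": 0.0, "vx": 1.0},
    {"x": 10.0, "vx": -2.0},
    {"x": 5.0, "vx": 0.5},
]

# Structure of arrays: each field lives in one contiguous, typed buffer.
xs = array("d", [p["x"] for p in particles_aos])
vxs = array("d", [p["vx"] for p in particles_aos])


def step(xs, vxs, dt):
    """Advance all positions in place, walking both buffers sequentially."""
    for i in range(len(xs)):
        xs[i] += vxs[i] * dt


step(xs, vxs, 2.0)
```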
Data Transformation
DOP focuses on transforming data from one representation to another. This is often done in bulk operations that can be easily parallelized.
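A bulk transformation split into independent chunks might be sketched as follows. The temperature conversion and the helper names are illustrative assumptions. Note that threads in CPython generally do not speed up CPU-bound work because of the global interpreter lock, so this shows the structure of a parallelizable transform rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def to_fahrenheit(chunk):
    """Pure transformation of one chunk: no shared state, so chunks are independent."""
    return [c * 9 / 5 + 32 for c in chunk]


def chunks(values, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def transform_parallel(values, size=2):
    """Apply the transformation chunk by chunk and reassemble results in order."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(to_fahrenheit, chunks(values, size)))
    return [value for chunk in results for value in chunk]


celsius = [0, 100, 25, -40]
fahrenheit = transform_parallel(celsius)
```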
Separation of Data and Code
Unlike OOP, where data and methods are bundled together in classes, DOP keeps them separate. This makes it easier to optimize data storage and access patterns.
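In its simplest form, the separation looks like plain data structures on one side and standalone functions on the other. The order data and function names below are made up for the example.

```python
import json

# Data: a plain, generic structure with no methods attached.
order = {
    "id": "order-1",
    "items": [
        {"sku": "A", "price": 10.0, "qty": 2},
        {"sku": "B", "price": 5.0, "qty": 1},
    ],
}


# Code: free functions that take data in and return new values.
def order_total(order):
    return sum(item["price"] * item["qty"] for item in order["items"])


def skus_priced_above(order, min_price):
    return [item["sku"] for item in order["items"] if item["price"] > min_price]


total = order_total(order)
expensive = skus_priced_above(order, 7.0)

# Because the data is just dicts and lists, it serializes without custom code.
as_json = json.dumps(order)
```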
Avoiding Indirection
DOP tries to minimize pointer chasing and indirection, which can be detrimental to performance. This is achieved by using contiguous memory structures like arrays.
Immutable Data
DOP often favors immutability, as it makes it easier to reason about the data and enables certain optimizations.
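A minimal sketch of immutable data in Python uses a frozen dataclass, where an "update" produces a new value instead of modifying the old one. The `Profile` type and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Profile:
    name: str
    email: str


original = Profile(name="Alice", email="alice@example.org")

# "Changing" a field creates a new value; the original stays untouched.
updated = replace(original, email="alice@new.example.org")

try:
    original.email = "mallory@example.org"  # in-place mutation is rejected
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Since old versions are never modified, earlier snapshots of the data can be shared safely between functions or kept as history.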
Conclusion
The data-centric approach offers a more equitable and efficient way to handle data in the digital age. It puts the user at the center, giving them control over their data, and promotes interoperability and reusability. As we move forward, adopting a data-centric mindset could very well be the future of application development.