The Data-Centric Manifesto: A Paradigm Shift in Development

Volodymyr Pavlyshyn
5 min read · Oct 9, 2023

Data is ubiquitous in the modern digital landscape and integral to many aspects of our lives. However, application development today usually places the application, rather than the data, at the center. The video “Data-Centric Manifesto” argues for a shift in mindset: a data-centric approach that empowers users rather than application owners.

The Problem with App-Centric Mindset

An app-centric mindset dominates application development. It often leads to “super apps” that aim to do everything and, in the process, hoard all user data. This is problematic because it disempowers users and gives undue power to application owners.

Data-Centricity: A Better Approach

The speaker argues for a data-centric approach, where data is treated as a “first-class citizen.” In this paradigm, data is considered an asset and an economic currency. The focus is on making data FAIR.

These are the key principles of the data-centric manifesto:

  • Data is a key asset of any person, organization, and society.
  • Data is self-describing and does not rely on an application for interpretation and meaning.
  • Data is expressed in open, non-proprietary formats.
  • Access to and security of the data is a responsibility of the enterprise data layer or the personal data vault, and not managed by applications.
  • Applications are allowed to visit the data, perform their magic and express the results of their process back into the data layer.

FAIR Data

FAIR stands for Findable, Accessible, Interoperable, and Reusable. This approach advocates for the use of open standards and protocols, making data self-explanatory through semantic layers.

Findable

F1: (Meta)data are assigned a globally unique identifier

  • Ensures each dataset can be uniquely distinguished.
  • Example: Using a DOI (Digital Object Identifier).

F2: Data are described with rich metadata

  • Metadata should be detailed enough for others to find the data.
  • Example: Using standardized metadata schemas like Dublin Core.

F3: Metadata include the identifier of the data they describe

  • Makes it easier to link the metadata and the data.
  • Example: Including the DOI in the metadata description.

F4: (Meta)data are registered in a searchable resource

  • Ensures that the data can be found by others.
  • Example: Registering the data in a public repository like Zenodo.
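The four Findable principles can be sketched as a tiny in-memory registry. This is only an illustration under stated assumptions: the `REGISTRY` dict and the `register` and `search` helpers are hypothetical, the DOI is made up, and the `dc:`-prefixed keys simply borrow Dublin Core element names.

```python
# Hypothetical in-memory stand-in for a searchable repository (F4).
REGISTRY = {}

def register(metadata):
    """Index a metadata record under its globally unique identifier (F1)."""
    identifier = metadata["dc:identifier"]  # F3: the metadata names the data it describes
    REGISTRY[identifier] = metadata
    return identifier

def search(keyword):
    """Return identifiers whose title or description mentions the keyword (F2)."""
    keyword = keyword.lower()
    return sorted(
        identifier
        for identifier, record in REGISTRY.items()
        if keyword in (record["dc:title"] + " " + record["dc:description"]).lower()
    )

record = {
    "dc:identifier": "doi:10.1234/example.2023",  # made-up DOI for illustration
    "dc:title": "City air quality measurements",
    "dc:description": "Hourly PM2.5 readings from roadside sensors.",
    "dc:creator": "Example Lab",
}
register(record)
found = search("pm2.5")
```

The richer the description, the more queries can reach the record; a real repository such as Zenodo plays the role of `REGISTRY` at scale.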

Accessible

A1: (Meta)data are retrievable by their identifier

  • Ensures that the data can be programmatically accessed.
  • Example: Providing an API endpoint for data retrieval.

A1.1: The protocol is open, free, and universally implementable

  • Ensures that there are no barriers to access.
  • Example: Using HTTP/HTTPS as the protocol.

A1.2: The protocol allows for authentication and authorization

  • Ensures that only authorized individuals can access the data.
  • Example: Using OAuth for secure access.

A2: Metadata are accessible, even when the data are no longer available

  • Ensures that the context of the data is preserved.
  • Example: Archiving metadata separately from the data.
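A minimal sketch of the Accessible principles, assuming two hypothetical stores and a token set that stands in for a real OAuth check. None of these names come from the FAIR specification; they only show the shape of the idea.

```python
# Hypothetical stores: metadata is kept apart from the data itself (A2).
METADATA = {"doi:10.1234/example.2023": {"title": "City air quality measurements"}}
DATA = {"doi:10.1234/example.2023": [12.1, 15.4, 9.8]}
AUTHORIZED_TOKENS = {"token-alice"}  # assumption: stand-in for a real OAuth check (A1.2)

def get_metadata(identifier):
    """Metadata stays retrievable by identifier (A1, A2)."""
    return METADATA.get(identifier)

def get_data(identifier, token):
    """Data is retrievable by identifier, but only for authorized callers."""
    if token not in AUTHORIZED_TOKENS:
        raise PermissionError("caller is not authorized")
    if identifier not in DATA:
        raise LookupError("data is no longer available")
    return DATA[identifier]

readings = get_data("doi:10.1234/example.2023", "token-alice")

# Withdraw the data; the metadata record survives and still gives context.
del DATA["doi:10.1234/example.2023"]
still_described = get_metadata("doi:10.1234/example.2023")
```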

Interoperable

I1: (Meta)data use a formal, accessible, shared language

  • Ensures that the data can be integrated with other data.
  • Example: Using RDF (Resource Description Framework).

I2: (Meta)data use FAIR vocabularies

  • Ensures that the terms used are understandable and reusable.
  • Example: Using controlled vocabularies or ontologies.

I3: (Meta)data include references to other (meta)data

  • Ensures that relationships between datasets are clear.
  • Example: Using identifiers to link related datasets.
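To make the Interoperable principles concrete, here is a sketch that models RDF-style statements as plain Python tuples. It assumes the Dublin Core terms namespace for predicates; the identifiers and the `objects` helper are hypothetical, and a real system would more likely use a dedicated RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# Each statement is a (subject, predicate, object) triple, as in RDF (I1).
# Predicates come from a shared vocabulary instead of ad hoc field names (I2).
triples = [
    ("doi:10.1234/example.2023", DCTERMS + "title", "City air quality measurements"),
    ("doi:10.1234/example.2023", DCTERMS + "isPartOf", "doi:10.1234/collection.2023"),
    ("doi:10.1234/collection.2023", DCTERMS + "title", "Urban sensor collection"),
]

def objects(subject, predicate):
    """Return every object stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow the reference (I3) from the dataset to the collection it belongs to.
parent = objects("doi:10.1234/example.2023", DCTERMS + "isPartOf")[0]
parent_title = objects(parent, DCTERMS + "title")[0]
```

Because the link is itself data, any application that understands the vocabulary can traverse it without knowing which app produced the records.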

Reusable

R1: (Meta)data are richly described

  • Ensures that the data can be reused effectively.
  • Example: Providing detailed methods sections in metadata.

R1.1: (Meta)data have a clear data usage license

  • Ensures that others know how they can use the data.
  • Example: Using a Creative Commons license.

R1.2: (Meta)data are associated with detailed provenance

  • Ensures that the history of the data is clear.
  • Example: Providing a version history.

R1.3: (Meta)data meet community standards

  • Ensures that the data meets quality standards.
  • Example: Following community-specific metadata standards.
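The Reusable principles lend themselves to a simple completeness check. The sketch below assumes a hypothetical community profile that requires three fields; the field names and the `missing_for_reuse` helper are illustrative only.

```python
# Assumption: a made-up community profile listing required fields (R1.3).
REQUIRED_FIELDS = ("description", "license", "provenance")

def missing_for_reuse(metadata):
    """List the fields a record still lacks before others can reuse it."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]

complete = {
    "description": "Hourly PM2.5 readings, calibrated weekly.",  # R1
    "license": "CC-BY-4.0",                                      # R1.1
    "provenance": [                                              # R1.2
        {"version": 1, "change": "initial upload"},
        {"version": 2, "change": "removed faulty sensor 7"},
    ],
}
incomplete = {"description": "Some numbers"}
```

Calling `missing_for_reuse(incomplete)` reports that the license and provenance are absent, which is exactly what a would-be reuser needs to know.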

User Control and Open Protocols

In a data-centric model, control over data and its access is given back to the user. Applications can request access to user data, generate insights, and then return these insights to the user. This allows for a more dynamic and user-empowering ecosystem where data is not locked within a single application.
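The flow described above can be sketched with a hypothetical `DataVault` class. The class, its method names, and the app names are assumptions made for illustration; the point is that permission checks live in the data layer, not in the applications.

```python
class DataVault:
    """Hypothetical personal data vault; access rules live here, not in apps."""

    def __init__(self):
        self._records = {}    # collection name -> list of records
        self._grants = set()  # (app, collection, action) triples approved by the user

    def grant(self, app, collection, action):
        self._grants.add((app, collection, action))

    def revoke(self, app, collection, action):
        self._grants.discard((app, collection, action))

    def _check(self, app, collection, action):
        if (app, collection, action) not in self._grants:
            raise PermissionError(f"{app} may not {action} {collection}")

    def read(self, app, collection):
        self._check(app, collection, "read")
        return list(self._records.get(collection, []))

    def write(self, app, collection, record):
        self._check(app, collection, "write")
        self._records.setdefault(collection, []).append(record)

vault = DataVault()
vault.grant("owner", "steps", "write")
vault.write("owner", "steps", 4200)
vault.write("owner", "steps", 7800)

# The user lets a fitness app visit the step data and write insights back.
vault.grant("fitness-app", "steps", "read")
vault.grant("fitness-app", "insights", "write")
steps = vault.read("fitness-app", "steps")
vault.write("fitness-app", "insights", {"average_steps": sum(steps) / len(steps)})
```

Revoking the grant later cuts the app off without moving or deleting any data, since the app never held its own copy of the collection.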

A Case Study

Verifiable Credentials & SSI

Let’s use verifiable credentials to demonstrate how a data-centric approach can work. In this model, data is accompanied by a semantic layer that explains how it links with other data, making it understandable for both humans and machines. SSI exchange protocols are open and transparent, which allows applications to be built on top of them.
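As a rough sketch, a credential can be modeled as a JSON document whose `@context` supplies the semantic layer. The field names below follow the W3C Verifiable Credentials data model, but the proof is a deliberate simplification: real credentials carry public-key signatures, whereas this example assumes a shared secret and an invented `DemoHmacProof` type just to show tamper detection. The `ExampleDegreeCredential` type and the `did:example:` identifiers are also hypothetical.

```python
import hashlib
import hmac
import json

ISSUER_SECRET = b"demo-secret"  # assumption: real issuers sign with a private key instead

def canonical(document):
    """Serialize deterministically so issuer and verifier hash the same bytes."""
    return json.dumps(document, sort_keys=True, separators=(",", ":")).encode()

def issue(credential):
    """Attach a demo proof computed over the credential body."""
    tag = hmac.new(ISSUER_SECRET, canonical(credential), hashlib.sha256).hexdigest()
    return {**credential, "proof": {"type": "DemoHmacProof", "value": tag}}

def verify(signed):
    """Recompute the proof over everything except the proof itself."""
    body = {k: v for k, v in signed.items() if k != "proof"}
    expected = hmac.new(ISSUER_SECRET, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["proof"]["value"])

credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],  # the semantic layer
    "type": ["VerifiableCredential", "ExampleDegreeCredential"],
    "issuer": "did:example:university",
    "issuanceDate": "2023-10-09T00:00:00Z",
    "credentialSubject": {"id": "did:example:alice", "degree": "BSc"},
}
signed = issue(credential)
tampered = {**signed, "credentialSubject": {"id": "did:example:alice", "degree": "PhD"}}
```

Here `verify(signed)` succeeds and `verify(tampered)` fails, so any application can check the credential without asking the app that originally stored it.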

NOSTR

NOSTR is a protocol for social media apps built on a relay-based architecture. All data can be shared and reused by the multiple apps and platforms built on top of the protocol. Every piece of data is a simple JSON payload defined by the protocol. It mainly targets social media and publishing use cases.
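To show how simple these payloads are, the sketch below builds an unsigned event and derives its id the way NIP-01 describes: a SHA-256 hash over a compact JSON array of the event fields. The public key is a placeholder and signing is omitted, since it requires Schnorr signatures over secp256k1, which Python's standard library does not provide.

```python
import hashlib
import json

def event_id(pubkey, created_at, kind, tags, content):
    """Compute an event id: SHA-256 of the compact serialized field array."""
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

pubkey = "ab" * 32  # placeholder 32-byte hex key, not a real identity
event = {
    "pubkey": pubkey,
    "created_at": 1696809600,
    "kind": 1,  # kind 1 is a short text note
    "tags": [],
    "content": "Data belongs to users.",
}
event["id"] = event_id(**event)
```

Because the id depends only on the event's own fields, any client or relay can recompute it, which is what lets many independent apps share the same data.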

DWN

DWN (Decentralized Web Node) offers a more flexible architecture that separates structured data, data access, and data management, with a permission layer controlled by the user. Access and data management are based on an open protocol and an open specification. Users also get a tool for designing their own data- and permission-driven protocols for the good of all. It is the more generic architecture of the two.

Data-Oriented Programming

Data-oriented programming separates data from behavior. This contrasts with object-oriented programming, where data and behavior are often mixed, leading to complications such as the object-relational impedance mismatch. Keeping data separate makes it easier to query, aggregate, and manipulate.
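A small sketch of the idea: records are plain dictionaries and behavior lives in generic functions that work on any list of records. The `where` and `pluck` helpers are hypothetical names chosen for this example.

```python
# Data: plain dictionaries with no methods attached.
books = [
    {"title": "Dune", "author": "Frank Herbert", "year": 1965},
    {"title": "Neuromancer", "author": "William Gibson", "year": 1984},
    {"title": "Hyperion", "author": "Dan Simmons", "year": 1989},
]

# Behavior: generic functions that know nothing about books specifically.
def where(records, field, predicate):
    """Keep the records whose field value satisfies the predicate."""
    return [r for r in records if predicate(r[field])]

def pluck(records, field):
    """Extract one field from every record."""
    return [r[field] for r in records]

titles_from_80s = pluck(where(books, "year", lambda y: 1980 <= y < 1990), "title")
```

The same two functions would query users, orders, or sensor readings unchanged, which is the practical payoff of not binding behavior to a class.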

Key Concepts in Data-Oriented Programming

Data Layout

One of the primary concerns in DOP is how data is laid out in memory. The idea is to organize data in a way that is cache-friendly, thereby reducing cache misses and improving performance.

Data Transformation

DOP focuses on transforming data from one representation to another. This is often done in bulk operations that can be easily parallelized.
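As a hedged illustration, the sketch below applies one pure function to a whole batch, first sequentially and then through a worker pool. The function and data are made up, and in CPython a thread pool will not speed up CPU-bound work like this; the example only shows that a transformation without shared state can be handed to parallel workers unchanged.

```python
from concurrent.futures import ThreadPoolExecutor

def to_fahrenheit(celsius):
    """Pure per-record transformation with no shared state."""
    return celsius * 9 / 5 + 32

readings_c = [0.0, 37.0, 100.0]

# Sequential bulk transformation from one representation to another.
readings_f = [to_fahrenheit(c) for c in readings_c]

# The same transformation spread across a pool of workers; map keeps input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    readings_f_parallel = list(pool.map(to_fahrenheit, readings_c))
```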

Separation of Data and Code

Unlike OOP, where data and methods are bundled together in classes, DOP keeps them separate. This makes it easier to optimize data storage and access patterns.

Avoiding Indirection

DOP tries to minimize pointer chasing and indirection, which can be detrimental to performance. This is achieved by using contiguous memory structures like arrays.
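The difference between the two layouts can be sketched even in Python, with the caveat that the performance benefits are far more visible in languages with direct memory control. Here a list of dictionaries (each a separate heap object reached through a pointer) is converted into one contiguous `array` of doubles per field; the particle data is invented for the example.

```python
from array import array

# Array of structures: a list of dicts, each a separate heap object.
particles_aos = [{"x": float(i), "vx": 1.0} for i in range(4)]

# Structure of arrays: one contiguous buffer of doubles per field.
xs = array("d", (p["x"] for p in particles_aos))
vxs = array("d", (p["vx"] for p in particles_aos))

def step(xs, vxs, dt):
    """Advance every position in one pass over the contiguous buffers."""
    for i in range(len(xs)):
        xs[i] += vxs[i] * dt

step(xs, vxs, 0.5)
```

Each field now sits in a single block of memory, so a loop that touches only positions and velocities never has to follow a reference to reach the next element.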

Immutable Data

DOP often favors immutability, as it makes it easier to reason about the data and enables certain optimizations.
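One way to get immutable records in Python is a frozen dataclass, sketched below with a hypothetical `Profile` type. An "update" produces a new record through `dataclasses.replace`, and a direct assignment is rejected.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Profile:
    name: str
    city: str

original = Profile(name="Alice", city="Kyiv")

# "Changing" a value produces a new record; the original stays intact.
moved = replace(original, city="Lviv")

# Direct mutation of a frozen instance raises an error.
try:
    original.city = "Lviv"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Since `original` can never change, it can be shared freely between functions or threads without defensive copying.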

Conclusion

The data-centric approach offers a more equitable and efficient way to handle data in the digital age. It puts the user at the center, giving them control over their data, and promotes interoperability and reusability. As we move forward, adopting a data-centric mindset could very well be the future of application development.
