
Dependent Types: A New Paradigm for Ontologies and Knowledge Graphs


Knowledge graphs today often rely on ontologies (e.g. OWL) to define classes and relationships, and use constraint languages like SHACL to validate data shapes. This two-layer approach works, but it has well-known limitations in expressiveness and integration. Dependent Type Theory (DTT) offers an alternative: a unified type system rich enough to act as ontology, data schema, and logic simultaneously. Think of DTT as a programming language type system on steroids — one that can encode complex rules (like “birthdate must come before deathdate”) directly in the types of your data. For AI and Knowledge Graph practitioners, this means potentially catching data inconsistencies at compile-time (during data entry) rather than via separate validation steps. In this article, we introduce DTT in accessible terms and explore how it can enhance or even replace traditional OWL ontologies and SHACL validations.

What is Dependent Type Theory (DTT)?

At its core, Dependent Type Theory is a form of type system where types can depend on values. In other words, a program’s type definitions can incorporate actual data or conditions. This is commonplace in proof assistant languages like Coq, Agda, Idris, or Lean. For example, in a dependently-typed language you might have a type Vector(n) that represents “an array of length n.” The number n is a value that lives inside the type — so Vector(3) and Vector(4) are different types. This allows the type system to enforce properties like array length consistency automatically.
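To see the idea in running code, here is a minimal sketch in Lean 4 (the name Vec is chosen to avoid clashing with Lean's built-in Vector type):

-- A length-indexed vector: the natural number n is part of the type.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- Vec Nat 3 and Vec Nat 4 are different types, so supplying one where
-- the other is expected is a compile-time type error.
def v3 : Vec Nat 3 := .cons 1 (.cons 2 (.cons 3 .nil))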

For knowledge representation, we can draw an analogy: imagine treating your KG entities and relations as values in a programming language, with a type system that guarantees your ontology’s rules. Dependent types are extremely expressive — they can represent subset constraints, relations, or arbitrary rules as part of a type definition. In practical terms, a dependent type can be viewed as a class or schema that carries its own logic.

To make this concrete, consider a simplified example in pseudocode (in a language with DTT features):

// Pseudocode for a Person type with a dependent constraint
type Person(name: String, birthDate: Date, deathDate: Date?)
  // If deathDate is provided, enforce birthDate < deathDate
  requires (deathDate == null) or (birthDate < deathDate)

Here, Person is a type with fields name, birthDate, and optional deathDate. The interesting part is the requires clause: it’s a logical condition that becomes part of the type. Any data declared as a Person must satisfy this birthDate < deathDate rule (if a death date exists). In DTT, such conditions are checked by the compiler or interpreter as part of type-checking — much like how violating a type in a program causes a compile error. For KG builders, this means the constraint “birthdate before deathdate” is not an external check, but a built-in property of the Person type.

OWL’s Logical Foundations and Limitations

The Web Ontology Language (OWL) is a prevalent standard for defining ontologies. Under the hood, OWL is based on Description Logics, which are themselves fragments of first-order logic (some aspects of OWL, like reasoning about classes, can be viewed as second-order logic). This design trades expressiveness for decidability: OWL reasoners can efficiently compute subclass relations, detect logical inconsistencies, etc., but only for ontologies that stay within certain logical bounds. As a result, OWL cannot capture some kinds of constraints that developers often care about in data.

For example, OWL (even in OWL 2) cannot express a constraint comparing two data values on the same individual. You can state that a Person has a birthdate and a deathdate as date properties, but you cannot formally declare in OWL that birthdate < deathdate — such a restriction lies outside OWL’s expressive range. Similarly, OWL has limited ability to declare global constraints like “every Person must have exactly 2 parents” or “every Employee’s salary must be higher than their bonus.” These kinds of rules are either impossible or very awkward to state in OWL. They often require workarounds (like using additional rules, or leaving the constraint unenforced and handling it in application code).

Why are these limitations present? In essence, OWL’s description logic framework avoids complex constraints to ensure that reasoning remains computationally feasible. It sticks to a certain template of statements (about class membership, subclass relations, property domain/range, etc.) and avoids the full power of first-order or second-order logic. If we try to force OWL to express, say, an ordering on dates or a mathematical relationship, we hit a wall. The OWL 2 standard added some features (e.g. qualified cardinality restrictions and property characteristics like reflexivity, which older OWL versions lacked) but still cannot represent arbitrary constraints between property values. For instance, OWL can state “a Bicycle has at least 2 wheels” or “at most 2 wheels,” but it was not originally possible to say “exactly 2 wheels of type Wheel” in early OWL DL without an extension. And there is no OWL axiom to declare a property surjective (i.e., that every object in the property’s range is connected to some subject) — that kind of constraint falls outside OWL’s open-world, monotonic logic.

To fill these gaps, the Semantic Web community introduced tools like SHACL (Shapes Constraint Language). SHACL is used to define shapes or structural rules for RDF data — for example, “a Person node must have a birthdate and if it has a deathdate, then birth < death.” SHACL rules are typically applied as a separate validation step on your data graph. Unlike OWL’s reasoning (which is open-world and does not assume data is complete), SHACL operates in a closed-world, strict validation mode. In practice, many projects use OWL for semantic modeling (classes, subclass hierarchies, inference) and SHACL for data integrity checks (cardinalities, value ranges, co-occurrence of properties). This separation means that “ontology” and “data validation” live in different layers of the system — you get logical consistency checks from OWL, but you need to run a SHACL engine (or similar) to enforce the more procedural business rules.

Enter Dependent Types. With DTT, we have the potential to unify what OWL and SHACL do separately. The idea is that the type system can express all those constraints directly. Instead of having an OWL ontology plus a separate SHACL schema, a dependently-typed knowledge representation encodes the ontology and integrity constraints in one coherent framework.

How DTT Enhances Knowledge Modeling

Using Dependent Type Theory for knowledge graphs means that types become “logic-rich”: your entities, relationships, and constraints all live together in a single formal system. In fact, in a DTT-based model, everything is a type or a dependent construct, including what we normally think of as data instances. A recent example of this approach is TypeDB (previously Grakn), which applies dependent type principles to its schema design. As the designers of TypeDB describe: entities can be seen as types, relationships as dependent types (types that reference other entities), and attributes as dependent types targeting literal values. In such a system, a relationship like Marriage(Person, Person) could be a dependent type that itself depends on two Person types — effectively treating the relationship as a first-class citizen with its own constraints. This is far more expressive than traditional graph modeling, where an edge is just a link with no internal structure.

Crucially, because DTT is rooted in formal logic, these rich types carry proof obligations or constraints that must be satisfied. If you define a type Person with a rule that birthdate must precede deathdate, then any instance of Person requires a proof or evidence of that rule. In practical terms, that evidence could just be the actual comparison of the two dates — the type checker will accept the instance only if it can verify the comparison holds. We effectively get the enforcement that SHACL provides, but baked into the act of creating data.

Let’s explore a few concrete scenarios where DTT provides more expressiveness or precision than OWL/SHACL:

• Value-dependent constraints (e.g. birthdate before deathdate): As discussed, OWL can’t natively enforce an order comparison between two date properties. SHACL can enforce it, but only by running a separate validator. In a DTT-based model, we simply make the date order a part of the Person type. For example, in a dependently-typed language like Idris/Agda one could write a type:

record Person where
  constructor MkPerson
  name      : String
  birthDate : Date
  deathDate : Maybe Date
  proof     : case deathDate of
                Just d  => birthDate < d
                Nothing => True

In this Idris-like pseudocode, the proof field ensures that if deathDate is provided (Just d), then birthDate < d must hold (the value proof would conceptually carry a proof of that fact, but could be left implicit if the compiler can figure it out). If someone tries to construct a Person with a death date earlier than the birth date, the type checker will refuse it — similar to a compile error. This means consistency is guaranteed at data construction time, rather than after-the-fact. The dependent type has effectively replaced a SHACL shape.

• Functional properties and injective/surjective mappings: In OWL you can declare a property to be Functional or InverseFunctional, which covers simple cases of uniqueness (each person has at most one birthdate, or each social security number identifies at most one person). But what about specifying a truly bijective or surjective relationship? For instance, suppose in a university KG we want to say each student ID is unique to a student (injective) and every student has a student ID (functional in one direction, total in the other). OWL cannot enforce “each student has exactly one ID and each ID refers to exactly one student” fully — one can assert inverse-functional (unique ID) and functional (each student at most one ID), but one cannot enforce that every student definitely has an ID (that’s a closed-world assumption). With dependent types, we can model the ID assignment as a function in our type theory and prove properties about it. For example, we might have:

axiom studentID : Student → ID   -- each Student has an ID (total function)
axiom id_injective : ∀ s1 s2, studentID s1 = studentID s2 → s1 = s2

This Lean snippet declares studentID as a total function (meaning every Student yields an ID — you’d ensure this by how you construct or define students), and an axiom stating that if two students share the same ID, they must be the same student (injectivity). In a dependently-typed setting, this axiom could instead be a theorem you prove from how you set up the types. For instance, you could make ID a dependent type containing the student as evidence, so that by construction an ID is always tied to a unique student. The key point is that properties like injectivity or surjectivity can be encoded and verified within the type system, something well beyond OWL’s native capabilities.
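As a rough, self-contained sketch of that last idea in Lean 4 (all names here are hypothetical), an ID record can carry its owner as a field, so the link from ID to student holds by construction:

-- Placeholder types for the sketch.
structure Student where
  name : String

-- An ID carries the student it identifies; an ownerless ID is unrepresentable.
structure ID where
  number : Nat
  owner  : Student

-- Totality: any student can be issued an ID.
def issueID (s : Student) (n : Nat) : ID := ⟨n, s⟩

-- "Each ID refers to exactly one student" is now just field projection,
-- and issuing preserves ownership by definition.
theorem issue_owner (s : Student) (n : Nat) : (issueID s n).owner = s := rfl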

• Consistency proofs at build time: In knowledge graph development today, consistency checking is often done by running a reasoner (to see if your OWL ontology plus data has any unsatisfiable classes or contradictions) or by running SHACL validation. These are runtime or deployment-time steps. With DTT, consistency can often be ensured by the type checker before the data is ever considered “loaded.” Since any violation of a constraint makes a would-be instance ill-typed, you simply cannot assemble an inconsistent knowledge graph in the first place — the type theory won’t allow it. In the context of a proof assistant like Coq or Agda, one would construct a witness of the entire knowledge graph as a term of some complex type that represents all the ontology constraints. If the construction succeeds, you essentially have a proof that the KG satisfies all stated constraints. In fact, researchers have demonstrated building and querying a knowledge graph entirely within Coq’s dependent type system, meaning the KG comes with a machine-checkable proof of consistency by construction.

• Types as schemas (shapes and cardinalities): Dependent types blur the line between schema and data. In a DTT approach, you don’t write a SHACL shape that says “Person must have properties X, Y, Z”; instead, your Person type is defined to have exactly those properties. Cardinalities that OWL or SHACL would express (like 1 or 2 values for a property) become just the number of fields or the structure of your types. Need to model a “family” entity that has exactly two parents and any number of children? You can define a type Family(parent1: Person, parent2: Person, children: List Person) — now by definition a Family has two and only two parents. Or if an academic paper should have at least 1 author, you might define it as Paper(authors: NonEmptyList Person, …), using a non-empty list type. These tricks are common in programming with advanced type systems, but the same idea applies to knowledge modeling: the shape constraints are enforced by the type constructors. There is no separate validation phase needed to check “did someone provide 0 parents by mistake?” because such a state is literally unrepresentable in the type. This approach can simplify the technology stack: instead of maintaining parallel ontology files and shape files, you have one dependently-typed specification that serves both purposes.
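Here is a minimal Lean 4 sketch of this pattern, using a placeholder Person type:

structure Person where
  name : String

-- Exactly two parents, by the shape of the type; no validator needed.
structure Family where
  parent1  : Person
  parent2  : Person
  children : List Person

-- "At least one author" as a non-empty list: one head plus a tail.
structure NonEmptyList (α : Type) where
  head : α
  tail : List α

structure Paper where
  title   : String
  authors : NonEmptyList Person  -- a zero-author Paper cannot be built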

All these examples illustrate the central theme: DTT lets us mix data and logic seamlessly. We no longer have “dumb data” that needs external rules to be checked; rather, the data carries its own rules with it through the types. A dependent type system can be seen as a highly expressive ontology language where the “ontology” isn’t just a passive schema but an active part of the data model that prevents errors.

Unified Framework: Entities, Relations, and Constraints as Types

One of the most powerful aspects of the dependently-typed approach is that it unifies what were separate layers in the Semantic Web stack. In DTT, we don’t have to split into “ontology vs data” or “TBox vs ABox” — the types cover both. As an example, consider how a dependently-typed knowledge base might represent a relationship like a marriage between two people. In OWL, you might have an object property spouse with some axioms, and you might use SWRL or SHACL to enforce that spouse is reciprocal and one-per-person, etc. In a DTT system, you could instead introduce a type (or a parameterized type) Marriage(p1: Person, p2: Person) that directly encodes that p1 and p2 are each other’s spouses (and maybe a proof that p1 ≠ p2 to forbid self-marriage, etc.). Now a particular marriage instance would be a value of type Marriage(alice, bob). That value exists only if alice and bob are Persons and if all marriage constraints are satisfied. We’ve thus captured the relationship and its constraints in one construct.
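A hedged Lean 4 sketch of this construct (with a placeholder Person type; by decide discharges the distinctness proof for these concrete values):

structure Person where
  name : String
deriving DecidableEq

-- The relationship is a type of its own, carrying a proof obligation
-- that forbids self-marriage.
structure Marriage where
  p1 : Person
  p2 : Person
  distinct : p1 ≠ p2

def alice : Person := ⟨"Alice"⟩
def bob   : Person := ⟨"Bob"⟩

-- This value exists only because the distinctness proof can be supplied.
def m : Marriage := ⟨alice, bob, by decide⟩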

In such a framework, entities might correspond to types, and relations to dependent types or functions that link those entities. Attributes (literal values) can be seen as dependent types as well (a type of “Person’s age” that is always consistent with Person’s birthdate, for instance). All parts of the knowledge graph are governed by the same type-theoretic rules. Not only does this provide expressiveness, it also yields a kind of built-in documentation: the type signatures tell you exactly what is allowed or not allowed in the graph.

Because DTT is also a logic (specifically, a form of constructive, higher-order logic), we can even do reasoning in this unified framework. The types themselves can be seen as logical propositions about the data they describe. A well-typed knowledge base is akin to a proven theorem stating “there exists a model satisfying all these constraints.” Moreover, queries or rules can be formulated in the same language. In fact, in the TypeDB approach the query language and the schema language are closely connected, to the point where queries can be treated as types that must satisfy certain constraints. This means when you query the data, the query can be checked against the schema’s rules (or even inferred from them) in a very precise way, all within one system. Contrast this with the traditional stack: you might have SPARQL for queries, which is separate from OWL for ontology, separate from SHACL for validation — integration between those layers is limited and often informal. DTT promises a single coherent framework.
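To make "queries as types" concrete, here is a small self-contained sketch in Lean 4 (the names are made up for illustration): the query is a subtype, and an answer is an inhabitant of it.

structure Person where
  name      : String
  birthYear : Nat

def alice : Person := ⟨"Alice", 1980⟩

-- The "query" is a type: persons born before 2000, bundled with
-- evidence of that fact.
def BornBefore2000 : Type := { p : Person // p.birthYear < 2000 }

-- An "answer" is a value of the query type; the evidence is checked
-- at compile time against the rule in the type.
def hit : BornBefore2000 := ⟨alice, by decide⟩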

Using Proof Assistants and Dependently-Typed Languages

How can practitioners experiment with these ideas today? There are already several mature dependently-typed languages and proof assistants that one can use for knowledge modeling:

• Coq / Agda / Lean (Proof Assistants): These are interactive theorem proving environments based on dependent type theory. They are not designed as databases, but one can encode ontologies as collections of inductive types and propositions. For instance, Coq was used to build a prototype “dependently typed knowledge graph” that recreated RDF and SPARQL functionalities within Coq’s type theory. In Coq or Lean, you might define types for your core entities and relations, and then state lemmas (which the system will ask you to prove) corresponding to consistency or inference rules. The upside is rock-solid guarantees — if something is proven in Coq/Lean, it’s mathematically certain. The downside is that these tools have a steep learning curve and the “knowledge base” lives inside the proof assistant rather than a typical database. However, they’re great for experimentation and for modeling a domain with absolute rigor. Lean, in particular, has a growing community and a lot of tutorials (Lean is based on a flavor of DTT and is used to formalize mathematics, but the concepts transfer to ontology modeling as well).

• Idris / F★ / Haskell with GADTs (Programming Languages): Idris is a general-purpose programming language with full dependent types, which means you can write programs that manipulate dependently-typed data directly. You could, for example, implement a small knowledge base in Idris where each insertion is a function that returns a new knowledge state if and only if the insertion maintains consistency (otherwise the function won’t type-check). F★ is another interesting language (from Microsoft Research) which is aimed at program verification but supports dependent types; it could potentially encode complex relationships and then compile down to a runnable form. Haskell is not dependently typed in the strict sense, but with extensions like GADTs and type families, Haskell can achieve some dependent-type-like modeling. These programming languages might offer a more practical route to integrating with existing software systems, at the cost of some expressiveness or convenience compared to proof assistants.
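Here is a hedged sketch of that insertion idea, written in Lean 4 to stay consistent with the other snippets in this article (an Idris version would have the same shape); the invariant and all names are invented for illustration:

structure Person where
  name      : String
  birthYear : Nat

-- A made-up global invariant, just for illustration.
def Valid (p : Person) : Prop := p.birthYear ≥ 1000

-- Defined by recursion so that consing a new person reduces to an
-- ordinary conjunction.
def AllValid : List Person → Prop
  | []      => True
  | p :: ps => Valid p ∧ AllValid ps

-- The knowledge state bundles its records with a proof of the invariant.
structure KB where
  persons  : List Person
  allValid : AllValid persons

-- Insertion type-checks only when a proof for the newcomer is supplied;
-- the invariant for the extended list is rebuilt as a pair of proofs.
def KB.insert (kb : KB) (p : Person) (h : Valid p) : KB :=
  { persons  := p :: kb.persons,
    allValid := ⟨h, kb.allValid⟩ }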

• Specialized Knowledge Base Systems: As noted, TypeDB is an example of a database that incorporates ideas from dependent types into its schema. It provides a query interface and data storage, so it’s closer to what KG practitioners are used to, but under the hood it treats the schema in a type-theoretic way. The creators explain that dependent type theory gave them a “mathematically sound foundation for knowledge representation,” enabling complex relationship modeling and even hyper-relational (hypergraph) data structures. While you might not directly interact with lambda terms or proofs when using such a system, it’s instructive to know that these principles are influencing new KG technology. It means the ideas of DTT are not just academic; they’re making their way into real tools.

To illustrate what modeling with a dependent type theory looks like, here’s a tiny example in Lean 4 style (Lean 4 is a newer version of Lean that feels a bit more like a programming language while retaining theorem-proving power):

-- Define a basic type of Person with a dependent constraint on ages
structure Person where
  name      : String
  birthYear : Nat
  deathYear : Option Nat
  -- Dependent field: a proof that if deathYear exists, it's > birthYear
  validYears : match deathYear with
    | some d => birthYear < d
    | none   => True

-- Example usage: constructing a Person instance
def alice : Person :=
  { name := "Alice",
    birthYear := 1980,
    deathYear := some 2070,
    validYears := by decide }  -- discharges the (trivial) proof that 1980 < 2070

In this snippet, if we tried to create a person with deathYear := some 1970 and birthYear := 1980, Lean would refuse to compile it, saying we failed to provide a valid proof for validYears. By design, we cannot represent a Person with an impossible date sequence. The type system caught the error statically. Compare that to RDF/OWL: you could describe Alice with birth and death years in RDF, and nothing in OWL would automatically flag the inconsistency until you perhaps run a custom rule or query. In Lean (or other DTT-based environment), the inconsistency is a type error.
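For illustration, the rejected construction would look like this (kept as comments, since by design it cannot compile):

-- def bogus : Person :=
--   { name := "Bogus",
--     birthYear := 1980,
--     deathYear := some 1970,
--     validYears := by decide }  -- fails: the goal reduces to 1980 < 1970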

Moreover, once you have data in Lean/Coq/Agda, you can ask the system to prove things about it. For example, you could prove a lemma that “all Persons in our knowledge base died after they were born” and the proof would be automatically solved because it’s guaranteed by the type of Person itself. This style of formal knowledge representation opens the door to machine-checked proofs of data quality and logical entailments, going beyond what OWL reasoners typically do (OWL reasoners focus on class subsumption, instance classification, etc., but they don’t produce proofs that you can inspect step-by-step — proof assistants do).
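As a hedged sketch over the Person type above, such a lemma can be stated for any person with a recorded death year; the proof merely retrieves the stored validYears field:

-- The ordering fact is already carried by the data: retrieve the
-- validYears proof and specialize it to the known death year.
theorem died_after_born (p : Person) (d : Nat)
    (h : p.deathYear = some d) : p.birthYear < d := by
  have hv := p.validYears
  rw [h] at hv
  exact hv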

Hypergraphs and Higher-Order Relationships

It’s worth noting that dependent types naturally accommodate hypergraphs, i.e., relationships involving more than two entities. Standard RDF-based knowledge graphs are limited to binary relations (triples subject-predicate-object). If you have a relationship among three or four things (say an event linking a person, a place, a time, and a reason), you normally have to reify it or break it into smaller parts. In a dependently-typed framework, you can directly model an n-ary relation as a type or as an n-ary function/predicate. For instance, you could define a type Event(person: Person, location: Place, time: Time, reason: Reason) as a single construct. This is like having a hyperedge connecting all involved nodes. The TypeDB system highlights this capability: “unlike simple graphs where edges connect exactly two nodes, hypergraphs allow edges to connect multiple nodes simultaneously”, and by using dependent types, one can model relations with complex dependencies on multiple entities. The ability to natively handle hyper-relations is increasingly important for representing complex knowledge (such as biochemical pathways, multi-party transactions, etc.), and DTT provides a clean way to do it. Each hyper-relation can carry its own constraints too (e.g., a single Event type could enforce that the person is actually present at the location at the given time through some scheduling sub-logic, if one wanted to go that far).
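A minimal Lean 4 sketch of such a hyperedge, with placeholder node types:

structure Person where name : String
structure Place  where name : String
structure Time   where stamp : Nat
structure Reason where text : String

-- One n-ary relation as a single dependent construct: a hyperedge
-- connecting four nodes at once. Extra proof fields could encode
-- further constraints (e.g. scheduling rules) if desired.
structure Event where
  person   : Person
  location : Place
  time     : Time
  reason   : Reason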

In categorical terms, dependent type theory’s semantics are tied to higher-order structures (often discussed in category theory as locally cartesian closed categories), which means it’s comfortable talking about “families of things indexed by other things” — essentially what a hyperedge is (a family of connections indexed by multiple node parameters). The takeaway for practitioners is: you don’t have to contort your model to fit into binary relations; a richer type system can directly express the multi-entity relations.

Conclusion

Dependent Type Theory brings a new level of rigor and expressiveness to knowledge representation. By making types capable of expressing logic, we get a system where the ontology, data, and constraints are all encoded in one coherent framework. This results in a kind of “live ontology” — one that not only describes the world but actively enforces the rules of that description. For AI and KG developers, this could mean fewer invalid data entries, more powerful automated reasoning, and ultimately more trustworthy knowledge bases.

Of course, adopting DTT for KGs comes with challenges. The theory and tools are more complex than traditional schema languages. There is a learning curve to understanding proofs and dependent types. However, as tools improve and hide some of the complexity (as we see with newer systems like TypeDB, which embeds the ideas under a more user-friendly surface), the benefits become hard to ignore. We are essentially moving the validation and reasoning that we used to do at runtime (or as a separate step) into the design of the data structures themselves. It’s analogous to the shift from untyped scripting to strongly-typed programming in software engineering — many initially resisted it, but ultimately the benefits of catching errors early and having self-documenting interfaces won out.

In the coming years, we can expect more research and development at the intersection of type theory and knowledge graphs. There’s already work on using proof assistants to curate knowledge and on compiling dependently-typed specifications into efficient runtime systems. As knowledge-intensive AI systems (like those for scientific research, legal reasoning, or complex planning) demand ever more guarantees of consistency and correctness, a DTT-based approach looks increasingly attractive. It offers a firm foundation where every piece of data is bound by a logical contract (its type). By embracing dependent types, we might well be witnessing the emergence of next-generation knowledge graphs — ones that are not just smart in the connections they represent, but also formally intelligent in how they maintain integrity and infer new knowledge.

Ready to experiment? You could start by modeling a toy ontology in a language like Agda or Lean, or explore the literature on dependently typed knowledge graphs. Even if DTT doesn’t replace OWL/SHACL overnight, it can greatly inspire new ways of thinking about data models. For AI practitioners, learning about dependent types can spark ideas for designing systems that are correct by construction. The fusion of programming language theory and knowledge graph engineering has the potential to reduce the impedance mismatch between our data and our rules, yielding systems that are both highly expressive and robustly reliable. It’s an exciting time to reimagine the foundations of semantic AI, and dependent type theory just might be the key to unlocking the next leap forward.
