Personal Knowledge Graphs. Semantic Entity Persistence in DataLog. Deductive databases

7 min readMay 11, 2024

How logical and declarative programming could answer a complex question of personal Graphs

Personal graphs are emerging as we shift heavy enterprise technologies to more portable and smaller user spaces. We need embeddable and portable data storage that can operate on user devices like cell phones.

It is the main reason I started researching graph modeling in relational storage and using Sqlite or Duckdb for graph-like storage. Also, after over ten months in the domain, I noticed that applications that use personal Knowledge Graphs with AI often need a hybrid database with a vector search and semi-relational data for entities that support a graph structure.

My last article tackled the most challenging part of modeling entities and things in a relational world.

Personal Knowledge Graphs. Semantic Entity Persistence in Relational Model

In my last two articles, we modeled different kinds of graphs in a portable relational model.

blog.stackademic.com

As a compromise model, we take a mode of document-like storage with a JSON as far as representing entities in an individual table was overwhelming.

Common Logic and power of relations

We may have overlooked common logic and deductive databases in general. Many were popular in the pre-relational era and offered quite exciting properties.

I found this article quite inspiring and strongly recommend reading it. It focuses more on a pure graph challenge and the perfect type and language for a graph.

The "missing" graph datatype was invented in the '70s

This post is a response to/inspired by The Hunt for the Missing Data Type (HN) by Hillel Wayne. I suggest reading his…

tylerhou.com

I also enjoyed a post that inspired the author to write a post about datalog itself in more graph challenge

The Hunt for the Missing Data Type

A (directed) graph is a set of nodes, connected by arrows (edges). The nodes and edges may contain data. Here are some…

www.hillelwayne.com

In short, graphs are complex and diverse, and algorithms and representations depend on a domain. What makes common logic and logical programming a valuable tool for this problem is descriptive and declarative and whats, more importantly, recursive language that focuses more on the question of how entities are commented on and relate to each other

DataLog as a GPT like program

Datalog, a declarative subset of Prolog, is a flexible and powerful declarative query language proficient at dealing with complex and multi-hop graph relationships. Graph query languages not based on Datalog lack the clarity, simplicity, and logical framework that Datalog provides.

Datalog - Wikipedia

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally…

en.wikipedia.org

Let's look at the most famous example of graph queries in Datalog.

edge(1, 2).
edge(1,4)
edge(2, 3).
edge(3, 4).
path(a, b) :- edge(a, b).
path(a, b) :- path(a, c), edge(c, b).

As you can see, the datalog program consists of terms. The term could be a fact or rule, and every term consists of a set of predicates.

edge(1,3).
edge(2,3) :- true.

A fact is always true, so we usually see a short form of it. A Datalog program begins with an extensional database (EDB) containing ground facts. A fact database is quite close to relational databases, and it already allows modeling data structures as predicates. In our case, the edge is a predicate. Every predicate has arity. Edge has an arity of 2, so we could annotate this fact as an edge/2.

DataLog and Prolog do not separate programs from databases, so your program is a set of facts and rules, and DataLog or Prolog has a shell mode at the top level that allows you to ask questions and run queries.

How do queries look? Datalog has an idea of uniformity of horn clauses. This means every term in the data log will be uniform with a database.

We could ask all edges that have a starting node 1

?- edge(1, _)

As you can see, we use the same predicate for a query as we used for a fact definition, and the data log will give us all facts that are unified or pattern-matched with it

edge(1,2)
edge(1,4)

We have a universal predicate that could be used to define and verify data naturally.

A true superpower of datalog came with rules. Evaluation of a Datalog program creates an intensional database (IDB) containing relations with inferred facts. Datalog rules can be recursive, i.e., a rule can recursively depend on itself or another rule that depends on it. Modern data log implementations often separate a rule part to a reusable datalog program that consumes external facts and EDB and makes a program reusable. Datalog rules can be recursive, i.e., a rule can recursively depend on itself or another rule that depends on it. It makes it perfect for semantic graph reasoning.

path(a, b) :- edge(a, b).
path(a, b) :- path(a, c), edge(c, b).

The path is a recursive rule that consists of two parts. The first part covers direct connectivity.

path(a, b) :- edge(a, b).

The second one connected with the OR clause

path(a, b) :- path(a, c), edge(c, b).

The second part is a recursive query. We could read it as the path is edge or a recursive relation or edge and path.

You could run a similar example

egglog demo

Edit description

egraphs-good.github.io

Quick summary: The Datalog program consists of terms that could be rules or facts. In many implementations, datalog programs contain a root term () that defines a query to answer.

Implementations

it is a myriad of datalog engines. Datalog has many forms of synthesis. most famous is a Datomic in Clojure ecosystem

Datomic - Overview

Build flexible, distributed systems that can leverage the entire history of your critical data, not just the most…

www.datomic.com

More portable and close to a user and UI applications could be a datascript

GitHub - tonsky/datascript: Immutable database and Datalog query engine for Clojure, ClojureScript…

Immutable database and Datalog query engine for Clojure, ClojureScript and JS - tonsky/datascript

github.com

The data script is in memory, unfortunately.

More database-like systems that are embeddable and offer a full-scale database could be a cozoDB.

CozoDB: embedded Datalog, performant graphs

A general-purpose, transactional, relational database that uses Datalog and focuses on graph data and algorithms, with…

www.cozodb.org

I want to focus on a quite popular and scalable pure datalog system called Scouffle.

Welcome

Soufflé is a logic programming language inspired by Datalog. It overcomes some of the limitations in classical Datalog…

souffle-lang.github.io

Scouffle

Scouffle is a modern and super fast pure datalog engine that is fast and scalable. It also has unique features like strict types and rich types systems that could be useful for ontologies.

Tutorial

Datalog is a (declarative) logic-based query language, allowing the user to perform recursive queries. It adopts syntax…

souffle-lang.github.io

Our graph example on Scouffle

.decl edge(n: symbol, m: symbol)
edge("a", "b"). /* facts of edge */
edge("b", "c").
edge("c", "b").
edge("c", "d").
.decl reachable (n: symbol, m: symbol)
.output reachable // output relation reachable
reachable(x, y):- edge(x, y). // base rule
reachable(x, z):- edge(x, y), reachable(y, z). // inductive rule

Entities modeling

I love datalogs because of their recursive and declarative nature and because you will find graph examples in every Datalog article. A powerful and overlooked feature is ontologies and entity relations. Let's pick an example from my previous article and try to model it in souffle.

Heterogeneous graphs contain different types of nodes and relations, which creates another challenge: describing constraints on relations and nodes.

So we have Entities
- Person
— name
— dob
- Skill
— name
— level
- location
— NAME
— lat
— long

Allowed relations form a graph itself but we will skip the representational challenges of heterogeneous graphs in this article

.decl person(name: symbol, dob: symbol)
.decl skill(name: symbol, level: number)
.decl place(name: symbol, lat: number, long: number)

Now, let's define the facts.

person("volodia", 231284).
place("berlin",52.5,13.24).
skill("datalog", 5)

Now it is time to model the edges

.decl love(person: symbol, object:symbol)
.decl has(subject:symbol, object:symbol)
.decl friend(person:symbol, person:symbol)
.decl elove(person: symbol, object:symbol)
.decl ehas(subject:symbol, object:symbol)
.decl efriend(person:symbol, person:symbol)

Now it is time for a magic

elove(p,o) :- person(p,_,_), person(o,_,_),love(p,o).
elove(p,o) :- person(p,_,_), skill(o,_,_), love(p,o).
elove(p,o) :- person(p,_,_), place(o,_,_), love(p,o).

So, we can define a rule as a polymorphic rules

efriend(x,y) :- person(x,_,_),person(y,_,_),friend(x,y)
elivein(x,y) :- person(x,_,_),place(y,_,_),livein(x,y)

same for has relations

ehas(x,y) :- person(x,_,_), skill(y,_,_), has(x,y)
ehas(x,y) :- place(x,_,_), place(y,_,_), has(x,y)
ehas(x,y) :- skill(x,_,_), skill(y,_,_), has(x,y)

Now, we are ready to express relations

love("Volodia", "datalog")
livein("Volodia", "Berlin")

So, we encode ontology in relations. How can we generalize entities as a knowledge graph?

Now, it is time for the abstract node.

.decl node {n: symbol}
.decl gnode {n: symbol}
gnode(n):- node(n).
gnode(n):- person(n,_,_).
gnode(n):- place(n,_,_).
gnode(n):- skill(n,_,_).

.decl edge(from: symbol, to:symbol, relation:sumbol)
.decl gedge(from: symbol, to:symbol, relation:sumbol)
.decl gnode(label:symbol)

You could define facts for abstract edges or nodes or do something in a more polymorphic way.

gedge(f,t,r) :- edge(f,t,r), gnode(f), gnode(t).
gedge(f,t,"friend") :- efriend(f,t).
gedge(f,t,"has") :- ehas(f,t).
gedge(f,t,"love") :- elove(f,t).
gedge(f,t,"livein") :- elivein(f,t).

Now it is time to ask all relations for Volodia

gedge('Volodia', t, r)

Conclusion

As you can see, with a datalog, we could encode not only data but ontologies and relations between data. With rules, we get deductive abstractions for multiple entities as graph items. ontologies in a datalog deserve a separate article itself