Physics Derivation Graph: Comparison of Design Options Documentation


Recommendation: Read the user documentation and FAQ first. This page assumes familiarity with the jargon used in the Physics Derivation Graph.

This page compares databases that could be used for the Physics Derivation Graph (PDG).

Historical design evolution

The Physics Derivation Graph has progressed through multiple architectures, with changes to the data structure reflecting the developer's growing knowledge.

plain text: databases for comments, connections, equations, and operators, with one line per entry in each database. A Perl script converted database content to images.
XML:
CSV:
file per expression:
property graph: a very limited exploration, written in Cypher for Neo4j (Gremlin/TinkerPop could also be used). No significant code base. Schema:

[Figure: Schema for property graph representation.]
SQLite: a very limited exploration. No significant code base. Schema:
- Table: derivations; columns: derivation_ID, name, notes, creation date, author
- Table: expressions; columns: expr_global_ID, latex, creation date, author, AST_as_string, note, name
- Table: inference_rules; columns: name, creation date, author, latex, number of inputs, number of feeds, number of outputs
- Table: symbols; columns: symbol_id, creation date, author, latex, scope, value, references
- Table: operators; columns: latex, creation date, author, scope, macro, references
- Table: step; columns: step_id, creation date, author, inference_rule, derivation_ID, linear_index
- Table: step_inputs; columns: step_id, expr_local_ID, index
- Table: step_feeds; columns: step_id, expr_local_ID, index
- Table: step_outputs; columns: step_id, expr_local_ID, index
- Table: local_to_global; columns: expr_local_ID, expr_global_ID
TODO: is this schema in 3NF?
web interface: the current implementation, using Python, Flask, and Docker. Data is stored as JSON. There is limited support for checking inference rules using SymPy. The storage format evolved as follows:
1. nested Python dictionaries and lists stored as a Python Pickle
2. nested Python dictionaries and lists stored as a JSON file. With this approach the schema can be validated.
3. nested Python dictionaries and lists serialized as JSON and stored in Redis. This retains the schema validation of JSON while preventing concurrent writes to a file; see https://redis.io/topics/transactions
4. nested Python dictionaries and lists serialized as JSON and stored in SQLite3. This is part of the migration towards a table-based implementation. SQLite3 is preferable to Redis because Redis requires a running Redis server, whereas an SQLite3 database is just a file.
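A minimal sketch, using Python's standard-library sqlite3 module, of how the table-based schema listed under the SQLite exploration could be created and queried. This is not the project's code: column names containing spaces are rewritten in snake_case, and the column types, primary keys, REFERENCES clauses, and example IDs are all assumptions. The query shows how a step's local expression IDs would be resolved to global expressions through local_to_global.

```python
import sqlite3

# Hypothetical DDL for the schema listed above. Column names with spaces are written
# in snake_case; "references" and "index" are SQL keywords, so they are quoted.
SCHEMA = """
CREATE TABLE derivations (derivation_ID TEXT PRIMARY KEY, name TEXT, notes TEXT,
                          creation_date TEXT, author TEXT);
CREATE TABLE expressions (expr_global_ID TEXT PRIMARY KEY, latex TEXT, creation_date TEXT,
                          author TEXT, AST_as_string TEXT, note TEXT, name TEXT);
CREATE TABLE inference_rules (name TEXT PRIMARY KEY, creation_date TEXT, author TEXT, latex TEXT,
                              number_of_inputs INTEGER, number_of_feeds INTEGER,
                              number_of_outputs INTEGER);
CREATE TABLE symbols (symbol_id TEXT PRIMARY KEY, creation_date TEXT, author TEXT, latex TEXT,
                      scope TEXT, value TEXT, "references" TEXT);
CREATE TABLE operators (latex TEXT PRIMARY KEY, creation_date TEXT, author TEXT, scope TEXT,
                        macro TEXT, "references" TEXT);
CREATE TABLE step (step_id TEXT PRIMARY KEY, creation_date TEXT, author TEXT,
                   inference_rule TEXT REFERENCES inference_rules(name),
                   derivation_ID TEXT REFERENCES derivations(derivation_ID),
                   linear_index INTEGER);
CREATE TABLE step_inputs (step_id TEXT REFERENCES step(step_id), expr_local_ID TEXT, "index" INTEGER);
CREATE TABLE step_feeds (step_id TEXT REFERENCES step(step_id), expr_local_ID TEXT, "index" INTEGER);
CREATE TABLE step_outputs (step_id TEXT REFERENCES step(step_id), expr_local_ID TEXT, "index" INTEGER);
CREATE TABLE local_to_global (expr_local_ID TEXT PRIMARY KEY,
                              expr_global_ID TEXT REFERENCES expressions(expr_global_ID));
"""


def step_input_latex(conn, step_id):
    """Return the LaTeX of a step's inputs, resolving local IDs to global expressions."""
    rows = conn.execute(
        "SELECT e.latex FROM step_inputs AS si "
        "JOIN local_to_global AS lg ON si.expr_local_ID = lg.expr_local_ID "
        "JOIN expressions AS e ON lg.expr_global_ID = e.expr_global_ID "
        'WHERE si.step_id = ? ORDER BY si."index"',
        (step_id,),
    )
    return [latex for (latex,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up IDs for one step "multiply both sides by" whose input is y=mx+b
conn.execute("INSERT INTO inference_rules (name, number_of_inputs, number_of_feeds, number_of_outputs) "
             "VALUES ('multiply both sides by', 1, 1, 1)")
conn.execute("INSERT INTO expressions (expr_global_ID, latex) VALUES ('9000', 'y=mx+b')")
conn.execute("INSERT INTO local_to_global VALUES ('100', '9000')")
conn.execute("INSERT INTO step (step_id, inference_rule, linear_index) "
             "VALUES ('1', 'multiply both sides by', 1)")
conn.execute("INSERT INTO step_inputs VALUES ('1', '100', 0)")
print(step_input_latex(conn, "1"))  # ['y=mx+b']
```

SQLite does not enforce REFERENCES clauses unless `PRAGMA foreign_keys = ON` is set on the connection, so the sketch relies on the application inserting consistent rows.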

Each of these has required a rewrite of the code from scratch, as well as transfer code to move the data from version n to version n+1. The author didn't know about property graphs when implementing v1, v2, and v3.
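Transfer code can be short when two formats hold the same nested structure. A sketch, assuming hypothetical file names and a made-up dictionary (not the real PDG schema), of moving from storage format 1 (Pickle) to format 2 (JSON) and reading the result back for comparison:

```python
import json
import os
import pickle
import tempfile


def migrate_pickle_to_json(pickle_path, json_path):
    """Hypothetical transfer code: read format 1 (Pickle), write format 2 (JSON)."""
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)
    # Read the JSON back so the caller can compare it with the original data
    with open(json_path, encoding="utf-8") as f:
        return data, json.load(f)


# Example with a made-up nested structure
original = {"derivations": {"1234": {"name": "example", "steps": [1, 2]}}}
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "pdg.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(original, f)
    before, after = migrate_pickle_to_json(pkl_path, os.path.join(tmp, "pdg.json"))
print(before == after)  # True
```

The round-trip comparison matters because JSON is not a drop-in replacement for Pickle: tuples come back as lists and integer dictionary keys come back as strings, so data that used either would not compare equal after migration.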

Within a given implementation, there are design decisions with trade-offs to evaluate. Knowing all the options and their consequences is not feasible until one or more have been implemented; only then can the inefficiencies be observed. Knowledge gained through evolutionary iteration is expensive and takes a lot of time.

A few storage methods were considered and then rejected without a full implementation.

Other approaches

NetworkX


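import networkx as nx
import matplotlib.pyplot as plt

# Directed graph: each edge points from one node ID to the next in the derivation
G = nx.DiGraph()
G.add_edge(8332941, 8482459)
G.add_edge(8482459, 6822583)
G.add_edge(5749291, 6822583)
G.add_edge(6822583, 8341200)
G.add_edge(8341200, 9483715)
G.add_edge(8837284, 9483715)
G.add_edge(9483715, 9380032)
G.add_edge(9380032, 8345721)

# Draw the graph with the node IDs as labels, then display the figure
nx.draw(G, with_labels=True)
plt.show()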
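NetworkX is a third-party package. For simply ordering the nodes, Python's standard-library graphlib (Python 3.9 and later) is enough. A sketch using the same edge list as the NetworkX example, assuming the graph is acyclic (TopologicalSorter raises graphlib.CycleError otherwise); the helper name reading_order is hypothetical:

```python
from graphlib import TopologicalSorter

# The same (source, target) edges as in the NetworkX example
EDGES = [
    (8332941, 8482459), (8482459, 6822583), (5749291, 6822583), (6822583, 8341200),
    (8341200, 9483715), (8837284, 9483715), (9483715, 9380032), (9380032, 8345721),
]


def reading_order(edges):
    """Return node IDs ordered so that every edge's source comes before its target."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())


order = reading_order(EDGES)
print(order)
```

When several nodes have no remaining predecessors, static_order() may return them in any order, so only the edge constraints should be relied on.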

GraphML

See GraphML file format.

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="directed">
    <node id="8332941"/>
    <node id="3131111133"/>
    <edge source="8332941" target="3131111133"/>
    <node id="6822583"/>
    <edge source="3131111133" target="6822583"/>
    <node id="5749291"/>
    <edge source="5749291" target="6822583"/>
    <node id="2131616531"/>
    <edge source="6822583" target="2131616531"/>
    <node id="9483715"/>
    <edge source="2131616531" target="9483715"/>
    <node id="8837284"/>
    <edge source="8837284" target="9483715"/>
    <node id="2113211456"/>
    <edge source="9483715" target="2113211456"/>
    <node id="8345721"/>
    <edge source="2113211456" target="8345721"/>
    <edge source="7473895" target="4938429483"/>
    <node id="3848927"/>
    <node id="2393922"/>
    <edge source="2393922" target="3848927"/>
    <node id="2384942"/>
    <node id="2103023049"/>
    <edge source="2103023049" target="2384942"/>
  </graph>
</graphml>
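A sketch of reading a GraphML file like the one above using only Python's standard-library xml.etree.ElementTree. It collects the declared node IDs and the edges, then lists edge endpoints that have no matching `<node>` element; the example above contains such an edge (7473895 to 4938429483). The function names are hypothetical, and the sketch ignores GraphML `<key>`/`<data>` attributes.

```python
import xml.etree.ElementTree as ET

# GraphML elements are in this XML namespace; the prefix "g" is arbitrary
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def load_graphml(text):
    """Return (set of declared node IDs, list of (source, target) edges) from a GraphML string."""
    graph = ET.fromstring(text).find("g:graph", NS)
    nodes = {node.get("id") for node in graph.findall("g:node", NS)}
    edges = [(edge.get("source"), edge.get("target")) for edge in graph.findall("g:edge", NS)]
    return nodes, edges


def undeclared_endpoints(nodes, edges):
    """Return, sorted, the edge endpoints that have no matching <node> element."""
    return sorted({end for edge in edges for end in edge if end not in nodes})


# A trimmed excerpt of the GraphML example
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="8332941"/>
    <node id="3131111133"/>
    <edge source="8332941" target="3131111133"/>
    <edge source="7473895" target="4938429483"/>
  </graph>
</graphml>"""

nodes, edges = load_graphml(SAMPLE)
print(undeclared_endpoints(nodes, edges))  # ['4938429483', '7473895']
```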

RDF/OWL

The Physics Derivation Graph can be expressed in RDF.

Each step in a derivation could be expressed as subject–predicate–object triples. For example, suppose the step is

Input 1: y=mx+b
Inference rule: multiply both sides by
Feed: 2
Output 2: 2*y = 2*m*x + 2*b

Putting this in RDF,

step 1 | has input | y=mx+b
step 1 | has inference rule | multiply both sides by
step 1 | has feed | 2
step 1 | has output | 2*y = 2*m*x + 2*b
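A sketch of that conversion in plain Python, with strings standing in for the IRIs and literals a real RDF store would use (a library such as rdflib would handle those). The predicate names are taken from the triples above; step_to_triples and match are hypothetical helpers, and match only imitates a single SPARQL triple pattern, with None playing the role of a variable.

```python
def step_to_triples(step_id, inference_rule, inputs=(), feeds=(), outputs=()):
    """Convert one derivation step into (subject, predicate, object) triples."""
    triples = [(step_id, "has input", expr) for expr in inputs]
    triples.append((step_id, "has inference rule", inference_rule))
    triples += [(step_id, "has feed", feed) for feed in feeds]
    triples += [(step_id, "has output", expr) for expr in outputs]
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        (ts, tp, to) for (ts, tp, to) in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


graph = step_to_triples(
    "step 1", "multiply both sides by",
    inputs=["y=mx+b"], feeds=["2"], outputs=["2*y = 2*m*x + 2*b"],
)
# Which steps use y=mx+b as an input?
print(match(graph, p="has input", o="y=mx+b"))  # [('step 1', 'has input', 'y=mx+b')]
```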

Although converting to RDF is easy, I am unaware of any advantage of using RDF. The Physics Derivation Graph is oriented towards visualization, and while SPARQL is the query language for RDF, I don't see much use for querying the graph. Using RDF also doesn't help with using a computer algebra system to validate a step.