
Recommendation: Read the user documentation and FAQ first. This page assumes familiarity with the jargon used in the Physics Derivation Graph.

This page provides background context for design decisions and implementation choices associated with the Physics Derivation Graph (PDG). Contributions to the project are welcome; see CONTRIBUTING.md on how to get started. The Physics Derivation Graph is covered by the Creative Commons Attribution 4.0 International License, so if you don't like a choice that was made you are welcome to fork the Physics Derivation Graph project.

Design Principles Documentation for the Physics Derivation Graph

This page enumerates design principles and goals for the Physics Derivation Graph (PDG).
The list is unordered.

Goals

Long-term durability (as measured by financial and temporal costs of maintenance)
Easy for users to leverage (accessibility)
Easy for developers to contribute (accessibility)
Reproducibility

Therefore

Stable code base (therefore avoid dependencies). This should decrease the maintenance burden.
Use free software (e.g., Latex, SymPy, Linux, Python, Docker, HTML) - saves money for developers and users
Use open source software (e.g., Latex, SymPy, Linux, Python, Docker, HTML) - enables inspection of implementation
Containers (reproducibility)

Expressions, symbols, operators, and relations exist only once in the database.

Expressions, symbols, operators, and relations are assigned unique identifiers; these are akin to Gödel numbering.
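As a rough illustration of "exist only once" combined with "unique identifiers", here is a minimal get-or-create sketch. The class name IdRegistry, the starting number, and the use of the literal Latex string as the lookup key are all assumptions made for this example; they are not taken from the PDG code base.

```python
import itertools


class IdRegistry:
    """Assign each distinct item exactly one numeric identifier (hypothetical helper)."""

    def __init__(self, start=1000000):
        # The starting value is arbitrary; it is an assumption for this sketch.
        self._counter = itertools.count(start)
        self._id_by_item = {}

    def get_or_create(self, kind, latex):
        """Return the existing ID for (kind, latex), or mint a new one."""
        key = (kind, latex.strip())
        if key not in self._id_by_item:
            self._id_by_item[key] = next(self._counter)
        return self._id_by_item[key]


registry = IdRegistry()
first = registry.get_or_create("expression", r"E = m c^2")
second = registry.get_or_create("expression", r"E = m c^2")  # same item, same ID
other = registry.get_or_create("symbol", "c")  # different item, new ID
```

Matching on the literal string is the weak point of this sketch: two strings that describe the same mathematics (for example "a + b" and "b + a") would receive different identifiers.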

Design Choices for the Physics Derivation Graph

Contents

Goal
Decision: What constitutes "all known mathematical physics"?
Decision: What does "human-readable" mean?
Decision: What does "checkable by CAS" mean?
Decision: Which Computer Algebra System(s) to use?
Decision: Which proof assistant(s) to use?
Decision: How to represent expressions?
Decision: How to render graphs visually?
Decision: Which data structure?
Decision: Which Property Graph database?
Decision: Upgrade path from JSON/SQL to Neo4j
Decision: Naming of repo
Decision: Page scope
Decision: How to display Latex on webpages?
Decision: Which VPS provider service company to use?
Decision: Separate non-interactive workflow pages (e.g., user_documentation) into a separate Python file?
Decision: Which languages to use?
Decision: Interface
Decision: Data structure used in the Physics Derivation Graph
\(\rm\LaTeX\) representation of expressions
Decision: How to represent the objects
Decision: Supported Mathematical features
Decision: Outside of Current Scope

Goal

Restating the goal from the front page of https://allofphysics.com/,

"Write down all known mathematical physics in a way that can be both read by humans and checked by a computer algebra system."

This page describes the current status and historical evolution of design decisions critical to the Physics Derivation Graph.

Decisions are not made independently; one choice informs others. Below is a visual overview of the relations among various decisions.

legend for shapes:

decisions are diamonds
top-level design choices are Mdiamonds
goals are ovals
options are rectangles

colors:

"Currently implemented" is blue
"never enacted" is red
"previously explored" is ivory4

rule of connectivity for this graph:

DECISION -> OPTION, as in "has an"
GOAL -> OPTION, as in "is satisfied by"
GOAL -> GOAL, as in "has subgoal"
OPTION -> OPTION, as in "therefore"
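The legend and connectivity rule above can be sketched as a small generator of Graphviz DOT text. This is only an illustration: the function name to_dot, the dictionary layout, and the choice to apply the colors as fill colors are assumptions, and the two example nodes are taken from the "current status" lines elsewhere on this page.

```python
# Node kinds map to the Graphviz shapes named in the legend.
SHAPE_BY_KIND = {
    "decision": "diamond",
    "top-level design choice": "Mdiamond",
    "goal": "oval",
    "option": "rectangle",
}
# Status maps to the colors named in the legend (used as fill color here).
COLOR_BY_STATUS = {
    "currently implemented": "blue",
    "never enacted": "red",
    "previously explored": "ivory4",
}
# Allowed (source kind, target kind) pairs and how each edge is read.
ALLOWED_EDGES = {
    ("decision", "option"): "has an",
    ("goal", "option"): "is satisfied by",
    ("goal", "goal"): "has subgoal",
    ("option", "option"): "therefore",
}


def to_dot(nodes, edges):
    """Build DOT text; nodes maps name -> (kind, status or None)."""
    lines = ["digraph decisions {"]
    for name, (kind, status) in nodes.items():
        attrs = [f"shape={SHAPE_BY_KIND[kind]}"]
        if status is not None:
            attrs.append(f"style=filled, fillcolor={COLOR_BY_STATUS[status]}")
        lines.append(f'  "{name}" [{", ".join(attrs)}];')
    for source, target in edges:
        pair = (nodes[source][0], nodes[target][0])
        if pair not in ALLOWED_EDGES:
            raise ValueError(f"edge {source} -> {target} violates the connectivity rule")
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


nodes = {
    "Which Computer Algebra System(s) to use?": ("decision", None),
    "SymPy": ("option", "currently implemented"),
}
dot_text = to_dot(nodes, [("Which Computer Algebra System(s) to use?", "SymPy")])
print(dot_text)
```

The resulting text can be passed to the Graphviz dot program to render an image. An edge in a direction the rule does not list (for example OPTION -> DECISION) raises ValueError instead of being drawn.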

Decision: What constitutes "all known mathematical physics"?

For a list of topics, see https://en.wikipedia.org/wiki/Branches_of_physics.

Notationally, mathematical physics includes Dirac notation, calculus, differential equations, algebra, and trigonometry. (There is https://en.wikipedia.org/wiki/List_of_common_physics_notations but its coverage is not comprehensive.)
See also Supported mathematical features.

Geometry is certainly part of mathematical physics but the developer of this project hasn't figured out how to incorporate geometry.
See the full list of not in scope.

Decision: What constitutes "spanning" for all known mathematical physics?

Merely having derivations in each domain (e.g., quantum, classical, relativistic) does not suffice.

Suppose we have a derivation in the domain of quantum mechanics and a derivation in the domain of classical mechanics. Suppose that there was a variable (e.g., length) common to the two derivations.

Suppose we have a derivation in the domain of quantum mechanics and a derivation in the domain of classical mechanics. Suppose that there was an expression common to the two derivations.

Is there a single derivation that involves expressions that are quantum and expressions that are classical?

Decision: What does "human-readable" mean?

Reasonable for a human to understand without use of specialized knowledge.

Raw Latex (like \int_0^{\infty}) is not understandable to everyone, whereas the rendered form \( \int_0^{\infty} \) is. Similarly, raw Content MathML is not human-readable.

Decision: What does "checkable by CAS" mean?

In a derivation the mathematical steps can be verified as correct using symbolic mathematical software. (See which CAS for examples of applicable Computer Algebra Systems.)

Once a CAS is introduced, there are multiple aspects that can be checked:

dimensional consistency of expression
shape consistency of expression
step validation
that steps in the derivation are interlinked with other steps
that a step has an inference rule
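The last two checks in the list are structural and do not need a CAS. The sketch below shows one way they could be phrased. The dictionary layout (keys "inference_rule", "inputs", "outputs") and the example inference rule names are assumptions for illustration; they are not the PDG schema.

```python
def steps_missing_inference_rule(steps):
    """Return IDs of steps that have no inference rule attached."""
    return [sid for sid, step in steps.items() if not step.get("inference_rule")]


def isolated_steps(steps):
    """Return IDs of steps that share no expression with any other step."""
    isolated = []
    for sid, step in steps.items():
        mine = set(step["inputs"]) | set(step["outputs"])
        others = set()
        for other_id, other in steps.items():
            if other_id != sid:
                others |= set(other["inputs"]) | set(other["outputs"])
        if not mine & others:
            isolated.append(sid)
    return isolated


# Hypothetical derivation: s1 and s2 are linked through expression e1;
# s3 has no inference rule and touches no expression used elsewhere.
steps = {
    "s1": {"inference_rule": "declare initial expression", "inputs": [], "outputs": ["e1"]},
    "s2": {"inference_rule": "multiply both sides by", "inputs": ["e1"], "outputs": ["e2"]},
    "s3": {"inference_rule": None, "inputs": ["e9"], "outputs": ["e10"]},
}
missing = steps_missing_inference_rule(steps)
unlinked = isolated_steps(steps)
```

Under this definition a derivation consisting of a single step would be reported as isolated, so a real check would likely need to treat that case separately.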

Checking the math is what distinguishes a derivation from merely writing down symbols and expressions.

Why this matters: Reliability of machine-verified logic, Reproducibility, and Accessibility (no leaps of logic).

Decision: a CAS is not sufficient verification

A CAS is not sufficient as it may report \( 1 = x/x \) as true, even though the equality does not hold when \( x = 0 \). That is why a proof assistant is necessary. (See which proof assistant.)
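To illustrate the difference, here is a minimal Lean 4 sketch, assuming Mathlib is available. The lemma names div_self and div_zero come from Mathlib; the examples themselves are written for this page and are not taken from the PDG repository.

```lean
import Mathlib

-- Dividing x by itself gives 1 only under the explicit hypothesis x ≠ 0.
example (x : ℝ) (hx : x ≠ 0) : x / x = 1 := div_self hx

-- Mathlib defines division by zero to return 0, so 0 / 0 is 0 rather than 1.
example : (0 : ℝ) / 0 = 0 := div_zero 0
```

The proof assistant forces the side condition to be stated, whereas a CAS can cancel the common factor silently.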

Decision: Character support: ASCII or Unicode?

current status: Only ASCII is supported.

Why only ASCII?

Decision: Which Computer Algebra System(s) to use?

current status: using SymPy.

https://github.com/allofphysicsgraph/task-tracker/issues/117

See comparison of CAS like Mathematica, MathCad, Sage, Maple, SymPy.

Decision: Which proof assistant(s) to use?

current status: using Lean 4.

https://github.com/allofphysicsgraph/task-tracker/issues/106

See comparison of Rocq, Isabelle, Lean

Decision: How to represent expressions?

current status: using Latex.

See comparison of Latex, Content MathML, Presentation MathML

Decision: Are feeds single use?

Suppose a step has feed "x/2". Should other steps re-use that feed? Should other derivations re-use that feed? Or should feeds only be connected to a single step?

current status: Feeds should only be used once.

Cypher query to find feeds connected to more than one step:

MATCH (f:feed)--(s:step)
WITH f, count(s) AS stepCount
WHERE stepCount > 1
RETURN f, stepCount
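The same check can be run outside the database, for example on an exported list of feed-to-step connections. The sketch below is an assumption-laden illustration: the function name and the (feed_id, step_id) tuple format are hypothetical. It counts distinct steps per feed.

```python
def feeds_used_more_than_once(feed_step_edges):
    """Return {feed_id: step_count} for feeds attached to more than one step."""
    steps_per_feed = {}
    for feed_id, step_id in feed_step_edges:
        steps_per_feed.setdefault(feed_id, set()).add(step_id)
    return {feed: len(steps) for feed, steps in steps_per_feed.items() if len(steps) > 1}


# Hypothetical export: feed_1 is attached to two different steps.
edges = [("feed_1", "step_a"), ("feed_2", "step_a"), ("feed_1", "step_b")]
violations = feeds_used_more_than_once(edges)
```

An empty result means every feed is used by a single step, which is the intended state.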

Decision: How to render graphs visually?

current status: using d3js and graphviz.

https://github.com/allofphysicsgraph/task-tracker/issues/97

d3js, graphviz, networkx

Decision: Which Property Graph database?

current status: using Neo4j.

https://github.com/allofphysicsgraph/task-tracker/issues/43

See comparison of Neo4j

Decision: Whether to cache results

current status: Not caching generated information.

Caching could lower latency for users of the website. However, to eliminate the risk of serving stale or incorrect cached results, validation, checking, and queries are performed at request time.

Historical Decision: Upgrade path from JSON/SQL to Neo4j

The repo ui_v7 used JSON/SQL and ui_v8 used Neo4j. The ui_v7 repo had a working implementation of Google authentication. I had trouble getting Google authentication working in ui_v8, and I didn't want to have to refactor all the static content from ui_v7, so I decided to create a new repo, "allofphysics.com" (now renamed to https://github.com/allofphysicsgraph/combined_v7_JSON_and_v8_neo4j).

Although getting Neo4j into ui_v7 was doable, the "mash together two repos" approach ended up being a bad decision from a troubleshooting and cleanliness-of-design perspective.

Decision: Naming of repo

I decided to name the repo "allofphysics.com" to make it clear which repo hosted the website on the Internet. Also, the alternative repo name "a mashup of v7 and v8" wasn't a good name, though it would have been more descriptive.

In retrospect, that was a bad naming convention because I later reverted to using ui_v7 for the website.

The "allofphysics.com" repo has been renamed to https://github.com/allofphysicsgraph/combined_v7_JSON_and_v8_neo4j.

As of 2026-02-06, ui_v8 is in use.

Decision: Page scope

current status: using page-per-decision.

Rendered HTML pages should have a single scope. (As opposed to a single-page website.)

Decision: How to display Latex on webpages?

current status: using MathJax.

Decision: Which VPS provider service company to use?

current status: using Hetzner.

Example options: DigitalOcean, AWS, Oracle, Azure

https://github.com/allofphysicsgraph/task-tracker/issues/56

https://physicsderivationgraph.blogspot.com/2026/01/vps-price-comparison-september-2024.html
https://physicsderivationgraph.blogspot.com/2026/01/vps-price-comparison-january-2026.html

Decision: Separate some pages into a separate Python file?

current status: flask routes are in single file (pdg_app.py).

Options: have different categories of routes in separate .py scripts, or have all routes in a single .py script.

Having all the routes in a single file (e.g., pdg_app.py) results in a huge file with thousands of lines.
To make this more manageable, Flask provides a way to separate routes into separate files (e.g., pdg_other_routes.py) using blueprints.

As an example of how this would be enacted, suppose the file pdg_app.py contains

from pdg_other_routes import other_routes_bp
web_app.register_blueprint(other_routes_bp)

and pdg_other_routes.py contains

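from flask import Blueprint, render_template

# Blueprint that groups the routes defined in this file
other_routes_bp = Blueprint("other_routes", __name__)

@other_routes_bp.route("/api_via_js")
def to_api_via_js() -> str:
    return render_template("js_with_api/api_js.html")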

The cost of this separation is that Flask namespaces the endpoints to ensure there are no name collisions between different files. The url_for function in Jinja2 templates (and in the Python code) now expects the format:

other_routes.function_name

That would require figuring out which url_for calls point to routes in pdg_other_routes.py and which point to routes in pdg_app.py.

Therefore, having all routes in pdg_app.py is easier.

Decision: Which languages to use?

current status: using Python, HTML, Latex.

Languages and tools that I'm comfortable with and that are widely used: Python, HTML, Docker.

Decision: Is an interface needed?

current status: Direct write access to Neo4j is not allowed.

One could imagine allowing read-write access to the property graph database. That would allow users to make changes that violate the schema. A property graph is overly permissive regarding what nodes, edges, and properties are allowable.

Having an interface enforces compliance with the schema. The interface enforces

only certain nodes, edges, and properties are allowed
consistency of references; e.g. disconnected nodes are less likely
a specific sequence of actions -- a workflow. An interface guides which actions can be taken at any point in the process. For example, the user can't jump directly to adding SymPy for an expression; they are only directed there after entering a new expression.

The backdoor to this is exporting the database and then importing modified content.
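A minimal sketch of how an interface layer might reject edges that fall outside the schema is shown below. The edge types are the ones named on this page, but the specific (source, edge, target) combinations, the direction of each edge, the "derivation" and "step" label names, and the names ALLOWED_TRIPLES, SchemaError, and validate_edge are assumptions for illustration only.

```python
# Assumed whitelist of (source label, edge type, target label) combinations.
ALLOWED_TRIPLES = {
    ("derivation", "HAS_STEP", "step"),
    ("step", "HAS_INPUT", "expression"),
    ("step", "HAS_OUTPUT", "expression"),
    ("step", "HAS_FEED", "feed"),
    ("expression", "IS_COMPRISED_OF", "symbol"),
    ("feed", "IS_COMPRISED_OF", "symbol"),
}


class SchemaError(ValueError):
    """Raised when a requested write does not fit the schema."""


def validate_edge(source_label, edge_type, target_label):
    """Return the triple if the schema allows it, otherwise raise SchemaError."""
    triple = (source_label, edge_type, target_label)
    if triple not in ALLOWED_TRIPLES:
        raise SchemaError(f"edge not allowed by schema: {triple}")
    return triple


# An allowed write passes through unchanged.
accepted = validate_edge("step", "HAS_INPUT", "expression")

# A write that the graph database itself would accept is rejected here.
try:
    validate_edge("symbol", "HAS_STEP", "feed")
    rejected = False
except SchemaError:
    rejected = True
```

Because the property graph database would accept either write, a check like this has to live in the interface (web UI or API) rather than in the database.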

Decision: Which interface modality?

current status: primarily using web UI to read and write to Neo4j database. Exploring use of API.

Plan to support API.

Why a web UI? Why not a GUI? Or command line?
Accessibility. Most users wouldn't be comfortable with command line UI. The web UI enforces the Neo4j schema.

The motive for adding the API is so that other developers can provide a better UI.

The reason to start with the web interface is to figure out which actions should be allowed and which actions should not be allowed. The same set of constraints and workflow will apply to the API.

Decision: Not hiring contractors to enact features

current status: I am not hiring a designer for the web front-end because that would be premature at this point. I'm still in the exploration prototyping phase to figure out what the front end will need to be able to do.

Decision: Which data structure to use in the Physics Derivation Graph

current status: the Physics Derivation Graph is stored in a Neo4j Property Graph database.

(There are intermediate data structures in pdg_app.py that interface with the jinja2 HTML pages.)
It is not clear whether the web UI could use Neo4j directly, without those intermediate data structures.

The many alternative data storage options (e.g., SQLite, Redis, Python Pickle, CSV, GraphML) offer trade-offs, a few of which have been explored.

Decision: Why are there different node types?

In neo4j_query.py there are many node types:

:operation
:relation
:scalar:symbol
:vector:symbol
:matrix:symbol
:quantum_operator
:value_with_units
:feed
:expression

The motive for having distinct node types is that each has a specific set of property keys. The property keys of :scalar are different from the property keys relevant to :matrix.

The :scalar, :vector, :matrix nodes are grouped under :symbol. There's an argument to be made that another hierarchical layer would be :operation, :relation, :quantum_operator, :value_with_units, :symbol. Not clear what that grouping would be called. "Things that appear in expressions and feeds" is too long.

A counter-proposal would be that :operation, :relation, and :quantum_operator should be nested under :symbol. Currently those nodes are connected to :expression and :feed by :IS_COMPRISED_OF (rather than :HAS_RELATION and :HAS_OPERATION).

Decision: Choice of edge labels

The Physics Derivation Graph previously used HAS_SYMBOL between :expression and :symbol nodes. That label is bad as an edge because it implies a category of node. IS_COMPRISED_OF is better.

Could every edge be IS_COMPRISED_OF?
No. For example, HAS_STEP is separate because of the property key sequence index. Similarly, HAS_INPUT, HAS_OUTPUT, and HAS_FEED are appropriate, and each should also have an index.
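As a small illustration of why the index belongs on the edge, the sketch below orders the steps of a derivation using a property stored on each HAS_STEP edge. The dictionary layout, the key name sequence_index, and the identifiers are assumptions for this example.

```python
# Hypothetical HAS_STEP edges; the ordering lives on the edge, not on the step.
has_step_edges = [
    {"derivation": "d1", "step": "s_b", "sequence_index": 2},
    {"derivation": "d1", "step": "s_a", "sequence_index": 1},
    {"derivation": "d1", "step": "s_c", "sequence_index": 3},
    {"derivation": "d2", "step": "s_z", "sequence_index": 1},
]


def ordered_steps(edges, derivation_id):
    """Return the step IDs of one derivation, sorted by the edge's sequence index."""
    relevant = [edge for edge in edges if edge["derivation"] == derivation_id]
    return [edge["step"] for edge in sorted(relevant, key=lambda edge: edge["sequence_index"])]


steps_in_order = ordered_steps(has_step_edges, "d1")
```

A generic IS_COMPRISED_OF edge has no such ordering property, which is one reason the labels stay distinct.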

\(\rm\LaTeX\) representation of expressions

There are multiple choices of how to represent a mathematical expression. The choices feature trade-offs between conciseness, ability to express the range of notations necessary for Physics, semantic meaning, and ability to use the expression in a computer algebra system (CAS). See the comparison of syntax. \(\rm\LaTeX\) was selected primarily because of the common use in Physics, display of complex math, conciseness, and expressiveness. The use of \(\rm\LaTeX\) means other tasks like parsing symbols and resolving ambiguity are harder.

\(\rm\LaTeX\) or SymPy as the primary representation of expressions?

Is the latex representation primary, or SymPy primary, or is one canonical?

They are co-equals and intended to be equivalent.

Decision: Which objects should be represented?

There are a few obvious objects that need to be accounted for, like derivation, steps, inference rule, feed, and expression.

Beyond those there are objects that could be either a node in the graph or a property of a node. For example, should (LHS, relation, RHS) be separate nodes or properties of an expression? A framing that motivates the choice is whether a user may want to query LHS separately from the expression. The trade-off is that additional nodes better support custom queries but then incur more queries to extract information relevant for typical workflows.

Another framing to motivate the node-or-property decision is that nodes can have properties but properties cannot have edges. For example, if LHS is a property of expression, then a symbol-as-node has to be related to the expression rather than LHS.

:expression (LHS) -> IS_COMPRISED_OF -> :symbol (x)

versus

:expression -> HAS_SIDE -> LHS -> IS_COMPRISED_OF -> :symbol (x)

If "symbol" is a node, then is a relation a symbol? Should the relation be a property of the expression-as-node, or should the schema be

:expression -> HAS_RELATION -> :relation (=)
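The node-or-property trade-off for LHS can be made concrete with two small in-memory edge lists. This is a sketch only: the identifiers, the tuple layout (source, edge type, target), and the helper names are assumptions, and the example expression simply has x on the left side and y on the right side.

```python
# Option 1: LHS/RHS are properties, so symbols attach to the expression itself.
side_as_property = [
    ("expr_1", "IS_COMPRISED_OF", "x"),
    ("expr_1", "IS_COMPRISED_OF", "y"),
]
# Option 2: LHS/RHS are nodes, so symbols attach to a specific side.
side_as_node = [
    ("expr_1", "HAS_SIDE", "expr_1_LHS"),
    ("expr_1", "HAS_SIDE", "expr_1_RHS"),
    ("expr_1_LHS", "IS_COMPRISED_OF", "x"),
    ("expr_1_RHS", "IS_COMPRISED_OF", "y"),
]


def one_hop(edges, source, edge_type):
    """Return the targets reachable from source over a single edge of edge_type."""
    return {target for s, e, target in edges if s == source and e == edge_type}


def symbols_of_expression_side_as_node(edges, expression_id):
    """With sides as nodes, listing all symbols takes two hops instead of one."""
    symbols = set()
    for side in one_hop(edges, expression_id, "HAS_SIDE"):
        symbols |= one_hop(edges, side, "IS_COMPRISED_OF")
    return symbols


# Option 1 answers "which symbols?" in one hop but cannot say which side holds x.
all_symbols = one_hop(side_as_property, "expr_1", "IS_COMPRISED_OF")
# Option 2 can answer "which symbols are on the LHS?" directly.
lhs_symbols = one_hop(side_as_node, "expr_1_LHS", "IS_COMPRISED_OF")
```

The extra hop in symbols_of_expression_side_as_node is the cost mentioned above: more nodes support more specific queries, but common lookups need more traversal.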

Decision: Supported Mathematical features

Many but not all symbols are supported. Here are some categories of supported symbols. (Symbols exist on the left side or right side of an expression and can also appear in a feed.)

Scalars (Rank 0): Single values. Includes constants (e.g., pi, e, 42) and variables (e.g., x, y, c)
Vectors (Rank 1): Ordered lists of scalars (1xn or nx1)
Matrices (Rank 2): Grids of scalars (mxn)
Tensors (Rank n)
Sets: real, integer, complex

Operators. These act on symbols (listed above) and are part of either the left side or right side of an expression. Operators can also appear in feeds.

arithmetic: +, -, multiply, divide, mod
Unary Operators: factorial, negation, square root, bar (mean)
Linear Algebra Operators: Transpose, Determinant, Trace
Calculus/Differential Operators: differential, nabla/del, laplacian
Integrals
Iterative Operators: Operations over a range (product, sum)
Set operations, e.g., "is member of"

Relations. These evaluate to True for an expression. Relations do not appear in a feed.

Equality/Equivalence: =
Ordering (Inequalities): greater than, less than, greater than or equal to, less than or equal to
Proportionality

To check the above documentation against the code, see https://github.com/allofphysicsgraph/ui_v8_website_flask_neo4j/blob/gh-pages/webserver_for_pdg/library/list_of_valid.py

Decision: Outside of Current Scope

Although the Physics Derivation Graph is intended to be comprehensive across domains, there are aspects of Physics not within the current scope of the project:

inclusion of graphics, e.g. free body diagrams, Feynman diagrams, geometrical diagrams.
explanatory text and pictures like
- https://learn.sparkfun.com/tutorials/what-is-electricity/
animations of concepts like
- https://www.youtube.com/@3blue1brown
experimental processes
Geometric arguments, e.g. optics
Spatial reasoning, e.g. electrodynamics
Numerical analysis
Simulations of Physics
interactive models
- https://ciechanow.ski/ (with comments: https://news.ycombinator.com/item?id=35343495)
- https://landgreen.github.io/physics/notes/waves/waves/ (with comments: https://news.ycombinator.com/item?id=17178031)
Set operations, e.g., union, intersection
Logic operations, e.g., "for all", "implies", AND, OR, XOR

These aspects could be included if the data structure and workflow were adapted to an expanded scope.