Published 2020-03-20T11:57:00.003Z by Physics Derivation Graph

I've experimented with seven different data structures for the Physics Derivation Graph:

- v1_plain_text
- v2_XML
- v3_CSV
- v4_file_per_expression
- v5_property_graph
- v6_sqlite
- v7_pickle_web_interface

Each of these has required a rewrite of the code from scratch, as well as transfer code (to migrate the data from version n to version n+1).

These changes have progressed alongside my knowledge of data structures. I didn't know about property graphs when I was implementing v1, v2, and v3. I wasn't comfortable with SQL when I implemented v4. I didn't know about Tidy data when I implemented v1 through v6. The data structures used in the PDG slightly lag my understanding of data structures.

Within a given implementation, there are design decisions with trade-offs to evaluate. I typically don't know all the options or consequences until I implement one of them and then determine what inefficiencies exist. Knowledge gained through evolutionary iteration is expensive and takes a lot of time.

Here's an example of two "minor" tweaks that require rewriting all of the code. My current data structure in v7 is:

```python
dat['derivations'] = {
    'fun deriv': {                  # name of derivation
        '4928482': {                # key is "step ID"
            'inf rule': 'declare initial expr',
            'inputs': {},
            'feeds': {},
            'outputs': {'9428': '4928923942'},  # key is "expr local ID", value is "expr global ID"
            'linear index': 1},     # linear index for PDF and for graph orientation
        '2948592': {
            'inf rule': 'add X to both sides',
            'inputs': {'9428': '4928923942'},
            'feeds': {'3190': '9494829190'},
            'outputs': {'3921': '9499959299'},
            'linear index': 2},
    },
}
```

A better data structure would be

```python
dat['derivations'] = {
    'fun deriv': {                  # name of derivation
        '4928482': {                # key is "step ID"
            'inf rule': 'declare initial expr',
            'inputs': {},
            'feeds': {},
            'outputs': {1: '9428'},  # key is index, value is "expr local ID"
            'linear index': 1},     # linear index for PDF and for graph orientation
        '2948592': {
            'inf rule': 'add X to both sides',
            'inputs': {1: '9428'},
            'feeds': {1: '3190'},
            'outputs': {1: '3921'},
            'linear index': 2},
    },
}

dat['expr local to global'] = {
    '9428': '4928923942',
    '3190': '9494829190',
    '3921': '9499959299',
    '9128': '1492842000'}
```
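To give a concrete sense of the transfer code each migration requires, here is a sketch of how the v7 structure above could be converted to the improved one. The function name is hypothetical; this is not the actual PDG migration script.

```python
def migrate_v7_step_maps(dat):
    """Hypothetical transfer-code sketch: convert the
    {local ID: global ID} dicts in each step to {index: local ID},
    collecting the local-to-global mapping in a separate dict."""
    local_to_global = {}
    for deriv_name, steps in dat['derivations'].items():
        for step_id, step in steps.items():
            for connection in ['inputs', 'feeds', 'outputs']:
                indexed = {}
                for index, (local_id, global_id) in enumerate(
                        step[connection].items(), start=1):
                    local_to_global[local_id] = global_id
                    indexed[index] = local_id
                step[connection] = indexed
    dat['expr local to global'] = local_to_global
    return dat
```

Applied to the v7 structure, this produces the second structure shown above (minus any expressions that never appear in a step).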

The reasons the second data structure is an improvement are:

- the global expression ID no longer appears in the 'derivations' dict; it lives only in the 'expr local to global' mapping
- the inputs, feeds, and outputs each have an index. The index is relevant both for printing expressions in a PDF and for use in inference rules.
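With the global IDs factored out, resolving a step's expressions becomes a single dictionary lookup per local ID. A minimal sketch (the helper name is mine, not part of the PDG codebase):

```python
def global_ids_for(step, dat):
    """Return the global expression IDs for a step's outputs,
    in index order, via the 'expr local to global' mapping."""
    return [dat['expr local to global'][local_id]
            for _, local_id in sorted(step['outputs'].items())]
```

Because global IDs live in exactly one place, renaming or merging a global expression touches only the mapping, not every step in every derivation.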

I increasingly suspect there will be a "v8" based on tables. The backend database would be something like SQLite3, and the in-memory representation in Python would be dataframes.
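A table-based v8 might look something like the following sketch, using Python's built-in sqlite3 module. The table and column names are my guesses, not a committed schema: one row per step, one row per (step, connection type, index, local ID) edge in Tidy-data style, and a separate local-to-global mapping table.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # SQLite3 backend, as suggested above
conn.executescript("""
CREATE TABLE step (
    step_id      TEXT PRIMARY KEY,
    derivation   TEXT,
    inf_rule     TEXT,
    linear_index INTEGER);
CREATE TABLE step_expr (          -- one row per input/feed/output
    step_id    TEXT,
    connection TEXT,              -- 'input', 'feed', or 'output'
    idx        INTEGER,           -- argument position for the inference rule
    local_id   TEXT);
CREATE TABLE expr_local_to_global (
    local_id  TEXT PRIMARY KEY,
    global_id TEXT);
""")
conn.execute("INSERT INTO step VALUES ('2948592', 'fun deriv', 'add X to both sides', 2)")
conn.execute("INSERT INTO step_expr VALUES ('2948592', 'output', 1, '3921')")
conn.execute("INSERT INTO expr_local_to_global VALUES ('3921', '9499959299')")

# Join to recover the global ID of a step's first output.
row = conn.execute("""
    SELECT g.global_id FROM step_expr s
    JOIN expr_local_to_global g ON g.local_id = s.local_id
    WHERE s.step_id = '2948592' AND s.connection = 'output' AND s.idx = 1
""").fetchone()
```

On the Python side, each of these tables could be pulled into a dataframe with `pandas.read_sql_query`.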

I'm not going to switch to v8 yet; I'll continue to invest effort in v7 for a bit longer to explore a few challenges (like implementation of inference rules).