Welcome to the Physics Derivation Graph

a different approach to generating content for the Physics Derivation Graph

Published 2018-07-21T00:41:00Z by Physics Derivation Graph

Wikipedia

Side note

Between any two adjacent expressions in your data set, there are likely a bunch of missing steps.

Even if all the expressions were present, the inference rules would still be missing. Filling those in is a big challenge.

To address these challenges, text analysis would be useful. Suppose the sequence is

text1
expression1
text2
expression2
text3
expression3
text4

There are a few distinct categories of text to analyze:

s1 = the last two sentences of "text1", preceding "expression1"
s(i) = if text2 and text3 are short (i.e., a few sentences), then they are potential inference rules
s(j) = if text2 and text3 are longer than a few sentences, then probably the two sentences following an expression and the two sentences preceding the next expression are relevant
sf = the first two sentences of "text4", which is the text after the last expression
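
As a rough sketch of how the four categories could be pulled out of one document, assume the text blocks have already been separated from the expressions and are passed in as a list [text1, text2, ..., textN]. The names split_sentences, categorize, and SHORT_LIMIT are made up for illustration, the cutoff of three sentences for "a few" is an assumption, and the sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import re

SHORT_LIMIT = 3  # assumed cutoff for "a few sentences"; tune as needed


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def categorize(texts):
    """texts = [text1, ..., textN], with one expression between each adjacent pair."""
    first = split_sentences(texts[0])
    last = split_sentences(texts[-1])
    result = {"s1": first[-2:], "s(i)": [], "s(j)": [], "sf": last[:2]}
    for middle in texts[1:-1]:
        sentences = split_sentences(middle)
        if len(sentences) <= SHORT_LIMIT:
            # short block: the whole thing is a candidate inference rule
            result["s(i)"].append(sentences)
        else:
            # long block: keep the two sentences after the previous expression
            # and the two sentences before the next expression
            result["s(j)"].append(sentences[:2] + sentences[-2:])
    return result
```

Calling categorize on each document and pooling the results per key would give one data set per category.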

We now have 1000 instances of "s1" sentences. In this "s1" data set, what's the most common word? What's the most common two-word phrase? What's the most common three-word phrase? If there are things that look like inference rules, that would be interesting. I doubt that "declare initial expression" will appear, but some consistency would be validating.

Similarly, run the same word and phrase frequency analysis on the 1000 "sf" sentences. Also apply it to each of "s(i)" and "s(j)".
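
The word and phrase counts could be done with nothing more than the standard library. The following is a minimal sketch, not a tuned text-analysis pipeline: ngram_counts is a hypothetical helper, the tokenizer simply lowercases and keeps runs of letters, digits, and apostrophes, and the sample sentences are invented placeholders rather than real data.

```python
import re
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-word phrases across sentences; phrases never span two sentences."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# invented placeholder sentences standing in for the "s1" data set
s1_sentences = [
    "Substituting into the equation gives",
    "Substituting into the result gives",
    "Taking the derivative gives",
]

for n in (1, 2, 3):
    print(n, ngram_counts(s1_sentences, n).most_common(5))
```

The same function can be run unchanged on the "sf", "s(i)", and "s(j)" sentences. Single-word counts will likely be dominated by words like "the", so the two- and three-word phrases are probably the more informative output.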