- s1 = the last two sentences in "text1" preceding "expression1"
- s(i) = if text2 and text3 are short (i.e., a few sentences), then they are potential inference rules
- s(j) = if text2 and text3 are longer than a few sentences, then probably the two sentences following an expression and the two sentences preceding the next expression are relevant
- sf = the first two sentences of "text4", which is the text after the last expression.
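
A minimal sketch of how these sentence groups could be pulled out of a block of text. The function names, the naive regex-based sentence splitter, and the threshold of 4 sentences for "a few" are all assumptions made for illustration; real text with abbreviations or inline math would probably need a proper sentence tokenizer.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def last_sentences(text, n=2):
    """Sentences immediately preceding an expression (used for s1 and s(j))."""
    return split_sentences(text)[-n:]

def first_sentences(text, n=2):
    """Sentences immediately following an expression (used for s(j) and sf)."""
    return split_sentences(text)[:n]

def is_short(text, max_sentences=4):
    """Hypothetical cutoff for 'a few sentences'; 4 is a guess, not from the notes."""
    return len(split_sentences(text)) <= max_sentences
```

With these helpers, s1 would be `last_sentences(text1)`, sf would be `first_sentences(text4)`, and `is_short` decides whether an in-between text is treated as s(i) or split into the two halves of s(j).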
We now have 1000 instances of "s1" sentences. In this "s1" data set, what's the most common word? What's the most common two-word phrase? What's the most common three-word phrase? If there are things that look like inference rules, that would be interesting. I doubt that "declare initial expression" will appear verbatim, but some consistency in the phrasing would be validating.
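
The word and phrase counts can be sketched with `collections.Counter`. This is only one possible approach: the function name `ngram_counts`, the lowercase-and-strip-punctuation tokenization, and the two sample sentences are assumptions made up for illustration, not actual data.

```python
import re
from collections import Counter

def ngram_counts(sentences, n):
    """Count n-word phrases across sentences; phrases do not span sentence boundaries."""
    counts = Counter()
    for sentence in sentences:
        # Assumed tokenization: lowercase, keep letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts

# Placeholder sentences standing in for the real "s1" data set.
s1_sentences = [
    "We declare the initial expression.",
    "Substituting into the initial expression gives",
]

for n in (1, 2, 3):
    print(n, ngram_counts(s1_sentences, n).most_common(5))
```

Very frequent function words ("the", "of", "we") will likely dominate the one-word counts, so the two-word and three-word counts are probably the more informative ones for spotting inference-rule-like phrasing.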