analyzing the text of Wikipedia posts

Published 2018-07-21T01:59:00.002Z by Physics Derivation Graph

In a previous post, I outlined an approach for analyzing Wikipedia content. In this post, I document a few initial observations about the data collected from Wikipedia.

Searching for "derivation" as a section marker means searching for "=== Derivation ===". The word has other meanings, so the results sometimes include non-mathematical content like "=== Derivation and other names ===". To filter out irrelevant content, I keep only sections that contain mathematical expressions (i.e., ":<math>").
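The filter described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the page's raw wikitext is already available as a string. The function name `derivation_sections` and the sample text are hypothetical, and treating a section as ending at the next heading of the same or a higher level is my assumption.

```python
import re

# A wikitext heading: the same run of 2 to 6 "=" on both sides of the title.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def derivation_sections(wikitext):
    """Return (title, body) pairs for level-3 sections whose title mentions
    "derivation" and whose body contains at least one ":<math>" marker."""
    headings = list(HEADING.finditer(wikitext))
    kept = []
    for i, h in enumerate(headings):
        level, title = len(h.group(1)), h.group(2)
        if level != 3 or "derivation" not in title.lower():
            continue
        # Assumption: the section runs until the next heading of equal or higher level.
        end = len(wikitext)
        for later in headings[i + 1:]:
            if len(later.group(1)) <= level:
                end = later.start()
                break
        body = wikitext[h.end():end]
        if ":<math>" in body:
            kept.append((title, body))
    return kept

# Hypothetical sample text, not taken from a real article.
sample = (
    "== Theory ==\n"
    "=== Derivation ===\n"
    "Start from Newton's second law.\n"
    ":<math>F = m a</math>\n"
    "=== Derivation and other names ===\n"
    "The word comes from Latin.\n"
    "== See also ==\n"
)
titles = [title for title, _ in derivation_sections(sample)]
```

With this sample, the etymology section is dropped because its body has no ":<math>" line, even though its title contains "Derivation".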

In addition to the text, there are potentially relevant images; one example has dimensions 813 × 570 pixels. Pictures with "derivation" in the name and dimensions greater than 300 × 300 pixels might be relevant.
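As a rough sketch, the image heuristic could be written as a small predicate. The function name `is_candidate_image`, its parameters, and the file names in the usage lines are hypothetical, and I am assuming the file name and pixel dimensions have already been obtained by some other means.

```python
def is_candidate_image(filename, width, height, min_side=300):
    """Heuristic: "derivation" appears in the file name and both
    dimensions exceed min_side pixels."""
    has_keyword = "derivation" in filename.lower()
    big_enough = width > min_side and height > min_side
    return has_keyword and big_enough

# Hypothetical file names; 813 x 570 matches the example dimensions above.
large_named = is_candidate_image("Some_derivation_diagram.png", 813, 570)
small_named = is_candidate_image("derivation_icon.png", 120, 120)
large_unnamed = is_candidate_image("Portrait.jpg", 813, 570)
```

The 300-pixel threshold is the one mentioned above; it is exposed as a parameter because the cutoff is a guess rather than a measured value.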

In the "derivation" section, lines that start with ":<math>" are expressions. The closing tag "</math>" may occur on a following line.
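Because the closing tag can fall on a later line, a line-by-line scan needs to remember whether it is inside an expression. The following is a minimal sketch of that idea, not a full wikitext parser. The function name `extract_expressions` and the sample text are hypothetical, and the sketch assumes the opening marker is exactly ":<math>" at the start of a line (tags with attributes are not handled).

```python
def extract_expressions(section_text):
    """Collect the contents of expressions that open with ":<math>" at the
    start of a line and close with "</math>" on that line or a later one."""
    expressions = []
    current = None  # lines of the expression being collected, or None
    for line in section_text.splitlines():
        if current is None:
            if not line.startswith(":<math>"):
                continue  # ordinary prose line outside any expression
            current = []
            line = line[len(":<math>"):]
        close = line.find("</math>")
        if close == -1:
            current.append(line)  # expression continues on the next line
        else:
            current.append(line[:close])
            expressions.append("\n".join(current).strip())
            current = None
    # An expression that never closes is silently dropped.
    return expressions

# Hypothetical sample section body.
sample = (
    "Start from the definition.\n"
    ":<math>E = m c^2</math>\n"
    "which can be expanded as\n"
    ":<math>\\begin{align}\n"
    "a &= b\n"
    "\\end{align}</math>\n"
    "as required.\n"
)
found = extract_expressions(sample)
```

Here `found` holds two entries: the single-line expression and the three-line `align` block joined with newlines.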