a grand vision for bulk .tex analysis

Published 2020-06-07T02:02:00.001Z by Physics Derivation Graph

Current plan for bulk .tex analysis and math extraction:

characterization and counting of .tex files in arXiv
anomaly detection and construction of trie data structures for .tex in arXiv
clean up LaTeX to remove formatting commands (this can be handled in the grammar)
generate a minimal regex based on a threshold from the trie data structure (work is in progress to automate the regex generation)
use the regex to lex the LaTeX character stream and build ASTs
parse the ASTs for math syntax (e.g., into SymPy)
check the dimensionality of expressions using SymPy
use inference rules to create steps that relate math expressions
use SymPy to validate each inference rule application
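
The trie and threshold steps in the plan above can be sketched as follows. This is only a minimal illustration, not the project's actual code: the function names (`build_trie`, `frequent_words`, `regex_from_corpus`), the choice to treat the threshold as a raw occurrence count, and the restriction to backslash commands are all assumptions made for the example.

```python
import re

# Assumption: only backslash commands such as \frac or \alpha are counted.
COMMAND = re.compile(r"\\[A-Za-z]+")


def build_trie(words):
    """Insert each word character by character; each node counts the words passing through it."""
    root = {"count": 0, "end": 0, "children": {}}
    for word in words:
        node = root
        node["count"] += 1
        for ch in word:
            node = node["children"].setdefault(
                ch, {"count": 0, "end": 0, "children": {}}
            )
            node["count"] += 1
        node["end"] += 1  # number of words that end exactly at this node
    return root


def frequent_words(node, threshold, prefix=""):
    """Yield complete words seen at least `threshold` times, pruning rare branches."""
    if node["end"] >= threshold:
        yield prefix
    for ch, child in sorted(node["children"].items()):
        if child["count"] >= threshold:
            yield from frequent_words(child, threshold, prefix + ch)


def regex_from_corpus(tex_sources, threshold):
    """Build a regex that matches only the commands that pass the threshold."""
    commands = [m.group(0) for src in tex_sources for m in COMMAND.finditer(src)]
    kept = sorted(frequent_words(build_trie(commands), threshold), key=len, reverse=True)
    if not kept:
        return re.compile(r"(?!)")  # matches nothing
    alternatives = "|".join(re.escape(w) for w in kept)
    # The lookahead stops a kept command from matching the start of a longer one.
    return re.compile(r"(?:" + alternatives + r")(?![A-Za-z])")


if __name__ == "__main__":
    sources = [
        r"\frac{a}{b} + \alpha",
        r"\frac{1}{2} \alpha \beta",
        r"\frac{x}{y} \weirdmacro",
    ]
    pattern = regex_from_corpus(sources, threshold=2)
    print(pattern.findall(r"\frac{\alpha}{\beta}"))
```

Commands that fall below the threshold (here `\beta` and `\weirdmacro`) are left out of the regex, which is one way the rare cases could be set aside for the anomaly detection step.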