# a grand vision for bulk .tex analysis

Published 2020-06-07T02:02:00.001Z by Physics Derivation Graph

Current plan for bulk .tex analysis and math extraction:
1. characterization and counting of .tex in arXiv
2. anomaly detection and trie data structures for .tex in arXiv
3. clean up LaTeX to remove formatting-only markup (this can be handled in the grammar)
4. derive the minimal regex from a threshold on the trie data structure (work is in progress to automate regex generation)
5. use the regex to lex the LaTeX character stream into ASTs
6. parse ASTs for math syntax (e.g., into SymPy)
7. check dimensionality of expressions using SymPy
8. use inference rules to create steps that relate math expressions
9. use SymPy to validate inference rule application
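
Steps 2 and 4 can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code. It assumes that the trie is built over LaTeX control words (such as `\frac`), that the threshold is a minimum occurrence count, and that the resulting regex is a plain alternation of the commands that pass the threshold. The names `build_trie`, `frequent_commands`, and `commands_to_regex` are hypothetical.

```python
import re

# Assumption: the unit stored in the trie is a LaTeX control word,
# i.e. a backslash followed by letters.
COMMAND_RE = re.compile(r"\\[A-Za-z]+")


class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # number of times a command ends at this node


def build_trie(sources):
    """Insert every control word found in the given .tex strings into a trie."""
    root = TrieNode()
    for text in sources:
        for command in COMMAND_RE.findall(text):
            node = root
            for char in command:
                node = node.children.setdefault(char, TrieNode())
            node.count += 1
    return root


def frequent_commands(root, threshold):
    """Return {command: count} for commands seen at least `threshold` times."""
    found = {}
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        if prefix and node.count >= threshold:
            found[prefix] = node.count
        for char, child in node.children.items():
            stack.append((prefix + char, child))
    return found


def commands_to_regex(commands):
    """Build one alternation regex from the given command names.

    The trailing lookahead stops a short command from matching the
    start of a longer, unlisted one.
    """
    ordered = sorted(commands, key=lambda c: (-len(c), c))
    alternation = "|".join(re.escape(c) for c in ordered)
    return re.compile("(?:" + alternation + ")(?![A-Za-z])")


# Toy input standing in for a corpus of .tex files.
samples = [r"\frac{a}{b} + \alpha", r"\frac{1}{2} \beta", r"\frac{x}{y} \alpha"]
trie = build_trie(samples)
common = frequent_commands(trie, threshold=2)
pattern = commands_to_regex(common)
print(common)
print(pattern.findall(r"\frac{\alpha}{\gamma}"))
```

Raising the threshold shrinks the regex to the most common commands, which is one plausible reading of "minimal" here. Rare commands that fall below the threshold are the natural candidates for the anomaly detection in step 2.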