a grand vision for bulk .tex analysis
Published 2020-06-07T02:02:00.001Z by Physics Derivation Graph
Current plan for bulk .tex analysis and math extraction:
- characterize and count the .tex files in arXiv
- anomaly detection and trie data structures of the .tex in arXiv
- clean up the LaTeX to remove formatting markup (this can be handled in the grammar)
- derive the minimal regex from the trie data structure using a threshold; work is in progress to automate regex generation
- use the regex to lex the LaTeX character stream into tokens and build ASTs from them
- parse the ASTs for math syntax (e.g., into SymPy)
- check the dimensional consistency of expressions using SymPy
- use inference rules to create steps that relate math expressions
- use SymPy to validate each application of an inference rule
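A minimal sketch of the first two steps, counting LaTeX commands across .tex sources and storing them in a trie, could look like this. It assumes the trie is keyed on command names such as `\frac` and that a command seen fewer than `threshold` times counts as an anomaly. Both choices, and the names `CommandTrie`, `build_trie` and `rare_commands`, are illustrative assumptions rather than the project's actual design.

```python
import re
from collections import defaultdict

# A LaTeX control word: a backslash followed by letters, e.g. \frac or \alpha
COMMAND_RE = re.compile(r"\\[A-Za-z]+")


class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)  # next character -> child node
        self.count = 0  # occurrences of the command that ends at this node


class CommandTrie:
    """Character trie of LaTeX command names with occurrence counts."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, command):
        node = self.root
        for ch in command:
            node = node.children[ch]
        node.count += 1

    def items(self):
        """Yield (command, count) for every command stored in the trie."""
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.count:
                yield prefix, node.count
            for ch, child in node.children.items():
                stack.append((child, prefix + ch))


def build_trie(tex_sources):
    """Count every command occurrence across an iterable of .tex strings."""
    trie = CommandTrie()
    for source in tex_sources:
        for command in COMMAND_RE.findall(source):
            trie.insert(command)
    return trie


def rare_commands(trie, threshold):
    """Commands seen fewer than `threshold` times: candidate anomalies."""
    return sorted(cmd for cmd, n in trie.items() if n < threshold)


docs = [r"\frac{a}{b} + \alpha", r"\frac{1}{2}\alpha\beta", r"\frac{x}{y} \mathbb{R}"]
trie = build_trie(docs)
print(rare_commands(trie, threshold=2))  # ['\\beta', '\\mathbb']
```

With a real corpus, `tex_sources` would be the text of arXiv .tex files; the three short strings above only exercise the code. A trie also makes prefix queries cheap, e.g. finding every command that starts with `\math`.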
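The post notes that formatting cleanup can be handled in the grammar. As an alternative illustration, a regex pre-pass could strip some presentational markup before lexing. Which commands count as "formatting" (the lists in `DROP_RE` and `UNWRAP_RE`) is an assumption here, and `strip_formatting` is a hypothetical name.

```python
import re

# Assumed purely presentational commands that can be deleted outright;
# the lookahead keeps \left from matching inside \leftarrow
DROP_RE = re.compile(r"\\(?:left|right|displaystyle)(?![A-Za-z])|\\[,;:!]|\\q?quad(?![A-Za-z])")
# Assumed font/style wrappers whose (non-nested) argument should be kept
UNWRAP_RE = re.compile(r"\\(?:mathrm|mathbf|mathit|text)\{([^{}]*)\}")


def strip_formatting(tex):
    """Remove spacing/sizing commands, unwrap simple font commands, tidy whitespace."""
    tex = DROP_RE.sub("", tex)
    tex = UNWRAP_RE.sub(r"\1", tex)
    return re.sub(r"\s+", " ", tex).strip()


print(strip_formatting(r"\left( a \, b \right) = \mathrm{d} x \quad"))  # ( a b ) = d x
```

Nested braces such as `\mathrm{a_{1}}` are left untouched because `[^{}]*` cannot match them, which is one reason a grammar is the more robust place for this cleanup.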
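The threshold-driven regex and the lexing step might be sketched as follows. It assumes `command_counts` is a dictionary of command frequencies (for example, read off the trie), keeps only commands at or above `threshold` in the regex alternation, and tags anything else that looks like a command as `UNKNOWN_COMMAND`. The token categories and the names `build_lexer_regex` and `lex` are hypothetical. The output is a flat token list, which a parser would then assemble into an AST.

```python
import re


def build_lexer_regex(command_counts, threshold):
    """Build one token regex whose command alternation holds only frequent commands.

    command_counts: hypothetical mapping of LaTeX command -> corpus count.
    """
    frequent = [cmd for cmd, n in command_counts.items() if n >= threshold]
    # Try longer commands first; the lookahead below also stops a prefix
    # such as \frac from matching the start of a longer name like \fraction
    frequent.sort(key=len, reverse=True)
    commands = "|".join(re.escape(cmd) for cmd in frequent) or r"(?!)"  # (?!) never matches
    token_spec = [
        ("COMMAND", rf"(?:{commands})(?![A-Za-z])"),
        ("UNKNOWN_COMMAND", r"\\[A-Za-z]+"),
        ("NUMBER", r"\d+(?:\.\d+)?"),
        ("SYMBOL", r"[A-Za-z]"),
        ("LBRACE", r"\{"),
        ("RBRACE", r"\}"),
        ("OP", r"[-+*/=^_()]"),
        ("SKIP", r"\s+"),
        ("MISMATCH", r"."),
    ]
    return re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in token_spec))


def lex(tex, lexer):
    """Split a LaTeX math string into (kind, text) tokens, dropping whitespace."""
    tokens = []
    for match in lexer.finditer(tex):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens


counts = {r"\frac": 3, r"\alpha": 2, r"\beta": 1}
lexer = build_lexer_regex(counts, threshold=2)
print(lex(r"\frac{1}{2} + \alpha\beta", lexer))
```

Putting `COMMAND` ahead of `UNKNOWN_COMMAND` matters because Python's `re` tries alternatives left to right, so known commands win and rare ones fall through to the generic pattern.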
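For the parsing step, SymPy does ship an experimental `sympy.parsing.latex.parse_latex`, which by default relies on the ANTLR4 Python runtime. As a dependency-free illustration of the idea, the sketch below is a small hand-written recursive-descent parser for an assumed LaTeX subset (single letters, integers, `+`, implicit or `\cdot` multiplication, integer `^` powers, `\frac`, parentheses and braces). It emits a nested-tuple AST rather than SymPy objects; the grammar, the node names and the functions `tokenize`, `Parser` and `parse_equation` are assumptions made for this example.

```python
import re

# Tokens for an assumed LaTeX subset: \frac, \cdot, integers, single letters, punctuation
TOKEN_RE = re.compile(r"\s*(\\frac|\\cdot|\d+|[A-Za-z]|[{}()+^=])")


def tokenize(tex):
    """Split a LaTeX string into tokens, skipping whitespace."""
    tex = tex.strip()
    tokens, pos = [], 0
    while pos < len(tex):
        match = TOKEN_RE.match(tex, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}: {tex[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    r"""Assumed grammar:
    equation := expr "=" expr        expr  := term ("+" term)*
    term     := power (["\cdot"] power)*   (juxtaposition also multiplies)
    power    := atom ["^" atom]            (exponent must be an integer)
    atom     := NUMBER | LETTER | "(" expr ")" | "{" expr "}" | "\frac" "{" expr "}" "{" expr "}"
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.next() != token:
            raise SyntaxError(f"expected {token!r}")

    def equation(self):
        lhs = self.expr()
        self.expect("=")
        rhs = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return ("eq", lhs, rhs)

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.next()
            node = ("add", node, self.term())
        return node

    def term(self):
        node = self.power()
        while self.peek() == r"\cdot" or self.starts_atom(self.peek()):
            if self.peek() == r"\cdot":
                self.next()
            node = ("mul", node, self.power())
        return node

    @staticmethod
    def starts_atom(token):
        return token is not None and (token.isalnum() or token in ("(", "{", r"\frac"))

    def power(self):
        base = self.atom()
        if self.peek() != "^":
            return base
        self.next()
        exponent = self.atom()
        if exponent[0] != "num":
            raise SyntaxError("only integer exponents are supported in this sketch")
        return ("pow", base, exponent[1])

    def atom(self):
        token = self.next()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return ("num", int(token))
        if token.isalpha():
            return ("sym", token)
        if token == r"\frac":
            self.expect("{")
            numerator = self.expr()
            self.expect("}")
            self.expect("{")
            denominator = self.expr()
            self.expect("}")
            return ("div", numerator, denominator)
        if token in ("(", "{"):
            inner = self.expr()
            self.expect(")" if token == "(" else "}")
            return inner
        raise SyntaxError(f"unexpected token {token!r}")


def parse_equation(tex):
    return Parser(tokenize(tex)).equation()


print(parse_equation(r"x = v t + \frac{a t^2}{2}"))
```

Each grammar rule maps to one method, which keeps precedence explicit: `+` binds loosest, then multiplication, then `^`.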
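The dimensional check can be illustrated without SymPy by tracking each quantity's dimension as a vector of exponents over mass, length and time: multiplication adds exponents, division subtracts them, an integer power scales them, and addition is only allowed between terms of identical dimension. The sketch below uses that bookkeeping in place of SymPy's `sympy.physics.units` module; the (M, L, T) basis, the nested-tuple AST format and the function names are assumptions for illustration.

```python
# Dimension vectors are exponent tuples over an assumed (mass, length, time) basis.
DIMENSIONLESS = (0, 0, 0)


class DimensionError(ValueError):
    """Raised when two terms of different dimension are added."""


def dimension(expr, symbol_dims):
    """Return the dimension vector of a nested-tuple expression.

    Assumed node types: ("sym", name), ("num", value), ("add", a, b),
    ("mul", a, b), ("div", a, b), ("pow", base, integer_exponent).
    """
    kind = expr[0]
    if kind == "sym":
        return symbol_dims[expr[1]]
    if kind == "num":
        return DIMENSIONLESS
    if kind == "pow":
        # e.g. (T^1)^2 = T^2: scale every exponent by the integer power
        return tuple(e * expr[2] for e in dimension(expr[1], symbol_dims))
    if kind not in ("add", "mul", "div"):
        raise ValueError(f"unknown node type {kind!r}")
    left = dimension(expr[1], symbol_dims)
    right = dimension(expr[2], symbol_dims)
    if kind == "add":
        if left != right:
            raise DimensionError(f"cannot add {left} and {right}")
        return left
    if kind == "mul":
        return tuple(a + b for a, b in zip(left, right))
    return tuple(a - b for a, b in zip(left, right))  # "div"


def is_dimensionally_consistent(lhs, rhs, symbol_dims):
    """True when both sides are internally consistent and share one dimension."""
    try:
        return dimension(lhs, symbol_dims) == dimension(rhs, symbol_dims)
    except DimensionError:
        return False


dims = {"x": (0, 1, 0), "v": (0, 1, -1), "t": (0, 0, 1), "a": (0, 1, -2)}
# x = v t + a t^2 / 2
rhs = ("add", ("mul", ("sym", "v"), ("sym", "t")),
       ("div", ("mul", ("sym", "a"), ("pow", ("sym", "t"), 2)), ("num", 2)))
print(is_dimensionally_consistent(("sym", "x"), rhs, dims))  # True
print(is_dimensionally_consistent(("sym", "x"), ("add", ("sym", "v"), ("sym", "t")), dims))  # False
```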
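The last two items could be combined by treating each inference rule as a function from an input equation to an output equation, then asking SymPy whether a claimed output matches what the rule actually produces. The rules `add_to_both_sides` and `divide_both_sides` and the helper `step_is_valid` are hypothetical; `sympy.Eq`, `sympy.simplify` and `sympy.symbols` are standard SymPy functions.

```python
import sympy

# Hypothetical inference rules: each maps an input equation (plus a parameter)
# to the equation the rule says should follow. evaluate=False keeps Eq from
# collapsing to True/False when both sides happen to be identical.


def add_to_both_sides(equation, term):
    """From a = b infer a + term = b + term."""
    return sympy.Eq(equation.lhs + term, equation.rhs + term, evaluate=False)


def divide_both_sides(equation, term):
    """From a = b infer a / term = b / term (assumes term is nonzero)."""
    return sympy.Eq(equation.lhs / term, equation.rhs / term, evaluate=False)


def step_is_valid(rule, input_eq, claimed_output, *params):
    """Re-apply `rule` and check each side of the claimed output via simplify()."""
    expected = rule(input_eq, *params)
    lhs_ok = sympy.simplify(expected.lhs - claimed_output.lhs) == 0
    rhs_ok = sympy.simplify(expected.rhs - claimed_output.rhs) == 0
    return lhs_ok and rhs_ok


x, v, t = sympy.symbols("x v t")
start = sympy.Eq(x, v * t)
print(step_is_valid(divide_both_sides, start, sympy.Eq(x / t, v), t))  # True
print(step_is_valid(divide_both_sides, start, sympy.Eq(x / t, v * t), t))  # False
```

Comparing the two sides separately is deliberately strict: an output that is correct but rearranged (for example, with its sides swapped) is rejected, so a fuller checker would need to normalize equations before comparing them.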