
characterizing LaTeX content in arXiv.org .tex files

Published 2020-05-31T02:23:00Z by Physics Derivation Graph

This characterization step will be useful when comparing domains. For example, if we sample another domain (e.g., quantum mechanics), are the distributions similar or not? If we see the same characterization, then we can expect that the techniques developed on this sample are likely to apply to a novel corpus.
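One way to make "are the distributions similar or not?" concrete is to pick a feature, build a frequency distribution per domain, and compute a distance between the two. The sketch below is only an illustration under stated assumptions: the feature (LaTeX control words such as `\frac`), the function names, and the choice of Jensen-Shannon divergence are all hypothetical and not taken from the post. The regular expression is deliberately crude and ignores comments and verbatim blocks.

```python
import math
import re
from collections import Counter

# Assumption: the characterization is a frequency distribution of LaTeX
# control words (e.g. \frac, \alpha). The post does not name the feature.
COMMAND_PATTERN = re.compile(r"\\([A-Za-z]+)")


def command_distribution(tex_sources):
    """Return relative frequencies of control words across a list of .tex strings."""
    counts = Counter()
    for text in tex_sources:
        counts.update(COMMAND_PATTERN.findall(text))
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()} if total else {}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint support."""
    keys = set(p) | set(q)
    # Mixture distribution; every key with nonzero mass in p or q has m[k] > 0.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
```

A value near 0 would suggest the two domains use LaTeX in a similar way for the chosen feature; what counts as "near" is a threshold that has to be calibrated, for instance against the divergence between two random halves of the same domain.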

Establishing that the sample being used is representative means we can work with a smaller data set (rather than "all the .tex in arXiv"). If the shape of the distribution stops changing as more .tex files are added, that indicates the characterization has converged and the sample is large enough.
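The convergence idea can be checked by adding files in batches and measuring how much the cumulative distribution moves each time. The following is a minimal sketch, again assuming control-word counts as the feature; `convergence_trace`, `total_variation`, and the use of total variation distance are illustrative choices, not something the post prescribes.

```python
import re
from collections import Counter

# Assumed feature: LaTeX control words such as \frac or \alpha.
COMMAND_PATTERN = re.compile(r"\\([A-Za-z]+)")


def total_variation(counts_a, counts_b):
    """Total variation distance (0 to 1) between two count tables after normalizing."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        # Convention for this sketch: an empty table is maximally distant.
        return 1.0
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a[k] / total_a - counts_b[k] / total_b) for k in keys)


def convergence_trace(tex_sources, step=1):
    """Distance between the cumulative distribution before and after each batch."""
    running = Counter()
    trace = []
    for start in range(0, len(tex_sources), step):
        previous = running.copy()
        for text in tex_sources[start:start + step]:
            running.update(COMMAND_PATTERN.findall(text))
        trace.append(total_variation(previous, running))
    return trace
```

The first entry is 1.0 by the convention above, since there is no earlier distribution to compare against. A trace that settles toward 0 is consistent with the distribution shape no longer changing; shuffling the file order before running it avoids mistaking an ordering effect for convergence.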

If we find a domain that doesn't have a similar distribution, then we can investigate why it is anomalous.