Sequence of Prompts for LLM to convert Wikipedia or PDF content to Latex for Physics Derivation Graph
First published 2026-03-14
Convert derivation to Latex document
Example: Simple Harmonic Oscillator
The output of the sequence of prompts below is DERIVATION_FILE.tex.
Assuming the initial material is on Wikipedia, start with the following prompt.
Prompt for Initial conversion to Latex
You are an expert technical typesetter and Theoretical Physicist. Your task is to transform the provided text
(sourced from Wikipedia) into a formal, pedagogically structured LaTeX document. The audience isn't familiar with the math operations, so be pedantic. Make the logic as explicit as feasible even though that increases the length of the document. Explain each step since the audience lacks the experience and insight that you have.
Objective: Convert the mathematical physics derivation into a sequence of logical steps as a single Latex document. A derivation consists of a sequence of steps where each equation relates to others via specific mathematical operations
(e.g., substitution, differentiation, algebraic rearrangement).
Derivation Specifications:
- To help the reader understand what transformation is being applied, explicitly state the mathematical operations performed between equations (e.g., "Substituting Eq.~\ref{x} into Eq.~\ref{y} yields...").
- If the source text implies a step that is mathematically non-trivial (like equating coefficients or using a trigonometric identity), explicitly break that out into a sub-step with its own equation and label.
- In addition to explicitly stating the transformations being applied between equations, guide the reader through the derivation.
Style
- focus on the mathematical and physical veracity
- uses declarative statements
- guide the reader through a logical sequence of ideas to reach a conclusion.
- use Scientific Impersonal Style, also known as Technical Expository Prose. Use Mathematical Imperatives to direct the reader's attention. The Impersonal Imperative should be used.
- be methodical and explicit
- be Precise and Technical
Document Specifications:
- use the document class and preamble provided below
- Strict ASCII Encoding: The entire .tex file must be ASCII. Do not use Unicode characters. For Greek letters, operators, or special symbols, use standard LaTeX commands.
- Every mathematical equation must be placed in a numbered `equation` environment.
- Use `\label{...}` for every equation.
- Labeling Convention: Labels must be unique, descriptive of the equation's physical or mathematical role, and contain no spaces. Use only lowercase letters and underscores.
- Every equation should have a single relation (e.g., `=`) separating the left-hand side (LHS) from the right-hand side (RHS). If the source material uses short-hand of multiple relations to indicate a sequence of steps, break those into separate equations that each have a single relation.
Here is the starting point for the Latex file:
```
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
% margins of 1 inch:
\setlength{\topmargin}{-.5in}
\setlength{\textheight}{9in}
\setlength{\oddsidemargin}{0in}
\setlength{\textwidth}{6.5in}
\usepackage[pdftex]{hyperref} % hyperlink equation and bibliographic citations
\author{Ben Payne, with Gemini 3 Flash}
\title{DERIVATION NAME HERE}
\begin{document}
\maketitle
\begin{abstract}
DESCRIPTION OF DERIVATION HERE
\end{abstract}
DERIVATION STEPS HERE
\end{document}
```
And here is the Wikipedia text to be converted:
```
```
For an example use of the above prompt, see this comment.
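The document specifications in the prompt can be spot-checked mechanically before moving on to the follow-up prompts. The following is a minimal sketch and not part of the original workflow: the function name `check_derivation_tex` is hypothetical, and the single-relation check only counts `=` signs, so it would miss chained inequalities.

```python
import re

EQUATION_RE = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
VALID_LABEL_RE = re.compile(r"[a-z_]+")


def check_derivation_tex(tex: str) -> list[str]:
    """Return a list of human-readable problems found in the LaTeX source."""
    problems = []
    # Strict ASCII: report only the first offending character.
    for position, char in enumerate(tex):
        if ord(char) > 127:
            problems.append(f"non-ASCII character {char!r} at offset {position}")
            break
    seen_labels = set()
    for number, body in enumerate(EQUATION_RE.findall(tex), start=1):
        labels = LABEL_RE.findall(body)
        if len(labels) != 1:
            problems.append(f"equation {number}: expected one label, found {len(labels)}")
        for label in labels:
            if not VALID_LABEL_RE.fullmatch(label):
                problems.append(f"equation {number}: label {label!r} is not lowercase letters and underscores")
            if label in seen_labels:
                problems.append(f"equation {number}: duplicate label {label!r}")
            seen_labels.add(label)
        # Count '=' outside the label; more than one suggests a chained relation.
        if LABEL_RE.sub("", body).count("=") > 1:
            problems.append(f"equation {number}: more than one '=' sign")
    return problems


# Illustrative input: the second equation breaks two of the rules.
sample = r"""
\begin{equation}
F = m a
\label{newton_second_law}
\end{equation}
\begin{equation}
F = -k x = m a
\label{Hooke}
\end{equation}
"""
for problem in check_derivation_tex(sample):
    print(problem)
```

Equations flagged for more than one `=` sign are candidates for the optional splitting prompt further down.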
Prompt for more steps
A derivation in mathematical Physics consists of a sequence of steps. Each step relates mathematical equations.
Every equation should have a label that is descriptive.
Sometimes a previous expression is referenced implicitly. Edit the text to include references to labels where appropriate. Reference the relevant equation labels so that Latex can compile the document.
Modify the Latex content by adding explanatory text about each step of the derivation. Document the mathematical transformations that relate each equation in the file to other equations in the file.
Optional (as needed):
(optional) Prompt for Splitting grouped expressions
A derivation in mathematical Physics consists of a sequence of steps. Each step relates mathematical equations.
In the Latex file below, for equations that have multiple instances of the equals sign, separate those into multiple equations such that each equation only has one use of the equals sign.
Write out the modified content as a single Latex file.
(optional) Prompt for more explanations
A derivation in mathematical Physics consists of a sequence of steps. Each step relates mathematical equations.
In the Latex file below, where there are two equations with no explanatory text between them, modify the Latex file by adding explanatory text to document the mathematical transformations that relate each equation in the file to other equations in the file.
Use the Latex equation labels to reference equations.
Occasionally manual review identifies gaps in explanation. Then additional prompting might be necessary. For example,
(optional) Prompt for more details
In the following text, which trigonometric identities were used? Explain the missing steps.
(optional) Prompt for more details
what does this latex expression mean by `arg`?
```
\varphi = \arg(c_1 + c_2i)
```
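For reference, `arg` denotes the argument (phase angle) of a complex number, which equals the two-argument arctangent of the imaginary and real parts. A small numerical illustration, with values of `c_1` and `c_2` chosen arbitrarily rather than taken from the derivation:

```python
import cmath
import math

c_1, c_2 = 1.0, 1.0  # illustrative values only

# arg(c_1 + c_2 i), computed directly as the phase of the complex number
varphi = cmath.phase(complex(c_1, c_2))

# The same angle from the two-argument arctangent (imaginary part first).
assert math.isclose(varphi, math.atan2(c_2, c_1))
print(varphi)
```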
Now DERIVATION_FILE.tex should be as complete and as detailed as possible.
There are four actions to take with this .tex file:
Convert DERIVATION_FILE.tex to symbols.json
Example: symbols for Simple Harmonic Oscillator
Prompt for list of symbols
The latex file contains equations and symbols. Provide a list, formatted as JSON, of every unique symbol and a description of that symbol.
For each entry in the JSON list include a list of references to the labeled equations where each symbol is used.
For each symbol categorize the symbol as scalar, vector, or tensor (which includes matrices).
For each scalar symbol, categorize the symbol as variable or a constant.
For each scalar symbol, categorize the values the scalar can take as "real", "complex", or "integer"
For each scalar symbol, categorize the scalar as "positive", "negative", "non-negative", "non-positive", or "any"
For each scalar symbol there are 7 dimensionality measurements: mass, time, length, temperature, electric charge, amount of substance, luminous intensity.
If the scalar is dimensionless, then the value for each of the 7 measures is zero.
If the scalar has non-zero dimensions, explicitly state the integer value of the dimensions.
Write out just the JSON list as your answer.
The output should comply with this schema:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Physics Symbol Definition Set",
"type": "array",
"items": {
"type": "object",
"required": [
"variable",
"description",
"references",
"category_type",
"scalar_type",
"value_type",
"sign_type",
"dimensionality"
],
"properties": {
"variable": {
"type": "string",
"description": "The LaTeX representation of the physical variable."
},
"description": {
"type": "string",
"description": "A human-readable explanation of the variable."
},
"references": {
"type": "array",
"items": { "type": "string" },
"description": "List of internal IDs or equation slugs where this variable is used."
},
"category_type": {
"type": "string",
"enum": ["scalar", "vector", "tensor"],
"description": "The mathematical nature of the quantity."
},
"scalar_type": {
"type": "string",
"enum": ["variable", "constant"],
"description": "Whether the value changes within the system context."
},
"value_type": {
"type": "string",
"enum": ["real", "complex", "integer"],
"description": "the numerical set the value belongs to."
},
"sign_type": {
"type": "string",
"enum": ["any", "positive", "negative", "non-negative", "non-positive"],
"description": "The physical constraints on the sign of the value."
},
"dimensionality": {
"type": "object",
"description": "The SI base dimensions exponents.",
"additionalProperties": false,
"required": [
"mass",
"time",
"length",
"temperature",
"electric_charge",
"amount_of_substance",
"luminous_intensity"
],
"properties": {
"mass": { "type": "integer" },
"time": { "type": "integer" },
"length": { "type": "integer" },
"temperature": { "type": "integer" },
"electric_charge": { "type": "integer" },
"amount_of_substance": { "type": "integer" },
"luminous_intensity": { "type": "integer" }
}
}
}
}
}
```
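As an illustration of what a conforming entry might look like, the sketch below builds one hypothetical symbol (angular frequency, with dimension of inverse time) and checks it with the standard library only. The label `angular_frequency_definition` and the helper `check_symbol_entry` are assumptions made for this example; the check covers the required keys and the dimensionality object, not the full schema.

```python
import json

REQUIRED_KEYS = {
    "variable", "description", "references", "category_type",
    "scalar_type", "value_type", "sign_type", "dimensionality",
}
DIMENSIONS = (
    "mass", "time", "length", "temperature",
    "electric_charge", "amount_of_substance", "luminous_intensity",
)


def check_symbol_entry(entry: dict) -> list[str]:
    """Return problems with required keys and dimensionality exponents."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    dimensionality = entry.get("dimensionality", {})
    for name in DIMENSIONS:
        if not isinstance(dimensionality.get(name), int):
            problems.append(f"dimension {name!r} must be an integer")
    extra = set(dimensionality) - set(DIMENSIONS)
    if extra:
        problems.append(f"unexpected dimensions: {sorted(extra)}")
    return problems


# Hypothetical entry: angular frequency has dimension 1/time.
omega = {
    "variable": "\\omega",
    "description": "angular frequency of the oscillator",
    "references": ["angular_frequency_definition"],
    "category_type": "scalar",
    "scalar_type": "constant",
    "value_type": "real",
    "sign_type": "positive",
    "dimensionality": {
        "mass": 0, "time": -1, "length": 0, "temperature": 0,
        "electric_charge": 0, "amount_of_substance": 0, "luminous_intensity": 0,
    },
}
print(json.dumps(omega, indent=2))
print(check_symbol_entry(omega))
```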
To get scalar symbols in PDG use
curl --silent --insecure https://localhost/api/v1/resources/symbol/scalars | python3 -c "
import sys, json
data = json.load(sys.stdin)
data.pop('_links', None)
if '_embedded' in data and 'scalar_symbols' in data['_embedded']:
    for entry in data['_embedded']['scalar_symbols']:
        entry.pop('_links', None)
        entry.pop('author_name_latex', None)
print(json.dumps(data, indent=2))
"
Prompt for scalar symbol comparison: missing and matching
I have two input data sets:
- a set of symbols found in the derivation of the DERIVATION_NAME_HERE.
- a set of symbols in a database for physics derivations. Each symbol has a unique ID number.
The task is to create two JSON files: one for matches where symbols are in both inputs, and another JSON file for symbols in the DERIVATION_NAME_HERE derivation that are not in the physics derivations database. All symbols in the DERIVATION_NAME_HERE derivation should end up in one of the two JSON output files.
For the JSON that captures the matches (`matches.json`), for each match indicate
- the symbol Latex
- the ID from the database
- the list of equation labels from the derivation
- the description
- explanation of why the symbol is a match
- confidence level of the match
Here is the schema for `matches.json`:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Physics Symbol Matches Schema",
"type": "array",
"items": {
"type": "object",
"required": [
"symbol_latex",
"db_id",
"description",
"explanation",
"confidence"
],
"properties": {
"symbol_latex": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation of the symbol from the derivation."
},
"db_id": {
"type": "string",
"pattern": "^[0-9]+$",
"minLength": 8,
"description": "The unique numerical ID from the physics database."
},
"equation_labels": {
"type": "array",
"items": { "type": "string" },
"description": "equations in the derivation that use the symbol"
},
"description": {
"type": "string",
"minLength": 1,
"description": "A description of the symbol's role in the derivation."
},
"explanation": {
"type": "string",
"minLength": 1,
"description": "Justification for why this derivation symbol matches the database entry."
},
"confidence": {
"type": "string",
"enum": ["high", "medium", "low"],
"description": "The certainty level of the match."
}
}
}
}
```
For the JSON that captures the missing symbols (`missing.json`), for each symbol indicate
- the symbol Latex
- the description
- comments about symbols in the database that might be considered adjacent and what the ID number is
Here is the schema for `missing.json`:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Missing Physics Symbols Schema",
"type": "array",
"items": {
"type": "object",
"required": [
"symbol_latex",
"description",
"comments"
],
"properties": {
"symbol_latex": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation of the symbol used in the derivation."
},
"description": {
"type": "string",
"minLength": 1,
"description": "A description of what the symbol represents in the derivation."
},
"comments": {
"type": "string",
"minLength": 1,
"description": "Notes regarding nearby symbols in the database or why a match was omitted."
}
}
}
}
```
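The requirement that every derivation symbol ends up in one of the two output files can be verified after the LLM responds. A minimal sketch, assuming the three JSON files have already been loaded into Python lists; the function name `check_partition` and the sample values (including the `db_id`) are hypothetical.

```python
def check_partition(derivation_symbols, matches, missing):
    """Return (unassigned, in_both) lists of symbol LaTeX strings."""
    # symbols.json uses the key "variable"; the two outputs use "symbol_latex".
    expected = {entry["variable"] for entry in derivation_symbols}
    matched = {entry["symbol_latex"] for entry in matches}
    unmatched = {entry["symbol_latex"] for entry in missing}
    unassigned = sorted(expected - (matched | unmatched))
    in_both = sorted(matched & unmatched)
    return unassigned, in_both


# Illustrative data only; real entries carry all the fields the schemas require.
symbols = [{"variable": "m"}, {"variable": "k"}, {"variable": "\\omega"}]
matches = [{"symbol_latex": "m", "db_id": "00000001"}]
missing = [{"symbol_latex": "k"}]
print(check_partition(symbols, matches, missing))
```

In this sample `\omega` is reported as unassigned, which would be a reason to re-run the comparison prompt.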
(Manually) input symbols that didn't match using web UI to generate new IDs; then re-run search for matches.
TODO: Vectors - extract from derivation
TODO: Vectors: compare extracted to PDG DB
TODO: matrices - extract from derivation
TODO: matrices: compare extracted to PDG DB
Convert DERIVATION_FILE.tex to operations.json
Example: operations in Simple Harmonic Oscillator
Prompt for tex to operations.json
An operator exists on the left-hand side (LHS) or right-hand side (RHS) of a mathematical expression.
An operator acts on variables and constants.
Examples include addition (`+`), subtraction (`-`), multiplication, division, integration, summation, trig functions.
For each expression identify operations used.
Write the result as a single JSON file with the following entries per operation:
- latex for operator. (Can be null if the operator is implicit)
- name of operator as text
- number of arguments for the operation
- equation labels in which the operation is used
The output should comply with this schema:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Mathematical Operators List",
"type": "array",
"items": {
"type": "object",
"required": ["latex", "name", "number_of_arguments", "labels"],
"properties": {
"latex": {
"description": "The LaTeX representation of the operator. Can be null for implicit operations.",
"type": ["string", "null"]
},
"name": {
"description": "The human-readable name of the operation.",
"type": "string"
},
"number_of_arguments": {
"description": "The number of arguments the operator takes.",
"type": "integer",
"minimum": 0
},
"labels": {
"description": "A list of equation identifiers where this operator is used.",
"type": "array",
"items": {
"type": "string"
}
}
},
"additionalProperties": false
}
}
```
Output is operations.json
To get operation symbols in PDG use
curl --silent --insecure https://localhost/api/v1/resources/symbol/operations | python3 -c "
import sys, json
data = json.load(sys.stdin)
data.pop('_links', None)
if '_embedded' in data and 'operation_symbols' in data['_embedded']:
    for entry in data['_embedded']['operation_symbols']:
        entry.pop('_links', None)
        entry.pop('author_name_latex', None)
print(json.dumps(data, indent=2))
"
Prompt for comparing operations in PDG DB
I have two input data sets:
- a set of operations found in the derivation of the DERIVATION_NAME_HERE. Each operation is associated with one or more labels.
- a set of operations in a database for physics derivations. Each operation has a unique ID number.
The task is to create two JSON files: one for matches where operations are in both inputs, and another JSON file for operations in the DERIVATION_NAME_HERE derivation that are not in the physics derivations database. All operations in the DERIVATION_NAME_HERE derivation should end up in one of the two JSON output files.
For the JSON that captures the matches (`matches.json`), for each match indicate
- the operation Latex
- the ID from the database
- the list of equation labels from the derivation
- the description
- explanation of why the operation is a match
- confidence level of the match
Here is the schema for `matches.json`:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Physics operation Matches Schema",
"type": "array",
"items": {
"type": "object",
"required": [
"operation_latex",
"db_id",
"description",
"explanation",
"confidence"
],
"properties": {
"operation_latex": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation of the operation from the derivation."
},
"db_id": {
"type": "string",
"pattern": "^[0-9]+$",
"minLength": 8,
"description": "The unique numerical ID from the physics database."
},
"equation_labels": {
"type": "array",
"items": { "type": "string" },
"description": "equations in the derivation that use the operation"
},
"description": {
"type": "string",
"minLength": 1,
"description": "A description of the operation's role in the derivation."
},
"explanation": {
"type": "string",
"minLength": 1,
"description": "Justification for why this derivation operation matches the database entry."
},
"confidence": {
"type": "string",
"enum": ["high", "medium", "low"],
"description": "The certainty level of the match."
}
}
}
}
```
For the JSON that captures the missing operations (`missing.json`), for each operation indicate
- the operation Latex
- the description
- comments about operations in the database that might be considered adjacent and what the ID number is
Here is the schema for `missing.json`:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Missing Physics operations Schema",
"type": "array",
"items": {
"type": "object",
"required": [
"operation_latex",
"description",
"comments"
],
"properties": {
"operation_latex": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation of the operation used in the derivation."
},
"description": {
"type": "string",
"minLength": 1,
"description": "A description of what the operation represents in the derivation."
},
"comments": {
"type": "string",
"minLength": 1,
"description": "Notes regarding nearby operations in the database or why a match was omitted."
}
}
}
}
```
Convert DERIVATION_FILE.tex to expressions.json
Example: expressions in Simple Harmonic Oscillator
Prompt for expressions.json: split LHS, RHS; add Sympy
Identify each expression in this document.
For each expression, identify the left-hand side (LHS), right-hand side (RHS), and the relation (e.g., `=`).
Sometimes expressions have conditional applicability, like `v << c`. If there is a condition on the expression, determine what that is.
Write out the LHS, relation, RHS, a condition if applicable, and the expression's label as JSON.
For each LHS and RHS in this JSON, what would the SymPy representation be?
Add the SymPy for each side of every expression to the JSON.
The output should comply with this schema:
```
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "EquationSet",
"type": "array",
"items": {
"type": "object",
"required": [
"LHS",
"relation",
"RHS",
"condition",
"label",
"sympy_LHS",
"sympy_RHS"
],
"properties": {
"LHS": {
"type": "string",
"minLength": 1,
"description": "The Left Hand Side of the equation in LaTeX."
},
"relation": {
"type": "string",
"minLength": 1,
"description": "The mathematical operator connecting LHS and RHS (usually '=')."
},
"RHS": {
"type": "string",
"minLength": 1,
"description": "The Right Hand Side of the equation in LaTeX."
},
"condition": {
"type": ["string", "null"],
"description": "Specific constraints or values (e.g., t=0) under which the equation holds."
},
"label": {
"type": "string",
"minLength": 1,
"description": "A unique identifier or name for the physics/math law."
},
"sympy_LHS": {
"type": "string",
"minLength": 1,
"description": "SymPy-compatible string representation of the LHS."
},
"sympy_RHS": {
"type": "string",
"minLength": 1,
"description": "SymPy-compatible string representation of the RHS."
}
},
"additionalProperties": false
}
}
```
To validate output expressions.json against the schema,
import json
from jsonschema import validate
schema = { ... } # The schema
data = [ ... ] # JSON data
# This will raise a ValidationError if the data is inconsistent
validate(instance=data, schema=schema)
print("JSON is valid!")
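Schema validation confirms the structure of each entry but says nothing about whether the SymPy strings are well formed. A cheap first check, sketched below with the standard library only, is to confirm each string parses as a Python expression; this does not prove SymPy will accept it. The entry and the helper name `sympy_strings_parse` are hypothetical.

```python
import ast

# Hypothetical expressions.json entry for the oscillator's equation of motion.
entry = {
    "LHS": "m \\frac{d^2 x}{dt^2}",
    "relation": "=",
    "RHS": "-k x",
    "condition": None,
    "label": "equation_of_motion",
    "sympy_LHS": "m*Derivative(x(t), (t, 2))",
    "sympy_RHS": "-k*x(t)",
}


def sympy_strings_parse(entry: dict) -> bool:
    """Return True if both SymPy strings are syntactically valid Python."""
    for key in ("sympy_LHS", "sympy_RHS"):
        try:
            ast.parse(entry[key], mode="eval")
        except SyntaxError:
            return False
    return True


print(sympy_strings_parse(entry))
```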
To get expressions in PDG use
curl --silent --insecure https://localhost/api/v1/resources/expressions | python3 -c "
import sys, json
data = json.load(sys.stdin)
data.pop('_links', None)
if '_embedded' in data and 'expressions' in data['_embedded']:
    for entry in data['_embedded']['expressions']:
        entry.pop('_links', None)
        entry.pop('author_name_latex', None)
print(json.dumps(data, indent=2))
"
Prompt for comparison of expressions with PDG: matches and missing
I have two input data sets:
- a set of expressions used in the derivation of the DERIVATION_NAME_HERE. Each expression has a unique label.
- a set of expressions in a database for physics derivations. Each expression has a unique ID number.
The task is to create two JSON files: one for matches where expressions are in both input JSON files, and another JSON file for expressions in the DERIVATION_NAME_HERE derivation that are not in the physics derivations database. All expressions in the DERIVATION_NAME_HERE derivation should end up in one of the two JSON output files.
For the JSON that captures the matches (`matches.json`), for each matching expression indicate
- label for Latex expression from derivation
- Latex expression from derivation
- Latex expression from database
- database's ID number for the expression
- comments about the reason for the expected similarity
- degree of confidence
The `matches.json` should comply with this schema:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"items": {
"type": "object",
"required": [
"label",
"original_latex",
"latex_from_database",
"id",
"comments",
"confidence_level"
],
"properties": {
"label": {
"type": "string",
"minLength": 1,
"description": "A unique name for the equation."
},
"original_latex": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation of the equation provided by the user."
},
"latex_from_database": {
"type": "string",
"minLength": 1,
"description": "The LaTeX representation retrieved from the database."
},
"id": {
"type": "string",
"pattern": "^[0-9]+$",
"minLength": 8,
"description": "A numeric ID represented as a string."
},
"comments": {
"type": "string",
"description": "A textual description or metadata about the equation."
},
"confidence_level": {
"type": "string",
"description": "self-assessment"
}
},
"additionalProperties": false
}
}
```
For the JSON that captures the missing expressions (`missing.json`), for each expression indicate
- latex for the expression
- label of the equation
The `missing.json` should comply with this schema:
```
```
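The schema for the missing expressions is left empty above. One possible shape, based only on the two fields listed in the prompt, is sketched below as a Python dictionary so it can be printed as JSON and exercised with a small standard-library check. The field names `label` and `latex`, the schema title, and the helper `conforms` are assumptions, not the author's schema.

```python
import json

# Hypothetical schema; modeled on the draft-07 schemas used elsewhere on this page.
MISSING_EXPRESSIONS_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Missing Physics Expressions Schema",
    "type": "array",
    "items": {
        "type": "object",
        "required": ["label", "latex"],
        "properties": {
            "label": {"type": "string", "minLength": 1},
            "latex": {"type": "string", "minLength": 1},
        },
        "additionalProperties": False,
    },
}


def conforms(entries) -> bool:
    """Check entries against the hypothetical schema without third-party packages."""
    required = set(MISSING_EXPRESSIONS_SCHEMA["items"]["required"])
    return isinstance(entries, list) and all(
        isinstance(entry, dict)
        and set(entry) == required
        and all(isinstance(entry[key], str) and len(entry[key]) > 0 for key in required)
        for entry in entries
    )


example = [{"label": "equation_of_motion", "latex": "m \\ddot{x} = -k x"}]
print(json.dumps(MISSING_EXPRESSIONS_SCHEMA, indent=2))
print(conforms(example))
```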
Output is expressions.json
(Manually) input expressions that didn't match using web UI to generate new IDs; then re-run search for matches.
TODO: associate symbol IDs with each expression
Prompt for associating symbols with expressions
TODO: associate operation IDs with each expression
Prompt for associating operators with expressions
TODO: for each expression rewrite SymPy using PDG IDs
Prompt for rewriting expression SymPy using PDG IDs
Convert DERIVATION_FILE.tex to steps.json
Example: steps in Simple Harmonic Oscillator
Prompt for steps.json
Create JSON describing each step in the derivation. Specify
- input expression(s),
- output expression(s)
- transform being applied that relates the inputs and outputs
The first time an expression is used in the JSON show both the latex and the equation label. When later steps use that same expression show just the equation label.
The output should comply with this schema:
```
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Mathematical Derivation Steps",
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"step",
"input_expressions",
"output_expressions",
"transform_relation"
],
"properties": {
"step": {
"type": "integer",
"description": "The sequential number of the derivation step.",
"minimum": 1
},
"input_expressions": {
"type": "array",
"description": "A list of prerequisite expressions or labels used in this step.",
"items": {
"type": "string"
}
},
"output_expressions": {
"type": "array",
"description": "A list of expressions resulting from this step, often including a label.",
"items": {
"type": "string"
}
},
"transform_relation": {
"type": "string",
"minLength": 1,
"description": "A description of the mathematical operation or logical transformation performed."
}
}
}
}
```
Output: steps.json
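A hypothetical steps.json for the start of the Simple Harmonic Oscillator derivation might look like the list below. The labels, the "latex (label)" string format for first use, and both helper functions are assumptions for illustration; the schema itself does not prescribe how a label is embedded in the string.

```python
steps = [
    {
        "step": 1,
        "input_expressions": [],
        "output_expressions": ["F = m a (newton_second_law)"],
        "transform_relation": "declare initial expression",
    },
    {
        "step": 2,
        "input_expressions": [],
        "output_expressions": ["F = -k x (hookes_law)"],
        "transform_relation": "declare initial expression",
    },
    {
        "step": 3,
        "input_expressions": ["newton_second_law", "hookes_law"],
        "output_expressions": ["m a = -k x (equation_of_motion)"],
        "transform_relation": "substitute hookes_law into newton_second_law to eliminate F",
    },
]


def step_numbers_are_sequential(steps) -> bool:
    """True if the step numbers run 1, 2, 3, ... in order."""
    return [step["step"] for step in steps] == list(range(1, len(steps) + 1))


def steps_without_outputs(steps) -> list[int]:
    """Step numbers whose output list is empty, which likely signals an omission."""
    return [step["step"] for step in steps if not step["output_expressions"]]


print(step_numbers_are_sequential(steps))
print(steps_without_outputs(steps))
```

Checks like these catch skipped or duplicated steps before the labels are replaced with PDG IDs.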
TODO: replace labels with PDG IDs for expressions.