Benchmarks

This directory collects benchmark results in a form that is comparable across machines, runs, and engines. Each file documents one benchmark: what was run, on what hardware, with what config, and how to reproduce it.

Per the runtime-boundaries doc, every result carries an engine tag:

  • mlx_atomistic — the project’s MLX/Metal runtime (product output)
  • openmm-reference — OpenMM, used as a reference ceiling, not a product path
  • lammps-reference — LAMMPS, used as a reference for GPU/OpenCL semantics
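
As a minimal sketch of how a script could validate these tags, the helper below accepts a single tag or a slash-joined combination such as `mlx_atomistic/openmm-reference`. The function name `split_engine_tags` is hypothetical and is not part of the project's API.

```python
# The three engine tags listed above. Everything else here is illustrative.
ENGINE_TAGS = frozenset({"mlx_atomistic", "openmm-reference", "lammps-reference"})


def split_engine_tags(field: str) -> list:
    """Split a slash-joined engine field and reject any unknown tag."""
    tags = [part.strip() for part in field.split("/")]
    unknown = [tag for tag in tags if tag not in ENGINE_TAGS]
    if unknown:
        raise ValueError(f"unknown engine tag(s): {unknown}")
    return tags
```

Rejecting unknown tags early keeps a typo in a result file from silently creating a fourth engine category.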

Result filenames lead with the engine name, followed by the platform and the system or host, e.g. openmm-opencl-apoa1.md.

Each result file should answer, in order:

  1. Result table — ns/day (and any other primary metric) for each test, with one column per platform variant if applicable.
  2. Provenance — engine version, device, host, date, commit if relevant.
  3. Config — timestep, cutoff, constraints, precision, ensemble. Match OpenMM’s public benchmark config when comparing against openmm.org/benchmarks.
  4. Reproducer — exact shell command that regenerates the JSON, plus the path to the raw JSON output (kept under gitignored results/).
  5. External comparison — links to public reference numbers, with the same config caveats called out.
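
The ordering rule above can be checked mechanically. The sketch below assumes that each result file mentions the five section labels verbatim. That is an assumption about the files, not something this README guarantees, and the function name is hypothetical.

```python
# Section labels taken from the numbered list above, in the required order.
SECTIONS = ["Result table", "Provenance", "Config", "Reproducer", "External comparison"]


def sections_in_order(text: str) -> bool:
    """Return True if every label occurs, each one after the previous label."""
    lowered = text.lower()
    position = 0
    for name in SECTIONS:
        index = lowered.find(name.lower(), position)
        if index < 0:
            return False
        # Continue searching after this label so order is enforced.
        position = index + len(name)
    return True
```

This is a plain substring scan, so it only flags missing or out-of-order labels. It does not verify that each section has meaningful content.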

| File | Engine | System | Platform | Host |
| --- | --- | --- | --- | --- |
| inventory-gap-matrix.md | mlx_atomistic | benchmark inventory and Phase 3 gaps | N/A | N/A |
| benchmark-ladder.md | mlx_atomistic/openmm-reference/lammps-reference | benchmark ladder and row decision value | Metal/OpenCL where available | local |
| same-workload-comparison-matrix.md | mlx_atomistic/openmm-reference | planned same-workload comparison pairs | Metal/OpenCL where available | local |
| same-workload-openmm-comparison.md | mlx_atomistic/openmm-reference | refreshed controlled same-workload comparison report | Metal/OpenCL where available | local |
| same-workload-dhfr-stretch.md | mlx_atomistic/openmm-reference | DHFR stretch status | Metal/OpenCL where available | local |
| performance-audit-baseline.md | mlx_atomistic | fast baseline audit and ranked backlog | Metal/OpenCL where available | local |
| m5max-reference-engines.md | openmm-reference/lammps-reference | M5 Max reference-engine manifest overview | OpenCL | Apple M5 Max |
| openmm-opencl-dhfr.md | openmm-reference | DHFR (23k atoms) | OpenCL | Apple M5 Max |
| openmm-opencl-apoa1.md | openmm-reference | ApoA1 (92k atoms) | OpenCL | Apple M5 Max |
| openmm-opencl-amber20.md | openmm-reference | Cellulose (409k) + STMV (1.07M atoms) | OpenCL | Apple M5 Max |
| lammps-opencl-m5max.md | lammps-reference | official LAMMPS five-case benchmark set | OpenCL | Apple M5 Max |

The inventory appears first. Result files are ordered by system size, smallest first, so the scaling story reads top-to-bottom.

Fast developer commands are routine local checks. They must not require OpenMM, LAMMPS, OpenCL, large downloaded fixtures, or committed raw outputs.

| Command | Engine | Tier | Output |
| --- | --- | --- | --- |
| uv run pytest tests/test_benchmarks.py -q | mlx_atomistic | fast developer | pytest stdout; temporary test files only |
| uv run python -m mlx_atomistic.benchmarks.mm_force_terms --evaluations 1 --particles 16 --json | mlx_atomistic | fast developer | normalized JSON on stdout |
| uv run python -m mlx_atomistic.benchmarks.md_acceleration --sizes 16 --evaluations 1 --json | mlx_atomistic | fast developer | normalized JSON on stdout |
| uv run python -m mlx_atomistic.benchmarks.md_performance --sizes 32 --steps 1 --sample-interval 1 --diagnostic-interval 1 --evaluation-interval 1 --json | mlx_atomistic | fast developer | normalized JSON on stdout |
| uv run python -m mlx_atomistic.benchmarks.pme_performance --fixture-dir results/missing-pme-fixture --iterations 1 --warmups 0 --json | mlx_atomistic | fast developer blocked-path smoke | normalized blocked JSON on stdout |
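
Because these commands print JSON to stdout, a wrapper can consume them without touching the filesystem. The sketch below assumes that `--json` makes a command print exactly one JSON document on stdout and nothing else. The helper name `run_json_command` is hypothetical, and the demo uses a stand-in command rather than a real benchmark so it runs without the project installed.

```python
import json
import subprocess
import sys


def run_json_command(argv):
    """Run a command and parse its entire stdout as one JSON document."""
    # check=True raises CalledProcessError on a non-zero exit status.
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Stand-in for a real benchmark invocation; the payload is made up.
    demo = [sys.executable, "-c", "import json; print(json.dumps({'demo': True}))"]
    print(run_json_command(demo))
```

For a real run, `argv` would be one of the commands above split into a list, for example starting with `["uv", "run", "pytest", ...]`.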

Opt-in performance commands are non-CI and non-routine. They may need Apple Silicon/Metal, prepared fixtures, OpenMM/LAMMPS from the dev group, OpenCL, or downloaded inputs. Raw JSON/CSV belongs under gitignored results/; committed Markdown summaries should cite those raw paths and reproduction commands.

| Command | Engine | Tier | Output |
| --- | --- | --- | --- |
| uv run python -m mlx_atomistic.benchmarks.md_performance --include-large --steps 100 --json > results/mlx-md-performance.json | mlx_atomistic | opt-in performance | raw JSON under results/ |
| uv run python -m mlx_atomistic.benchmarks.md_acceleration --include-large --evaluations 10 --json > results/mlx-md-acceleration.json | mlx_atomistic | opt-in performance | raw JSON under results/ |
| uv run python -m mlx_atomistic.benchmarks.pme_performance --out-dir results/pme-performance --json | mlx_atomistic | opt-in performance | raw JSON under results/pme-performance/ |
| uv run python -m mlx_atomistic.benchmarks.dhfr --case dhfr-implicit --steps 1 --json > results/same-workload-openmm-comparison/mlx-dhfr-implicit.json | mlx_atomistic | opt-in runnable stretch smoke | normalized runnable JSON under results/ |
| uv run python -m mlx_atomistic.benchmarks.dhfr --case dhfr-explicit-pme --steps 1 --json > results/same-workload-openmm-comparison/mlx-dhfr-explicit-pme.json | mlx_atomistic | opt-in blocked-path stretch | normalized blocked JSON under results/ until PME neutrality policy is resolved |
| uv run python scripts/benchmark_openmm_opencl.py --platform OpenCL --particles 4096 --steps 1000 --json --csv results/openmm-opencl-synthetic.csv > results/openmm-opencl-synthetic.json | openmm-reference | opt-in reference | raw JSON/CSV under results/ |
| uv run python scripts/benchmark_openmm_dhfr.py --case dhfr-implicit --platform Reference --steps 1 --json > results/same-workload-openmm-comparison/openmm-dhfr-implicit.json | openmm-reference | opt-in reference shape check | raw JSON under results/ |
| uv run python scripts/benchmark_openmm_dhfr.py --case dhfr-explicit-pme --platform Reference --steps 1 --json > results/same-workload-openmm-comparison/openmm-dhfr-explicit-pme.json | openmm-reference | opt-in reference shape check | raw JSON under results/ |
| uv run python scripts/benchmark_openmm_opencl.py --platform DefinitelyMissing --particles 16 --steps 1 --json | openmm-reference | fast blocked-path smoke | normalized blocked JSON on stdout |
| uv run python scripts/benchmark_lammps_opencl.py --particles 16 --steps 1 --json | lammps-reference | opt-in reference / blocked-path smoke | normalized JSON or blocked JSON on stdout |
| uv run python scripts/benchmark_m5max_reference.py environment --json | openmm-reference/lammps-reference | reference environment probe | normalized JSON on stdout |
| uv run python scripts/benchmark_m5max_reference.py openmm --dry-run --json | openmm-reference | opt-in reference command plan | raw path plan under results/m5max-reference/openmm/ |
| uv run python scripts/benchmark_m5max_reference.py lammps --classify-only --json | lammps-reference | opt-in official case classification | normalized diagnostic JSON on stdout |
| uv run python scripts/benchmark_m5max_reference.py run --seconds 30 --json | openmm-reference/lammps-reference | host-only reference benchmark suite | raw manifest under results/m5max-reference/ |
| uv run python scripts/benchmark_m5max_reference.py validate --manifest results/m5max-reference/manifest.json --json | openmm-reference/lammps-reference | reference manifest validation | validation JSON on stdout |
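
Several of the commands above redirect stdout into a subdirectory of results/. A shell redirect fails if that directory does not exist yet. The sketch below shows one way a wrapper script might write raw output safely; the helper name `write_raw_output` is hypothetical, and only the `results/` root comes from this README.

```python
from pathlib import Path

RESULTS_DIR = Path("results")  # gitignored raw-output root named in this README


def write_raw_output(relative_name, payload, root=RESULTS_DIR):
    """Write raw benchmark text below the results root, creating parent directories."""
    resolved_root = Path(root).resolve()
    target = (resolved_root / relative_name).resolve()
    # Refuse paths such as "../summary.md" that would land outside the root.
    if not target.is_relative_to(resolved_root):
        raise ValueError(f"{relative_name!r} escapes {resolved_root}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload, encoding="utf-8")
    return target
```

The root check keeps raw output from landing next to the committed Markdown summaries by accident. `Path.is_relative_to` requires Python 3.9 or newer.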

Some benchmarks pull input data from upstream sources. The reproducer commands handle the download automatically. Downloaded data lands in results/inputs/ (gitignored), with a one-line provenance record in results/inputs/README.md. Re-running a reproducer is the recommended way to refresh; nothing in results/inputs/ needs to be committed.

Raw JSON/CSV produced by the benchmark scripts is written to results/, which is gitignored. The synthesized Markdown report in this directory is the committed record; re-running the reproducer should regenerate the JSON.

The existing OpenMM reports are committed normalized summaries over raw reference inputs. Their raw JSON files remain under gitignored results/ and may come from either this repository’s synthetic fail-soft script or OpenMM’s stock upstream benchmark script:

| Summary | Raw reference input | Normalized fields |
| --- | --- | --- |
| openmm-opencl-dhfr.md | results/openmm-opencl-dhfr-m5max.json from vendors/openmm/examples/benchmarks/benchmark.py | engine, fixture/system, atom count, timing metric, runtime, hardware, raw output path |
| openmm-opencl-apoa1.md | results/openmm-opencl-apoa1-m5max.json from vendors/openmm/examples/benchmarks/benchmark.py | engine, fixture/system, atom count, timing metric, runtime, hardware, raw output path |
| openmm-opencl-amber20.md | results/openmm-opencl-amber20-m5max.json from vendors/openmm/examples/benchmarks/benchmark.py | engine, fixture/system, atom count, timing metric, runtime, hardware, raw output path |
| m5max-reference-engines.md | results/m5max-reference/manifest.json from scripts/benchmark_m5max_reference.py | engine provenance, required case coverage, OpenMM rows, LAMMPS statuses, raw output paths |
| lammps-opencl-m5max.md | results/m5max-reference/lammps/*.json from scripts/benchmark_m5max_reference.py | official input paths, style mapping, acceleration classification, loop time or blocker |

scripts/benchmark_openmm_opencl.py and scripts/benchmark_lammps_opencl.py emit the shared normalized JSON schema directly. When the optional reference engine, OpenCL platform, fixture, or GPU support is unavailable, they return status: "blocked" with a concrete blocker instead of turning reference-engine availability into a routine test failure.
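
The fail-soft behavior described above can be sketched as follows. Only the `status: "blocked"` value comes from this README. The `blocker` key, the helper name, and the use of an import probe are assumptions for illustration; the real scripts may detect availability differently.

```python
import importlib.util
from typing import Optional


def blocked_record(module_name: str) -> Optional[dict]:
    """Return a blocked record if a top-level optional module is not importable."""
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(module_name) is None:
        return {
            "status": "blocked",
            # "blocker" is a hypothetical field name for the concrete reason.
            "blocker": f"python module {module_name!r} is not installed",
        }
    return None
```

A caller would emit the returned record as JSON and exit normally instead of raising, so a missing reference engine shows up as data rather than as a test failure.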