Same-Workload OpenMM Comparison
Date: 2026-05-23
Scope: refreshed controlled mlx_atomistic vs openmm-reference comparison.
This report only compares rows where workload and metric match. Rows marked
blocked, diagnostic, or deferred keep their raw evidence but do not get
performance ratios.
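The gating rule above can be sketched as a small helper. This is a minimal illustration and not the project's comparison code: the name `ratio_or_none` and its arguments are hypothetical, and the only status strings it uses are the ones this report names.

```python
from typing import Optional

# Only rows with this status receive a ratio; blocked, diagnostic, and
# deferred rows keep their raw evidence but get no ratio.
RATIO_STATUSES = {"comparable"}


def ratio_or_none(status: str, mlx_value: float, openmm_value: float) -> Optional[float]:
    """Return the OpenMM/MLX ratio for comparable rows, else None (hypothetical helper)."""
    if status not in RATIO_STATUSES:
        return None
    if mlx_value <= 0:
        raise ValueError("MLX value must be positive to form a ratio")
    return openmm_value / mlx_value


# Values from the lj-synthetic-loop row of this report (steps/s on both sides).
lj_ratio = ratio_or_none("comparable", 37.8469, 727.9128)
# Placeholder numbers: a blocked row yields no ratio whatever values are supplied.
blocked_ratio = ratio_or_none("blocked", 1.0, 1.0)

print(f"lj-synthetic-loop OpenMM/MLX: {lj_ratio:.4f}")
print(f"blocked row ratio: {blocked_ratio}")
```

Returning `None` instead of a sentinel number keeps a suppressed ratio from being mistaken for a measured one downstream.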
Summary
| Pair id | Status | MLX result | OpenMM result | Ratio | Raw output |
|---|---|---|---|---|---|
| lj-synthetic-loop | comparable | 37.8469 steps/s | 727.9128 steps/s | 19.2331 OpenMM/MLX throughput | results/same-workload-openmm-comparison/summary.json; results/same-workload-openmm-comparison/mlx-lj-synthetic-loop.json; results/same-workload-openmm-comparison/openmm-lj-synthetic-loop.json |
| gbsa-obc-small | comparable | 3.3352 ms/eval | 0.001458 ms/eval | 0.0004372 OpenMM/MLX latency | results/same-workload-openmm-comparison/summary.json; results/same-workload-openmm-comparison/mlx-phase3-controlled.json; results/same-workload-openmm-comparison/openmm-gbsa-obc-small.json |
| tip4p-ew-water | comparable | 6.0068 ms/eval | 0.0003339 ms/eval | 0.00005558 OpenMM/MLX latency | results/same-workload-openmm-comparison/summary.json; results/same-workload-openmm-comparison/mlx-phase3-controlled.json; results/same-workload-openmm-comparison/openmm-tip4p-ew-water.json |
| dhfr-implicit | comparable | 0.3095 ns/day | 1.3136 ns/day | 4.2447 OpenMM/MLX throughput | results/same-workload-openmm-comparison/summary.json; results/same-workload-openmm-comparison/mlx-dhfr-implicit.json; results/same-workload-openmm-comparison/openmm-dhfr-implicit.json; same-workload-dhfr-stretch.md |
| dhfr-explicit-pme | blocked | blocked: PME artifact must be neutral (net_charge=-11) | OpenMM Reference one-step row runs; OpenMM OpenCL context 752.5 ns/day | none | results/same-workload-openmm-comparison/mlx-dhfr-explicit-pme.json; results/same-workload-openmm-comparison/openmm-dhfr-explicit-pme.json; same-workload-dhfr-stretch.md |
Interpretation
lj-synthetic-loop is a tiny controlled MD row: one synthetic LJ step, 32
atoms, matching atom count, matching step count, and steps/s on both sides.
In that narrow row, the OpenMM OpenCL throughput is 19.2331 times the MLX
throughput.
That is not a production-MD conclusion. It says the current tiny MLX full-loop path has lower throughput than OpenMM on this controlled smoke case. It does not say how MLX compares on DHFR, ApoA1, PME, or larger systems.
gbsa-obc-small is now a controlled latency row. The MLX row is the
obc_pair_accumulation_and_force case from
results/same-workload-openmm-comparison/mlx-phase3-controlled.json. The
summary keeps the logical pair id gbsa-obc-small; the per-pair
results/same-workload-openmm-comparison/mlx-gbsa-obc-small.json file is a row
extract that points back to the combined phase3 source. The OpenMM Reference row
reports status: "ok",
fixture: "gbsa_obc_small", ms_per_eval, and an obc_force_setup payload in
results/same-workload-openmm-comparison/openmm-gbsa-obc-small.json. Because
this is a latency metric, an OpenMM/MLX ratio below 1 means the OpenMM Reference
latency is lower than the MLX latency for this controlled OBC operation.
tip4p-ew-water is also a controlled latency row. The MLX row is the
m_site_reconstruction case from
results/same-workload-openmm-comparison/mlx-phase3-controlled.json. The
summary keeps the logical pair id tip4p-ew-water; the per-pair
results/same-workload-openmm-comparison/mlx-tip4p-ew-water.json file is a row
extract that points back to the combined phase3 source. The OpenMM Reference row
reports status: "ok",
fixture: "tip4p_ew_water", operation_semantics: "virtual_site_reconstruction", and openmm_operation: "Context.computeVirtualSites" in
results/same-workload-openmm-comparison/openmm-tip4p-ew-water.json. Because
this is a latency metric, an OpenMM/MLX ratio below 1 means the OpenMM Reference
latency is lower than the MLX latency for this controlled virtual-site
reconstruction operation.
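Latency ratios read in the opposite direction from throughput ratios, so a short sketch may help. It recomputes OpenMM/MLX from the ms/eval values in the summary table and also reports the reciprocal. The function name `describe_latency_ratio` is hypothetical, and the numbers are copied from this report, not re-measured.

```python
# Latency values (ms/eval) copied from the summary table in this report.
LATENCY_MS = {
    "gbsa-obc-small": {"mlx": 3.3352, "openmm": 0.001458},
    "tip4p-ew-water": {"mlx": 6.0068, "openmm": 0.0003339},
}


def latency_ratio(pair_id: str) -> float:
    """OpenMM/MLX latency ratio, in the same orientation as the table."""
    row = LATENCY_MS[pair_id]
    return row["openmm"] / row["mlx"]


def describe_latency_ratio(pair_id: str) -> str:
    ratio = latency_ratio(pair_id)
    # For latency, lower is better, so a ratio below 1 favors OpenMM Reference.
    lower = "OpenMM Reference" if ratio < 1.0 else "MLX"
    # The reciprocal states the same gap as "MLX latency is N times OpenMM's".
    return (
        f"{pair_id}: OpenMM/MLX={ratio:.4g}, "
        f"MLX/OpenMM={1.0 / ratio:.1f}, lower latency: {lower}"
    )


for pair in LATENCY_MS:
    print(describe_latency_ratio(pair))
```

Reporting both orientations is a presentation choice; the report itself publishes only the OpenMM/MLX form.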
dhfr-implicit is now a real one-step same-workload smoke row. The MLX side
loads an OpenMM-derived DHFR GBSA/OBC artifact, builds MLX force terms, and runs
one bounded NVT step at 0.004 ps. The OpenMM Reference side runs the matching
one-step implicit GBSA/OBC row. Because both rows are ok, use the ratio as a
narrow reference comparison for this smoke workload only.
dhfr-explicit-pme remains blocked on the MLX side for a concrete artifact
reason: the Amber20/JAC PME artifact is charged (net_charge=-11) and the
current PME artifact policy requires neutral systems. The old AMBER 10-12
blocker is no longer the active blocker for this local input. The comparison
helper correctly suppresses the explicit PME ratio while keeping the OpenMM
Reference raw row.
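A neutrality gate of the kind described here could look roughly like the sketch below. It is an assumption-laden illustration: `pme_neutrality_blocker`, its tolerance, and the toy charge lists are hypothetical, and only the blocker message format is taken from the summary table.

```python
from typing import Optional, Sequence


def pme_neutrality_blocker(charges: Sequence[float], tol: float = 1e-6) -> Optional[str]:
    """Return a blocker message for a charged system, or None when it is neutral.

    Hypothetical helper: the tolerance is an assumed value, not project policy.
    """
    net_charge = sum(charges)
    if abs(net_charge) <= tol:
        return None
    return f"blocked: PME artifact must be neutral (net_charge={net_charge:g})"


# Toy inputs, not the DHFR partial charges: eleven unit negative charges.
charged_system = [-1.0] * 11
neutral_system = [0.5, -0.5]

print(pme_neutrality_blocker(charged_system))
print(pme_neutrality_blocker(neutral_system))
```

Returning the message as data, not raising an exception, fits the behavior described above, where the MLX row is marked blocked while the OpenMM Reference raw row is still kept.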
No row in this report is currently labeled diagnostic or deferred. The
benchmark ladder still uses those labels for rows whose operation semantics,
metric family, or reference mapping are not yet controlled enough for a ratio.
Reproducer
Raw controlled outputs:

UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python -m mlx_atomistic.benchmarks.md_performance --sizes 32 --steps 1 --sample-interval 1 --diagnostic-interval 1 --evaluation-interval 1 --json > results/same-workload-openmm-comparison/mlx-lj-synthetic-loop.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python scripts/benchmark_openmm_opencl.py --platform OpenCL --particles 32 --steps 1 --warmup-steps 0 --spacing-nm 1.0 --json > results/same-workload-openmm-comparison/openmm-lj-synthetic-loop.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python -m mlx_atomistic.benchmarks.phase3_physics --evaluations 1 --waters 1 --atoms 4 --replica-steps 1 --json > results/same-workload-openmm-comparison/mlx-phase3-controlled.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python scripts/benchmark_openmm_opencl.py --case gbsa-obc-small --platform Reference --particles 4 --steps 1 --json > results/same-workload-openmm-comparison/openmm-gbsa-obc-small.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python scripts/benchmark_openmm_opencl.py --case tip4p-ew-water --platform Reference --particles 4 --steps 1 --json > results/same-workload-openmm-comparison/openmm-tip4p-ew-water.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python -m mlx_atomistic.benchmarks.dhfr --case dhfr-implicit --steps 1 --json > results/same-workload-openmm-comparison/mlx-dhfr-implicit.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python -m mlx_atomistic.benchmarks.dhfr --case dhfr-explicit-pme --steps 1 --json > results/same-workload-openmm-comparison/mlx-dhfr-explicit-pme.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python scripts/benchmark_openmm_dhfr.py --case dhfr-implicit --platform Reference --steps 1 --json > results/same-workload-openmm-comparison/openmm-dhfr-implicit.json
UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python scripts/benchmark_openmm_dhfr.py --case dhfr-explicit-pme --platform Reference --steps 1 --json > results/same-workload-openmm-comparison/openmm-dhfr-explicit-pme.json

Comparison summary:

UV_CACHE_DIR=/tmp/mlx-atomistic-uv-cache uv run python -m mlx_atomistic.benchmarks.same_workload_compare --mlx-json results/same-workload-openmm-comparison/mlx-lj-synthetic-loop.json --mlx-json results/same-workload-openmm-comparison/mlx-phase3-controlled.json --mlx-json results/same-workload-openmm-comparison/mlx-dhfr-implicit.json --mlx-json results/same-workload-openmm-comparison/mlx-dhfr-explicit-pme.json --openmm-json results/same-workload-openmm-comparison/openmm-lj-synthetic-loop.json --openmm-json results/same-workload-openmm-comparison/openmm-gbsa-obc-small.json --openmm-json results/same-workload-openmm-comparison/openmm-tip4p-ew-water.json --openmm-json results/same-workload-openmm-comparison/openmm-dhfr-implicit.json --openmm-json results/same-workload-openmm-comparison/openmm-dhfr-explicit-pme.json --out results/same-workload-openmm-comparison/summary.json

These commands require local MLX/Metal access for MLX rows and OpenMM/OpenCL or
OpenMM Reference availability for OpenMM rows. A sandbox atexit warning from
MLX/Metal cleanup after the summary command is not a benchmark failure when the
summary JSON is written with status: "ok".
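Because the atexit warning can be ignored only when the summary was written successfully, checking the file directly is safer than relying on console output. The sketch below is one way to do that. It assumes `status` is a top-level key of the summary JSON, which this report implies but does not spell out, and `summary_is_ok` is a hypothetical name.

```python
import json
from pathlib import Path


def summary_is_ok(path: Path) -> bool:
    """Return True when the file exists, parses as JSON, and reports status "ok".

    Assumption: "status" sits at the top level of the summary object.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        # A missing or truncated file counts as a failed run.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


summary_path = Path("results/same-workload-openmm-comparison/summary.json")
print(f"{summary_path}: ok={summary_is_ok(summary_path)}")
```

A missing or unparsable file is treated as a failure, so a crash before the summary is written is not confused with the harmless cleanup warning.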
Next Optimization Target
The next optimization target should be MLX TIP4P-Ew virtual-site
reconstruction, followed by GBSA/OBC force evaluation if the TIP4P path does
not explain the shared latency overhead. This is based on measured comparable
rows only: tip4p-ew-water has the largest MLX latency in the refreshed
controlled summary, and both tip4p-ew-water and gbsa-obc-small use
controlled OpenMM Reference latency rows.
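The ordering behind this recommendation can be reproduced from the table values. The sketch below ranks only the comparable latency rows by MLX latency. The variable names are illustrative, and throughput rows are left out because they belong to a different metric family.

```python
# MLX latencies (ms/eval) for the comparable latency rows in the summary table.
MLX_LATENCY_MS = {
    "gbsa-obc-small": 3.3352,
    "tip4p-ew-water": 6.0068,
}

# Largest MLX latency first: the top entry is the proposed next target.
ranked = sorted(MLX_LATENCY_MS.items(), key=lambda item: item[1], reverse=True)
next_target, fallback = ranked[0][0], ranked[1][0]

print(f"next target: {next_target}")
print(f"fallback: {fallback}")
```

Ranking by absolute MLX latency, not by ratio, matches the reasoning stated above. Ranking by OpenMM/MLX ratio happens to give the same order for these two rows.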
The lj-synthetic-loop result also shows a measured tiny full-loop throughput
gap, so force evaluation, reporting cadence, and synchronization remain valid
diagnostic follow-ups for that row. dhfr-implicit is now runnable, but its
one-step row should guide runtime-path hardening before it guides broad
performance optimization claims. dhfr-explicit-pme cannot guide optimization
until its net_charge=-11 blocker under the neutral-system PME artifact policy
is resolved.