Performance Audit Baseline

Date: 2026-05-22

This report is the committed Slice 6 audit summary for 2026-05-22-performance-audit-harness-hardening. Raw JSON is gitignored under results/performance-audit-harness-hardening/.

| Benchmark | Engine | Command | Raw output | Status |
| --- | --- | --- | --- | --- |
| force-term microbenchmarks | mlx_atomistic | uv run python -m mlx_atomistic.benchmarks.mm_force_terms --evaluations 1 --particles 16 --json | results/performance-audit-harness-hardening/mm-force-terms-fast.json | ok |
| nonbonded acceleration split | mlx_atomistic | uv run python -m mlx_atomistic.benchmarks.md_acceleration --sizes 16 --evaluations 1 --json | results/performance-audit-harness-hardening/md-acceleration-fast.json | ok |
| full MD smoke | mlx_atomistic | uv run python -m mlx_atomistic.benchmarks.md_performance --sizes 32 --steps 1 --sample-interval 1 --diagnostic-interval 1 --evaluation-interval 1 --json | results/performance-audit-harness-hardening/md-performance-fast.json | ok |
| Phase 3 physics smoke | mlx_atomistic | uv run python -m mlx_atomistic.benchmarks.phase3_physics --evaluations 1 --waters 1 --atoms 4 --replica-steps 1 --json | results/performance-audit-harness-hardening/phase3-physics-fast.json | ok |
| PME missing-fixture blocked smoke | mlx_atomistic | uv run python -m mlx_atomistic.benchmarks.pme_performance --fixture-dir results/missing-pme-fixture --iterations 1 --warmups 0 --json | results/performance-audit-harness-hardening/pme-blocked-fast.json | blocked |
| OpenMM unavailable-platform smoke | openmm-reference | uv run python scripts/benchmark_openmm_opencl.py --platform DefinitelyMissing --particles 16 --steps 1 --json | results/performance-audit-harness-hardening/openmm-blocked-fast.json | blocked |
| LAMMPS OpenCL smoke | lammps-reference | uv run python scripts/benchmark_lammps_opencl.py --particles 16 --steps 1 --json | results/performance-audit-harness-hardening/lammps-fast.json | ok |

| Row | Metric | Value | Evidence |
| --- | --- | --- | --- |
| full MD synthetic LJ, dense backend | steps_per_s | 53.073 | md-performance-fast.json |
| full MD force evaluation | force_eval_ms_per_step | 0.221 | md-performance-fast.json |
| nonbonded mlx_tiled | ms_per_eval | 0.521 | md-acceleration-fast.json |
| nonbonded mlx_dense | ms_per_eval | 0.855 | md-acceleration-fast.json |
| nonbonded mlx_pairs force eval | ms_per_eval | 0.864 | md-acceleration-fast.json |
| mlx_pairs neighbor build | neighbor_build_ms_per_eval | 2.423 | md-acceleration-fast.json |
| python_neighbor total eval | ms_per_eval | 2.376 | md-acceleration-fast.json |
| Phase 3 replica exchange | ms_per_eval | 9.552 | phase3-physics-fast.json |
| GBSA/OBC energy and forces | ms_per_eval | 7.791 | phase3-physics-fast.json |
| TIP4P-Ew M-site reconstruction | ms_per_eval | 5.526 | phase3-physics-fast.json |
| soft-core lambda grid | ms_per_eval | 3.389 | phase3-physics-fast.json |
| virtual-site force redistribution | ms_per_eval | 1.125 | phase3-physics-fast.json |
| LAMMPS synthetic OpenCL smoke | steps_per_s | 700.076 | lammps-fast.json |

The reference-engine rows are context only. The OpenMM row intentionally uses a missing platform and proves fail-soft behavior; the LAMMPS row is a tiny synthetic smoke run and is not an apples-to-apples production target.
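
The fail-soft behavior described above can be illustrated with a minimal sketch. It is not the harness's actual code: the `run_fail_soft` helper, the stand-in benchmark functions, and the `status`/`reason` record keys are all hypothetical, chosen only to show how a missing platform could become a `blocked` row rather than a crash.

```python
import json
from typing import Any, Callable


def run_fail_soft(name: str, benchmark: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run one benchmark and always return a JSON-serializable record.

    The record keys are illustrative, not the harness's real schema.
    """
    try:
        metrics = benchmark()
    except Exception as exc:  # e.g. missing platform, fixture, or binary
        return {
            "benchmark": name,
            "status": "blocked",
            "reason": f"{type(exc).__name__}: {exc}",
        }
    return {"benchmark": name, "status": "ok", "metrics": metrics}


def missing_platform_benchmark() -> dict[str, Any]:
    # Hypothetical stand-in for requesting an engine platform that does not exist.
    raise RuntimeError("platform 'DefinitelyMissing' is not available")


def tiny_ok_benchmark() -> dict[str, Any]:
    # Hypothetical stand-in for a benchmark that completes normally.
    return {"steps": 1}


records = [
    run_fail_soft("openmm-missing-platform", missing_platform_benchmark),
    run_fail_soft("tiny-ok", tiny_ok_benchmark),
]
print(json.dumps(records, indent=2))
```

Catching the failure at the per-benchmark boundary keeps the rest of the audit running and leaves a machine-readable reason behind for the blocked row.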

  1. Replica-exchange runtime materialization and serial replica execution. Evidence: phase3-physics-fast.json reports two_replica_temperature_exchange at 9.552 ms/eval, the slowest row among the fast probes, with history_materialization_count: 8. Reproducer: uv run python -m mlx_atomistic.benchmarks.phase3_physics --evaluations 1 --waters 1 --atoms 4 --replica-steps 1 --json.

  2. GBSA/OBC force evaluation. Evidence: phase3-physics-fast.json reports gbsa_obc_energy_forces at 7.791 ms/eval for only four atoms, while the surface-area term is 0.903 ms/eval. Reproducer: the Phase 3 command from item 1.

  3. TIP4P-Ew virtual-site reconstruction and advanced-water overhead. Evidence: phase3-physics-fast.json reports TIP4P-Ew reconstruction at 5.526 ms/eval; mm-force-terms-fast.json reports synchronized tip4p-ew-reconstruct at 1.158 ms/eval and force redistribution at 0.866 ms/eval. Reproducer: uv run python -m mlx_atomistic.benchmarks.mm_force_terms --evaluations 1 --particles 16 --json.

  4. Neighbor-list build and pair-compaction overhead. Evidence: md-acceleration-fast.json reports mlx_pairs neighbor_build_ms_per_eval: 2.423, larger than its force eval row (0.864 ms/eval) and the dense/tiled diagnostic rows at this small size. Reproducer: uv run python -m mlx_atomistic.benchmarks.md_acceleration --sizes 16 --evaluations 1 --json.

  5. Full-loop MD synchronization/cadence path. Evidence: md-performance-fast.json reports 53.073 steps/s for a one-step smoke with force_eval_ms_per_step: 0.221; this row should be rerun at opt-in sizes before any custom-kernel work. Reproducer: uv run python -m mlx_atomistic.benchmarks.md_performance --sizes 32 --steps 1 --sample-interval 1 --diagnostic-interval 1 --evaluation-interval 1 --json.
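
The arithmetic behind items 4 and 5 can be made explicit with a short sketch. The inputs are copied from the fast-probe rows in this report; the variable names are illustrative. Because the full-MD row is a one-step smoke, the derived per-step time may include one-time costs, so treat the results as rough indicators only.

```python
# Values copied from the fast-probe rows in this report.
steps_per_s = 53.073             # full MD synthetic LJ, dense backend
force_eval_ms_per_step = 0.221   # full MD force evaluation
neighbor_build_ms = 2.423        # mlx_pairs neighbor build
pairs_force_eval_ms = 0.864      # mlx_pairs force eval
tiled_ms = 0.521                 # nonbonded mlx_tiled

# Item 5: how much of each step is spent outside force evaluation?
wall_ms_per_step = 1000.0 / steps_per_s
force_fraction = force_eval_ms_per_step / wall_ms_per_step
non_force_ms = wall_ms_per_step - force_eval_ms_per_step

# Item 4: how large is the neighbor build relative to the work it enables?
build_vs_pairs_force = neighbor_build_ms / pairs_force_eval_ms
build_vs_tiled = neighbor_build_ms / tiled_ms

print(f"wall time per step:        {wall_ms_per_step:.2f} ms")
print(f"force-eval share of step:  {force_fraction:.1%}")
print(f"non-force time per step:   {non_force_ms:.2f} ms")
print(f"neighbor build / pairs force eval: {build_vs_pairs_force:.2f}x")
print(f"neighbor build / tiled eval:       {build_vs_tiled:.2f}x")
```

At these sizes, force evaluation accounts for roughly 1% of the measured step time, which is why item 5 points at the synchronization/cadence path rather than the force kernels. The neighbor build costs close to three times the pair force evaluation it feeds.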

The next optimization spec should target replica-exchange and GBSA/OBC Phase 3 overhead first, then neighbor-list build/compaction. Custom Metal kernel work remains deferred until opt-in larger-system runs reproduce these rankings beyond fast synthetic probes.
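
One way to check whether larger-system runs "reproduce these rankings" is to compare the top of the cost ordering between the fast probes and a later run. The sketch below is only an assumption about how such a check might look: the dictionary keys are illustrative labels rather than names guaranteed to appear in the raw JSON, and `top_k_agrees` is a hypothetical helper. The fast values are copied from the table in this report; values for a larger run would have to come from actual opt-in runs.

```python
# Per-evaluation costs (ms) copied from the fast-probe table in this report.
# Keys are illustrative labels, not guaranteed JSON field names.
fast_ms = {
    "replica_exchange": 9.552,
    "gbsa_obc_energy_forces": 7.791,
    "tip4p_ew_reconstruction": 5.526,
    "soft_core_lambda_grid": 3.389,
    "mlx_pairs_neighbor_build": 2.423,
    "virtual_site_force_redistribution": 1.125,
}


def ranking(ms_per_eval: dict[str, float]) -> list[str]:
    """Return row labels ordered from most to least expensive."""
    return sorted(ms_per_eval, key=lambda label: ms_per_eval[label], reverse=True)


def top_k_agrees(fast: dict[str, float], larger: dict[str, float], k: int = 3) -> bool:
    """True if both runs name the same k most expensive rows in the same order."""
    return ranking(fast)[:k] == ranking(larger)[:k]


print(ranking(fast_ms))
# For a real comparison, pass measurements from an opt-in larger-system run
# as the second argument: top_k_agrees(fast_ms, larger_ms)
```

Comparing only the top few rows keeps the check robust to reshuffling among cheap rows, which is likely to be noise at these sizes.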