Large language models have already shown strong performance on full-stack software projects, especially in TypeScript and Python.
What’s been tested far less is how these systems perform in scientific computing, where correctness depends on numerical stability, physical constraints, and careful validation, not just on code that compiles.
In this video, I benchmark Claude Code by asking it to build a C++ solver for the two-dimensional heat equation, live. I guide the model the way a senior researcher would guide a junior engineer, but the numerical decisions are its own.
I evaluate the result using standard numerical-methods criteria:
stability under time-step changes
correctness of boundary conditions
comparison against analytical solutions
and predictable failure modes
This is not a demo of code generation — it’s a test of scientific reasoning.
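To make the criteria above concrete, here is a minimal sketch of the kind of check they imply. It is not the solver built in the video. It assumes an explicit FTCS (forward-time, centered-space) scheme on the unit square, zero Dirichlet boundaries, and the initial condition sin(pi x) sin(pi y), for which the analytical solution is exp(-2 pi^2 alpha t) sin(pi x) sin(pi y). The function names `stable_dt_limit` and `ftcs_max_error` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest stable FTCS time step on a uniform grid with spacing h (dx == dy):
// alpha * dt / h^2 <= 1/4. (Assumed scheme; the video's solver may differ.)
double stable_dt_limit(double h, double alpha) {
    return h * h / (4.0 * alpha);
}

// Advance u_t = alpha * (u_xx + u_yy) on the unit square for `steps` FTCS
// steps with n intervals per side, then return the maximum absolute error
// against the analytical solution
//   u(x, y, t) = exp(-2 * pi^2 * alpha * t) * sin(pi x) * sin(pi y).
double ftcs_max_error(int n, double dt, int steps, double alpha) {
    const double pi = std::acos(-1.0);
    const double h = 1.0 / n;
    const int m = n + 1;  // grid points per side, boundaries included
    std::vector<double> u(static_cast<std::size_t>(m) * m, 0.0);
    std::vector<double> next(u.size(), 0.0);
    auto idx = [m](int i, int j) { return static_cast<std::size_t>(i) * m + j; };

    // Initial condition on interior points. Boundary entries stay exactly
    // zero, which enforces the homogeneous Dirichlet condition.
    for (int i = 1; i < n; ++i)
        for (int j = 1; j < n; ++j)
            u[idx(i, j)] = std::sin(pi * i * h) * std::sin(pi * j * h);

    const double r = alpha * dt / (h * h);
    for (int s = 0; s < steps; ++s) {
        for (int i = 1; i < n; ++i)
            for (int j = 1; j < n; ++j)
                next[idx(i, j)] = u[idx(i, j)]
                    + r * (u[idx(i + 1, j)] + u[idx(i - 1, j)]
                         + u[idx(i, j + 1)] + u[idx(i, j - 1)]
                         - 4.0 * u[idx(i, j)]);
        u.swap(next);  // boundary entries are zero in both buffers
    }

    const double decay = std::exp(-2.0 * pi * pi * alpha * steps * dt);
    double err = 0.0;
    for (int i = 1; i < n; ++i)
        for (int j = 1; j < n; ++j) {
            const double exact =
                decay * std::sin(pi * i * h) * std::sin(pi * j * h);
            err = std::max(err, std::fabs(u[idx(i, j)] - exact));
        }
    return err;
}
```

Calling `ftcs_max_error` with a time step below `stable_dt_limit` should give a small error against the analytical solution, while a time step above the limit should make the error grow without bound as round-off excites the highest grid frequencies. That contrast is the "predictable failure mode" an explicit scheme is expected to show.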