Claude Opus: Why spec-driven development matters more than model version
Same spec. Same coding harness. Only the model changed. Does Claude Opus 4.7 actually write better code than 4.6?

I gave Claude Code the exact same Tetris spec twice (identical prompt, Extra High effort, Thinking on) and changed only the model version. Both runs passed every functional test. I then scored both outputs across 13 non-functional code-quality dimensions and pressure-tested my own judgement with ESLint 9 and an AST walk.

Final scoreboard: 9–3 (1 tie) in favor of Opus 4.7. But the quantitative pass surfaced three real regressions that the scoreboard hides.

🔎 What you'll see:
• A controlled Opus 4.6 vs 4.7 code-quality comparison (not a benchmark run)
• A 13-dimension rubric with operational definitions, so the grading is reproducible
• Side-by-side code: naming, CSS variables, async nesting, structural layering
• Static analysis where the metrics contradict the manual assessment (nesting, cyclomatic complexity, an eqeqeq slip)
• Where Opus 4.6 still wins: 'use strict', the persistence module, the input-guard wrapper
• Caveats: n=1, one spec, one language, one domain

🔗 Links
• Article write-up: https://www.linkedin.com/pulse/anthropic-opus-46-vs-47-which-better-code-quality-experiment-goh-uroxc
• Code quality assessment rubric + static-analysis numbers: https://tetris.therayg.com/code-analysis/CodeQuality-4_6-vs-4_7.html

If you work with Claude Code, Cursor, Kiro, or any other coding agent, does this match what you're seeing with your own prompts? Drop a comment with what surprised you most.

#ClaudeCode #ClaudeOpus #Opus47 #AICoding #Tetris