Claude Opus is ACTUALLY UNUSABLE
Claude Opus 4.6 has regressed badly, and we have the benchmarks to prove it. In this video, we run custom tests designed to evaluate instruction following, opposite behavior, false completion, and destructive actions. The results? Claude Opus scored just 40% on its OWN benchmark, while ChatGPT 5.4 scored 63%+ on the same tests. We break down why Anthropic is likely throttling Opus intelligence to conserve compute for their upcoming Mythos release, why Claude Code keeps ignoring instructions and deleting files, and why we're migrating our entire workflow to GPT 5.4 and Kilo Code.

Check out our Community Website: https://boxminingai.com/
Join our Discord: https://discord.gg/dhXKCxz654
Zeabur Server: https://zeabur.com/ (Save $5 use code: boxmining)
Minimax 10% Off: https://platform.minimax.io/subscribe/coding-plan?code=5GYCNOeSVQ&source=link
Kimi AI: https://www.kimi.com/kimiplus/sale?activity_enter_method=h5_share&invitation_code=Y4JW7Y
GLM Coding Plan: https://z.ai/subscribe?ic=WDHIPYBDSB
Read more AI News: https://www.boxmining.com/

Partnership/Collaboration Email: [email protected]

Chapters:
00:00 Opus level intelligence means nothing now
00:27 Building a custom benchmark for Claude Opus
01:38 Instruction following test results
03:18 Claude scores 40%, GPT 5.4 scores 63%
04:42 The parachute meme & Claude's self-sacrifice
06:22 Claude Code can't follow its own plans
08:04 Why Anthropic is nerfing Opus
09:28 Community backlash & cancellations
10:48 Migrating everything to GPT 5.4
12:01 Final thoughts