Update: I ran a comparative test against the provided test API using a much stronger benchmark dataset. The result in my previous post had been checked with only a simple test.
> **Tested:** March 7, 2026
> **Benchmark version:** Hybrid scoring, deterministic (65 pts) + LLM judge (40 pts)
> **Threshold:** 85/100 = CONFIRMED Opus 4.6
---
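The threshold rule above can be sketched in a few lines. This is only an illustration, not the benchmark's actual code. The names `verdict`, `Result`, and `THRESHOLD` are hypothetical, and the cutoff is assumed to be inclusive (a score of exactly 85 counts as CONFIRMED). The post does not say how the deterministic and LLM-judge points are combined into the 100-point total, so the sketch takes only the final score as input.

```python
from typing import NamedTuple

# From the post: 85/100 = CONFIRMED Opus 4.6
THRESHOLD = 85


class Result(NamedTuple):
    """One benchmark row (hypothetical container, not from the benchmark)."""
    provider: str
    model: str
    score: int  # final total out of 100, as reported in the tables


def verdict(score: int, threshold: int = THRESHOLD) -> str:
    """Map a final score to a verdict label.

    Assumption: the cutoff is inclusive (score >= threshold).
    """
    return "CONFIRMED" if score >= threshold else "NOT CONFIRMED"


# A few rows copied from the tables in this post.
results = [
    Result("Anthropic Direct API", "claude-opus-4-6", 93),
    Result("OpenAI", "gpt-5.4", 79),
    Result("Yuxor", "claude-opus-4-6", 68),
]

for r in results:
    print(f"{r.provider:<22} {r.model:<18} {r.score}/100  {verdict(r.score)}")
```

Applied to any row in the tables, `verdict` should give the same label as the Verdict column, since no reported score falls exactly on the cutoff.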
## Reference Baselines (Official Channels)
| Provider | Model | Score | Verdict |
|-----------------------------|---------------------------|:------:|:---------:|
| Anthropic Direct API | claude-opus-4-6 | 93/100 | CONFIRMED |
| OpenRouter (Anthropic) | anthropic/claude-opus-4.6 | 94/100 | CONFIRMED |
| OpenRouter (Amazon Bedrock) | anthropic/claude-opus-4.6 | 91/100 | CONFIRMED |
| OpenRouter (Google Vertex) | anthropic/claude-opus-4.6 | 93/100 | CONFIRMED |
| Claude Code (Official) | claude-opus-4-6 | 91/100 | CONFIRMED |
---
## Competitor Models (For Comparison)
| Provider | Model | Score | Verdict |
|------------|-------------------------------|:------:|:-------------:|
| OpenAI | gpt-5.4 | 79/100 | NOT CONFIRMED |
| OpenAI | gpt-5.3-codex | 80/100 | NOT CONFIRMED |
| OpenAI | gpt-5.2 | 77/100 | NOT CONFIRMED |
| OpenRouter | z-ai/glm-5 | 69/100 | NOT CONFIRMED |
| OpenRouter | minimax/minimax-m2.5 | 76/100 | NOT CONFIRMED |
| OpenRouter | moonshotai/kimi-k2.5 | 78/100 | NOT CONFIRMED |
| OpenRouter | google/gemini-3.1-pro-preview | 80/100 | NOT CONFIRMED |
| OpenRouter | anthropic/claude-sonnet-4.6 | 76/100 | NOT CONFIRMED |
| OpenRouter | anthropic/claude-sonnet-4.5 | 74/100 | NOT CONFIRMED |
---
## Third-Party Opus Services
| Provider | Model | Score | Verdict |
|--------------|---------------------|:------:|:-------------:|
| Yuxor | claude-opus-4-6 | 68/100 | NOT CONFIRMED |