Update: I ran a comparative test against the provided test API, this time using a much stronger benchmark dataset. The results in my previous update were based on a simple test.

> **Tested:** March 7, 2026
> **Benchmark version:** Hybrid scoring — deterministic (65pts) + LLM judge (40pts)
> **Threshold:** 85/100 or higher = CONFIRMED Opus 4.6
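As an illustration of how the verdict column relates to the threshold, here is a minimal sketch. It assumes the threshold is inclusive (a total of exactly 85 counts as CONFIRMED) and is applied to the final total score. It does not model how the deterministic and LLM-judge components are combined into that total. The `verdict` function and the `results` dictionary are hypothetical names used only for this example; the scores are copied from the tables in this post.

```python
# Assumed rule: a final total of 85/100 or above is reported as CONFIRMED.
CONFIRM_THRESHOLD = 85


def verdict(score: int) -> str:
    """Map a total benchmark score (out of 100) to the reported verdict label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    return "CONFIRMED" if score >= CONFIRM_THRESHOLD else "NOT CONFIRMED"


# A few totals taken from the result tables in this post.
results = {
    "Anthropic Direct API / claude-opus-4-6": 93,
    "Claude Code (Official) / claude-opus-4-6": 91,
    "OpenAI / gpt-5.3-codex": 80,
    "OpenRouter / google/gemini-3.1-pro-preview": 80,
    "Yuxor / claude-opus-4-6": 68,
}

for name, score in results.items():
    print(f"{name}: {score}/100 -> {verdict(score)}")
```

Every score reported in this post is consistent with this rule. The official channels all score 91 or higher, while the best competitor scores are 80 and the third-party service scores 68, all below the 85-point cutoff.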

---

## Reference Baselines (Official Channels)

| Provider                    | Model                     | Score  |  Verdict  |
|-----------------------------|---------------------------|:------:|:---------:|
| Anthropic Direct API        | claude-opus-4-6           | 93/100 | CONFIRMED |
| OpenRouter (Anthropic)      | anthropic/claude-opus-4.6 | 94/100 | CONFIRMED |
| OpenRouter (Amazon Bedrock) | anthropic/claude-opus-4.6 | 91/100 | CONFIRMED |
| OpenRouter (Google Vertex)  | anthropic/claude-opus-4.6 | 93/100 | CONFIRMED |
| Claude Code (Official)      | claude-opus-4-6           | 91/100 | CONFIRMED |

---

## Competitor Models (For Comparison)

| Provider   | Model                         | Score  |    Verdict    |
|------------|-------------------------------|:------:|:-------------:|
| OpenAI     | gpt-5.4                       | 79/100 | NOT CONFIRMED |
| OpenAI     | gpt-5.3-codex                 | 80/100 | NOT CONFIRMED |
| OpenAI     | gpt-5.2                       | 77/100 | NOT CONFIRMED |
| OpenRouter | z-ai/glm-5                    | 69/100 | NOT CONFIRMED |
| OpenRouter | minimax/minimax-m2.5          | 76/100 | NOT CONFIRMED |
| OpenRouter | moonshotai/kimi-k2.5          | 78/100 | NOT CONFIRMED |
| OpenRouter | google/gemini-3.1-pro-preview | 80/100 | NOT CONFIRMED |
| OpenRouter | anthropic/claude-sonnet-4.6   | 76/100 | NOT CONFIRMED |
| OpenRouter | anthropic/claude-sonnet-4.5   | 74/100 | NOT CONFIRMED |

---

## Third-Party Opus Services

| Provider     | Model               | Score  |    Verdict    |
|--------------|---------------------|:------:|:-------------:|
| Yuxor        | claude-opus-4-6     | 68/100 | NOT CONFIRMED |