I've hit a critical issue: the Tamil translation output shows severe script corruption. Haiku's context window appears to become exhausted while it generates long runs of Tamil Unicode text, and the output then degrades into mixed scripts (Malayalam, Devanagari, Thai, Korean, etc.).
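To make the corruption measurable rather than anecdotal, a check like the following could flag any letter or combining mark that belongs to an unexpected script. This is a minimal sketch, not part of the current pipeline: the function name `find_foreign_script_chars` is hypothetical, and it assumes that legitimate output contains only Tamil and Latin letters (Latin because MDX carries component names and code).

```python
import unicodedata

# Assumption: valid translated MDX contains only Tamil and Latin letters.
# Extend this tuple if the content legitimately includes other scripts.
ALLOWED_SCRIPT_PREFIXES = ("TAMIL", "LATIN")


def find_foreign_script_chars(text):
    """Return (index, char, unicode_name) for each letter or mark
    whose Unicode name does not start with an allowed script name."""
    hits = []
    for i, ch in enumerate(text):
        # Only letters (L*) and combining marks (M*) carry script identity;
        # digits, punctuation, and whitespace are skipped.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if not name.startswith(ALLOWED_SCRIPT_PREFIXES):
            hits.append((i, ch, name))
    return hits
```

Running this over each translated file and treating a non-empty result as a failure would give a per-file corruption signal. Script-neutral combining marks (for example U+0301) would also be flagged under this assumption, so the allow-list may need tuning.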
The problem: Translating 4,482 MDX files with Claude Haiku (the current model) is hitting hard limits on sustained Unicode output integrity for Tamil script.
What I recommend: Switch to Claude Sonnet or Opus for this translation batch. These models handle large, sustained Tamil content generation better. The trade-off is a higher token cost (~2-3x), but character integrity should be far more reliable.
Should I:
- Use Sonnet for the full 4,482-file translation (recommended): costs ~$120 vs ~$41 for Haiku, but should avoid the corruption
- Split into smaller batches (Haiku per batch, with a context reset between batches): preserves the budget but is slower
- Continue with Haiku and manually fix corrupted output (not viable at this scale)
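For the batching option, the split itself is simple to sketch. The helper below is illustrative only: `split_into_batches`, the batch size of 50, and the placeholder file names are assumptions, not values taken from the current job. Each batch would be sent in a fresh session so that no context carries over between batches.

```python
def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical placeholder names standing in for the real 4,482 MDX files.
files = [f"doc_{n:04d}.mdx" for n in range(4482)]

# Assumed batch size; the right value depends on file length and
# how quickly the output starts to degrade.
batches = split_into_batches(files, 50)
print(len(batches), len(batches[-1]))
```

With a batch size of 50, the 4,482 files fall into 90 batches, the last holding 32 files. A smaller batch size means more resets and a slower run, but less sustained output per session.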
Which approach works best for your timeline and budget?