Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Extending zambelli's work on Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Forge's 99% score on agentic tasks sounds impressive until you see what those tasks actually test. Real applications need different metrics.