Job Description
AI Evaluation Engineer
Location: Pune, India
Seniority: Mid-level (3-6 years)
Purpose: Execute comprehensive evaluations of FNZ's AI agents across the six-pillar framework, working as a generalist while developing specialist expertise in one or two pillars.
Key Responsibilities:
Design and conduct evaluations covering Task Performance, Safety, Efficiency, Groundedness, Robustness, and Suitability
Create golden sets of test examples representing expert judgment on desired agent behaviour
Develop evaluation rubrics and scoring criteria aligned with the FNZ Evaluation Framework principles
Build comprehensive test suites covering happy paths, edge cases, and adversarial inputs
Evaluate multi-step agentic workflows: planning, tool selection, execution, and error handling
Assess agent groundedness: verify outputs against knowled...