KAI in the Wild: What We Learned Building an HR Policy Assistant People Actually Trust
KAI was never meant to be a generic chat layer over HR documents. It was built to reduce pressure on HR teams while giving employees timely, policy-grounded answers they can rely on. The trust requirement changed almost every technical choice we made. In this domain, a fluent answer is not useful if the source is unclear, out of date, or scoped to the wrong policy set.
We started by observing how policy questions actually arrive. People do not ask in clean legal language. They ask in fragments, often under time pressure, and often with terms that are informal or country-specific. The system needed to interpret intent, map it to the right policy scope, and respond with enough precision to be actionable without overreaching into legal advice.
Where Policy Assistants Usually Break
Early prototypes exposed four common failure modes:
- Answering a plausible question with the wrong policy version.
- Merging guidance across countries and creating silent contradictions.
- Returning summary text without clear evidence attribution.
- Using confident tone for edge cases that should be escalated to HR.
None of these failures is rare; they are structural. Without explicit controls for versioning, jurisdiction, and uncertainty, they will appear at scale.
Our Architecture Choice: Retrieval First, Generation Second
KAI uses grounded retrieval pipelines with strict policy scoping before generation starts. We index policy content with metadata for region, business unit, version, effective dates, and document authority level. Query interpretation produces a scoped retrieval plan, not just a semantic search vector.
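As a minimal sketch of that idea, the metadata filter below runs before any semantic search. The schema and field names are illustrative assumptions, not KAI's actual index format:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical document record; fields mirror the metadata described above.
@dataclass
class PolicyDoc:
    doc_id: str
    region: str
    business_unit: str
    version: int
    effective_from: date
    effective_to: Optional[date]  # None = still in effect
    authority: str                # e.g. "official" vs. "guidance"

# A scoped retrieval plan derived from query interpretation,
# not just an embedding vector.
@dataclass
class RetrievalPlan:
    region: str
    business_unit: str
    as_of: date
    required_authority: str = "official"

def in_scope(doc: PolicyDoc, plan: RetrievalPlan) -> bool:
    """Hard metadata filter applied *before* semantic ranking."""
    return (
        doc.region == plan.region
        and doc.business_unit == plan.business_unit
        and doc.effective_from <= plan.as_of
        and (doc.effective_to is None or plan.as_of <= doc.effective_to)
        and doc.authority == plan.required_authority
    )

docs = [
    PolicyDoc("leave-v3", "DE", "retail", 3, date(2024, 1, 1), None, "official"),
    PolicyDoc("leave-v2", "DE", "retail", 2, date(2022, 1, 1), date(2023, 12, 31), "official"),
    PolicyDoc("leave-v3", "FR", "retail", 3, date(2024, 1, 1), None, "official"),
]
plan = RetrievalPlan(region="DE", business_unit="retail", as_of=date(2024, 6, 1))
candidates = [d for d in docs if in_scope(d, plan)]
print([d.doc_id for d in candidates])  # superseded and out-of-region docs excluded
```

The point of the sketch: the French document and the expired German version never reach the ranker, so the generator cannot cite them by accident.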
Once relevant evidence is assembled, generation is constrained by citation requirements. The model cannot produce policy claims without linked supporting passages. If evidence confidence is low, KAI is instructed to return a bounded response that explains uncertainty and routes the user to the right escalation channel.
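A citation gate of this kind can be sketched as follows. The threshold, field names, and escalation message are assumptions for illustration; the post does not specify KAI's actual scoring scheme:

```python
# Hypothetical gate: no claim is emitted without a linked passage above
# a confidence threshold; otherwise a bounded escalation replaces the answer.
ESCALATION_MESSAGE = (
    "Current policy does not let me answer this confidently. "
    "Routing you to your HR partner."
)

def gate_response(claims, evidence, min_confidence=0.7):
    supported = []
    for claim in claims:
        passages = evidence.get(claim["id"], [])
        best_score = max((p["score"] for p in passages), default=0.0)
        if best_score < min_confidence:
            # Low evidence confidence: bounded response, not a partial answer.
            return {"escalate": True, "message": ESCALATION_MESSAGE, "claims": []}
        supported.append({"text": claim["text"],
                         "sources": [p["ref"] for p in passages]})
    return {"escalate": False, "message": None, "claims": supported}

claims = [{"id": "c1", "text": "Parental leave is 14 weeks."}]
strong = {"c1": [{"ref": "leave-policy §4.2", "score": 0.92}]}
weak = {"c1": [{"ref": "leave-policy §4.2", "score": 0.41}]}

print(gate_response(claims, strong)["escalate"])  # False: claim is cited
print(gate_response(claims, weak)["escalate"])    # True: bounded escalation
```

Note the design choice: the gate fails closed. A single unsupported claim turns the whole answer into an escalation rather than letting cited and uncited claims mix.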
The most important answer in an HR assistant is sometimes, "this needs a human decision."
We Treated Source Attribution as a Product Feature
Source links are often added as a compliance checkbox. We treated them as core UX. Every KAI response is designed to make attribution easy to inspect: users can see policy title, section, effective date, and scope. This changed behavior quickly. HR teams reported fewer repeated clarification requests, because users could self-verify whether an answer applied to them.
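Rendering attribution as inspectable UX can be as simple as a structured footer. The record shape here is an assumption; the four fields are the ones named above:

```python
# Illustrative attribution record; KAI's real response schema is not shown
# in this post, so the field names are hypothetical.
def attribution_line(source):
    return (f"{source['title']}, §{source['section']} "
            f"(effective {source['effective_date']}; scope: {source['scope']})")

sources = [{
    "title": "Remote Work Policy",
    "section": "3.1",
    "effective_date": "2024-03-01",
    "scope": "Germany, all business units",
}]
footer = [attribution_line(s) for s in sources]
print(footer[0])
```

Because scope and effective date are visible, a user can self-verify applicability without opening a ticket, which is the behavior change described above.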
Attribution also improved internal quality loops. When reviewers challenged an answer, they challenged specific evidence mappings rather than arguing broadly about tone. That made corrections faster and more durable.
Handling Policy Drift and Updates
Policy systems are not static. Documents are revised, merged, and occasionally contradicted before governance catches up. KAI includes an ingestion pipeline that tracks version lineage and deprecates superseded content in retrieval pathways. Older policy text is retained for audit, but marked non-authoritative unless a historical query explicitly requires it.
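A minimal version of that supersedence logic, assuming each revision records which document it replaces (real ingestion would also need to handle merges; this sketch covers only linear chains):

```python
# Hypothetical lineage records: "supersedes" points at the replaced revision.
def authoritative_ids(docs):
    """Return ids not superseded by any newer revision.

    Superseded docs stay in the store for audit but are filtered out of
    default retrieval pathways.
    """
    superseded = {d["supersedes"] for d in docs if d.get("supersedes")}
    return {d["id"] for d in docs} - superseded

docs = [
    {"id": "travel-v1", "supersedes": None},
    {"id": "travel-v2", "supersedes": "travel-v1"},
    {"id": "expenses-v1", "supersedes": None},
]
print(sorted(authoritative_ids(docs)))  # travel-v1 retained but non-authoritative
```

A historical query would bypass this filter explicitly, which keeps the audit trail intact without letting stale text leak into everyday answers.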
We also learned to separate update propagation from model behavior updates. Policy refreshes must go live quickly; model tuning should move on a different cadence. Decoupling these pathways reduced release risk and gave governance teams clearer control.
Multilingual Delivery and Polyglot Batch
KAI serves users in multiple languages, but policy precision cannot degrade in translation. We integrated Polyglot Batch so key policy phrasing can be translated with domain-aware controls and consistency checks. This is especially valuable for repeated terms where literal translation is technically correct but operationally misleading.
The batch workflow also makes audits easier. Instead of one-off translations buried in chat logs, teams get versioned translation outputs that can be reviewed and approved centrally.
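The post does not describe Polyglot Batch internals, but a consistency check over controlled terminology might look like the following sketch. The glossary pair is an invented example of a literal-but-misleading translation:

```python
# Hypothetical glossary check: flag outputs where a controlled source term
# is not rendered with its approved target-language phrasing.
def glossary_violations(pairs, glossary):
    """pairs: (source_text, translated_text); glossary: {src_term: approved_target}."""
    violations = []
    for src, tgt in pairs:
        for term, approved in glossary.items():
            if term.lower() in src.lower() and approved.lower() not in tgt.lower():
                violations.append((term, tgt))
    return violations

glossary = {"garden leave": "Freistellung"}
pairs = [
    ("Garden leave rules apply.", "Es gelten die Regeln zur Freistellung."),
    # Literal rendering: technically a translation, operationally misleading.
    ("Garden leave rules apply.", "Es gelten die Regeln zum Gartenurlaub."),
]
print(glossary_violations(pairs, glossary))  # only the literal rendering is flagged
```

Running this over versioned batch outputs is what makes central review practical: reviewers approve a term once instead of re-litigating it chat by chat.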
Evaluation: Beyond "Did the Answer Sound Good?"
We evaluate KAI on several dimensions at once:
- Grounding precision: percentage of claims supported by authoritative in-scope sources.
- Scope adherence: how often responses stay within role, country, and policy boundaries.
- Escalation quality: correctness of cases routed to HR instead of answered automatically.
- Update sensitivity: performance before and after policy revisions.
- User resolution rate: percentage of questions resolved without unsafe overreach.
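Two of the metrics above can be sketched over labeled review records. The record schema is illustrative, not KAI's actual evaluation harness:

```python
# Grounding precision: share of emitted claims backed by in-scope sources.
def grounding_precision(records):
    claims = [c for r in records for c in r["claims"]]
    return sum(c["supported"] for c in claims) / len(claims)

# Escalation quality: share of escalated cases that genuinely needed a human.
def escalation_quality(records):
    escalated = [r for r in records if r["escalated"]]
    return sum(r["should_escalate"] for r in escalated) / len(escalated)

records = [
    {"claims": [{"supported": True}, {"supported": True}],
     "escalated": False, "should_escalate": False},
    {"claims": [{"supported": True}, {"supported": False}],
     "escalated": True, "should_escalate": True},
    {"claims": [],  # answered nothing, escalated unnecessarily
     "escalated": True, "should_escalate": False},
]
print(grounding_precision(records))  # 3 of 4 claims supported -> 0.75
print(escalation_quality(records))   # 1 of 2 escalations correct -> 0.5
```

Tracking escalation quality separately is what surfaces the leading-indicator effect described below: it can degrade while fluency and retrieval metrics stay flat.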
The escalation metric turned out to be a leading indicator. When escalation quality dropped, downstream incidents increased, even if answer fluency and retrieval metrics looked stable.
What We Changed After Real Usage
Real usage exposed behaviors we had underestimated. Employees often ask compound questions that mix policy interpretation, local practice, and personal context. Initial versions tried to answer everything at once, which increased ambiguity. We shifted to response decomposition: KAI now separates what policy says, what depends on local implementation, and what requires case-specific HR review.
We also added explicit "assumption banners" for responses that infer missing context. This simple pattern significantly reduced misinterpretation, because users could see exactly what KAI assumed and correct it quickly.
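The decomposition and assumption-banner patterns can be combined in response assembly. Section labels here mirror the three-way split described above; the wording is an assumption, not KAI's production template:

```python
# Illustrative response assembly: assumptions first, then the three-way split
# between policy text, local implementation, and case-specific HR review.
def assemble_response(policy_text, local_text, case_text, assumptions):
    parts = []
    if assumptions:
        parts.append("Assumptions: " + "; ".join(assumptions)
                     + ". Correct me if any of these are wrong.")
    parts.append("What policy says: " + policy_text)
    if local_text:
        parts.append("Depends on local implementation: " + local_text)
    if case_text:
        parts.append("Needs HR review: " + case_text)
    return "\n\n".join(parts)

msg = assemble_response(
    policy_text="Relocation support covers up to 30 days of temporary housing.",
    local_text="Housing allowances vary by country; check the local annex.",
    case_text="Your existing lease terms need a case-specific decision.",
    assumptions=["you are a full-time employee", "your destination is in the EU"],
)
print(msg)
```

Putting the banner first is deliberate: the user can correct a wrong assumption before acting on anything below it.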
Security and Governance Lessons
KAI is built on secure AI frameworks with tenant isolation and auditable event logs. That sounds standard, but governance only works if visibility is practical. We invested in review dashboards that let HR and compliance teams inspect answer patterns, high-risk topics, and source usage trends without technical mediation.
This governance layer changed adoption conversations. Instead of debating abstract AI risk, teams could inspect concrete evidence of system behavior. Trust moved from promise to proof.
What Is Next for KAI
Our next step is deeper integration with change-intelligence workflows so policy questions and broader labour-relations scenarios can share context safely. We are also improving personalized clarification prompts, so KAI can ask for missing scope details earlier without creating conversational friction.
The lesson from this build is simple: an HR assistant is only as good as its grounding discipline. Natural language quality matters, but trust comes from evidence, scope control, and honest uncertainty handling. That is where we continue to invest.