At Fin Labs Paris, we announced Monitors, a new product area for Fin that sits alongside Insights and Recommendations to give you a full observability suite – and confidence in what Fin is doing.
With Monitors, you define which conversations get reviewed – both Fin’s and your human team’s – and set evaluation criteria using Custom Scorecards. This ensures you’re monitoring the metrics that matter most to your business, giving you complete control over support quality.
When used alongside Insights and Recommendations, you have everything you need to see what’s happening across your support operation, hold every conversation to your bar, and continuously drive toward perfect customer experiences.
As Agents become more powerful, transparency and control become critical
AI is getting more capable, fast. Everyone working in this industry feels it.
We now have Agents executing complex tasks, like handling real customer conversations with real consequences. Fin has almost 8,000 customers, averages a 67% resolution rate, and resolves close to 2 million customer queries every single week, including highly complex queries in regulated industries.

As Agents take on deeper work at that scale, observability is essential. Most support leaders today can’t confidently answer basic questions about what their Agent is actually doing: whether it’s delivering good experiences, resolving complex issues completely, representing their brand the way a human would, and doing all of it reliably, conversation after conversation. The old infrastructure can’t help them: CSAT scores and QA samples have always been limited in scope, and they don’t scale effectively.
The result is a black box.
What teams need most right now is confidence. That’s why we designed Fin with transparency at its center, because teams should feel empowered to understand and optimize it on their own.
At Intercom, this is called the Fin Flywheel: Train, Test, Deploy, Analyze.

Analyze is the step where you find out what’s actually happening, and it’s where improvement begins.
We believe that having confidence in your AI operation requires three things:
- A complete understanding of what Fin, your human team, and your customers are talking about.
- A way to monitor and score conversations based on the criteria that matter most to your business.
- AI-powered recommendations that make it easy to act on what you find.
To address these, last year we launched Insights and Recommendations. Now, we’re announcing Monitors – completing a system for full observability, and the key to opening the black box.
Monitors: Know whether every conversation met your standards
Knowing how a customer felt about a conversation is different from knowing whether it was handled correctly; both are important to service quality.
Monitors is a new QA capability that delivers a structured, repeatable way to define which conversations get reviewed — and evaluate them against quality criteria you set. It replaces ad-hoc sampling and spreadsheet-driven QA with a system that scales as your volume grows.
Two components work together: Monitors define what gets reviewed and Custom Scorecards define how each conversation is evaluated.
The right conversations, and the right coverage
Random sampling was always a blunt tool. When AI is handling thousands of conversations a week, a small, arbitrary slice won’t reliably capture your highest-risk edge cases, your most complex escalations, or where quality is starting to drift.
With Monitors, you define how conversations are selected and evaluated. That can mean targeting specific signals of risk or failure, like “the customer showed signs of financial vulnerability” or “Fin looped around with the same answer without resolving the issue.” Or it can mean creating consistent, repeatable generic samples to benchmark quality over time. You set these criteria from an existing list of filters, based on customer data, channel, or Fin-specific metrics, or use natural language to describe instances with more nuance.

You can combine both approaches: home in on the conversations that matter most and maintain a steady, structured QA sample each week.
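To make the two modes concrete, here’s a minimal sketch of how a Monitor definition could be modeled. The shape, field names, and the natural-language filter are illustrative assumptions, not Intercom’s actual API:

```typescript
// Hypothetical Monitor definition – an illustration of the concepts,
// not Intercom's real schema or API.
type MonitorFilter =
  | { kind: "attribute"; field: string; op: "eq" | "gt" | "contains"; value: string | number }
  | { kind: "natural_language"; description: string }; // nuanced, AI-evaluated criteria

interface Monitor {
  name: string;
  filters: MonitorFilter[]; // which conversations get selected for review
  sampleRate?: number;      // 0..1: a steady slice for benchmarking over time
  scorecardId: string;      // how each selected conversation is evaluated
}

// A targeted, risk-based Monitor:
const vulnerabilityMonitor: Monitor = {
  name: "Financial vulnerability signals",
  filters: [
    { kind: "attribute", field: "channel", op: "eq", value: "chat" },
    {
      kind: "natural_language",
      description: "The customer showed signs of financial vulnerability",
    },
  ],
  scorecardId: "sc_regulated_support",
};

// A steady benchmarking Monitor: 5% of all Fin conversations.
const weeklyBenchmark: Monitor = {
  name: "Weekly quality benchmark",
  filters: [{ kind: "attribute", field: "handled_by", op: "eq", value: "fin" }],
  sampleRate: 0.05,
  scorecardId: "sc_default_quality",
};
```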

Custom Scorecards: Enforce your standards, consistently
Every business operates differently, so a one-size-fits-all quality rubric won’t reflect your priorities, your trade-offs, or what your customers actually care about.
Custom Scorecards let you define what “good” looks like for your business and turn that into a custom quality score for every conversation.
You define the criteria that matter, how each should be measured, and how important each one is. Some criteria can be scored automatically by AI, others reviewed by a human, or both – all within the same scorecard. This means you’re not choosing between scale and judgment; you get both in one system.
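As a rough mental model, a Custom Scorecard might look like the sketch below. The field names and schema are assumptions for illustration, not the product’s data model:

```typescript
// Hypothetical Custom Scorecard schema – illustrative only.
type Evaluator = "ai" | "human" | "both";

interface Criterion {
  id: string;
  question: string;     // what "good" means for this criterion
  evaluator: Evaluator; // scored automatically, by a reviewer, or both
  weight: number;       // relative importance in the overall score
  critical?: boolean;   // a failure here fails the whole evaluation
}

interface Scorecard {
  id: string;
  name: string;
  criteria: Criterion[];
}

const regulatedSupport: Scorecard = {
  id: "sc_regulated_support",
  name: "Regulated support quality",
  criteria: [
    { id: "accuracy", question: "Was the answer correct per current policy?",
      evaluator: "ai", weight: 3, critical: true },
    { id: "tone", question: "Did the reply match our brand voice?",
      evaluator: "ai", weight: 1 },
    { id: "resolution", question: "Was the issue fully resolved?",
      evaluator: "both", weight: 2 },
  ],
};
```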

Each conversation is then evaluated against these criteria, and the system calculates an overall quality score based on your configuration. You can weight what matters most, or mark certain criteria as critical, so a single failure can fail the entire evaluation when needed.
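That weighting and critical-failure behavior reduces to a short calculation. Here’s a sketch of the logic under the assumed schema above; the product’s exact formula may differ:

```typescript
// Minimal types so this sketch stands alone (see the scorecard sketch above).
interface Criterion { id: string; weight: number; critical?: boolean }
interface Scorecard { criteria: Criterion[] }

// Overall score: a weighted average of per-criterion results in [0, 1],
// where a single critical failure fails the entire evaluation.
function overallScore(card: Scorecard, results: Record<string, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const c of card.criteria) {
    const score = results[c.id] ?? 0;
    if (c.critical && score === 0) return 0; // one critical failure fails everything
    weighted += c.weight * score;
    totalWeight += c.weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// Example with weights 3/1/2 and scores 1/0.5/1:
// (3*1 + 1*0.5 + 2*1) / 6 ≈ 0.92
```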
The result is a single, consistent quality score that reflects your standards — not a generic metric, and not a collection of disconnected checks. This is what makes quality measurable over time. You can track how your AI and human support are performing against the same definition of “good,” and see where things are improving or breaking down.
There’s an important distinction here: CX Score tells you how customers felt about a conversation. Custom Scorecards tell you whether it met your standards. You need both.

Review Queue: Turn flags into fixes
When conversations are flagged for human review based on your criteria, they’re placed in the Review Queue. There, every conversation matched by a Monitor is automatically assigned to the right reviewer, with its scorecard attached and its review status tracked:
- Not reviewed
- Reviewed
- Needs a fix
- Fix complete

Reviewers work through conversations directly inside Intercom, filling in scorecard criteria as they go.
When a conversation fails, they mark it, add a note on what went wrong, and can suggest potential solutions such as updating documentation. These conversations then move to follow-up, where the team can apply fixes.
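In data terms, that lifecycle might look like the sketch below. The item shape and function names are hypothetical, not Intercom’s API; they just make the status flow explicit:

```typescript
// Hypothetical Review Queue item and its lifecycle – illustrative only.
type ReviewStatus = "not_reviewed" | "reviewed" | "needs_a_fix" | "fix_complete";

interface ReviewItem {
  conversationId: string;
  monitorName: string;   // which Monitor matched this conversation
  reviewerId: string;    // assigned automatically
  status: ReviewStatus;
  note?: string;         // what went wrong
  suggestedFix?: string; // e.g. "update the refunds help article"
}

// A reviewer marks a failed conversation and routes it to follow-up.
function markFailed(item: ReviewItem, note: string, suggestedFix?: string): ReviewItem {
  return { ...item, status: "needs_a_fix", note, suggestedFix };
}

// Once the team applies the fix, the loop closes at an improvement.
function completeFix(item: ReviewItem): ReviewItem {
  return { ...item, status: "fix_complete" };
}
```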
This means that nothing gets lost in a spreadsheet or a Slack thread, and QA stops being a loop that ends at a score and becomes one that ends at an improvement.
Reporting: Use QA as a continuous signal
Reporting is where your quality scores connect to everything else. You can track review scores over time, across Monitors and Scorecards, and compare them directly against CX Score, resolution rate, and other performance metrics. 
Patterns that were previously invisible become clear: a specific topic consistently underperforming, a quality dip that correlates with a recent knowledge base change, a team whose scores are improving week on week. QA data becomes a continuous signal for how the operation is improving, not a one-off exercise that lives in a separate tool.
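As an illustration of that kind of roll-up (the record shape is an assumption, not a real export format), here’s a sketch that averages quality scores per week and topic so trends stand out:

```typescript
// Roll up review scores by week and topic to surface trends –
// a sketch over an assumed record shape, not a real Intercom export.
interface ScoredConversation {
  week: string;         // ISO week, e.g. "2025-W06"
  topic: string;
  qualityScore: number; // 0..1, from the scorecard evaluation
  cxScore?: number;     // how the customer felt, for side-by-side comparison
}

function weeklyAverages(rows: ScoredConversation[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const r of rows) {
    const key = `${r.week}:${r.topic}`;
    const s = sums.get(key) ?? { total: 0, n: 0 };
    s.total += r.qualityScore;
    s.n += 1;
    sums.set(key, s);
  }
  const out = new Map<string, number>();
  for (const [key, s] of sums) out.set(key, s.total / s.n);
  return out; // e.g. "2025-W06:refunds" -> 0.62 flags a consistently weak topic
}
```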
What’s coming next
Monitors for Fin conversations is live today, and we’re already building what comes next.
- Human agent QA will bring the same structured evaluation to your human team’s conversations, giving you one consistent quality system across your entire support operation.
- Real-time alerts will notify you the moment a conversation crosses a threshold you’ve defined — before the issue reaches more customers.
- Knowledge base evaluation will connect AI scoring directly to your content, so conversations are assessed against your latest policies and documentation, catching inaccurate or outdated responses and providing clear rationale linked to the relevant source.
Open the black box
Creating perfect customer experiences with AI requires transparency. You need to understand how the system is performing if you want to maintain and improve quality over time. With Insights, Monitors, and Recommendations, this is now possible. It’s a complete analysis suite that allows you to see what’s happening across every conversation, ensure it’s meeting your standards, and identify improvement opportunities when you need them.
Everything in the Analyze step of the Flywheel was built in close partnership with our customers – your use cases, your feedback, your honest conversations about what wasn’t working. That partnership is why Fin is the highest-performing Agent in the market. And we’re not slowing down.