Trust What You See

Jonathan Robinson, PhD

I had a conversation with Blake, our Product Owner for Sentry, that I haven’t been able to stop thinking about. We were talking about a client who’s currently testing three data quality solutions side by side: Sentry and two well-known competitors. And the client noticed something strange. The three products were flagging completely different people.

You’d think solutions claiming to do the same thing would at least agree on who the bad actors are. They don’t. There’s almost no overlap. One tool flags one group. Another flags a different group. Sentry flags a third. So the obvious question is: who’s right?
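That disagreement is easy to quantify. A minimal sketch, assuming each tool exports a list of flagged participant IDs (the tool names and IDs below are hypothetical, purely for illustration):

```python
def jaccard(a, b):
    """Overlap between two tools' flag sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical flag lists keyed by participant ID.
tool_a = ["p1", "p2", "p3"]
tool_b = ["p4", "p5", "p6"]
tool_c = ["p3", "p7", "p8"]

print(jaccard(tool_a, tool_b))  # 0.0 -> no agreement at all
print(jaccard(tool_a, tool_c))  # 0.2 -> one shared flag out of five
```

When supposedly equivalent tools score near zero on a measure like this, at most one of them can be mostly right.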

This is where I get frustrated. Most people in this industry skip right past that question. They look at the numbers and say, well, each tool is catching something, so maybe we should use all of them. More flags must mean more protection, right?

Wrong. Without a ground truth, you have no idea what you’re looking at.

What I Mean by Ground Truth

Let me explain what I mean by ground truth, because this is the thing that drives me crazy about how data quality gets evaluated in our industry.

Here’s what a lot of researchers do. They run a study. They apply a data quality tool. They look at how many participants got tossed versus how many got through. And then they decide the tool is working based on… what exactly? They didn’t know ahead of time how many participants should have been tossed. They have no baseline. No expected rate of fraud. No independent measure of who was actually a bad respondent. They’re just looking at a number and deciding it feels about right.

Without a ground truth, you’re not doing quality control. You’re guessing.

If I told you I had a bouncer at a nightclub who turned away 30% of the people in line, would you think he was good at his job? You have no idea. Maybe 30% of the crowd was underage. Maybe 5% was. Maybe none of them were. Without knowing who should and shouldn’t get in, the rejection rate tells you absolutely nothing. You’re just watching a guy stand at a door and saying he looks busy, so he must be doing something.

Our internal analysis of one well-known competitor showed something that should make people uncomfortable: its true positive and false positive rates are nearly equal. Think about what that means. It’s removing people more or less at random. Roughly a third of online sample tends to be problematic, so if you toss one in every three people at random, you’ll accidentally remove some bad ones and end up with passable-looking results. But it’s not because the tool works. It’s because the math works out when you flip a coin enough times. That’s not a product. That’s probability.
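A quick simulation makes the coin-flip point concrete. This is a minimal sketch: the 1-in-3 rates come from the paragraph above, and everything else is synthetic.

```python
import random

random.seed(0)
N = 100_000
BAD_RATE = 1 / 3   # assumed share of problematic respondents
FLAG_RATE = 1 / 3  # tool flags one in three, independent of quality

participants = [random.random() < BAD_RATE for _ in range(N)]  # True = bad
flags = [random.random() < FLAG_RATE for _ in participants]

tp = sum(1 for bad, f in zip(participants, flags) if bad and f)
fp = sum(1 for bad, f in zip(participants, flags) if not bad and f)
n_bad = sum(participants)
n_good = N - n_bad

tpr = tp / n_bad   # P(flagged | bad)
fpr = fp / n_good  # P(flagged | good)
print(f"TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Run it and the two rates come out nearly identical, hovering around 0.33. A flagger that ignores quality entirely still "catches" a third of the bad actors, which is exactly the signature described above.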

Why Sentry Is Different

Sentry is different, and I want to be specific about why.

In blind validation testing against sophisticated AI agents, Sentry’s integrated detection system achieved a greater than 99% detection rate with a false positive rate below 1%. That means it catches virtually everything that should be caught and almost never flags a legitimate participant by mistake. When competitors are running near-equal true and false positive rates, we’re operating at 99% and under 1%. Those aren’t in the same universe.
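To see what those rates mean at survey scale, here is some back-of-the-envelope arithmetic. The sample size and the 1-in-3 problem rate are illustrative assumptions; the detection rates are the ones cited above.

```python
def outcomes(n, bad_rate, tpr, fpr):
    """Expected counts of surviving bad and wrongly removed good respondents."""
    n_bad = n * bad_rate
    n_good = n - n_bad
    return {
        "bad_missed": n_bad * (1 - tpr),  # bad respondents that slip through
        "good_removed": n_good * fpr,     # legitimate respondents lost
    }

N, BAD_RATE = 1000, 1 / 3  # illustrative 1,000-person sample

print(outcomes(N, BAD_RATE, tpr=0.99, fpr=0.01))  # 99% / <1% regime
print(outcomes(N, BAD_RATE, tpr=0.33, fpr=0.33))  # near-random regime
```

In the first regime, a 1,000-person study loses a handful of legitimate respondents and lets only a few bad ones through. In the second, over two hundred bad respondents survive and over two hundred good ones are thrown away.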

But here’s what matters even more than the numbers. Sentry is the only solution with face validity.

Face validity means you don’t have to take anyone’s word for it. You can observe participants taking a study and see, with your own eyes, the behaviors you don’t want. Sentry’s event-streaming recorder shows you exactly what’s happening: someone translating the survey into another language, someone pasting ChatGPT-generated responses into open-ended fields, someone using browser plugins to auto-fill answers, someone blowing through questions with mouse movements that no actual human would make. You can watch the recordings yourself. You decide.

Blake put it perfectly on a client call: “You don’t have to trust us. You don’t need to take our word for it. Just look at the recording and decide for yourself. ‘Trust me bro’ is not our pitch.”

When clients see those recordings, the reaction is always the same. It’s cut and dried. You watch someone take a survey and you know, instantly, whether you’d want that person in your data. There’s no algorithm score to interpret, no black box confidence level to agonize over. It’s a person doing something obviously wrong, caught on camera. Or it’s a person responding normally and thoughtfully. Your gut tells you which is which in about two seconds.

We’re Accelerating

And we’re not standing still. We’re accelerating. Our Red Team / Blue Team approach means we’re constantly attacking our own systems with the most sophisticated AI agents we can build, then hardening our defenses against whatever gets through. Our Red Team builds AI bots designed specifically to beat Sentry. When they succeed, our Blue Team builds new countermeasures. When they fail, the Red Team gets smarter. This isn’t a once-a-year security audit. It’s a continuous arms race we run against ourselves, and it’s why we keep pulling further ahead. The threats are evolving fast. We need to evolve faster. And we are.

We’ve Stopped Trusting Our Own Instincts

But here’s what’s been eating at me. Despite all of this, I still hear from otherwise smart, experienced industry professionals that they use competing tools, or trust threat scores from products with no observable evidence behind them. And I keep asking myself: why?

I wrote in my last blog about how ChatGPT identified one of my blind spots: I assume that if the evidence is clear, people will just follow it. They won’t. And I’m learning to accept that. But this goes deeper than marketing and light shows. This is about something I think we’ve lost as an industry and maybe as a culture.

We’ve stopped trusting our own instincts.

When you watch a Sentry recording of a bad actor taking a survey, you don’t need a PhD to know something is off. You don’t need a threat score or a confidence interval. You just know. It’s the same instinct you’d use at a dinner party. Something about that person isn’t right. You wouldn’t invite them back. You’d trust that feeling without a second thought.

But somehow, in a professional context, people won’t trust that same instinct. They need an algorithm to tell them what they can already see. They defer to a tool with a slick dashboard even when the tool’s own numbers show it’s barely better than a coin flip, because the tool has a brand name or because someone with a title recommended it. Questioning the expert feels risky, so they don’t.

That’s backwards. Truth resonates. You know it when you see it. Sentry simply lets you see it.

Evidence Over Authority

I think about this a lot in terms of how we run CloudResearch. One of the things I’m most proud of is that nobody here goes by their title. Everyone is on a first-name basis. We expect the most junior person on the team to feel comfortable calling out something that doesn’t make sense, and we expect the most senior person to be comfortable when they’re the one getting called out. There are no “experts” who are above questioning. There are no degrees or titles that serve as shields against scrutiny.

That’s not an accident. It’s a design choice. The moment you create a culture where authority substitutes for evidence, you end up with the same problem the industry has with data quality tools. People stop looking at what’s actually in front of them and start deferring to whoever sounds most confident.

And here’s what’s wild. Outside of CloudResearch, there really aren’t data quality “experts” with evidence to show. Not in the way people assume. I had a candid, off-the-record conversation recently with the CEO of one of the major sample providers. He told me flat out that his company is run by salespeople, himself included. That’s not a criticism of him. He was being honest. Most of these data quality companies are sales organizations first. They have great pitch decks and impressive dashboards and confident spokespeople. What they don’t have is a research team that can show you the evidence.

CloudResearch is different. About 25% of our staff have PhDs. Research is literally our name. It’s not a brand exercise. It’s who we are and what we do. Our expertise doesn’t hide behind a title or a sales deck. It shows up in results that anyone can verify.

A Surprising Discovery

Sometimes those results surprise even me.

Yesterday, Josh, our Machine Learning Team Lead who guides both our Red and Blue teams, showed me something that failed the prima facie test. He had evidence that using ML, we can tell with 90% accuracy whether a survey participant is from CloudResearch Connect or from other popular platforms, just by their behavior as they take a survey. Not their demographics or their answers. Their actual behavior on the page.

My gut said that couldn’t be right. The signal can’t be that strong. But Josh showed me the evidence, and it wasn’t just right. It was overwhelming. Connect participants are significantly more diligent than participants from other platforms. Not a little more. By orders of magnitude. The behavioral fingerprint of a careful, engaged human respondent is so distinct that a machine learning model can identify which platform they came from just by watching them work.
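For illustration only, here is roughly what a behavioral classifier of that kind could look like. Everything below is synthetic and my own stand-in: the feature names, the distributions, and the nearest-centroid model are assumptions for the sketch, not CloudResearch’s actual pipeline.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical behavioral features: seconds per question,
# cursor-path entropy, open-end edit count.
def simulate(diligent, n):
    """Generate synthetic feature vectors; diligent respondents read and revise more."""
    base = (22.0, 3.1, 4.0) if diligent else (6.0, 1.2, 0.5)
    return [tuple(random.gauss(m, m * 0.2) for m in base) for _ in range(n)]

train = {"connect": simulate(True, 200), "other": simulate(False, 200)}
centroids = {k: tuple(mean(v[i] for v in rows) for i in range(3))
             for k, rows in train.items()}

def classify(x):
    """Assign a respondent to the nearest platform centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

holdout = [(v, "connect") for v in simulate(True, 100)] + \
          [(v, "other") for v in simulate(False, 100)]
acc = mean(classify(v) == label for v, label in holdout)
print(f"accuracy: {acc:.2f}")
```

The point of the sketch: when two populations differ this much in how they behave, even a trivially simple model separates them. The real finding is the size of the behavioral gap, not the cleverness of the model.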

That was a serendipitous discovery. Nobody set out to find it. But it tells you a lot about what happens when you actually invest in participant quality at the recruitment level instead of trying to patch it after the fact with a screening tool that’s barely better than chance.

Trust the Evidence

And here’s the part that brings this full circle for me. My instincts said Josh was wrong. The evidence said he was right. I trusted the evidence. That’s exactly what I’m asking the industry to do with Sentry. Don’t trust me. Don’t trust my title or my pitch. Just look at what’s actually there.

Your instincts are right more often than you think, and you should follow them. But sometimes the evidence overrides your instincts, and you need to follow that instead. The key is building an environment where both are welcome, where the most junior person can challenge the CEO, and where evidence always wins over authority. That’s how we built CloudResearch. That’s how we built Sentry. And that’s why it works.

The world has too many sales teams masquerading as experts and too many dashboards substituting for evidence. People defer to confidence when they should be demanding proof.

Sentry doesn’t ask for your trust. It gives you a window. Look through it and decide for yourself.
