How to Write and QA Cold Email at Scale

Jun 21, 2026 · Cristian Frunze · 7 min

Tags: feature, cold email, deliverability, ai

To write and QA cold email at scale, split the work into roles a system can hold: humans build the frameworks and give the final approval, an AI drafts from those frameworks, and a second, independent AI reviews every draft against a fixed checklist before anyone sends. The review is what keeps quality from sliding as volume grows, because no person can hand-check thousands of emails across a dozen segments without missing the same mistakes a checklist catches every time.

Most teams do the opposite. They let one AI write the email and ship whatever it produced, then learn it was bad weeks later from a flat reply rate. By then the domain has already taken the hit. This is a walkthrough of the system our team uses to write and review copy at volume: the human frameworks at the front, the automated review step in the middle, and the human approval at the end.

Why does AI-written cold email need a review step at all?

Because the model that writes a cold email is fluent, fast, and slightly addicted to sounding like marketing. Left alone, it reaches for the exact phrases that spam filters were trained to catch, pads three good sentences into eight, and ends on a polite question instead of an ask. None of that is a bug in the model. It is what "write me a persuasive sales email" actually produces.

The deeper problem is that bad cold email is expensive in a way you do not see immediately. A salesy word or a false claim does not just cost you one reply. It teaches inbox providers that your sending domain belongs to a marketer, and that reputation follows you into every future send. Quality and deliverability are the same problem. We wrote more about why we built our entire stack around that idea on our why Ken page.

The core mistake: letting the AI that writes also be the AI that approves

If you ask one model to write a cold email and then ask the same model whether the email is good, it will tell you it is great. Every time. Models grade their own work generously, because the same instinct that produced the copy is the one judging it.

The fix is structural, not a better prompt. Split the job into two roles. One agent writes. A second, separate agent reviews, and its only job is to try to fail the draft. The writer is rewarded for a clean, persuasive email. The reviewer is rewarded for catching what is wrong with it. When those two incentives are pointed at the same draft, the quality that survives is real, not self-reported.

What should a cold email QA checklist actually check?

A useful review is not vibes. It is a fixed list of failure modes, each one machine-checkable, so the same email gets graded the same way every time. Here is the core of what ours looks for.

| What the reviewer checks | Why it matters | What fails it |
| --- | --- | --- |
| Spam-trigger words in the body | Filters were trained on a decade of bad cold email and know the tells | "free," "guarantee," "offer," "buy," currency and percent symbols |
| Salesy follow-up endings | The moment an email announces it is a follow-up, it reads like a chore | "just following up," "circling back," "checking in," "last note from me" |
| First-line bridge | A personalized opener that the body ignores reads like two emails stitched together | Line 1 is about the prospect, line 2 jumps to a generic topic |
| Brevity | A busy person decides to reply in seconds. Long reads as ignored | Body over 80 words, or follow-ups over 55 |
| Real ask, not a diagnostic question | A question about the prospect puts the work back on them | "Do you have a system for this?" used as the close |
| Claims match the real offer | A claim the product cannot back is both a spam tell and a trust problem | "No credit card required" when a card is in fact required |

The point of writing it down is consistency. A human reviewer catches different things on a Monday than a Friday. A fixed checklist catches the same things at email number one and email number ten thousand.
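As a rough illustration, the string and length rows of the checklist can be expressed as a small Python function. This is a minimal sketch, not our production reviewer: the names `check_email`, `SPAM_WORDS`, and the rest are hypothetical, the phrase lists only repeat the examples from the table, and the bridge, ask, and claim checks are left out because they need judgment or offer data rather than string matching.

```python
import re

# Example values copied from the checklist table; not a complete filter vocabulary.
SPAM_WORDS = ("free", "guarantee", "offer", "buy")
SPAM_SYMBOLS = ("$", "€", "£", "%")
FOLLOW_UP_PHRASES = (
    "just following up",
    "circling back",
    "checking in",
    "last note from me",
)
MAX_WORDS_FIRST = 80      # first-touch body limit from the table
MAX_WORDS_FOLLOW_UP = 55  # follow-up limit from the table


def check_email(body: str, is_follow_up: bool = False) -> list[str]:
    """Return checklist failures; an empty list means none were found."""
    failures = []
    lowered = body.lower()

    for word in SPAM_WORDS:
        # Assumption: exact whole-word match, so "buy" does not fire on "buyer".
        # Inflections such as "guaranteed" would need their own entries.
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            failures.append(f"spam word: {word}")

    for symbol in SPAM_SYMBOLS:
        if symbol in body:
            failures.append(f"spam symbol: {symbol}")

    for phrase in FOLLOW_UP_PHRASES:
        if phrase in lowered:
            failures.append(f"follow-up phrase: {phrase}")

    limit = MAX_WORDS_FOLLOW_UP if is_follow_up else MAX_WORDS_FIRST
    word_count = len(body.split())
    if word_count > limit:
        failures.append(f"too long: {word_count} words (limit {limit})")

    return failures
```

Because the function is deterministic, the same body produces the same failure list on every run, which is the consistency property the checklist is meant to provide.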

The quality gate: a score and a hard rule check, separately

Once the draft is reviewed, it gets a single quality score, decimals allowed, and it has to clear a minimum bar before it can move forward. We set that bar high.

The part worth copying is that the score is not the only gate. There is a second, independent gate: the hard rules. A draft can score beautifully and still get rejected outright if it breaks one non-negotiable rule, like a banned spam word or a claim that contradicts the offer. The score and the rule check are ANDed together. A near-perfect email with one landmine in it is rejected exactly like a weak one. That stops the model from writing its way past a deliverability problem with otherwise lovely copy.

Two gates matter because they catch two different kinds of failure. The score catches mediocre. The hard rules catch dangerous. You want both.
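The AND between the two gates is easy to get wrong in practice, so here is a minimal Python sketch of it. The `Review` type, the `passes_gate` function, and the `MIN_SCORE` value of 8.5 on an assumed 0 to 10 scale are all hypothetical; the only parts taken from the text above are that the score may be a decimal, that it must clear a minimum bar, and that any hard-rule violation rejects the draft regardless of score.

```python
from dataclasses import dataclass, field

MIN_SCORE = 8.5  # hypothetical bar on an assumed 0-10 scale


@dataclass
class Review:
    score: float  # quality score, decimals allowed
    rule_violations: list[str] = field(default_factory=list)  # hard-rule hits


def passes_gate(review: Review, min_score: float = MIN_SCORE) -> bool:
    """Pass only if the score clears the bar AND there are no hard-rule violations."""
    clears_score = review.score >= min_score
    clears_rules = not review.rule_violations
    return clears_score and clears_rules
```

Under these assumptions, `passes_gate(Review(score=9.9, rule_violations=["spam word: free"]))` is rejected exactly like `passes_gate(Review(score=6.0))`, which mirrors the near-perfect email with one landmine in it.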

The revision loop: fail, fix, recheck

A review step that only says "this failed" is half a system. The useful version closes the loop. When a draft fails, the reviewer does not quietly patch it. It writes specific revision notes, organized by email, and hands the draft back to the writer. The writer revises, the reviewer grades again, and that cycle repeats up to three times before any human sees the copy.

The effect is that the first two rounds of editing, the rounds that used to eat a copywriter's afternoon, happen automatically. A person only steps in once the copy has already cleared the bar, to add the judgment a checklist cannot. That is the difference between AI replacing the writer and AI doing the writer's least valuable work so the writer can do the rest.
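The fail, fix, recheck cycle can be sketched as a bounded loop in Python. This is an illustration under assumptions, not our engine's code: `revision_loop`, `review`, and `revise` are hypothetical names, the reviewer is modeled as a function that returns revision notes (empty when the draft passes), and returning a failure flag after the third round is our guess at a sensible default, since what happens to a draft that never passes is a policy choice.

```python
from typing import Callable

MAX_ROUNDS = 3  # the cycle repeats up to three times


def revision_loop(
    draft: str,
    review: Callable[[str], list[str]],       # returns revision notes; empty means pass
    revise: Callable[[str, list[str]], str],  # writer applies the notes to the draft
    max_rounds: int = MAX_ROUNDS,
) -> tuple[str, bool, int]:
    """Review, then revise and re-review until the draft passes or rounds run out.

    Returns (final draft, passed, revision rounds used).
    """
    notes = review(draft)
    rounds = 0
    while notes and rounds < max_rounds:
        # The reviewer never patches the draft itself; it only hands back notes.
        draft = revise(draft, notes)
        rounds += 1
        notes = review(draft)
    return draft, not notes, rounds
```

Keeping `review` and `revise` as separate callables preserves the split between the agent that writes and the agent that tries to fail the draft.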

How Ken AI writes and QAs every email at scale

This is not a side feature for us. It is how our client success managers write and review every email in every campaign. A human owns the framework and the final approval. The drafting agent and the reviewing agent run in between, as separate AI roles inside our campaign engine, per audience segment, in parallel, so a campaign split into ten segments gets ten independent copy reviews at once. That is the only way one person can stand behind the quality of thousands of emails. An agency doing this by hand cannot match it without throwing bodies at the problem, and the quality drifts the moment they get busy.
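To make the per-segment, in-parallel idea concrete, here is a minimal Python sketch using the standard library's thread pool. It does not describe how our campaign engine is built: `review_segments` and its parameters are hypothetical, and threads are used only on the assumption that each review is an I/O-bound call to a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def review_segments(
    drafts_by_segment: dict[str, str],
    review: Callable[[str], list[str]],  # hypothetical reviewer: draft -> failure notes
    max_workers: int = 10,
) -> dict[str, list[str]]:
    """Review each segment's draft independently and in parallel."""
    segments = list(drafts_by_segment)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with segments.
        results = list(pool.map(review, (drafts_by_segment[s] for s in segments)))
    return dict(zip(segments, results))
```

Each segment gets its own review result, so one segment failing does not hold back or contaminate the others.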

It sits on top of the rest of our quality engine: a replicated spam-filter check that rewrites any email that would land in spam, triple email verification, and our own sending infrastructure on dedicated IPs instead of shared pools. You can see the full stack on our features page. The reason we built all of it is on our about page, and if you run outbound for a B2B company, our founders use case covers how this plays out in practice.

The honest version: none of this guarantees a reply. Copy still has to say something true and relevant to a real person. What the review loop does is make sure the copy that goes out is short, human, free of the words that get you filtered, and honest about your offer. That is the floor. Most senders never build the floor, which is why most cold email is bad.

Frequently asked questions

Can I just use a better prompt instead of a second review pass?

A better prompt helps the first draft, but it does not solve self-grading. The model that wrote the email is still the one judging it, and it will be generous. The value of a separate review pass is the independent incentive: an agent whose entire job is to find what is wrong, not to defend what it wrote.

Why is the word "free" banned from the email body?

It is one of the oldest and most reliable spam-filter triggers there is. It is a great word on a landing page and a liability inside a cold email body. The fix is a reframe, not a deletion: "free trial" becomes "trial" or "a no-cost look," which keeps the offer and drops the trigger.
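A reframe like this can be automated with a small phrase map. The following Python sketch is illustrative only: `REFRAMES` and `reframe` are hypothetical names, the map holds just the one example from the answer above, and the replacement does not preserve the original capitalization.

```python
import re

# Hypothetical phrase map; the single entry comes from the example above.
REFRAMES = {
    "free trial": "trial",
}


def reframe(body: str) -> str:
    """Replace trigger phrases with a reframed wording instead of deleting the offer."""
    for phrase, replacement in REFRAMES.items():
        # Whole-phrase, case-insensitive match.
        pattern = rf"\b{re.escape(phrase)}\b"
        body = re.sub(pattern, replacement, body, flags=re.IGNORECASE)
    return body
```

A sentence such as "Start a free trial today" becomes "Start a trial today", keeping the offer while dropping the trigger word.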

Is 80 words too short to say anything real?

It is shorter than your first instinct, which is the point. Most cold emails carry one good idea wrapped in forty words of throat-clearing. Eighty words is plenty for a relevant opener, one clear point, and a direct ask. Follow-ups can be tighter still, at fifty-five words or fewer.

Does this replace human copywriters?

No. Humans build the frameworks, the angle, and the voice, and a human reviews the copy after it clears the automated gate. The review loop handles the first two rounds of mechanical editing so people spend their time on judgment, not on counting words and hunting for banned phrases.

How do I start QAing my own cold email today?

Open your last sequence and run the checklist above by hand. Search for "just following up" and "free," count the words in each body, and check that every email ends with a real ask. You will find at least one thing to cut. If you would rather have the whole stack run for you, you can see how Ken works or book a founder call.