Editorial standards

How We Test Sales Tools - Our Methodology

Every tool comparison on overloop.com/blog goes through 14+ days of hands-on testing by Vincenzo Ruggiero (Co-founder), Nicolas Finet (CEO), and Nathalie Saikali (Head of Sales). We run real campaigns, send 1,000+ sequences per tool, track deliverability across Gmail/Outlook/GMX/Free, verify pricing against vendor sites quarterly, and document weaknesses we wouldn't ship to a customer. No affiliate kickbacks. No paid placements.

Independent reviews · No affiliate kickbacks · 14+ day test period · 1.2M+ data points
On this page
  1. What we test
  2. How long we test
  3. Who tests
  4. Where data comes from
  5. What we don't accept
  6. Update cadence
  7. How a verdict is built
  8. Conflicts of interest
  9. Author bylines
  10. Contact for corrections

What we test

Every comparison piece on this blog scores tools across the same seven dimensions. We do not let a tool win a category by being good at one thing and untested at the rest. The full scorecard runs against this list before any verdict is written.

Deliverability

Inbox vs spam placement across Gmail, Outlook, GMX, Free.fr, Libero, and corporate Microsoft 365 tenants. Bounce rate, hard vs soft, catch-all behaviour.

Multichannel coverage

Email, LinkedIn, calls, WhatsApp, SMS - what is native, what runs through Zapier duct tape, what breaks at step 7 of a sequence.

Pricing transparency

Price at 1 seat, 3 seats, 10 seats. Hidden credit caps. Annual vs monthly. We re-verify against the vendor's pricing page every quarter.
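
To make the seat math concrete, here is a minimal sketch of the kind of check we run - the tiers and prices below are hypothetical placeholders, not any vendor's real numbers:

    # Hypothetical tiers: (seats, per-seat price on monthly billing, per-seat price on annual billing)
    tiers = [(1, 59, 49), (3, 59, 49), (10, 79, 69)]  # note the per-seat jump at 10 seats

    for seats, monthly, annual in tiers:
        print(f"{seats:>2} seats: {seats * monthly} $/mo on monthly billing, "
              f"{seats * annual} $/mo on annual billing "
              f"({seats * annual * 12} $ paid up front per year)")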

EU compliance

GDPR posture, data hosting region, DPA quality, opt-out mechanics. Read against actual CNIL / AEPD / Garante / Wettbewerbszentrale enforcement.

AI quality

Personalisation output an SDR would actually send. Hallucinations, signal quality, model behaviour at scale. Tested against 50+ real ICP profiles.

Integrations

Native vs Zapier vs API. CRM (HubSpot, Pipedrive, Salesforce), data providers, calling, calendar. What syncs both directions, what does not.

Support response time

We open a real ticket on day 7. Stopwatch starts. We measure first human reply, time to resolution, and whether the answer was useful or canned.

How long we test

Speed-running a tool in two hours is how marketing-fluff comparisons get written. We do not do that. The minimum bar before any verdict ships is 14+ days of hands-on testing and 1,000+ sequences sent through the tool.

For deeper dives on category leaders (Apollo, Outreach, Salesloft, Lemlist, Instantly, Cognism), the test window stretches to 30+ days because the surface area is bigger and the corner cases matter more.

Who tests

Three expert reviewers, three distinct lenses. No piece ships without all three signing off.

The three lenses are deliberate. A tool can have great deliverability (Vincenzo passes it), but flunk pricing math at 10 seats (Nicolas catches it), or fail the French CMO test on subject lines (Nathalie kills it). All three sign-offs are required. If one says no, we go back to the test, not to the article.

Where data comes from

We do not paraphrase G2 reviews and call it research. Every claim in a comparison piece traces back to one of the following sources, and we name the source in the article.

Proprietary · Sortlist
600K B2B demand requests per month across Europe
Sortlist matches B2B buyers to agencies. The 600K monthly demand stream tells us what European buyers are actually asking for, by country and category. We use it for category sizing, buyer-intent signals, and language-specific cold email pattern testing.
Proprietary · Overloop
1.2M+ sequences sent through the platform in 2025-2026
Aggregated and anonymised performance data from real Overloop campaigns: open rates by industry, reply rates by sequence depth, deliverability by inbox provider. Our own dogfooding plus customer-aggregated signal. Never shared at the individual customer level.
Proprietary · Overloop database
93% verified email accuracy across a 450M-contact database
Live verification on the Overloop contact database. We re-run accuracy benchmarks quarterly against a stratified sample of 10,000 contacts and publish the result. When a number changes, we update the article.
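
For context on how a re-run like that can be scored, here is a minimal sketch of a stratified accuracy check - the field names, segment key, and sampling split are illustrative assumptions, not the production pipeline:

    import random

    def verified_accuracy(contacts, sample_size=10_000, seed=42):
        # Illustrative only: group contacts by industry so every segment is represented,
        # draw an equal slice from each group, then report the share that verifies.
        random.seed(seed)
        groups = {}
        for contact in contacts:
            groups.setdefault(contact["industry"], []).append(contact)
        per_group = max(1, sample_size // len(groups))
        sample = [c for group in groups.values()
                  for c in random.sample(group, min(per_group, len(group)))]
        return sum(1 for c in sample if c["email_verified"]) / len(sample)
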
Public · Vendor pricing
Vendor pricing pages, verified quarterly (Q1 / Q2 / Q3 / Q4 2026)
Every price quoted on this blog is checked against the vendor's live pricing page on a fixed quarterly cadence. We screenshot and date-stamp. If a tool changes pricing, the article is flagged for update within 7 days.
Public · Legal & regulatory
CNIL, AEPD, Garante, Wettbewerbszentrale, Forrester, Gartner, McKinsey
We cite primary sources for regulatory claims: CNIL sanctions database (France), AEPD enforcement (Spain), Garante decisions (Italy), Wettbewerbszentrale UWG cases (Germany). For market data and analyst calls, we cite Forrester, Gartner, McKinsey reports with publication date and report ID where applicable.

What we don't accept

The fastest way to wreck reader trust is to source-launder. The list below is what would never make it into a comparison piece on this blog.

Disqualifies a piece from publishing
  • Paid placements or "sponsored ranking"
  • Affiliate-only reviews (kickback bias)
  • Vendor-supplied screenshots without us re-shooting
  • Anonymous reviewer claims (no name = no claim)
  • Marketing fluff masquerading as data
  • Untested-on-our-side feature claims
  • "X said on a podcast" without a transcript link
  • G2 / Capterra rating imports without our own test

Tools sometimes ask to be added to a comparison in exchange for a backlink swap or a co-marketing push. The answer is no. Our position cannot be bought, and that is the only reason readers should keep paying attention.

Update cadence

A comparison published in Q1 is wrong by Q3 if nobody touches it. We have a fixed update rhythm:

Quarterly
Pricing re-verified
Every Q1 / Q2 / Q3 / Q4. Date-stamped at the top of each article. Pricing change triggers a same-week patch.
Annually
Full re-test
Every comparison gets a fresh 14+ day re-test once a year. Verdicts can flip. We say so when they do.
Within 7 days
Urgent patches
When a vendor ships a major product change (new pricing, big feature, acquisition), we patch impacted articles within 7 days.

How a verdict is built

Once a tool has finished its 14+ day test window, the verdict pipeline runs through four stages - independent scoring, cross-review, contradiction resolution, and final sign-off.

Stage 1 - Independent scoring

Each of the three reviewers scores the tool independently across the seven dimensions. We deliberately do not share scores until everyone is done - collective drift toward a "consensus" view is exactly the bias that wrecks comparison content. Vincenzo scores deliverability and sequence design from the lab data. Nicolas scores pricing math, EU compliance posture, and integrations from the platform / market lens. Nathalie scores AI quality, support response time, and the operator pass.

Stage 2 - Cross-review

The three score sheets land on the same table. Disagreements are flagged. This is where the most useful editorial work happens - when two reviewers say a tool is great and one reviewer says it is unusable, the article is not done until we understand which buyer the contradiction reflects. A tool can win for high-volume Anglo SaaS and lose for French mid-market. We say so explicitly.
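
A minimal sketch of that flagging step - the 2-point threshold, the 10-point scale, and the data shape are assumptions for illustration, not our internal tooling:

    DIMENSIONS = ["deliverability", "multichannel", "pricing", "eu_compliance",
                  "ai_quality", "integrations", "support"]

    def flag_disagreements(score_sheets, max_spread=2):
        # score_sheets: {reviewer_name: {dimension: score on a 0-10 scale}}
        flags = []
        for dim in DIMENSIONS:
            scores = {name: sheet[dim] for name, sheet in score_sheets.items()}
            if max(scores.values()) - min(scores.values()) > max_spread:
                flags.append((dim, scores))  # every flag goes to stage 3
        return flags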

Stage 3 - Contradiction resolution

When stage 2 surfaces a real disagreement (not a minor scoring delta), one of two things happens. Either we add a second mini-test that closes the question, or we publish the disagreement explicitly as part of the verdict. We do not paper over it. Most "best for" callouts in our comparisons exist because of a stage-3 split - the tool wins for buyer A and loses for buyer B, and we name both.

Stage 4 - The verdict ships

Only after all three reviewers have signed off does a comparison go live. The byline reflects who did what. If a piece is single-authored, the other two have still reviewed it - we just credit the lead writer. If you spot a verdict that contradicts your own real-world experience, we want to hear it: corrections@overloop.com.

Conflicts of interest disclosure

Two facts you should know about this blog before reading any comparison:
  • We earn no affiliate commissions on any tool we review.
  • We accept no paid placements or sponsored rankings.

If we ever change either of those policies, the change will be disclosed at the top of every affected article and dated.

Author bylines

Every article on this blog is signed. The byline is a real person, with a real face, a real role at a real company, and a real public profile you can vet - and every byline links out to that profile.

If a piece is co-authored, the schema reflects all three authors. If a guest contributor writes a piece, their full bio and conflict-of-interest statement are published alongside it.

Contact for corrections

We get things wrong sometimes. Pricing changes overnight. A vendor ships a feature that turns a "no" into a "yes". A reader spots a regional gap we missed. When that happens, we want to know - and we want to fix it fast. The bar is the same as for the original test: be specific, send the source, and we will update the article publicly with a date-stamped change log.

Spotted an error or a stale claim?

Email corrections@overloop.com with the article URL, the issue, and a source. We acknowledge within 48 hours and patch within 7 days when the source checks out.
