Editorial standards

How We Test Sales Tools - Our Methodology

Every tool comparison on overloop.com/blog goes through 14+ days of hands-on testing by Vincenzo Ruggiero (Co-founder), Nicolas Finet (CEO), and Nathalie Saikali (Head of Sales). We run real campaigns, send 1,000+ sequences per tool, track deliverability across Gmail/Outlook/GMX/Free, verify pricing against vendor sites quarterly, and document weaknesses we wouldn't ship to a customer. No affiliate kickbacks. No paid placements.

Independent reviews · No affiliate kickbacks · 14+ day test period · 1.2M+ data points
On this page
  1. What we test
  2. How long we test
  3. Who tests
  4. Where data comes from
  5. What we don't accept
  6. Update cadence
  7. How a verdict is built
  8. Conflicts of interest
  9. Author bylines
  10. Contact for corrections

What we test

Every comparison piece on this blog scores tools across the same seven dimensions. We do not let a tool win a category by being good at one thing and untested at the rest. The full scorecard runs against this list before any verdict is written.

Deliverability

Inbox vs spam placement across Gmail, Outlook, GMX, Free.fr, Libero, and corporate Microsoft 365 tenants. Bounce rate, hard vs soft, catch-all behaviour.

Multichannel coverage

Email, LinkedIn, calls, WhatsApp, SMS - what is native, what runs through Zapier duct tape, what breaks at step 7 of a sequence.

Pricing transparency

Price at 1 seat, 3 seats, 10 seats. Hidden credit caps. Annual vs monthly. We re-verify against the vendor's pricing page every quarter.
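
To make the seat math concrete, here is a minimal sketch of the kind of check we run - the tiers and prices below are hypothetical placeholders, not any vendor's real numbers:

    # Hypothetical tiers: (seats, per-seat price on monthly billing, per-seat price on annual billing)
    tiers = [(1, 59, 49), (3, 59, 49), (10, 79, 69)]  # note the per-seat jump at 10 seats

    for seats, monthly, annual in tiers:
        print(f"{seats:>2} seats: {seats * monthly} $/mo on monthly billing, "
              f"{seats * annual} $/mo on annual billing "
              f"({seats * annual * 12} $ paid up front per year)")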

EU compliance

GDPR posture, data hosting region, DPA quality, opt-out mechanics. Read against actual CNIL / AEPD / Garante / Wettbewerbszentrale enforcement.

AI quality

Personalisation output an SDR would actually send. Hallucinations, signal quality, model behaviour at scale. Tested against 50+ real ICP profiles.

Integrations

Native vs Zapier vs API. CRM (HubSpot, Pipedrive, Salesforce), data providers, calling, calendar. What syncs both directions, what does not.

Support response time

We open a real ticket on day 7. Stopwatch starts. We measure first human reply, time to resolution, and whether the answer was useful or canned.

How long we test

Speed-running a tool in two hours is how marketing-fluff comparisons get written. We do not do that. The minimum bar before any verdict ships is 14+ days of hands-on testing and 1,000+ sequences sent through the tool.

For deeper dives on category leaders (Apollo, Outreach, Salesloft, Lemlist, Instantly, Cognism), the test window stretches to 30+ days because the surface area is bigger and the corner cases matter more.

Who tests

Three expert reviewers, three distinct lenses. No piece ships without all three signing off.

The three lenses are deliberate. A tool can have great deliverability (Vincenzo passes it), but flunk pricing math at 10 seats (Nicolas catches it), or fail the French CMO test on subject lines (Nathalie kills it). All three sign-offs are required. If one says no, we go back to the test, not to the article.

Where data comes from

We do not paraphrase G2 reviews and call it research. Every claim in a comparison piece traces back to one of the following sources, and we name the source in the article.

Proprietary · Sortlist
600K B2B demand requests per month across Europe
Sortlist matches B2B buyers to agencies. The 600K monthly demand stream tells us what European buyers are actually asking for, by country and category. We use it for category sizing, buyer-intent signals, and language-specific cold email pattern testing.
Proprietary · Overloop
1.2M+ sequences sent through the platform in 2025-2026
Aggregated and anonymised performance data from real Overloop campaigns: open rates by industry, reply rates by sequence depth, deliverability by inbox provider. Our own dogfooding plus customer-aggregated signal. Never shared at the individual customer level.
Proprietary · Overloop database
93% verified email accuracy across a 450M-contact database
Live verification on the Overloop contact database. We re-run accuracy benchmarks quarterly against a stratified sample of 10,000 contacts and publish the result. When a number changes, we update the article.
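
For context on how a re-run like that can be scored, here is a minimal sketch of a stratified accuracy check - the field names, segment key, and sampling split are illustrative assumptions, not the production pipeline:

    import random

    def verified_accuracy(contacts, sample_size=10_000, seed=42):
        # Illustrative only: group contacts by industry so every segment is represented,
        # draw an equal slice from each group, then report the share that verifies.
        random.seed(seed)
        groups = {}
        for contact in contacts:
            groups.setdefault(contact["industry"], []).append(contact)
        per_group = max(1, sample_size // len(groups))
        sample = [c for group in groups.values()
                  for c in random.sample(group, min(per_group, len(group)))]
        return sum(1 for c in sample if c["email_verified"]) / len(sample)
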
Public · Vendor pricing
Vendor pricing pages, verified quarterly (Q1 / Q2 / Q3 / Q4 2026)
Every price quoted on this blog is checked against the vendor's live pricing page on a fixed quarterly cadence. We screenshot and date-stamp. If a tool changes pricing, the article is flagged for update within 7 days.
Public · Legal & regulatory
CNIL, AEPD, Garante, Wettbewerbszentrale, Forrester, Gartner, McKinsey
We cite primary sources for regulatory claims: CNIL sanctions database (France), AEPD enforcement (Spain), Garante decisions (Italy), Wettbewerbszentrale UWG cases (Germany). For market data and analyst calls, we cite Forrester, Gartner, McKinsey reports with publication date and report ID where applicable.

What we don't accept

The fastest way to wreck reader trust is to source-launder. The list below is what would never make it into a comparison piece on this blog.

Disqualifies a piece from publishing
  • Paid placements or "sponsored ranking"
  • Affiliate-only reviews (kickback bias)
  • Vendor-supplied screenshots without us re-shooting
  • Anonymous reviewer claims (no name = no claim)
  • Marketing fluff masquerading as data
  • Untested-on-our-side feature claims
  • "X said on a podcast" without a transcript link
  • G2 / Capterra rating imports without our own test

Tools sometimes ask to be added to a comparison in exchange for a backlink swap or a co-marketing push. The answer is no. Our position cannot be bought, and that is the only reason readers should keep paying attention.

Update cadence

A comparison published in Q1 is wrong by Q3 if nobody touches it. We have a fixed update rhythm:

Quarterly
Pricing re-verified
Every Q1 / Q2 / Q3 / Q4. Date-stamped at the top of each article. Pricing change triggers a same-week patch.
Annually
Full re-test
Every comparison gets a fresh 14+ day re-test once a year. Verdicts can flip. We say so when they do.
Within 7 days
Urgent patches
When a vendor ships a major product change (new pricing, big feature, acquisition), we patch impacted articles within 7 days.

How a verdict is built

Once a tool has finished its 14+ day test window, the verdict pipeline runs through four stages - independent scoring, cross-review, contradiction resolution, and final sign-off.

Stage 1 - Independent scoring

Each of the three reviewers scores the tool independently across the seven dimensions. We deliberately do not share scores until everyone is done - collective drift toward a "consensus" view is exactly the bias that wrecks comparison content. Vincenzo scores deliverability and sequence design from the lab data. Nicolas scores pricing math, EU compliance posture, and integrations from the platform / market lens. Nathalie scores AI quality, support response time, and the operator pass.

Stage 2 - Cross-review

The three score sheets land on the same table. Disagreements are flagged. This is where the most useful editorial work happens - when two reviewers say a tool is great and one reviewer says it is unusable, the article is not done until we understand which buyer the contradiction reflects. A tool can win for high-volume Anglo SaaS and lose for French mid-market. We say so explicitly.
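
A minimal sketch of that flagging step - the 2-point threshold, the 10-point scale, and the data shape are assumptions for illustration, not our internal tooling:

    DIMENSIONS = ["deliverability", "multichannel", "pricing", "eu_compliance",
                  "ai_quality", "integrations", "support"]

    def flag_disagreements(score_sheets, max_spread=2):
        # score_sheets: {reviewer_name: {dimension: score on a 0-10 scale}}
        flags = []
        for dim in DIMENSIONS:
            scores = {name: sheet[dim] for name, sheet in score_sheets.items()}
            if max(scores.values()) - min(scores.values()) > max_spread:
                flags.append((dim, scores))  # every flag goes to stage 3
        return flags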

Stage 3 - Contradiction resolution

When stage 2 surfaces a real disagreement (not a minor scoring delta), one of two things happens. Either we add a second mini-test that closes the question, or we publish the disagreement explicitly as part of the verdict. We do not paper over it. Most "best for" callouts in our comparisons exist because of a stage-3 split - the tool wins for buyer A and loses for buyer B, and we name both.

Stage 4 - The verdict ships

Only after all three reviewers have signed off does a comparison go live. The byline reflects who did what. If a piece is single-authored, the other two have still reviewed it - we just credit the lead writer. If you spot a verdict that contradicts your own real-world experience, we want to hear it: corrections@overloop.com.

Conflicts of interest disclosure

Two facts you should know about this blog before reading any comparison:
  • We earn no affiliate commissions on any tool we review.
  • We accept no paid placements or sponsored rankings.

If we ever change either of those policies, the change will be disclosed at the top of every affected article and dated.

Author bylines

Every article on this blog is signed. The byline is a real person, with a real face, a real role at a real company, and a real public profile you can vet - and every byline links out to that profile.

If a piece is co-authored, the schema reflects all three authors. If a guest contributor writes a piece, their full bio and conflict-of-interest statement are published alongside it.

Contact for corrections

We get things wrong sometimes. Pricing changes overnight. A vendor ships a feature that turns a "no" into a "yes". A reader spots a regional gap we missed. When that happens, we want to know - and we want to fix it fast. The bar is the same as for the original test: be specific, send the source, and we will update the article publicly with a date-stamped change log.

Spotted an error or a stale claim?

Email corrections@overloop.com with the article URL, the issue, and a source. We acknowledge within 48 hours and patch within 7 days when the source checks out.
