Bad data is an endless money pit. The cost of
poor information quality is often estimated at 15% to 25% of an organization’s
total operating revenue, and most of that cost doesn’t show up as a neat line
item.
The Costs of Quality (CoQ) are partly visible, like
labor spent fixing records, and partly invisible, like missed opportunities,
workflow friction, and eroded trust in reporting. These liabilities often hide
inside general administrative expenses instead of appearing as clear,
actionable operational costs.
To avoid inheriting new costs, you have to
treat information like a manufactured product, not an abstract “asset.”
Manufactured products get inspected and rejected when they fail. Data should be
bought the same way: not based on branding, demos, or feature checklists, but
on whether it performs inside your workflow.
In a product-buy context, data quality means
fitness for use. A dataset can have high data quality (it fits a schema and
constraints) but low information quality (it isn't reliable for decisions or
execution). The goal isn't perfection; it's data that behaves predictably in
your process.
Fitness for use is usually evaluated across
these dimensions:
● Accuracy: Real-world truth
● Completeness: Required fields present
● Freshness: Updated as reality changes
● Uniqueness: No duplicates representing the same entity
● Consistency: No internal contradictions
● Validity: Correct formats/allowed values
These dimensions trade off. A vendor can
“improve completeness” by stuffing placeholders into fields, making a dataset
look full while making your workflow worse. The target isn’t “more data.” It’s
reliable, fit-for-use data.
Some segments expose data defects faster than
others. RIA (Registered Investment Advisor) data is one of the best stress
tests.
RIAs look simple until you operationalize
them: advisor movement, firm restructuring, multi-office entities, shifting
roles, and messy parent-child relationships can break “good-looking” datasets
quickly. If you’re doing outbound, territory planning, market mapping, or
enrichment, RIAs are where you find out whether a vendor’s dataset is actively
maintained or just packaged.
That’s why the checks below matter: they don’t
measure abstract “quality.” They measure whether the dataset holds up in a
high-change environment where errors become wasted touches, mis-targeting,
compliance risk, and broken reporting.
Accuracy is how closely a value matches the
real-world fact it represents. The key is semantic accuracy (factually
correct), not just syntactic accuracy (correctly formatted).
What to do: Pull 50–100 records from your
actual ICP slice (not a generic export) and verify against “golden sources”
(firm sites, filings, professional profiles, SEC/IARD context where relevant).
Validate only the fields that drive your workflow.
For RIAs, focus on: current firm affiliation,
current title/seniority, office/location, and correct firm identity (not a
similar-name entity).
How to score:
● Semantic Accuracy Rate: % aligned with real-world truth
● Syntactic Accuracy Rate: % correctly formatted (email/phone formats)
Red flags: High syntactic accuracy but low
semantic accuracy (it looks right but isn’t). Vendor claims that don’t match
what you observe in the sample.
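To make the two rates concrete, here is a minimal sketch in Python/pandas, assuming a hypothetical verified sample where `email` holds the vendor's value and `matches_reality` records the outcome of your golden-source check:

```python
import pandas as pd

# Hypothetical sample: one row per manually verified record.
sample = pd.DataFrame({
    "email": ["jane@firmllc.com", "bob@oldfirm.com", "not-an-email"],
    "matches_reality": [True, False, False],  # checked against firm site / IARD
})

# Syntactic accuracy: the value is well-formed, regardless of truth.
syntactic_rate = sample["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").mean()

# Semantic accuracy: the value matches real-world truth per your verification.
semantic_rate = sample["matches_reality"].mean()

print(f"Syntactic accuracy: {syntactic_rate:.0%}")  # 67%: looks right
print(f"Semantic accuracy:  {semantic_rate:.0%}")   # 33%: is right
```

A wide gap between the two numbers is exactly the red flag above: data that looks right but isn't.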
Completeness is the % of required fields
available, but completeness can be faked. Measure usable completeness, not
cosmetic fill.
What to do: Run frequency distribution
analysis for fields you need (direct email, phone, seniority, firm type,
office/branch identifiers, AUM bands if relevant). Look for hidden null
patterns like “999-999-9999,” “N/A,” repeated placeholders, or suspiciously
uniform values.
How to score:
● True Fill Rate: availability minus hidden nulls/nonsense
● Usable Record Rate: % with minimum viable fields (e.g., Name + Email + Title)
Red flags: “Complete-looking” fields that
don’t improve execution; imputed values that inflate fill rates without
increasing contactability or routing accuracy.
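Here is a sketch of the placeholder hunt, assuming a hypothetical export slice; the placeholder list and the minimum viable fields (Name + Email) are stand-ins for whatever your workflow actually requires:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Jane Roe", "Bob Lee", "Ann Day", "Sam Poe"],
    "email": ["jane@x.com", "N/A", "ann@y.com", None],
    "phone": ["999-999-9999", "212-555-0141", "N/A", "212-555-0187"],
})

# Frequency distribution first: repeated or suspiciously uniform values
# jump out of value_counts() immediately.
print(df["phone"].value_counts())

PLACEHOLDERS = {"n/a", "na", "none", "unknown", "999-999-9999", ""}

def true_fill_rate(series: pd.Series) -> float:
    """Fill rate after discounting hidden nulls and placeholder junk."""
    normalized = series.fillna("").astype(str).str.strip().str.lower()
    return (~normalized.isin(PLACEHOLDERS)).mean()

for col in ("email", "phone"):
    print(col, f"raw fill={df[col].notna().mean():.0%}",
          f"true fill={true_fill_rate(df[col]):.0%}")

# Usable Record Rate: every minimum viable field is truly present.
MIN_VIABLE = ["name", "email"]
norm = df[MIN_VIABLE].fillna("").astype(str).apply(
    lambda s: s.str.strip().str.lower())
usable = (~norm.isin(PLACEHOLDERS)).all(axis=1)
print(f"Usable record rate: {usable.mean():.0%}")
```

On this toy slice the phone column shows 100% raw fill but only 50% true fill, which is precisely the cosmetic-fill trap.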
Freshness (timeliness) is whether records
update at a rate that matches operational needs. Reality changes fast: job
moves, firm shifts, role changes. RIA datasets amplify this—advisor movement
and firm changes happen continuously.
What to do: Translate vendor refresh claims
into verifiable mechanics:
● Do records carry timestamps for last update?
● Can the vendor explain typical lag from real-world change to dataset update?
● Is maintenance continuous or mostly periodic bulk refresh?
How to score:
● Timeliness Compliance: % of records updated within your acceptable lag window
Red flags: No refresh timestamps; vague
answers about lag time; “continuous refresh” with no way to validate.
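Once records carry update timestamps, timeliness compliance reduces to a date comparison. A sketch, assuming a hypothetical `last_updated` field and a 90-day lag window (set the window from your own operational cadence):

```python
from datetime import timedelta

import pandas as pd

df = pd.DataFrame({"last_updated": [
    "2024-05-01", "2024-01-15", "2023-08-30", "2024-04-20",
]})
df["last_updated"] = pd.to_datetime(df["last_updated"], utc=True)

WINDOW = timedelta(days=90)                   # your acceptable lag
as_of = pd.Timestamp("2024-05-15", tz="UTC")  # evaluation date

within_window = (as_of - df["last_updated"]) <= WINDOW
print(f"Timeliness compliance: {within_window.mean():.0%}")
```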
Uniqueness prevents the same real-world entity
from appearing multiple times. Duplicates inflate outreach, pollute reporting,
and break territories. RIAs are especially prone due to similar firm names,
multi-office structures, and advisor moves. This is an entity resolution
problem: knowing when two records are the same entity.
What to do: Run basic dependency and linkage
checks on a pulled segment:
● Are CompanyIDs unique?
● Does CompanyID reliably map to a single CompanyName?
● Do person records map cleanly to one firm entity?
How to score:
● Duplication Rate: % of probable matches representing the same entity
● Consistency Rate: % without conflicting attributes across fields/tables
Red flags: Same entity described with
different attributes; weak matching logic or no uniqueness rules.
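Here is a sketch of the two linkage checks, using hypothetical `company_id`/`company_name` columns. Real entity resolution needs far stronger matching; this normalize-and-compare probe is only meant to surface obvious trouble:

```python
import pandas as pd

firms = pd.DataFrame({
    "company_id":   ["C1", "C2", "C2", "C3"],
    "company_name": ["Acme Wealth", "Summit RIA", "Summit RIA LLC",
                     "Acme Wealth LLC"],
})

# Does each company_id map to exactly one company_name?
names_per_id = firms.groupby("company_id")["company_name"].nunique()
print("IDs with conflicting names:", list(names_per_id[names_per_id > 1].index))

# Crude duplicate probe: same normalized name under different IDs.
firms["norm"] = (firms["company_name"].str.lower()
                 .str.replace(r"\b(llc|inc|lp)\b", "", regex=True)
                 .str.strip())
ids_per_name = firms.groupby("norm")["company_id"].nunique()
dup_names = ids_per_name[ids_per_name > 1].index
dup_rate = firms["norm"].isin(dup_names).mean()
print(f"Probable duplication rate: {dup_rate:.0%}")
```

Here C2 fails the dependency check (two names for one ID) and “Acme Wealth” vs. “Acme Wealth LLC” trips the duplicate probe across C1 and C3.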
Many errors happen at export/import boundaries and stay invisible if you only
inspect the final dataset. If the vendor’s data breaks during export → import,
the quality you paid for never reaches your team.
What to do: Create a simple pilot data-flow view: vendor export (Source of
Record) → CRM (Source of Truth). Test relationships and failure points:
● Do contacts/advisors correctly link to accounts/firms?
● Do parent/child firm relationships survive import?
● Do missing parents create orphan records or dropped rows?
How to score:
● Integrity Pass Rate: % maintaining referential integrity
● Interface Error Rate: % errors/drops at export/import
Red flags: Broken links (people-to-firm,
contacts-to-accounts), dropped records, corrupted mappings.
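Here is a sketch of the boundary checks after a pilot import, with hypothetical `accounts` and `contacts` tables standing in for your CRM objects; the row counts behind the interface error rate are assumed inputs from the export file and the import log:

```python
import pandas as pd

accounts = pd.DataFrame({"account_id": ["A1", "A2"],
                         "parent_id":  [None, "A9"]})   # A9 was never imported
contacts = pd.DataFrame({"contact_id": ["P1", "P2", "P3"],
                         "account_id": ["A1", "A2", "A7"]})  # A7 is missing

# Integrity Pass Rate: contacts whose firm link actually resolves.
linked = contacts["account_id"].isin(accounts["account_id"])
print(f"Integrity pass rate: {linked.mean():.0%}")
print("Orphan contacts:", contacts.loc[~linked, "contact_id"].tolist())

# Parent/child survival: parents referenced but absent after import.
parents = accounts["parent_id"].dropna()
print("Missing parent firms:",
      parents[~parents.isin(accounts["account_id"])].tolist())

# Interface Error Rate: rows sent vs. rows that arrived.
rows_exported, rows_imported = 1_000, 962   # assumed counts
print(f"Interface error rate: "
      f"{(rows_exported - rows_imported) / rows_exported:.1%}")
```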
Applied to multiple platforms, these checks
stop being theory and become a practical comparison method. Running the same
sampling, completeness analysis, freshness validation, de-duplication tests,
and workflow checks across vendors surfaces trade-offs that demos rarely
reveal.
One platform may win on coverage while
introducing duplicates or stale records; another may sacrifice completeness to
maintain cleaner entity resolution and tighter refresh. In RIA workflows, a
generic “feature comparison” often misses the point: the deciding factor is
how the data behaves after ingestion.
That’s also how any vendor evaluation should
be done: run the same five checks on the same RIA ICP slice, especially across
well-known, widely used brands, and compare observed performance on accuracy,
usable completeness, refresh lag, duplication behavior, and integrity through
import. A recent AdvizorPro vs FINTRX comparison shows what
this looks like in practice.
Don’t “review findings.” Set pass/fail rules.
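The gate can be as simple as a dict of thresholds. A sketch; the thresholds and observed scores below are illustrative placeholders, not recommendations; derive yours from what each defect costs in your workflow:

```python
# Pass/fail scorecard; "_max" metrics pass when at or below the threshold.
thresholds = {
    "semantic_accuracy":    0.90,
    "true_fill_email":      0.70,
    "timeliness_90d":       0.80,
    "duplication_rate_max": 0.03,
    "integrity_pass_rate":  0.98,
}
observed = {  # results from your mini-pilot (placeholder numbers)
    "semantic_accuracy":    0.86,
    "true_fill_email":      0.74,
    "timeliness_90d":       0.81,
    "duplication_rate_max": 0.05,
    "integrity_pass_rate":  0.99,
}

for metric, limit in thresholds.items():
    value = observed[metric]
    ok = value <= limit if metric.endswith("_max") else value >= limit
    print(f"{metric:21} {value:6.1%}  {'PASS' if ok else 'FAIL'}")
```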
Run the mini-pilot, score it, and put dollars
against the defects. You’ll make a decision you can defend months later when
the dataset has to perform, not just demo well.