Sanctions Testing Dataset

A comprehensive testing dataset designed to validate, calibrate, and benchmark your sanctions screening solution. Identify gaps, reduce false positives, and ensure your screening system performs optimally across real-world name variations.

Our Mission

Our mission for the sanctions testing dataset is to collect benchmarking statistics that help the industry better understand the screening solutions available in the market. By providing a standardized testing framework, we enable organizations to make informed decisions about their screening technology and drive industry-wide improvements in sanctions compliance accuracy.

Why Test Your Screening Solution?

Even the best sanctions screening systems have blind spots. Name matching is inherently complex—variations in spelling, transliteration, phonetics, and data entry errors can cause your system to miss critical matches or flag excessive false positives. Without systematic testing, you won't know where your screening solution excels and where it fails.

Our testing dataset provides a controlled, comprehensive benchmark to measure your screening engine's performance across the full spectrum of name matching challenges you'll encounter in production.

What's Included in the Testing Dataset

1. Exact Matches

Baseline test cases where the screened name exactly matches the sanctioned entity name, character-for-character. These should always be caught by any screening system.

Example: "Vladimir Putin" → "Vladimir Putin" (exact match)

Use case: Validates that your screening system is functioning at the most basic level. If exact matches fail, there are fundamental configuration issues.

2. Edit Distance Variants (1, 2, 3 character changes)

Test cases with minor spelling variations—typos, transpositions, missing or extra characters. Real-world data entry errors often introduce 1-3 character differences that should still trigger a match.

1-character edit: "Vladimir Putin" → "Vladimr Putin" (missing 'i')

2-character edit: "Vladimir Putin" → "Vladmir Putni" (transposition + typo)

3-character edit: "Vladimir Putin" → "Vladimr Puttinn" (multiple errors)

Use case: Identifies your system's tolerance for common data entry errors. If your system only catches 1-edit variants but misses 2-3 edits, you may need to adjust fuzzy matching thresholds to reduce false negatives—but be careful not to introduce too many false positives.
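To make the edit counts concrete, here is a minimal sketch of a Damerau-Levenshtein (optimal string alignment) distance, which counts an adjacent transposition as a single edit, consistent with the examples above. It illustrates the metric only; it is not the dataset's scoring code.

```python
def edit_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein (optimal string alignment) distance: insertions,
    deletions, substitutions, and adjacent transpositions each count as one edit."""
    a, b = a.lower(), b.lower()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

print(edit_distance("Vladimir Putin", "Vladimr Putin"))  # 1
print(edit_distance("Vladimir Putin", "Vladmir Putni"))  # 2
```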

3. Phonetic Matches

Names that sound similar but are spelled differently. Phonetic algorithms (Soundex, Metaphone, NYSIIS) help catch these, but many screening systems struggle with phonetic variations, especially across languages.

Example 1: "Vladimir Putin" → "Vladimer Poutine" (sounds similar)

Example 2: "Ali Hassan" → "Ally Hasan" (phonetic equivalent)

Example 3: "Mohamed" → "Muhammad", "Mohammed", "Muhammed" (common variants)

Use case: Tests whether your screening engine uses phonetic matching algorithms. If these fail, you're likely missing legitimate risks that appear under phonetically similar but orthographically different spellings.
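As a rough illustration of how a phonetic key bridges these spellings, here is a minimal American Soundex sketch; your engine may use Metaphone, NYSIIS, or a proprietary algorithm instead, so treat this as an example of the idea rather than a reference implementation.

```python
def soundex(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    first, out, prev = name[0].upper(), [], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (first + "".join(out) + "000")[:4]

print(soundex("Ali Hassan"), soundex("Ally Hasan"))  # A425 A425
print(soundex("Mohamed"), soundex("Muhammad"))       # M530 M530
```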

4. Common Transliterations

Cross-script name variations—converting names from Cyrillic, Arabic, Chinese, or other scripts into Latin characters. There are often multiple valid transliterations for the same name, and your screening system needs to catch all of them.

Cyrillic to Latin: "Владимир Путин" → "Vladimir Putin", "Wladimir Putin", "Vladimer Poutine"

Arabic to Latin: "محمد" → "Mohamed", "Muhammad", "Mohammed", "Mohamad"

Chinese to Latin: "习近平" → "Xi Jinping", "Shi Jinping"

Use case: Critical for global screening. If your system only catches one transliteration variant but misses others, you have a significant compliance gap. Our dataset includes the most common transliteration patterns across major languages.
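One common mitigation is to normalize known transliteration variants onto a single canonical key before matching, as sketched below. The variant table here is a tiny hand-picked illustration, not the dataset's actual coverage.

```python
import unicodedata

# Illustrative variant table: maps known transliteration spellings to one canonical form.
VARIANTS = {
    "muhammad": "mohamed", "mohammed": "mohamed", "mohamad": "mohamed",
    "wladimir": "vladimir", "vladimer": "vladimir",
    "poutine": "putin",
    "shi": "xi",
}

def normalize(name: str) -> str:
    # Strip accents, lowercase, then map each token to its canonical form.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch)).lower()
    return " ".join(VARIANTS.get(token, token) for token in name.split())

print(normalize("Wladimir Poutine") == normalize("Vladimir Putin"))  # True
```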

5. Real-World False Positive Scenarios

Names that are similar to sanctioned entities but are NOT actual matches. These help you measure your false positive rate—too many false positives slow down operations and erode trust in your screening system.

Example 1: "John Smith" (common name, not the sanctioned "John Smith")

Example 2: "Ali Mohammad" (common name, not a specific sanctioned individual)

Use case: Helps you tune your screening thresholds. If you're flagging too many false positives, your matching is too sensitive. If you're missing true positives, it's too strict. Our dataset helps you find the right balance.

How to Use the Testing Dataset for Calibration

Step 1: Baseline Testing

Run the exact match test cases through your screening system. All of these should return positive matches. If any fail, investigate your system configuration immediately.
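A minimal baseline harness might look like the sketch below, where screen() is a toy exact-match engine standing in for your own screening API and the case list is illustrative.

```python
# Toy exact-match engine; replace with a call to your screening system.
SANCTIONS_LIST = {"vladimir putin"}

def screen(name: str) -> bool:
    return name.strip().lower() in SANCTIONS_LIST

exact_cases = ["Vladimir Putin"]  # illustrative; use the dataset's exact-match cases

missed = [name for name in exact_cases if not screen(name)]
if missed:
    raise SystemExit(f"Exact-match failures: {missed} -- check engine configuration")
print("Baseline passed: all exact-match cases caught")
```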

Step 2: Measure Fuzzy Matching Performance

Test the 1-, 2-, and 3-edit distance variants. Track your catch rate for each level. Most systems should catch 1-edit variants reliably, but performance often degrades at 2-3 edits. Decide what tolerance level is acceptable for your risk appetite.
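One way to track the catch rate per edit level is sketched below; difflib similarity with an arbitrary threshold stands in for your real engine, and the labelled cases are the examples from section 2.

```python
from difflib import SequenceMatcher

# Labelled fuzzy cases: (screened name, sanctioned name, edit level).
cases = [
    ("Vladimr Putin",   "Vladimir Putin", 1),
    ("Vladmir Putni",   "Vladimir Putin", 2),
    ("Vladimr Puttinn", "Vladimir Putin", 3),
]

THRESHOLD = 0.85  # stand-in similarity cut-off; tune against your own engine

def is_match(query: str, target: str) -> bool:
    return SequenceMatcher(None, query.lower(), target.lower()).ratio() >= THRESHOLD

by_level = {}
for query, target, level in cases:
    caught, total = by_level.get(level, (0, 0))
    by_level[level] = (caught + is_match(query, target), total + 1)

for level, (caught, total) in sorted(by_level.items()):
    print(f"{level}-edit variants: {caught}/{total} caught ({caught / total:.0%})")
```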

Step 3: Validate Phonetic & Transliteration Coverage

Run phonetic and transliteration test cases. If your system misses these, consider enabling or tuning phonetic algorithms (Soundex, Metaphone) or adding transliteration normalization layers. This is especially critical for international operations.

Step 4: Benchmark False Positive Rate

Use the false positive test cases to measure your system's precision. If your false positive rate is too high, you may need to tighten matching thresholds, add additional filtering rules, or incorporate more contextual data (e.g., dates of birth, nationalities) to disambiguate.
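The sketch below illustrates the contextual-filtering idea with hypothetical record fields (name, dob, nationality); it is not the dataset's schema.

```python
from datetime import date

# Hypothetical records: a sanctioned entity and a screened customer with the same name.
sanctioned = {"name": "John Smith", "dob": date(1965, 3, 12), "nationality": "XX"}
candidate  = {"name": "John Smith", "dob": date(1990, 7, 1),  "nationality": "GB"}

def name_matches(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()  # stand-in for fuzzy matching

def confirm_hit(candidate: dict, sanctioned: dict) -> bool:
    if not name_matches(candidate["name"], sanctioned["name"]):
        return False
    # A confident date-of-birth mismatch is strong evidence of a false positive.
    if candidate.get("dob") and sanctioned.get("dob") and candidate["dob"] != sanctioned["dob"]:
        return False
    return True

print(confirm_hit(candidate, sanctioned))  # False: same name, different person
```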

Step 5: Iterate and Optimize

Adjust your screening parameters based on test results. Re-run the full test suite after each change to measure improvement. Track metrics like the following (a scoring sketch appears after this list):

  • True Positive Rate (sensitivity): % of actual matches caught
  • False Positive Rate: % of non-matches incorrectly flagged
  • Match confidence scores: Distribution of scores for true vs. false positives
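A minimal scoring sketch, assuming each test case carries a ground-truth label (should_match) and the engine's verdict (flagged); the record structure is illustrative, not the dataset's format.

```python
results = [
    {"should_match": True,  "flagged": True},   # true positive
    {"should_match": True,  "flagged": False},  # false negative (missed sanction)
    {"should_match": False, "flagged": True},   # false positive
    {"should_match": False, "flagged": False},  # true negative
]

tp = sum(r["should_match"] and r["flagged"] for r in results)
fn = sum(r["should_match"] and not r["flagged"] for r in results)
fp = sum(not r["should_match"] and r["flagged"] for r in results)
tn = sum(not r["should_match"] and not r["flagged"] for r in results)

tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity / catch rate
fpr = fp / (fp + tn) if (fp + tn) else 0.0
print(f"True positive rate: {tpr:.0%}, false positive rate: {fpr:.0%}")
```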

Who Should Use This Testing Dataset?

Financial Institutions & Fintechs

Ensure your AML/CFT screening meets regulatory expectations. Demonstrate to auditors and regulators that you've rigorously tested your sanctions screening solution.

Compliance & Risk Teams

Validate vendor solutions or in-house screening engines. Identify gaps in coverage before they become compliance incidents.

Software Vendors & Integrators

Benchmark your screening product against industry standards. Demonstrate to customers that your solution handles real-world name matching challenges.

Developers & Data Engineers

QA test screening algorithms during development. Use the dataset as a regression test suite to ensure changes don't degrade performance.
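For example, a pytest-based regression suite might look like this sketch, where screen() is a toy exact-match engine and load_test_cases() is a hypothetical loader standing in for the dataset files you receive.

```python
import pytest

SANCTIONS_LIST = {"vladimir putin"}

def screen(name: str) -> bool:
    """Toy engine: exact match only. Replace with a call to your screening API."""
    return name.strip().lower() in SANCTIONS_LIST

def load_test_cases():
    # Hypothetical loader; yields (query, should_match) pairs from the dataset.
    return [("Vladimir Putin", True), ("John Smith", False)]

@pytest.mark.parametrize("query,should_match", load_test_cases())
def test_screening_regression(query, should_match):
    assert screen(query) == should_match
```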

Ready to Test Your Screening Solution?

Our Sanctions Testing Dataset is available for enterprise customers and partners. Contact us to request access and receive detailed documentation on test case structure, expected results, and calibration best practices.