Lead Dedupe + Enrich Pipeline

A production-shaped data pipeline. It takes a raw CSV of leads, runs it through QuickFlo’s data steps (data.csv-to-items, data.dedup, data.scrub, data.enrich, data.map, data-store.bulk-set), and produces a clean, deduplicated, scrubbed, enriched dataset ready to push downstream. The same shape works for contact lists, sales leads, marketing outreach, and customer migrations.

Workflow JSON

For demonstration, the recipe takes the CSV, the do-not-call (DNC) list, and the CRM data inline in the webhook payload, which keeps the data flow visible end to end. In production you’d typically load the DNC list and CRM data from external APIs (Salesforce, HubSpot) or from cloud storage at the start of the workflow.

{
  "name": "Lead Dedupe + Enrich Pipeline",
  "initial": {
    "csv": "email,firstName,lastName,phone\njane@acme.com,Jane,Doe,5551234\nbob@acme.com,Bob,Smith,5555678\njane@acme.com,Jane,D.,5551234",
    "dncList": [
      { "value": "5559999" }
    ],
    "crmContacts": [
      { "email": "jane@acme.com", "company": "Acme", "region": "US", "tier": "gold" }
    ]
  },
  "steps": [
    {
      "stepId": "parse-csv",
      "stepType": "data.csv-to-items",
      "input": {
        "source": "content",
        "csv": "{{ initial.csv }}",
        "skipHeader": true
      }
    },
    {
      "stepId": "dedupe",
      "stepType": "data.dedup",
      "input": {
        "items": "{{ parse-csv.items }}",
        "key": ["email"],
        "keep": "first"
      }
    },
    {
      "stepId": "scrub",
      "stepType": "data.scrub",
      "input": {
        "items": "{{ dedupe.items }}",
        "against": "{{ initial.dncList }}",
        "fields": ["phone", "email"],
        "againstField": "value",
        "action": "remove"
      }
    },
    {
      "stepId": "enrich",
      "stepType": "data.enrich",
      "input": {
        "items": "{{ scrub.items }}",
        "from": "{{ initial.crmContacts }}",
        "matchField": "email",
        "fromMatchField": "email",
        "copyFields": ["company", "region", "tier"],
        "prefix": "crm_"
      }
    },
    {
      "stepId": "shape-records",
      "stepType": "data.map",
      "input": {
        "items": "{{ enrich.items }}",
        "map": {
          "key": "{{ $item.email }}",
          "value": "{{ $item }}"
        }
      }
    },
    {
      "stepId": "store",
      "stepType": "data-store.bulk-set",
      "input": {
        "tableName": "cleaned-leads",
        "records": "{{ shape-records.items }}"
      }
    },
    {
      "stepId": "respond",
      "stepType": "core.return",
      "input": {
        "webhookResponse": {
          "statusCode": 200,
          "body": {
            "imported": "{{ parse-csv.count }}",
            "deduped": "{{ dedupe.removedCount }}",
            "scrubbed": "{{ scrub.removedCount }}",
            "enriched": "{{ enrich.matchedCount }}",
            "stored": "{{ store.total }}"
          }
        }
      }
    }
  ]
}

Setup

Connections needed: none for the basic pipeline. Add a CRM connection if you swap the inline crmContacts for a real API lookup.

Trigger: add a Webhook trigger that accepts the CSV string, the DNC list, and the CRM reference data. For larger files, you’d typically use a form trigger with file upload or a scheduled trigger reading from a fixed location instead.

Field	Value
Name	`import-leads`
Method	`POST`
Authentication	On — generate a secret

How it works

inline CSV
  → parse into rows
  → dedupe by email
  → scrub against DNC list
  → enrich each row with the matching CRM record
  → reshape into bulk-set records
  → bulk-write to a data store
  → return a summary
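
To make the stages concrete, here is a minimal plain-Python sketch of the same flow, run on the sample data from the workflow JSON. It is an illustration of the intended semantics, not QuickFlo's implementation. The function name run_pipeline is hypothetical, and the way each summary count is derived (and that unmatched rows simply get no crm_ fields) is an assumption about how the steps report their results.

```python
# Illustrative stand-ins for the QuickFlo steps; names and count semantics are assumptions.
import csv
import io

CSV_TEXT = (
    "email,firstName,lastName,phone\n"
    "jane@acme.com,Jane,Doe,5551234\n"
    "bob@acme.com,Bob,Smith,5555678\n"
    "jane@acme.com,Jane,D.,5551234"
)
DNC_LIST = [{"value": "5559999"}]
CRM_CONTACTS = [{"email": "jane@acme.com", "company": "Acme", "region": "US", "tier": "gold"}]


def run_pipeline(csv_text, dnc_list, crm_contacts):
    # Parse into rows: the header line supplies the field names.
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Dedupe by email, keeping the first occurrence.
    seen, deduped = set(), []
    for row in rows:
        if row["email"] not in seen:
            seen.add(row["email"])
            deduped.append(row)

    # Scrub: drop any row whose phone or email appears in the DNC list.
    blocked = {entry["value"] for entry in dnc_list}
    scrubbed = [
        r for r in deduped
        if r["phone"] not in blocked and r["email"] not in blocked
    ]

    # Enrich: copy CRM fields onto matching rows under a crm_ prefix.
    crm_by_email = {c["email"]: c for c in crm_contacts}
    matched, enriched = 0, []
    for row in scrubbed:
        out = dict(row)
        contact = crm_by_email.get(row["email"])
        if contact is not None:
            matched += 1
            for field in ("company", "region", "tier"):
                out["crm_" + field] = contact[field]
        enriched.append(out)

    summary = {
        "imported": len(rows),
        "deduped": len(rows) - len(deduped),
        "scrubbed": len(deduped) - len(scrubbed),
        "enriched": matched,
        "stored": len(enriched),
    }
    return enriched, summary


leads, summary = run_pipeline(CSV_TEXT, DNC_LIST, CRM_CONTACTS)
print(summary)
```

With the three-row sample, one duplicate of jane@acme.com is dropped, nothing matches the DNC list, and only Jane's row picks up CRM fields.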

The trick at the end is the shape-records step. data-store.bulk-set expects an array of { key, value } objects (not raw items), so a data.map step wraps each enriched lead into that shape, with the email as the key and the whole row as the value.
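
As a rough picture of that reshaping, the sketch below does the equivalent wrap in plain Python. The helper name shape_records and the sample rows are hypothetical; only the { key, value } target shape comes from the recipe.

```python
def shape_records(items, key_field="email"):
    """Wrap each item as {"key": ..., "value": ...}, keyed by one of its fields."""
    return [{"key": item[key_field], "value": item} for item in items]


# Hypothetical enriched rows, as they might look after the enrich step.
enriched = [
    {"email": "jane@acme.com", "firstName": "Jane", "crm_company": "Acme"},
    {"email": "bob@acme.com", "firstName": "Bob"},
]

records = shape_records(enriched)
print(records[0]["key"])
```

Keying on the email means a later import of the same lead overwrites the stored row rather than adding a second one, assuming the data store treats keys as unique.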

Test it

curl -X POST https://run.quickflo.app/w/@your-org/import-leads \
  -H "Authorization: Bearer YOUR_WEBHOOK_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "csv": "email,firstName,lastName,phone\njane@acme.com,Jane,Doe,5551234\njane@acme.com,Jane,D.,5551234",
    "dncList": [{"value": "5559999"}],
    "crmContacts": [{"email": "jane@acme.com", "company": "Acme", "region": "US", "tier": "gold"}]
  }'

You should get back something like:

{
  "imported": 2,
  "deduped": 1,
  "scrubbed": 0,
  "enriched": 1,
  "stored": 1
}
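
If you would rather call the webhook from a script than from curl, a minimal sketch using only the Python standard library might look like this. The URL, header names, and payload mirror the curl example above; the helper name build_request is hypothetical, and the actual send is left commented out so the snippet has no side effects.

```python
import json
import urllib.request

WEBHOOK_URL = "https://run.quickflo.app/w/@your-org/import-leads"


def build_request(secret, payload):
    """Build (but do not send) the POST request for the import-leads webhook."""
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "csv": (
        "email,firstName,lastName,phone\n"
        "jane@acme.com,Jane,Doe,5551234\n"
        "jane@acme.com,Jane,D.,5551234"
    ),
    "dncList": [{"value": "5559999"}],
    "crmContacts": [
        {"email": "jane@acme.com", "company": "Acme", "region": "US", "tier": "gold"}
    ],
}

request = build_request("YOUR_WEBHOOK_SECRET", payload)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```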

What to customize first

Load the DNC list and CRM data from real sources. Replace the inline initial.dncList / initial.crmContacts with HTTP calls to your CRM (Salesforce, HubSpot) or to a DNC service. The rest of the pipeline doesn’t change.
Use composite-key dedup. Single-field email dedup misses near-duplicates with typos. Switch the dedup step’s key to ["firstName", "lastName", "phone"] or add a normalization step (data.map) first that lowercases emails and strips diacritics with the stripDiacritics filter.
Add a phone-cleaning pass. Before scrubbing, run a data.map step that converts each phone to E.164 format using the toE164 filter — that way “555-1234” and “(555) 1234” match the same DNC entry.
Use data.merge instead of data.dedup if you want to consolidate duplicates (e.g. keep all phones from the duplicate records as a phones array on one merged row) instead of dropping them.
Add data.explode → scrub → data.implode if records have multiple phone columns (phone1, phone2, phone3) and you want to scrub each one independently. See the Working with Data reference for the explode→implode pattern.
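
To show what the normalization and consolidation ideas above amount to, here is a small plain-Python sketch. It is not how QuickFlo's stripDiacritics or toE164 filters or its data.merge step work internally; the function names are hypothetical. Note that digits_only is a deliberately simpler stand-in for E.164 conversion: true E.164 needs a country code, which this sketch does not add.

```python
import re
import unicodedata


def normalize_email(email):
    """Trim, lowercase, and strip diacritics so near-identical emails compare equal."""
    decomposed = unicodedata.normalize("NFKD", email.strip().lower())
    # Drop the combining marks left behind by decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def digits_only(phone):
    """Keep only digits. Assumption: a stand-in for E.164, with no country code added."""
    return re.sub(r"\D", "", phone)


def merge_duplicates(rows, key="email"):
    """Consolidate rows sharing a normalized key, collecting phones into a list."""
    merged = {}
    for row in rows:
        k = normalize_email(row[key])
        phone = digits_only(row["phone"])
        if k not in merged:
            # The first row seen supplies the non-phone fields.
            first = {f: v for f, v in row.items() if f != "phone"}
            first[key] = k
            first["phones"] = []
            merged[k] = first
        if phone and phone not in merged[k]["phones"]:
            merged[k]["phones"].append(phone)
    return list(merged.values())


rows = [
    {"email": "Jane@Acme.com", "firstName": "Jane", "phone": "555-1234"},
    {"email": "jane@acme.com ", "firstName": "Jane", "phone": "(555) 9876"},
    {"email": "jane@acme.com", "firstName": "Jane", "phone": "(555) 1234"},
]
print(merge_duplicates(rows))
```

The three sample rows collapse into a single record whose phones list holds the two distinct numbers, which is the outcome the merge-style customization is after.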

Related recipe: Scheduled Slack Digest. It pairs nicely with this one; have the digest report yesterday’s import counts.