CleanCSV API Documentation

Upload a CSV, get clean data. No API key required.

Endpoints

| Method | Path | Description | Limits |
| --- | --- | --- | --- |
| POST | /api/demo/clean | Clean CSV (rate-limited demo) | 1 MB, 10 req/min |
| POST | /api/demo/suggest | Suggest dedup rules (rate-limited demo) | 1 MB, 10 req/min |
| POST | /api/v1/clean | Clean CSV (open, no auth) | 4.5 MB |
| POST | /api/v1/suggest | Suggest dedup rules (open, no auth) | 4.5 MB |

Quick Start

Send a CSV as multipart form data with an optional JSON config:

bash
curl -X POST https://your-domain.vercel.app/api/v1/clean \
  -F "file=@data.csv" \
  -F 'config={"format":"json","dedup":{"columns":["email"],"strategy":"most_complete"}}'
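The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_clean_request is hypothetical, the host is the placeholder from the curl example, and the part names file and config are taken from that example. It builds the request without sending it.

```python
import json
import urllib.request
import uuid


def build_clean_request(url, csv_bytes, config, filename="data.csv"):
    """Build (without sending) a multipart POST with a 'file' part and a 'config' part."""
    boundary = uuid.uuid4().hex
    # First part: the CSV file itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    # Second part: the JSON config as a plain form field, then the closing boundary.
    config_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="config"\r\n\r\n'
        f"{json.dumps(config)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=file_header + csv_bytes + config_part,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


request = build_clean_request(
    "https://your-domain.vercel.app/api/v1/clean",  # placeholder host, as in the curl example
    b"email,name\nalice@example.com,Alice\n",
    {"format": "json", "dedup": {"columns": ["email"], "strategy": "most_complete"}},
)
```

Passing the result to urllib.request.urlopen(request) would send it; the response body can then be parsed with json.load.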

Configuration

Pass the configuration as a config field in the multipart form data, or as a JSON body:

json
{
  "format": "json",
  "dedup": {
    "columns": ["email"],
    "strategy": "most_complete",
    "fuzzy": false,
    "threshold": 0.85,
    "normalize": true,
    "preview": false,
    "nulls_equal": true
  },
  "encoding": {
    "encoding": "auto",
    "fallback": ["windows-1252", "iso-8859-1"],
    "fix_mojibake": true,
    "normalize_unicode": "NFC",
    "strip_control_chars": true,
    "strip_invisible": true,
    "custom_mojibake_mappings": [
      { "from": "Ã©", "to": "é" }
    ]
  },
  "null_handling": {
    "null_values": ["", "NULL", "N/A", "-"],
    "treat_empty_as_null": true,
    "type_inference": true,
    "fill": "leave",
    "per_column": {
      "age": { "cast": "int", "fill": "fill_median" },
      "status": { "fill": "fill_default:unknown" }
    }
  },
  "null_representation": ""
}

Parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| format | json or csv | json | Output format |
| dedup.columns | string[] | all | Columns to deduplicate by |
| dedup.strategy | string | first | One of first, last, most_complete, merge |
| dedup.fuzzy | boolean | false | Enable fuzzy matching |
| dedup.threshold | number | 0.85 | Fuzzy similarity threshold (0-1) |
| dedup.normalize | boolean | false | Normalize values before comparing |
| dedup.preview | boolean | false | Return all rows + duplicate_groups |
| dedup.nulls_equal | boolean | true | Treat two null values as equal in dedup |
| null_handling | object | {} | Null detection and fill config (see below) |
| null_representation | string | "" | How nulls appear in CSV output |
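To make the defaults concrete, here is a small client-side sketch that expands a partial config with the defaults listed in the table. The names DEFAULTS and with_defaults are hypothetical, and the service applies its own defaults regardless; this only shows what an omitted field amounts to. dedup.columns is left out of DEFAULTS because its documented default is "all columns".

```python
import copy

# Defaults copied from the parameters table; dedup.columns is omitted (default: all columns).
DEFAULTS = {
    "format": "json",
    "dedup": {
        "strategy": "first",
        "fuzzy": False,
        "threshold": 0.85,
        "normalize": False,
        "preview": False,
        "nulls_equal": True,
    },
    "null_handling": {},
    "null_representation": "",
}


def with_defaults(config):
    """Return a copy of config with documented defaults filled in (one level of nesting)."""
    merged = copy.deepcopy(DEFAULTS)
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


effective = with_defaults({"dedup": {"columns": ["email"], "strategy": "merge"}})
```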

Null Handling

null_handling Configuration

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| null_values | string[] | 17 defaults | Strings treated as null |
| treat_empty_as_null | boolean | true | Convert empty strings to null |
| type_inference | boolean | true | Auto-detect and cast column types |
| fill | string | leave | Global fill strategy |
| per_column | object | {} | Per-column overrides (fill, cast, null_values) |

Fill Strategies

leave

Keep null as-is

drop_row

Drop entire row if any configured column is null

drop_column

Drop column if >50% values are null

forward_fill

Fill null with previous non-null value

fill_mean

Fill with column mean (numeric columns)

fill_median

Fill with column median (numeric columns)

fill_mode

Fill with most common value

fill_default:VALUE

Fill with a custom default value
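The column-level strategies above can be illustrated locally. This is a sketch written from the descriptions, not the service's implementation; fill_column is a hypothetical helper, None stands for a null, and drop_row and drop_column are left out because they act on the whole table rather than on one column.

```python
from collections import Counter
from statistics import mean, median


def fill_column(values, strategy):
    """Apply one fill strategy to a list in which None marks a null."""
    if strategy == "leave":
        return list(values)
    if strategy == "forward_fill":
        out, last = [], None
        for v in values:
            last = v if v is not None else last  # leading nulls stay null
            out.append(last)
        return out
    present = [v for v in values if v is not None]
    if strategy.startswith("fill_default:"):
        replacement = strategy.split(":", 1)[1]
    elif not present:
        return list(values)  # nothing to compute a statistic from
    elif strategy == "fill_mean":
        replacement = mean(present)
    elif strategy == "fill_median":
        replacement = median(present)
    elif strategy == "fill_mode":
        replacement = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown fill strategy: {strategy}")
    return [replacement if v is None else v for v in values]
```

For example, fill_column([10, None, 20], "fill_mean") replaces the null with 15, and fill_column(["a", None], "fill_default:unknown") replaces it with the string unknown.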

Type Inference

When enabled, columns are auto-detected as int, float, boolean, or string using an 80% threshold.

Boolean values recognized: true/false, yes/no, 1/0, on/off, t/f, y/n
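One way to read the 80% rule is sketched below. Several details are assumptions, since the text above does not specify them: the share is measured over non-null values only, types are tried in the order int, float, boolean, and the patterns used for numbers are illustrative. The helper name infer_type is hypothetical.

```python
import re

# Tokens from the list of recognized boolean values above.
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "1", "0", "on", "off", "t", "f", "y", "n"}
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def infer_type(values, threshold=0.8):
    """Return 'int', 'float', 'boolean', or 'string' for a column of raw strings."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "string"
    checks = [
        ("int", lambda s: INT_RE.fullmatch(s) is not None),
        ("float", lambda s: FLOAT_RE.fullmatch(s) is not None),
        ("boolean", lambda s: s.lower() in BOOLEAN_TOKENS),
    ]
    for name, matches in checks:
        share = sum(1 for s in present if matches(s)) / len(present)
        if share >= threshold:  # assumed: "at least 80%" of non-null values
            return name
    return "string"
```

With this ordering, a column containing only 1 and 0 is classified as int rather than boolean; the service may resolve that ambiguity differently.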

Important Notes

Keep Strategies

first

Keep the first occurrence in original order

last

Keep the last occurrence

most_complete

Keep the row with the most non-null/non-empty values

merge

Merge all rows in the group, taking the first non-null value per column
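The four keep strategies can be sketched for exact matching as follows. This is an illustration of the descriptions above, not the service's code: resolve_group and dedup are hypothetical helpers, rows are dicts, empty strings are treated like nulls when counting completeness, and ties in most_complete go to the earliest row (an assumption).

```python
def _filled(value):
    """True when a cell holds a usable value (not None and not an empty string)."""
    return value is not None and value != ""


def resolve_group(rows, strategy="first"):
    """Collapse one group of duplicate rows into a single row."""
    if strategy == "first":
        return dict(rows[0])
    if strategy == "last":
        return dict(rows[-1])
    if strategy == "most_complete":
        # max() returns the earliest row among equally complete ones.
        return dict(max(rows, key=lambda r: sum(1 for v in r.values() if _filled(v))))
    if strategy == "merge":
        merged = {}
        for row in rows:
            for column, value in row.items():
                if not _filled(merged.get(column)):
                    merged[column] = value
        return merged
    raise ValueError(f"unknown strategy: {strategy}")


def dedup(rows, columns, strategy="first"):
    """Group rows by exact match on the given columns, then resolve each group."""
    groups = {}
    for row in rows:
        key = tuple(row.get(c) for c in columns)  # None == None, like nulls_equal: true
        groups.setdefault(key, []).append(row)
    return [resolve_group(group, strategy) for group in groups.values()]


sample = [
    {"email": "a@x.com", "name": "Alice", "age": None, "city": None},
    {"email": "a@x.com", "name": None, "age": 30, "city": "Oslo"},
    {"email": "b@x.com", "name": "Bob", "age": 25, "city": "Rome"},
]
merged_rows = dedup(sample, ["email"], "merge")
```

Here merge combines the two a@x.com rows into one row carrying Alice, 30, and Oslo, while most_complete would keep the second row unchanged.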

Encoding Issue Types

| Type | Description |
| --- | --- |
| mojibake_detected | Mojibake patterns found but not fully repaired |
| mojibake_repaired | Mojibake successfully repaired |
| double_encoding_repaired | Double/triple UTF-8 encoding reversed |
| custom_mappings_applied | User-defined mappings applied |
| bom_removed | Byte order mark stripped |
| control_chars_removed | Control characters removed |
| invisible_chars_removed | Zero-width/invisible characters removed |
| unicode_normalized | Unicode normalization applied (NFC/NFD/NFKC/NFKD) |
| line_endings_normalized | Line endings normalized |
| encoding_fallback_used | Fallback encoding used for better results |
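For readers unfamiliar with these issue types, the sketch below reproduces three of them locally: mojibake (UTF-8 bytes decoded as windows-1252), invisible characters, and Unicode normalization. The helper names are hypothetical and the round-trip repair shown is one common technique; how the service detects and repairs these cases is not described here.

```python
import unicodedata

# Zero-width and BOM code points to delete; an illustrative subset, not the service's list.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def make_mojibake(text, wrong_encoding="windows-1252"):
    """Simulate the damage: UTF-8 bytes read back with the wrong single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_mojibake(text, wrong_encoding="windows-1252"):
    """Reverse the damage when the round trip is valid; otherwise return the text unchanged."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def strip_invisible(text):
    """Remove zero-width characters and a byte order mark."""
    return text.translate(INVISIBLE)


damaged = make_mojibake("café")            # "cafÃ©"
repaired = repair_mojibake(damaged)        # "café"
composed = unicodedata.normalize("NFC", "e\u0301")  # single code point U+00E9
```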

Response Examples

Clean Response (JSON)

json
{
  "meta": {
    "rows": 142,
    "columns": 5,
    "removed": 8,
    "encoding_detected": "utf-8",
    "encoding_confidence": 0.99,
    "encoding_issues": ["mojibake_repaired: known_mojibake_replaced"],
    "null_profiles": [
      {
        "column": "email",
        "null_count": 3,
        "null_percentage": 0.021,
        "top_null_values": [{ "value": "N/A", "count": 2 }],
        "inferred_type": "string"
      }
    ],
    "null_warnings": [
      { "column": "age", "message": "Column \"age\" appears numeric but has non-numeric values", "severity": "warning" }
    ],
    "type_map": { "id": "int", "email": "string", "age": "int", "active": "boolean", "name": "string" }
  },
  "data": [
    { "id": 1, "email": "alice@example.com", "age": 30, "active": true, "name": "Alice" },
    { "id": 2, "email": "bob@example.com", "age": null, "active": false, "name": "Bob" }
  ],
  "duplicate_groups": [
    {
      "reason": "exact match on email",
      "confidence": 1.0,
      "rows": [
        { "id": 1, "email": "alice@example.com", "age": 30, "active": true, "name": "Alice" },
        { "id": 3, "email": "alice@example.com", "age": 30, "active": true, "name": "alice" }
      ],
      "kept": { "id": 1, "email": "alice@example.com", "age": 30, "active": true, "name": "Alice" },
      "kept_row_index": 0,
      "droppedIndices": [2]
    }
  ]
}
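A client might condense a response like the one above into a short report. The sketch below assumes the field names shown in the example and that each encoding_issues entry has the form "type: detail", as in the sample; summarize_clean_response is a hypothetical helper.

```python
def summarize_clean_response(response):
    """Pull the headline numbers out of a parsed clean response (a dict)."""
    meta = response["meta"]
    issues = []
    for entry in meta.get("encoding_issues", []):
        kind, _, detail = entry.partition(": ")  # "type: detail" format assumed from the sample
        issues.append((kind, detail))
    groups = response.get("duplicate_groups", [])
    return {
        "rows": meta["rows"],
        "removed": meta["removed"],
        "encoding_issues": issues,
        "duplicate_groups": len(groups),
        "dropped_indices": [i for g in groups for i in g.get("droppedIndices", [])],
        "columns_with_nulls": [
            p["column"] for p in meta.get("null_profiles", []) if p["null_count"] > 0
        ],
    }


example = {
    "meta": {
        "rows": 142,
        "removed": 8,
        "encoding_issues": ["mojibake_repaired: known_mojibake_replaced"],
        "null_profiles": [{"column": "email", "null_count": 3}],
    },
    "data": [],
    "duplicate_groups": [{"kept_row_index": 0, "droppedIndices": [2]}],
}
summary = summarize_clean_response(example)
```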

Suggest Response

json
{
  "row_count": 150,
  "column_count": 5,
  "columns": ["id", "email", "name", "age", "active"],
  "suggested_rules": [
    {
      "dedupBy": ["id"],
      "fuzzy": false,
      "normalize": false,
      "reason": "\"id\" looks like a unique identifier"
    },
    {
      "dedupBy": ["email"],
      "fuzzy": false,
      "normalize": true,
      "reason": "\"email\" contains email addresses — normalizing for dedup"
    }
  ],
  "null_profiles": [
    { "column": "age", "null_count": 12, "null_percentage": 0.08, "top_null_values": [{ "value": "N/A", "count": 8 }], "inferred_type": "int" }
  ],
  "warnings": [
    { "column": "age", "message": "Column \"age\" has null-like placeholders: N/A", "severity": "info" }
  ]
}
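A suggested rule can be turned into a config for the clean endpoint. The mapping below is an assumption: it treats dedupBy as the value for dedup.columns and copies fuzzy and normalize across unchanged. The helper name rule_to_config is hypothetical.

```python
def rule_to_config(rule, strategy="first", output_format="json"):
    """Build a clean config from one entry of suggested_rules (assumed field mapping)."""
    return {
        "format": output_format,
        "dedup": {
            "columns": list(rule["dedupBy"]),  # assumed: dedupBy corresponds to dedup.columns
            "strategy": strategy,
            "fuzzy": rule["fuzzy"],
            "normalize": rule["normalize"],
        },
    }


email_rule = {"dedupBy": ["email"], "fuzzy": False, "normalize": True, "reason": "..."}
config = rule_to_config(email_rule, strategy="most_complete")
```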

Error Codes

| Status | Meaning | Example |
| --- | --- | --- |
| 400 | Bad request | Invalid config, missing file, bad strategy |
| 413 | File too large | Exceeds 1 MB (demo) or 4.5 MB (v1) |
| 429 | Rate limited | Demo endpoints: 10 requests/minute |