Strip non-printable characters from imported text — formula tutorial

Real spreadsheet text is messy: trailing spaces from a paste, mixed case from form submissions, invisible whitespace from a CSV import. This tutorial walks through the cleaning toolkit — TRIM, CLEAN, SUBSTITUTE, PROPER, LOWER, UPPER — and the order in which to apply them so you do not chase phantom mismatches in your lookups.

Start with TRIM and CLEAN

=TRIM(CLEAN(A2))

TRIM removes leading, trailing, and repeated internal spaces. CLEAN strips non-printable characters (tabs, line breaks, the byte-order mark from CSVs). Apply both to every imported text column before you do anything else — half of all "why does my VLOOKUP fail" questions trace back to a stray space or invisible character that TRIM+CLEAN would have removed.

SUBSTITUTE for surgical changes

=SUBSTITUTE(A2, "–", "-")

SUBSTITUTE replaces specific substrings (en-dashes with hyphens above). Unlike REGEXREPLACE it is case-sensitive and does not interpret special characters, which makes it safer when the thing you are replacing might collide with a regex metacharacter. Stack SUBSTITUTE calls for multiple fixes: =SUBSTITUTE(SUBSTITUTE(A2, CHAR(160), " "), " ", " ") handles non-breaking spaces and double spaces in one cell.

Case normalization

LOWER and UPPER do exactly what they say. PROPER capitalizes the first letter of every word — useful for normalizing names but treacherous on data like "Mary McCarthy" ("Mary Mccarthy") or product codes ("AB-100" → "Ab-100"). For names specifically, consider a helper that lowercases everything then uppercases just the first letter of each word using regex.

Order matters

Apply cleaning in this order: trim and clean → substitute known weirdness → case normalization → final transformation. Reversing the order means TRIM operates on a pre-substituted string and may miss spaces introduced by the substitution. Build the chain in a helper column and only retire the helper once you trust the result.

Worked examples

Standard import-cleanup chain

=TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " ")))

Replaces non-breaking spaces, then trims, then strips non-printable characters.

Normalize a phone number to digits only

=REGEXREPLACE(TRIM(A2), "[^0-9]", "")

Trim first to drop padding, then strip everything that is not a digit.

Fix mixed case names

=PROPER(LOWER(TRIM(A2)))

Lowercase first to neutralize ALL CAPS, then PROPER capitalizes per word.

Common mistakes (and how to avoid them)

Trusting the eye: a cell that looks empty often contains a single space or a non-breaking space. Use =LEN(A2) to verify.
Regex-replacing things that have a literal SUBSTITUTE answer — slower and more fragile.
Running PROPER on identifiers ("AB-100" → "Ab-100") and breaking your lookups.
Cleaning in place by copy-pasting values back over the source. Keep a clean column next to the dirty one until you are sure.

Tips that pay off later

CHAR(160) is a non-breaking space; CHAR(8203) is a zero-width space. Both are common in pasted web content.
UNICODE(MID(A2, n, 1)) tells you the codepoint of the n-th character — invaluable for diagnosing invisible junk.
Cleaning one cell at a time is fine for ad-hoc work; for production, wrap the chain in ARRAYFORMULA and own the column.

When to reach for something else

If your data is consistently dirty in the same way every import, fix it at the source (Apps Script on import, or a Forms validation rule) — cleaning the same garbage every time is a poor use of formula bandwidth.