Your CSV Is a Mess. Here's How to Fix It in 10 Minutes.
Every data project starts the same way: someone hands you a CSV and says "the data's all there." You open it and find 47 different date formats, phone numbers stored as scientific notation, random blank rows, and encoding that turns every accented character into ????.
Here's my cleanup checklist, refined over hundreds of messy CSVs.
Step 1: Fix the encoding
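If you see Ã©, Ã¼, â€™, or ??? where accented letters and apostrophes should be, you have an encoding problem. The file was saved in one encoding and opened as another: Ã© is what UTF-8's é looks like when it's read as Windows-1252. The fix: open the file in a text editor (VS Code, Notepad++), check the encoding shown in the status bar at the bottom, reopen the file with the correct encoding if the text looks wrong there too, and re-save as UTF-8. If Excel is the program showing garbage, save as "UTF-8 with BOM" instead, since Excel relies on that marker to recognize UTF-8. This alone fixes most "weird character" issues. Literal ??? marks are the exception: they usually mean the characters were already thrown away by an earlier bad conversion, so go back to the original file.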
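If you'd rather script the re-save, here's a minimal Python sketch. It's a guess-and-check heuristic, not real encoding detection: it assumes the file is either UTF-8 or Windows-1252 (cp1252, the usual suspect for CSVs exported from Excel on Windows), with latin-1 as a last resort. The function name convert_to_utf8 and the candidate list are my own choices, not any tool's API.

```python
from pathlib import Path

# Assumed candidate list, tried in order. UTF-8 goes first because it is
# strict: bytes that aren't valid UTF-8 raise an error instead of decoding
# into garbage. latin-1 accepts any byte, so it only ever acts as a fallback.
CANDIDATE_ENCODINGS = ["utf-8-sig", "cp1252", "latin-1"]


def convert_to_utf8(src: str, dst: str, add_bom: bool = False) -> str:
    """Rewrite src as UTF-8 at dst and return the encoding src was read with.

    add_bom=True writes "UTF-8 with BOM", which Excel uses to detect UTF-8.
    """
    raw = Path(src).read_bytes()
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)  # "utf-8-sig" also strips an existing BOM
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
        out_encoding = "utf-8-sig" if add_bom else "utf-8"
        # Write bytes directly so the original line endings are kept as-is.
        Path(dst).write_bytes(text.encode(out_encoding))
        return encoding
    raise ValueError(f"none of {CANDIDATE_ENCODINGS} could decode {src}")
```

For example, convert_to_utf8("export.csv", "export_utf8.csv", add_bom=True) produces a copy Excel should open cleanly. Check the returned encoding: if it says latin-1, the guess is shaky and worth eyeballing.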
Step 2: Standardize dates
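I've seen CSVs with dates in the same column formatted as 03/15/2026, 2026-03-15, March 15 2026, 15-Mar-26, and 15.03.2026. All in the same file. Pick ISO format (YYYY-MM-DD) and convert everything: it sorts correctly even as plain text, it's unambiguous, and almost every system can parse it. Watch the ambiguous ones, though. 03/04/2026 is March 4 in the US and 3 April in most of Europe, so find out which convention the source used before you convert.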
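Here's a minimal Python sketch of the conversion, assuming the five formats from that example column are the only ones present. It tries each pattern in order with datetime.strptime and raises on anything it can't parse, so unknown formats get flagged instead of silently guessed. The format list and the name to_iso are illustrative; swap in whatever your file actually contains.

```python
from datetime import datetime

# The formats from the example column, tried in order. Order decides how
# ambiguous values are read: "03/04/2026" matches "%m/%d/%Y" first, so it
# becomes March 4. For a day-first source, change or reorder the patterns.
DATE_FORMATS = [
    "%Y-%m-%d",  # 2026-03-15 (already ISO)
    "%m/%d/%Y",  # 03/15/2026
    "%B %d %Y",  # March 15 2026 (English month names in the default C locale)
    "%d-%b-%y",  # 15-Mar-26 (two-digit years 00-68 map to 2000-2068)
    "%d.%m.%Y",  # 15.03.2026
]


def to_iso(value: str) -> str:
    """Return value as YYYY-MM-DD, or raise ValueError if no format matches."""
    cleaned = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this pattern doesn't fit; try the next one
    raise ValueError(f"unrecognized date: {value!r}")
```

Every value in the example column comes out as 2026-03-15. Letting unknown formats raise is deliberate: a loud failure on row 4,812 beats a quietly wrong date.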
Step 3: Kill the blank rows
Open the file and delete any row where the most important column is empty. In Excel: click a cell in that column, Data → Filter, open the column's filter dropdown, uncheck "(Select All)", check only "(Blanks)", select the visible rows, delete them, then clear the filter. Takes 10 seconds.
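The same cleanup can be sketched in Python with the standard csv module. This version, under my own assumptions, drops two kinds of rows: ones where every cell is empty or whitespace (including lines like ",,,"), and, if you pass a key column index, ones where that column is empty. The names drop_blank_rows and clean_file are hypothetical.

```python
import csv
from typing import List, Optional


def drop_blank_rows(rows: List[List[str]], key_index: Optional[int] = None) -> List[List[str]]:
    """Keep the header, drop fully blank rows and rows missing the key column."""
    header, *data = rows
    kept = [header]
    for row in data:
        if not any(cell.strip() for cell in row):
            continue  # every cell empty or whitespace, e.g. "" or ",,,"
        if key_index is not None and (key_index >= len(row) or not row[key_index].strip()):
            continue  # key column is missing or empty
        kept.append(row)
    return kept


def clean_file(src: str, dst: str, key_index: Optional[int] = None) -> None:
    """Read src as UTF-8, drop blank rows, and write the result to dst."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(drop_blank_rows(rows, key_index))
```

In pandas, df.dropna(how="all") and df.dropna(subset=["your_key_column"]) do roughly the same job, since read_csv treats empty cells as missing by default. Cells containing only spaces are not treated as missing there, which is why the sketch strips whitespace first.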
Step 4: Fix Excel's "helpful" conversions
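The most infuriating CSV issue: Excel converting long numbers to scientific notation (123456789012 shows up as 1.23E+11) and short values that look like dates into actual dates (1-2 becomes January 2nd). It also strips leading zeros from IDs and phone numbers, and if you save the file back to CSV, the mangled versions are what get written.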
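Prevention: import the CSV using Data → From Text/CSV instead of double-clicking it. In the preview, either set Data Type Detection to "Do not detect data types" so every column comes in as text, or click Transform Data and set each column's type yourself before Excel touches it. For phone numbers, IDs, and anything with leading zeros, always use "Text."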
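If a file has already been through Excel, it's worth checking whether the damage is baked in. Here's a rough Python sketch that scans rows from csv.reader and flags cells matching two patterns: scientific notation, and a day-month value like 2-Jan (the US-locale display; other locales may format dates differently). The patterns and the name find_excel_damage are my own assumptions, and genuinely scientific values will be flagged too.

```python
import re
from typing import Iterator, List, Tuple

# Assumed signatures of Excel's conversions; extend for your locale.
SCI_NOTATION = re.compile(r"^-?\d+(\.\d+)?E[+-]\d+$", re.IGNORECASE)
DAY_MONTH = re.compile(r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")


def find_excel_damage(rows: List[List[str]]) -> Iterator[Tuple[int, int, str]]:
    """Yield (row, column, value) for suspicious cells, numbered from 1 like a spreadsheet."""
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            value = cell.strip()
            if SCI_NOTATION.match(value) or DAY_MONTH.match(value):
                yield r, c, value
```

On the prevention side, Python's csv module never converts anything: every cell comes back as a string. With pandas, pd.read_csv(path, dtype=str) keeps every non-empty cell as text, so IDs keep their leading zeros.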
Step 5: Remove duplicates
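In Excel: Data → Remove Duplicates. In Python with pandas: df.drop_duplicates(). On the command line, sort -u file.csv works, but it also sorts your header row in with the data, so set the header aside first: (head -n 1 file.csv && tail -n +2 file.csv | sort -u) > deduped.csv. Pick your weapon. Watch for near-duplicates, too: a stray trailing space is usually enough for a row to survive deduplication. Just do it before you spend three hours analyzing data that's double-counted.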
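For a dependency-free option, here's a minimal Python sketch that keeps the header and the first occurrence of each row in its original order, unlike sort -u. Two choices here are my own assumptions: whitespace around each cell is ignored when comparing, so "Ann" and "Ann " count as duplicates, and you can optionally compare on selected columns only. The name dedupe_rows is hypothetical.

```python
from typing import List, Optional, Sequence


def dedupe_rows(rows: List[List[str]], key_indices: Optional[Sequence[int]] = None) -> List[List[str]]:
    """Keep the header and the first occurrence of each row, preserving order."""
    header, *data = rows
    seen = set()
    kept = [header]
    for row in data:
        # Compare on the chosen columns if given, otherwise on the whole row.
        cells = row if key_indices is None else [row[i] for i in key_indices]
        key = tuple(cell.strip() for cell in cells)  # ignore stray whitespace
        if key in seen:
            continue  # already kept an equivalent row
        seen.add(key)
        kept.append(row)
    return kept
```

Pass key_indices when rows are "the same record" but differ in an irrelevant column such as an import timestamp. Pair it with csv.reader and csv.writer to run it on a file.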
If you're dealing with this regularly, upload your messy CSV to our Data Analyzer — it auto-detects and flags most of these issues before you even ask.