Automation · 7 min read · 01.07.2026 · Max Fey

You didn't create the duplicate. Your automation did, because you told it to.

Every order spawns a new contact, even when the customer already exists. Why automations fill a CRM with duplicates, and how a matching key, an upsert, and a uniqueness rule stop them before the cleanup bill arrives.

The cleanup bill always arrives later

No one sets out to fill a CRM with duplicates. It happens quietly, one record at a time, and by the time anyone notices, there are four entries for a customer who should have one, with no obvious way to tell which is real.

The shape of it is almost always the same. A business connects its shop, its web forms, or its lead source to a CRM, and every incoming order creates a contact. It works beautifully in testing and for the first few weeks in production. Then a returning customer orders again with a slightly different email address, and the automation dutifully creates a second contact. Run that for a year and the sales team quietly stops trusting the data.

The automation did nothing wrong

This is the part people get backwards. The automation didn't malfunction. It carried out the instruction it was given, and that instruction was "create a new record." It has no memory and no sense of who already exists. Every order that arrives is a stranger, unless you explicitly tell the workflow to check first.

The missing building block has a name: upsert. Look for the record before you write, then decide whether to create or update. Most no-code platforms offer it, usually as a "Create or Update" or "Find or Create" action. It just isn't the default. The default is plain "Create", because it is simpler and it never fails in a test. In a test there is never a second person with the same address.
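
Reduced to its core, the upsert idea fits in a few lines. This is a minimal sketch, not any platform's actual action: a plain Python dict keyed by email stands in for the CRM, and the names `Contact` and `upsert_contact` are hypothetical. The email is used exactly as given here; normalising it is a separate step.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    email: str
    name: str
    orders: list[str] = field(default_factory=list)


def upsert_contact(store: dict[str, Contact], email: str, name: str, order_id: str) -> Contact:
    """Look up by email first; update if found, create only if not (hypothetical helper)."""
    existing = store.get(email)
    if existing is not None:
        # Update path: the customer is already known, so attach the new order.
        existing.name = name
        existing.orders.append(order_id)
        return existing
    # Create path: no record with this key exists yet.
    contact = Contact(email=email, name=name, orders=[order_id])
    store[email] = contact
    return contact


crm: dict[str, Contact] = {}
upsert_contact(crm, "anna@example.com", "Anna Meyer", "A-1001")
upsert_contact(crm, "anna@example.com", "Anna Meyer", "A-1002")
print(len(crm), crm["anna@example.com"].orders)  # 1 ['A-1001', 'A-1002']
```

A plain "Create" in the same spot would have left two contacts after the second order.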

Someone has to decide what the same customer means

This is where the real work sits, and it is a business question long before it is a technical one. Before an automation can avoid duplicates, a human has to define how two records count as the same person or the same company. That definition is the matching key, and everything downstream depends on it.

Email is the usual candidate. It is better than nothing, but it gets you less far than you would hope. People have more than one address. A company orders through info@ one week and through an assistant's personal inbox the next. Add a stray capital letter or a trailing space and an exact comparison no longer sees the same value. Phone numbers are worse, because the same number turns up in five formats, with and without country codes, leading zeros, spaces, and brackets.
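
Both problems shrink once values are normalised before they are compared. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and `normalize_phone` assumes German-style numbering (default country code 49, a single domestic trunk zero, and the common "(0)" notation). For real international data, a dedicated library such as Google's libphonenumber is the safer choice.

```python
import re


def normalize_email(raw: str) -> str:
    # Trim surrounding whitespace and lowercase the whole address.
    return raw.strip().lower()


def normalize_phone(raw: str, default_country_code: str = "49") -> str:
    """Reduce a phone number to digits with a country code.

    Assumption: numbers without a country code are domestic numbers with
    one leading trunk zero, as in Germany. Adjust for your market.
    """
    raw = raw.replace("(0)", "")          # "+49 (0)30 ..." -> "+49 30 ..."
    digits = re.sub(r"\D", "", raw)       # drop spaces, brackets, dashes, plus signs
    if raw.strip().startswith("+"):
        return digits                     # already international
    if digits.startswith("00"):
        return digits[2:]                 # 0049 30 ... -> 4930...
    if digits.startswith("0"):
        return default_country_code + digits[1:]  # 030 ... -> 4930...
    return digits


variants = ["+49 30 1234567", "0049 30 1234567", "030 1234567",
            "(030) 123 45 67", "+49 (0)30 1234567"]
print({normalize_phone(v) for v in variants})       # {'49301234567'}
print(normalize_email("  Anna.Meyer@Example.COM "))  # anna.meyer@example.com
```

Five spellings of one number collapse to a single comparable value.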

Company names are a trap of their own. Meyer Ltd, Meyer Limited, and meyer ltd are obviously one business to a human and three different strings to an exact match. Use the company name as your only key and you get either duplicates, because the spelling drifts, or false merges, because two genuinely separate firms with the same name collapse into one record.
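
A normalised company name fixes the first failure and makes the second one visible. This is a sketch only: `normalize_company` and the suffix list are hypothetical and would need extending for the legal forms your customers actually use.

```python
import re

# Hypothetical list of legal-form words to ignore at the end of a name.
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "inc", "llc", "plc"}


def normalize_company(raw: str) -> str:
    # Lowercase, turn punctuation into spaces, and split on any run of whitespace.
    words = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    # Drop trailing legal-form words ("Meyer Ltd" -> "meyer").
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


names = ["Meyer Ltd", "Meyer Limited", "meyer  ltd."]
print({normalize_company(n) for n in names})                          # {'meyer'}
print(normalize_company("Meyer GmbH") == normalize_company("Meyer Ltd"))  # True
```

The last line is the flip side: two unrelated firms called Meyer also become "meyer", which is why the name should never be the only key.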

The failure that only shows up under load

Most duplicates come from the obvious gap: the workflow never checks at all, so it always creates. That one is easy to fix with a lookup before the write.

The harder case is timing, and it is worth understanding because it stays hidden until your volume grows. Two events arrive almost at once, say a customer submits a form and places an order a few seconds later. Both runs search for the contact at the same moment, both find nothing, and both create. The lookup was correct. It just ran before the other process had finished writing its record. You cannot fully prevent this with search logic alone, because the two runs are blind to each other. What stops it is a uniqueness rule in the database itself, a field that refuses to store the same value twice, so that a simultaneous request is rejected instead of duplicated.
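
The race can be reproduced deterministically. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the target system and simply orders the steps the way two overlapping runs would interleave: both lookups first, then both inserts. The table and the `find_contact` helper are hypothetical; how a given CRM exposes a uniqueness rule varies by vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The uniqueness rule lives in the database, not in the workflow.
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")


def find_contact(email: str):
    return conn.execute("SELECT id FROM contacts WHERE email = ?", (email,)).fetchone()


email = "anna@example.com"
# Run A and run B both search before either has written anything.
run_a_found = find_contact(email)
run_b_found = find_contact(email)
print(run_a_found, run_b_found)  # None None

# Both lookups were "correct", so both runs decide to create.
conn.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
try:
    conn.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
    outcome = "duplicate created"
except sqlite3.IntegrityError:
    # Without the UNIQUE constraint, this second insert would have succeeded.
    outcome = "second insert rejected"

total = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
print(outcome, total)  # second insert rejected 1
```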

How we actually build it

When a data flow creates contacts, we always work in the same sequence. First, define what makes a customer unique, before building a single step. For consumers that is often the normalised email. For companies it is a combination such as normalised name plus postcode, or better, an external identifier like the customer number from the shop.
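
That priority order can be written down as a single function that every workflow calls before it searches. This is a hypothetical sketch: the field names `customer_number`, `company`, `postcode`, and `email` are assumptions, and the normalisation inside is deliberately simplified (it does not strip legal suffixes).

```python
from typing import Optional


def matching_key(record: dict) -> Optional[tuple]:
    """Pick the strongest available identifier for a record (hypothetical).

    Priority: external shop ID, then company name plus postcode,
    then email. Returns None if nothing usable is present.
    """
    if record.get("customer_number"):
        return ("shop", str(record["customer_number"]).strip())
    if record.get("company") and record.get("postcode"):
        name = " ".join(record["company"].lower().split())
        postcode = record["postcode"].replace(" ", "").upper()
        return ("company", name, postcode)
    if record.get("email"):
        return ("email", record["email"].strip().lower())
    return None  # no key: better routed to manual review than created blindly


print(matching_key({"customer_number": " 10042 ", "email": "x@example.com"}))
print(matching_key({"company": "Meyer  Ltd", "postcode": "ls1 4ap"}))
print(matching_key({"email": " Anna@Example.com"}))
```

Prefixing each key with its type keeps a shop ID from ever colliding with an email that happens to look the same.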

Then normalise before comparing. Lowercase the email and strip whitespace. Reduce the phone number to digits with a country code. Drop the legal suffix and the double spaces from company names. This one step removes most of the duplicates that come from spelling variants, and it costs almost nothing.

Then search before writing. If a contact with that key exists, update it; if not, create it. Make sure the workflow also handles a search that returns more than one match. That means duplicates already exist, and the workflow should flag them rather than silently pick the first.
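
The three outcomes of that search can be made explicit. A minimal sketch, assuming records are plain dicts that already carry a normalised `"key"` field; `UpsertResult` and `upsert_by_key` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UpsertResult:
    action: str          # "created", "updated" or "flagged"
    indexes: list[int]   # positions of the affected records


def upsert_by_key(records: list[dict], key: str, payload: dict) -> UpsertResult:
    """Create or update by matching key, but refuse to guess between duplicates."""
    matches = [i for i, record in enumerate(records) if record["key"] == key]
    if len(matches) > 1:
        # Duplicates already exist: report them instead of writing to matches[0].
        return UpsertResult("flagged", matches)
    if len(matches) == 1:
        records[matches[0]].update(payload)
        return UpsertResult("updated", matches)
    records.append({"key": key, **payload})
    return UpsertResult("created", [len(records) - 1])


contacts = [
    {"key": "anna@example.com", "city": "Leeds"},
    {"key": "ben@example.com", "city": "York"},
    {"key": "ben@example.com", "city": "Hull"},
]
r1 = upsert_by_key(contacts, "anna@example.com", {"city": "London"})
r2 = upsert_by_key(contacts, "ben@example.com", {"city": "Bath"})
r3 = upsert_by_key(contacts, "cara@example.com", {"city": "Derby"})
print(r1.action, r2.action, r2.indexes, r3.action)  # updated flagged [1, 2] created
```

In a real workflow the "flagged" branch would notify someone or open a task; the point is that neither Ben record is touched.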

Finally, add the uniqueness rule in the target system for the timing case. When the database refuses a second record with the same email, the automation catches the conflict and updates instead. It is the only layer that holds when two things happen at once.
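
Catching the conflict and falling back to an update can be sketched with SQLite again standing in for the target system. The table and `create_or_update` are hypothetical; a CRM API would typically signal the conflict differently, for example through an error response, so the `except` branch would change accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# email is the primary key, so the database itself rejects a second row with it.
conn.execute("CREATE TABLE contacts (email TEXT PRIMARY KEY, name TEXT, last_order TEXT)")


def create_or_update(email: str, name: str, order_id: str) -> str:
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO contacts (email, name, last_order) VALUES (?, ?, ?)",
                (email, name, order_id),
            )
        return "created"
    except sqlite3.IntegrityError:
        # Another run created this contact first: update it instead of failing.
        with conn:
            conn.execute(
                "UPDATE contacts SET name = ?, last_order = ? WHERE email = ?",
                (name, order_id, email),
            )
        return "updated"


first = create_or_update("anna@example.com", "Anna Meyer", "A-1001")
second = create_or_update("anna@example.com", "Anna Meyer", "A-1002")
rows = conn.execute("SELECT email, last_order FROM contacts").fetchall()
print(first, second, rows)  # created updated [('anna@example.com', 'A-1002')]
```

SQLite can also do this in one statement with `INSERT ... ON CONFLICT(email) DO UPDATE`; the try/except form is shown because it mirrors what a workflow does when an API call comes back with a conflict.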

Why prevention is cheaper than the cure

The reason to do this up front is the asymmetry. Creating duplicates is free and automatic. Untangling them is expensive and manual.

Once a person has more than one record, data attaches to all of them: an order on one, a support ticket on another, a note from a phone call on a third. Merging now means deciding which address wins, which history survives, and which linked items have to move across. A few dozen duplicates is an afternoon. A few thousand piled up over two years is a project with its own budget.

Then there is the damage done while the duplicates still exist. Sales calls a customer without seeing that a colleague already owns the relationship. An invoice goes to the old address. A cancellation is missed because it sits on the second record. Marketing counts one customer three times and the numbers stop meaning anything. A duplicate is not a cosmetic flaw in a database. It feeds bad decisions to people who trust the numbers.

The pattern underneath

An automation that writes without first searching is a duplicate generator on a delay. Day one looks fine, because there is no second record yet for the problem to show up against. The fault grows with use, and it surfaces at exactly the point where enough data has piled up to make the cleanup hurt.

So the question to answer before any workflow that creates people or companies is short: how does this workflow know it has seen someone before? If you can't answer that clearly, it will create a new record every time, reliably, until someone in sales calls to ask why a customer now shows up three times.

#Deduplication #Duplicates #CRM #DataQuality #Upsert #Matching #Automation #Make #n8n