Insycle Blog

CRM Deduplication: Why Picking the Right Master Record is Critical

Written by Ryan Bozeman | Mar 14, 2021 7:06:00 PM

When you deduplicate and merge records in your CRM, how do you determine which record remains and which one gets removed? How do you know that you are picking the right ‘master’ record? Making the right decision helps you to avoid unnecessary problems that can impact your ability to effectively engage with a prospect or customer down the road.

Duplicate contacts, companies, and deals make it difficult for your marketing teams to inject data into marketing automation campaigns, cause sales teams to waste time and make mistakes both in the CRM and when engaging with prospects, and hinder support teams' ability to provide a consistently excellent experience to every customer.

Duplicate customer data is much more common than most companies realize. Duplicate rates in a database can be as high as 20%-30%. According to research from SiriusDecisions, “It takes $1 to verify a record as it's entered, $10 to cleanse and de-dupe it and $100 if nothing is done, as the ramifications of the mistakes are felt over and over again.”

You know you want to take care of your duplicates problem. It seems like it should be easy. You’re just merging two (or more) similar records, right? No problem. You’ll just keep the ones with the most accurate, updated information.

But the more you dig into your duplicates problem, the more you find out that merging duplicate contacts, or any duplicate record for that matter, is a bit more complicated than it seems on the surface.

Some of the issues that you’ll run into as you go through the process of deduplicating CRM data include:

  • There are many different types of duplicates. Some are harder to find than others. We recently published an in-depth article about hidden duplicates you might miss when using basic deduplication techniques. There are partial matches. Misspellings. Typos. Title and suffix considerations. External ID considerations. Complete CRM data deduplication amounts to a much more complicated process than you might originally think.
  • Data loss is a concern and choosing the right master record is important. When you have customer data split between two different records, both records may contain important information about the customer that you’ll want to leverage in marketing and sales campaigns. In fact, both records may contain important information that you want to retain in the same field. But picking and choosing which data is retained and which is overwritten can be a tedious but critical process.
  • CRMs offer deduplication features, but companies require scalable solutions as they grow. Most platforms, including HubSpot, Salesforce, and Intercom, have their own built-in deduplication features and processes. However, these features may not cover your bases as your customer data scales and deduplication becomes more complex. Companies should look for solutions that can scale with them and run CRM data deduplication automatically, on an ongoing basis.

In this article, we’ll deep-dive into how to go about choosing the right master record when merging duplicates to improve CRM data deduplication effectiveness and minimize data loss.

Deduplication Explained: An Introduction to Key Terms

CRM deduplication is the process of merging duplicate contact data, companies, and deals in your CRM system. These duplicates may be exact copies of another record, but often they are partial matches, meaning that there is only partial data overlap between the records.

In this article, we’ll use a couple of terms, so let’s explain what they mean:

  • Duplicate group: When you have two or more records that represent the same entity, we’ll refer to them as a duplicate group. For example, if you had four Jane Does from the same company in your database, these would be a duplicate group of four. Your CRM may contain multiple duplicate groups. When people say ‘duplicates,’ they may be referring to the total number of duplicate records or to the number of groups. To avoid confusion, we’ll always use “duplicate groups” when referring to a set of matching records. For instance, if you had two Jane Does and three John Smiths, you would have five duplicate contact records and two duplicate groups.
  • Master selection: When merging a duplicate group, one of the records is designated as the ‘master.’ This means that it will be retained in the database after the records are merged. Data from the other duplicate records is merged into the master record, and then the duplicate records are discarded. Child records of the duplicates, such as notes and emails, are reassigned (re-parented) to the master record.
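To make the counting convention concrete, here is a minimal Python sketch that groups contacts by a normalized first name, last name, and company. The matching key, the field names, and the sample records are hypothetical; real duplicate detection usually needs fuzzier matching than an exact key.

```python
from collections import defaultdict


def find_duplicate_groups(records, key_fields=("first_name", "last_name", "company")):
    """Group records sharing the same normalized key; keep only groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        # Normalize case and surrounding whitespace so "Doe " and "Doe" match.
        key = tuple(str(record.get(field, "")).strip().lower() for field in key_fields)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: two Jane Does, three John Smiths, one unique contact.
contacts = [
    {"id": 1, "first_name": "Jane", "last_name": "Doe", "company": "Acme"},
    {"id": 2, "first_name": "jane", "last_name": "Doe ", "company": "Acme"},
    {"id": 3, "first_name": "John", "last_name": "Smith", "company": "Globex"},
    {"id": 4, "first_name": "John", "last_name": "Smith", "company": "Globex"},
    {"id": 5, "first_name": "John", "last_name": "Smith", "company": "Globex"},
    {"id": 6, "first_name": "Ann", "last_name": "Lee", "company": "Initech"},
]

groups = find_duplicate_groups(contacts)
print(len(groups))                  # duplicate groups: 2
print(sum(len(g) for g in groups))  # duplicate contact records: 5
```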

With this in mind, let’s dive into why it is so important that you pick the right master record when deduplicating.

Why Picking the Right Master Record is Important for CRM Deduplication

Picking the right master record ultimately determines the effectiveness of any CRM deduplication campaign. Luckily, while picking the right master record is important, it isn’t necessarily difficult.

Here are a few of the key reasons why choosing the right master record is important.

  • The master record determines where your newsletters will go. Suppose a duplicate group of two has john@acme.com and john@gmail.com as its emails. Where would you rather your newsletters go? For B2B companies, it’s probably the work email. For B2C organizations, like schools, the personal email is often more desirable, because people change jobs and work email addresses become invalid over time.
  • Syncing with external ERP or CRM systems (ID considerations). Is your data currently syncing with other systems? Depending on the integration, you may have to match record IDs to ensure that the data ends up in the right place in both systems, and to avoid creating more duplicates because the IDs weren’t matched appropriately. You want to retain the right identifier for reconciliation between systems. Ideally, for example, a company would have a unique global ID across all of your CRM, ERP, and support systems.
  • Different CRMs have different merging logic. The way that CRMs determine what data is retained and overwritten differs depending on the platform. This doesn’t just apply to customer data, either. Consider how the create date, field values, associations, and workflows are merged as well. Choosing the same master record can mean different outcomes on different platforms.
  • Deduping tools may have their own logic, in addition to CRM logic. Using a third-party deduplication tool can mean adding another layer of merging logic on top of what already exists in your CRM. These tools may work the same way as your CRM or similarly to it, or they may let you customize the logic so that you have full control over outcomes.

Choosing the right master record is often a very important decision, although not always. If you accidentally imported the same .CSV twice, all of your duplicate groups are exact matches, so which record you choose as master probably doesn’t matter. There may be other situations where your records are shallow and you are simply concerned about searchability, and you wouldn’t experience any negative side effects from designating either record as the master.
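One way to check whether master selection matters for a given group is to test whether its records are identical apart from fields the CRM assigns itself. The sketch below assumes those system fields are the record ID and creation date; the field names and sample records are hypothetical.

```python
def is_exact_match_group(group, ignore=("id", "created")):
    """True when all records in the group agree on every field outside `ignore`."""

    def comparable(record):
        # Drop system-assigned fields that differ even between true copies.
        return {k: v for k, v in record.items() if k not in ignore}

    first = comparable(group[0])
    return all(comparable(record) == first for record in group[1:])


imported_twice = [
    {"id": 11, "first_name": "Jane", "email": "jane@acme.com", "phone": "555-0100"},
    {"id": 27, "first_name": "Jane", "email": "jane@acme.com", "phone": "555-0100"},
]
partial_match = [
    {"id": 12, "first_name": "Jane", "email": "jane@acme.com", "phone": ""},
    {"id": 31, "first_name": "Jane", "email": "jane@gmail.com", "phone": "555-0100"},
]

print(is_exact_match_group(imported_twice))  # True: no data differs between the records
print(is_exact_match_group(partial_match))   # False: the master choice matters
```

Groups that fail a check like this are the ones where the rest of this article applies.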

What Happens When You Pick the Wrong Master?

Depending on your internal data processes, there may be a clear, rule-based way that you choose a master (such as the first created record) that will effectively merge duplicate entries while retaining important data.

Let’s consider an example. Let’s say you have two records with different emails, but the same address, phone, and other data.

If you pick the record with jane@gmail.com instead of jane@acme.com, all you’ll need to do to fix the issue is change the primary email on the contact. Both emails are retained. You just make a simple swap and all of the other values — notes, email, etc. — are retained.

That said, choosing the wrong master record does have consequences. They include:

  • Accidentally overwrite critical customer data. Your valid customer data might be split up between multiple duplicate entries. Merging everything into one singular record without further consideration might lead to you deleting pertinent customer data that could be used to aid your marketing or sales initiatives.
  • Create confusion among your teams. Two duplicates might have different sales reps as owners. When you merge them, one owner may suddenly lose an account while another has a load of new account data dumped on them. Now they have to re-evaluate their approach to that account.
  • The wrong choice can create syncing issues. For example, if you have HubSpot and Salesforce synced, what happens if you merge duplicate contact records into a master record that has an incorrect Salesforce contact ID? Selecting the wrong master record risks breaking the sync altogether.
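To see why the master choice decides what survives, here is a minimal sketch of one possible field-level merge policy. It is an assumption for illustration, not how any particular CRM merges: the master’s populated values win, the master’s empty fields are filled from the duplicates, and any duplicate value that would be discarded is reported as a conflict. Field names and values are hypothetical.

```python
EMPTY = (None, "", "N/A")  # values treated as "not populated"


def merge_into_master(master, duplicates):
    """Return (merged_record, conflicts) under a 'master wins, fill blanks' policy."""
    merged = dict(master)
    conflicts = []
    for duplicate in duplicates:
        for field, value in duplicate.items():
            if field == "id" or value in EMPTY:
                continue  # never copy record IDs or empty values
            current = merged.get(field)
            if current in EMPTY:
                merged[field] = value  # fill a blank on the master
            elif current != value:
                # The duplicate's value is dropped; report it so nothing is lost silently.
                conflicts.append((field, current, value))
    return merged, conflicts


master = {"id": 1, "email": "jane@acme.com", "owner": "Rep A", "phone": ""}
duplicate = {"id": 2, "email": "jane@gmail.com", "owner": "Rep B", "phone": "555-0100"}

merged, conflicts = merge_into_master(master, [duplicate])
print(merged)     # {'id': 1, 'email': 'jane@acme.com', 'owner': 'Rep A', 'phone': '555-0100'}
print(conflicts)  # [('email', 'jane@acme.com', 'jane@gmail.com'), ('owner', 'Rep A', 'Rep B')]
```

Swapping which record is passed as `master` flips both conflicts: Rep B would keep the account and the Gmail address would become the email on file, which is exactly the owner confusion and overwrite risk described above.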

The best way to merge duplicates is through custom merge processes that define a step-by-step method for choosing master records, rather than leaving it to guesswork each time you deduplicate.

Why Determining the Right Master Record Can Be Complicated

Sometimes determining the right master record can be complex. Across records, data can be incomplete or differ in a variety of ways. Some records may be missing critical data but offer more complete data in other fields.

Let’s assume that you have two different records that have been identified as duplicates. They might look a little something like this:

First Name | Last Name | Email | Company | Job Title | Company Size
Dawn | Smith | d.smith@acme.com | Acme Inc. | CMO | 50
D | Smith | d.smith@gmail.com | Acme Inc. | Chief Marketing Officer | N/A

This presents a pretty straightforward duplicate merging situation and a simple choice for your master selection.

The top record is the more complete one. It includes Dawn’s full first name, while the second record includes only an initial. The top record features her business email rather than a personal Gmail account, and it includes company size data that isn’t present in the duplicate entry.

Here, choosing the master record is easy. One record is clearly the right choice.

There are often situations, however, where the ‘right choice’ is not as clear. What if the two records looked like this:

First Name | Last Name | Email | Company | Job Title | Company Size
Dawn | Smith | d.smith@acme.com | Acme Inc. | CMO | 50
Dawn | Smith | d.smith@acme.com | Acme Inc. | VP of Marketing | 100

Now the waters get a little murkier.

Here the records are the same, except one lists Dawn’s job title as “CMO” while the other lists it as “VP of Marketing.” The company size figures differ as well. Which record is the right one? The messaging Dawn receives from your marketing and sales teams will differ depending on her role.

But it can get even more complicated.

First Name | Last Name | Email | Company | Job Title | Company Size
Dawn | Smith | d.smith@acme.com | Acme Inc. | CMO | 50
D. | Smith | d.smith@gmail.com | Acme Inc. | VP of Marketing | 100
Dee | S. | dawn.smith@acme.com | Acme | Chief Marketing Officer | N/A

Three different first names. Two different last names. Three different emails. Two different companies. Two different job titles with standardization issues. Conflicting company size data. All spread across three different “duplicate” customer records.
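A quick way to surface this kind of conflict before picking a master is to count the distinct values each field takes across the group. The sketch below uses the three Dawn Smith records from the table; the field names and the one-entry job-title alias map are hypothetical, standing in for whatever standardization rules you actually use.

```python
records = [
    {"first_name": "Dawn", "last_name": "Smith", "email": "d.smith@acme.com",
     "company": "Acme Inc.", "job_title": "CMO", "company_size": "50"},
    {"first_name": "D.", "last_name": "Smith", "email": "d.smith@gmail.com",
     "company": "Acme Inc.", "job_title": "VP of Marketing", "company_size": "100"},
    {"first_name": "Dee", "last_name": "S.", "email": "dawn.smith@acme.com",
     "company": "Acme", "job_title": "Chief Marketing Officer", "company_size": "N/A"},
]

# Hypothetical standardization map: treat "CMO" as "Chief Marketing Officer".
TITLE_ALIASES = {"cmo": "chief marketing officer"}


def normalize(field, value):
    """Lowercase and trim; apply title aliases to job titles only."""
    value = value.strip().lower()
    if field == "job_title":
        return TITLE_ALIASES.get(value, value)
    return value


def distinct_values(group):
    """Map each field to the number of distinct normalized values in the group."""
    return {field: len({normalize(field, r[field]) for r in group}) for field in group[0]}


print(distinct_values(records))
# {'first_name': 3, 'last_name': 2, 'email': 3, 'company': 2, 'job_title': 2, 'company_size': 3}
```

Without the alias map, the job title count would be 3, because “CMO” and “Chief Marketing Officer” are the same title written two ways.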

Most companies default to choosing a ‘master record’ with the earliest creation date. That will be the right choice for many of your duplicates but will cause issues with a certain percentage every time.

Choosing the right master record here is important but tricky. You want to make sure that you have the most accurate customer data. There are several possibilities here. You know that, at minimum, one record contains inaccurate data.

But simply defaulting to the first created record might be an issue. Based on these three records, it seems plausible that Dawn started out as CMO at Acme and that her title changed to VP of Marketing as the company grew. There is a chance that the middle record is the most accurate while also being the most recently created. But that doesn’t mean the other records don’t have more accurate or updated data in some specific fields.

If you were to merge these records using the earliest creation date, you risk losing an accurate profile for Dawn.

This example illustrates how complicated deduplication can be and why choosing the right master record is so important for retaining accurate, quality data. It can get more involved, as you’ll often see more than three duplicates for a single contact or company.

Salesforce to HubSpot Syncing and Deduplication

Salesforce and HubSpot are two of the most popular, feature-rich CRM systems available today. Because HubSpot was originally a marketing-focused platform and Salesforce is sales-focused, the two have become a natural pairing for many companies. Companies that use both recognize the benefits of reliable data syncing between the two.

Some small but aggravating data problems can come from syncing Salesforce and HubSpot. Bad data can break the sync altogether. If the sync is broken for an extended period, cleaning and reconciling that data can be a huge pain. Additionally, parent-child hierarchies between the two systems can be complicated and error-prone.

First, issues with duplicates sometimes arise once the Salesforce to HubSpot sync is in place. This is a huge problem, particularly for account-based marketing teams that rely on accurate contact-to-company associations.

Second, duplicate leads in Salesforce will often sync to HubSpot, so the duplicates end up in both systems. When duplicate records in Salesforce share only a name, or use different email conventions such as dawn.smith@acme.com, dawns@acme.com, or dawn@acme.com, HubSpot will not identify them as duplicate records.
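The sketch below illustrates the gap. It assumes a matcher that only compares email addresses exactly (a simplification) and contrasts it with a looser, hypothetical key of first name, last name, and email domain. A loose key like this can also catch different people who share a name at the same company, so its matches are best treated as candidates for review rather than automatic merges.

```python
from collections import Counter


def email_domain(email):
    """Return the lowercased part after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def count_duplicate_groups(records, key):
    """Count keys shared by two or more records."""
    counts = Counter(key(record) for record in records)
    return sum(1 for n in counts.values() if n > 1)


leads = [
    {"first_name": "Dawn", "last_name": "Smith", "email": "dawn.smith@acme.com"},
    {"first_name": "Dawn", "last_name": "Smith", "email": "dawns@acme.com"},
    {"first_name": "Dawn", "last_name": "Smith", "email": "dawn@acme.com"},
]

# Exact email matching: every address is different, so nothing is flagged.
by_email = count_duplicate_groups(leads, lambda r: r["email"].strip().lower())

# Name plus email domain: all three leads fall into one candidate group.
by_name_and_domain = count_duplicate_groups(
    leads,
    lambda r: (r["first_name"].lower(), r["last_name"].lower(), email_domain(r["email"])),
)

print(by_email, by_name_and_domain)  # 0 1
```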

Duplicates are a complicated enough issue on their own, but when taking into account different cross-platform integrations and the complexities that come with syncing that data, the issue can become an even bigger headache.

To deduplicate records while the Salesforce to HubSpot sync is active, choose a master record that has the Salesforce Account ID or the Salesforce Contact ID populated.

Now let’s look at some of the most common ways companies choose a master record when merging duplicates.

Common Practices for Picking a Master Record

There are some common ways to pick a master record. Of course, the right way to choose a master record depends entirely on your specific situation. There may be something about the way that your company collects and utilizes customer data that makes one method a better choice than another. There also may be different choices that would be considered ‘right’ for individual sets of matching duplicates.

Some of the most common practices for picking a master record when merging duplicates include:

  • Last activity or engagement. For contacts, merging duplicate data into the record with the most recent activity may be the best way to ensure that the record your teams use has the most up-to-date information.
  • Most recently modified record. The record that received the most recent modifications is more likely to have the most up-to-date data.
  • Lifecycle stage. Choosing a master that has a lifecycle stage of “customer” can be a simple way to ensure that you choose the most relevant, updated record for a given contact.
  • Contact owner exists. If only one of your records in a duplicate group is assigned to an owner, the probability is high that that record is the most complete and would be the right choice for master.
  • Most complete record. It stands to reason that you would want to keep the record with the most data. While not always the right choice, this is seen as a safe choice when merging duplicates.
  • Personal vs work email address. For B2B applications, you’ll always want to make sure you are using the work email address for prospects and customers. For B2C, a personal email address is ideal. Scanning for contacts that use a free email provider (Gmail, Hotmail, etc.) can be a simple but effective way to make sure your master selection has the right email.
  • Field values. Field values, such as whether a particular field is populated, can be an effective way to choose a master. Numeric values, such as the highest number of clicks or the lowest number of bounces, can help you discern which record to keep based on engagement.
  • First created record. The duplicate record that was in your system first. In theory, this record should contain the most data for the prospect, though that doesn’t always end up being the case.
  • Last created record. In some situations, the record that was created most recently may include the most up-to-date information about a contact, organization, or deal.
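Several of these practices reduce to a simple scoring or filtering function. The sketch below applies three of them (most complete record, work versus free email provider, most recently modified) to the two Dawn Smith records from the first table earlier. The field names, the short free-provider list, and the `last_modified` dates are hypothetical.

```python
from datetime import datetime

EMPTY = (None, "", "N/A")
# Illustrative and far from exhaustive.
FREE_EMAIL_DOMAINS = {"gmail.com", "hotmail.com", "yahoo.com", "outlook.com"}


def completeness(record):
    """'Most complete record': count populated fields."""
    return sum(1 for value in record.values() if value not in EMPTY)


def has_work_email(record):
    """'Personal vs work email': True if the domain is not a free provider."""
    domain = (record.get("email") or "").rsplit("@", 1)[-1].lower()
    return bool(domain) and domain not in FREE_EMAIL_DOMAINS


records = [
    {"first_name": "Dawn", "last_name": "Smith", "email": "d.smith@acme.com",
     "company": "Acme Inc.", "job_title": "CMO", "company_size": 50,
     "last_modified": datetime(2021, 1, 5)},   # hypothetical date
    {"first_name": "D", "last_name": "Smith", "email": "d.smith@gmail.com",
     "company": "Acme Inc.", "job_title": "Chief Marketing Officer", "company_size": "N/A",
     "last_modified": datetime(2021, 2, 10)},  # hypothetical date
]

most_complete = max(records, key=completeness)
work_email = [r for r in records if has_work_email(r)]
most_recent = max(records, key=lambda r: r["last_modified"])

print(most_complete["email"])            # d.smith@acme.com
print([r["email"] for r in work_email])  # ['d.smith@acme.com']
print(most_recent["email"])              # d.smith@gmail.com
```

Here the first two rules pick the acme.com record while the third picks the Gmail record, which is why a single rule is rarely enough and why chaining rules in a fixed order helps.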

The right way to choose a master record depends on how your company collects, stores, and utilizes your customer data. But, it is critical that you make the right choice, as there are some serious downsides to choosing rules for master records at random when bulk merging duplicates.

Customizing Merge Behavior with Multi-Step Master Selection Rules

Customizing merge behavior can help you to limit mistakes, make better merging decisions, and take the guesswork out of deduplication processes.

Ideally, you’d have a multi-step process for determining master records. If the first condition is met, the master record is chosen. If it is not, then you continue down your list of master selection rules.

For instance, the master record selection rules for merging duplicate Salesforce companies might look something like this:

  1. Salesforce Account ID. When merging records, start by checking for IDs from other platforms. If only one record has an ID, it is the one that syncs with Salesforce, so that is the record you’d want to keep, with all other records merged into it. This helps to ensure your integrations remain intact and simplifies post-deduplication cleanup.
  2. Company Owner Exists. If no record has a Salesforce Account ID associated with it, then look for the record that has an owner assigned to it.
  3. Earliest record creation date. If no records have a Salesforce Account ID or an owner assigned to them, then merge all duplicates into the record with the earliest creation date.
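The three steps above can be expressed as an ordered list of rules. In this minimal sketch, a rule selects a master only when exactly one record in the group satisfies it (matching the “only one with an ID” wording); otherwise the next rule is tried, and the earliest creation date is the catch-all. The field names and sample companies are hypothetical, not actual Salesforce or HubSpot property names.

```python
from datetime import date


def only_match(group, predicate):
    """Return the single record satisfying `predicate`, or None if zero or several do."""
    matches = [record for record in group if predicate(record)]
    return matches[0] if len(matches) == 1 else None


RULES = [
    ("Salesforce Account ID", lambda r: bool(r.get("salesforce_account_id"))),
    ("Company owner exists", lambda r: bool(r.get("owner"))),
]


def select_master(group):
    """Apply RULES in order; fall back to the earliest creation date."""
    for rule_name, predicate in RULES:
        master = only_match(group, predicate)
        if master is not None:
            return master, rule_name
    return min(group, key=lambda r: r["created"]), "Earliest creation date"


groups = [
    [{"id": "A", "owner": "Rep 1", "salesforce_account_id": None, "created": date(2019, 3, 1)},
     {"id": "B", "owner": None, "salesforce_account_id": "SF-0001", "created": date(2020, 6, 1)}],
    [{"id": "C", "owner": "Rep 2", "salesforce_account_id": None, "created": date(2021, 1, 4)},
     {"id": "D", "owner": None, "salesforce_account_id": None, "created": date(2018, 8, 9)}],
    [{"id": "E", "owner": None, "salesforce_account_id": None, "created": date(2020, 2, 2)},
     {"id": "F", "owner": None, "salesforce_account_id": None, "created": date(2017, 5, 5)}],
]

for group in groups:
    master, rule_name = select_master(group)
    print(master["id"], rule_name)
# B Salesforce Account ID
# C Company owner exists
# F Earliest creation date
```

Whether a tied rule should fall through with the whole group (as here) or narrow the group to the tied records is a design choice; narrowing would, for instance, decide between two records that both have a Salesforce Account ID using owner or creation date.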

A multi-step process allows you to ensure that you have your bases covered and simplifies the master record selection process for your team.

Recommended Best Practices for an Effective Deduplication Process

Like any data cleansing process, deduplication and master selection have some recommended best practices that you should follow to give yourself the highest chance of success.

First, companies should have processes in place for generating previews of their deduplication results. This is especially true when you are first getting started and learning how your particular deduplication tools and processes work. Once you’re comfortable, you should look to cut back on manual review and institute deduplication automation so that you can focus on other data management tasks.

Reporting should also play a critical role in deduplication. First, reports are a way to share, collaborate, and gather feedback from your team. A report can help you identify ways to improve your master selection and other deduping processes. Reports also serve as a backup and audit trail in case anything goes wrong.

Before making any live changes to your CRM data, it’s always a good idea to preview how your master selection rules will change it.
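A preview can be as simple as a dry run that records, for each duplicate group, which record would be kept and which would be merged, without touching any data, and writes that plan to CSV for review and as an audit trail. This is a minimal sketch using Python’s standard `csv` module; the column names, the earliest-creation-date rule, and the sample records are hypothetical, and a real tool’s preview and report will look different.

```python
import csv
import io


def earliest_created(group):
    """Example master rule: earliest creation date (ISO date strings sort correctly)."""
    return min(group, key=lambda record: record["created"])


def preview_merges(groups, select_master):
    """Dry run: describe each planned merge without modifying any record."""
    rows = []
    for group_number, group in enumerate(groups, start=1):
        master = select_master(group)
        for record in group:
            rows.append({
                "group": group_number,
                "record_id": record["id"],
                "action": "keep as master" if record is master else "merge into master",
                "master_id": master["id"],
            })
    return rows


def rows_to_csv(rows):
    """Render the preview rows as CSV text for sharing or archiving."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["group", "record_id", "action", "master_id"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


groups = [
    [{"id": 101, "created": "2019-04-02"}, {"id": 102, "created": "2020-01-15"}],
    [{"id": 201, "created": "2021-03-01"}, {"id": 202, "created": "2018-11-30"}],
]

report = rows_to_csv(preview_merges(groups, earliest_created))
print(report)
```

Because the selection rule is passed in as a function, the same preview works unchanged when you swap in a multi-step rule chain.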

Insycle — Flexible Master Record Selection Rules & CRM Deduplication

Insycle is the ultimate deduplication software, making it easy to identify duplicate records and merge them in bulk using smart master record selection rules.

Once you determine the right way to choose a master record in your data set, Insycle makes it easy to create a straightforward, multi-step process for master record selection.

Here’s an example:

Here, just as in our earlier example, we are checking a list of duplicates to see if a Salesforce Account ID exists. If a record does have one, all other duplicate records will be merged into it as the master.

If none of the records have a Salesforce Account ID, or multiple do, then we move down to the next step — the record with the highest number of associated deals. Then company owner, and finally, creation date. This ensures that you go through the list of rules that make the most sense based on your CRM data, while using the final “creation date” rule as a catch-all for records that don’t meet the other criteria.

Then, Insycle allows you to both preview the changes to your data before they go live (and see how your master selection rules end up playing out within your database), and generate a .CSV report of the changes.

Insycle makes it easy for companies to create multi-step, rule-based master selection processes. This is helpful because:

  • Rule-based master selection helps companies to reduce data overwriting in the duplicate merging process.
  • Reliably merge duplicate contacts, companies, and deals in bulk.
  • Companies can create detailed rules that reflect how their companies collect and use their customer data.
  • Preview changes to your data before they go live to ensure you avoid mistakes.
  • After running a dedupe process, you’ll receive an email with a full report detailing the changes.
  • Any deduplication process template can be automated and scheduled to run at set intervals, putting your deduplication process on autopilot and freeing your team to focus on other areas.

Are you tired of picking the master record by hand time after time, and manually analyzing fields of each duplicate record in order to pick the right master?

Sign up for Insycle’s 7-day trial and build process-based deduplication into your customer data management strategy.