How to match customer accounts from different databases (different IDs)?

I have Database 1 and Database 2. Database 1 is my native data on clients, and Database 2 is an external database being ported over from a different company. I'm trying to find which customers in our database (Database 1) also exist in their database (Database 2).

Database 1 has: customer_id, customer_name, customer_email, customer_phone, customer_location, etc.

Database 2 has: customer_key, customer_name, customer_email, customer_phone, customer_location, etc.

The question is: how can I link Database 1 to Database 2 so that each person's information in Database 1 is joined with their corresponding information in Database 2?

The problem is that these databases have different primary keys (because they come from different sources). So you can't simply join them on "customer_id", because "customer_id" does not exist in Database 2 and "customer_key" does not exist in Database 1.

Initially I thought of using a field that should be unique (e.g. email) and matching on that...
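A minimal sketch of that first idea in Python, assuming each database has been exported to a list of dicts keyed by the column names above. The `normalize_email` and `match_on_email` helpers and the sample rows are made up for illustration:

```python
def normalize_email(email):
    """Trim and lowercase so formatting differences don't block a match."""
    return (email or "").strip().lower()


def match_on_email(db1_rows, db2_rows):
    """Return (customer_id, customer_key) pairs whose normalized emails are equal."""
    keys_by_email = {}
    for row in db2_rows:
        email = normalize_email(row.get("customer_email"))
        if email:  # skip blank emails so they never match each other
            keys_by_email.setdefault(email, []).append(row["customer_key"])

    pairs = []
    for row in db1_rows:
        email = normalize_email(row.get("customer_email"))
        for key in keys_by_email.get(email, []):
            pairs.append((row["customer_id"], key))
    return pairs


# Made-up sample rows, for illustration only.
db1 = [
    {"customer_id": 1, "customer_email": "Ann.Lee@example.com "},
    {"customer_id": 2, "customer_email": ""},
]
db2 = [
    {"customer_key": "K9", "customer_email": "ann.lee@example.com"},
    {"customer_key": "K7", "customer_email": ""},
]

print(match_on_email(db1, db2))
```

Note that if the same email appears on several rows in Database 2, this returns one pair per row, so a single customer_id can end up linked to more than one customer_key.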

But what if the email provided in one of the databases is "shared"? For example, the email in Database 1 might be an address the business shares across a bunch of different departments, in which case it isn't really helpful for linking back to a single person. Or the shared email might be in Database 2, so that records grouped by email show multiple names and phone numbers associated with the same address. This gets even more complicated when the other fields in the database (phone, name, etc.) also have this "shared" problem for some of their entries.
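One way to see how widespread the "shared" problem is would be to group the records by a field and list the values used by more than one distinct name. A rough sketch, again assuming rows are dicts with the column names above (the `find_shared_values` helper and the sample data are hypothetical):

```python
from collections import defaultdict


def find_shared_values(rows, field, name_field="customer_name"):
    """Map each value of `field` to the distinct names using it,
    keeping only values that are used by more than one name."""
    names_by_value = defaultdict(set)
    for row in rows:
        value = (row.get(field) or "").strip().lower()
        name = (row.get(name_field) or "").strip().lower()
        if value and name:
            names_by_value[value].add(name)
    return {v: names for v, names in names_by_value.items() if len(names) > 1}


# Made-up sample rows, for illustration only.
rows = [
    {"customer_name": "Ann Lee", "customer_email": "info@acme.example"},
    {"customer_name": "Bob Ray", "customer_email": "INFO@acme.example"},
    {"customer_name": "Cy Dunn", "customer_email": "cy@acme.example"},
]

shared = find_shared_values(rows, "customer_email")
print(shared)
```

Values flagged this way could then be treated as weak evidence (or ignored) when matching, rather than as a unique identifier. The same check works for `customer_phone` by passing a different `field`.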

People can also change companies (which changes their email and possibly their phone number), or change their first or last name (more often the last name).

The only solution I can think of at the moment is to use some sort of combination of verifying the email matches, then verifying that phone, name and location do as well. The more items that they match on between databases, the more certain one can be that "customer a" in Database 1 is "customer 123" in Database 2.
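The "more fields match, more certainty" idea can be sketched as a weighted score. This is only an illustration under assumptions: the weights, the threshold of 5, the helper names, and the sample rows are all invented, and real data would need tuning and probably fuzzier comparisons than exact equality:

```python
import re

# Hypothetical integer weights: fields that are more identifying count more.
WEIGHTS = {
    "customer_email": 4,
    "customer_phone": 3,
    "customer_name": 2,
    "customer_location": 1,
}


def normalize(field, value):
    """Lowercase and collapse whitespace; keep only digits for phone numbers."""
    value = (value or "").strip().lower()
    if field == "customer_phone":
        return re.sub(r"\D", "", value)
    return " ".join(value.split())


def match_score(a, b, weights=WEIGHTS):
    """Sum the weights of the fields that are non-empty and equal after normalizing."""
    score = 0
    for field, weight in weights.items():
        left = normalize(field, a.get(field))
        right = normalize(field, b.get(field))
        if left and left == right:
            score += weight
    return score


def best_match(row, candidates, threshold=5):
    """Return (customer_key, score) for the highest-scoring candidate,
    or None if no candidate reaches the threshold."""
    scored = [(match_score(row, c), c) for c in candidates]
    if not scored:
        return None
    score, best = max(scored, key=lambda pair: pair[0])
    return (best["customer_key"], score) if score >= threshold else None


# Made-up sample rows, for illustration only.
customer_a = {"customer_id": 1, "customer_name": "Ann Lee",
              "customer_email": "ann@acme.example",
              "customer_phone": "(555) 010-2000", "customer_location": "Austin"}
db2 = [
    {"customer_key": "K9", "customer_name": "Ann Smith",
     "customer_email": "ANN@acme.example",
     "customer_phone": "555-010-2000", "customer_location": "Dallas"},
    {"customer_key": "K7", "customer_name": "Ann Lee",
     "customer_email": "other@corp.example",
     "customer_phone": "", "customer_location": "Austin"},
]

print(best_match(customer_a, db2))
```

Here the first candidate matches on email and phone (score 7) even though the last name and location differ, while the second matches only on name and location (score 3) and falls below the threshold. Comparing every row against every candidate is quadratic, so for millions of accounts the candidates would need to be narrowed first, for example to rows sharing an email or phone number.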

I imagine this is a real-life problem at companies that have millions of customer accounts and decide to integrate a third-party data source (which has data they can link back to some of their clients). How do you guys deal with this?

Content Attribution

This content was originally published by Jack Muffintop at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
