MySQL merging two data sets with ~12 million entries each

I'm merging two tables with around 12 million rows each. MySQL's own join/insert took far too long (no sign of progress after 5 hours), so I'm using my own custom Python function instead.

The function pulls both tables in batches of 100 reference numbers at a time:

df_1 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)

df_2 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)

It then merges the two DataFrames and uploads the result to a new merged table.
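To make the question concrete, here is a minimal sketch of that batched read-merge-upload loop. It uses an in-memory SQLite database as a stand-in for MySQL (the table and column names mirror the question, but the data and the `merge_batch` helper are made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection, with two tiny example tables.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"UNIQUE_REFERENCE_NUMBER": ["100060764680", "100060764681"],
              "a": [1, 2]}).to_sql("TABLE_1", conn, index=False)
pd.DataFrame({"UNIQUE_REFERENCE_NUMBER": ["100060764680", "100060764681"],
              "b": [3, 4]}).to_sql("TABLE_2", conn, index=False)

def merge_batch(refs, conn):
    # Parameterised IN (...) clause for one batch of reference numbers.
    placeholders = ",".join("?" * len(refs))
    q1 = f"SELECT * FROM TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN ({placeholders})"
    q2 = f"SELECT * FROM TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN ({placeholders})"
    df_1 = pd.read_sql(q1, conn, params=refs)
    df_2 = pd.read_sql(q2, conn, params=refs)
    # Merge the two batches on the shared key.
    return df_1.merge(df_2, on="UNIQUE_REFERENCE_NUMBER", how="inner")

merged = merge_batch(["100060764680", "100060764681"], conn)
merged.to_sql("MERGED_TABLE", conn, index=False)
```

With 12 million rows and 100 keys per batch, this loop runs roughly 120,000 times, which is where the runtime blows up.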

The problem is that each SELECT query takes around 1 minute to run. At 100 keys per batch, 12 million rows means roughly 120,000 batches, i.e. a total run time of about 2,000 hours, which is way too long!

An example of the UNIQUE_REFERENCE_NUMBER is 100060764680 in varchar(20) format.

Does anyone have any suggestions for either:

  1. Increasing the speed of the current process
  2. A new process that would let me quickly merge these two huge tables together?


Content Attribution

This content was originally published by Alex Law at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
