I'm merging two tables with around 12 million rows each. The native MySQL merge took far too long (no sign of progress after 5 hours), so I'm using my own custom Python function instead.
This function runs queries like:
df_1 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)
df_2 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)
It then merges the two DataFrames and uploads the output to a new merged table.
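Roughly, the loop looks like this (simplified; `conn` is assumed to be a SQLAlchemy engine/connection, `MERGED_TABLE` is a placeholder destination name, and I'm merging on `UNIQUE_REFERENCE_NUMBER` as above):

```python
import pandas as pd

BATCH_SIZE = 100  # reference numbers per query, as in the queries above

def merge_in_batches(conn, ref_numbers):
    """Sketch of the current batched merge; names and batch size as described above."""
    for start in range(0, len(ref_numbers), BATCH_SIZE):
        batch = ref_numbers[start:start + BATCH_SIZE]
        # Values are interpolated directly, as in the example queries;
        # since the column is varchar(20), they may need quoting in practice.
        in_list = ",".join(str(r) for r in batch)
        df_1 = pd.read_sql(
            f"SELECT * FROM SCHEMA.TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN ({in_list})",
            conn,
        )
        df_2 = pd.read_sql(
            f"SELECT * FROM SCHEMA.TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN ({in_list})",
            conn,
        )
        merged = df_1.merge(df_2, on="UNIQUE_REFERENCE_NUMBER")
        # Append each batch's result to the output table.
        merged.to_sql("MERGED_TABLE", conn, schema="SCHEMA",
                      if_exists="append", index=False)
```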
The problem is that each SELECT query takes around 1 minute to run. With 12 million rows fetched 100 at a time, that's roughly 120,000 batches, for a total run time of about 2,000 hours, which is way too long!
An example of the UNIQUE_REFERENCE_NUMBER is 100060764680, stored in varchar(20) format.
Does anyone have any suggestions for either:
- Increasing the speed of the current process
- A new process that would let me merge these two huge tables quickly?