MySQL merging two data sets with ~12 million entries each

I'm merging two tables with around 12 million rows each. MySQL's own join/insert took far too long (no sign of progress after 5 hours), so I'm using my own custom Python function instead.

The function pulls both tables in batches of 100 reference numbers at a time:

df_1 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)

df_2 = pd.read_sql("SELECT * FROM SCHEMA.TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN (1,2,3,..,100)", conn)

It then merges the two DataFrames and uploads the result to a new merged table.
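To make the question concrete, here is a minimal sketch of that batched read-merge-upload loop. It uses an in-memory SQLite database as a stand-in for MySQL (the table and column names mirror the question, but the data and the `merge_batch` helper are made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection, with two tiny example tables.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"UNIQUE_REFERENCE_NUMBER": ["100060764680", "100060764681"],
              "a": [1, 2]}).to_sql("TABLE_1", conn, index=False)
pd.DataFrame({"UNIQUE_REFERENCE_NUMBER": ["100060764680", "100060764681"],
              "b": [3, 4]}).to_sql("TABLE_2", conn, index=False)

def merge_batch(refs, conn):
    # Parameterised IN (...) clause for one batch of reference numbers.
    placeholders = ",".join("?" * len(refs))
    q1 = f"SELECT * FROM TABLE_1 WHERE UNIQUE_REFERENCE_NUMBER IN ({placeholders})"
    q2 = f"SELECT * FROM TABLE_2 WHERE UNIQUE_REFERENCE_NUMBER IN ({placeholders})"
    df_1 = pd.read_sql(q1, conn, params=refs)
    df_2 = pd.read_sql(q2, conn, params=refs)
    # Merge the two batches on the shared key.
    return df_1.merge(df_2, on="UNIQUE_REFERENCE_NUMBER", how="inner")

merged = merge_batch(["100060764680", "100060764681"], conn)
merged.to_sql("MERGED_TABLE", conn, index=False)
```

With 12 million rows and 100 keys per batch, this loop runs roughly 120,000 times, which is where the runtime blows up.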

The problem is that each SELECT query takes around 1 minute to run. At 100 keys per batch, 12 million rows means roughly 120,000 batches, i.e. a total run time of about 2,000 hours, which is way too long!

An example of the UNIQUE_REFERENCE_NUMBER is 100060764680 in varchar(20) format.

Does anyone have any suggestions for either:

  1. Increasing the speed of the current process
  2. A new process that would let me quickly merge these two huge tables together?


Content Attribution

This content was originally published by Alex Law at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
