Looping over a list of IDs, passing them into SQL (Teradata), and writing Excel files using pandas: memory issue

I'm trying to accomplish a task I've done before, but I'm running into a memory issue.

Here are the steps I'm performing:

  1. Get a distinct list of PIDs (IDs associated with Prov): `select distinct PID from TABLE_A`. Total distinct count: 1,100.

  2. Using `pd.read_sql`, read these distinct values into a DataFrame (see the sketch just after this list).

  3. Loop through the list of distinct PIDs, pass each one into a SQL query, and write the matching rows to an Excel file, so every ID gets its own Excel file with its data:

     for i in PID:
         sql_var = f"select * from table_B where PID = '{i}'"
         df1 = pd.read_sql(sql_var, connection)
         # '/' is not allowed in file names, so swap it for a space
         file_name = i.replace('/', ' ') + '.xlsx'
         with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
             df1.to_excel(writer, index=False)
    
  4. The number of unique values is 1,100, and TABLE_B has 35 million rows.
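
For reference, here's roughly what steps 1 and 2 look like (a minimal sketch; `connection` is my open Teradata connection):

    import pandas as pd

    # Step 1: distinct PIDs from TABLE_A (about 1,100 of them)
    pid_df = pd.read_sql("select distinct PID from TABLE_A", connection)

    # Step 2: turn the single column into a plain Python list to loop over
    PID = pid_df['PID'].tolist()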

So when the 1,100 unique values hit the database again and again, I run into a memory issue.

How can I solve this memory issue? I've read a couple of things about `chunksize` in `pd.read_sql`, but I'm not sure how I would append the chunks. In case it isn't clear, I'm fairly new to Python. Thanks in advance for any help.
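
For example, is something like this the right way to use `chunksize`, so only one chunk is in memory at a time? (A sketch of what I have in mind; the chunk size of 50,000 is just a guess.)

    import pandas as pd

    for i in PID:
        sql_var = f"select * from table_B where PID = '{i}'"
        file_name = i.replace('/', ' ') + '.xlsx'
        with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
            next_row = 0
            # with chunksize, read_sql returns an iterator of DataFrames
            # instead of loading the whole result set at once
            for chunk in pd.read_sql(sql_var, connection, chunksize=50_000):
                # write the header only once, then append each chunk
                # below the rows written so far
                chunk.to_excel(writer, sheet_name='Sheet1', index=False,
                               header=(next_row == 0), startrow=next_row)
                next_row += len(chunk) + (1 if next_row == 0 else 0)

(I realize a single Excel sheet tops out at 1,048,576 rows, so if any one PID returns more rows than that, this would break regardless.)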



