Pandas Chunksize

I have the following code, which appends each chunk to a master Excel file:

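import pandas as pd
chunks = pd.read_sql(query, cnxn, chunksize=10000)  # query and cnxn are defined elsewhere
# to_excel() has no mode/sep/encoding arguments, so appending goes through ExcelWriter(mode='a'); the file and 'Sheet1' must already exist
with pd.ExcelWriter(r'C:\File\Path\file.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
    for chunk in chunks:
        chunk.to_excel(writer, sheet_name='Sheet1', startrow=writer.sheets['Sheet1'].max_row, header=False, index=False)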

The issue I'm running into is that I'm expecting a result of around 8 million rows, while Excel caps each worksheet at 1,048,576 rows. I am considering two options:

  1. For every 1 million rows, add a new sheet and continue the process. This is the preferred method.
  2. Create a new workbook for each set of 1 million rows.
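For a sense of scale, the number of sheets (option 1) or workbooks (option 2) needed can be worked out from the worksheet limit. A minimal sketch, assuming one header row per sheet and treating the 8 million figure as an estimate; `sheets_needed` is a hypothetical helper name:

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # hard row limit of a single .xlsx worksheet


def sheets_needed(total_rows: int, header_rows: int = 1) -> int:
    """Number of sheets (or workbooks) required when each holds at most
    EXCEL_MAX_ROWS rows, header_rows of which are taken by the header."""
    data_rows_per_sheet = EXCEL_MAX_ROWS - header_rows
    return math.ceil(total_rows / data_rows_per_sheet)


print(sheets_needed(8_000_000))  # → 8
```

Either option therefore ends up with about eight pieces; the choice is only whether they live in one file or in eight.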

Which approach would you recommend? I know I'll need to modify the mode='a' part of the code.
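A minimal sketch of option 1, assuming an Excel engine such as openpyxl is installed: a single `pd.ExcelWriter` stays open for the whole run, and each chunk is split wherever a sheet reaches Excel's 1,048,576-row limit (header row included). The helper names `split_into_sheets` and `write_chunks_to_sheets` and the `SheetN` naming are hypothetical, not part of pandas.

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # rows per .xlsx worksheet, header row included


def split_into_sheets(chunks, max_rows=EXCEL_MAX_ROWS):
    """Yield (sheet_index, rows_already_on_sheet, piece) so no sheet overflows.

    One row per sheet is reserved for the header, so each sheet receives at
    most max_rows - 1 data rows. Chunks are split where a sheet fills up.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for a header and data")
    capacity = max_rows - 1
    sheet, used = 0, 0
    for chunk in chunks:
        pos = 0
        while pos < len(chunk):
            if used == capacity:  # current sheet is full: start the next one
                sheet, used = sheet + 1, 0
            take = min(capacity - used, len(chunk) - pos)
            yield sheet, used, chunk.iloc[pos:pos + take]
            used += take
            pos += take


def write_chunks_to_sheets(chunks, path, max_rows=EXCEL_MAX_ROWS):
    """Write all chunks into one workbook, adding a sheet whenever one is full."""
    with pd.ExcelWriter(path) as writer:  # engine chosen by pandas, e.g. openpyxl
        for sheet, used, piece in split_into_sheets(chunks, max_rows):
            piece.to_excel(
                writer,
                sheet_name=f"Sheet{sheet + 1}",
                startrow=0 if used == 0 else used + 1,  # row 0 holds the header
                header=(used == 0),  # header only on a sheet's first piece
                index=False,
            )
```

With the question's names this would be called as `write_chunks_to_sheets(pd.read_sql(query, cnxn, chunksize=10000), r'C:\File\Path\file.xlsx')`. Option 2 is the same loop with a new `pd.ExcelWriter` (and file name) opened whenever the sheet index changes. Keeping one workbook open for 8 million rows can use a lot of memory, since the engine holds the sheet data until the file is saved, which may be a practical argument for option 2.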

Here is the error message I'm receiving:

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information. (8623) (SQLExecDirectW)')

Thank you in advance.

EDIT: Some further clarification:

The SQL query uses a list of 200,000 strings as its key and pulls every row that contains at least one of those strings. Many rows share the same string identifier, which is why I'm expecting about 8 million rows in the result. I believe the size of this key list is also what causes the ProgrammingError above.
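If the 200,000-value key list is what triggers error 8623, one common workaround is to send the keys in smaller batches, each as a parameterized `IN (?, ?, ...)` query, and stream each batch's result. A minimal sketch, assuming the keys are matched by equality on a single column (so a row can match only one batch and no duplicates appear) and that `cnxn` is a DB-API connection using `?` placeholders, as pyodbc does. The helper names `batched` and `read_by_key_batches` are hypothetical, and the batch size of 1,000 is an arbitrary choice that stays under SQL Server's 2,100-parameter limit per query. Uploading the keys to a temporary table and joining on it is another common option not shown here.

```python
import pandas as pd


def batched(items, size):
    """Yield consecutive slices of a list, each at most size items long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def read_by_key_batches(cnxn, keys, table, column, batch_size=1000, chunksize=None):
    """Yield DataFrames of rows whose column equals one of keys, one query per batch.

    table and column are pasted into the SQL text, so they must be trusted
    identifiers; the key values themselves are sent as ? parameters.
    """
    for batch in batched(list(keys), batch_size):
        placeholders = ", ".join("?" * len(batch))  # "?, ?, ..., ?"
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        result = pd.read_sql(sql, cnxn, params=batch, chunksize=chunksize)
        if chunksize is None:
            yield result  # a single DataFrame for this batch
        else:
            yield from result  # an iterator of DataFrames for this batch
```

Called as `read_by_key_batches(cnxn, key_list, "my_table", "key_col", chunksize=10000)` (table and column names hypothetical), it yields chunks that can be written to Excel exactly like the chunks from the original `pd.read_sql(..., chunksize=10000)` loop.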


Content Attribution

This content was originally published by pilotmike327 at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
