Nested loop problem in Python while working with pandas

I am trying to create a nested loop that loads multiple files from an S3 bucket and concatenates them into a single dataframe, but I am having trouble arranging the nested loops to do this. Here is my code:

import json
import pandas as pd
import boto3
import io

client = boto3.client('s3')
var = "filename"
filenumber = ["/0", "/1", "/2", "/3"]

for j in range(len(filenumber)):
    response = client.list_objects(Bucket="bucketname", Prefix="subfolder/%s" % (var + filenumber[j]))

    df_list = []
    json_buffer = io.StringIO()

    for file in response["Contents"]:
        obj = client.get_object(Bucket="bucketname", Key=file["Key"])
        obj_df = pd.read_json(obj["Body"])
        df_list.append(obj_df)
    df = pd.concat(df_list)
df.to_json(json_buffer)
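The last two steps of that code (pd.concat followed by to_json) can be reproduced without S3. The sketch below uses two small made-up frames in place of the JSON files; each frame read on its own gets a default index starting at 0, so pd.concat keeps the repeated labels, and to_json with its default orient="columns" refuses a non-unique index. Passing ignore_index=True rebuilds a fresh 0..n-1 index.

```python
import pandas as pd

# Two frames that each carry the default index 0, 1, standing in
# for two JSON files read independently with pd.read_json.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# By default concat keeps the original index labels, so they repeat.
stacked = pd.concat([a, b])
print(stacked.index.tolist())  # [0, 1, 0, 1]

try:
    stacked.to_json()  # default orient for a DataFrame is "columns"
except ValueError as exc:
    print("to_json failed:", exc)  # complains that the index is not unique

# ignore_index=True discards the old labels and numbers rows 0..n-1.
fixed = pd.concat([a, b], ignore_index=True)
print(fixed.index.tolist())  # [0, 1, 2, 3]
print(fixed.to_json())
```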

When I keep df = pd.concat(df_list) inside the outer loop, I get the error: DataFrame index must be unique for orient='columns'. If I move that line outside the outer loop, only the files from the last iteration, i.e., "/3", end up in the dataframe.
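One possible restructuring, offered as a sketch under assumptions rather than a definitive fix: create the list once before the outer loop so it accumulates frames from every prefix, and concatenate a single time after both loops with ignore_index=True. The function name load_prefixes and its parameters are hypothetical. client is assumed to be a boto3 S3 client (it is passed in, so the function itself does not import boto3), and the sketch swaps list_objects for the list_objects_v2 paginator, which follows continuation tokens past the 1,000-keys-per-call limit.

```python
import pandas as pd


def load_prefixes(client, bucket, base, suffixes):
    """Hypothetical helper: read every JSON object under each prefix into one DataFrame.

    Assumes `client` is a boto3 S3 client, or any object exposing the same
    get_paginator(...) and get_object(...) methods.
    """
    frames = []  # created once, before the outer loop, so it collects every prefix
    paginator = client.get_paginator("list_objects_v2")
    for suffix in suffixes:
        prefix = f"subfolder/{base}{suffix}"  # e.g. "subfolder/filename/0"
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # "Contents" is missing from a page when nothing matches the prefix
            for item in page.get("Contents", []):
                obj = client.get_object(Bucket=bucket, Key=item["Key"])
                frames.append(pd.read_json(obj["Body"]))
    # Concatenate once, after both loops, and renumber rows 0..n-1
    # so that df.to_json() sees a unique index.
    return pd.concat(frames, ignore_index=True)
```

With a real client this would be called roughly as df = load_prefixes(boto3.client("s3"), "bucketname", "filename", ["/0", "/1", "/2", "/3"]), after which df.to_json(json_buffer) should no longer hit the unique-index error. If none of the prefixes match any objects, the list stays empty and pd.concat raises a ValueError, so callers may want to check for that case.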

Any help or suggestions are much appreciated. Sorry if my question needs editing; I'm fairly new to Stack Overflow.



Read more here: https://stackoverflow.com/questions/66997596/nested-loop-problem-in-python-while-working-with-pandas

Content Attribution

This content was originally published by Glad at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
