I am trying to create a nested loop to load multiple files from an S3 bucket and concatenate them into a single dataframe. I am having trouble arranging the nested loops to do this. Here is my code:

    import json
    import pandas as pd
    import boto3
    import io

    client = boto3.client('s3')
    var = "filename"
    filenumber = ["/0", "/1", "/2", "/3"]

    for j in range(len(filenumber)):
        response = client.list_objects(Bucket="bucketname", Prefix="subfolder/%s" % (var + filenumber[j]))
        df_list = []
        json_buffer = io.StringIO()
        for file in response["Contents"]:
            obj = client.get_object(Bucket="bucketname", Key=file["Key"])
            obj_df = pd.read_json(obj["Body"])
            df_list.append(obj_df)
        df = pd.concat(df_list)
        df.to_json(json_buffer)

If I keep the line df = pd.concat(df_list) inside the outer loop, I get the error:
DataFrame index must be unique for orient='columns'
If I keep the line outside the outer loop, I only get the file from the last iteration of the list, i.e. "/3", loaded into the dataframe.
Any help or suggestions are much appreciated. Sorry if my question needs editing, I'm kinda new to Stack Overflow.
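For what it's worth, the error seems reproducible without S3 at all. The sketch below is only an illustration with made-up data: the frames a and b are hypothetical stand-ins for the per-file dataframes. It suggests that pd.concat itself succeeds and that the exception is raised by to_json, because the concatenated frame repeats the index labels 0 and 1. Passing ignore_index=True to pd.concat is one way to renumber the rows, assuming the original row labels are not needed.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two DataFrames read from separate files.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# Plain concat keeps each frame's own index, so the labels repeat: 0, 1, 0, 1.
combined = pd.concat([a, b])

# to_json with the default orient needs unique index labels.
try:
    combined.to_json(io.StringIO())
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# ignore_index=True renumbers the rows 0..3, so to_json can serialize them.
renumbered = pd.concat([a, b], ignore_index=True)
buf = io.StringIO()
renumbered.to_json(buf)
```

Separately, if df_list is re-created on every pass of the outer loop, it only ever holds the frames for the current prefix, which might explain why only "/3" survives when the concat runs after the loop.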