I want to automate data fetching from USDA website, where I am specifically interested in few categories for data selection. To do so, I tried the following:
import io
import requests
import pandas as pd
url = 'https://www.marketnews.usda.gov/mnp/ls-report-retail?&repType=summary&portal=ls&category=Retail&species=BEEF&startIndex=1'
query_list = {"Report Type":"item","species":"BEEF","portal":"ls","category":"Retail", "Regions":"National", "Grades":"ALL", "Cut": "All", "Dates_from":"2019-03-01", "Dates_to":"2021-02-01"}
req = requests.get(url, params=query_list)
df = pd.read_csv(io.StringIO(req.text), sep="\s\s+", engine="python")
df.to_csv("usda_report.csv")
but I couldn't get the expected dataframe that I want, here is the output that after I tried to run above attempt:
ParserError: Expected 1 fields in line 117, saw 2. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
desired output
I need to pass these queries to do correct data selection: Category = "Retail"; Report Type = "Item"; Species = "Beef"; Region(s) = "National"; Dates_from = "2019-03-01"; Dates_to = "2021-02-15"
.
ideally, I want to pass those queries and want to get the following dataframe (head of dataframe):
update
in my desired outputs, I need those columns: Date
, Region
, Grade
, Cut
, Retail Items
, Outlets
or number of stores
, Weighted Avg
from the above attempt, I couldn't get the output dataframe like this. How should I fetch data correctly? Can anyone suggest possible of doing this right in pandas? any idea?
Read more here: https://stackoverflow.com/questions/66326122/how-to-correctly-fetch-dataframe-from-url-in-python
Content Attribution
This content was originally published by Hamilton at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.