Python Scrapy parsing for dates

I'm having trouble parsing XML sitemap data from websites to identify each page's most recent update date. This is the code I'm using:

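import os

import scrapy
from scrapy.item import Item

def fetch_dates(self, response):
    # XmlXPathSelector/.select() are the legacy API; newer Scrapy releases
    # use scrapy.selector.Selector(response=response, type='xml') and .xpath()
    sitemap = scrapy.selector.XmlXPathSelector(response)
    # 'ns' is just a prefix and the second param should be whatever the
    # xmlns of your sitemap is
    sitemap.register_namespace('ns', '')
    # this gets you a list of all the "loc" and "last modified" fields.
    locsList = sitemap.select('//ns:loc/text()').extract()
    lastModifiedList = sitemap.select('//ns:lastmod/text()').extract()

    # zip() the 2 lists together
    pageList = list(zip(locsList, lastModifiedList))

    for page in pageList:
        # append if the file already exists, otherwise create it
        if os.path.exists('1url-to-date.csv'):
            append_write = 'a'
        else:
            append_write = 'w'
        with open('1url-to-date.csv', append_write) as url_f:
            # write one (loc, lastmod) pair per line, not the whole lists
            url_f.write(page[0] + "&,&" + page[1] + "\n")
    return Item()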

But it's not returning any values for dates, and it's not even writing my file. I don't see any errors when it runs, so there's clearly something wrong with the code. Any suggestions on how to fix it?

What I'm ultimately looking for is a list of the HTML pages the web crawler finds, each with its last-updated date and the number of days since that update. If there isn't a date available, I'll use today's date.
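For illustration, here is a minimal sketch of that end result, assuming the sitemap uses the standard sitemaps.org namespace and gives `lastmod` as a full `YYYY-MM-DD` date (optionally followed by a time). It uses the standard library's `xml.etree.ElementTree` instead of a Scrapy selector so it can be run on its own. The names `days_since_update` and `SITEMAP_NS` are hypothetical, not part of Scrapy or of the code above. Reading `loc` and `lastmod` from each `<url>` element keeps them paired even when a page has no `lastmod`, which zipping two separate lists would not.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Assumption: the standard sitemap namespace; swap in your sitemap's xmlns if it differs.
SITEMAP_NS = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def days_since_update(sitemap_xml, today=None):
    """Return (url, last_modified, days_since) tuples, one per <url> entry."""
    today = today or date.today()
    rows = []
    for url in ET.fromstring(sitemap_xml).findall("ns:url", SITEMAP_NS):
        loc = url.findtext("ns:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("ns:lastmod", default="", namespaces=SITEMAP_NS).strip()
        # Keep only the leading YYYY-MM-DD part; fall back to today's date
        # when lastmod is missing or is not a full date.
        try:
            modified = datetime.strptime(lastmod[:10], "%Y-%m-%d").date()
        except ValueError:
            modified = today
        rows.append((loc, modified, (today - modified).days))
    return rows
```

Inside a spider callback this could be called as `days_since_update(response.body)`, and the resulting rows written out with the `csv` module rather than by joining strings by hand.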


Content Attribution

This content was originally published by Meredith Abrams at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
