BS4 "MemoryError: stack overflow" and "EOFError: Ran out of input" when using multiprocessing in Python

I have a simple Python script that uses the BS4 (BeautifulSoup) library and multiprocessing to do some web scraping. Initially the script would not complete because it exceeded the recursion limit. I then found out here that BeautifulSoup trees cannot be pickled, which causes issues with multiprocessing, so I followed one recommendation in the top answer and added the following: sys.setrecursionlimit(25000)
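
For context, here is a minimal sketch of the kind of failure I believe I was hitting. It uses only the standard library, with nested lists standing in for a deeply nested BeautifulSoup tree (that substitution is my assumption, and `nest` and `try_pickle` are made-up helper names). pickle walks nested objects recursively, so a deep enough structure trips the interpreter's recursion guard:

```python
import pickle


def nest(depth):
    """Build a list nested `depth` levels deep, e.g. [[[...]]]."""
    node = []
    for _ in range(depth):
        node = [node]
    return node


def try_pickle(obj):
    """Return True if obj pickles, False if pickling recurses too deeply."""
    try:
        pickle.dumps(obj)
        return True
    except RecursionError:
        return False


# A shallow structure pickles fine; a very deep one hits the recursion guard.
shallow_ok = try_pickle(nest(50))
deep_ok = try_pickle(nest(100_000))
print(shallow_ok, deep_ok)
```

As far as I understand it, raising the limit with sys.setrecursionlimit does not give pickle more stack to work with; it only moves the point at which Python stops the recursion, so a very high limit can let the process run out of real stack instead.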

This worked fine for a couple of weeks with no issues (as far as I could tell), but today I restarted the script and some of the processes fail to start. I now get this error:

Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/foo/single_items/single_item.py", line 243, in <module>
    Process(target=instance.constant_thread).start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\element.py", line 1449, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__, tag))
MemoryError: stack overflow
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

I am not sure what the error means, but here is a pseudocode example of the script I have running:

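import sys
from multiprocessing import Process

from bs4 import BeautifulSoup


class foo:
    def __init__(self, url):
        self.url = url

    def constant_scrape(self):
        while True:
            # make_request is a placeholder for the real HTTP call (not shown here);
            # it returns the page's HTML.
            rq = make_request(self.url)
            soup = BeautifulSoup(rq)


if __name__ == '__main__':

    # Raised to work around the recursion-limit errors described above.
    sys.setrecursionlimit(25000)

    url_list = [...]

    # One long-running scraper process per URL.
    for url in url_list:
        instance = foo(url)
        Process(target=instance.constant_scrape).start()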
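
One workaround I am considering, shown here as a rough sketch and not something I have confirmed fixes the problem: the traceback passes through bs4/element.py while the Process object is being pickled, which suggests that in my real script the instance holds a BeautifulSoup object as an attribute (the pseudocode above does not show this, so treat it as an assumption). The `Scraper` class and its `soup` attribute below are hypothetical names. The idea is to define `__getstate__` so that pickle serializes only the plain URL:

```python
class Scraper:
    """Hypothetical variant of the class above that caches its last parse."""

    def __init__(self, url):
        self.url = url
        self.soup = None  # assumed: the real script stores a BeautifulSoup tree here

    def __getstate__(self):
        # pickle calls this to obtain the state to serialize. Copy the
        # instance dict and blank out the parsed tree so that only plain
        # data (the URL string) ends up in the pickle.
        state = self.__dict__.copy()
        state["soup"] = None
        return state


scraper = Scraper("https://example.com")
scraper.soup = ["stand-in for a deeply nested parse tree"]

# The state handed to pickle no longer contains the tree,
# while the live object in this process is left untouched.
state = scraper.__getstate__()
print(state)
```

With something like this in place, starting a Process whose target is a bound method of the instance should only need to pickle the URL, and each child process would build its own soup after it starts.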


Read more here: https://stackoverflow.com/questions/67384123/bs4-memoryerror-stack-overflow-and-eoferror-ran-out-of-input-when-using-multip

Content Attribution

This content was originally published by user13834264 at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
