At Scrapinghub we maintain and contribute to a wide variety of open source
projects. See below for a list of projects we can mentor this year.
Scrapy: a very popular Python framework for web crawling and scraping, used
to write spiders that crawl websites and extract data from them.
Splash: a headless-browser framework for web crawling and scraping, designed
as a companion to Scrapy crawlers, though it can also be used as a
stand-alone tool.
ELI5: a Python package that helps debug machine learning classifiers and
explain their predictions. It supports scikit-learn, XGBoost, LightGBM,
lightning, and sklearn-crfsuite out of the box, and offers a black-box mode
for explaining classifiers outside this set.
dateparser: a Python library that parses localized dates in almost any
string format commonly found on web pages.
Spidermon: a Python quality-assurance framework for Scrapy spiders that lets
spider developers define and enforce rules for data schema and field
coverage, and that can be extended to broader crawl-verification and
data-validation needs.