Onion Peeler
A configuration-driven, concurrent web scraping system.
Documentation Structure
Our documentation uses the Diátaxis framework to organize information effectively. Choose the category that best matches your current needs:
- Tutorials
  Learning-oriented step-by-step guides. Best for your first time running the scraper.
- How-To Guides
  Problem-oriented instructions. Practical steps to achieve a specific goal, like building a new configuration.
- Explanation
  Understanding-oriented theory. High-level discussions of ethics, system architecture, and design decisions.
- Reference
  Information-oriented technical specs, pipeline layouts, and auto-generated code documentation.
Core Features
- No-Code Configuration
  Build robust Scrapy spiders entirely from dynamic .toml files. No need to write a new Python class for every website added to the target list.
- Page Object Models
  Deep integration with scrapy-poet allows you to decouple extraction logic from raw spider mechanics, making updates trivial when a site's layout changes.
- Anti-Bot Detection
  Engineered to avoid account bans and adapt to request limits while still providing comprehensive data extraction.
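
To give a feel for the configuration-driven approach, here is a minimal sketch of how a .toml file could describe a spider and be loaded into a typed settings object. The key names (spider, fields, throttle), the SpiderConfig class, and the load_spider_config helper are all hypothetical and chosen for illustration; they are not Onion Peeler's actual schema or API, so consult the Reference section for the real format.

```python
from dataclasses import dataclass, field

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib

# Hypothetical configuration: every key below is illustrative only,
# not the schema Onion Peeler actually uses.
EXAMPLE_CONFIG = """
[spider]
name = "example_shop"
start_urls = ["https://example.com/products"]

[spider.fields]
title = "h1.product-title::text"
price = "span.price::text"

[spider.throttle]
download_delay = 2.0
concurrent_requests = 4
"""


@dataclass
class SpiderConfig:
    """Typed view of one spider definition (hypothetical structure)."""
    name: str
    start_urls: list[str]
    fields: dict[str, str] = field(default_factory=dict)
    download_delay: float = 0.0
    concurrent_requests: int = 8


def load_spider_config(toml_text: str) -> SpiderConfig:
    """Parse TOML text and map the [spider] table onto SpiderConfig."""
    data = tomllib.loads(toml_text)["spider"]
    throttle = data.get("throttle", {})
    return SpiderConfig(
        name=data["name"],
        start_urls=list(data["start_urls"]),
        fields=dict(data.get("fields", {})),
        download_delay=float(throttle.get("download_delay", 0.0)),
        concurrent_requests=int(throttle.get("concurrent_requests", 8)),
    )


config = load_spider_config(EXAMPLE_CONFIG)
print(config.name, config.start_urls, sorted(config.fields))
```

Under this sketch, adding a new target site means adding another .toml file rather than another Python class, which is the idea behind the No-Code Configuration feature.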