Base Configuration Reference
The config/base.toml file defines the global defaults for the scraper. These values are the fallback for all site-specific configurations and set the core operating parameters for the Scrapy engine and the proxy middleware.
Configuration Merging
The application resolves each setting from three sources, listed here from highest to lowest priority:
1. Site Config (config/sites/*.toml): Highest priority.
2. Environment Variables: Overrides prefixed with ONION_PEELER__ (e.g., ONION_PEELER__PROXY__TOR_PORT).
3. Base Config (config/base.toml): The global defaults.
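The precedence above can be sketched as a layered dictionary merge. This is only an illustration under stated assumptions, not the application's actual loader: the function names (`env_overrides`, `deep_merge`, `resolve`) are hypothetical, `__` is assumed to separate nesting levels, keys are assumed to be lowercased, and environment values are left as strings rather than coerced to the type used in the TOML file.

```python
import copy

ENV_PREFIX = "ONION_PEELER__"  # prefix taken from the list above


def env_overrides(environ):
    """Turn ONION_PEELER__SECTION__KEY=value pairs into a nested dict."""
    result = {}
    for name, value in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        # Assumption: "__" separates nesting levels and keys are lowercased.
        path = name[len(ENV_PREFIX):].lower().split("__")
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value  # values stay strings in this sketch
    return result


def deep_merge(low, high):
    """Return a new dict in which values from `high` win over `low`."""
    merged = copy.deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(base, environ, site):
    """Apply the layers from lowest to highest priority: base, env, site."""
    return deep_merge(deep_merge(base, env_overrides(environ)), site)


base = {"proxy": {"tor_port": 9080, "vpn_port": 8888}}
environ = {"ONION_PEELER__PROXY__TOR_PORT": "9150", "HOME": "/root"}
site = {"proxy": {"vpn_port": 3128}}

settings = resolve(base, environ, site)
print(settings)
```

In this sketch the environment variable replaces the base value for `tor_port`, while the site file has the final say on `vpn_port`. Unrelated variables such as `HOME` are ignored because they lack the prefix.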
System Settings ([system])
General application-level parameters.
| Key | Type | Description |
|---|---|---|
| site | string | The default site ID to load if none is specified at runtime. |
Scrapy Engine ([scrapy])
These settings map directly to Scrapy's internal settings. They control the "politeness" and performance of the crawler.
| Key | Default | Description |
|---|---|---|
| USER_AGENT | Firefox 115.0 | The browser identifier sent with every request. |
| CONCURRENT_REQUESTS | 4 | Maximum number of concurrent requests. |
| DOWNLOAD_DELAY | 3 | Seconds to wait between requests to the same domain. |
| RANDOMIZE_DOWNLOAD_DELAY | true | Introduces jitter (0.5x to 1.5x) to the delay to avoid detection. |
| AUTOTHROTTLE_ENABLED | true | Automatically adjusts crawl speed based on server load. |
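To make the jitter range concrete, the snippet below works out the wait times implied by the default DOWNLOAD_DELAY of 3 seconds. It is a small sketch of the documented 0.5x to 1.5x behavior, not Scrapy's own code; `jittered_delay` is a hypothetical helper, and with AUTOTHROTTLE_ENABLED the effective delay may be adjusted further at runtime.

```python
import random

DOWNLOAD_DELAY = 3  # default from the table above


def jittered_delay(delay, rng=random):
    """Pick a wait time uniformly between 0.5x and 1.5x the base delay."""
    return rng.uniform(0.5 * delay, 1.5 * delay)


low, high = 0.5 * DOWNLOAD_DELAY, 1.5 * DOWNLOAD_DELAY
samples = [jittered_delay(DOWNLOAD_DELAY) for _ in range(5)]

print(f"each wait falls between {low} and {high} seconds")
print([round(s, 2) for s in samples])
```

With the defaults, each request to the same domain is therefore spaced between 1.5 and 4.5 seconds from the previous one.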
Proxy Configuration ([proxy])
Defines how traffic is routed. The application automatically detects .onion domains and routes them through Tor, while routing clearweb traffic through the VPN.
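The routing rule can be sketched as a check on the request's hostname. This is a minimal illustration, assuming both proxies are addressed as plain `http://host:port` URLs and using the default hosts and ports documented for this section; the `PROXY` dict layout and the `proxy_for` function are hypothetical, not the middleware's actual interface.

```python
from urllib.parse import urlsplit

# Defaults copied from the [proxy] settings; the dict layout is an assumption.
PROXY = {
    "vpn_host": "127.0.0.1",
    "vpn_port": 8888,
    "tor_host": "127.0.0.1",
    "tor_port": 9080,
}


def proxy_for(url, proxy=PROXY):
    """Return the HTTP proxy URL a request to `url` would be sent through."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        # Hidden services are only reachable through Tor.
        return f"http://{proxy['tor_host']}:{proxy['tor_port']}"
    # Everything else is treated as clearweb traffic and goes to the VPN.
    return f"http://{proxy['vpn_host']}:{proxy['vpn_port']}"


print(proxy_for("http://exampleonionaddress.onion/index.html"))
print(proxy_for("https://example.com/"))
```

Matching on the parsed hostname rather than the raw URL string avoids misrouting a clearweb URL that merely contains ".onion" in its path or query.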
| Key | Default | Description |
|---|---|---|
| vpn_host | 127.0.0.1 | The host address of the VPN HTTP proxy (e.g., Gluetun). |
| vpn_port | 8888 | The port for the VPN HTTP proxy. |
| tor_host | 127.0.0.1 | The host address for the Tor proxy. |
| tor_port | 9080 | The port for the Tor HTTP proxy. |
| tor_control_port | 9051 | The port used to communicate with the Tor Control protocol. |
| tor_control_password | ${VAR} | Password for IP rotation commands. Supports environment interpolation. |
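Environment interpolation of a value such as `${VAR}` can be pictured as a placeholder substitution applied after the file is loaded. The sketch below shows one plausible way to do it; the `interpolate` helper, the choice to raise an error on a missing variable, and the variable name `TOR_CONTROL_PASSWORD` are all assumptions for illustration, and the application's real behavior may differ.

```python
import os
import re

# Matches ${NAME} where NAME looks like a shell-style variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, environ=None):
    """Replace each ${NAME} in `value` with the matching environment value."""
    env = os.environ if environ is None else environ

    def lookup(match):
        name = match.group(1)
        if name not in env:
            # Assumption: a missing variable is an error, not an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(lookup, value)


# TOR_CONTROL_PASSWORD is a hypothetical variable name.
fake_env = {"TOR_CONTROL_PASSWORD": "s3cret"}
print(interpolate("${TOR_CONTROL_PASSWORD}", fake_env))
```

Keeping the password in the environment rather than in config/base.toml means the file can be committed without exposing the credential.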