Base Configuration Reference

The config/base.toml file defines the global defaults for the scraper. These values serve as the fallback for all site-specific configurations and define the core operating settings for the Scrapy engine and the proxy middleware.

Configuration Merging

The application follows a strict hierarchy for setting values, listed here from highest to lowest priority:

1. Site Config (config/sites/*.toml): Highest priority.
2. Environment Variables: Overrides prefixed with ONION_PEELER__ (e.g., ONION_PEELER__PROXY__TOR_PORT).
3. Base Config (config/base.toml): The global defaults.
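
The precedence order above can be sketched in Python as a layered merge. This is only an illustration, not the application's actual loader: the function names (deep_merge, env_overrides, load_config) are hypothetical, and the rules for lowercasing variable names and converting numeric strings to integers are assumptions made for the example.

```python
ENV_PREFIX = "ONION_PEELER__"


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied, recursing into nested tables."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def env_overrides(environ: dict) -> dict:
    """Turn ONION_PEELER__PROXY__TOR_PORT=9050 into {"proxy": {"tor_port": 9050}}."""
    result: dict = {}
    for name, raw in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        # Assumption: "__" separates nesting levels and keys are lowercased.
        # Uppercase keys such as those in [scrapy] would need different handling.
        path = name[len(ENV_PREFIX):].lower().split("__")
        # Assumption: purely numeric strings become integers.
        value = int(raw) if raw.isdigit() else raw
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result


def load_config(base: dict, site: dict, environ: dict) -> dict:
    """Merge already-parsed TOML tables, lowest priority first."""
    # Order of application: base defaults, then environment, then site config.
    return deep_merge(deep_merge(base, env_overrides(environ)), site)
```

Here base and site stand for the parsed contents of config/base.toml and a file under config/sites/. Because the site table is applied last, it wins over both the environment and the base defaults.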


System Settings ([system])

General application-level parameters.

| Key | Type | Description |
| --- | --- | --- |
| site | string | The default site ID to load if none is specified at runtime. |

Scrapy Engine ([scrapy])

These settings directly map to Scrapy's internal settings. They control the "politeness" and performance of the crawler.

| Key | Default | Description |
| --- | --- | --- |
| USER_AGENT | Firefox 115.0 | The browser identifier sent with every request. |
| CONCURRENT_REQUESTS | 4 | Maximum number of requests the downloader performs concurrently. |
| DOWNLOAD_DELAY | 3 | Seconds to wait between consecutive requests to the same domain. |
| RANDOMIZE_DOWNLOAD_DELAY | true | Introduces jitter (0.5x to 1.5x of DOWNLOAD_DELAY) to the delay to avoid detection. |
| AUTOTHROTTLE_ENABLED | true | Automatically adjusts crawl speed based on observed response latency and server load. |
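
With the defaults above (DOWNLOAD_DELAY = 3 and RANDOMIZE_DOWNLOAD_DELAY = true), the wait between requests to the same domain falls between 1.5 and 4.5 seconds. The following sketch only illustrates that arithmetic. It is not Scrapy's internal code, and effective_delay is a hypothetical helper name.

```python
import random


def effective_delay(download_delay: float, randomize: bool, rng: random.Random) -> float:
    """Return the wait before the next request to the same domain, in seconds."""
    if not randomize:
        return download_delay
    # Jitter: a uniform value between 0.5x and 1.5x of the configured delay.
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


rng = random.Random(0)  # seeded only to make the illustration repeatable
samples = [effective_delay(3, True, rng) for _ in range(1000)]
print(min(samples) >= 1.5, max(samples) <= 4.5)  # prints: True True
```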

Proxy Configuration ([proxy])

Defines how traffic is routed. The application automatically detects .onion domains and routes them through Tor, while routing clearweb traffic through the VPN.

| Key | Default | Description |
| --- | --- | --- |
| vpn_host | 127.0.0.1 | The host address of the VPN HTTP proxy (e.g., Gluetun). |
| vpn_port | 8888 | The port for the VPN HTTP proxy. |
| tor_host | 127.0.0.1 | The host address for the Tor proxy. |
| tor_port | 9080 | The port for the Tor HTTP proxy. |
| tor_control_port | 9051 | The port used to communicate with the Tor Control protocol. |
| tor_control_password | ${VAR} | Password for IP rotation commands. Supports environment interpolation. |
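
The routing rule described in this section can be sketched as a small helper that picks a proxy URL by hostname. This is a minimal illustration under stated assumptions, not the application's middleware: proxy_for is a hypothetical name, the PROXY dictionary simply mirrors the defaults in the table, and both proxies are assumed to be addressed with an http:// scheme.

```python
from urllib.parse import urlsplit

# Mirrors the [proxy] defaults; in practice these would come from the merged config.
PROXY = {
    "vpn_host": "127.0.0.1",
    "vpn_port": 8888,
    "tor_host": "127.0.0.1",
    "tor_port": 9080,
}


def proxy_for(url: str, proxy: dict = PROXY) -> str:
    """Return the proxy URL for a request: Tor for .onion hosts, VPN otherwise."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        return f"http://{proxy['tor_host']}:{proxy['tor_port']}"
    return f"http://{proxy['vpn_host']}:{proxy['vpn_port']}"
```

In a Scrapy downloader middleware, a URL like this is typically assigned to request.meta["proxy"] so that the built-in HTTP proxy handling routes the request through it.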

Example base.toml

[system]
site = ""

[proxy]
vpn_host = "127.0.0.1"
vpn_port = 8888
tor_host = "127.0.0.1"
tor_port = 9080
tor_control_port = 9051
tor_control_password = "${TOR_CONTROL_PASSWORD}"
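
The ${TOR_CONTROL_PASSWORD} placeholder in the example is resolved from the environment when the configuration is loaded. A rough sketch of how such interpolation might work is shown below. The interpolate function is hypothetical, and raising an error for an unset variable is an assumption; the application may instead fall back to an empty string or leave the placeholder untouched.

```python
import re

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, environ: dict) -> str:
    """Replace every ${NAME} in value with environ[NAME]."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in environ:
            # Assumption: a missing variable is treated as a configuration error.
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return _PLACEHOLDER.sub(replace, value)
```

A typical call passes os.environ as the second argument, for example interpolate(config["proxy"]["tor_control_password"], os.environ).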