Getting Started¶
This tutorial will guide you through setting up Onion Peeler and executing your first crawl.
By the end of this tutorial, you will have a working environment and your first set of extracted data from a dark web index.
Prerequisites¶
Before we begin, ensure you have the following installed on your machine:
- Python 3.12+: The core language used by the framework.
- uv: A fast Python package installer and resolver. Install uv here.
- Tor Browser / Tor Service: Required for accessing .onion domains.
- Make: (Optional but recommended) For running automation commands.
Step 1: Clone the Repository¶
First, download the source code to your local machine:
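The repository URL below is illustrative; substitute the actual remote for wherever Onion Peeler is hosted:

```shell
# Clone the project and move into its root directory
# (URL is a placeholder -- use the real remote)
git clone https://github.com/your-org/onion-peeler.git
cd onion-peeler
```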
Step 2: Install Dependencies¶
We use uv to manage a virtual environment and project dependencies. Run the following command to sync the project:
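From the repository root:

```shell
# Create .venv and install the locked dependency set
uv sync
```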
Under the Hood
This command executes uv sync, which creates a .venv directory and installs all requirements listed in pyproject.toml with the precise versions from uv.lock.
Step 3: Configure Your Environment¶
Onion Peeler uses environment variables for sensitive configuration and feature toggling. Create a .env file from the provided example:
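Assuming the example file follows the usual .env.example naming convention:

```shell
# Copy the provided example into a live configuration file
cp .env.example .env
```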
Open .env in your preferred editor. Ensure the following values are set (you can leave the VPN keys blank for now if you are just testing local Tor access):
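The variable names below are illustrative, not the project's actual keys; match them against what ships in your example file:

```
# Tor SOCKS proxy used for .onion requests (typical local defaults)
TOR_PROXY=socks5://127.0.0.1:9050

# Mullvad credentials -- safe to leave blank for local Tor-only testing
MULLVAD_ACCOUNT=
```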
Follow the instructions here to set up Mullvad VPN access.
Step 4: Verify Tor Connectivity¶
Before crawling, let's verify that the scraper can communicate through the Tor network. Onion Peeler is configured to use Tor as a proxy for .onion requests.
Run the following test command:
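If the project does not ship its own check command, a direct request through Tor's default SOCKS port is a quick sanity test (this assumes a Tor service listening on 127.0.0.1:9050):

```shell
# Ask the Tor Project's check service whether this request exited via Tor;
# {"IsTor":true,...} in the response confirms the proxy path works
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
```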
If successful, you should see a message confirming you are using Tor (usually "Congratulations. This browser is configured to use Tor.").
Step 5: Execute Your First Crawl¶
Now we are ready to scrape! We will use the daunt spider, which is dynamically generated from config/sites/daunt.toml.
Run the crawl and save the results to a JSON file:
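Assuming the project exposes its spiders through Scrapy's standard CLI (the exact entry point may differ):

```shell
# Run the dynamically generated "daunt" spider; -O overwrites the output file
scrapy crawl daunt -O daunt_output.json
```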
What just happened?
The framework read the TOML configuration, identified the extraction selectors for the "daunt" site, spawned a Scrapy spider, routed requests through Tor, and extracted the data into daunt_output.json.
Step 6: Inspect the Results¶
Open daunt_output.json. You should see a list of extracted items similar to this:
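The exact fields depend on the selectors defined in config/sites/daunt.toml; a hypothetical item might look like:

```json
[
  {
    "title": "Example listing",
    "url": "http://exampleonionaddress.onion/listing/1",
    "scraped_at": "2024-01-01T00:00:00Z"
  }
]
```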
Next Steps¶
Congratulations! You've successfully completed your first crawl.
- Want to add a new site? Check out Creating a Custom Site Configuration.
- Curious about the architecture? Read the Onion Peeler Project Plan.
- Need help with setup issues? Visit the Quick Start Setup Guide.