Argus Product Data Extractor
Stop writing custom selectors for every shop. Pass any product HTML to this service and instantly receive structured prices, brands, and specifications.
{
  "data": {
    "title": "Sony WH-1000XM5",
    "price": 349.00,
    "currency": "EUR",
    "availability": "In Stock"
  }
}
The Smart Way to Process Product Data
Standard parsing logic breaks the moment a website updates its layout. Argus Extractor is built to be resilient. Instead of rigid rules, it uses a modular "scoreboard" system that identifies key information based on content patterns, so you can process data from thousands of different sources without manual configuration.
Why developers choose Argus
- Format Independent: Extract data from any domain without writing site-specific CSS or XPath selectors.
- Handles Dynamic Content: Integrates with Playwright to capture data that only appears after JavaScript has fully rendered the page.
- Reliable Scoring: Independent parsers cross-verify data, and the result with the highest confidence score is the one returned.
- Privacy Centric: Runs entirely on your own infrastructure. Your data never leaves your server, and you pay zero external API fees.
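The scoreboard idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Argus implementation: the `Candidate` type, both parser functions, and the confidence values are hypothetical names and numbers chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    value: str
    confidence: float  # hypothetical 0.0-1.0 scale, assumed for this sketch
    source: str        # name of the parser that produced the value


Parser = Callable[[str], Optional[Candidate]]


def parse_labelled_price(html: str) -> Optional[Candidate]:
    """High confidence: an amount that directly follows a 'Price:' label."""
    m = re.search(r"price:\s*[$€£]?\s*(\d+(?:\.\d{2})?)", html, re.IGNORECASE)
    return Candidate(m.group(1), 0.9, "labelled_price") if m else None


def parse_any_amount(html: str) -> Optional[Candidate]:
    """Low confidence: the first amount preceded by a currency symbol."""
    m = re.search(r"[$€£]\s*(\d+(?:\.\d{2})?)", html)
    return Candidate(m.group(1), 0.5, "any_amount") if m else None


def best_candidate(html: str, parsers: list[Parser]) -> Optional[Candidate]:
    """Run every parser and keep the candidate with the highest confidence."""
    candidates = [c for p in parsers if (c := p(html)) is not None]
    return max(candidates, key=lambda c: c.confidence, default=None)


html = "<p>Shipping: $4.99</p><p>Price: $299.00</p>"
winner = best_candidate(html, [parse_any_amount, parse_labelled_price])
print(winner)
```

Here the generic parser picks up the shipping cost first, but the labelled parser wins because it reports a higher confidence. Since each parser is an independent function, adding a new one does not require touching the others.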
Quick Start & API Usage
To run the service locally, ensure you are in the `argus/services/extractor/` directory, then follow these steps:
- Copy environment file: `cp .env.example .env`
- Build the service: `make build-dev`
- Start the service: `make up-dev`
The service is available at `http://localhost:8001` with full documentation at `/docs`.
curl -X POST "http://localhost:8001/api/v1/extract" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
"url": "https://www.example.com/product/123",
"html_content": "<html><body><h1>Smart Watch Series 5</h1><p>Price: $299.00</p><span>Brand: Apple</span></body></html>",
"use_llm": false
}'
{
  "data": {
    "title": "Smart Watch Series 5",
    "brand": "Apple",
    "price": 299.00,
    "currency": "USD"
  },
  "message": "Extraction successful"
}
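The same request can be made from Python using only the standard library. The sketch below mirrors the curl call above; the helper names `build_extract_request` and `extract` are made up for this example and are not part of Argus.

```python
import json
import urllib.request


def build_extract_request(base_url, api_key, page_url, html, use_llm=False):
    """Build the POST request for /api/v1/extract without sending it."""
    payload = {"url": page_url, "html_content": html, "use_llm": use_llm}
    return urllib.request.Request(
        f"{base_url}/api/v1/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def extract(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_extract_request(
    base_url="http://localhost:8001",
    api_key="default_dev_key",
    page_url="https://www.example.com/product/123",
    html="<html><body><h1>Smart Watch Series 5</h1><p>Price: $299.00</p></body></html>",
)

# Uncomment once the service is running locally:
# result = extract(request)
# print(result["data"])
```

Building the request separately from sending it makes it easy to inspect the payload and headers before the service is up.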
Security
Every microservice is protected by an API key. For production environments, ensure the `AUTH__API_KEY` environment variable is set to a unique, secure string.
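One way to produce a suitable value is Python's `secrets` module, which draws from the operating system's cryptographically secure random source. The helper name `generate_api_key` and the 32-byte length are choices made for this sketch, not Argus requirements.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


key = generate_api_key()
# Paste the printed line into your .env file
print(f"AUTH__API_KEY={key}")
```

Generate a separate key for each environment and keep it out of version control.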
Multilingual Support & Customization
The extractor supports multilingual extraction by separating language-specific keywords and regex patterns from the core logic. These are defined in `config/patterns.yml`.
To add or override patterns for any language (e.g., adding German keywords), you can create a `config/custom_patterns.yml` file. This file is loaded after the default patterns and will safely merge with them, allowing you to customize the extractor without modifying core files.
# config/custom_patterns.yml
de:
  availability_in_stock:
    - 'auf lager'
    - 'sofort lieferbar'
  brand_label_regex: '\b(marke|hersteller)\b'
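To make the merge behaviour concrete, here is a rough Python sketch of a recursive merge over already-parsed pattern dictionaries. It rests on assumptions: the source does not specify the exact merge semantics, so this version assumes nested mappings are merged key by key while lists and scalars from the custom file replace the defaults. The `en` defaults shown are invented for illustration.

```python
from copy import deepcopy


def merge_patterns(defaults: dict, custom: dict) -> dict:
    """Return a new dict where custom values override or extend defaults."""
    merged = deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = merge_patterns(merged[key], value)
        else:
            # Lists and scalars: the custom value wins (assumed behaviour)
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, standing in for config/patterns.yml
defaults = {
    "en": {
        "availability_in_stock": ["in stock"],
        "brand_label_regex": r"\b(brand|manufacturer)\b",
    }
}

# Parsed equivalent of the custom_patterns.yml example above
custom = {
    "de": {
        "availability_in_stock": ["auf lager", "sofort lieferbar"],
        "brand_label_regex": r"\b(marke|hersteller)\b",
    }
}

merged = merge_patterns(defaults, custom)
print(sorted(merged))
```

Because the function works on a deep copy, the default patterns are left unchanged, which matches the goal of customizing without modifying core files.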