ArgusFlow Documentation
Your complete guide to installing, configuring, and using the ArgusFlow suite.
Introduction
ArgusFlow is a powerful, open-source suite of microservices designed for intelligent web data processing. Instead of focusing on crawling, ArgusFlow specializes in the post-fetch lifecycle of data: parsing raw HTML, structuring it into clean JSON, and providing tools for data generalization and matching. The entire environment is orchestrated via Docker, making it portable and easy to manage.
Requirements
Before you begin, please ensure you have the following tools installed on your system:
- Docker & Docker Compose: The entire environment runs in containers. Docker Desktop for Mac/Windows or Docker Engine for Linux is required.
- Make: Used for running simple setup and control commands. Pre-installed on macOS & Linux. Windows users can get it via WSL or Chocolatey.
- Git & cURL: Standard command-line tools for downloading the project files. Pre-installed on most systems.
Installation
You can get the Argus environment up and running with a single command, or by cloning the repository directly.
Quick Install (Recommended)
This command downloads the latest version and unpacks it into a folder named `argus`.
mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1
Alternative (Git Clone)
Developers who wish to contribute or manage the source code via Git can clone the repository:
git clone https://github.com/getargusflow/argus.git
Usage
After installing via either method, the setup and usage are the same.
- Navigate into the Project Directory
  cd argus
- Run the One-Time Setup
  This command prepares the environment. It creates the necessary `.env` files and runs the individual setup script for each service (e.g., downloading AI models, checking database connections).
  make setup
- Start the Services
  This will build the Docker images (if they don't exist) and start all microservices in the background.
  make up
The services are now running. You can verify this by running `docker ps`.
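Beyond `docker ps`, readiness can also be checked over HTTP. The following is a minimal sketch, assuming each service answers `GET /health` with HTTP 200 on the ports listed in the service sections of this guide; the exact path prefix and response body are not documented here, and the service labels in `SERVICE_PORTS` are only display names chosen for this example.

```python
import urllib.request

# Ports as listed in the service sections of this guide.
# The dictionary keys are illustrative labels, not official service names.
SERVICE_PORTS = {
    "extractor": 8001,
    "llm-parser": 8002,
    "generalizer": 8003,
    "matcher": 8004,
}


def health_url(port, host="localhost"):
    """Return the assumed health-check URL for a service port."""
    return f"http://{host}:{port}/health"


def is_healthy(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with HTTP 200, False on any I/O error."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def check_all(ports=SERVICE_PORTS, **kwargs):
    """Map each service label to its health status."""
    return {name: is_healthy(health_url(port), **kwargs) for name, port in ports.items()}
```

With the stack running, `check_all()` returns one boolean per service; a `False` entry suggests looking at that service's output with `make logs`.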
Available Makefile Commands
All project operations are managed through the central `Makefile` in the root directory. This file delegates commands to all services.
Project Setup
| Command | Description |
|---|---|
| `make setup` | Run this first. Initializes the project by creating `.env` files and running one-time setup tasks for each service. |
Production Environment
| Command | Description |
|---|---|
| `make build` | Builds the lean, optimized Docker images for production. |
| `make up` | Starts all services in production mode. This is the default start command. |
Development Environment
| Command | Description |
|---|---|
| `make build-dev` | Builds Docker images for development (includes debuggers, live-reloading). |
| `make up-dev` | Starts all services in development mode. |
General & Utility Commands
| Command | Description |
|---|---|
| `make down` | Stops and removes all running service containers. |
| `make restart` | Restarts all services. |
| `make logs` | Shows the live logs from all running services. |
| `make test` | Runs the test suite for all services. |
| `make lint` | Runs the linter to check code quality for all services. |
| `make clean` | Cleans up temporary files like Python cache. |
Targeting a Specific Service
To run a command (like `logs`, `test`, or `build`) on a single service, `cd` into that service's directory and run the command from there.
# Show logs for only the extractor service
cd services/extractor
make logs
# Run tests for only the matcher service
cd ../matcher
make test
Core Services & API Usage
Each microservice runs on its own port and has a specific function. All API endpoints (except `/health`) require an API key.
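The curl examples in the following sections all share one request shape: a JSON body sent with `POST`, plus an `x-api-key` header. As a rough sketch of the same call from Python, using only the standard library, the helper below builds such a request. The function names `build_request` and `call` are hypothetical, the HTML snippet is made up, and the assumption that every response body is JSON is not confirmed by this guide.

```python
import json
import urllib.request

API_KEY = "default_dev_key"  # development key used in this guide's examples


def build_request(port, path, payload, api_key=API_KEY):
    """Build (but do not send) an authenticated JSON POST request."""
    body = json.dumps(payload).encode("utf-8")  # json.dumps escapes quotes in HTML
    return urllib.request.Request(
        url=f"http://localhost:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def call(request, timeout=30.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build an Extractor request without sending it; the HTML is a made-up snippet.
html = '<div class="price">19.99</div>'
request = build_request(
    8001,
    "/api/v1/extract",
    {"url": "https://www.example.com/product/123", "html_content": html, "use_llm": False},
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the built request to `call(request)` sends it to a running stack. Building the body with `json.dumps` avoids the manual quote escaping that embedding raw HTML in a shell string would require.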
Extractor Service (Port 8001)
Extracts structured data (price, brand, etc.) from a product page's HTML using a modular, "scoreboard-based" parser system.
API Usage: `POST /api/v1/extract`
curl -X POST "http://localhost:8001/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "url": "https://www.example.com/product/123",
    "html_content": "...",
    "use_llm": false
  }'
LLM Parser Service (Port 8002)
This service parses complex HTML snippets into structured JSON data using a local LLM, with a grammar to guarantee valid JSON output.
API Usage: `POST /api/v1/parse`
curl -X POST "http://localhost:8002/api/v1/parse" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "html_snippet": "Specs ..."
  }'
Generalizer Service (Port 8003)
Uses a local LLM with dynamic, language-specific grammars to transform a simple product title into a full, structured JSON object.
API Usage: `POST /api/v1/generalize`
curl -X POST "http://localhost:8003/api/v1/generalize" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "language": "en",
    "title": "Solid oak dining table 200x100 cm natural finish"
  }'
Matcher Service (Port 8004)
Finds similar products in your database using a fine-tuned Sentence Transformer model and a high-speed FAISS vector index.
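To make the idea of vector matching concrete, here is a minimal sketch of top-k nearest-neighbor search over embeddings in plain Python. It is not the Matcher's implementation: a brute-force cosine-similarity scan stands in for the FAISS index, the three-dimensional vectors are made-up stand-ins for Sentence Transformer embeddings, and whether ArgusFlow ranks by cosine similarity or another metric is not stated in this guide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query, catalog, top_k=3):
    """Return the top_k (product_id, score) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical product embeddings keyed by product ID.
catalog = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.9],
    4: [0.1, 0.9, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the search text

for product_id, score in top_k_matches(query, catalog, top_k=2):
    print(product_id, round(score, 3))
```

A vector index such as FAISS serves the same purpose as the scan above, but avoids comparing the query against every stored vector one by one in Python, which matters once the catalog grows large.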
API Usage: `POST /api/v1/match/text`
curl -X POST "http://localhost:8004/api/v1/match/text" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "text": "Coca-Cola Classic",
    "top_k": 3
  }'
API Usage: `POST /api/v1/match/id`
curl -X POST "http://localhost:8004/api/v1/match/id" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "product_id": 1,
    "top_k": 3
  }'
Installing Argus Pro
Argus Pro extends the open-source suite with advanced, commercially licensed modules for higher accuracy and broader data extraction capabilities.
Prerequisite
You must have the open-source version of Argus installed via the steps above before you can install the Pro add-on.
Installation Steps
After purchasing a license, you will receive a license key. Use this key with the following command inside your existing `argus` project directory.
- Navigate into your project directory
  cd argus
- Run the Pro Installer
  This command will prompt you for your license key.
  make install-pro
  The script uses your key to securely download the Argus Pro package and automatically integrates it into your environment.
- Restart the Services
  To activate the new Pro features, restart all services.
  make restart