ArgusFlow Documentation

Your complete guide to installing, configuring, and using the ArgusFlow suite.

Introduction

ArgusFlow is a powerful, open-source suite of microservices designed for intelligent web data processing. Instead of focusing on crawling, ArgusFlow specializes in the post-fetch lifecycle of data: parsing raw HTML, structuring it into clean JSON, and providing tools for data generalization and matching. The entire environment is orchestrated via Docker, making it portable and easy to manage.

Requirements

Before you begin, please ensure you have the following tools installed on your system:

  • Docker & Docker Compose: The entire environment runs in containers. Docker Desktop (macOS/Windows) or Docker Engine (Linux) is required.
  • Make: Used for running simple setup and control commands. Typically available on macOS (via the Xcode Command Line Tools) and on most Linux distributions. Windows users can get it via WSL or Chocolatey.
  • Git & cURL: Standard command-line tools for downloading the project files. Pre-installed on most systems.

Installation

You can get the ArgusFlow environment up and running with a single command, or by cloning the repository directly.

Quick Install (Recommended)

This command downloads the latest version and unpacks it into a folder named `argus`.

mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1

Alternative (Git Clone)

Developers who wish to contribute or manage the source code via Git can clone the repository:

git clone https://github.com/getargusflow/argus.git

Usage

After installing via either method, the setup and usage are the same.

  1. Navigate into the Project Directory

    If you used the Quick Install, your shell is already inside `argus` and you can skip this step.

    cd argus
  2. Run the One-Time Setup

    This command prepares the environment. It creates the necessary `.env` files and runs the individual setup script for each service (e.g., downloading AI models, checking database connections).

    make setup
  3. Start the Services

    This will build the Docker images (if they don't exist) and start all microservices in the background.

    make up

The services are now running. You can verify this by running `docker ps`.
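Beyond `docker ps`, a small script can poll each service's `/health` endpoint. The sketch below assumes the endpoint lives at the root of each service (`http://localhost:<port>/health`) and answers with HTTP 200 when healthy; the ports are the ones listed in this guide, while the dictionary keys and function names are illustrative labels only.

```python
import urllib.request

# Ports are taken from this guide. The keys are display labels, and the
# "/health" path at the service root is an assumption.
SERVICES = {"extractor": 8001, "llm-parser": 8002, "generalizer": 8003, "matcher": 8004}


def health_url(port, host="localhost"):
    """Build the assumed health-check URL for one service."""
    return f"http://{host}:{port}/health"


def is_healthy(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts, and 4xx/5xx answers all end up here.
        return False


def check_all(opener=urllib.request.urlopen):
    """Map each service label to True (healthy) or False (unreachable or unhealthy)."""
    return {
        name: is_healthy(health_url(port), opener=opener)
        for name, port in SERVICES.items()
    }
```

Calling `check_all()` after `make up` returns a dictionary such as `{"extractor": True, ...}`; the `opener` parameter exists only so the check can be exercised without a running stack.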

Available Makefile Commands

All project operations are managed through the central `Makefile` in the root directory. This file delegates commands to all services.

Project Setup

| Command | Description |
| --- | --- |
| `make setup` | Run this first. Initializes the project by creating `.env` files and running one-time setup tasks for each service. |

Production Environment

| Command | Description |
| --- | --- |
| `make build` | Builds the lean, optimized Docker images for production. |
| `make up` | Starts all services in production mode. This is the default start command. |

Development Environment

| Command | Description |
| --- | --- |
| `make build-dev` | Builds Docker images for development (includes debuggers, live-reloading). |
| `make up-dev` | Starts all services in development mode. |

General & Utility Commands

| Command | Description |
| --- | --- |
| `make down` | Stops and removes all running service containers. |
| `make restart` | Restarts all services. |
| `make logs` | Shows the live logs from all running services. |
| `make test` | Runs the test suite for all services. |
| `make lint` | Runs the linter to check code quality for all services. |
| `make clean` | Cleans up temporary files like Python cache. |

Targeting a Specific Service

To run a command (like `logs`, `test`, or `build`) on a single service, `cd` into that service's directory and run the command from there.

# Show logs for only the extractor service
cd services/extractor
make logs

# Run tests for only the matcher service
cd ../matcher
make test
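When per-service commands are driven from a script rather than typed by hand, the same "change directory, then run make" pattern can be expressed with `subprocess`. This is a minimal sketch: the `services/<name>` layout is taken from the example above, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def service_make_command(service, target, root="."):
    """Return (argv, cwd) for running one Makefile target inside one service."""
    cwd = Path(root) / "services" / service
    return ["make", target], cwd


def run_service_target(service, target, root="."):
    """Run the target in the service directory, raising if make fails."""
    argv, cwd = service_make_command(service, target, root)
    # check=True raises CalledProcessError when make exits with a non-zero status.
    return subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_service_target("matcher", "test", root="argus")` behaves like `cd argus/services/matcher && make test`. On the command line, GNU Make's `-C` option (`make -C services/matcher test`) gives the same result without changing directory.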

Core Services & API Usage

Each microservice runs on its own port and has a specific function. All API endpoints except `/health` require an API key, sent in the `x-api-key` request header.
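As a sketch of how a client might attach the key, the Python below builds a POST request carrying the `x-api-key` header used by this guide's curl examples. The `ARGUS_API_KEY` environment variable, the `build_request` and `send` helper names, and the assumption that every response body is JSON are illustrative and not part of ArgusFlow.

```python
import json
import os
import urllib.request

# "default_dev_key" is the key shown in this guide's curl examples.
# ARGUS_API_KEY is a hypothetical variable name chosen for this sketch.
API_KEY = os.environ.get("ARGUS_API_KEY", "default_dev_key")


def build_request(port, path, payload, api_key=API_KEY, host="localhost"):
    """Build (but do not send) a JSON POST request with the API key header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before anything goes over the network.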

Extractor Service (Port 8001)

Extracts structured data (price, brand, etc.) from a product page's HTML using a modular, "scoreboard-based" parser system.

API Usage: `POST /api/v1/extract`

curl -X POST "http://localhost:8001/api/v1/extract" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "url": "https://www.example.com/product/123",
  "html_content": "...",
  "use_llm": false
}'
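Real product pages are too large, and too full of quotes, to paste into a shell command. One possible approach, sketched here, is to read the saved HTML from disk, build the same three-field payload as the curl example, and write it to a file that curl can send with `-d @payload.json`. The function names and file names are placeholders.

```python
import json
from pathlib import Path


def build_extract_payload(url, html_content, use_llm=False):
    """Mirror the three fields shown in the curl example above."""
    return {"url": url, "html_content": html_content, "use_llm": use_llm}


def payload_from_file(url, html_path, use_llm=False):
    """Read already-fetched HTML from disk and wrap it in an extract payload."""
    html = Path(html_path).read_text(encoding="utf-8")
    return build_extract_payload(url, html, use_llm)


def write_payload(payload, out_path):
    """Serialize the payload to a single-line JSON file for use with curl."""
    # json.dumps escapes quotes and newlines inside the HTML, so the file
    # contains one line of valid JSON regardless of the page content.
    Path(out_path).write_text(json.dumps(payload), encoding="utf-8")
```

After `write_payload(payload_from_file(url, "page.html"), "payload.json")`, the curl command above can replace its inline body with `-d @payload.json`.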

LLM Parser Service (Port 8002)

This service parses complex HTML snippets into structured JSON data using a local LLM, with a grammar to guarantee valid JSON output.

API Usage: `POST /api/v1/parse`

curl -X POST "http://localhost:8002/api/v1/parse" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "html_snippet": "<h3>Specs</h3><p>...</p>"
}'
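HTML snippets usually contain double quotes and line breaks, both of which must be escaped inside a JSON string. Letting a JSON library do the encoding avoids hand-escaping; the sketch below shows this for the `html_snippet` field. The sample markup and the `build_parse_body` name are made up for illustration.

```python
import json


def build_parse_body(html_snippet):
    """Return the request body for /api/v1/parse as a JSON string."""
    return json.dumps({"html_snippet": html_snippet})


# A hypothetical multi-line snippet with attribute quotes.
snippet = '<table class="specs">\n  <tr><td>Width</td><td>200 cm</td></tr>\n</table>'
body = build_parse_body(snippet)

# The encoded body is a single line: quotes become \" and newlines become \n.
assert json.loads(body)["html_snippet"] == snippet
```

The resulting `body` string can be sent as the POST data with the same headers as the curl example.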

Generalizer Service (Port 8003)

Uses a local LLM with dynamic, language-specific grammars to transform a simple product title into a full, structured JSON object.

API Usage: `POST /api/v1/generalize`

curl -X POST "http://localhost:8003/api/v1/generalize" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "language": "en",
  "title": "Solid oak dining table 200x100 cm natural finish"
}'
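Because the endpoint takes one title per request, processing a catalogue means producing one body per title. The following sketch generates those bodies from a list; collapsing stray whitespace and skipping blank titles are client-side conveniences assumed here, not documented requirements of the service.

```python
def generalize_payloads(titles, language="en"):
    """Yield one request body per non-empty title, in input order."""
    for title in titles:
        cleaned = " ".join(title.split())  # collapse runs of whitespace
        if cleaned:
            yield {"language": language, "title": cleaned}


titles = [
    "Solid oak dining table 200x100 cm natural finish",
    "   ",  # blank entries are skipped
    "Solid  oak   bench 160 cm",
]
payloads = list(generalize_payloads(titles))
```

Each dictionary in `payloads` has the same shape as the body in the curl example and can be posted to `/api/v1/generalize` individually.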

Matcher Service (Port 8004)

Finds similar products in your database using a fine-tuned Sentence Transformer model and a high-speed FAISS vector index.

API Usage: `POST /api/v1/match/text`

curl -X POST "http://localhost:8004/api/v1/match/text" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "text": "Coca-Cola Classic",
  "top_k": 3
}'

API Usage: `POST /api/v1/match/id`

curl -X POST "http://localhost:8004/api/v1/match/id" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "product_id": 1,
  "top_k": 3
}'
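The two matcher endpoints differ only in how the query product is identified. A thin client-side helper, sketched below, can pick the path and body from whichever identifier is supplied. The `match_request` name and the validation rules (exactly one identifier, `top_k` of at least 1) are assumptions of this sketch rather than documented server behavior.

```python
def match_request(text=None, product_id=None, top_k=3):
    """Return (path, payload) for a text-based or ID-based match query."""
    if (text is None) == (product_id is None):
        raise ValueError("pass exactly one of text or product_id")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if text is not None:
        return "/api/v1/match/text", {"text": text, "top_k": top_k}
    return "/api/v1/match/id", {"product_id": product_id, "top_k": top_k}
```

For instance, `match_request(text="Coca-Cola Classic")` yields the path and body of the first curl example, and `match_request(product_id=1)` yields those of the second.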

Installing Argus Pro

Argus Pro extends the open-source suite with advanced, commercially licensed modules for higher accuracy and broader data extraction capabilities.

Prerequisite

You must have the open-source version of ArgusFlow installed via the steps above before you can install the Pro add-on.

Installation Steps

After purchasing a license, you will receive a license key. Use this key with the following command inside your existing `argus` project directory.

  1. Navigate into your project directory
    cd argus
  2. Run the Pro Installer

    This command will prompt you for your license key.

    make install-pro

    The script uses your key to securely download the Argus Pro package and automatically integrates it into your environment.

  3. Restart the Services

    To activate the new Pro features, restart all services.

    make restart