ArgusFlow

Intelligently structuring web data

Open-source MIT-licensed Python Suite of microservices to extract, parse, structure, and match data.

What is ArgusFlow?

ArgusFlow is a collection of powerful microservices designed to be integrated into your data pipeline. Instead of fetching content, Argus specializes in parsing raw HTML or text, and converting it into clean, structured JSON.

  • Four Specialized Services
    Includes the generic Extractor, the AI-powered LLM-Parser, a data Generalizer, and a product Matcher.
  • Generic & Time-Saving
    The Extractor is designed to work "out-of-the-box" on a vast number of websites without needing site-specific selectors, saving you countless hours of manual configuration.
  • Flexible Integration
    Because the services accept raw content, they can be easily plugged into any existing scraper, crawler, or data processing workflow.
  • Open Source Core
    The free services are fully open source and released under the permissive MIT License, allowing for broad personal and commercial use.

What Can You Do With It?

Use the individual services to solve specific data challenges in your pipeline.

  • Enrich Your Scrapers
    Already have a crawler? Pass the downloaded HTML to the Extractor for superior, generic parsing of product data.
  • Parse Complex Data
    Isolate a complex HTML block (like a specifications table) and let the LLM-Parser turn it into structured key-value pairs.
  • Product Matching
    Use the Matcher to find and link similar products within your own database, powered by a fine-tuned embedding model and vector search.

Available Services

Requirements

Please ensure you have the following tools installed on your system before you begin.

  • Docker & Docker Compose

    The entire environment runs in containers. Docker Desktop for Mac/Windows or Docker Engine for Linux is required. Install Docker →

  • Make

    Used for running setup and start/stop commands. Pre-installed on macOS & Linux. Windows users can get it via WSL or Chocolatey. Learn More →

  • cURL

    Needed for the one-line installation command. Pre-installed on most systems.

  • Git

    Required for the alternative installation method (cloning the repository). Install Git →

Quick Start Guide

Get the entire Argus environment up and running with these simple commands.

  1. 1 Download and Unpack

    This command downloads the latest version and unpacks it into a folder named `argus`.

    mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1
  2. 2 Change into the new directory
    cd argus
  3. 3 Run the one-time setup
    make setup
  4. 4 Start all services
    make up

Alternative Installation

If you prefer to use Git, you can clone the repository directly. The setup steps are the same.

  1. 1 Clone the repository
    git clone https://github.com/getargusflow/argus.git
  2. 2 Change into the new directory
    cd argus
  3. From here, continue with the standard setup commands: make setup, followed by make up.