Intelligently structuring web data
An open-source, MIT-licensed Python suite of microservices to extract, parse, structure, and match data.
What is ArgusFlow?
ArgusFlow is a collection of microservices designed to be integrated into your data pipeline. It does not fetch content itself; instead, it parses the raw HTML or text you supply and converts it into clean, structured JSON.
- Four Specialized Services: Includes the generic Extractor, the AI-powered LLM-Parser, a data Generalizer, and a product Matcher.
- Generic & Time-Saving: The Extractor is designed to work out of the box on a wide range of websites without site-specific selectors, which removes most per-site manual configuration.
- Flexible Integration: Because the services accept raw content, they can be plugged into any existing scraper, crawler, or data processing workflow.
- Open Source Core: The free services are fully open source and released under the permissive MIT License, allowing for broad personal and commercial use.
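Because the services take raw content rather than URLs, integration can be as small as one call made from an existing scraper. The Python sketch below assumes the service is reachable over HTTP and accepts a JSON body; the base URL, the `/extract` path, and the `html` field name are placeholders invented for illustration, not ArgusFlow's documented API, so check the project's own documentation for the real endpoint and payload.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL and field names
# documented by the service you are actually calling.
EXTRACTOR_URL = "http://localhost:8000/extract"


def build_extract_request(html: str, url: str = EXTRACTOR_URL) -> urllib.request.Request:
    """Wrap already-downloaded HTML in a JSON POST request (nothing is sent here)."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(html: str, url: str = EXTRACTOR_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply."""
    request = build_extract_request(html, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running service.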
What Can You Do With It?
Use the individual services to solve specific data challenges in your pipeline.
- Enrich Your Scrapers: Already have a crawler? Pass the downloaded HTML to the Extractor for superior, generic parsing of product data.
- Parse Complex Data: Isolate a complex HTML block (like a specifications table) and let the LLM-Parser turn it into structured key-value pairs.
- Product Matching: Use the Matcher to find and link similar products within your own database, powered by a fine-tuned embedding model and vector search.
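The idea behind embedding-based matching can be sketched in a few lines of Python: each product is represented as a vector, and the vectors closest to a query by cosine similarity are treated as candidate matches. The three-dimensional vectors below are hand-written stand-ins for real embeddings, and the brute-force scan stands in for a real vector index; this illustrates the general technique only, not the Matcher's actual model or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, catalog, k=2):
    """Return the k catalog entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings": made-up numbers, chosen so the two phones sit close together.
catalog = {
    "phone-128gb-black": [0.9, 0.1, 0.0],
    "phone-128gb-blue": [0.8, 0.2, 0.1],
    "garden-hose-20m": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]
best = top_matches(query, catalog, k=2)
```

With these toy vectors, both phone entries score far higher than the garden hose, which is the behaviour a matching step relies on. A production setup would replace the linear scan with an approximate nearest-neighbour index.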
Available Services
Extractor
Extracts structured data (price, brand, etc.) from any product page HTML.
LLM-Parser
Uses AI to parse complex HTML snippets into structured, grammar-guaranteed JSON.
Generalizer
Uses AI to convert a simple product title into a full, multi-language JSON object.
Matcher
Finds similar products in your database using deep learning and vector search.
Requirements
Please ensure you have the following tools installed on your system before you begin.
- Docker & Docker Compose: The entire environment runs in containers. Docker Desktop for Mac/Windows or Docker Engine for Linux is required.
- Make: Used for running setup and start/stop commands. Pre-installed on macOS & Linux. Windows users can get it via WSL or Chocolatey.
- cURL: Needed for the one-line installation command. Pre-installed on most systems.
- Git: Required for the alternative installation method (cloning the repository).
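A quick way to confirm the tools listed above are present is to look for them on the `PATH`. The sketch below uses only the Python standard library; the executable names (`docker`, `make`, `curl`, `git`) are assumed to be the usual ones, and the check does not verify versions, that the Docker daemon is running, or that Docker Compose is available.

```python
import shutil

# Assumed executable names for the tools listed in the requirements.
REQUIRED_TOOLS = ["docker", "make", "curl", "git"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is what you want.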
Quick Start Guide
Get the entire Argus environment up and running with these simple commands.
1. Download and Unpack
   This command creates a folder named `argus`, downloads the latest version, and unpacks it into that folder.
   mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1
2. Change into the new directory
   Only needed if you are not already inside `argus`; the command in step 1 leaves the same shell in that folder.
   cd argus
3. Run the one-time setup
   make setup
4. Start all services
   make up
Alternative Installation
If you prefer to use Git, you can clone the repository directly. The setup steps are the same.
1. Clone the repository
   git clone https://github.com/getargusflow/argus.git
2. Change into the new directory
   cd argus
From here, continue with the standard setup commands: `make setup`, followed by `make up`.