ArgusFlow

Argus LLM Parser

A standalone microservice that uses a local AI model to convert complex HTML snippets into structured, grammar-guaranteed JSON.

What is the LLM Parser?

Argus LLM Parser parses HTML snippets into structured JSON using a local Large Language Model (LLM). It generates a strict GBNF grammar from a JSON schema and uses that grammar to constrain the model's output, so every response is valid JSON that conforms to the schema's structure.
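To make the schema-to-grammar idea concrete, here is a minimal sketch of how a JSON schema could be turned into GBNF. It is not the service's actual generator: the `SCHEMA` contents, the `schema_to_gbnf` function, and the restriction to string fields and string-to-string maps are all assumptions made for illustration.

```python
import json

# Hypothetical schema for illustration; the service's real schema is not shown in this README.
SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "details": {"type": "object", "additionalProperties": {"type": "string"}},
    },
}

# Shared GBNF rules: string-to-string maps, JSON strings, and whitespace.
BASE_RULES = r'''
strmap ::= "{" ws (pair (ws "," ws pair)*)? ws "}"
pair ::= string ws ":" ws string
string ::= "\"" ([^"\\] | "\\" ["\\/bfnrt])* "\""
ws ::= [ \t\n]*
'''.strip()


def schema_to_gbnf(schema: dict) -> str:
    """Build a GBNF grammar for a flat object whose keys appear in schema order."""
    # Toy mapping: only "string" fields and string-to-string "object" fields are handled.
    rule_for_type = {"string": "string", "object": "strmap"}
    fields = []
    for name, spec in schema["properties"].items():
        # Quote the key twice: once as JSON, once as a GBNF string literal.
        key_literal = json.dumps(json.dumps(name))
        fields.append(f'{key_literal} ws ":" ws {rule_for_type[spec["type"]]}')
    body = ' ws "," ws '.join(fields)
    return f'root ::= "{{" ws {body} ws "}}"\n{BASE_RULES}'


grammar = schema_to_gbnf(SCHEMA)
print(grammar)
```

Because the `root` rule spells out every key literally, a model sampled under this grammar cannot emit a missing key, an extra key, or malformed JSON. With `llama-cpp-python`, a grammar string like this can typically be loaded with `LlamaGrammar.from_string`; whether the service does it that way is not stated here.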

Key Features

  • Strict Output Guarantee:

    Employs a GBNF grammar generated from a JSON schema to ensure valid, structured JSON output.

  • Local & Private:

    Uses `llama-cpp-python` for efficient inference on your own hardware. No data is sent to external APIs.

  • Secure by Default:

    All API endpoints (except `/health`) are protected by a mandatory `x-api-key` header.

  • Fully Dockerized:

    Includes a multi-stage Dockerfile and `docker-compose.yml` for fast, reproducible setup.

Quick Start & API Usage

To run the service, execute all commands from the `services/llm_parser/` directory:

  1. If `.env.example` exists, copy it to create your configuration file: `cp .env.example .env`.
  2. Run the one-time setup: `make setup`. This downloads the model and generates the grammar file.
  3. Start the development service: `make up-dev`.

The service will be available at `http://localhost:8002` with interactive API docs at `http://localhost:8002/docs`.
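Since `/health` is the one endpoint that needs no API key, a client can poll it to wait for the service to come up before sending parse requests. The sketch below assumes that a healthy service answers `GET /health` with HTTP 200; the response body is not documented here, so it is ignored. The function names `health_ok` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8002"


def health_ok(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx response: treat as not ready.
        return False


def wait_until_ready(
    check: Callable[[], bool] = health_ok,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` right after `make up-dev` gives the model time to load. The `check` parameter is injectable so the retry loop can be exercised without a running service.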

API Request Example

The `/api/v1/parse` endpoint requires an API key. The default key for development is `default_dev_key`.

cURL:

curl -X POST "http://localhost:8002/api/v1/parse" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
  "html_snippet": "<div><h1>Specification</h1><div><div>Size</div></div><div>xxl</div></div>"
}'

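PHP:

<?php
// Build the request URL, JSON body, and headers.
$url = 'http://localhost:8002/api/v1/parse';
$payload = json_encode([
    'html_snippet' => '<div><h1>Specification</h1><div><div>Size</div></div><div>xxl</div></div>',
]);
$headers = [
    'Content-Type: application/json',
    'x-api-key: default_dev_key', // development key; override in production
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
if ($response === false) {
    // curl_exec returns false on transport errors such as a refused connection.
    $error = curl_error($ch);
    curl_close($ch);
    throw new RuntimeException("Request failed: $error");
}
curl_close($ch);

// Decode the JSON response into an associative array.
$data = json_decode($response, true);
print_r($data);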

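Python:

import requests

api_url = "http://localhost:8002/api/v1/parse"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "default_dev_key",  # development key; override in production
}
payload = {
    "html_snippet": "<div><h1>Specification</h1><div><div>Size</div></div><div>xxl</div></div>"
}

# Set an explicit timeout; requests waits indefinitely by default.
response = requests.post(api_url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()

print(data)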

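JavaScript:

const apiUrl = 'http://localhost:8002/api/v1/parse';
const payload = {
  html_snippet: '<div><h1>Specification</h1><div><div>Size</div></div><div>xxl</div></div>'
};
const headers = {
  'Content-Type': 'application/json',
  'x-api-key': 'default_dev_key' // development key; override in production
};

fetch(apiUrl, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
})
  .then(response => {
    // fetch resolves on HTTP errors too, so check the status explicitly.
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error(error));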

Example Output

{
  "category": "Specification",
  "details": {
    "Size": "xxl"
  }
}
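Although the grammar constrains what the model can emit, a client may still want a cheap sanity check before using a response. The following sketch assumes the response shape seen in the single example above (a string `category` and a `details` object mapping strings to strings); the real schema may allow more. `validate_parse_result` is a hypothetical helper, not part of the service.

```python
from typing import Any


def validate_parse_result(data: Any) -> list[str]:
    """Return a list of problems; an empty list means the shape matches."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if not isinstance(data.get("category"), str):
        problems.append("'category' is missing or not a string")
    details = data.get("details")
    if not isinstance(details, dict):
        problems.append("'details' is missing or not an object")
    else:
        # Every value in the details map is assumed to be a string.
        for key, value in details.items():
            if not isinstance(value, str):
                problems.append(f"details[{key!r}] is not a string")
    return problems


example = {"category": "Specification", "details": {"Size": "xxl"}}
print(validate_parse_result(example))  # prints []
```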

Security

All API endpoints (except for `/health`) require a valid API key to be passed in the `x-api-key` header.

The default development key is `default_dev_key`. For production, you must override this by setting the `AUTH__API_KEY` environment variable to a strong, randomly generated key.
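One way to produce a strong key is Python's standard `secrets` module. The sketch below generates a key suitable for `AUTH__API_KEY` and builds the matching request header. The helper names are hypothetical, and having the client read the same `AUTH__API_KEY` variable as the server is an assumption made for convenience.

```python
import os
import secrets
from typing import Mapping


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key built from `num_bytes` of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the x-api-key header from AUTH__API_KEY (client-side reuse is an assumption)."""
    key = env.get("AUTH__API_KEY")
    if not key:
        raise RuntimeError("AUTH__API_KEY is not set")
    return {"x-api-key": key}


# Print a line that can be pasted into .env.
print(f"AUTH__API_KEY={generate_api_key()}")
```

Reading the key from the environment keeps it out of source control, which matters once `default_dev_key` has been replaced.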