ArgusFlow

Argus Generalizer

Transforms an unstructured product title into a fully structured, multi-language JSON object.

What is the Generalizer?

Argus Generalizer is a microservice designed to transform an unstructured product title or description into a fully structured, multi-language JSON object. The service utilizes a local Large Language Model (LLM), whose output is constrained by a dynamically generated, language-specific formal grammar (GBNF) to ensure accurate and consistently formatted data extraction.

Key Features

  • LLM-Powered Extraction:

    Uses a local, efficient Large Language Model (via `llama-cpp-python`) to understand and extract product information.

  • Dynamic Grammar-Constrained Output:

    Output is strictly defined by a language-specific GBNF grammar. This grammar is dynamically generated from a JSON schema and language-specific categories to guarantee valid JSON.

  • Multi-language by Design:

    Dynamically loads language-specific prompts, few-shot examples, and grammars for high-accuracy, localized extraction.

  • Secure by Default:

    All API endpoints (except `/health`) are protected by a mandatory `x-api-key` header.

How to Use

After running `make up-dev` from the `services/generalizer/` directory, the Generalizer service is available on port 8003. Use one of the examples below to send a POST request with the product title you want to structure.

API Request Example


curl -X POST "http://localhost:8003/api/v1/generalize" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "language": "en",
  "title": "Solid oak dining table 200x100 cm natural finish"
}'

$ch = curl_init();

$url = 'http://localhost:8003/api/v1/generalize';
$payload = json_encode([
    'language' => 'en',
    'title' => 'Solid oak dining table 200x100 cm natural finish',
]);
$headers = [
    'Content-Type: application/json',
    'x-api-key: default_dev_key',
];

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);

import requests

api_url = "http://localhost:8003/api/v1/generalize"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "default_dev_key"
}
payload = {
    "language": "en",
    "title": "Solid oak dining table 200x100 cm natural finish"
}

response = requests.post(api_url, json=payload, headers=headers)
data = response.json()

print(data)

const apiUrl = 'http://localhost:8003/api/v1/generalize';
const payload = {
  language: 'en',
  title: 'Solid oak dining table 200x100 cm natural finish'
};

fetch(apiUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'default_dev_key'
  },
  body: JSON.stringify(payload)
})
.then(response => response.json())
.then(data => {
  console.log(data);
});

Example Output

{
  "extracted_data": {
    "material": "oak",
    "type": "dining table",
    "dimensions": "200x100 cm",
    "finish": "natural",
    "category": "furniture"
  },
  "process_time": 1.25
}

Multi-language Support

The Generalizer is built to be multi-lingual from the ground up. Each language lives in its own directory (e.g., `app/prompts/en/`, `app/prompts/nl/`). When you make a request, the service dynamically loads the correct prompt, few-shot examples, categories, and GBNF grammar for that specific language.

How to Add a New Language

Adding a new language (e.g., French `fr`) is straightforward:

  1. Create a new directory: `mkdir -p app/prompts/fr`
  2. Add translated files to that directory: `prompt.py` (system prompt), `examples.py` (few-shot examples), and `categories.yml` (list of categories).
  3. Run `make generate-grammars` from the `services/generalizer/` directory. This reads your `categories.yml` and builds the new `grammar.gbnf` file.
  4. Restart the service (`make restart`). You can now send requests with `"language": "fr"`.