Argus Generalizer
Transforms an unstructured product title into a fully structured, multi-language JSON object.
What is the Generalizer?
Argus Generalizer is a microservice that transforms an unstructured product title or description into a fully structured, multi-language JSON object. It runs a local Large Language Model (LLM) whose output is constrained by a dynamically generated, language-specific formal grammar (GBNF), so every response is valid, consistently formatted JSON.
Key Features
LLM-Powered Extraction:
Uses a local, efficient Large Language Model (via `llama-cpp-python`) to understand and extract product information.
Dynamic Grammar-Constrained Output:
Output is strictly defined by a language-specific GBNF grammar. This grammar is dynamically generated from a JSON schema and language-specific categories to guarantee valid JSON.
Multi-language by Design:
Dynamically loads language-specific prompts, few-shot examples, and grammars for high-accuracy, localized extraction.
Secure by Default:
All API endpoints (except `/health`) are protected by a mandatory `x-api-key` header.
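The access rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual middleware: `PUBLIC_PATHS` and `is_authorized` are hypothetical names, and how the expected key is configured is an assumption.

```python
import hmac

# Hypothetical names for illustration; the real service may structure this differently.
PUBLIC_PATHS = {"/health"}


def is_authorized(path: str, headers: dict[str, str], expected_key: str) -> bool:
    """Return True if a request to `path` may proceed."""
    if path in PUBLIC_PATHS:
        return True
    # HTTP header names are case-insensitive, so normalize before the lookup.
    supplied = {name.lower(): value for name, value in headers.items()}.get("x-api-key", "")
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

A request without the header, or with the wrong key, is rejected for every path except `/health`.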
How to Use
After running `make up-dev` from the `services/generalizer/` directory, the Generalizer service is available on port 8003. Use the example below to send a POST request with the product title you want to structure.
API Request Example
curl -X POST "http://localhost:8003/api/v1/generalize" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "language": "en",
    "title": "Solid oak dining table 200x100 cm natural finish"
  }'
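The same request can be sent from Python using only the standard library. This is a sketch under the assumptions of the curl example (local port 8003, the `default_dev_key` development key); `build_request` and `generalize` are hypothetical helper names, not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8003"  # port from the `make up-dev` setup
DEV_API_KEY = "default_dev_key"     # development key from the curl example


def build_request(title: str, language: str, api_key: str = DEV_API_KEY) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"language": language, "title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/generalize",
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def generalize(title: str, language: str = "en", timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(title, language), timeout=timeout) as response:
        return json.load(response)
```

With the service running, `generalize("Solid oak dining table 200x100 cm natural finish")` should return a dictionary shaped like the example output.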
Example Output
{
  "extracted_data": {
    "material": "oak",
    "type": "dining table",
    "dimensions": "200x100 cm",
    "finish": "natural",
    "category": "furniture"
  },
  "process_time": 1.25
}
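On the client side, a response of this shape can be read into a small typed container. The sketch below assumes only the two top-level keys shown in the example; `GeneralizeResult` and `parse_result` are hypothetical names, and the unit of `process_time` is not stated in this document.

```python
import json
from dataclasses import dataclass


@dataclass
class GeneralizeResult:
    extracted_data: dict[str, object]  # fields vary with the product and language
    process_time: float                # as reported by the service; unit not specified here


def parse_result(body: str) -> GeneralizeResult:
    """Parse a JSON response body; raises KeyError if a top-level key is missing."""
    doc = json.loads(body)
    return GeneralizeResult(
        extracted_data=dict(doc["extracted_data"]),
        process_time=float(doc["process_time"]),
    )
```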
Multi-language Support
The Generalizer is built to be multilingual from the ground up. Each language lives in its own directory (e.g., `app/prompts/en/`, `app/prompts/nl/`). When you make a request, the service dynamically loads the prompt, few-shot examples, categories, and GBNF grammar for the requested language.
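As a rough illustration of the per-language layout, the sketch below resolves a language code to its directory and reports which of the expected files are absent. The file names are the ones this README mentions; the two-letter code check and the helper names `language_dir` and `missing_files` are assumptions, not the service's actual loader.

```python
import re
from pathlib import Path

# File names taken from this README; the loader itself is hypothetical.
EXPECTED_FILES = ("prompt.py", "examples.py", "categories.yml", "grammar.gbnf")
LANG_CODE = re.compile(r"[a-z]{2}")  # assumed format, e.g. "en", "nl", "fr"


def language_dir(language: str, root: Path = Path("app/prompts")) -> Path:
    """Map a language code to its directory, rejecting anything that is not a plain code."""
    if not LANG_CODE.fullmatch(language):
        raise ValueError(f"unsupported language code: {language!r}")
    return root / language


def missing_files(language: str, root: Path = Path("app/prompts")) -> list[str]:
    """List the expected files that do not exist for this language."""
    directory = language_dir(language, root)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

Validating the code before building the path keeps a value such as `"../secrets"` from escaping the prompts directory.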
How to Add a New Language
Adding a new language (e.g., French `fr`) is straightforward:
- Create a new directory: `mkdir -p app/prompts/fr`
- Add translated files to that directory: `prompt.py` (system prompt), `examples.py` (few-shot examples), and `categories.yml` (list of categories).
- Run `make generate-grammars` from the `services/generalizer/` directory. This reads your `categories.yml` and builds the new `grammar.gbnf` file.
- Restart the service (`make restart`). You can now send requests with `"language": "fr"`.
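
To give a sense of what grammar generation involves, the sketch below turns a list of category names into a single GBNF rule that only admits those values as JSON strings. It illustrates the idea only: what `make generate-grammars` actually emits is not shown in this README, the function names are hypothetical, and the categories are passed in as a list rather than read from `categories.yml`.

```python
import json


def gbnf_literal(text: str) -> str:
    """Quote text as a GBNF string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def category_rule(categories: list[str], rule_name: str = "category") -> str:
    """Build one GBNF rule whose alternatives are the categories as JSON strings."""
    if not categories:
        raise ValueError("at least one category is required")
    # json.dumps adds the surrounding quotes, so each alternative matches a complete JSON string.
    alternatives = [gbnf_literal(json.dumps(name, ensure_ascii=False)) for name in categories]
    return f"{rule_name} ::= " + " | ".join(alternatives)


print(category_rule(["meubles", "éclairage"]))
# category ::= "\"meubles\"" | "\"éclairage\""
```

Because the model can only emit one of the listed alternatives for this field, a French request cannot produce a category outside the French list.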