ArgusFlow

Argus Product Matcher

A high-performance, secure-by-default microservice designed to find similar products using deep learning.

What is the Matcher?

The Argus Product Matcher is a microservice that uses a fine-tuned Sentence Transformer model to convert product information into vector embeddings. It searches those embeddings with a FAISS index, so matching products can be found among millions of records in milliseconds.
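
To make the embedding-plus-index idea concrete, here is a minimal pure-Python sketch of cosine-similarity top-k search. It is a toy stand-in, not the service's code: the three-dimensional vectors are made up for illustration, and the names `normalize`, `top_k`, and `toy_index` are hypothetical. In the real service the embeddings come from the Sentence Transformer model and the search is delegated to FAISS.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query, index, k):
    """Return the k (product_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (pid, sum(a * b for a, b in zip(q, normalize(vec))))
        for pid, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings keyed by product ID (illustrative values only).
toy_index = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.9],
}

matches = top_k([0.9, 0.1, 0.0], toy_index, k=2)
print(matches)
```

This brute-force loop compares the query against every vector, which is exactly the cost a dedicated index such as FAISS is meant to reduce at scale.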

Key Features

  • Vector-Based Matching:

    Uses a SentenceTransformer model to understand the semantic meaning of product data, not just keywords.

  • Blazing-Fast Search:

    Uses FAISS for efficient similarity search, capable of handling millions of vectors.

  • Fine-Tuning Pipeline:

    Includes a background pipeline to fine-tune the model using positive and hard-negative product pairs from your own database.

  • Secure by Default:

    All API endpoints (except `/health`) are protected by a mandatory `x-api-key` header.

  • Incremental Updates:

    Add or remove individual products from the live index using API endpoints without rebuilding everything.

  • Flexible Database Integration:

    Adapts to any SQL database schema by letting you define the data extraction queries in the configuration.
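
Because every endpoint except `/health` requires the `x-api-key` header, client code may want to build that header in one place. The following is a small sketch under stated assumptions: the helper `auth_headers` and the environment variable `MATCHER_API_KEY` are names invented for this example and are not defined by the service.

```python
import os

def auth_headers(api_key=None):
    """Return request headers carrying the API key.

    Falls back to the MATCHER_API_KEY environment variable, a name chosen
    for this sketch only; the service itself does not define it.
    """
    key = api_key or os.environ.get("MATCHER_API_KEY")
    if not key:
        raise ValueError("No API key supplied for a protected endpoint.")
    return {"Content-Type": "application/json", "x-api-key": key}

headers = auth_headers("default_dev_key")
print(headers)
```

The returned dictionary can be passed directly as the `headers` argument of an HTTP client call such as `requests.post`.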

How to Use

The Matcher requires a trained model and a search index before it can serve requests. See the "Quick Start" section below to run the demo setup. Once running, the service is available on port 8004.

Example 1: Match by Text

This is the most common use case: you provide unstructured text and the service returns the best matches for it in the index.

cURL:

curl -X POST "http://localhost:8004/api/v1/match/text" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "text": "Coca-Cola Classic",
  "top_k": 3
}'

PHP:

<?php
$ch = curl_init();

$url = 'http://localhost:8004/api/v1/match/text';
$payload = json_encode([
    'text' => 'Coca-Cola Classic',
    'top_k' => 3,
]);
$headers = [
    'Content-Type: application/json',
    'x-api-key: default_dev_key',
];

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);

Python:

import requests

api_url = "http://localhost:8004/api/v1/match/text"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "default_dev_key"
}
payload = {
    "text": "Coca-Cola Classic",
    "top_k": 3
}

response = requests.post(api_url, json=payload, headers=headers)
data = response.json()

print(data)

JavaScript:

const apiUrl = 'http://localhost:8004/api/v1/match/text';
const payload = {
  text: 'Coca-Cola Classic',
  top_k: 3
};

fetch(apiUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'default_dev_key'
  },
  body: JSON.stringify(payload)
})
.then(response => response.json())
.then(data => {
  console.log(data);
});

Example Output (Match by Text)

{
  "query": "Coca-Cola Classic",
  "matches": [
    [1, 0.9999],
    [2, 0.9750],
    [3, 0.6812]
  ]
}

Each entry in the `matches` array is a `[product_id, score]` pair. In this demo, product ID 1 is 'Coca-Cola Classic' and ID 2 is 'Coke Classic 1,5L Fles'; both are correctly identified as strong matches.
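
A client will often want to keep only the strong matches from such a response. The sketch below parses the example output shown above and filters it by score. The helper name `strong_matches` and the 0.9 cutoff are assumptions made for illustration; a suitable threshold depends on your data and model.

```python
import json

def strong_matches(response_body, min_score):
    """Return (product_id, score) pairs whose score is at least min_score."""
    data = json.loads(response_body)
    return [(pid, score) for pid, score in data["matches"] if score >= min_score]

# The example output from the text-match endpoint, as a raw JSON string.
sample = '{"query": "Coca-Cola Classic", "matches": [[1, 0.9999], [2, 0.9750], [3, 0.6812]]}'

print(strong_matches(sample, min_score=0.9))  # [(1, 0.9999), (2, 0.975)]
```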

Example 2: Match by Product ID

Use this endpoint to find items similar to a product that *already exists* in your database. You provide a `product_id`, and the service looks up that product's data to find matches.

cURL:

curl -X POST "http://localhost:8004/api/v1/match/id" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "product_id": 1,
  "top_k": 3
}'

PHP:

<?php
$ch = curl_init();

$url = 'http://localhost:8004/api/v1/match/id';
$payload = json_encode([
    'product_id' => 1,
    'top_k' => 3,
]);
$headers = [
    'Content-Type: application/json',
    'x-api-key: default_dev_key',
];

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);

Python:

import requests

api_url = "http://localhost:8004/api/v1/match/id"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "default_dev_key"
}
payload = {
    "product_id": 1,
    "top_k": 3
}

response = requests.post(api_url, json=payload, headers=headers)
data = response.json()

print(data)

JavaScript:

const apiUrl = 'http://localhost:8004/api/v1/match/id';
const payload = {
  product_id: 1,
  top_k: 3
};

fetch(apiUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'default_dev_key'
  },
  body: JSON.stringify(payload)
})
.then(response => response.json())
.then(data => {
  console.log(data);
});

Example Output (Match by ID)

{
  "query_id": 1,
  "matches": [
    [1, 0.9999],
    [2, 0.9750],
    [3, 0.6812]
  ]
}

The response echoes the `query_id` that was used. The matches are the same as in the text search because the service first fetched Product 1 ('Coca-Cola Classic') and then searched with that text.
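
In the example output, the queried product itself comes back as the top match (ID 1 with a score of 0.9999). When you only want other products, the client can drop that entry. A minimal sketch, assuming the response has already been decoded into a dictionary; `exclude_self` is a hypothetical helper name.

```python
def exclude_self(response):
    """Drop the queried product from the matches of an ID-match response."""
    query_id = response["query_id"]
    return [(pid, score) for pid, score in response["matches"] if pid != query_id]

# The example output from the ID-match endpoint, already decoded from JSON.
sample = {"query_id": 1, "matches": [[1, 0.9999], [2, 0.9750], [3, 0.6812]]}

print(exclude_self(sample))  # [(2, 0.975), (3, 0.6812)]
```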

Quick Start (Demo Environment)

This guide uses the included `docker-compose.dev.yml` to launch the service and a demo database populated with test data.

Run these commands from the `services/matcher/` directory:

  1. Create .env & Check DB:
    make setup

    This copies `.env.example` to `.env` and validates the (demo) database connection.

  2. Train the Model:
    make retrain

    This runs the full training pipeline using the demo data. It fine-tunes the model and builds the FAISS index, saving the files to `services/matcher/models/`.

  3. Start the Service:
    make up-dev

    This starts the service in development mode, which connects to the demo database.

The service is now running at `http://localhost:8004`. You can explore the full API at the Swagger UI: `http://localhost:8004/docs`.
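
Scripts that start the service and then call it right away may need to wait until it is ready. The sketch below polls the unprotected `/health` endpoint with a bounded number of retries. It assumes that a ready service answers `/health` with HTTP 200, which this document does not state; `http_probe` and `wait_until_healthy` are hypothetical names.

```python
import time
import urllib.error
import urllib.request

def http_probe(url="http://localhost:8004/health"):
    """Return True if the health endpoint answers with HTTP 200 (assumed ready signal)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(probe=http_probe, attempts=10, delay=1.0):
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False

# Demo with a fake probe that succeeds on the third call, so no server is needed.
calls = iter([False, False, True])
print(wait_until_healthy(probe=lambda: next(calls), delay=0))
```

Against a running instance, calling `wait_until_healthy()` with no arguments uses the real HTTP probe.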

Flexible Database Integration

The Matcher is designed to adapt to any SQL database schema. You configure this by providing the exact SQL queries in your configuration (e.g., in `.env` or `config.yml`).

The service relies on three key queries:

  • `DATABASE_QUERIES__TRAINING_PAIRS_QUERY`

    Fetches pairs of product IDs that are known matches (e.g., from a `product_matches` table) to be used as positive examples for fine-tuning.

  • `DATABASE_QUERIES__INDEXING_QUERY`

    Fetches all products that should be included in the FAISS search index. You must use `AS` to map your columns to the expected names (e.g., `product_id AS id`, `product_name AS title`).

  • `DATABASE_QUERIES__PRODUCT_BY_ID_QUERY`

    Fetches the full data for one or more products at runtime. This query must include the `:ids` placeholder.

The `demo.sql` file and the default queries in `docker-compose.dev.yml` provide a complete, working example of how the schema and queries work together.
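
To illustrate the column aliasing and the `:ids` placeholder without the demo stack, here is a sketch against an in-memory SQLite database. Everything in it is an assumption made for the example: the `catalog` table, its column names, the 'Fanta Orange' row, the `IN :ids` form of the query, and the `run_by_ids` helper. How the service itself binds `:ids` is not described here, so the helper expands the placeholder by hand.

```python
import sqlite3

# Hypothetical schema whose column names differ from the names the Matcher expects.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE catalog (product_id INTEGER PRIMARY KEY, product_name TEXT)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(1, "Coca-Cola Classic"), (2, "Coke Classic 1,5L Fles"), (3, "Fanta Orange")],
)

# Indexing query: AS maps the schema's columns onto the expected names.
INDEXING_QUERY = "SELECT product_id AS id, product_name AS title FROM catalog"

# Product-by-ID query containing the required :ids placeholder.
PRODUCT_BY_ID_QUERY = (
    "SELECT product_id AS id, product_name AS title "
    "FROM catalog WHERE product_id IN :ids"
)

def run_by_ids(connection, query, ids):
    """Expand :ids into one named parameter per ID, since sqlite3 cannot bind a list."""
    names = [f"id_{i}" for i in range(len(ids))]
    placeholders = "(" + ", ".join(f":{name}" for name in names) + ")"
    expanded = query.replace(":ids", placeholders)
    return connection.execute(expanded, dict(zip(names, ids))).fetchall()

indexed = [dict(row) for row in conn.execute(INDEXING_QUERY)]
fetched = [dict(row) for row in run_by_ids(conn, PRODUCT_BY_ID_QUERY, [1, 2])]
print(indexed)
print(fetched)
```

Both result sets expose the keys `id` and `title` regardless of the underlying column names, which is the effect the `AS` aliases are meant to achieve.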