ArgusFlow

Argus Smart Product Linker

Automatically link identical products across datasets. Our AI understands that "iPhone 15" and "Apple iPh 15" are the same device, even when the titles differ.

bash - argus-matcher
Response (200 OK):
{
  "matches": [
    {"id": 4021, "score": 0.992, "name": "Sony WH-1000XM5B - Black"},
    {"id": 4022, "score": 0.985, "name": "Sony Wireless NC Headphones XM5"},
    {"id": 9910, "score": 0.720, "name": "Sony LinkBuds S"}
  ]
}
Process completed in 42ms

Matches That Keywords Miss

Traditional keyword-based search often fails when product names contain typos, different word orders, or abbreviations. The Argus Matcher uses deep learning (Sentence Transformers) to capture the semantic meaning of your data. It converts product information into vector embeddings, so matches are found by comparing meaning rather than exact characters.
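
To make the idea of comparing embeddings concrete, here is a minimal sketch of a similarity search over vectors. It is a toy stand-in, not the Matcher's implementation: the three-dimensional vectors are invented for illustration (real Sentence Transformers embeddings have hundreds of dimensions), the brute-force loop stands in for the FAISS index, and the names `cosine_similarity` and `top_k_matches` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, index, k):
    """Score every indexed vector against the query and keep the k best."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings keyed by product ID (assumption, for illustration only).
index = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # pretend this is the embedded query text

results = top_k_matches(query, index, k=2)
print(results)
```

Products 1 and 2 point in nearly the same direction as the query and rank first, while product 3 scores close to zero. A real index avoids scoring every vector one by one, which is the job FAISS takes over at scale.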

Why It’s a Game Changer

  • Vector-Based Intelligence: Uses AI models to identify products that are conceptually identical but textually different.
  • Blazing-Fast Search: Uses FAISS (Facebook AI Similarity Search) to search an index of millions of products in milliseconds.
  • Self-Improving Pipeline: Includes a background process that fine-tunes the model on your own database to improve precision.
  • Secure & Private: Runs locally on your own hardware. Your proprietary product data never leaves your infrastructure.

API Usage: Match by Text

Provide unstructured text (like a name or description) and find the best matches currently in your search index.

cURL:

curl -X POST "http://localhost:8004/api/v1/match/text" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "text": "Coca-Cola Classic",
  "top_k": 3
}'

PHP:

$ch = curl_init('http://localhost:8004/api/v1/match/text');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json',
    'x-api-key: default_dev_key'
]);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
    'text' => 'Coca-Cola Classic',
    'top_k' => 3
]));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = json_decode(curl_exec($ch), true);

Python:

import requests

response = requests.post(
    "http://localhost:8004/api/v1/match/text",
    headers={"x-api-key": "default_dev_key"},
    json={"text": "Coca-Cola Classic", "top_k": 3}
)
print(response.json())

JavaScript:

fetch('http://localhost:8004/api/v1/match/text', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'default_dev_key'
  },
  body: JSON.stringify({
    text: 'Coca-Cola Classic',
    top_k: 3
  })
})
.then(response => response.json())
.then(data => console.log(data));

Response:

{
  "query": "Coca-Cola Classic",
  "matches": [
    [1, 0.9999],
    [2, 0.9750],
    [3, 0.6812]
  ]
}

Each entry in the 'matches' array is a [product_id, score] pair. In this example, ID 2 might be 'Coke Classic 1,5L Fles', which is correctly identified as a near-perfect match even though the wording differs.
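
Because the response carries raw scores, callers usually need to decide which matches to trust. The sketch below unpacks the [product_id, score] pairs and keeps those above a cutoff. The function name `confident_matches` and the 0.9 threshold are assumptions chosen for this example, not values recommended by the service.

```python
def confident_matches(response, min_score=0.9):
    """Return (product_id, score) pairs whose score is at least min_score."""
    return [
        (product_id, score)
        for product_id, score in response["matches"]
        if score >= min_score
    ]


# The sample response shown above, as a Python dict.
sample = {
    "query": "Coca-Cola Classic",
    "matches": [[1, 0.9999], [2, 0.9750], [3, 0.6812]],
}

kept = confident_matches(sample)
print(kept)
```

With the sample data, products 1 and 2 pass the cutoff and product 3 is dropped. A suitable threshold depends on your catalogue, so it is worth checking a handful of borderline scores by hand before automating any linking.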

API Usage: Match by ID

Find similar items for a product that already exists in your database. The service fetches the product details and performs a similarity search.

curl -X POST "http://localhost:8004/api/v1/match/id" \
-H "Content-Type: application/json" \
-H "x-api-key: default_dev_key" \
-d '{
  "product_id": 1,
  "top_k": 3
}'
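
The same call can be sketched in Python using only the standard library. The URL, header names, and payload fields are taken from the curl command above; the helper names `build_match_by_id_request` and `match_by_id` and the 5-second timeout are hypothetical. The response format of this endpoint is not shown here, so the sketch simply returns the decoded JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8004"  # demo port from the examples above
API_KEY = "default_dev_key"         # development key from the examples above


def build_match_by_id_request(product_id, top_k=3):
    """Build (but do not send) the POST request for /api/v1/match/id."""
    payload = json.dumps({"product_id": product_id, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/match/id",
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


def match_by_id(product_id, top_k=3, timeout=5.0):
    """Send the request and return the decoded JSON body."""
    request = build_match_by_id_request(product_id, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running service, so it is safe to inspect.
request = build_match_by_id_request(1)
print(request.get_method(), request.full_url)

# With the service running, the call itself would be: match_by_id(1)
```

Separating request construction from sending keeps the payload easy to check in unit tests without a live service.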

Quick Start (Demo Environment)

Run these commands from the `services/matcher/` directory to launch a demo populated with test data:

  1. Set up DB:
    make setup
  2. Train AI:
    make retrain
    (Fine-tunes the model and builds the index)
  3. Start Service:
    make up-dev
    (Available on port 8004)

Flexible Database Integration

The Matcher can work with your existing SQL schema. You tell it how to read your data by setting three queries as environment variables:

  • `DATABASE_QUERIES__TRAINING_PAIRS_QUERY`

    Fetches pairs of product IDs that are known matches (e.g., from a `product_matches` table) to be used as positive examples for fine-tuning.

  • `DATABASE_QUERIES__INDEXING_QUERY`

    Fetches all products that should be included in the FAISS search index. You must use `AS` to map your columns to the expected names (e.g., `product_id AS id`, `product_name AS title`).

  • `DATABASE_QUERIES__PRODUCT_BY_ID_QUERY`

    Fetches the full data for one or more products at runtime. This query must include the `:ids` placeholder.

The `demo.sql` file and the default queries in `docker-compose.dev.yml` provide a complete, working example of how the schema and queries work together.
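
To illustrate how such queries relate to a schema, the sketch below runs an indexing query and a training-pairs query against an in-memory SQLite database. Everything about the schema is hypothetical (the `products` and `product_matches` tables and their column names), SQLite is used only because it ships with Python, and the column names expected for training pairs are an assumption. The product-by-ID query is left out because how the service binds `:ids` is not described here; the defaults in `docker-compose.dev.yml` remain the authoritative reference.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_matches (product_a INTEGER, product_b INTEGER);
    INSERT INTO products VALUES
        (1, 'Coca-Cola Classic'),
        (2, 'Coke Classic 1,5L Fles'),
        (3, 'Fanta Orange');
    INSERT INTO product_matches VALUES (1, 2);
    """
)

# AS maps the schema's own column names onto the names the index expects.
INDEXING_QUERY = (
    "SELECT product_id AS id, product_name AS title "
    "FROM products ORDER BY product_id"
)

# Pairs of product IDs that are known to be the same product.
TRAINING_PAIRS_QUERY = "SELECT product_a, product_b FROM product_matches"

indexed = [dict(row) for row in conn.execute(INDEXING_QUERY)]
pairs = [tuple(row) for row in conn.execute(TRAINING_PAIRS_QUERY)]

print(indexed)
print(pairs)
```

The point of the aliases is that the rows come back keyed as `id` and `title` regardless of what the underlying columns are called, so the schema itself does not need to change.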