Argus Product Matcher
A high-performance, secure-by-default microservice designed to find similar products using deep learning.
What is the Matcher?
The Argus Product Matcher is a microservice that uses a fine-tuned Sentence Transformer model to convert product information into vector embeddings. It searches those embeddings with a FAISS index, which lets it find matching products among millions of records in milliseconds.
Key Features
Vector-Based Matching:
Uses a SentenceTransformer model to understand the semantic meaning of product data, not just keywords.
Blazing-Fast Search:
Implements FAISS for efficient similarity search, capable of handling millions of vectors.
Fine-Tuning Pipeline:
Includes a background pipeline to fine-tune the model using positive and hard-negative product pairs from your own database.
Secure by Default:
All API endpoints (except `/health`) are protected by a mandatory `x-api-key` header.
Incremental Updates:
Add or remove individual products from the live index through API endpoints, without rebuilding the entire index.
Flexible Database Integration:
Adapts to any SQL database schema by letting you define the data extraction queries in the configuration.
How to Use
The Matcher requires a trained model and a search index before it can serve requests. See the "Quick Start" section below to run the demo setup. Once running, the service is available on port 8004.
Example 1: Match by Text
This is the most common use case: providing unstructured text and finding the best matches for it in the index.
```bash
curl -X POST "http://localhost:8004/api/v1/match/text" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "text": "Coca-Cola Classic",
    "top_k": 3
  }'
```
Example Output (Match by Text)
```json
{
  "query": "Coca-Cola Classic",
  "matches": [
    [1, 0.9999],
    [2, 0.9750],
    [3, 0.6812]
  ]
}
```
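The curl example embeds the query text directly in the JSON body. When the text comes from a shell variable instead, a product name containing a double quote or backslash (such as `Pizza 12"`) would produce invalid JSON. The following is a minimal sketch assuming a POSIX shell with `sed`; `match_text`, `json_escape`, and the `MATCHER_API_KEY` variable are hypothetical names, while the URL, headers, and demo key are taken from the example.

```shell
#!/usr/bin/env bash
# Sketch: send a match-by-text request whose text comes from a shell variable.
# match_text, json_escape and MATCHER_API_KEY are hypothetical names for this example.

# Escape backslashes first, then double quotes, so the text is a valid JSON string.
# Limitation: newlines, tabs and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: match_text "TEXT" [TOP_K]   (TOP_K defaults to 3)
match_text() {
  local text="$1"
  local top_k="${2:-3}"
  # Falls back to the demo key when MATCHER_API_KEY is unset; set it outside the demo.
  curl -sS -X POST "http://localhost:8004/api/v1/match/text" \
    -H "Content-Type: application/json" \
    -H "x-api-key: ${MATCHER_API_KEY:-default_dev_key}" \
    -d "{\"text\": \"$(json_escape "$text")\", \"top_k\": ${top_k}}"
}
```

For example, `match_text 'Coca-Cola Classic' 5` sends the same request as the curl example with `top_k` set to 5. The escaping covers only quotes and backslashes; if `jq` is available, `jq -n --arg text "$text" --argjson top_k 3 '{text: $text, top_k: $top_k}'` builds the body with full JSON escaping.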
Example 2: Match by Product ID
Use this endpoint to find items similar to a product that *already exists* in your database. You provide a `product_id`, and the service looks up that product's data and uses it to find matches.
```bash
curl -X POST "http://localhost:8004/api/v1/match/id" \
  -H "Content-Type: application/json" \
  -H "x-api-key: default_dev_key" \
  -d '{
    "product_id": 1,
    "top_k": 3
  }'
```
Example Output (Match by ID)
```json
{
  "query_id": 1,
  "matches": [
    [1, 0.9999],
    [2, 0.9750],
    [3, 0.6812]
  ]
}
```
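Both sample outputs list matches as two-element arrays, which read as `[product_id, score]` pairs: in the match-by-ID example, product 1 is returned as its own best match with a score of 0.9999. The sketch below assumes that layout and prints one `product_id score` line per match, optionally skipping the queried product. `parse_matches` is a hypothetical helper, not part of the service, and its `grep`/`sed` parsing relies on the pairs containing only numbers.

```shell
#!/usr/bin/env bash
# Sketch: print one "product_id score" line per match, optionally skipping one product ID.
# Assumes "matches" holds [product_id, score] pairs of plain numbers, as the sample
# outputs suggest. parse_matches is a hypothetical helper, not part of the service.
# Usage: parse_matches [ID_TO_SKIP] < response.json
parse_matches() {
  local skip_id="${1:-}"
  # Pull out each [id, score] pair, strip brackets and spaces, then split on the comma.
  grep -oE '\[[0-9]+, *[-0-9.eE+]+]' |
    sed -E 's/[][ ]//g; s/,/ /' |
    awk -v skip="$skip_id" 'skip == "" || $1 != skip'
}
```

For example, request `"top_k": 4` from `/api/v1/match/id` and pipe the response through `parse_matches 1` to keep three neighbours other than product 1. Where `jq` is installed, `jq -r '.matches[] | "\(.[0]) \(.[1])"'` is a sturdier way to extract the same pairs.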
Quick Start (Demo Environment)
This guide uses the included `docker-compose.dev.yml` to launch the service and a demo database populated with test data.
Run these commands from the `services/matcher/` directory:
- Create `.env` & Check DB: run `make setup`. This copies `.env.example` to `.env` and validates the (demo) database connection.
- Train the Model: run `make retrain`. This runs the full training pipeline using the demo data. It fine-tunes the model and builds the FAISS index, saving the files to `services/matcher/models/`.
- Start the Service: run `make up-dev`. This starts the service in development mode, which connects to the demo database.
The service is now running at `http://localhost:8004`. You can explore the full API at the Swagger UI: `http://localhost:8004/docs`.
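Requests sent the moment `make up-dev` is launched may fail while the model and index are still loading. The sketch below polls `/health`, which needs no API key, until it answers. It assumes `/health` returns a success status once the service is ready (the document only states that it is exempt from the key check); `wait_for_matcher`, the 2-second interval, and the default of 30 attempts are hypothetical choices. If `make up-dev` keeps its terminal busy, run this from a second terminal.

```shell
#!/usr/bin/env bash
# Sketch: poll the unauthenticated /health endpoint until the Matcher responds.
# wait_for_matcher is a hypothetical helper; interval and retry count are arbitrary.
# Usage: wait_for_matcher [HEALTH_URL] [MAX_ATTEMPTS]
wait_for_matcher() {
  local url="${1:-http://localhost:8004/health}"
  local tries="${2:-30}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl return non-zero on HTTP error statuses, not just connection failures.
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "Matcher is up after ${i} attempt(s)"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "Matcher did not become healthy at ${url}" >&2
  return 1
}
```

A typical sequence is `wait_for_matcher && curl ...`, so the first match request is only sent once the health check passes.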
Flexible Database Integration
The Matcher is designed to adapt to any SQL database schema. You configure this by providing the exact SQL queries in your configuration (e.g., in `.env` or `config.yml`).
The service relies on three key queries:
- `DATABASE_QUERIES__TRAINING_PAIRS_QUERY`: Fetches pairs of product IDs that are known matches (e.g., from a `product_matches` table) to be used as positive examples for fine-tuning.
- `DATABASE_QUERIES__INDEXING_QUERY`: Fetches all products that should be included in the FAISS search index. You must use `AS` to map your columns to the expected names (e.g., `product_id AS id`, `product_name AS title`).
- `DATABASE_QUERIES__PRODUCT_BY_ID_QUERY`: Fetches the full data for one or more products at runtime. This query must include the `:ids` placeholder.
The `demo.sql` file and the default queries in `docker-compose.dev.yml` provide a complete, working example of how the schema and queries work together.
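As a sketch of what these settings might look like in `.env`, assume a hypothetical schema with a `products(product_id, product_name)` table and a `product_matches(product_a_id, product_b_id)` table. The `id`/`title` aliases follow the examples given for the indexing query; the column names returned by the training-pairs query, the `IN :ids` form, and the product-by-ID query returning the same columns as the indexing query are all assumptions, so compare against the defaults in `docker-compose.dev.yml` before relying on them.

```shell
# Hypothetical .env entries for products(product_id, product_name) and
# product_matches(product_a_id, product_b_id). Training-pair column names and the
# "IN :ids" form are assumptions; check the defaults in docker-compose.dev.yml.
DATABASE_QUERIES__TRAINING_PAIRS_QUERY="SELECT product_a_id, product_b_id FROM product_matches"
# Aliases map this schema onto the id/title names used in the indexing-query examples.
DATABASE_QUERIES__INDEXING_QUERY="SELECT product_id AS id, product_name AS title FROM products"
# Assumed to return the same columns as the indexing query; :ids is filled in at runtime.
DATABASE_QUERIES__PRODUCT_BY_ID_QUERY="SELECT product_id AS id, product_name AS title FROM products WHERE product_id IN :ids"
```

Keeping the aliases identical in the indexing and product-by-ID queries means a product looked up at runtime is described by the same fields that were embedded when the index was built.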