Contributing to pg_vectorize
Prerequisites
- Rust - Toolchain including rustc, cargo, and rustfmt
- PGRX - Rust-based PostgreSQL extension development framework
- Docker Engine - For running local containers
- psql - Terminal-based front-end to PostgreSQL
- pgmq - PostgreSQL extension for message queues
- pg_cron - PostgreSQL extension for cron-based job scheduling
- pgvector - PostgreSQL extension for vector similarity search
Building from source
This process is more involved, but can easily be distilled down into a handful of steps.
1. Set up pgrx
cargo pgrx init
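By default, cargo pgrx init downloads and builds every Postgres version that pgrx supports. Since the rest of this guide uses Postgres 15, you can limit initialization to that version (a sketch, assuming a pgrx release that accepts per-version flags):
cargo pgrx init --pg15 download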
2. Set up Docker container
docker run -d -p 3000:3000 quay.io/tembo/vector-serve:latest
Confirm a successful setup by running the following:
docker ps
:wrench: Note: Consider running the following to see the container logs in real time:
docker logs <your-container-id> -f
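You can also confirm the embedding service itself is reachable before moving on. One simple check is to request the service's docs page (visited again in step 5) and look for an HTTP 200 response once the container has finished starting:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/docs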
3. Clone and compile pg_vectorize and extension dependencies
:wrench: When progressing through these steps, refer to the following for troubleshooting:
cat ~/.pgrx/15.log
3.1. Apply configurations
Prior to compiling and running pg_vectorize, it's essential to update the postgresql.conf file.
pgrx uses a Postgres version-specific data directory, each containing its own postgresql.conf file.
The following example utilizes Postgres version 15. If you're using a different version, please alter the file path value data-<postgres-version> and run the following:
<your-editor> ~/.pgrx/data-15/postgresql.conf
Within this file, add the following:
shared_preload_libraries = 'pg_cron, vectorize'
cron.database_name = 'postgres'
vectorize.embedding_service_url = 'http://localhost:3000/v1/embeddings'
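If you'd rather apply these settings without opening an editor, they can be appended from the shell (a sketch; adjust data-15 to your Postgres version and make sure these keys aren't already defined elsewhere in the file):
cat <<'EOF' >> ~/.pgrx/data-15/postgresql.conf
shared_preload_libraries = 'pg_cron, vectorize'
cron.database_name = 'postgres'
vectorize.embedding_service_url = 'http://localhost:3000/v1/embeddings'
EOF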
:wrench: Note: If your machine is running macOS, you may need to apply the following configurations to Cargo's config file:
<your-editor> ~/.cargo/config
[target.'cfg(target_os="macos")']
# Postgres symbols won't be available until runtime
rustflags = ["-Clink-arg=-Wl,-undefined,dynamic_lookup"]
3.2. Clone and enter directory
git clone https://github.com/tembo-io/pg_vectorize.git
cd pg_vectorize
3.3. Install dependencies
From within the pg_vectorize directory, run the following, which will install pg_cron, pgmq, and pgvector:
make setup
3.4. Compile and run pg_vectorize
make run
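make run compiles the extension and connects you to the pgrx-managed Postgres instance via psql. If you later exit that session, you can rebuild and reconnect at any time; assuming the project's Makefile wraps pgrx (worth confirming in the Makefile itself), the equivalent direct command looks roughly like:
cargo pgrx run pg15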
4. Confirm successful build
4.1. Check extension presence
Once the above command is run, you will be brought into Postgres via psql.
To list out the enabled extensions, run:
\dx
List of installed extensions
Name | Version | Schema | Description
------------+---------+------------+---------------------------------------------------------------------
pg_cron | 1.6 | pg_catalog | Job scheduler for PostgreSQL
pgmq | 1.1.1 | pgmq | A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vector | 0.6.0 | public | vector data type and ivfflat and hnsw access methods
vectorize | 0.10.1 | vectorize | The simplest way to do vector search on Postgres
(5 rows)
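If vectorize is missing from the list, it can usually be created by hand from the same psql session. This is a sketch; the CASCADE option assumes the extension's control file declares its dependencies, so you may need to create pg_cron, pgmq, and vector individually first:
CREATE EXTENSION IF NOT EXISTS vectorize CASCADE;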
4.2. Confirm embedding service URL is set to localhost
In section 3.1., we set the following postgresql.conf variable:
vectorize.embedding_service_url = 'http://localhost:3000/v1/embeddings'
To confirm its success, run the following SHOW command:
SHOW vectorize.embedding_service_url;
vectorize.embedding_service_url
-------------------------------------
http://localhost:3000/v1/embeddings
(1 row)
Say, for example, that the value had instead been set to http://vector-serve:3000/v1/embeddings. Should you want to change it back to localhost from within Postgres, simply run:
ALTER SYSTEM SET vectorize.embedding_service_url TO 'http://localhost:3000/v1/embeddings';
Changes made with ALTER SYSTEM only take effect after the configuration is reloaded:
SELECT pg_reload_conf();
Running the earlier SHOW command should reveal the appropriate change:
SHOW vectorize.embedding_service_url;
vectorize.embedding_service_url
-------------------------------------
http://localhost:3000/v1/embeddings
(1 row)
4.3. Load example data
The following can be found within this project's README, under Hugging Face Example.
Begin by creating a products table with the dataset that comes included with pg_vectorize.
CREATE TABLE products (LIKE vectorize.example_products INCLUDING ALL);
INSERT INTO products SELECT * FROM vectorize.example_products;
You can then confirm everything is correct by running the following:
SELECT * FROM products LIMIT 2;
product_id | product_name | description | last_updated_at
------------+--------------+--------------------------------------------------------+-------------------------------
1 | Pencil | Utensil used for writing and often works best on paper | 2023-07-26 17:20:43.639351-05
2 | Laptop Stand | Elevated platform for laptops, enhancing ergonomics | 2023-07-26 17:20:43.639351-05
4.4. Sample queries
First, create a vectorize job over the products table; then query it with vectorize.search:
SELECT vectorize.table(
job_name => 'product_search_hf',
"table" => 'products',
primary_key => 'product_id',
columns => ARRAY['product_name', 'description'],
transformer => 'sentence-transformers/multi-qa-MiniLM-L6-dot-v1'
);
table
---------------------------------------------
Successfully created job: product_search_hf
(1 row)
SELECT * FROM vectorize.search(
job_name => 'product_search_hf',
query => 'accessories for mobile devices',
return_columns => ARRAY['product_id', 'product_name'],
num_results => 3
);
search_results
---------------------------------------------------------------------------------------------
{"product_id": 13, "product_name": "Phone Charger", "similarity_score": 0.8147812194590133}
{"product_id": 6, "product_name": "Backpack", "similarity_score": 0.774306211384604}
{"product_id": 11, "product_name": "Stylus Pen", "similarity_score": 0.7709903789778251}
(3 rows)
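The same job can be reused for any natural-language query. For example (a hypothetical query; the rows and similarity scores returned will differ):
SELECT * FROM vectorize.search(
    job_name => 'product_search_hf',
    query => 'something to write with',
    return_columns => ARRAY['product_id', 'product_name', 'description'],
    num_results => 5
);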
5. Local URL
Once all of the above is complete, you should be able to visit the Tembo-Embedding-Service at http://localhost:3000/docs and explore.
This service lets you, for example, try out different sentence-transformers models from Hugging Face.
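You can also call the embedding endpoint directly from the shell. The request body below is only an assumption of an OpenAI-style embeddings schema, so treat it as a placeholder and consult http://localhost:3000/docs for the exact fields the service expects:
curl -s http://localhost:3000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"input": ["some text to embed"], "model": "sentence-transformers/multi-qa-MiniLM-L6-dot-v1"}'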
Releases
pg_vectorize releases are automated through a GitHub workflow.
The compiled binaries are published to and hosted at pgt.dev.
To create a release, create a new tag following valid semver, then create a release with the same name.
Auto-generate the release notes and/or add more relevant details as needed.
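For example, cutting a release might look like the following; the version number is purely illustrative, and the release itself can be created either from the GitHub UI or with the gh CLI:
git tag v1.2.3
git push origin v1.2.3
gh release create v1.2.3 --generate-notes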