### Extensions

- vector 0.2.3
- Open-source vector similarity search for Postgres

### Documentation

- CHANGELOG
- CHANGELOG

### README

### Contents

# pgvector

Open-source vector similarity search for Postgres

```
sql
CREATE TABLE table (column vector(3));
CREATE INDEX ON table USING ivfflat (column vector_l2_ops);
SELECT * FROM table ORDER BY column <-> '[1,2,3]' LIMIT 5;
```

Supports L2 distance, inner product, and cosine distance

## Installation

Compile and install the extension (supports Postgres 9.6+)

```
sh
git clone --branch v0.2.3 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo
```

Then load it in databases where you want to use it

```
sql
CREATE EXTENSION vector;
```

You can also install it with Docker, Homebrew, or PGXN

## Getting Started

Create a vector column with 3 dimensions (replace `table`

and `column`

with non-reserved names)

```
sql
CREATE TABLE table (column vector(3));
```

Insert values

```
sql
INSERT INTO table VALUES ('[1,2,3]'), ('[4,5,6]');
```

Get the nearest neighbor by L2 distance

```
sql
SELECT * FROM table ORDER BY column <-> '[3,1,2]' LIMIT 1;
```

Also supports inner product (`<#>`

) and cosine distance (`<=>`

)

Note: `<#>`

returns the negative inner product since Postgres only supports `ASC`

order index scans on operators

## Indexing

Speed up queries with an approximate index. Add an index for each distance function you want to use.

L2 distance

```
sql
CREATE INDEX ON table USING ivfflat (column vector_l2_ops);
```

Inner product

```
sql
CREATE INDEX ON table USING ivfflat (column vector_ip_ops);
```

Cosine distance

```
sql
CREATE INDEX ON table USING ivfflat (column vector_cosine_ops);
```

Indexes should be created after the table has data for optimal clustering. If the distribution of data changes significantly, you can reindex without downtime:

```sql -- Postgres 12+ REINDEX INDEX CONCURRENTLY index_name;

-- Postgres < 12 CREATE INDEX CONCURRENTLY temp_name ON table USING ivfflat (column opclass); DROP INDEX CONCURRENTLY index_name; ALTER INDEX temp_name RENAME TO index_name; ```

Also, unlike typical indexes which only affect performance, you may see different results for queries after adding an approximate index.

### Index Options

Specify the number of inverted lists (100 by default)

```
sql
CREATE INDEX ON table USING ivfflat (column opclass) WITH (lists = 100);
```

A good place to start is `4 * sqrt(rows)`

### Query Options

Specify the number of probes (1 by default)

```
sql
SET ivfflat.probes = 1;
```

A higher value improves recall at the cost of speed.

Use `SET LOCAL`

inside a transaction to set it for a single query

```
sql
BEGIN;
SET LOCAL ivfflat.probes = 1;
SELECT ...
COMMIT;
```

### Indexing Progress [unreleased]

Check indexing progress with Postgres 12+

```
sql
SELECT phase, tuples_done, tuples_total FROM pg_stat_progress_create_index;
```

The phases are:

`initializing`

`sampling table`

`performing k-means`

`sorting tuples`

`loading tuples`

Note: `tuples_done`

and `tuples_total`

are only populated during the `loading tuples`

phase

### Partial Indexes

Consider partial indexes for queries with a `WHERE`

clause

```
sql
CREATE INDEX ON table USING ivfflat (column opclass) WHERE (other_column = 123);
```

To index many different values of `other_column`

, consider partitioning on `other_column`

.

## Performance

To speed up queries without an index, increase `max_parallel_workers_per_gather`

.

```
sql
SET max_parallel_workers_per_gather = 4;
```

To speed up queries with an index, increase the number of inverted lists (at the expense of recall).

```
sql
CREATE INDEX ON table USING ivfflat (column opclass) WITH (lists = 1000);
```

## Reference

### Vector Type

Each vector takes `4 * dimensions + 8`

bytes of storage. Each element is a float, and all elements must be finite (no `NaN`

, `Infinity`

or `-Infinity`

). Vectors can have up to 1024 dimensions.

### Vector Operators

Operator | Description --- | --- + | element-wise addition - | element-wise subtraction <-> | Euclidean distance <#> | negative inner product <=> | cosine distance

### Vector Functions

Function | Description --- | --- cosine_distance(vector, vector) | cosine distance inner_product(vector, vector) | inner product l2_distance(vector, vector) | Euclidean distance vector_dims(vector) | number of dimensions vector_norm(vector) | Euclidean norm

## Libraries

Libraries that use pgvector:

- pgvector-python (Python)
- Neighbor (Ruby)
- pgvector-node (Node.js)
- pgvector-go (Go)
- pgvector-rust (Rust)
- pgvector-cpp (C++)

## Frequently Asked Questions

#### How many vectors can be stored in a single table?

A non-partitioned table has a limit of 32 TB by default in Postgres. A partitioned table can have thousands of partitions of that size.

#### Is replication supported?

Yes, pgvector uses the write-ahead log (WAL), which allows for replication and point-in-time recovery.

#### What if my data has more than 1024 dimensions?

Two things you can try are:

- use dimensionality reduction
- compile Postgres with a larger block size (
`./configure --with-blocksize=32`

) and edit the limit in`src/vector.h`

## Additional Installation Methods

### Docker

Get the Docker image with:

```
sh
docker pull ankane/pgvector
```

This adds pgvector to the Postgres image.

You can also build the image manually

```
sh
git clone --branch v0.2.3 https://github.com/pgvector/pgvector.git
cd pgvector
docker build -t pgvector .
```

### Homebrew

On Mac with Homebrew Postgres, you can use:

```
sh
brew install pgvector/brew/pgvector
```

### PGXN

Install from the PostgreSQL Extension Network with:

```
sh
pgxn install vector
```

## Hosted Postgres

Some Postgres providers only support specific extensions. To request a new extension:

- Amazon RDS - follow the instructions on this page
- Google Cloud SQL - follow the instructions on this page
- DigitalOcean Managed Databases - vote or comment on this page
- Azure Database for PostgreSQL - follow the instructions on this page

## Upgrading

Install the latest version and run:

```
sql
ALTER EXTENSION vector UPDATE;
```

## Thanks

Thanks to:

- PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension
- Faiss: A Library for Efficient Similarity Search and Clustering of Dense Vectors
- Using the Triangle Inequality to Accelerate k-means
- k-means++: The Advantage of Careful Seeding
- Concept Decompositions for Large Sparse Text Data using Clustering

## History

View the changelog

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```
sh
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
make install
```

To run all tests:

```
sh
make installcheck # regression tests
make prove_installcheck # TAP tests
```

To run single tests:

```
sh
make installcheck REGRESS=functions # regression test
make prove_installcheck PROVE_TESTS=test/t/001_wal.pl # TAP test
```

Resources for contributors