Exact and approximate vector search
The problem of finding the N closest points in a multi-dimensional (vector) space for a given point is known as nearest neighbor search or, in short: vector search. Two general approaches exist for solving vector search:
- Exact vector search calculates the distance between the given point and all points in the vector space. This ensures the best possible accuracy, i.e. the returned points are guaranteed to be the actual nearest neighbors. Since the vector space is explored exhaustively, exact vector search can be too slow for real-world use.
- Approximate vector search refers to a group of techniques (e.g., special data structures like graphs and random forests) which compute results much faster than exact vector search. The result accuracy is typically "good enough" for practical use. Many approximate techniques provide parameters to tune the trade-off between the result accuracy and the search time.
A vector search (exact or approximate) can be written in SQL as follows:
The points in the vector space are stored in a column vectors of array type, e.g. Array(Float64), Array(Float32), or Array(BFloat16).
The reference vector is a constant array and given as a common table expression.
<DistanceFunction> computes the distance between the reference point and all stored points.
Any of the available distance function can be used for that.
<N> specifies how many neighbors should be returned.
Exact vector search
An exact vector search can be performed using above SELECT query as is. The runtime of such queries is generally proportional to the number of stored vectors and their dimension, i.e. the number of array elements. Also, since ClickHouse performs a brute-force scan of all vectors, the runtime depends also on the number of threads by the query (see setting max_threads).
Example
returns
Approximate vector search
Vector Similarity Indexes
ClickHouse provides a special "vector similarity" index to perform approximate vector search.
Vector similarity indexes are available in ClickHouse version 25.8 and higher. If you run into problems, kindly open an issue in the ClickHouse repository.
Creating a Vector Similarity Index
A vector similarity index can be created on a new table like this:
Alternatively, to add a vector similarity index to an existing table:
Vector similarity indexes are special kinds of skipping indexes (see here and here).
Accordingly, above ALTER TABLE statement only causes the index to be build for future new data inserted into the table.
To build the index for existing data as well, you need to materialize it:
Function <distance_function> must be
- L2Distance, the Euclidean distance, representing the length of a line between two points in Euclidean space, or
- cosineDistance, the cosine distance, representing the angle between two non-zero vectors.
For normalized data, L2Distance is usually the best choice, otherwise cosineDistance is recommended to compensate for scale.
<dimensions> specifies the array cardinality (number of elements) in the underlying column.
If ClickHouse finds an array with a different cardinality during index creation, the index is discarded and an error is returned.
The optional GRANULARITY parameter <N> refers to the size of the index granules (see here).
The default value of 100 million should work reasonably well for most use cases but it can also be tuned.
We recommend tuning only for advanced users who understand the implications of what they are doing (see below).
Vector similarity indexes are generic in the sense that they can accommodate different approximate search method.
The actually used method is specified by parameter <type>.
As of now, the only available method is HNSW (academic paper), a popular and state-of-the-art technique for approximate vector search based on hierarchical proximity graphs.
If HNSW is used as type, users may optionally specify further HNSW-specific parameters:
These HNSW-specific parameters are available:
- <quantization>controls the quantization of the vectors in the proximity graph. Possible values are- f64,- f32,- f16,- bf16,- i8, or- b1. The default value is- bf16. Note that this parameter does not affect the representation of the vectors in the underlying column.
- <hnsw_max_connections_per_layer>controls the number of neighbors per graph node, also known as HNSW hyperparameter- M. The default value is- 32. Value- 0means using the default value.
- <hnsw_candidate_list_size_for_construction>controls the size of the dynamic candidate list during construction of the HNSW graph, also known as HNSW hyperparameter- ef_construction. The default value is- 128. Value- 0means using the default value.
The default values of all HNSW-specific parameters work reasonably well in the majority of use cases. We therefore do not recommend customizing the HNSW-specific parameters.
Further restrictions apply:
- Vector similarity indexes can only be build on columns of type Array(Float32), Array(Float64), or Array(BFloat16). Arrays of nullable and low-cardinality floats such as Array(Nullable(Float32))andArray(LowCardinality(Float32))are not allowed.
- Vector similarity indexes must be build on single columns.
- Vector similarity indexes may be build on calculated expressions (e.g., INDEX index_name arraySort(vectors) TYPE vector_similarity([...])) but such indexes cannot be used for approximate neighbor search later on.
- Vector similarity indexes require that all arrays in the underlying column have <dimension>-many elements - this is checked during index creation. To detect violations of this requirement as early as possible, users can add a constraint for the vector column, e.g.,CONSTRAINT same_length CHECK length(vectors) = 256.
- Likewise, array values in the underlying column must not be empty ([]) or have a default value (also[]).
Estimating storage and memory consumption
A vector generated for use with a typical AI model (e.g. a Large Language Model, LLMs) consists of hundreds or thousands of floating-point values. Thus, a single vector value can have a memory consumption of multiple kilobyte. Users who like to estimate the storage required for the underlying vector column in the table, as well as the main memory needed for the vector similarity index, can use below two formula:
Storage consumption of the vector column in the table (uncompressed):
Example for the dbpedia dataset:
The vector similarity index must be fully loaded from disk into main memory to perform searches. Similarly, the vector index is also constructed fully in memory and then saved to disk.
Memory consumption required to load a vector index:
Example for the dbpedia dataset:
Above formula does not account for additional memory required by vector similarity indexes to allocate runtime data structures like pre-allocated buffers and caches.
Using a Vector Similarity Index
To use vector similarity indexes, setting compatibility has be '' (the default value), or '25.1' or newer.
Vector similarity indexes support SELECT queries of this form:
ClickHouse's query optimizer tries to match above query template and make use of available vector similarity indexes. A query can only use a vector similarity index if the distance function in the SELECT query is the same as the distance function in the index definition.
Advanced users may provide a custom value for setting hnsw_candidate_list_size_for_search (also know as HNSW hyperparameter "ef_search") to tune the size of the candidate list during search (e.g.  SELECT [...] SETTINGS hnsw_candidate_list_size_for_search = <value>).
The default value of the setting 256 works well in the majority of use cases.
Higher setting values mean better accuracy at the cost of slower performance.
If the query can use a vector similarity index, ClickHouse checks that the LIMIT <N> provided in SELECT queries is within reasonable bounds.
More specifically, an error is returned if <N> is bigger than the value of setting max_limit_for_vector_search_queries with default value 100.
Too large LIMIT values can slow down searches and usually indicate a usage error.
To check if a SELECT query uses a vector similarity index, you can prefix the query with EXPLAIN indexes = 1.
As an example, query
may return
In this example, 1 million vectors in the dbpedia dataset, each with dimension 1536, are stored in 575 granules, i.e. 1.7k rows per granule. The query asks for 10 neighbours and the vector similarity index finds these 10 neighbours in 10 separate granules. These 10 granules will be read during query execution.
Vector similarity indexes are used if the output contains Skip and the name and type of the vector index (in the example, idx and vector_similarity).
In this case, the vector similarity index dropped two of four granules, i.e. 50% of the data.
The more granules can be dropped, the more effective index usage becomes.
To enforce index usage, you can run the SELECT query with setting force_data_skipping_indexes (provide the index name as setting value).
Post-filtering and Pre-filtering
Users may optionally specify a WHERE clause with additional filter conditions for the SELECT query.
ClickHouse will evaluate these filter conditions using post-filtering or pre-filtering strategy.
In short, both strategies determine the order in which the filters are evaluated:
- Post-filtering means that the vector similarity index is evaluated first, afterwards ClickHouse evaluates the additional filter(s) specified in the WHEREclause.
- Pre-filtering means that the filter evaluation order is the other way round.
The strategies have different trade-offs:
- Post-filtering has the general problem that it may return less than the number of rows requested in the LIMIT <N>clause. This situation happens when one or more result rows returned by the vector similarity index fails to satisfy the additional filters.
- Pre-filtering is generally an unsolved problem. Certain specialized vector databases provide pre-filtering algorithms but most relational databases (including ClickHouse) will fall back to exact neighbor search, i.e., a brute-force scan without index.
What strategy is used depends on the filter condition.
Additional filters are part of the partition key
If the additional filter condition is part of the partition key, then ClickHouse will apply partition pruning.
As an example, a table is range-partitioned by column year and the following query is run:
ClickHouse will prune all partitions except the 2025 one.
Additional filters cannot be evaluated using indexes
If additional filter conditions cannot be evaluated using indexes (primary key index, skipping index), ClickHouse will apply post-filtering.
Additional filters can be evaluated using the primary key index
If additional filter conditions can be evaluated using the primary key (i.e., they form a prefix of the primary key) and
- the filter condition eliminates at least one row within a part, the ClickHouse will fall back to pre-filtering for the "surviving" ranges within the part,
- the filter condition eliminates no rows within a part, the ClickHouse will perform post-filtering for the part.
In practical use cases, the latter case is rather unlikely.
Additional filters can be evaluated using skipping index
If additional filter conditions can be evaluated using skipping indexes (minmax index, set index, etc.), Clickhouse performs post-filtering. In such cases, the vector similarity index is evaluated first as it is expected to remove the most rows relative to other skipping indexes.
For finer control over post-filtering vs. pre-filtering, two settings can be used:
Setting vector_search_filter_strategy (default: auto which implements above heuristics) may be set to prefilter.
This is useful to force pre-filtering in cases where the additional filter conditions are extremely selective.
As an example, the following query may benefit from pre-filtering:
Assuming that only a very small number of books cost less than 2 dollar, post-filtering may return zero rows because the top 10 matches returned by the vector index could all be priced above 2 dollar.
By forcing pre-filtering (add SETTINGS vector_search_filter_strategy = 'prefilter' to the query), ClickHouse first finds all books with a price of less than 2 dollar and then executes a brute-force vector search for the found books.
As an alternative approach to resolve above issue, setting vector_search_index_fetch_multiplier (default: 1.0, maximum: 1000.0) may be configured to a value > 1.0 (for example, 2.0).
The number of nearest neighbors fetched from the vector index is multiplied by the setting value and then the additional filter to be applied on those rows to return LIMIT-many rows.
As an example, we can query again but with multiplier 3.0:
ClickHouse will fetch 3.0 x 10 = 30 nearest neighbors from the vector index in each part and afterwards evaluate the additional filters.
Only the ten closest neighbors will be returned.
We note that setting vector_search_index_fetch_multiplier can mitigate the problem but in extreme cases (very selective WHERE condition), it is still possible that less than N requested rows returned.
Rescoring
Skip indexes in ClickHouse generally filter at the granule level, i.e. a lookup in a skip index (internally) returns a list of potentially matching granules which reduces the number of read data in the subsequent scan. This works well for skip indexes in general but in the case of vector similarity indexes, it creates a "granularity mismatch". In more detail, the vector similarity index determines the row numbers of the N most similar vectors for a given reference vector, but it then needs to extrapolate these row numbers to granule numbers. ClickHouse will then load these granules from disk, and repeat the distance calculation for all vectors in these granules. This step is called rescoring and while it can theoretically improve accuracy - remember the vector similarity index returns only an approximate result, it is obvious not optimal in terms of performance.
ClickHouse therefore provides an optimization which disables rescoring and returns the most similar vectors and their distances directly from the index.
The optimization is enabled by default, see setting vector_search_with_rescoring.
The way it works at a high level is that ClickHouse makes the most similar vectors and their distances available as a virtual column _distances.
To see this, run a vector search query with EXPLAIN header = 1:
A query run without rescoring (vector_search_with_rescoring = 0) and with parallel replicas enabled may fall back to rescoring.
Performance tuning
Tuning compression
In virtually all use cases, the vectors in the underlying column are dense and do not compress well.
As a result, compression slows down inserts and reads into/from the vector column.
We therefore recommend to disable compression.
To do that, specify CODEC(NONE) for the vector column like this:
Tuning index creation
The life cycle of vector similarity indexes is tied to the life cycle of parts. In other words, whenever a new part with defined vector similarity index is created, the index is create as well. This typically happens when data is inserted or during merges. Unfortunately, HNSW is known for long index creation times which can significantly slow down inserts and merges. Vector similarity indexes are ideally only used if the data is immutable or rarely changed.
To speed up index creation, the following techniques can be used:
First, index creation can be parallelized. The maximum number of index creation threads can be configured using server setting max_build_vector_similarity_index_thread_pool_size. For optimal performance, the setting value should be configured to the number of CPU cores.
Second, to speed up INSERT statements, users may disable the creation of skipping indexes on newly inserted parts using session setting materialize_skip_indexes_on_insert. SELECT queries on such parts will fall back to exact search. Since inserted parts tend to be small compared to the total table size, the performance impact of that is expected to be negligible.
Third, to speed up merges, users may disable the creation of skipping indexes on merged parts using session setting materialize_skip_indexes_on_merge. This, in conjunction with statement ALTER TABLE [...] MATERIALIZE INDEX [...], provides explicit control over the life cycle of vector similarity indexes. For example, index creation can be deferred until all data was ingested or until a period of low system load such as the weekend.
Tuning index usage
SELECT queries need to load vector similarity indexes into main memory to use them. To avoid that the same vector similarity index is loaded repeatedly into main memory, ClickHouse provides a dedicated in-memory cache for such indexes. The bigger this cache is, the fewer unnecessary loads will happen. The maximum cache size can be configured using server setting vector_similarity_index_cache_size. By default, the cache can grow up to 5 GB in size.
The vector similarity index cache stores vector index granules. If individual vector index granules are bigger than the cache size, they will not be cached. Therefore, please make sure to calculate the vector index size (based on the formula in "Estimating storage and memory consumption" or system.data_skipping_indices) and size the cache correspondingly.
The current size of the vector similarity index cache is shown in system.metrics:
The cache hits and misses for a query with some query id can be obtained from system.query_log:
For production use-cases, we recommend that the cache is sized large enough so that all vector indexes remain in memory at all times.
Tuning quantization
Quantization is a technique to reduce the memory footprint of vectors and the computational costs of building and traversing vector indexes. ClickHouse vector indexes supports the following quantization options:
| Quantization | Name | Storage per dimension | 
|---|---|---|
| f32 | Single precision | 4 bytes | 
| f16 | Half precision | 2 bytes | 
| bf16 (default) | Half precision (brain float) | 2 bytes | 
| i8 | Quarter precision | 1 byte | 
| b1 | Binary | 1 bit | 
Quantization reduces the precision of vector searches compared to searching the original full-precision floating-point values (f32).
However, on most datasets, half-precision brain float quantization (bf16) results in a negligible precision loss, therefore vector similarity indexes use this quantization technique by default.
Quarter precision (i8) and binary (b1) quantization causes appreciable precision loss in vector searches.
We recommend both quantizations only if the the size of the vector similarity index is significantly larger than the available DRAM size.
In this case, we also suggest enabling rescoring (vector_search_index_fetch_multiplier, vector_search_with_rescoring) to improve accuracy.
Binary quantization is only recommended for 1) normalized embeddings (i.e. vector length = 1, OpenAI models are usually normalized), and 2) if the cosine distance is used as distance function.
Binary quantization internally uses the Hamming distance to construct and search the proximity graph.
The rescoring step uses the original full-precision vectors stored in the table to identify the nearest neighbours via cosine distance.
Tuning data transfer
The reference vector in a vector search query is provided by the user and generally retrieved by making a call to a Large Language Model (LLM). Typical Python code which runs a vector search in ClickHouse might look like this
Embedding vectors (search_v in above snippet) could have a very large dimension.
For example, OpenAI provides models that generate embeddings vectors with 1536 or even 3072 dimensions.
In above code, the ClickHouse Python driver substitutes the embedding vector by a human readable string and subsequently send the SELECT query entirely as a string.
Assuming the embedding vector consists of 1536 single-precision floating point values, the sent string reaches a length of 20 kB.
This creates a high CPU usage for tokenizing, parsing and performing thousands of string-to-float conversions.
Also, significant space is required in the ClickHouse server log file, causing bloat in system.query_log as well.
Note that most LLM models return an embedding vector as a list or NumPy array of native floats. We therefore recommend Python applications to bind the reference vector parameter in binary form by using the following style:
In the example, the reference vector is sent as-is in binary form and reinterpreted as array of floats on the server.
This saves CPU time on the server side, and avoids bloat in the server logs and system.query_log.
Administration and monitoring
The on-disk size of vector similarity indexes can be obtained from system.data_skipping_indices:
Example output:
Differences to regular skipping indexes
As all regular skipping indexes, vector similarity indexes are constructed over granules and each indexed block consists of GRANULARITY = [N]-many granules ([N] = 1 by default for normal skipping indexes).
For example, if the primary index granularity of the table is 8192 (setting index_granularity = 8192) and GRANULARITY = 2, then each indexed block will contain 16384 rows.
However, data structures and algorithms for approximate neighbor search are inherently row-oriented.
They store a compact representation of a set of rows and also return rows for vector search queries.
This causes some rather unintuitive differences in the way vector similarity indexes behave compared to normal skipping indexes.
When a user defines a vector similarity index on a column, ClickHouse internally creates a vector similarity "sub-index" for each index block. The sub-index is "local" in the sense that it only knows about the rows of its containing index block. In the previous example and assuming that a column has 65536 rows, we obtain four index blocks (spanning eight granules) and a vector similarity sub-index for each index block. A sub-index is theoretically able to return the rows with the N closest points within its index block directly. However, since ClickHouse loads data from disk to memory at the granularity of granules, sub-indexes extrapolate matching rows to granule granularity. This is different from regular skipping indexes which skip data at the granularity of index blocks.
The GRANULARITY parameter determines how many vector similarity sub-indexes are created.
Bigger GRANULARITY values mean fewer but larger vector similarity sub-indexes, up to the point where a column (or a column's data part) has only a single sub-index.
In that case, the sub-index has a "global" view of all column rows and can directly return all granules of the column (part) with relevant rows (there are at most LIMIT [N]-many such granules).
In a second step, ClickHouse will load these granules and identify the actually best rows by performing a brute-force distance calculation over all rows of the granules.
With a small GRANULARITY value, each of the sub-indexes returns up to LIMIT N-many granules.
As a result, more granules need to be loaded and post-filtered.
Note that the search accuracy is with both cases equally good, only the processing performance differs.
It is generally recommended to use a large GRANULARITY for vector similarity indexes and fall back to a smaller GRANULARITY values only in case of problems like excessive memory consumption of the vector similarity structures.
If no GRANULARITY was specified for vector similarity indexes, the default value is 100 million.
Example
returns
Further example datasets that use approximate vector search:
Quantized Bit (QBit)
One common approach to speed up exact vector search is to use a lower-precision float data type.
For example, if vectors are stored as Array(BFloat16) instead of Array(Float32), the data size is reduced by half, and query runtimes are expected to decrease proportionally.
This method is known as quantization. While it speeds up computation, it may reduce result accuracy despite performing an exhaustive scan of all vectors.
With traditional quantization, we lose precision both during search and when storing the data. In the example above, we would store BFloat16 instead of Float32, meaning we can never perform a more accurate search later, even if desired. One alternative approach is to store two copies of the data: quantized and full-precision. While this works, it requires redundant storage. Consider a scenario where we have Float64 as original data and want to run searches with different precision (16-bit, 32-bit, or full 64-bit). We would need to store three separate copies of the data.
ClickHouse offers the Quantized Bit (QBit) data type that addresses these limitations by:
- Storing the original full-precision data.
- Allowing quantization precision to be specified at query time.
This is achieved by storing data in a bit-grouped format (meaning all i-th bits of all vectors are stored together), enabling reads at only the requested precision level. You get the speed benefits of reduced I/O and computation from quantization while keeping all original data available when needed. When maximum precision is selected, the search becomes exact.
The QBit data type and its associated distance functions are currently experimental. To enable them, run SET allow_experimental_qbit_type = 1.
If you encounter problems, please open an issue in the ClickHouse repository.
To declare a column of QBit type, use the following syntax:
Where:
- element_type– the type of each vector element. Supported types are- BFloat16,- Float32, and- Float64
- dimension– the number of elements in each vector
Creating a QBit Table and Adding Data
Vector Search with QBit
Let's find the nearest neighbors to a vector representing word 'lemon' using L2 distance. The third parameter in the distance function specifies the precision in bits - higher values provide more accuracy but require more computation.
You can find all available distance functions for QBit here.
Full precision search (64-bit):
Reduced precision search:
Notice that with 12-bit quantization, we get a good approximation of the distances with faster query execution. The relative ordering remains largely consistent, with 'apple' still being the closest match.
In the current state, the speed-up is due to reduced I/O as we read less data. If the original data was wide, like Float64, choosing a lower precision will still result in distance calculation on data of the same width – just with less precision.
Performance Considerations
The performance benefit of QBit comes from reduced I/O operations, as less data needs to be read from storage when using lower precision. Moreover, when the QBit contains Float32 data, if the precision parameter is 16 or below, there will be additional benefits from reduced computation. The precision parameter directly controls the trade-off between accuracy and speed:
- Higher precision (closer to the original data width): More accurate results, slower queries
- Lower precision: Faster queries with approximate results, reduced memory usage
References
Blogs:
