# Search Evaluation
This package contains scripts to evaluate the performance of the vector search component.
## Evaluation
The `search-eval` script evaluates search performance. It can source data from either BigQuery or local files.
### Local File Evaluation
To run the evaluation using a local file, use the `--input-file` option:

```sh
uv run search-eval -- --input-file /path/to/your/data.csv
```
Or for a SQLite database:

```sh
uv run search-eval -- --input-file /path/to/your/data.db
```
### Input File Structures

#### CSV File

The CSV file must contain the following columns:

| Column | Description |
|---|---|
| `input` | The question to be used for the search query. |
| `source` | The expected document path for the question. |
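For reference, a file with these two columns can be produced with Python's standard `csv` module. The file name, questions, and document paths below are illustrative placeholders, not values the script expects:

```python
import csv

# Illustrative rows; the real questions and expected document paths
# come from your own evaluation set.
rows = [
    {"input": "How do I reset my password?", "source": "docs/account/reset.md"},
    {"input": "What is the refund policy?", "source": "docs/billing/refunds.md"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "source"])
    writer.writeheader()  # header row: input,source
    writer.writerows(rows)
```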
#### SQLite Database

The SQLite database must contain a table named `evaluation_data` with the following columns:

| Column | Description |
|---|---|
| `input` | The question to be used for the search query. |
| `source` | The expected document path for the question. |
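A matching database file can be created with the standard `sqlite3` module. The table and column names follow the structure above; the file name and row contents are placeholders:

```python
import sqlite3

conn = sqlite3.connect("data.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS evaluation_data (
        input  TEXT NOT NULL,  -- the search question
        source TEXT NOT NULL   -- the expected document path
    )
    """
)
conn.execute(
    "INSERT INTO evaluation_data (input, source) VALUES (?, ?)",
    ("How do I reset my password?", "docs/account/reset.md"),
)
conn.commit()
conn.close()
```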
### BigQuery Evaluation

The `search-eval-bq` script evaluates search performance using data sourced from and written to BigQuery.
#### BigQuery Table Structures

##### Input Table

The input table must contain the following columns:

| Column | Type | Description |
|---|---|---|
| `id` | STRING | A unique identifier for each question. |
| `question` | STRING | The question to be used for the search query. |
| `document_path` | STRING | The expected document path for the given question. |
| `question_type` | STRING | The type of question. Rows where `question_type` is 'Unanswerable' are ignored. |
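The 'Unanswerable' rule amounts to a simple filter predicate. The sketch below applies it to in-memory rows shaped like the input table; the rows and field values are illustrative, and this is not the script's actual code:

```python
# Rows shaped like the BigQuery input table; contents are illustrative.
rows = [
    {"id": "q1", "question": "How do refunds work?",
     "document_path": "docs/billing/refunds.md", "question_type": "Factual"},
    {"id": "q2", "question": "What color is Tuesday?",
     "document_path": "", "question_type": "Unanswerable"},
]

# Keep only answerable questions; 'Unanswerable' rows are skipped.
answerable = [r for r in rows if r["question_type"] != "Unanswerable"]
```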
##### Output Table

The output table will be created by the script if it doesn't exist, or appended to if it does. It will have the following structure:

| Column | Type | Description |
|---|---|---|
| `id` | STRING | The unique identifier for the question from the input table. |
| `question` | STRING | The question used for the search query. |
| `expected_document` | STRING | The expected document for the given question. |
| `retrieved_documents` | STRING[] | An array of document IDs retrieved from the vector search. |
| `retrieved_distances` | FLOAT64[] | An array of distance scores for the retrieved documents. |
| `is_expected_in_results` | BOOLEAN | A flag indicating whether the expected document was in the search results. |
| `evaluation_timestamp` | TIMESTAMP | The timestamp of when the evaluation was run. |
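To make the relationship between these columns concrete: `is_expected_in_results` is true exactly when the expected document appears among the retrieved documents. Assuming the retrieved IDs are directly comparable to `expected_document`, one output row reduces to a membership test (values below are invented for illustration):

```python
from datetime import datetime, timezone

expected_document = "docs/billing/refunds.md"  # from the input table
retrieved_documents = [                        # from the vector search
    "docs/billing/refunds.md",
    "docs/billing/invoices.md",
]
retrieved_distances = [0.12, 0.34]             # one distance per document

result_row = {
    "expected_document": expected_document,
    "retrieved_documents": retrieved_documents,
    "retrieved_distances": retrieved_distances,
    # True when the expected document is among the retrieved IDs.
    "is_expected_in_results": expected_document in retrieved_documents,
    "evaluation_timestamp": datetime.now(timezone.utc).isoformat(),
}
```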
#### Usage

To run the BigQuery evaluation script, use the `uv run search-eval-bq` command with the following options:

```sh
uv run search-eval-bq -- --input-table <project.dataset.table> --output-table <project.dataset.table> [--project-id <gcp-project-id>]
```
Arguments:

- `--input-table`: (Required) The full BigQuery table name for the input data (e.g., `my-gcp-project.my_dataset.questions`).
- `--output-table`: (Required) The full BigQuery table name for the output results (e.g., `my-gcp-project.my_dataset.eval_results`).
- `--project-id`: (Optional) The Google Cloud project ID. If not provided, the `project_id` from the `config.yaml` file is used.
Example:

```sh
uv run search-eval-bq -- \
  --input-table "my-gcp-project.search_eval.synthetic_questions" \
  --output-table "my-gcp-project.search_eval.results" \
  --project-id "my-gcp-project"
```