Search Evaluation

This package contains scripts to evaluate the performance of the vector search component.

Evaluation

The search-eval script evaluates search performance using data from local files (CSV or SQLite). For data sourced from BigQuery, use the separate search-eval-bq script described below.

Local File Evaluation

To run the evaluation using a local file, use the --input-file option.

uv run search-eval -- --input-file /path/to/your/data.csv

Or for a SQLite database:

uv run search-eval -- --input-file /path/to/your/data.db

Input File Structures

CSV File

The CSV file must contain the following columns:

Column  Description
input   The question to be used for the search query.
source  The expected document path for the question.
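For reference, a file with the required columns can be produced with a short Python snippet. The file name, questions, and document paths below are illustrative placeholders, not values the script expects:

```python
import csv

# Write a minimal evaluation input file with the two required columns.
# The question and path values are illustrative placeholders.
rows = [
    {"input": "How do I reset my password?", "source": "docs/accounts/password-reset.md"},
    {"input": "What is the refund policy?", "source": "docs/billing/refunds.md"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "source"])
    writer.writeheader()  # emits the header row: input,source
    writer.writerows(rows)
```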

SQLite Database

The SQLite database must contain a table named evaluation_data with the following columns:

Column  Description
input   The question to be used for the search query.
source  The expected document path for the question.
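A compatible database can be created with the standard sqlite3 module. The TEXT column types and the sample row are assumptions for illustration; the script only requires the table and column names shown above:

```python
import sqlite3

# Create the evaluation_data table with the required columns.
# TEXT types are an assumption; only the names are mandated.
conn = sqlite3.connect("data.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS evaluation_data (
        input  TEXT NOT NULL,  -- the search question
        source TEXT NOT NULL   -- the expected document path
    )
    """
)
conn.execute(
    "INSERT INTO evaluation_data (input, source) VALUES (?, ?)",
    ("How do I reset my password?", "docs/accounts/password-reset.md"),
)
conn.commit()
conn.close()
```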

BigQuery Evaluation

The search-eval-bq script evaluates search performance using data sourced from and written to BigQuery.

BigQuery Table Structures

Input Table

The input table must contain the following columns:

Column         Type    Description
id             STRING  A unique identifier for each question.
question       STRING  The question to be used for the search query.
document_path  STRING  The expected document path for the given question.
question_type  STRING  The type of question. Rows where question_type is 'Unanswerable' are ignored.
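The filtering rule above can be sketched in a few lines. The function name and the row dictionaries are hypothetical; only the 'Unanswerable' check reflects the documented behavior:

```python
# Sketch of the documented rule: rows whose question_type is
# 'Unanswerable' are skipped before evaluation.
def answerable_rows(rows):
    """Return only the rows that should be evaluated."""
    return [r for r in rows if r.get("question_type") != "Unanswerable"]

rows = [
    {"id": "q1", "question": "What is vector search?",
     "document_path": "docs/search.md", "question_type": "Factual"},
    {"id": "q2", "question": "What is the meaning of life?",
     "document_path": "", "question_type": "Unanswerable"},
]

kept = answerable_rows(rows)  # only q1 survives the filter
```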

Output Table

The output table will be created by the script if it doesn't exist, or appended to if it does. It will have the following structure:

Column                  Type       Description
id                      STRING     The unique identifier for the question from the input table.
question                STRING     The question used for the search query.
expected_document       STRING     The expected document for the given question.
retrieved_documents     STRING[]   An array of document IDs retrieved from the vector search.
retrieved_distances     FLOAT64[]  An array of distance scores for the retrieved documents.
is_expected_in_results  BOOLEAN    A flag indicating whether the expected document was in the search results.
evaluation_timestamp    TIMESTAMP  The timestamp of when the evaluation was run.
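How one output row relates to the columns above can be sketched as follows. The helper function and its arguments are hypothetical, not the script's actual API; the key point is that is_expected_in_results is simply a membership check of the expected document against the retrieved array:

```python
from datetime import datetime, timezone

# Hypothetical sketch of assembling one output row from a single
# search call. Names are illustrative, not the script's API.
def build_output_row(row_id, question, expected_document,
                     retrieved_documents, retrieved_distances):
    return {
        "id": row_id,
        "question": question,
        "expected_document": expected_document,
        "retrieved_documents": retrieved_documents,
        "retrieved_distances": retrieved_distances,
        # True when the expected document appears anywhere in the results.
        "is_expected_in_results": expected_document in retrieved_documents,
        "evaluation_timestamp": datetime.now(timezone.utc).isoformat(),
    }

out = build_output_row(
    "q1", "What is vector search?", "docs/search.md",
    ["docs/search.md", "docs/index.md"], [0.12, 0.34],
)
```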

Usage

To run the BigQuery evaluation script, use the uv run search-eval-bq command with the following options:

uv run search-eval-bq -- --input-table <project.dataset.table> --output-table <project.dataset.table> [--project-id <gcp-project-id>]

Arguments:

  • --input-table: (Required) The full BigQuery table name for the input data (e.g., my-gcp-project.my_dataset.questions).
  • --output-table: (Required) The full BigQuery table name for the output results (e.g., my-gcp-project.my_dataset.eval_results).
  • --project-id: (Optional) The Google Cloud project ID. If not provided, it will use the project_id from the config.yaml file.

Example:

uv run search-eval-bq -- \
  --input-table "my-gcp-project.search_eval.synthetic_questions" \
  --output-table "my-gcp-project.search_eval.results" \
  --project-id "my-gcp-project"