Search Evaluation

This package contains scripts to evaluate the performance of the vector search component.

Evaluation

The search-eval script evaluates search performance using data from local files (CSV or SQLite). For data sourced from BigQuery, use the separate search-eval-bq script described below.

Local File Evaluation

To run the evaluation using a local file, use the --input-file option.

uv run search-eval -- --input-file /path/to/your/data.csv

Or for a SQLite database:

uv run search-eval -- --input-file /path/to/your/data.db

Input File Structures

CSV File

The CSV file must contain the following columns:

Column  Description
input   The question to be used for the search query.
source  The expected document path for the question.
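For reference, a file with the required columns can be produced with a short Python snippet. The file name, questions, and document paths below are illustrative placeholders, not values the script expects:

```python
import csv

# Write a minimal evaluation input file with the two required columns.
# The question and path values are illustrative placeholders.
rows = [
    {"input": "How do I reset my password?", "source": "docs/accounts/password-reset.md"},
    {"input": "What is the refund policy?", "source": "docs/billing/refunds.md"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "source"])
    writer.writeheader()  # emits the header row: input,source
    writer.writerows(rows)
```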

SQLite Database

The SQLite database must contain a table named evaluation_data with the following columns:

Column  Description
input   The question to be used for the search query.
source  The expected document path for the question.
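A compatible database can be created with the standard sqlite3 module. The TEXT column types and the sample row are assumptions for illustration; the script only requires the table and column names shown above:

```python
import sqlite3

# Create the evaluation_data table with the required columns.
# TEXT types are an assumption; only the names are mandated.
conn = sqlite3.connect("data.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS evaluation_data (
        input  TEXT NOT NULL,  -- the search question
        source TEXT NOT NULL   -- the expected document path
    )
    """
)
conn.execute(
    "INSERT INTO evaluation_data (input, source) VALUES (?, ?)",
    ("How do I reset my password?", "docs/accounts/password-reset.md"),
)
conn.commit()
conn.close()
```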

BigQuery Evaluation

The search-eval-bq script evaluates search performance using data sourced from and written to BigQuery.

BigQuery Table Structures

Input Table

The input table must contain the following columns:

Column         Type    Description
id             STRING  A unique identifier for each question.
question       STRING  The question to be used for the search query.
document_path  STRING  The expected document path for the given question.
question_type  STRING  The type of question. Rows where question_type is 'Unanswerable' are ignored.
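The filtering rule above can be sketched in a few lines. The function name and the row dictionaries are hypothetical; only the 'Unanswerable' check reflects the documented behavior:

```python
# Sketch of the documented rule: rows whose question_type is
# 'Unanswerable' are skipped before evaluation.
def answerable_rows(rows):
    """Return only the rows that should be evaluated."""
    return [r for r in rows if r.get("question_type") != "Unanswerable"]

rows = [
    {"id": "q1", "question": "What is vector search?",
     "document_path": "docs/search.md", "question_type": "Factual"},
    {"id": "q2", "question": "What is the meaning of life?",
     "document_path": "", "question_type": "Unanswerable"},
]

kept = answerable_rows(rows)  # only q1 survives the filter
```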

Output Table

The output table will be created by the script if it doesn't exist, or appended to if it does. It will have the following structure:

Column                  Type       Description
id                      STRING     The unique identifier for the question from the input table.
question                STRING     The question used for the search query.
expected_document       STRING     The expected document for the given question.
retrieved_documents     STRING[]   An array of document IDs retrieved from the vector search.
retrieved_distances     FLOAT64[]  An array of distance scores for the retrieved documents.
is_expected_in_results  BOOLEAN    A flag indicating whether the expected document was in the search results.
evaluation_timestamp    TIMESTAMP  The timestamp of when the evaluation was run.
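How one output row relates to the columns above can be sketched as follows. The helper function and its arguments are hypothetical, not the script's actual API; the key point is that is_expected_in_results is simply a membership check of the expected document against the retrieved array:

```python
from datetime import datetime, timezone

# Hypothetical sketch of assembling one output row from a single
# search call. Names are illustrative, not the script's API.
def build_output_row(row_id, question, expected_document,
                     retrieved_documents, retrieved_distances):
    return {
        "id": row_id,
        "question": question,
        "expected_document": expected_document,
        "retrieved_documents": retrieved_documents,
        "retrieved_distances": retrieved_distances,
        # True when the expected document appears anywhere in the results.
        "is_expected_in_results": expected_document in retrieved_documents,
        "evaluation_timestamp": datetime.now(timezone.utc).isoformat(),
    }

out = build_output_row(
    "q1", "What is vector search?", "docs/search.md",
    ["docs/search.md", "docs/index.md"], [0.12, 0.34],
)
```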

Usage

To run the BigQuery evaluation script, use the uv run search-eval-bq command with the following options:

uv run search-eval-bq -- --input-table <project.dataset.table> --output-table <project.dataset.table> [--project-id <gcp-project-id>]

Arguments:

  • --input-table: (Required) The full BigQuery table name for the input data (e.g., my-gcp-project.my_dataset.questions).
  • --output-table: (Required) The full BigQuery table name for the output results (e.g., my-gcp-project.my_dataset.eval_results).
  • --project-id: (Optional) The Google Cloud project ID. If not provided, it will use the project_id from the config.yaml file.

Example:

uv run search-eval-bq -- \
  --input-table "my-gcp-project.search_eval.synthetic_questions" \
  --output-table "my-gcp-project.search_eval.results" \
  --project-id "my-gcp-project"