# Search Evaluation

This package contains scripts to evaluate the performance of the vector search component.

## Evaluation

The `search-eval` script evaluates search performance. It can source data from either BigQuery or local files.

### Local File Evaluation

To run the evaluation using a local file, use the `--input-file` option.

```bash
uv run search-eval -- --input-file /path/to/your/data.csv
```

Or for a SQLite database:

```bash
uv run search-eval -- --input-file /path/to/your/data.db
```

#### Input File Structures

**CSV File**

The CSV file must contain the following columns:

| Column   | Description                                   |
|----------|-----------------------------------------------|
| `input`  | The question to be used for the search query. |
| `source` | The expected document path for the question.  |

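For example, a minimal CSV input might look like this (the questions and paths are purely illustrative):

```csv
input,source
"What is the refund policy?",docs/policies/refunds.md
"How do I reset my password?",docs/account/password-reset.md
```
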
**SQLite Database**

The SQLite database must contain a table named `evaluation_data` with the following columns:

| Column   | Description                                   |
|----------|-----------------------------------------------|
| `input`  | The question to be used for the search query. |
| `source` | The expected document path for the question.  |

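As a sketch, a compatible SQLite file can be built with Python's standard library alone; the file name and the inserted row below are hypothetical examples:

```python
import sqlite3

# Build an evaluation database with the expected table layout.
# "eval_data.db" and the inserted row are illustrative examples.
conn = sqlite3.connect("eval_data.db")
conn.execute("CREATE TABLE IF NOT EXISTS evaluation_data (input TEXT, source TEXT)")
conn.execute(
    "INSERT INTO evaluation_data (input, source) VALUES (?, ?)",
    ("What is the refund policy?", "docs/policies/refunds.md"),
)
conn.commit()
rows = conn.execute("SELECT input, source FROM evaluation_data").fetchall()
conn.close()
```

The resulting `.db` file can then be passed directly to `--input-file`.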
### BigQuery Evaluation

The `search-eval-bq` script evaluates search performance using data sourced from and written to BigQuery.

### BigQuery Table Structures

#### Input Table

The input table must contain the following columns:

| Column          | Type   | Description                                                                     |
| --------------- | ------ | ------------------------------------------------------------------------------- |
| `id`            | STRING | A unique identifier for each question.                                          |
| `question`      | STRING | The question to be used for the search query.                                   |
| `document_path` | STRING | The expected document path for the given question.                              |
| `question_type` | STRING | The type of question. Rows where `question_type` is 'Unanswerable' are ignored. |

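A matching input table could be defined with BigQuery DDL along these lines (the table name is a placeholder):

```sql
CREATE TABLE `my-gcp-project.my_dataset.questions` (
  id STRING,
  question STRING,
  document_path STRING,
  question_type STRING
);
```
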
#### Output Table

The output table will be created by the script if it doesn't exist, or appended to if it does. It will have the following structure:

| Column                   | Type      | Description                                                                |
| ------------------------ | --------- | -------------------------------------------------------------------------- |
| `id`                     | STRING    | The unique identifier for the question from the input table.               |
| `question`               | STRING    | The question used for the search query.                                    |
| `expected_document`      | STRING    | The expected document for the given question.                              |
| `retrieved_documents`    | STRING[]  | An array of document IDs retrieved from the vector search.                 |
| `retrieved_distances`    | FLOAT64[] | An array of distance scores for the retrieved documents.                   |
| `is_expected_in_results` | BOOLEAN   | A flag indicating whether the expected document was in the search results. |
| `evaluation_timestamp`   | TIMESTAMP | The timestamp of when the evaluation was run.                              |

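Once results are written, an overall hit rate can be computed directly in BigQuery, for example with a query along these lines (the table name is a placeholder):

```sql
SELECT
  COUNTIF(is_expected_in_results) / COUNT(*) AS hit_rate
FROM `my-gcp-project.my_dataset.eval_results`;
```
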
|
### Usage

To run the BigQuery evaluation script, use the `uv run search-eval-bq` command with the following options:

```bash
uv run search-eval-bq -- --input-table <project.dataset.table> --output-table <project.dataset.table> [--project-id <gcp-project-id>]
```

**Arguments:**

* `--input-table`: **(Required)** The full BigQuery table name for the input data (e.g., `my-gcp-project.my_dataset.questions`).
* `--output-table`: **(Required)** The full BigQuery table name for the output results (e.g., `my-gcp-project.my_dataset.eval_results`).
* `--project-id`: (Optional) The Google Cloud project ID. If not provided, it will use the `project_id` from the `config.yaml` file.

|
**Example:**

```bash
uv run search-eval-bq -- \
  --input-table "my-gcp-project.search_eval.synthetic_questions" \
  --output-table "my-gcp-project.search_eval.results" \
  --project-id "my-gcp-project"
```