# Search Evaluation

This package contains scripts to evaluate the performance of the vector search component.

## Evaluation

The `search-eval` script evaluates search performance. It can source data from either BigQuery or local files.

### Local File Evaluation

To run the evaluation using a local file, use the `--input-file` option.

```bash
uv run search-eval -- --input-file /path/to/your/data.csv
```

Or for a SQLite database:

```bash
uv run search-eval -- --input-file /path/to/your/data.db
```

#### Input File Structures

**CSV File**

The CSV file must contain the following columns:

| Column   | Description                                   |
|----------|-----------------------------------------------|
| `input`  | The question to be used for the search query. |
| `source` | The expected document path for the question.  |

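For example, a minimal CSV input might look like this (the questions and paths are purely illustrative):

```csv
input,source
"What is the refund policy?",docs/policies/refunds.md
"How do I reset my password?",docs/account/password-reset.md
```
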
**SQLite Database**

The SQLite database must contain a table named `evaluation_data` with the following columns:

| Column   | Description                                   |
|----------|-----------------------------------------------|
| `input`  | The question to be used for the search query. |
| `source` | The expected document path for the question.  |

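As a sketch, a compatible SQLite file can be built with Python's standard library alone; the file name and the inserted row below are hypothetical examples:

```python
import sqlite3

# Build an evaluation database with the expected table layout.
# "eval_data.db" and the inserted row are illustrative examples.
conn = sqlite3.connect("eval_data.db")
conn.execute("CREATE TABLE IF NOT EXISTS evaluation_data (input TEXT, source TEXT)")
conn.execute(
    "INSERT INTO evaluation_data (input, source) VALUES (?, ?)",
    ("What is the refund policy?", "docs/policies/refunds.md"),
)
conn.commit()
rows = conn.execute("SELECT input, source FROM evaluation_data").fetchall()
conn.close()
```

The resulting `.db` file can then be passed directly to `--input-file`.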
### BigQuery Evaluation

The `search-eval-bq` script evaluates search performance using data sourced from and written to BigQuery.

### BigQuery Table Structures

#### Input Table

The input table must contain the following columns:

| Column          | Type   | Description                                                                     |
| --------------- | ------ | ------------------------------------------------------------------------------- |
| `id`            | STRING | A unique identifier for each question.                                          |
| `question`      | STRING | The question to be used for the search query.                                   |
| `document_path` | STRING | The expected document path for the given question.                              |
| `question_type` | STRING | The type of question. Rows where `question_type` is 'Unanswerable' are ignored. |

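A matching input table could be defined with BigQuery DDL along these lines (the table name is a placeholder):

```sql
CREATE TABLE `my-gcp-project.my_dataset.questions` (
  id STRING,
  question STRING,
  document_path STRING,
  question_type STRING
);
```
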
#### Output Table

The output table will be created by the script if it doesn't exist, or appended to if it does. It will have the following structure:

| Column                   | Type      | Description                                                                |
| ------------------------ | --------- | -------------------------------------------------------------------------- |
| `id`                     | STRING    | The unique identifier for the question from the input table.               |
| `question`               | STRING    | The question used for the search query.                                    |
| `expected_document`      | STRING    | The expected document for the given question.                              |
| `retrieved_documents`    | STRING[]  | An array of document IDs retrieved from the vector search.                 |
| `retrieved_distances`    | FLOAT64[] | An array of distance scores for the retrieved documents.                   |
| `is_expected_in_results` | BOOLEAN   | A flag indicating whether the expected document was in the search results. |
| `evaluation_timestamp`   | TIMESTAMP | The timestamp of when the evaluation was run.                              |

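Once results are written, an overall hit rate can be computed directly in BigQuery, for example with a query along these lines (the table name is a placeholder):

```sql
SELECT
  COUNTIF(is_expected_in_results) / COUNT(*) AS hit_rate
FROM `my-gcp-project.my_dataset.eval_results`;
```
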
|
### Usage

To run the BigQuery evaluation script, use the `uv run search-eval-bq` command with the following options:

```bash
uv run search-eval-bq -- --input-table <project.dataset.table> --output-table <project.dataset.table> [--project-id <gcp-project-id>]
```

**Arguments:**

* `--input-table`: **(Required)** The full BigQuery table name for the input data (e.g., `my-gcp-project.my_dataset.questions`).
* `--output-table`: **(Required)** The full BigQuery table name for the output results (e.g., `my-gcp-project.my_dataset.eval_results`).
* `--project-id`: (Optional) The Google Cloud project ID. If not provided, it will use the `project_id` from the `config.yaml` file.

|
**Example:**

```bash
uv run search-eval-bq -- \
  --input-table "my-gcp-project.search_eval.synthetic_questions" \
  --output-table "my-gcp-project.search_eval.results" \
  --project-id "my-gcp-project"
```