# Synthetic Question Generator

This application generates a set of synthetic questions from documents stored in Google Cloud Storage (GCS) and saves them to a local CSV file. For each document, it generates one question for each predefined question type (Factual, Summarization, etc.).

The output CSV is structured for easy uploading to a BigQuery table with the following schema: `input` (STRING), `expected_output` (STRING), `source` (STRING), `type` (STRING).

## Usage

The script is run from the command line. You need to provide the path to the source documents within your GCS bucket and a path for the output CSV file.

### Command

```bash
uv run python -m synth_gen.main [OPTIONS] GCS_PATH
```

### Arguments

* `GCS_PATH`: (Required) The path to the directory in your GCS bucket where the source markdown files are located (e.g., `documents/markdown/`).
* `--output-csv, -o`: (Required) The local file path where the generated questions will be saved in CSV format.

### Example

```bash
uv run python -m synth_gen.main documents/processed/ --output-csv synthetic_questions.csv
```

This command will fetch all documents from the `gs:///documents/processed/` directory, generate questions for each, and save them to a file named `synthetic_questions.csv` in the current directory.
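
Before uploading the output to BigQuery, it can help to confirm that the CSV matches the four-column schema described above. The following is a minimal sketch, not part of `synth_gen`: the `check_csv` helper is hypothetical, and it assumes the file starts with a header row naming the columns in schema order, which this README does not confirm.

```python
import csv
import io
import sys

# Column names taken from the BigQuery schema described in this README.
EXPECTED_COLUMNS = ["input", "expected_output", "source", "type"]


def check_csv(text: str) -> list[str]:
    """Return a list of problems found in CSV text; an empty list means none were found.

    Hypothetical helper: assumes a header row is present (not confirmed by the README).
    """
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Stop early: per-column checks are meaningless with the wrong header.
        problems.append(f"unexpected header: {reader.fieldnames}")
        return problems
    for record_no, row in enumerate(reader, start=1):
        for column in EXPECTED_COLUMNS:
            # Missing trailing fields come back as None, empty fields as "".
            if not row[column]:
                problems.append(f"record {record_no}: empty {column!r}")
    return problems


if __name__ == "__main__":
    # Usage: python check_csv.py synthetic_questions.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as handle:
        for problem in check_csv(handle.read()):
            print(problem)
```

If the generated file turns out to have no header row, the header comparison would need to be dropped and the column names passed to `csv.DictReader` via its `fieldnames` argument instead.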