WIP: feature: Add before Guardrail #26

Draft
A8080816 wants to merge 15 commits from feature/before-guardrail into main
Member

PR for: #20

Summary

Adds a pre-model guardrail to every conversation that classifies prompts and returns custom blocking responses, and strengthens the emoji filter and its documentation.

Main changes

✨ GovernancePlugin invokes an LLM with structured output and can reply with blocking_response while writing the blocking context.
🛡️ Agent registers before_model_callback to cut off the main generation when the decision is unsafe.
🧼 The regex and FORBIDDEN_EMOJIS cover combinations with skin tones/ZWJ and are validated via tests/test_governance_emojis.py.
📝 The README documents how to run the dedicated test suite for the emoji filter.
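
As an illustration of the emoji filtering described above, here is a minimal sketch of a regex that removes a forbidden base emoji together with any skin-tone modifier, variation selector, or ZWJ continuation attached to it. The contents of FORBIDDEN_EMOJIS and the helper name strip_forbidden_emojis are assumptions made for this example; the PR's actual pattern is not shown and may differ.

```python
import re

# Hypothetical deny-list: the real FORBIDDEN_EMOJIS contents are not shown in this PR.
FORBIDDEN_EMOJIS: frozenset[str] = frozenset({"\U0001F44D", "\U0001F468"})  # thumbs up, man

_SKIN_TONE = "[\U0001F3FB-\U0001F3FF]"  # Fitzpatrick skin-tone modifiers
_VS16 = "\uFE0F"                        # emoji variation selector
_ZWJ = "\u200D"                         # zero width joiner

_base = "|".join(re.escape(e) for e in sorted(FORBIDDEN_EMOJIS))

# A forbidden base, optional modifiers, then any number of ZWJ-joined parts.
FORBIDDEN_PATTERN: re.Pattern[str] = re.compile(
    rf"(?:{_base})(?:{_SKIN_TONE}|{_VS16})*"
    rf"(?:{_ZWJ}[^\s{_ZWJ}](?:{_SKIN_TONE}|{_VS16})*)*"
)


def strip_forbidden_emojis(text: str) -> str:
    """Remove forbidden emojis, including toned and ZWJ-joined variants."""
    return FORBIDDEN_PATTERN.sub("", text)
```

This sketch only catches sequences that start with a forbidden base; a ZWJ sequence whose first component is allowed would need extra handling.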

Highlights / Technical details

  • GenerateContentConfig forces JSON output (GuardrailOutput) with temperature=0.1 and up to 1000 tokens.
  • CallbackContext.state now stores guardrail_blocked, guardrail_message and guardrail_reasoning for auditing.
  • The guardrail LLM reuses settings.agent_model, with no extra configuration beyond google_cloud_project/location.
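
The flow above (parse the structured verdict, record it in state, short-circuit on unsafe) can be sketched with the standard library alone. This is not the PR's code: the GuardrailOutput field names decision and reasoning, the helper names, and the use of a plain dict in place of CallbackContext.state are assumptions for illustration. Only the three state keys, the unsafe value, and blocking_response come from the description.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class GuardrailOutput:
    # Field names are assumed; the PR only names the GuardrailOutput type.
    decision: str  # assumed values: "safe" or "unsafe"
    reasoning: str = ""
    blocking_response: str = ""


def parse_guardrail_output(raw_json: str) -> GuardrailOutput:
    """Parse the JSON text that the guardrail LLM is forced to emit."""
    data = json.loads(raw_json)
    return GuardrailOutput(
        decision=data["decision"],
        reasoning=data.get("reasoning", ""),
        blocking_response=data.get("blocking_response", ""),
    )


def apply_guardrail(state: dict[str, Any], verdict: GuardrailOutput) -> Optional[str]:
    """Record the verdict for auditing; return a reply only when blocking."""
    blocked = verdict.decision == "unsafe"
    state["guardrail_blocked"] = blocked
    state["guardrail_reasoning"] = verdict.reasoning
    state["guardrail_message"] = verdict.blocking_response if blocked else ""
    # Returning None would let the main model call proceed as usual.
    return verdict.blocking_response if blocked else None
```

In a real before_model_callback the returned string would presumably be wrapped in the framework's response type rather than returned bare.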

Tests / Checklist

  • uv run ruff check
  • uv run ty check
  • uv run pytest tests/test_governance_emojis.py

Blocking example

[Three screenshots attached: image.png]

A8080816 added 7 commits 2026-03-10 00:48:47 +00:00
A8080816 added 1 commit 2026-03-10 00:49:12 +00:00
A8080816 added 1 commit 2026-03-10 01:02:30 +00:00
Merge branch 'main' into feature/before-guardrail
Some checks failed
CI / ci (pull_request) Failing after 12s
0f06e106da
A8080816 added 1 commit 2026-03-10 01:07:35 +00:00
style(governance): reformat guardrail module
Some checks failed
CI / ci (pull_request) Failing after 13s
db9400fcf3
A8080816 added 1 commit 2026-03-10 01:13:17 +00:00
refactor(governance): type annotate forbidden emojis and reuse regex pattern
All checks were successful
CI / ci (pull_request) Successful in 21s
5e60cffcfe
A8065384 added 1 commit 2026-03-11 23:11:36 +00:00
Merge branch 'main' into feature/before-guardrail
All checks were successful
CI / ci (pull_request) Successful in 19s
0c790cc94e
A8080816 added a new dependency 2026-03-12 15:32:55 +00:00
A8080816 changed title from feature: Add before Guardrail to WIP: feature: Add before Guardrail 2026-03-12 15:41:54 +00:00
A8080816 added 1 commit 2026-03-12 21:00:40 +00:00
feat(governance): load guardrail instruction from config
All checks were successful
CI / ci (pull_request) Successful in 21s
01610683db
A8080816 added 1 commit 2026-03-12 21:30:00 +00:00
fix(guardrails): censor user and model events when blocked
Some checks failed
CI / ci (pull_request) Failing after 12s
d92a75a393
A8080816 added 1 commit 2026-03-12 23:16:19 +00:00
fix(session): skip current model event when censoring previous user message
Some checks failed
CI / ci (pull_request) Failing after 12s
6ce548e718
This pull request is marked as a work in progress.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin feature/before-guardrail:feature/before-guardrail
git checkout feature/before-guardrail
2 Participants
Reference: va/agent#26