From 12e8c5504b772fd3870e718376e28a0226c9ee6b Mon Sep 17 00:00:00 2001
From: cyroxx <cyroxx@ccc-p.org>
Date: Tue, 3 Feb 2026 02:55:39 +0100
Subject: [PATCH] [no ci] update README

---
 README.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 616ca3d..7c2bdcb 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,50 @@
 # check_pa
 
-Check Personalausweis/Reisepass availability for Berlin.
\ No newline at end of file
+Monitor the Berlin Personalausweis/Reisepass (ID card/passport) status portal, solve the audio CAPTCHA with Whisper, and notify a webhook when new status information appears.
+
+## Features
+- Automates Firefox with Selenium to reach the Berlin appointment status page
+- Downloads and transcribes audio CAPTCHAs via Whisper, falling back between attempts
+- Normalizes the returned status text and emits structured data to any HTTP webhook
+- Includes tooling for collecting CAPTCHA samples and benchmarking transcription quality
+
+## Requirements
+- Python 3.12
+- Firefox + `geckodriver` in `$PATH` for Selenium
+- `ffmpeg` (needed by `openai-whisper`)
+- Optional: Tesseract OCR if you experiment with the image-based approach in `ocr/`
+
+## Setup
+1. Clone the repo and create a virtual environment: `python -m venv .venv && source .venv/bin/activate`
+2. Install runtime dependencies: `pip install -r requirements.txt`
+3. (Optional) Install development tooling such as pytest: `pip install -r dev-requirements.txt`
+4. Configure settings:
+	- Copy `settings.example.py` to `settings.py`
+	- Set `DOCUMENT_ID` (the identifier embedded in the Berlin status URL)
+	- Set `WEBHOOK_URL` pointing to the service that should receive status payloads
+
+## Usage
+Run `python main.py` to start a polling cycle. The script will:
+1. Launch Firefox (set `USE_HEADLESS_MODE = True` in `main.py` for CI/servers)
+2. Download the audio CAPTCHA into `audio_captchas/`
+3. Transcribe it with Whisper via `transcription.py`
+4. Parse the resulting status page and post `{status, last_updated}` to your webhook
+
+Helpful utilities:
+- `test_transcription()` inside `main.py` evaluates every MP3 file in `audio_captchas/` and writes the results to `transcription_results.csv`
+- `test_parse_status_page()` parses the fixtures in `sample_html/` to validate the BeautifulSoup logic
+- `ocr/recognize*.py` contain earlier OCR experiments for the visual CAPTCHA
+
+## Testing
+- Install dev tooling: `pip install -r dev-requirements.txt`
+- Run `pytest`
+
+## Data & Artifacts
+- `audio_captchas/` collects downloaded MP3 files for debugging and benchmarking
+- `captchas/` and the scripts in `ocr/` help capture and label the image CAPTCHAs
+- `sample_html/` and `samples/` hold anonymized HTML snapshots used for parsing tests
+
+## Troubleshooting
+- Whisper requires the `ffmpeg` binary; ensure `ffmpeg -version` works in the shell where you run the script
+- If Selenium cannot start, verify that `geckodriver --version` works and that the installed geckodriver release is compatible with your Firefox version
+- For webhook issues, point `WEBHOOK_URL` at a local listener such as `nc -l 8080` or a relay service such as `smee.io` to inspect outbound payloads
\ No newline at end of file