in order to allow for better efficiency and consistency

2026-02-05 02:24:37 +01:00

.gitea/workflows

add pytest

2026-02-03 02:42:46 +01:00

.vscode

add initial test cases for status page functionality

2026-02-03 01:12:06 +01:00

ocr

add ocr code

2026-02-03 00:59:15 +01:00

sample_html

add initial test cases for status page functionality

2026-02-03 01:12:06 +01:00

tests

add initial test cases for status page functionality

2026-02-03 01:12:06 +01:00

.gitignore

extend .gitignore

2026-02-03 00:58:58 +01:00

dev-requirements.txt

add pytest

2026-02-03 02:42:46 +01:00

main.py

feat: add model/MODEL_NAME parameters

2026-02-05 02:24:37 +01:00

pytest.ini

add initial test cases for status page functionality

2026-02-03 01:12:06 +01:00

README.md

[no ci] update README

2026-02-03 02:55:39 +01:00

requirements.txt

add working implementation

2026-02-03 00:58:30 +01:00

settings_example.py

enable WEBHOOK_URL in settings example

2026-02-03 02:52:14 +01:00

transcription.py

feat: add model/MODEL_NAME parameters

2026-02-05 02:24:37 +01:00

README.md

check_pa

Monitor the Berlin Perso/Passport portal, crack the audio CAPTCHA with Whisper, and notify a webhook when fresh status information drops.

Features

Automates Firefox with Selenium to reach the Berlin appointment status page
Downloads and transcribes audio CAPTCHAs via Whisper, falling back between attempts
Normalizes the returned status text and emits structured data to any HTTP webhook
Includes tooling for collecting captcha samples and benchmarking transcription quality

Requirements

Python 3.12
Firefox + geckodriver in $PATH for Selenium
ffmpeg (needed by openai-whisper)
Optional: Tesseract OCR if you experiment with the image-based approach in ocr/

Setup

Clone the repo and create a virtual environment: python -m venv .venv && source .venv/bin/activate
Install runtime dependencies: pip install -r requirements.txt
(Optional) Add tooling such as pytest: pip install -r dev-requirements.txt
Provide credentials:
- Copy settings_example.py to settings.py
- Set DOCUMENT_ID (the identifier embedded in the Berlin status URL)
- Set WEBHOOK_URL pointing to the service that should receive status payloads
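
For orientation, a filled-in settings.py might look roughly like the sketch below. Only the two names DOCUMENT_ID and WEBHOOK_URL come from this README; the values are placeholders, and treating DOCUMENT_ID as a string is an assumption. settings_example.py in the repository is the authoritative template and may define further options.

```python
# settings.py (illustrative placeholders only; start from settings_example.py)

# Identifier embedded in the Berlin status URL.
# Assumption: stored as a string. Replace with your own value.
DOCUMENT_ID = "REPLACE_WITH_YOUR_DOCUMENT_ID"

# Endpoint that should receive status payloads (placeholder URL).
WEBHOOK_URL = "https://example.invalid/hooks/check_pa"
```

Keep settings.py out of version control, since it identifies your document.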

Usage

Run python main.py to start a polling cycle. The script will:

Launch Firefox (set USE_HEADLESS_MODE = True in main.py for CI/servers)
Download the audio CAPTCHA into audio_captchas/
Transcribe it with Whisper via transcription.py
Parse the resulting status page and post {status, last_updated} to your webhook
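
The last step, posting {status, last_updated} to the webhook, can be sketched with the standard library alone. This is not the code in main.py, which may use a different HTTP client or add more fields; build_webhook_request and post_status are hypothetical names, and sending the payload as JSON is an assumption.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, status: str, last_updated: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the status payload (hypothetical helper)."""
    body = json.dumps({"status": status, "last_updated": last_updated}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_status(webhook_url: str, status: str, last_updated: str, timeout: float = 10.0) -> int:
    """Send the payload and return the HTTP status code of the response."""
    request = build_webhook_request(webhook_url, status, last_updated)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending makes the payload easy to inspect in a test without touching the network.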

Helpful utilities:

test_transcription() inside main.py evaluates every mp3 in audio_captchas/ and writes transcription_results.csv
test_parse_status_page() parses the fixtures in sample_html/ to validate the BeautifulSoup logic
ocr/recognize*.py contains earlier OCR experiments for the visual CAPTCHA
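
The benchmarking idea behind test_transcription() can be illustrated with a small stand-alone loop. This is a sketch, not the function in main.py: benchmark_transcriptions is a hypothetical name, the CSV columns are assumed, and the transcriber is passed in as a callable so the example runs without Whisper installed.

```python
import csv
from pathlib import Path
from typing import Callable


def benchmark_transcriptions(audio_dir, output_csv, transcribe: Callable[[Path], str]) -> int:
    """Transcribe every .mp3 in audio_dir and write one CSV row per file.

    Returns the number of files processed. Column names are an assumption.
    """
    audio_files = sorted(Path(audio_dir).glob("*.mp3"))
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "transcription"])
        for audio_file in audio_files:
            writer.writerow([audio_file.name, transcribe(audio_file)])
    return len(audio_files)
```

In practice the callable would wrap whatever transcription.py exposes; passing it in keeps the loop independent of the model choice.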

Testing

Install dev tooling: pip install -r dev-requirements.txt
Run pytest

Data & Artifacts

audio_captchas/ collects downloaded mp3 files for debugging/benchmarking
captchas/ and ocr/ scripts help capture and label the image CAPTCHAs
sample_html/ and samples/ host anonymized HTML snapshots used for parsing tests

Troubleshooting

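Whisper needs the ffmpeg binary; ensure ffmpeg -version works in the shell you run the script from (ffmpeg is a system tool and is not installed into the venv)
If Selenium cannot start, verify that geckodriver --version works and that the geckodriver release is compatible with your installed Firefox version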
For webhook issues, run a tool like nc -l 8080 or smee.io to inspect outbound payloads
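
The first two checks can be automated with a short preflight script. This is a minimal sketch and not part of the repository: missing_tools is a hypothetical helper, and the default list assumes a firefox executable is on PATH, which is not the case on every platform.

```python
import shutil


def missing_tools(tools=("ffmpeg", "geckodriver", "firefox")) -> list[str]:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

This only confirms that the executables exist; it does not check version compatibility between geckodriver and Firefox.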