docs: update README and requirements for optional OCR dependencies
All checks were successful
Python tests / tests (push) Successful in 1m27s
@@ -15,12 +15,14 @@ Monitor the Berlin Perso/Passport portal, crack the audio CAPTCHA with Whisper,
 - Firefox + `geckodriver` in `$PATH` for Selenium
 - `ffmpeg` (needed by `openai-whisper`)
 - Optional: Tesseract OCR if you experiment with the image-based approach in `ocr/`
+- Optional: Python packages from `requirements-ocr.txt` when working on the OCR experiments
 
 ## Setup
 1. Clone the repo and create a virtual environment: `python -m venv .venv && source .venv/bin/activate`
 2. Install runtime dependencies: `pip install -r requirements.txt`
-3. (Optional) Add tooling such as pytest: `pip install -r dev-requirements.txt`
-4. Provide credentials:
+3. (Optional) Install OCR extras: `pip install -r requirements-ocr.txt`
+4. (Optional) Add tooling such as pytest: `pip install -r dev-requirements.txt`
+5. Provide credentials:
 - Copy `settings.example.py` to `settings.py`
 - Set `DOCUMENT_ID` (the identifier embedded in the Berlin status URL)
 - Set `WEBHOOK_URL` pointing to the service that should receive status payloads
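For orientation, the credentials step in the README hunk above could produce a `settings.py` along these lines. This is only a sketch: the two variable names come from the README, but both values are placeholders, and the actual contents of `settings.example.py` are not shown in this commit.

```python
# settings.py (hypothetical contents; copy settings.example.py and adjust)

# Identifier embedded in the Berlin status URL.
# Placeholder value: the real format is not shown in this commit.
DOCUMENT_ID = "REPLACE_WITH_YOUR_DOCUMENT_ID"

# Service that should receive status payloads.
# Placeholder URL: point this at your own webhook receiver.
WEBHOOK_URL = "https://example.invalid/webhook"
```

Whether `settings.py` is kept out of version control is not stated here, so treat that as something to check in the repository.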
requirements-ocr.txt (new file, 14 lines added)
@@ -0,0 +1,14 @@
+# Optional OCR dependencies
+
+# traditional OCR
+pillow
+pytesseract
+opencv-python
+tqdm
+streamlit
+python-Levenshtein
+
+# ocr with keras/tensorflow
+tensorflow
+keras
+matplotlib
@@ -5,15 +5,5 @@ beautifulsoup4
 # audio processing
 openai-whisper
 
-# traditional OCR
-pillow
-pytesseract
-opencv-python
-tqdm
-streamlit
-python-Levenshtein
-
-# ocr with keras/tensorflow
-tensorflow
-keras
-matplotlib
+# for sending results to webhook
+requests
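Since the OCR packages are now split out of `requirements.txt`, code under `ocr/` can no longer assume they are installed. One way to fail with a helpful message is sketched below. It is not part of this commit: the helper name `missing_ocr_packages` and the `OCR_MODULES` table are hypothetical, and the table only maps the distribution names from `requirements-ocr.txt` to their usual import names.

```python
from importlib.util import find_spec

# Distribution name (as listed in requirements-ocr.txt) -> importable module name.
OCR_MODULES = {
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "tqdm": "tqdm",
    "streamlit": "streamlit",
    "python-Levenshtein": "Levenshtein",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "matplotlib": "matplotlib",
}


def missing_ocr_packages(modules=OCR_MODULES, finder=find_spec):
    """Return the distribution names whose module cannot be located."""
    return sorted(dist for dist, module in modules.items() if finder(module) is None)


if __name__ == "__main__":
    missing = missing_ocr_packages()
    if missing:
        print("Missing optional OCR packages:", ", ".join(missing))
        print("Install them with: pip install -r requirements-ocr.txt")
    else:
        print("All optional OCR packages are importable.")
```

The check uses `find_spec` rather than importing each module, so it avoids the start-up cost of loading heavy packages such as `tensorflow` just to see whether they are present.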