The manual process that eats someone's Friday, the data trapped in three systems that won't talk, the report assembled by hand every month — Levelbrook builds the Python automation and ETL that makes it disappear. Billed corp-to-corp as a scoped project or ongoing staff augmentation.
A visual history pipeline for heavy-equipment rental: 20,000+ photos across 40 machines, embedded with DINOv2, reduced with UMAP, clustered with HDBSCAN, and auto-labelled zero-shot with CLIP — then served through a React viewer. A working, deployed Python ML system, not a notebook.
# 20k+ rental photos -> a browsable visual
# history, clustered by machine & viewpoint.
import torch, umap
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3
# dinov2, clip, centroids, EQUIPMENT_VIEWS are project helpers
# here, not the public APIs of those libraries
feats = dinov2.encode(photos)  # ViT-B/14
coords = umap.UMAP(n_neighbors=15).fit_transform(feats)
labels = HDBSCAN(min_cluster_size=12).fit_predict(coords)  # -1 = noise
# zero-shot names for each cluster via CLIP
names = clip.zero_shot(centroids(coords, labels),
                       prompts=EQUIPMENT_VIEWS)
Some of the highest-leverage Python work has no glamour at all: it's the script that turns a half-day of copy-paste into a scheduled job, the pipeline that normalizes messy partner data, the scraper that keeps an internal dataset fresh. Levelbrook builds that kind of automation and data engineering defensively, so it fails loudly and recovers cleanly instead of silently producing wrong numbers.
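A minimal sketch of what "fails loudly" can look like at the first step of a pipeline, assuming rows arrive from a CSV export with hypothetical `order_id` and `amount` columns: every row is checked, every problem is reported with its line number, and the run stops instead of passing partial data downstream. `ValidationError` is likewise a name chosen for illustration.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


class ValidationError(Exception):
    """Raised when input fails a check; the run stops instead of guessing."""


def validate_rows(rows):
    """Check every row, collect all problems, then fail once with the full list.

    Line numbers assume one record per physical line (no embedded newlines).
    Column names `order_id` and `amount` are hypothetical.
    """
    problems, clean = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        order_id = (row.get("order_id") or "").strip()
        if not order_id:
            problems.append(f"line {line_no}: missing order_id")
            continue
        raw_amount = (row.get("amount") or "").strip()
        try:
            amount = Decimal(raw_amount)  # exact decimal, no float rounding
        except InvalidOperation:
            problems.append(f"line {line_no}: amount {raw_amount!r} is not a number")
            continue
        clean.append({"order_id": order_id, "amount": amount})
    if problems:
        # Fail loudly: return nothing if any row is bad, so no partial batch moves on.
        raise ValidationError("; ".join(problems))
    return clean


sample = "order_id,amount\nA1,10.50\nA2,3\n"
print(validate_rows(csv.DictReader(io.StringIO(sample))))
```

Collecting every problem before raising means one failed run surfaces the whole list of bad lines, rather than one fix-and-rerun cycle per line.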
This is squarely in the wheelhouse: the Rails side of the practice has built import pipelines on AWS queueing that normalize enterprise-partner data at large daily volume, and the deployed equipment-cluster project is itself a multi-stage Python data pipeline over 20,000+ images.
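Normalizing partner data mostly means mapping each partner's spellings and formats onto one canonical schema, and refusing anything unrecognized. A minimal sketch, assuming dict records; the header aliases, canonical field names, and accepted date formats are hypothetical examples, not any real partner's spec.

```python
from datetime import date, datetime

# Hypothetical alias table: partner header spelling -> canonical field name.
HEADER_ALIASES = {
    "sku": "sku", "item_sku": "sku", "product code": "sku",
    "qty": "quantity", "quantity": "quantity", "units": "quantity",
    "date": "ship_date", "ship date": "ship_date", "shipped_on": "ship_date",
}
# Hypothetical date formats seen across partners, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(value: str) -> date:
    """Return the first successful parse; unknown formats raise instead of guessing."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def normalize(record: dict) -> dict:
    """Map one partner record onto the canonical schema with typed values."""
    out = {}
    for raw_key, value in record.items():
        key = HEADER_ALIASES.get(raw_key.strip().lower())
        if key is None:
            # A new or renamed column stops the run rather than being silently dropped.
            raise KeyError(f"unmapped column: {raw_key!r}")
        out[key] = value
    out["sku"] = out["sku"].strip().upper()
    out["quantity"] = int(out["quantity"])
    out["ship_date"] = parse_date(out["ship_date"])
    return out
```

Ambiguous dates such as `03/04/2024` are a per-partner decision; the order of `DATE_FORMATS` encodes that assumption, so it belongs in reviewed configuration rather than buried in code.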
Automation that silently breaks is worse than no automation. Every pipeline ships with validation, clear failure modes, logging, and re-run safety — and documentation so your team can operate it. Billed corp-to-corp through Levelbrook LLC, scoped as a project or run hourly.
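Re-run safety usually means making the load step idempotent, so running the same batch twice leaves the target exactly as running it once. A minimal sketch with the standard library's `sqlite3`, assuming an upsert keyed on the source system's natural key (SQLite's `ON CONFLICT ... DO UPDATE`, available since SQLite 3.24); the `invoices` table and its columns are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("etl.load")

# Hypothetical target table keyed on the source system's own ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS invoices (
    invoice_id  TEXT PRIMARY KEY,   -- natural key from the source system
    customer    TEXT NOT NULL,
    total_cents INTEGER NOT NULL
)
"""

# Upsert: a row that already exists is overwritten, never duplicated.
UPSERT = """
INSERT INTO invoices (invoice_id, customer, total_cents)
VALUES (:invoice_id, :customer, :total_cents)
ON CONFLICT(invoice_id) DO UPDATE SET
    customer = excluded.customer,
    total_cents = excluded.total_cents
"""


def load_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Load a batch in one transaction; loading the same batch twice changes nothing."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back every row of the batch on any error
        conn.executemany(UPSERT, rows)
    log.info("loaded %d rows", len(rows))
```

Because each batch runs in a single transaction, a failure halfway through leaves no half-loaded batch behind, and recovery is simply re-running the job.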
Move and reshape data between systems with validation, idempotent re-runs, and clear failure modes — using pandas, Polars, or plain Python where it fits.
Resilient scrapers and ingestion jobs that survive the messy real world and keep an internal dataset current.
The monthly spreadsheet or PDF, generated on a schedule with the same numbers every time.
Small CLIs, scripts, and services that take a recurring manual step off a team's plate for good.
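For the scheduled report, "the same numbers every time" mostly comes down to determinism: the output is a pure function of the input rows and the reporting period, with stable row order and explicit rounding. A minimal sketch with the standard library, assuming hypothetical input rows that carry a `region`, an ISO-format `date` string, and a decimal `amount` string.

```python
import csv
import io
from collections import defaultdict
from decimal import ROUND_HALF_UP, Decimal


def monthly_report(rows, year: int, month: int) -> str:
    """Return the report as CSV text; same rows and same period give the same bytes."""
    prefix = f"{year:04d}-{month:02d}-"
    totals = defaultdict(Decimal)  # missing regions start at Decimal('0')
    for row in rows:
        if row["date"].startswith(prefix):  # the period is a parameter, never "today"
            totals[row["region"]] += Decimal(row["amount"])  # Decimal avoids float drift
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):  # stable row order regardless of input order
        cents = totals[region].quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        writer.writerow([region, str(cents)])
    return out.getvalue()
```

Passing the period in explicitly, instead of reading the clock, is what lets last month's report be regenerated later and still match byte for byte.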
A scoped build: understand the manual process, build the Python pipeline or script with validation and logging, schedule it, and hand it off documented. Often a few days to a couple of weeks.
Yes — resilient, respectful scraping and ingestion, built to handle layout changes and failures gracefully. We'll flag anything that raises legal or terms-of-service questions.
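Two habits behind "resilient, respectful" ingestion, sketched with the standard library only: honoring `robots.txt` via `urllib.robotparser`, and retrying transient network errors with exponential backoff and jitter, capped so a source that stays down still fails the run. The user-agent string, attempt count, and delays are illustrative assumptions, and `fetch` stands for whatever zero-argument callable actually downloads the page.

```python
import random
import time
import urllib.robotparser

USER_AGENT = "example-ingest-bot"  # hypothetical; identify your crawler honestly


def allowed(robots_txt: str, url: str) -> bool:
    """Parse an already-downloaded robots.txt and check whether url may be fetched."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def with_retries(fetch, attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call fetch(); on a network error wait base_delay * 2**n plus jitter, then retry.

    After the last attempt the original exception propagates, so the job
    fails loudly instead of returning partial data.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:  # ConnectionError, TimeoutError and urllib's URLError are OSError subclasses
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Passing `sleep` in as a parameter keeps the backoff testable without real waiting. Note that urllib's `HTTPError` is also an `OSError`, so a production version would skip retries for permanent errors such as HTTP 404; that filtering is left out of this sketch.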
pandas and Polars for transforms, requests/httpx and Playwright for ingestion, plus queues (Celery/RQ) and schedulers for anything that runs unattended.
Validation, explicit failure modes, logging, and alerting are part of every build — automation that fails loudly beats automation that quietly produces wrong output.
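One way to wire logging and alerting into an unattended job, as a sketch: a wrapper that runs the job, sanity-checks its output, and on any failure logs the traceback, calls an alert hook, and returns a non-zero exit code so the scheduler records the run as failed. The `alert` callable and the minimum-row threshold are hypothetical placeholders for whatever notification channel and checks a given pipeline needs.

```python
import logging

log = logging.getLogger("pipeline")


def run_job(job, alert, min_rows: int = 1) -> int:
    """Run job() and return a process exit code (0 = ok, 1 = failed).

    job() returns the number of rows it produced. Too few rows on a feed that
    should never be empty counts as a failure, not a quiet success.
    """
    try:
        rows = job()
        if rows < min_rows:
            raise RuntimeError(f"too few rows: got {rows}, expected >= {min_rows}")
    except Exception as exc:
        log.exception("job failed")        # full traceback goes to the logs
        alert(f"pipeline failed: {exc}")   # hypothetical hook: email, chat webhook, pager
        return 1
    log.info("job succeeded: %d rows", rows)
    return 0
```

A scheduled entry point would end with `sys.exit(run_job(my_job, alert=notify))`, where `my_job` and `notify` are the pipeline's own functions, so cron or a task queue sees the failure in the exit status as well as in the alert.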
Corp-to-corp through Levelbrook LLC — fixed-scope for a defined automation, hourly for ongoing data work. MSA / SOW / NDA / COI ready on day one.
Describe the process or the data problem. You'll get an honest read on how we'd automate it within one business day.