An LLM feature that works in the demo is easy; one that's reliable in production is the actual job. Levelbrook builds RAG pipelines, agents, embeddings, and vector search, plus the evals, retries, and guardrails that make them dependable. The deployed equipment-cluster vision-ML pipeline is proof of the applied-ML side.
A visual history pipeline for heavy-equipment rental: 20,000+ photos across 40 machines, embedded with DINOv2, reduced with UMAP, clustered with HDBSCAN, and auto-labelled zero-shot with CLIP — then served through a React viewer. A working, deployed Python ML system, not a notebook.
# 20k+ rental photos -> a browsable visual
# history, clustered by machine & viewpoint.
import torch, umap
from sklearn.cluster import HDBSCAN
# dinov2, clip, centroids, photos, EQUIPMENT_VIEWS are defined elsewhere (not shown)
feats = dinov2.encode(photos) # ViT-B/14
coords = umap.UMAP(n_neighbors=15).fit_transform(feats)
labels = HDBSCAN(min_cluster_size=12).fit_predict(coords)
# zero-shot names for each cluster via CLIP
names = clip.zero_shot(centroids(coords, labels),
    prompts=EQUIPMENT_VIEWS)
The gap in most AI and LLM work isn't the model; it's everything around it: chunking and retrieval that actually return the right context, evaluation so you know when a change made things worse, retries and fallbacks for when an API hiccups, and guardrails so the feature behaves. Levelbrook builds Python AI/LLM tooling with that reliability plumbing as a first-class concern, not an afterthought.
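As an illustration of the "retries and fallbacks" part, here is a minimal sketch of retrying a flaky call with exponential backoff and then falling back to a second provider. The function name `call_with_retries`, its parameters, and the idea of passing providers as zero-argument callables are assumptions made for this example, not Levelbrook's actual code.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(
    providers: Sequence[Callable[[], T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying each with exponential backoff.

    `providers` is a hypothetical list of zero-argument callables, for
    example a primary model API followed by a fallback model.
    """
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider()
            except Exception as exc:  # real code should catch only transient errors
                last_error = exc
                if attempt < max_attempts - 1:
                    # Delay doubles each attempt, plus up to 10% random jitter
                    delay = base_delay * (2 ** attempt)
                    sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("all providers failed") from last_error
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. In production the broad `except` would usually be narrowed to timeouts and rate-limit errors so that genuine bugs fail fast.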
The applied-ML credibility is deployed and public: equipment-cluster (above) embeds 20,000+ images with DINOv2, reduces with UMAP, clusters with HDBSCAN, and applies CLIP zero-shot labels — a real vision-ML pipeline wired into a usable viewer. The same engineering discipline carries into LLM and RAG work.
We'll tell you when an LLM is the wrong tool, when a simpler classifier or a SQL query beats a model, and when a feature isn't ready to ship. Billed corp-to-corp through Levelbrook LLC, scoped as a project or run as ongoing staff augmentation.
Ingestion, chunking, embedding, hybrid retrieval, and re-ranking — the difference between a RAG demo and one that answers correctly.
Multi-step agent workflows with tool calling, scoped and guarded so behavior stays predictable in production.
Embedding pipelines and vector search (pgvector, FAISS, hosted) for semantic search, similarity, and clustering.
Evaluation harnesses and tracing, plus applied vision-ML — wiring CLIP/DINOv2 into real pipelines like equipment-cluster.
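To make the hybrid retrieval and evaluation items above concrete, here is a small sketch that fuses a keyword ranking and a vector ranking with reciprocal rank fusion (one common way to do hybrid retrieval, chosen here for illustration) and scores the result with recall@k. The document ids, the function names, and the constant `k=60` are assumptions for this example rather than a description of any specific Levelbrook pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Hypothetical ranked outputs from a keyword search and a vector search
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])

# Hypothetical ground truth for one evaluation query
relevant = {"doc_a", "doc_c"}
keyword_recall = recall_at_k(keyword_hits, relevant, k=2)
fused_recall = recall_at_k(fused, relevant, k=2)
```

Running the same `recall_at_k` check over a fixed set of labelled queries before and after a retrieval change is one simple way to tell whether the change helped or hurt.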
The equipment-cluster project above is deployed and public — a DINOv2 → UMAP → HDBSCAN → CLIP vision-ML pipeline over 20,000+ images, served through a React viewer. It's a working system, not a notebook.
Provider-agnostic — Claude, OpenAI, and open models via PyTorch / Hugging Face. Embeddings and vision via open_clip, DINOv2, and the usual ecosystem. We pick based on the task and your constraints.
Yes — ingestion, chunking, embedding, retrieval, re-ranking, generation, and the evals to know it's actually working. Vector store can be pgvector, FAISS, or a hosted option.
Yes. Part of the value is honesty about when a simpler classifier, a SQL query, or no model at all is the better answer. We won't sell you an LLM you don't need.
Corp-to-corp through Levelbrook LLC — fixed-scope for a defined feature, hourly for ongoing AI work. MSA / SOW / NDA / COI ready on day one.
Describe what you want the model to do. You'll get an honest read — including whether AI is the right tool — within one business day.