All posts | Florins Tech log

Written by Florin — full-stack & AI engineer.

Latest articles

Online Evaluation Guardrails: Catching LLM Drift Before Users Do
September 29, 2025 · 1 min read

Offline evals are necessary, but not enough. Once a system is in production, behavior drifts. Online evals are how you catch that drift…

LangGraph, LangSmith, LangChain, evals, online evals, monitoring, production
Treating Prompts Like Code: A Production Prompt Lifecycle
September 23, 2025 · 2 min read

Prompt work feels like magic until it breaks. The day I started treating prompts like code, everything became easier to debug: versions…

LangGraph, LangSmith, LangChain, prompts, prompting, evals, testing
Building Golden Evaluation Datasets from Production Traces
September 17, 2025 · 6 min read

I used to run evals on whatever dataset I had around. It was noisy, and I kept chasing false regressions that were just bad data. The fix…

LangGraph, LangSmith, LangChain, AI agents, evals, datasets

Algorithms · 3 posts

Valid Parentheses
March 15, 2025 · 1 min read

Problem: Given a string containing just the characters '(', ')', '{', '}', '[', and ']', determine if the input string is valid. Description…

algorithms, stacks, parentheses
Climbing Stairs
March 08, 2025 · 2 min read

Problem: You are climbing a staircase with n steps. At each step, you can climb either 1 or 2 steps. Determine the number of distinct ways to…

algorithms, stairs
Anagram
March 01, 2025 · 2 min read

Problem: Given two strings, a and b, write a function to determine if a is an anagram of b. Description: This is a classic algorithm problem…

algorithms, anagram

LangChain · 6 posts

Schema-Driven Validation for Stable LLM Evaluations
September 11, 2025 · 4 min read

My earliest eval runs looked fine until they didn't: a single malformed row could flip a whole report. That was on me. The fix was simple…

LangGraph, LangSmith, LangChain, evals, datasets, schemas, testing
Why Averages Lie: Using Tags to Expose Hidden LLM Regressions
September 05, 2025 · 2 min read

Averages hide real problems. I learned that the hard way: a bundle recommender looked stable overall, but a single region-specific slice was…

LangGraph, LangSmith, LangChain, evals, datasets, tags, testing
Tracing LLM Pipelines: From Bug Report to Root Cause
August 30, 2025 · 2 min read

The fastest way I've found to debug an LLM app isn't logs. It's a trace. A good trace shows exactly where the chain broke, which inputs…

LangGraph, LangSmith, LangChain, tracing, observability, debugging

Node.js · 1 post