booting engineer.exe — session: 2026

I build systems that
think, scale, stay up.

I'm Sushant Satyam — ex-Indian Army officer turned engineer. I build agentic AI, synthetic-data pipelines, and distributed systems designed to perform when failure is not an option.



sushant@satyam: ~/whoami

$ whoami

engineer · ai tinkerer · systems thinker

$ cat /etc/interests

LLM agents & tool-use architectures
Synthetic data, privacy, eval frameworks
Algo trading (Indian markets, Zerodha)
Full-stack product engineering
Boring infra that makes the fun stuff possible

$ ls ~/now

synthforge/ codewithcolonel/ algo-trading/ writing/

$ echo $MISSION

"ship small, honest things that compound."

uptime: 10y+

// whoami

Built in battlefields. Refined in data.

Ex-Indian Army officer turned engineer. Years leading men, systems, and missions in some of the most demanding environments imaginable — artillery operations, counter-insurgency deployments — where decisions carried real consequences and pressure was a constant companion. Today that same precision fuels my work in Data Engineering, AI, and distributed systems.

I build scalable data platforms, intelligent automation, AI-driven architectures, and high-performance cloud systems — using SQL, Python, PySpark, AWS, Databricks, Kafka, Snowflake, and modern ML frameworks. My work revolves around transforming fragmented data into actionable intelligence, and designing systems that think, learn, and scale.

Beyond engineering, I'm deeply invested in Generative AI, Agentic AI, synthetic data generation, and building technology that augments human decision-making in real-world environments — from SynthForge (LLM-native synthetic tabular data) to CodeWithColonel (a public ML + AI agent curriculum, built in the open).

What the Army taught me still defines me: discipline under chaos. Ownership without excuses. Calmness under pressure. Mission-first thinking. And the ability to lead when the situation is uncertain.

> I don't just build pipelines and platforms. I build systems designed to perform when failure is not an option.

Agentic AI, built to operate

LLM agents, RAG, GraphRAG, tool-use, memory, evals. From SynthForge's LLM-augmented synthesis pipelines to agent architectures that route, plan, and recover — I build AI that performs under load, not just on a demo reel.

Data as the mission

Scalable platforms on SQL, Python, PySpark, AWS, Databricks, Kafka, Snowflake. Synthetic generation, privacy intelligence, semantic inference. If the data is wrong, the product is wrong — I spend the weeks upstream so everything downstream tells the truth.

Systems that don't blink

Distributed systems, cloud architecture, full-stack delivery — Python/Next.js/NestJS/Prisma. Built like the Army taught me: discipline under chaos, ownership without excuses, designed to perform when failure is not an option.

// selected work

Projects I've shipped, broken, and learned from.

A mix of AI/ML systems, trading infra, logistics, and full-stack product work. Open source projects link out to GitHub; private ones are sketched briefly.

// open source

4 repos · linked

2026·public·featured

SynthForge

LLM-native synthetic tabular data at scale


Generates high-fidelity synthetic datasets from tiny production samples. Combines Gaussian Copula, CTGAN, TVAE, TabSyn, and Diffusion with LLM-powered semantic inference, PII/MNPI detection, and business-rule extraction. Scales from 1K to 10M+ rows with a built-in eval framework for fidelity, utility, and privacy.

Python · CTGAN · Diffusion · Claude · LiteLLM · Presidio · Pandas

2026·public·featured

CodeWithColonel

21-day ML + AI agent curriculum, built in the open


A public, two-track learning repo: six AI/LLM projects covering prompt engineering, agent patterns, tool-routing, memory, and RAG — plus a 21-day ML curriculum from data cleaning to deep learning and recommenders. Progress over perfection.

Python · OpenAI · RAG · Agents · Scikit-learn

2025·public

prompt_eng

Prompt engineering notebook


A working notebook of prompt engineering patterns — structured output, role conditioning, evals, and tool-use — maintained as I iterate on LLM applications.

Prompt Engineering · LLMs

2026·public

ELBRouting-6899

Load-balancer routing exercise


A TypeScript exercise modeling elastic load-balancer routing semantics. A small, focused repo exploring the edge cases of path-based routing.

TypeScript

// private / work

10 repos · brief only

private·2026

SmartTender

AI-assisted tender discovery and bid prep

Ingests public tender corpora, extracts requirements, and drafts bid outlines with citation-grounded rationale. Built for procurement teams buried in PDFs.

Python · LLMs · RAG · OCR

private·2025

SuperLogistica

Logistics operations platform

A logistics platform experiment — dispatch, tracking, and ops surfaces for last-mile workflows.

TypeScript · Next.js

private·2025

OptiFlowAI

Smarter logistics, seamless supply chains

AI-driven optimization for logistics routing and supply-chain flows. Focused on where planners actually spend their time.

AI · Optimization · Logistics

private·2025

DAN_AGRO

Agri-tech: farm data and decision support

Agri-tech product work combining farm data capture with decision-support surfaces for growers and ops teams.

JavaScript · Agri-tech

private·2026

BookBridge

Reading / books product (web)

A web product exploring a books-centric UX. Built on the feature/web branch with a TypeScript stack.

TypeScript · Web

private·2026

pat-task

Take-home engineering task

A compact Python solution to a take-home engineering challenge — kept private but shippable.

Python

private·2026

Report Analytics Engine

Report analytics and insight engine

An engine for parsing, analyzing, and surfacing insights from report corpora — the plumbing behind a reporting product.

Python · Analytics

private·2023

DrishtiV3

Drishti — third iteration

Third-generation iteration of a long-running personal product line focused on vision / insight tooling.

CSS · Web

private·2023

drishtiSQL

Query layer for Drishti

SQL layer and query patterns supporting the Drishti product experiments.

SQL

private·2023

trim-adv

Early Python experiments

An older Python sandbox kept for reference — part of how I got here.

Python

// private repos: details intentionally light. happy to walk through any of these over a call.

// lab

Live experiments & interactive artifacts.

Small interactive things I've built or sketched — usually as a Claude artifact. Click through to play with them live.

demo·2026

TheBookLane — One Pager


A standalone one-pager demo for TheBookLane, a web product focused on sharing books and growing community.

UX/UI · Frontend · Demo

$ open /artifact/TheBookLane One-Pager _Standalone_.html

tool·2026

AI Research Reading Tracker


A personal tracker for 26 foundational AI/ML papers + Google's 5-Day AI Agents Intensive whitepapers — all in one place with checkboxes, topic filters, and direct links to every paper. From Attention Is All You Need → Chinchilla → LLaMA → RoPE → FlashAttention → RAG → InstructGPT → DPO → ReAct → DeepSeek-R1 → Scaling Monosemanticity, plus Google's agent series (Intro to Agents, MCP & Tool Use, Context Engineering, Agent Quality, Prototype to Production). Free, no login.

Transformers · Scaling Laws · Alignment · MoE · Agents · MCP · RAG · Reading List

$ open /artifact/ai-papers-tracker.html

visualization·2026

Agent Memory & RAG — A 9-Part Deep Dive


Soup-to-nuts walkthrough of how agent memory actually works under the hood, in nine parts:
Part 1: why agents need memory & the RAG problem statement
Part 2: embedding models (text → vectors, InfoNCE training)
Part 3: backpropagation, or how embedding weights get learned
Part 4: vector RAG (semantic retrieval)
Part 5: Graph RAG & entity extraction
Part 6: the Leiden algorithm for graph communities
Part 7: Microsoft GraphRAG, formalised
Part 8: HNSW, or how vector DBs search fast
Part 9: a grand unified architecture connecting every algorithm into one pipeline, with a complete formula reference

RAG · GraphRAG · Embeddings · Vector Search · HNSW · Leiden · Knowledge Graphs · Backprop · Agent Memory

$ open /artifact/agentic-memory.html

visualization·2026

Sinusoidal Positional Encoding — Geometry & Intuition


A complete, self-contained dark editorial explainer (Instrument Serif + DM Mono) that builds positional encoding from first principles across 6 sections: metric goals, single-wave ambiguity, unit-circle intuition, sin-vs-sin+cos comparison, full formula breakdown, and an interactive pair explorer with live computed values and wave-speed shifts for i=0→255.

Transformers · Positional Encoding · Sine/Cosine · Geometry · Interactive · Math Visualisation

$ open /artifact/sine-cosine-explainer.html
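
For a flavour of what the explainer builds up to, here is a minimal sketch of the standard sinusoidal encoding. It is not the artifact's own code: the function names are hypothetical, and d_model=512 with base 10000 is an assumption chosen so the pair index i runs 0→255 as in the explorer.

```python
import math


def positional_encoding(pos: int, d_model: int = 512, base: float = 10000.0) -> list[float]:
    """Return the sinusoidal encoding vector for a single position."""
    if d_model % 2:
        raise ValueError("d_model must be even so dimensions pair up as (sin, cos)")
    vec: list[float] = []
    for i in range(d_model // 2):
        # Pair i rotates on the unit circle at frequency 1 / base^(2i / d_model):
        # low i spins fast, high i spins slowly.
        angle = pos / base ** (2 * i / d_model)
        vec.extend((math.sin(angle), math.cos(angle)))
    return vec


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is a point on the unit circle, the dot product of two encodings depends only on the offset between the positions: `dot(positional_encoding(3), positional_encoding(7))` matches `dot(positional_encoding(10), positional_encoding(14))` up to floating-point error. A single sine wave cannot offer that, which is the ambiguity the sin-vs-sin+cos comparison addresses.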

visualization·2026

The Transformer Architecture — From Raw Words to Output Tokens


A complete first-principles walkthrough of the full Transformer pipeline with a dark editorial aesthetic: encoder/decoder big picture, embedding geometry, positional encoding, interactive attention visualiser, Add & Norm derivation, FFN intuition, decoder generation stepper, output softmax, and training dynamics with live loss curves.

Transformers · Attention · Encoder-Decoder · Positional Encoding · Deep Learning · Interactive · Education

$ open /artifact/transformer-architecture.html

interactive·2026

XGBoost Text Comparison


An interactive browser demo that compares two text inputs using similarity features (Jaccard, fuzzy ratio, n-gram overlap, length, common words) and explains how an XGBoost-style classifier would combine them into a final match score and verdict.

XGBoost · NLP · Text Similarity · Feature Engineering · Interactive · Classification

$ open /artifact/xgboost-text-comparison.html
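
As a rough illustration of the feature side of that demo, here is a minimal sketch using only the Python standard library. It is not the demo's code: every function name is hypothetical, `difflib.SequenceMatcher` stands in for whichever fuzzy-ratio measure the demo uses, and character trigrams are an assumed choice for the n-gram overlap.

```python
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenisation (an assumption; the demo may differ)."""
    return text.lower().split()


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_features(left: str, right: str) -> dict[str, float]:
    """Build the feature vector a trained classifier would consume."""
    lt, rt = set(tokens(left)), set(tokens(right))
    return {
        "jaccard": jaccard(lt, rt),
        "fuzzy_ratio": SequenceMatcher(None, left.lower(), right.lower()).ratio(),
        "ngram_overlap": jaccard(char_ngrams(left), char_ngrams(right)),
        "length_ratio": min(len(left), len(right)) / max(len(left), len(right), 1),
        "common_words": float(len(lt & rt)),
    }
```

Calling `similarity_features("the quick brown fox", "the quick red fox")` gives a word-level Jaccard of 0.6 (three shared words out of five distinct ones) and 3.0 common words. A gradient-boosted model would then learn how to weigh these features against labelled match and non-match pairs; that training step is deliberately left out here.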

visualization·2026

LiteLLM PyPI Supply Chain Attack — Incident Report


A forensic-style interactive report mapping the full attack chain from compromised GitHub Actions to malicious LiteLLM PyPI releases, suppression botnet behavior, indicators of compromise, and an actionable response checklist for engineering teams.

Supply Chain Security · Incident Response · PyPI · CI/CD · Threat Intelligence · LiteLLM

$ open /artifact/litellm-supply-chain-incident-report.html

visualization·2026

Data Mesh — The Architecture of Ownership


A visual deep dive into Data Mesh: why centralized data platforms bottleneck at scale, the four core principles (domain ownership, data as a product, self-serve platform, federated governance), and when this paradigm is the right fit.

Data Mesh · Data Architecture · Domain Ownership · Data Products · Federated Governance · Platform Engineering

$ open /artifact/data-mesh-architecture.html

visualization·2026

SynthForge — Synthetic Data Generation with LLM-Augmented Pipelines


A deep interactive walkthrough of SynthForge: six synthesis backends, LLM-augmented schema + privacy intelligence, and a five-layer evaluation stack for generating high-fidelity synthetic tabular data from small production samples.

Synthetic Data · LLMs · Diffusion · TabSyn · Privacy · Evaluation · Python

$ open /artifact/synthforge-synthetic-data.html

visualization·2026

Agentic SDLC — The New Operating Model


A strategic visual explainer of how software delivery shifts from human handoffs to agent-orchestrated execution: intent synthesis, parallel implementation, sentinel quality gates, MCP-enabled toolchains, and phased enterprise migration.

Agentic SDLC · MCP · Multi-Agent Systems · DevOps · Software Architecture · Enterprise AI

$ open /artifact/agentic-sdlc-operating-model.html

visualization·2026

RAG in 2025-2026 — State of the Art


An interactive research report on modern RAG: adoption trends, eight leading techniques (GraphRAG, Self-RAG, CRAG, Agentic RAG, and more), architecture trade-offs, key papers, enterprise adoption patterns, and persistent risk surfaces.

RAG · Agentic RAG · GraphRAG · LLM Research · Enterprise AI · AI Safety

$ open /artifact/rag-state-of-the-art-2026.html

// activity

Shipping, measurably.

A live snapshot of the last 365 days on github.com/sushantsatyam. Reflects public commits by default — private repo contributions show up too if enabled in GitHub profile settings.

last 365d: 1,371
longest streak: 21d
best day: 70 · May 11

$ git log --since=1.year --count-days


$ commits --by-month

last 12 months

Jul 103 · Aug · Sep · Oct · Nov · Dec · Jan · Feb · Mar 224 · Apr 629 · May · Jun

// stack

The tools I reach for.

A working set, not a museum. I'm opinionated about a few things and pragmatic about the rest.

languages

Python · TypeScript · JavaScript · SQL · Java

ai / ml

LLM agents · RAG · Prompt engineering · CTGAN / TVAE · Diffusion · Claude / OpenAI SDKs · Scikit-learn · PyTorch

backend

NestJS · FastAPI · Postgres · Prisma · Redis · REST · WebSockets

frontend

Next.js · React · Tailwind · Framer Motion

infra

Vercel · Docker · GitHub Actions · AWS

domains

Fintech / Algo Trading · Logistics · Agri-tech · Procurement · Data platforms

// contact

Let's build something useful.

I'm open to interesting problems — AI systems, data platforms, fintech, or anything where the domain is messy and the bar is high.

$ cat ./handshake.sh

email: sushant.satyam@gmail.com
github: github.com/sushantsatyam
linkedin: linkedin.com/in/sushant-satyam
medium: medium.com/@techdoctrinewithcolonel
web: www.techwithcolonel.com

$ echo "best way in: a short email with the problem you're solving."
