booting engineer.exe — session: 2026

I build systems that think, scale, stay up.

I'm Sushant Satyam — ex-Indian Army officer turned engineer. I build agentic AI, synthetic-data pipelines, and distributed systems designed to perform when failure is not an option.

$ currently
sushant@satyam: ~/whoami
$ whoami
engineer · ai tinkerer · systems thinker
$ cat /etc/interests
  • LLM agents & tool-use architectures
  • Synthetic data, privacy, eval frameworks
  • Algo trading (Indian markets, Zerodha)
  • Full-stack product engineering
  • Boring infra that makes the fun stuff possible
$ ls ~/now
synthforge/ codewithcolonel/ algo-trading/ writing/
$ echo $MISSION
"ship small, honest things that compound."
$
// whoami

Built in battlefields. Refined in data.

Ex-Indian Army officer turned engineer. I spent years leading men, systems, and missions in demanding environments (artillery operations, counter-insurgency deployments) where decisions carried real consequences and pressure was constant. Today that same precision fuels my work in data engineering, AI, and distributed systems.

I build scalable data platforms, intelligent automation, AI-driven architectures, and high-performance cloud systems — using SQL, Python, PySpark, AWS, Databricks, Kafka, Snowflake, and modern ML frameworks. My work revolves around transforming fragmented data into actionable intelligence, and designing systems that think, learn, and scale.

Beyond engineering, I'm deeply invested in Generative AI, Agentic AI, synthetic data generation, and building technology that augments human decision-making in real-world environments — from SynthForge (LLM-native synthetic tabular data) to CodeWithColonel (a public ML + AI agent curriculum, built in the open).

What the Army taught me still defines me: discipline under chaos. Ownership without excuses. Calmness under pressure. Mission-first thinking. And the ability to lead when the situation is uncertain.

> I don't just build pipelines and platforms. I build systems designed to perform when failure is not an option.

Agentic AI, built to operate

LLM agents, RAG, GraphRAG, tool-use, memory, evals. From SynthForge's LLM-augmented synthesis pipelines to agent architectures that route, plan, and recover — I build AI that performs under load, not just on a demo reel.

Data as the mission

Scalable platforms on SQL, Python, PySpark, AWS, Databricks, Kafka, Snowflake. Synthetic generation, privacy intelligence, semantic inference. If the data is wrong, the product is wrong — I spend the weeks upstream so everything downstream tells the truth.

Systems that don't blink

Distributed systems, cloud architecture, full-stack delivery — Python/Next.js/NestJS/Prisma. Built like the Army taught me: discipline under chaos, ownership without excuses, designed to perform when failure is not an option.

// selected work

Projects I've shipped, broken, and learned from.

A mix of AI/ML systems, trading infra, logistics, and full-stack product work. Open source projects link out to GitHub; private ones are sketched briefly.

// open source

4 repos · linked
2026·public·featured

SynthForge

LLM-native synthetic tabular data at scale
GitHub

Generates high-fidelity synthetic datasets from tiny production samples. Combines Gaussian Copula, CTGAN, TVAE, TabSyn, and Diffusion with LLM-powered semantic inference, PII/MNPI detection, and business-rule extraction. Scales from 1K to 10M+ rows with a built-in eval framework for fidelity, utility, and privacy.

Python · CTGAN · Diffusion · Claude · LiteLLM · Presidio · Pandas
2026·public·featured

CodeWithColonel

21-day ML + AI agent curriculum, built in the open
GitHub

A public, two-track learning repo: six AI/LLM projects covering prompt engineering, agent patterns, tool-routing, memory, and RAG — plus a 21-day ML curriculum from data cleaning to deep learning and recommenders. Progress over perfection.

Python · OpenAI · RAG · Agents · Scikit-learn
2025·public

prompt_eng

Prompt engineering notebook
GitHub

A working notebook of prompt engineering patterns — structured output, role conditioning, evals, and tool-use — maintained as I iterate on LLM applications.

Prompt Engineering · LLMs
2026·public

ELBRouting-6899

Load-balancer routing exercise
GitHub

A TypeScript exercise modeling elastic load-balancer routing semantics. A small, focused repo exploring the edge cases of path-based routing.

TypeScript

// private / work

10 repos · brief only
private·2026

SmartTender

AI-assisted tender discovery and bid prep

Ingests public tender corpora, extracts requirements, and drafts bid outlines with citation-grounded rationale. Built for procurement teams buried in PDFs.

Python · LLMs · RAG · OCR
private·2025

SuperLogistica

Logistics operations platform

A logistics platform experiment — dispatch, tracking, and ops surfaces for last-mile workflows.

TypeScript · Next.js
private·2025

OptiFlowAI

Smarter logistics, seamless supply chains

AI-driven optimization for logistics routing and supply-chain flows. Focused on where planners actually spend their time.

AI · Optimization · Logistics
private·2025

DAN_AGRO

Agri-tech: farm data and decision support

Agri-tech product work combining farm data capture with decision-support surfaces for growers and ops teams.

JavaScript · Agri-tech
private·2026

BookBridge

Reading / books product (web)

A web product exploring a books-centric UX. Built on the feature/web branch with a TypeScript stack.

TypeScript · Web
private·2026

pat-task

Take-home engineering task

A compact Python solution to a take-home engineering challenge — kept private but shippable.

Python
private·2026

Report Analytics Engine

Report analytics and insight engine

An engine for parsing, analyzing, and surfacing insights from report corpora — the plumbing behind a reporting product.

Python · Analytics
private·2023

DrishtiV3

Drishti — third iteration

Third-generation iteration of a long-running personal product line focused on vision / insight tooling.

CSS · Web
private·2023

drishtiSQL

Query layer for Drishti

SQL layer and query patterns supporting the Drishti product experiments.

SQL
private·2023

trim-adv

Early Python experiments

An older Python sandbox kept for reference — part of how I got here.

Python

// private repos: details intentionally light. happy to walk through any of these over a call.

// lab

Live experiments & interactive artifacts.

Small interactive things I've built or sketched — usually as a Claude artifact. Click through to play with them live.

demo·2026
TheBookLane — One Pager

A standalone one-pager demo for TheBookLane, a web product focused on sharing books and growing community.

UX/UI · Frontend · Demo
$ open /artifact/TheBookLane One-Pager _Standalone_.html
tool·2026
AI Research Reading Tracker

A personal tracker for 26 foundational AI/ML papers + Google's 5-Day AI Agents Intensive whitepapers — all in one place with checkboxes, topic filters, and direct links to every paper. From Attention Is All You Need → Chinchilla → LLaMA → RoPE → FlashAttention → RAG → InstructGPT → DPO → ReAct → DeepSeek-R1 → Scaling Monosemanticity, plus Google's agent series (Intro to Agents, MCP & Tool Use, Context Engineering, Agent Quality, Prototype to Production). Free, no login.

Transformers · Scaling Laws · Alignment · MoE · Agents · MCP · RAG · Reading List
$ open /artifact/ai-papers-tracker.html
visualization·2026
Agent Memory & RAG — A 9-Part Deep Dive

Soup-to-nuts walkthrough of how agent memory actually works under the hood, in nine parts:
  • Part 1: why agents need memory & the RAG problem statement
  • Part 2: embedding models (text → vectors, InfoNCE training)
  • Part 3: backpropagation, or how embedding weights get learned
  • Part 4: vector RAG (semantic retrieval)
  • Part 5: Graph RAG & entity extraction
  • Part 6: the Leiden algorithm for graph communities
  • Part 7: Microsoft GraphRAG, formalised
  • Part 8: HNSW, or how vector DBs search fast
  • Part 9: a grand unified architecture connecting every algorithm into one pipeline, with a complete formula reference

RAG · GraphRAG · Embeddings · Vector Search · HNSW · Leiden · Knowledge Graphs · Backprop · Agent Memory
$ open /artifact/agentic-memory.html
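
For a feel of what the vector-RAG and HNSW parts of that deep dive are about, here is a minimal sketch of exact (brute-force) cosine retrieval, the baseline that an approximate index such as HNSW is built to speed up. It is not taken from the artifact: the function names, the toy three-dimensional "embeddings", and the document ids are all made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Score every stored vector against the query and keep the k best ids.

    This scans the whole index on each query, which is the cost that
    approximate nearest-neighbour structures like HNSW try to avoid.
    """
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy index: real embeddings have hundreds of dimensions.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print(hits)
```

The exact scan is fine for a few thousand vectors; the trade-off HNSW makes is giving up guaranteed-exact results in exchange for not touching every vector.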
visualization·2026
Sinusoidal Positional Encoding — Geometry & Intuition

A complete, self-contained dark editorial explainer (Instrument Serif + DM Mono) that builds positional encoding from first principles across 6 sections: metric goals, single-wave ambiguity, unit-circle intuition, sin-vs-sin+cos comparison, full formula breakdown, and an interactive pair explorer with live computed values and wave-speed shifts for i=0→255.

Transformers · Positional Encoding · Sine/Cosine · Geometry · Interactive · Math Visualisation
$ open /artifact/sine-cosine-explainer.html
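
The formula that explainer builds up to is the standard sinusoidal encoding from "Attention Is All You Need". A minimal sketch in plain Python follows; it is an illustration rather than the artifact's own code, and `d_model = 512` is an assumption inferred from the pair index running i = 0 → 255. The `dot` helper is hypothetical and only there to show one geometric property: the dot product of two encodings depends on the offset between positions, not on the positions themselves.

```python
import math


def positional_encoding(pos: int, d_model: int = 512, base: float = 10000.0) -> list[float]:
    """Sinusoidal encoding of a single position as a d_model-long vector."""
    pe = []
    for i in range(d_model // 2):
        # Each pair i shares one frequency; larger i means a slower wave.
        angle = pos / (base ** (2 * i / d_model))
        pe.append(math.sin(angle))  # dimension 2i
        pe.append(math.cos(angle))  # dimension 2i + 1
    return pe


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


pe5, pe6 = positional_encoding(5), positional_encoding(6)
pe10, pe11 = positional_encoding(10), positional_encoding(11)

# Same offset (1) at different absolute positions gives the same similarity,
# because sin(a)sin(b) + cos(a)cos(b) = cos(a - b) for every sin/cos pair.
print(abs(dot(pe5, pe6) - dot(pe10, pe11)) < 1e-9)
```

Pairing sine with cosine is what makes this work: a single sine wave per frequency would leave positions ambiguous, while the pair places each position on a unit circle.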
visualization·2026
The Transformer Architecture — From Raw Words to Output Tokens

A complete first-principles walkthrough of the full Transformer pipeline with a dark editorial aesthetic: encoder/decoder big picture, embedding geometry, positional encoding, interactive attention visualiser, Add & Norm derivation, FFN intuition, decoder generation stepper, output softmax, and training dynamics with live loss curves.

Transformers · Attention · Encoder-Decoder · Positional Encoding · Deep Learning · Interactive · Education
$ open /artifact/transformer-architecture.html
interactive·2026
XGBoost Text Comparison

An interactive browser demo that compares two text inputs using similarity features (Jaccard, fuzzy ratio, n-gram overlap, length, common words) and explains how an XGBoost-style classifier would combine them into a final match score and verdict.

XGBoost · NLP · Text Similarity · Feature Engineering · Interactive · Classification
$ open /artifact/xgboost-text-comparison.html
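
The similarity features named in that demo can be sketched with the standard library alone. This is an illustration under stated assumptions, not the demo's code: `similarity_features` and its helpers are hypothetical names, `difflib.SequenceMatcher` stands in for whatever fuzzy-ratio implementation the demo uses, and the n-gram overlap is taken over character trigrams. A classifier such as XGBoost would then consume this feature dictionary as one row of input.

```python
from difflib import SequenceMatcher


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_features(left: str, right: str) -> dict[str, float]:
    """Pairwise features a match classifier could be trained on."""
    left_words = set(left.lower().split())
    right_words = set(right.lower().split())
    longest = max(len(left), len(right))
    return {
        "jaccard": jaccard(left_words, right_words),
        "fuzzy_ratio": SequenceMatcher(None, left.lower(), right.lower()).ratio(),
        "ngram_overlap": jaccard(char_ngrams(left), char_ngrams(right)),
        "length_ratio": min(len(left), len(right)) / longest if longest else 1.0,
        "common_words": float(len(left_words & right_words)),
    }


features = similarity_features("Acme Corp Ltd", "ACME Corporation Ltd")
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Keeping each feature in the 0 to 1 range (apart from the raw common-word count) makes the individual signals easy to read before any model combines them.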
visualization·2026
LiteLLM PyPI Supply Chain Attack — Incident Report

A forensic-style interactive report mapping the full attack chain from compromised GitHub Actions to malicious LiteLLM PyPI releases, suppression botnet behavior, indicators of compromise, and an actionable response checklist for engineering teams.

Supply Chain Security · Incident Response · PyPI · CI/CD · Threat Intelligence · LiteLLM
$ open /artifact/litellm-supply-chain-incident-report.html
visualization·2026
Data Mesh — The Architecture of Ownership

A visual deep dive into Data Mesh: why centralized data platforms bottleneck at scale, the four core principles (domain ownership, data as a product, self-serve platform, federated governance), and when this paradigm is the right fit.

Data Mesh · Data Architecture · Domain Ownership · Data Products · Federated Governance · Platform Engineering
$ open /artifact/data-mesh-architecture.html
visualization·2026
SynthForge — Synthetic Data Generation with LLM-Augmented Pipelines

A deep interactive walkthrough of SynthForge: six synthesis backends, LLM-augmented schema + privacy intelligence, and a five-layer evaluation stack for generating high-fidelity synthetic tabular data from small production samples.

Synthetic Data · LLMs · Diffusion · TabSyn · Privacy · Evaluation · Python
$ open /artifact/synthforge-synthetic-data.html
visualization·2026
Agentic SDLC — The New Operating Model

A strategic visual explainer of how software delivery shifts from human handoffs to agent-orchestrated execution: intent synthesis, parallel implementation, sentinel quality gates, MCP-enabled toolchains, and phased enterprise migration.

Agentic SDLC · MCP · Multi-Agent Systems · DevOps · Software Architecture · Enterprise AI
$ open /artifact/agentic-sdlc-operating-model.html
visualization·2026
RAG in 2025-2026 — State of the Art

An interactive research report on modern RAG: adoption trends, eight leading techniques (GraphRAG, Self-RAG, CRAG, Agentic RAG, and more), architecture trade-offs, key papers, enterprise adoption patterns, and persistent risk surfaces.

RAG · Agentic RAG · GraphRAG · LLM Research · Enterprise AI · AI Safety
$ open /artifact/rag-state-of-the-art-2026.html
// activity

Shipping, measurably.

A live snapshot of the last 365 days on github.com/sushantsatyam. Reflects public commits by default — private repo contributions show up too if enabled in GitHub profile settings.

last 365d: 1,371
active days: 97
current streak: 0d
longest streak: 21d
this month: 94
best day: 70 · May 11
$ git log --since=1.year --count-days
$ commits --by-month
last 12 months
Jul 59 · Aug 103 · Sep 5 · Oct 0 · Nov 3 · Dec 55 · Jan 72 · Feb 48 · Mar 75 · Apr 224 · May 629 · Jun 94
// stack

The tools I reach for.

A working set, not a museum. I'm opinionated about a few things and pragmatic about the rest.

languages
Python · TypeScript · JavaScript · SQL · Java
ai / ml
LLM agents · RAG · Prompt engineering · CTGAN / TVAE · Diffusion · Claude / OpenAI SDKs · Scikit-learn · PyTorch
backend
NestJS · FastAPI · Postgres · Prisma · Redis · REST · WebSockets
frontend
Next.js · React · Tailwind · Framer Motion
infra
Vercel · Docker · GitHub Actions · AWS
domains
Fintech / Algo Trading · Logistics · Agri-tech · Procurement · Data platforms
// contact

Let's build something useful.

I'm open to interesting problems — AI systems, data platforms, fintech, or anything where the domain is messy and the bar is high.