THE SIGNAL

System design stopped being theoretical. You can now simulate traffic spikes, kill caches mid-request, and watch cascading failures happen on screen before writing any code.

The gap between "whiteboard architecture" and "production reality" just got a tool that sits right in the middle. Build it visually, break it intentionally, fix it before deploy.

If you've ever drawn boxes and arrows on Excalidraw and wondered "but will this actually work under load?" ... now you can find out.

The news IT leaders crave

If your job touches cybersecurity, software, cloud, or IT operations, staying informed isn’t optional.

IT Brew is a free, four-times-a-week newsletter covering the trends shaping business tech—from infrastructure and strategy to the tools teams actually rely on.

Clear context. Focused coverage. Built for professionals running IT—not just talking about it.

-> Try it: paperdraw.dev

A drag-and-drop system design simulator. You assemble real backend components on a canvas, hit play, and watch traffic flow through your architecture in real time. Latency, throughput, error rates, cache-hit ratios. All visible. All live.

Then you break things on purpose.

Who it's for:

  • Devs prepping for system design interviews

  • Founders planning their first real backend

  • Engineers who want to chaos-test before production

What it actually does:

  1. Drag components onto a canvas: load balancers, caches, databases, queues, CDNs, serverless functions, LLM gateways, vector DBs, and about 20 more

  2. Wire them together and hit "Start Simulation"

  3. Watch real-time metrics flow through every node

  4. Inject failures: traffic spikes, cache crashes, component kills, latency injection

What it sucks at:

  • Still a side project, so expect rough edges

  • Can't export to actual infrastructure (it's a simulator, not IaC)

  • Pre-built templates are limited (YouTube Simplified is there, more coming)

THE CHAOS PLAYBOOK

Four scenarios worth running:

Traffic spike. Flood your entry point with 10x traffic. Watch where the bottleneck forms. Usually it's the single app server behind your load balancer. Add a second instance, rerun. See the difference.
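The math behind that bottleneck is worth internalizing. Here's a back-of-the-envelope sketch using the classic M/M/1 queueing approximation, with made-up capacity numbers — this is napkin math, not anything the simulator computes:

```python
# Why latency balloons near capacity: M/M/1 approximation W = 1 / (mu - lambda).
# Numbers are illustrative — an app server that handles 100 req/s starts
# queueing badly as arrivals approach that limit.

def avg_latency_ms(arrival_rate: float, capacity_per_server: float, servers: int) -> float:
    """Rough per-request latency, splitting traffic evenly across servers."""
    per_server = arrival_rate / servers
    if per_server >= capacity_per_server:
        return float("inf")  # overloaded: the queue grows without bound
    return 1000.0 / (capacity_per_server - per_server)

normal = avg_latency_ms(80, 100, servers=1)   # 80% utilization
spike  = avg_latency_ms(95, 100, servers=1)   # 95% utilization after the spike
scaled = avg_latency_ms(95, 100, servers=2)   # add a second instance, rerun
print(f"{normal:.0f} ms -> {spike:.0f} ms -> {scaled:.0f} ms")  # 50 ms -> 200 ms -> 19 ms
```

Note the shape: going from 80% to 95% utilization quadruples latency, while one extra instance brings it back down to almost nothing. Queues are brutally nonlinear near saturation.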

Cache crash. Kill Redis mid-simulation. Every request slams the database directly. Latency explodes. This is the thundering herd problem, visualized.
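Outside the simulator, the standard defense is a "single flight" cache fill: on a miss, only one request rebuilds the entry while everyone else waits for that result. A minimal in-process sketch (names and structure are made up for illustration):

```python
import threading

# Single-flight cache fill: fifty concurrent misses on a cold key should
# produce exactly one database read, not a stampede.

cache: dict = {}
locks: dict = {}
locks_guard = threading.Lock()
db_hits = 0  # counts requests that actually reach the database

def slow_db_read(key):
    global db_hits
    db_hits += 1
    return f"value-for-{key}"

def get(key):
    if key in cache:                      # fast path: cache hit
        return cache[key]
    with locks_guard:                     # one lock object per key
        lock = locks.setdefault(key, threading.Lock())
    with lock:                            # only one filler per key at a time
        if key not in cache:              # re-check after acquiring the lock
            cache[key] = slow_db_read(key)
    return cache[key]

threads = [threading.Thread(target=get, args=("user:42",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_hits)  # 1 — fifty concurrent misses, one database read
```

Without the per-key lock and the re-check, all fifty threads would miss simultaneously and slam the database — exactly the herd the simulation shows you.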

Component failure. Drop an app server. Does the load balancer reroute? If you only had one instance, everything dies. Redundancy stops being abstract when you see the error rate hit 100%.
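Here's roughly what the load balancer is doing under the hood — a toy round-robin router with health flags (illustrative only, not paperdraw's implementation):

```python
import itertools

# Round-robin load balancing with health checks: traffic skips instances
# marked down; when nothing is healthy, every request errors out.

class LoadBalancer:
    def __init__(self, instances):
        self.instances = instances          # name -> healthy? flag
        self._cycle = itertools.cycle(list(instances))

    def route(self):
        for _ in range(len(self.instances)):
            name = next(self._cycle)
            if self.instances[name]:        # skip unhealthy instances
                return name
        raise RuntimeError("all instances down: 100% error rate")

lb = LoadBalancer({"app-1": True, "app-2": True})
print([lb.route() for _ in range(4)])   # ['app-1', 'app-2', 'app-1', 'app-2']

lb.instances["app-1"] = False            # drop an app server
print([lb.route() for _ in range(2)])   # ['app-2', 'app-2'] — traffic reroutes

lb.instances["app-2"] = False            # no redundancy left
# lb.route() now raises — with a single instance, everything dies
```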

Latency injection. Slow down one downstream service. Watch upstream services queue up and cascade. This is why you need timeouts, circuit breakers, and message queues between services.
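A circuit breaker is the usual fix for that cascade: after enough consecutive failures it "opens" and fails fast, so callers stop piling up behind a slow dependency. A toy version (thresholds are arbitrary, real libraries add half-open probing and per-endpoint state):

```python
import time

# Minimal circuit breaker: trip after max_failures consecutive errors,
# then reject calls immediately until reset_after seconds have passed.

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None            # cooled off: allow a probe call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                    # any success resets the count
        return result

breaker = CircuitBreaker(max_failures=3)

def flaky():
    raise TimeoutError("downstream too slow")

for _ in range(3):
    try: breaker.call(flaky)
    except TimeoutError: pass

try:
    breaker.call(flaky)
except RuntimeError as e:
    print(e)   # circuit open: failing fast — the slow service never sees the call
```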

QUICK DECISION CHEAT SHEET

Reads are slow? Cache in front of the DB, not beside it.
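"In front" means the read path checks the cache before the database ever sees the request — the cache-aside pattern. A sketch with in-memory dicts standing in for Redis and the DB:

```python
# Cache-aside reads: check the cache first, fall through to the database
# on a miss, then populate the cache so the next read is fast.

cache = {}
database = {"user:1": {"name": "Ada"}}   # stand-in for a real DB (made-up data)
db_reads = 0

def read(key):
    global db_reads
    if key in cache:
        return cache[key]            # hit: the DB never sees the request
    db_reads += 1
    value = database[key]            # miss: one slow DB read...
    cache[key] = value               # ...then cache it for next time
    return value

read("user:1"); read("user:1"); read("user:1")
print(db_reads)  # 1 — repeated reads stop hitting the database
```

A cache "beside" the DB — one your code only consults sometimes, or after the DB call — buys you nothing on the hot path.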

Writes are crushing the DB? Sharding or a write-behind queue.
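Sharding fits in one function: hash the key, pick a shard, and no single primary absorbs every write. A sketch with made-up shard names:

```python
import hashlib

# Hash-based sharding: a deterministic key -> shard mapping spreads writes
# across N databases. Shard names here are illustrative.

SHARDS = ["db-0", "db-1", "db-2", "db-3"]

def shard_for(key: str) -> str:
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:1001"), shard_for("user:1002"))
```

The catch, which the one-liner hides: modulo hashing reshuffles almost every key when you add a shard, which is why real systems reach for consistent hashing.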

One crash kills everything? Load balancer + multiple instances.

Peak hour timeouts? Auto-scaling + rate limiting at the gateway.
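Rate limiting at the gateway usually means a token bucket: steady refill, bounded burst, shed the rest. A minimal version (parameters are arbitrary):

```python
import time

# Token bucket: requests spend a token, tokens refill at a fixed rate,
# and an empty bucket means reject fast (HTTP 429) instead of letting
# excess load time out somewhere downstream.

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # over the limit: shed the request

bucket = TokenBucket(rate=1, capacity=5)
burst = [bucket.allow() for _ in range(10)]
print(burst.count(True))   # 5 — everything beyond the burst capacity is shed
```

Auto-scaling handles the sustained part of the peak; the bucket handles the part that arrives faster than new instances can boot.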

Background jobs blocking API responses? Message queue + workers.
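The pattern in miniature: the handler enqueues and returns immediately, and workers do the slow part off the request path. Python's stdlib queue stands in for a real broker here:

```python
import queue
import threading

# Queue + workers: the API handler never blocks on the slow work.

jobs: "queue.Queue" = queue.Queue()
done = []

def handle_request(payload: str) -> str:
    jobs.put(payload)                 # enqueue; don't block the response
    return "202 Accepted"

def worker():
    while True:
        payload = jobs.get()
        if payload is None:           # sentinel: shut the worker down
            break
        done.append(f"processed {payload}")   # the slow part happens here
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers: w.start()

for i in range(5):
    print(handle_request(f"email-{i}"))   # each prints "202 Accepted" instantly

for _ in workers: jobs.put(None)
for w in workers: w.join()
print(len(done))  # 5 — all jobs processed in the background
```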

Multiple services need the same event? Pub/sub over point-to-point.
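And pub/sub in a dozen lines: one event published, every subscriber gets its own copy, and no producer needs to know who's listening. This is an in-process sketch — real systems put a broker like Kafka or SNS in the middle:

```python
from collections import defaultdict

# Pub/sub: subscribers register callbacks per topic; publish fans the
# event out to all of them. Compare point-to-point, where every producer
# would need an explicit wire to every consumer.

subscribers = defaultdict(list)        # topic -> list of callbacks

def subscribe(topic, callback):
    subscribers[topic].append(callback)

def publish(topic, event):
    for callback in subscribers[topic]:   # fan out to every subscriber
        callback(event)

emails, analytics = [], []
subscribe("order.created", emails.append)      # email service listens
subscribe("order.created", analytics.append)   # analytics service listens

publish("order.created", {"order_id": 1})
print(len(emails), len(analytics))   # 1 1 — one event, two independent consumers
```

Adding a third consumer is one `subscribe` call; nothing upstream changes.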

BOTTOM LINE

Start with four nodes: load balancer, app server, cache, database. Run the simulation. Then break it. Each failure teaches you something that would've taken a 2am incident to learn otherwise.

It's the fastest way to understand why backends are designed the way they are.

Until next week,
@speedy_devv

Keep Reading