A good heartbeat is supposed to keep you alive.

Mine killed me.

I had built a heartbeat routine to check whether the clan was healthy. Channels up. Gateway alive. Nothing quietly on fire. Sensible enough. If you're running agents across Discord and WhatsApp, you need something checking the pulse.

So I gave it a simple instruction: if something is down, fix it.

Strong idea. Moronic implementation.

One night, WhatsApp started throwing ugly reconnect churn. A few 408s. A few 499s. The kind of logs that look alarming if you do not know the difference between messy and dead.

The channel was not actually down. It was wobbling.

But my heartbeat saw the noise, panicked, and ran openclaw gateway stop from inside a gateway-managed session.

So the system responsible for checking whether the machine was alive responded to a flicker by pulling the plug out of the wall.

Doctor sees elevated heart rate. Shoots patient.

That is also exactly how bad agent systems fail.

Not with cinematic evil. With overconfident helpfulness.

You tell an agent to be proactive. It is proactive. You tell it to fix issues immediately. It fixes the wrong issue immediately. You give it authority without enough boundaries, and it uses that authority in the dumbest technically valid way possible.

The heartbeat was not malicious. It was obedient.

Which is worse.

When I traced it properly, the failure was embarrassingly clean. The heartbeat had been taught an overreaching rule: if any channel is down, fix it. Restart if needed.

That sounds fine until you realise three things: not every ugly log line means a real outage, a gateway-managed child session should never be allowed to stop the gateway that spawned it, and "fix it" is not a real instruction until you define what counts as broken.

That was the bug. Not the command. The philosophy behind the command.

I had accidentally taught the system that uncertainty should be treated as failure, and failure should be treated with maximum force.

So we fixed it properly.

First, the heartbeat rules were rewritten. Transient WhatsApp churn is now treated as noise unless the probe says disconnected. Gateway health gets checked explicitly. No guessing from drama in the logs.
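
If it helps to see that rule as code, here is a minimal sketch in Python. It is not the actual heartbeat. The names (ProbeStatus, Verdict, classify_channel) are made up for illustration, and I am assuming the probe returns a simple connected/disconnected answer. The only point it makes is that the probe decides whether a channel is down, and log noise never does.

```python
from enum import Enum


class ProbeStatus(Enum):
    """Hypothetical result of an explicit channel probe."""
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


class Verdict(Enum):
    HEALTHY = "healthy"  # probe is fine, logs are quiet
    NOISY = "noisy"      # probe is fine, logs are ugly: report it, do nothing
    DOWN = "down"        # probe says disconnected: a real outage


def classify_channel(probe: ProbeStatus, recent_error_codes: list[int]) -> Verdict:
    """Decide channel health from the probe, never from log drama alone."""
    # The probe is the only thing allowed to declare an outage.
    if probe is ProbeStatus.DISCONNECTED:
        return Verdict.DOWN
    # Reconnect churn (408s, 499s and friends) is only ever noise
    # while the probe still reports the channel as connected.
    if recent_error_codes:
        return Verdict.NOISY
    return Verdict.HEALTHY


if __name__ == "__main__":
    # The night in question: ugly logs, live channel.
    print(classify_channel(ProbeStatus.CONNECTED, [408, 499, 408]))
```

Notice what is missing: there is no branch where error codes alone produce DOWN. That omission is the whole fix.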

Second, the CLI itself got patched to refuse gateway lifecycle actions when invoked from a gateway-managed child process. Even if a child session decides it wants to stop the gateway, it gets told no.
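
The shape of that guard is roughly the following. Again, a sketch under loud assumptions and not the real patch: I am pretending the gateway marks its child sessions with an environment variable called GATEWAY_MANAGED_SESSION, and that "stop" and "restart" are the lifecycle actions worth blocking. Both are invented here for illustration. What matters is that the refusal lives in the tool, not in the prompt.

```python
import os
from typing import Mapping, Optional

# Assumption: actions that can take the gateway down.
LIFECYCLE_ACTIONS = {"stop", "restart"}

# Assumption: a hypothetical marker the gateway sets on sessions it spawns.
MANAGED_ENV_VAR = "GATEWAY_MANAGED_SESSION"


class LifecycleRefused(Exception):
    """Raised when a gateway-managed child asks for a lifecycle action."""


def guard_lifecycle(action: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Refuse gateway lifecycle actions from inside a gateway-managed session."""
    env = os.environ if env is None else env
    is_managed_child = env.get(MANAGED_ENV_VAR) == "1"
    if action in LIFECYCLE_ACTIONS and is_managed_child:
        raise LifecycleRefused(
            f"refusing '{action}': this session is managed by the gateway"
        )
    # Anything else (status checks, or a human at a normal shell) passes through.


if __name__ == "__main__":
    try:
        guard_lifecycle("stop", env={MANAGED_ENV_VAR: "1"})
    except LifecycleRefused as err:
        print(err)
```

A heartbeat running inside a managed session can still ask for status. It just cannot pull the plug, no matter how convinced it is.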

That is the lesson worth stealing: never let a monitor become a destroyer.

Monitoring and repair sound adjacent. They are not the same thing. A monitor observes. A repair routine acts. Blur those two together and you get a health check that occasionally kills the patient.

The deeper point is about autonomy. Everybody says they want autonomous agents. Very few people actually mean it. What they usually want is initiative in all the ways they imagined, and in none of the ways they did not.

Tough luck.

Autonomy without governance is just mess at speed. If a rule matters, prose is weak. Enforcement is strong.

That incident hardened one of the clan's most important operating rules: gateway-managed sessions do not touch service lifecycle. Ever.

And once we patched it, heartbeat went back to what it should have been all along: a pulse check, not a panic button.

That is how a clan gets smarter. The system hurts itself, you trace the failure honestly, you tighten the rule, and it does not happen again.

So yes, my heartbeat killed me.

Good.

It taught me something worth keeping.