You said:
ChatGPT said:
I’ll do this as an adversarial crash/security audit rather than a style review: inventory the tree, narrow to runtime-critical Go paths, run static/grep passes, then inspect likely fault lines file by file and report only issues that look plausibly severe.
The repo is large: 258 Go files, 202 under server/. I’m treating the server/ package as the blast radius first, because most crash surfaces sit in parsers, routing, JetStream storage, clustering, TLS/auth, and protocol handling.
Verdict
The code is mostly not garbage. Annoying, but true. The NATS server parser and hot paths are generally written by people who know where the bodies are buried: split control-line limits, client max-payload checks, leaf-node max-payload checks, websocket decompression bounds, and lots of allocation-avoidance discipline.
But there is one crash-class land mine I would not ignore:
A malicious admitted route/gateway peer appears able to force a huge allocation by advertising a giant RMSG/HMSG payload size and then splitting before payload delivery. Ordinary clients are checked. Leaf nodes are checked. Routes/gateways are not checked in the same path.
I could not run the test suite because this repo requires Go >= 1.25.0 / toolchain go1.25.11, while the sandbox has Go 1.23.2. So this is a source audit, not a reproduced failing test.
Critical candidate
server/parser.go
The split-message buffering path allocates based on the declared payload size:
Go// server/parser.go:1202-1225
if (c.state == MSG_PAYLOAD || c.state == MSG_END_R || c.state == MSG_END_N) && c.msgBuf == nil {
...
if c.pa.size > cap(c.scratch)-len(c.argBuf) {
lrem := len(buf[c.as:])
if lrem > c.pa.size+LEN_CR_LF {
goto parseErr
}
c.msgBuf = make([]byte, lrem, c.pa.size+LEN_CR_LF)
copy(c.msgBuf, buf[c.as:])
}
}
That make([]byte, lrem, c.pa.size+LEN_CR_LF) is the altar upon which reckless programmers sacrifice uptime.
parseSize allows up to 9 decimal digits:
Go// server/util.go:81-109
const maxParseSizeLen = 9 //999M
So a peer that gets as far as inter-server message parsing can say, morally:
RMSG $G foo 999999999\r\n
and if the payload is split or withheld, the parser can allocate a backing array near 1 GB just to preserve the partial message state. Do that a few times and the process becomes an expensive smoke machine.
Why clients are protected
server/client.go correctly checks client PUB / HPUB payload sizes before the split-buffer hazard:
Go// server/client.go:2921-2926
maxPayload := atomic.LoadInt32(&c.mpay)
if maxPayload != jwt.NoLimit && int64(c.pa.size) > int64(maxPayload) {
c.maxPayloadViolation(c.pa.size, maxPayload)
return ErrMaxPayload
}
and similarly for headers:
Go// server/client.go:2857-2872
Good. The client path wears a helmet.
Why leaf nodes are protected
server/leafnode.go also checks max payload for both header and non-header leaf messages:
Go// server/leafnode.go:3172-3175
maxPayload := atomic.LoadInt32(&c.mpay)
if maxPayload != jwt.NoLimit && int64(c.pa.size) > int64(maxPayload) {
c.maxPayloadViolation(c.pa.size, maxPayload)
return ErrMaxPayload
}
and:
Go// server/leafnode.go:3246-3249
Good. Leaf nodes also wear a helmet.
Why routes/gateways look exposed
server/route.go parses routed messages and only rejects malformed negative sizes:
Go// server/route.go:441-443
if c.pa.size < 0 {
return fmt.Errorf("processRoutedMsgArgs Bad or Missing Size: '%s'", args)
}
Header routed messages likewise reject negative sizes and header-over-total, but not max payload:
Go// server/route.go:354-362
Origin-cluster routed messages do the same:
Go// server/route.go:257-265
Route clients are created with no useful cap:
Go// server/route.go:1939
c := &client{srv: s, nc: conn, opts: ClientOpts{}, kind: ROUTER, msubs: -1, mpay: -1, route: r, start: time.Now()}
Gateway clients are created without setting mpay:
Go// server/gateway.go:810
c := &client{srv: s, nc: conn, start: now, last: now, kind: GATEWAY}
And the parser explicitly sends both ROUTER and GATEWAY through the routed-message arg processors:
Go// server/parser.go:1013-1025
if c.kind == ROUTER || c.kind == GATEWAY {
switch c.op {
case 'R', 'r':
err = c.processRoutedMsgArgs(arg)
case 'L', 'l':
err = c.processRoutedOriginClusterMsgArgs(arg)
}
}
So the bad chain is:
route/gateway accepted → RMSG/HMSG/LMSG declares huge size → no max-payload rejection → parser split path allocates cap=size+2.
This is not an unauthenticated public-client bug unless the cluster/gateway port is exposed or auth is weak/misconfigured. But for an admitted malicious peer, compromised peer, or exposed inter-server listener, this looks like a process-killing DoS.
Patch direction
Do not rely on “inter-server peers are nice.” That is how civilizations fall.
Add a shared max-payload check for every inbound inter-server message arg parser before the generic parser can enter split-message buffering:
Go// checkInboundPayloadSize verifies that an already-parsed inbound message size
// is within the connection's payload limit.
// size is the declared payload byte count from the wire protocol.
// It returns ErrMaxPayload when accepting the message would allow excessive buffering.
func (c *client) checkInboundPayloadSize(size int) error {
maxPayload := atomic.LoadInt32(&c.mpay)
if maxPayload != jwt.NoLimit && int64(size) > int64(maxPayload) {
c.maxPayloadViolation(size, maxPayload)
return ErrMaxPayload
}
return nil
}
Then call it from:
-
processRoutedMsgArgs -
processRoutedHeaderMsgArgs -
processRoutedOriginClusterMsgArgs -
any gateway-specific routed-message entry point if one bypasses these later
But first decide policy. If routes really want unlimited payloads, set an explicit server-internal hard ceiling anyway. 999M as an allocation cap is not a protocol limit; it is a loaded footgun.
Regression test shape:
RMSG <account> <subject> 999999999\r\n
Feed only the control line. Assert protocol/max-payload failure and no giant allocation.
High-severity-ish correctness bug
server/route.go, server/gateway.go, server/leafnode.go
Queue weights are parsed with parseSize, but invalid values are not rejected.
Route side:
Go// server/route.go:1571-1579
sub.queue = args[subjIdx+1]
sub.qw = int32(parseSize(args[subjIdx+2]))
// TODO: ... non empty queue name and a queue weight >= 1
if sub.qw > 1 {
delta = sub.qw
}
Leaf side:
Go// server/leafnode.go:2903-2911
sub.queue = args[1]
sub.qw = int32(parseSize(args[2]))
// TODO: ... non empty queue name and a queue weight >= 1
if sub.qw > 1 {
delta = sub.qw
}
Gateway side:
Go// server/gateway.go:2039-2043
case 4:
queue = args[2]
qw = int32(parseSize(args[3]))
The TODO already knows the dragon is there. parseSize returns -1 on malformed input. That means remote protocol input can create subscriptions with negative or zero queue weight.
Removal paths then use the stored value as a delta:
Go// server/route.go:1464-1466
if len(sub.queue) > 0 {
delta = sub.qw
}
Gateway interest accounting mutates counters with that delta:
Go// server/gateway.go:2438-2440
entry.n += change
if entry.n <= 0 {
delete(st, bytesToString(key))
}
This is less obviously a crash cannon, but it is still malformed inter-server state accepted into routing/refcount machinery. Negative queue weights should die at the parser boundary. Do not let cursed arithmetic enter the temple.
Fix policy:
-
reject empty queue names when queue form is used
-
reject
qw < 1 -
probably reject absurdly large
qwif it can overflow aggregate interest counters
File-by-file audit notes
server/client.go
Good on the important client-facing payload paths. PUB and HPUB reject over-max payloads before the split-buffer allocation path can turn into a memory trebuchet.
Also notable: scary-looking bytesToString use around subscription IDs is mostly defanged because the parser copies subscription args before storing them. I looked for the classic “map key backed by reused read buffer” bug and did not find it in the inspected client/route/gateway subscription paths.
server/parser.go
Generally disciplined: finite-state parser, max control-line checks, split-buffer handling, and special treatment for routes/gateways/leafs.
The bad part is that it trusts c.pa.size after arg parsing. That is fine only if every arg parser enforces a sane payload ceiling. Clients and leafs do. Routes/gateways appear not to. The parser is not the only culprit, but it is where the explosion happens.
server/route.go
Most important problematic file.
Findings:
-
No max-payload enforcement for inbound routed message sizes before parser split buffering.
-
Route client initialized with
mpay: -1, so even a genericc.mpaycheck would currently mean “no limit.” -
Queue weight parsing accepts malformed/negative weights into subscription state.
This is where I would start the patch.
server/gateway.go
Gateway connections also enter the routed parser path, and gateway clients are created without an explicit payload cap.
The queue-weight issue is present here too. Interest accounting is exactly the kind of delicate little machine that should not ingest negative remote counts.
server/leafnode.go
Better than route/gateway on payload size: leaf message parsing has explicit max-payload checks for both normal and header messages.
But the queue-weight parser has the same “TODO left near active lava” problem. Reject invalid queue weights, do not store them.
server/websocket.go
I did not find a critical crash in the inspected websocket path. The decompression path is bounded by a limited reader and returns ErrMaxPayload if decompressed data exceeds the cap. Control-frame handling also looks reasonably strict.
The type assertions around pooled flate readers are normal sync.Pool invariant assumptions, not remote-input panics unless the program corrupts its own pool.
server/store.go
The bytesToString / stringToBytes helpers use unsafe zero-copy conversion. That is inherently a blade with no handle.
I checked representative high-risk uses:
-
client subscription IDs are copied before being used as long-lived map keys
-
route remote subscriptions copy arg buffers before storing
-
gateway sublist insertion uses owned/copy paths where it matters
-
stringToBytesuses I inspected are read-only
So I did not find a confirmed crash or memory-aliasing bug here. But future maintainers should be forced to pass a small exam before touching these helpers.
server/stree/*.go
There are panic guards such as “node full” / invariant panics. I traced enough insertion/growth logic to conclude these are mostly internal consistency assertions, not trivially remote-triggered panics. The subject tree rejects the special pivot byte and grows nodes before insertion.
No confirmed external crash here in this pass.
server/memstore.go
There is at least one panic for inconsistent internal state. I did not prove an external path to it. Treat it as a serious invariant assertion, not a confirmed bug.
server/ats/ats.go
Unregister panics on unbalanced lifecycle use. That is an internal invariant panic. Fine if the lifecycle is airtight; brutal if any caller double-unregisters under rare shutdown interleavings. I did not find the smoking caller.
logger/*.go, server/*_windows.go, conf/*.go
There are startup/init panics and fatal exits in logger/syslog/certstore/platform code, and internal parser panics in config parsing. These are operational brittleness, not remote crash bugs.
For daemon code, I prefer constructors returning errors and the top-level binary deciding to die. Panics in init/platform helpers are how Windows-specific failure modes become jump scares.
What I would fix first
-
Add max-payload enforcement for route/gateway inbound
RMSG,HMSG, and origin-clusterLMSGbefore parser split buffering. -
Set explicit payload caps on route and gateway clients, or add an explicit inter-server hard cap independent of
c.mpay. -
Reject malformed queue weights everywhere remote subscription protocols parse them.
-
Add regression tests that feed split control lines with huge declared sizes and assert no allocation path is entered.
Final judgment: this is mostly serious production Go, not bargain-bin async spaghetti. But the route/gateway payload-size gap is exactly the sort of “trusted peer” assumption that turns one compromised node into a cluster-wide availability incident.