Go Code Review

nats-server-main.zip

Zip Archive

is this Go code any good? please go file by file to review the NATS server for any critical bugs that could cause crashes or other bad stuff

I’ll treat this as a critical-failure review, not a style review: first map the repo, run the cheap automated checks, then inspect suspicious files and summarize file-by-file only where there is meaningful risk.

The repo is large: about 192 Go files under server, test, logger, conf, and internals, around 425k lines including tests. I can’t run the full test suite in this environment because the module requires Go 1.25.11 and the sandbox has Go 1.23.2 with no network access to download the toolchain, so I’m switching to static review and targeted inspection.

A concrete issue has turned up: config integer suffixes (k, mb, gb, etc.) are multiplied after strconv.ParseInt without overflow checks. I verified x: 9223372036854775807k parses as -1000 instead of returning an error; that can bypass later option validation for fields such as max_payload.

I reviewed the uploaded nats-server-main.zip as a crash/DoS/security-stability pass, not as a style review.

I did find a few real issues worth fixing. I did not find an obvious unauthenticated remote crash in the core NATS protocol parser in this pass.

I could not run the full suite because the module requires Go 1.25.11, while the sandbox has Go 1.23.2 and no network access to download the toolchain. I did run the packages that were feasible locally, plus static scans for panic, unchecked dangerous IO, unsafe, ignored errors, unbounded reads, and config overflow paths.

Highest-priority findings

File	Severity	Finding
`conf/parse.go`	High	Integer suffix multiplication overflows silently.
`server/opts.go`	High	Several parsed numeric options accept negative or overflowed values with insufficient validation.
`logger/log.go`	Medium/High	Log rotation can `panic` if reopening the log file fails. Also ignores `os.Rename` error.
`server/accounts.go`	Medium	URL account resolver uses unbounded `io.ReadAll` on remote HTTP response.
`server/ocsp.go` / `server/certidp/ocsp_responder.go`	Medium	OCSP HTTP responses are read with unbounded `io.ReadAll`.
`server/filestore.go` / `server/ocsp_responsecache.go`	Medium	S2 decompression uses unbounded `io.ReadAll`; corrupt or malicious compressed local state could cause memory exhaustion.
`logger/syslog_windows.go`, `server/pse/*_windows.go`, `server/sysmem/mem_windows.go`, `server/certstore/certstore_windows.go`	Medium	Windows init/setup code panics on OS API load/open failures instead of returning errors.

File-by-file findings

`conf/parse.go` — real bug: integer suffix overflow

In processItem, integer values are parsed with strconv.ParseInt, then suffixes are applied by multiplication:

Go
case "k":
	setValue(it, num*1000)
case "kb", "ki", "kib":
	setValue(it, num*1024)
...
case "eb", "ei", "eib":
	setValue(it, num*1024*1024*1024*1024*1024*1024)

There is no overflow check after multiplying.

I verified this behavior with a temporary test:


"x: 9223372036854775807k"  ->  -1000
"x: 9223372036854775807kb" ->  -1024
"x: -9223372036854775808k" ->  0

That is bad because the parser returns nil error and hands later config code a corrupted value.

Fix: use checked multiplication before setValue.

Go
func multiply_config_int(value int64, factor int64) (int64, error) {
	assert(factor > 0)
	if value > 0 && value > math.MaxInt64/factor {
		return 0, fmt.Errorf("integer value overflows int64")
	}
	if value < 0 && value < math.MinInt64/factor {
		return 0, fmt.Errorf("integer value underflows int64")
	}
	return value * factor, nil
}

Then each suffix case should call that helper and return a parse error on overflow.

`server/opts.go` — real validation gap

This file compounds the parser overflow issue. Examples:

Go
case "max_control_line":
	if v.(int64) > 1<<31-1 {
		...
	}
	o.MaxControlLine = int32(v.(int64))

case "max_payload":
	if v.(int64) > 1<<31-1 {
		...
	}
	o.MaxPayload = int32(v.(int64))

case "max_pending":
	o.MaxPending = v.(int64)

The code checks upper bounds in some places, but not lower bounds. So after conf/parse.go overflows to -1000, max_payload can become negative and still pass this block. validateOptions only checks:

Go
if int64(o.MaxPayload) > o.MaxPending {
	...
}

A negative MaxPayload does not trip that check. The server can start with a nonsense max payload and then reject essentially all payloads.

Also:

Go
return time.Duration(v.(int64)) * time.Second

in parseDuration can overflow time.Duration silently for huge numeric config values.

Fixes:

For fields where negative is not explicitly meaningful, reject <= 0 or < 0 before casting:

Go
func require_int32_range(field string, value int64, min int64, max int64) (int32, error) {
	assert(min <= max)
	if value < min || value > max {
		return 0, fmt.Errorf("%s must be in range [%d, %d], got %d", field, min, max, value)
	}
	return int32(value), nil
}

For numeric durations:

Go
func seconds_to_duration(field string, seconds int64) (time.Duration, error) {
	const max_seconds = int64(math.MaxInt64 / int64(time.Second))
	const min_seconds = int64(math.MinInt64 / int64(time.Second))
	if seconds < min_seconds || seconds > max_seconds {
		return 0, fmt.Errorf("%s overflows time.Duration", field)
	}
	return time.Duration(seconds) * time.Second, nil
}

`logger/log.go` — server can crash during log rotation

In fileLogger.Write:

Go
os.Rename(fname, bak)
fileflags := os.O_WRONLY | os.O_APPEND | os.O_CREATE
f, err := os.OpenFile(fname, fileflags, defaultLogPerms)
if err != nil {
	l.Unlock()
	panic(fmt.Sprintf("Unable to re-open the logfile %q after rotation: %v", fname, err))
}

Two problems:

os.Rename(fname, bak) error is ignored.
os.OpenFile failure causes a panic.

This can crash the server because of an operational condition: directory permissions changed, filesystem issue, disk problem, path disappeared, etc.

For infrastructure software, logging failure should not normally panic the process.

Fix: handle rename and reopen errors by disabling rotation, falling back to stderr, or returning the write error. Do not panic.

`server/accounts.go` — unbounded HTTP body read

URLAccResolver.Fetch does:

Go
body, err := io.ReadAll(resp.Body)

There is a timeout, but no response-size cap. A compromised or malicious account resolver endpoint can return a huge response and drive memory growth.

Fix: cap the body to the maximum plausible JWT/account claim size.

Go
const max_account_jwt_response = 1 << 20 // tune as needed

body, err := io.ReadAll(io.LimitReader(resp.Body, max_account_jwt_response+1))
if err != nil {
	return _EMPTY_, err
}
if len(body) > max_account_jwt_response {
	return _EMPTY_, fmt.Errorf("account resolver response too large")
}

`server/ocsp.go` and `server/certidp/ocsp_responder.go` — unbounded OCSP response reads

Both GET/POST OCSP flows use io.ReadAll(resp.Body) without a cap.

OCSP responses should be small. A malicious responder, captive proxy, or misconfigured override URL can return a very large body and cause memory pressure.

Fix: use a small limit, probably tens or hundreds of KiB.

`server/filestore.go` — unbounded decompression

StoreCompression.Decompress does:

Go
output, err := io.ReadAll(reader)

The compressed input comes from local store data. If the file is corrupt or attacker-controlled through some disk/snapshot path, this can decompress without an output bound.

Fix: carry expected uncompressed size in metadata, or limit decompression to the maximum block size plus checksum overhead.

`server/ocsp_responsecache.go` — unbounded decompression

Same pattern as filestore.go:

Go
output, err := io.ReadAll(reader)

For OCSP cache data, the expected decompressed size should be very small. Add an explicit cap.

`logger/syslog_windows.go` — panic on syslog/event-log setup

The Windows syslog/event logger panics on install/open failures:

Go
panic(fmt.Sprintf("could not access event log: %v", err))
panic(fmt.Sprintf("could not open event log: %v", err))

This is an operational failure path, not a programming invariant. It should be returned as an error and handled by startup/config code.

The Unix logger/syslog.go uses log.Fatalf for syslog connection errors, which also exits. That is arguably startup-fatal, but it is still harsh for logging fallback behavior.

`server/pse/pse_windows.go`, `server/sysmem/mem_windows.go`, `server/certstore/certstore_windows.go`

These Windows-specific files panic during init if DLL/procedure loading fails.

That means importing the package on a broken/minimal Windows environment can crash before the server has a chance to report a normal config/startup error.

This may be acceptable for “this platform API must exist” assumptions, but it is still crash-prone. Prefer lazy initialization with returned errors.

`server/memstore.go`

There is one deliberate invariant panic:

Go
panic("jetstream memstore has inconsistent state, can't find first seq msg")

I did not prove a path to trigger it externally. Treat this as a defensive invariant. It is fine if the invariant is actually impossible; otherwise it converts store inconsistency into whole-process death.

For server software, returning a store corruption error is usually better than panic unless the process state is unrecoverable.

`server/ats/ats.go`

Unregister panics on unbalanced unregister:

Go
panic("unbalanced unregister for access time state")

I checked the obvious memstore/filestore registration paths; they appear guarded against double-stop/double-unregister. No concrete bug found here.

`server/stree/*.go`

The subject tree has several defensive panics (node4 full, node16 full, leaf methods that should never be called, etc.). I patched only the Go 1.25-only strings.SplitSeq calls locally to run the tests under Go 1.23; server/gsl and server/stree tests passed.

I do not see a concrete crash path here from the static pass.

`server/gateway.go`

The two panics are from json.Marshal on internal structs:

Go
b, err := json.Marshal(g.info)
if err != nil {
	panic(err)
}

and similar for gateway connect info. For the current structs, marshal failure is effectively impossible unless unsupported fields are added later. I would still return/log an error instead of panicking, but this is not a practical bug as-is.

`server/websocket.go`

I specifically checked compressed WebSocket handling. It has both compressed-input and decompressed-output limits:

Go
limit := wsMaxMessageSize(mpay)
...
lr := io.LimitedReader{R: d, N: int64(mpay + 1)}

This looks good against compressed-message memory bombs.

`server/parser.go` / `server/client.go`

The split-message buffering path checks max control line and max payload before allocating large buffers. I did not find a concrete unauthenticated parser crash in this pass.

`server/store.go` and `server/stree/stree.go`

These use unsafe zero-copy string/byte conversions. The code comments acknowledge the lifetime constraints, and the major uses I checked either avoid mutation or copy on insert. This is a legitimate footgun, but I did not find a concrete crash from it.

Automated checks I ran

Passed:


go test -race ./conf ./server/avl ./server/ats ./internal/fastrand
go vet ./conf ./server/avl ./server/ats ./internal/fastrand

Also passed after a local compatibility patch replacing Go 1.25 strings.SplitSeq with strings.Split solely to run under Go 1.23:


go test ./server/gsl ./server/stree
go vet  ./server/gsl ./server/stree

Failed/not runnable in this sandbox:


go test ./...

Reason: repo requires Go 1.25.11; sandbox has Go 1.23.2 and no network access to download the toolchain/dependencies.

Bottom line

This codebase is generally disciplined for a large network server: the core protocol parser and WebSocket decompression paths have sane limits, and many scary-looking panics are defensive invariants.

The things I would fix first are:

Config integer suffix overflow in conf/parse.go.
Negative/range validation gaps in server/opts.go.
Remove panic from log rotation in logger/log.go.
Add hard byte limits around remote io.ReadAll calls.
Add decompression output limits for local compressed state/cache.

ChatGPT can make mistakes. OpenAI doesn't use Workspace workspace data to train its models.

You said:

ChatGPT said:

Highest-priority findings

File-by-file findings

conf/parse.go — real bug: integer suffix overflow

server/opts.go — real validation gap

logger/log.go — server can crash during log rotation

server/accounts.go — unbounded HTTP body read

server/ocsp.go and server/certidp/ocsp_responder.go — unbounded OCSP response reads

server/filestore.go — unbounded decompression

server/ocsp_responsecache.go — unbounded decompression

logger/syslog_windows.go — panic on syslog/event-log setup

server/pse/pse_windows.go, server/sysmem/mem_windows.go, server/certstore/certstore_windows.go

server/memstore.go

server/ats/ats.go

server/stree/*.go

server/gateway.go

server/websocket.go

server/parser.go / server/client.go

server/store.go and server/stree/stree.go