# I ported one program to 10 languages to see how an LLM thinks
I asked Claude to port a 500-line Go bank statement analyzer to 10 languages; 8 ports succeeded and 2 (Pascal, OCaml) produced zero lines. Same logic, same tests, different stdlibs. The goal wasn't the code — it was watching the LLM's internal process: where it hesitated, where it didn't, and whether it could predict its own performance.
It could.
## The scoreboard
| Language | Lines | Deliberation | Fixups | Web searches | Confidence |
|---|---|---|---|---|---|
| Python + uv | 253 | 2% | 0 | 0 | 100% |
| Nim | 299 | 20% | 1 | 0 | 95% |
| F# | 353 | 15% | 0 | 0 | 95% |
| Crystal | 363 | 15% | 0 | 0 | 93% |
| Odin | 432 | 80% | 2 | 4 | 70% |
| Java 8 | 495 | 3% | 0 | 0 | 100% |
| JS (Node+JSDoc) | 524 | 5% | 0 | 0 | 100% |
| Go (zero-dep) | 540 | 5% | 0 | 0 | 100% |
| Pascal | — | — | — | — | fine |
| OCaml | — | — | — | — | ~85% |
“Deliberation” = fraction of thinking time spent worrying about the language rather than writing code.
## The interesting finding: compound uncertainty
Each Odin stdlib call carried ~75% confidence. Reasonable. But compound that across 15 such calls in a 432-line program:

0.75^15 ≈ 1.3% chance of all-correct

That's why Odin needed 4 web searches and 2 fixups while Nim (same program, similar language tier) needed zero searches and only one trivial fixup. The bottleneck isn't any single unknown — it's the product of many small ones.
| Language | Confidence/call | Stdlib calls | P(all correct) | Outcome |
|---|---|---|---|---|
| Go | 100% | 15 | 100% | 0 fixups, 0 searches |
| Python | 100% | 12 | 100% | 0 fixups, 0 searches |
| Nim | 95% | 12 | 54% | 1 trivial fixup |
| Odin | 75% | 15 | 1.3% | 2 fixups, 4 searches |
The math predicted the outcomes almost exactly.
## The predicted finding: confidence calibration works
Before each port, the LLM predicted its confidence, expected fixup count, and deliberation level. After writing, those predictions were compared against the actuals.
| Language | Predicted confidence | Actual | Predicted fixups | Actual | Predicted deliberation | Actual |
|---|---|---|---|---|---|---|
| Go | 100% | 100% | 0 | 0 | 5% | 5% |
| Python | 100% | 100% | 0 | 0 | 2% | 2% |
| Nim | 95% | 95% | 0–1 | 1 | ~20% | 20% |
| F# | 95% | 95% | 0–1 | 0 | 25–35% | 15% |
| Crystal | 93% | 93% | 0–1 | 0 | ~15% | 15% |
| Odin | 70% | 70% | 1–3 | 2 | ~80% | 80% |
| JS | 100% | 100% | 0 | 0 | ~5% | 5% |
| Java 8 | 100% | 100% | 0 | 0 | ~3% | 3% |
Confidence and fixup predictions: near-perfect. F# deliberation was the only miss — predicted 25–35%, actual 15%. Paradigm choices (pipe vs loop, Map vs Dictionary) turned out to be obvious. Deliberation is driven by API uncertainty, not conceptual difficulty.
Line count predictions, however, were consistently wrong:
| Language | Predicted lines | Actual | Error (vs nearest bound) |
|---|---|---|---|
| Python | 150–200 | 253 | +27% |
| F# | 200–280 | 353 | +26% |
| JS | 260–310 | 524 | +69% |
| Java 8 | 650–800 | 495 | −24% |
The LLM knows what it doesn't know. It doesn't know how long things take to say.
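The article doesn't spell out how the error figures are computed; signed error relative to the nearest bound of the predicted range reproduces the Python, F#, and JS numbers. A sketch under that assumption (`rangeErrorPct` is a name I made up):

```go
package main

import (
	"fmt"
	"math"
)

// rangeErrorPct returns the signed percent error of actual relative to
// the nearest bound of the predicted [lo, hi] range, rounded to the
// nearest whole percent; 0 if actual falls inside the range.
func rangeErrorPct(lo, hi, actual float64) int {
	var e float64
	switch {
	case actual > hi:
		e = 100 * (actual - hi) / hi
	case actual < lo:
		e = 100 * (actual - lo) / lo
	}
	return int(math.Round(e))
}

func main() {
	fmt.Printf("Python: %+d%%\n", rangeErrorPct(150, 200, 253)) // +27%
	fmt.Printf("F#:     %+d%%\n", rangeErrorPct(200, 280, 353)) // +26%
	fmt.Printf("JS:     %+d%%\n", rangeErrorPct(260, 310, 524)) // +69%
}
```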
## What deliberation is actually about
The surprise: deliberation doesn't track language difficulty, stdlib gaps, or paradigm unfamiliarity. It tracks uncertainty about API names.
| Language | Stdlib gaps | API uncertainty | Deliberation |
|---|---|---|---|
| JS/Node | Many (no CSV, no HTML escape) | None — workarounds instantly known | 5% |
| Java 8 | None | None — one way to do everything | 3% |
| F# | Some (no CSV lib) | None — idiomatic choices obvious | 15% |
| Odin | Few | High — 9 “does this exist?” pauses | 80% |
JS has more stdlib gaps than Odin for this task. But a known gap costs lines, not thinking time. An unknown API costs thinking time regardless of whether it exists.
Odin has a CSV reader. The LLM wasn't sure of its exact name — that cost more deliberation than hand-rolling one in JS where it was certain no CSV module exists.
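To make "a known gap costs lines, not thinking time" concrete, here is the kind of minimal CSV field splitter that can be hand-rolled from memory when you're certain no stdlib module exists. It's sketched in Go for consistency with the other snippets (Go itself ships `encoding/csv`, so this mirrors the Node situation, not the Go port):

```go
package main

import (
	"fmt"
	"strings"
)

// splitCSVLine splits one CSV record into fields, handling quoted
// fields and doubled quotes ("") inside them. Minimal by design: no
// embedded newlines, no error reporting; just the shape of a
// from-memory workaround, not a full parser.
func splitCSVLine(line string) []string {
	var fields []string
	var cur strings.Builder
	inQuotes := false
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch {
		case c == '"' && inQuotes && i+1 < len(line) && line[i+1] == '"':
			cur.WriteByte('"') // doubled quote inside a quoted field
			i++
		case c == '"':
			inQuotes = !inQuotes
		case c == ',' && !inQuotes:
			fields = append(fields, cur.String())
			cur.Reset()
		default:
			cur.WriteByte(c)
		}
	}
	return append(fields, cur.String())
}

func main() {
	fmt.Printf("%q\n", splitCSVLine(`2024-01-05,"Groceries, Inc.",-42.17`))
	// ["2024-01-05" "Groceries, Inc." "-42.17"]
}
```

Roughly 25 lines, zero verification pauses: every construct used is one the model is certain about, which is exactly the trade the article describes.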
## Java 8: the paradox
Java 8 was supposed to be the pain port. No `var`, no records, no text blocks.
Instead it had the lowest deliberation of any language (3%).
Why: Java 8 is maximally constrained. One way to declare a list. One way to iterate. One way to aggregate. The design choice space is near-zero. The LLM doesn't think — it just types, at high speed, for a long time.
| Metric | Java 8 | Go | JS |
|---|---|---|---|
| Deliberation | 3% | 5% | 5% |
| Lines | 495 | 540 | 524 |
| Fixups | 0 | 0 | 0 |
| Design choices | ~0 | ~2 | ~5 |
The irony: Java 8 is the most annoying language for a human but the most effortless for the LLM. Verbosity is measured in tokens, and tokens are cheap. Uncertainty is measured in verification pauses, and Java 8 has none.
## Two failures, same root cause
Neither Pascal nor OCaml failed because of language knowledge.
Pascal, attempt 1: The LLM read all existing ports (~1500 lines) before writing anything. Over-planned. Ran out of context. Zero lines of Pascal written.
Pascal, attempt 2: Instructed to just start typing. It did. Hit the output token limit before emitting a single line. Pascal's `begin`/`end` verbosity made it the one port where generation itself exceeds the budget.
The first attempt failed because the LLM wasted tokens on input. The second failed because Pascal wastes tokens on output.
OCaml: Confidence was ~85%. Never tested. The port died at opam init on Windows — Cygwin PATH issues consumed the entire session. Zero lines of OCaml written.
The “come back in 3 years and build it” test:
| Language | Revisit command |
|---|---|
| Go | `go build` |
| Python | `uv run app.py` |
| Nim | `nimble build` |
| F# | `dotnet build` |
| Odin | `odin build .` |
| JS | `node app.js` |
| Java 8 | `javac *.java && java app.Main` |
| OCaml | Install opam. Figure out Cygwin. Fix PATH. Hope `opam init` works. Install compiler switch. Install dune. Install deps. Maybe build. |
Toolchain accessibility is a language feature.
## F#: where functional actually mattered
Most ports converge to the same shape regardless of syntax. F# was the exception. Its aggregation pipeline is structurally different:
Go (10 lines, imperative map-accumulate-sort):
```go
totals := map[string]float64{}
for _, t := range transfers {
	totals[t.Category] += t.AmountF
}
// ... sort, collect into slice
```
F# (4 lines, pipeline):
```fsharp
transfers
|> List.groupBy (fun t -> t.Category)
|> List.map (fun (cat, ts) -> { Name = cat; Total = ts |> List.sumBy (fun t -> t.AmountF) })
|> List.sortByDescending (fun nt -> nt.Total)
```
But this only mattered for aggregation and the main processing flow. CSV parsing, file I/O, HTML generation — all converged to the same imperative shape in every language.
## The takeaways
1. Training data density is everything. Go/Python = zero friction. Nim/Crystal = smooth. Odin = constant verification pauses. Language quality is irrelevant if the LLM can't recall the stdlib.
2. Compound uncertainty kills. 75% confidence per API call sounds fine. Across 15 calls it's 1.3%. Small unknowns multiply into mandatory verification loops.
3. The LLM knows what it doesn't know. Confidence and fixup predictions were near-perfect across 8 languages. Line count predictions were not. Calibration works for difficulty; it fails for effort.
4. Verbosity is free, uncertainty is expensive. Go (540 lines) was faster to write than Odin (432 lines). Java 8 (495 lines, 3% deliberation) was the most effortless port. Token cost is irrelevant next to verification cost.
5. Constraint helps LLMs. Languages with fewer ways to express the same thing (Go, Java) produce lower deliberation than languages with more choices (JS, Nim). The boilerplate is the documentation.
6. Setup is a language feature. Two languages with known-good LLM confidence (Pascal ~fine, OCaml ~85%) produced zero lines of code. Process killed them, not knowledge.