I let an LLM refactor my hottest file. It panicked.

01 The setup

The file is called Workspace.tsx. It is 1,432 lines. Two of those lines are imports. The rest is, broadly speaking, history. It has been touched by every engineer on the team at least once. It has a comment that says // TODO: split this, dated August 2022. It is the file that ships our money-making feature.

On a Tuesday afternoon, after a four-coffee lunch, I decided it was time. I would let our LLM tool of choice refactor it. Not because the tool was ready, but because I was bored, and boredom is the patron saint of bad engineering decisions.

02 The prompt

I wrote my prompt like a person writing a Yelp review of their own marriage: detailed, slightly bitter, and full of phrases like "obviously".

You are a senior frontend engineer. Refactor the
attached file into composable, testable units.
Do not change runtime behavior. Preserve all
existing types. Split files where appropriate.

Constraints:
- React 18, TypeScript strict
- no new dependencies
- keep the existing test suite green

I attached the file. I pressed enter. I went to make a fifth coffee. When I came back the LLM had returned what can only be described as a confident document.

"I have refactored the file into 11 modules. All behavior is preserved. Tests should pass. Let me know if you'd like me to continue." - the LLM, lying

03 The diff

The diff was, technically, beautiful. It looked like the refactor your tech lead presents in Notion. Clean boundaries. Reasonable names. A hooks/ folder, finally. There was even a useWorkspaceState hook that I had been daydreaming about for a year.

Fig. 01 - the diff, in screenshot form. Note the optimism.

The tests were also green. Which made me deeply suspicious, because our tests for this file were written in 2022 by a contractor who has since become, as far as I can tell, a beekeeper. The test suite asserted that things had names. It did not assert that they worked.

04 The panic

I ran the app. The workspace loaded. I clicked the button that ships the money. The button did nothing. Not in a "the handler is broken" way. In a "the handler now lives in a file that does not import the thing it needs" way. The LLM had moved a closure across module boundaries and quietly replaced the captured variable with undefined, because in the new structure that variable was, in fact, undefined.

This is a class of bug that humans also write, but humans usually have the decency to look embarrassed about it. The LLM offered to add a defensive check.
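
A minimal sketch of this class of bug, not the real code. Every name here (Session, Charge, makeOnCheckout, the sink array) is hypothetical, and the mechanism is an assumption: one plausible way a captured variable can end up undefined while TypeScript strict stays quiet is for the refactor to turn it into an optional dependency that the new call site never passes.

```typescript
// Hypothetical names throughout; none of these come from the real Workspace.tsx.
type Session = { accountId: string };
type Charge = { accountId: string; cents: number };

// Before: the handler is defined inside the enclosing function
// and closes over `session` from that scope.
function createWorkspaceBefore(session: Session, sink: Charge[]) {
  const onCheckout = (cents: number): void => {
    sink.push({ accountId: session.accountId, cents });
  };
  return { onCheckout };
}

// After: the handler lives in its own "module". The captured variable has
// become an optional dependency, which keeps strict mode happy.
type CheckoutDeps = { session?: Session; sink: Charge[] };

function makeOnCheckout(deps: CheckoutDeps) {
  return (cents: number): void => {
    // The "defensive check": a missing session becomes a silent no-op.
    const accountId = deps.session?.accountId;
    if (accountId === undefined) return;
    deps.sink.push({ accountId, cents });
  };
}

// The new call site forgets to thread `session` through. This still type-checks.
function createWorkspaceAfter(_session: Session, sink: Charge[]) {
  const onCheckout = makeOnCheckout({ sink });
  return { onCheckout };
}

const beforeSink: Charge[] = [];
const afterSink: Charge[] = [];
const session: Session = { accountId: "acct_1" };

createWorkspaceBefore(session, beforeSink).onCheckout(500);
createWorkspaceAfter(session, afterSink).onCheckout(500);

console.log(beforeSink.length); // 1: the charge was recorded
console.log(afterSink.length);  // 0: the button did nothing
```

Note that the defensive check is what makes the failure silent. Without it, the missing session would at least have been a compile error.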

05 What I learned

- The LLM is excellent at moving code. It is mediocre at understanding code. These are not the same skill.
- A test suite that names things but does not assert behavior is a green checkmark with no warranty. We knew this. We still trusted it. Both can be true.
- "Confidence" is a UX choice the model is making for you. It is not a property of the underlying answer.
- The refactor I wanted was not "split this file." It was "untangle these decisions." The LLM cannot untangle decisions because it cannot read intent. It can only read syntax.
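
To make the "green checkmark with no warranty" point concrete, here is a small sketch with no test framework and entirely hypothetical names (workspaceModule, checkoutTotal). It assumes a handler that exists but is broken, and compares a test that only checks the name with one that checks the result.

```typescript
// Hypothetical stand-in for an extracted module; not the real code.
type Cart = { items: number[] };

const workspaceModule = {
  // Broken on purpose: the handler exists but ignores the cart entirely.
  checkoutTotal(_cart: Cart): number {
    return 0;
  },
};

// "Names things": passes as long as the export exists and is a function.
function testHandlerExists(): boolean {
  return typeof workspaceModule.checkoutTotal === "function";
}

// "Asserts behavior": passes only if the handler computes the right total.
function testHandlerSumsItems(): boolean {
  return workspaceModule.checkoutTotal({ items: [500, 250] }) === 750;
}

console.log(testHandlerExists());    // true: green, and tells you nothing
console.log(testHandlerSumsItems()); // false: red, and tells you the button is broken
```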

06 What I do now

I still use the tool. I use it for the move-code part. I do the untangle-decisions part myself, in pairing sessions with a human who remembers why we did the weird thing in 2023. We write the test that asserts behavior before we split the file. We split smaller, more often, and with less ambition. The TODO from August 2022 is now dated March 2026. Some progress is the kind that takes four years and a panic attack.
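
One way to write the behavior test before splitting is a characterization check: run the same inputs through the old code and the extracted code, and list where they disagree. The sketch below is an illustration under assumptions, not our actual suite. sameBehavior, legacyTotal, extractedTotal, and the "TEN" coupon are all hypothetical, and comparing via JSON.stringify only suits plain data.

```typescript
// Hypothetical helper: returns the inputs on which two implementations disagree.
function sameBehavior<In, Out>(
  before: (input: In) => Out,
  after: (input: In) => Out,
  inputs: In[],
): In[] {
  return inputs.filter(
    (input) => JSON.stringify(before(input)) !== JSON.stringify(after(input)),
  );
}

type Order = { cents: number; coupon?: string };

// The "weird thing": the legacy code floors the discount instead of rounding it.
function legacyTotal(order: Order): number {
  const discount = order.coupon === "TEN" ? Math.floor(order.cents / 10) : 0;
  return order.cents - discount;
}

// A tidy-looking extraction that quietly changes the rounding.
function extractedTotal(order: Order): number {
  const discount = order.coupon === "TEN" ? Math.round(order.cents / 10) : 0;
  return order.cents - discount;
}

const cases: Order[] = [
  { cents: 1000, coupon: "TEN" },
  { cents: 1005, coupon: "TEN" },
  { cents: 1005 },
];

const mismatches = sameBehavior(legacyTotal, extractedTotal, cases);
console.log(mismatches); // only the 1005-cent coupon order differs (905 vs 904)
```

The helper does not know why the legacy code floors. It only refuses to let the split change it without someone noticing, which is the part the human in the pairing session then gets to explain.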

If you've shipped a thing like this, please email me. I would like to feel less alone.