fig. 03
reverse engineering · mar 08, 2026 · 5 min read

letting an llm read a 2015 codebase so i don't have to.

twenty-three mcp tools, one custom lzw codec, no dignity.

WinDev 20 is a windows-only RAD ide from PC Soft, vintage 2015. it has its own programming language (WLanguage), its own database engine (HyperFileSQL), its own binary file format (PCS\x00 magic bytes), and an audience of about four hundred italian small businesses.

i'm now responsible for one of those.

DreepCash is a point-of-sale app: forty-four windows, one hundred and seven queries, one hundred and fifty-seven database tables. the codebase is in italian: cCassa, QRY_Abilitazioni, RPT_ElencoPsp_ARTI, movicasse. the source files are not text. they are LZW-compressed binary records with chained variable-length sections, in a format meant to be read only by the IDE that PC Soft ships. open one in vim and you get a hex dump.

obviously the move is to read all of this from claude. obviously.

what's inside a .wdw file

every PC Soft file starts with PCS\x00. inside is an LZW stream — but PC Soft's variant: lsb-first bit packing, nine-bit initial code width growing to sixteen, token 0 is clear, token 257 is end. pcs/lzw.py is two hundred and one lines of BitReader/BitWriter matching that exact dialect.
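for a feel of what "lsb-first bit packing" means in practice, here is a minimal sketch of the two bit-level helpers. the class names BitReader and BitWriter come from the post; the bodies are a guess at the general technique, not the contents of pcs/lzw.py. code_width is a hypothetical helper, and the exact point at which the width grows is an assumption, since LZW dialects differ on it.

```python
from typing import Optional


class BitWriter:
    """Packs integer codes into bytes, least-significant bit first."""

    def __init__(self) -> None:
        self._acc = 0      # bit accumulator
        self._nbits = 0    # number of valid bits in the accumulator
        self._out = bytearray()

    def write(self, code: int, width: int) -> None:
        # New bits go above the bits already waiting in the accumulator.
        self._acc |= (code & ((1 << width) - 1)) << self._nbits
        self._nbits += width
        while self._nbits >= 8:
            self._out.append(self._acc & 0xFF)
            self._acc >>= 8
            self._nbits -= 8

    def getvalue(self) -> bytes:
        # Flush a trailing partial byte, zero-padded in its high bits.
        tail = bytes([self._acc & 0xFF]) if self._nbits else b""
        return bytes(self._out) + tail


class BitReader:
    """Reads variable-width codes back out, least-significant bit first."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self._acc = 0
        self._nbits = 0

    def read(self, width: int) -> Optional[int]:
        while self._nbits < width:
            if self._pos >= len(self._data):
                return None  # ran out of input
            self._acc |= self._data[self._pos] << self._nbits
            self._pos += 1
            self._nbits += 8
        code = self._acc & ((1 << width) - 1)
        self._acc >>= width
        self._nbits -= width
        return code


def code_width(table_size: int) -> int:
    """Hypothetical growth rule: 9 bits minimum, 16 bits maximum.

    ASSUMPTION: the width grows as soon as table_size no longer fits.
    """
    return min(16, max(9, table_size.bit_length()))
```

the decoder loop on top of this (dictionary, clear and end tokens) is where the dialect-specific details live, and those are not reproduced here.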

once decompressed, the stream is a flat sequence of records. code lines are tagged 0x02. strings are bracketed by FF CE 88 F1 (start) and FE ED 98 E1 (end). section boundaries are FE 0F. ui control records start at offset -16 from a name marker, with type ids like 120 (button), 8 (combo), 0xFFFFFFFF (image), 0x82000000 (edit). all of this was reverse-engineered from hex dumps of windows i intentionally broke and re-saved.
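the string markers are the easiest part of that layout to show. a minimal sketch, assuming the payload sits directly between the two markers with nothing else in between (no length field, no padding); iter_strings is a hypothetical name, and the payload is returned as raw bytes because the post does not say what encoding the strings use.

```python
from typing import Iterator, Tuple

STR_START = bytes.fromhex("FFCE88F1")  # string start marker from the post
STR_END = bytes.fromhex("FEED98E1")    # string end marker from the post


def iter_strings(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset_of_start_marker, raw_payload) for each bracketed string."""
    pos = 0
    while True:
        start = buf.find(STR_START, pos)
        if start == -1:
            return
        body = start + len(STR_START)
        end = buf.find(STR_END, body)
        if end == -1:
            return  # unterminated string: stop rather than guess
        yield start, buf[body:end]
        pos = end + len(STR_END)


# Toy buffer: a tagged byte, some filler, two strings split by FE 0F.
sample = (
    b"\x02junk"
    + STR_START + b"cCassa" + STR_END
    + b"\xfe\x0f"
    + STR_START + b"movicasse" + STR_END
)
payloads = [payload for _, payload in iter_strings(sample)]
```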

the developer-name bug

WinDev embeds the name of whoever last touched a line into the binary, right next to the line itself. the original parser i wrote happily included those names in the code output, so claude was reading procedures like:

SI cliente.id > 0 ALORS
MARCO
  ANDREY
RENVOYER cliente.nom
DANILO
FIN

so _KNOWN_DEVS is now a hardcoded set: {MARCO, DANILO, ANDREY, YAKU, STEFANO, VALERIO, LUCA, MAURO, ALFIO, RENZO, GIANNI}. it also contains 'WHEN EXCEPTION' and 'END', because both of those happened to pass the "uppercase ascii, two to twenty chars" heuristic for developer names. they are actual WLanguage keywords.

the heuristic is bad. the fix is uglier than the heuristic.
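to see why the heuristic misfires, here is a minimal reconstruction of it from the one-line description above, next to an exact-match filter. both function names are hypothetical, and the real parser may test tokens differently; this only shows the failure mode.

```python
from typing import Iterable, List, Set


def looks_like_dev_name(token: str) -> bool:
    """The "uppercase ascii, two to twenty chars" rule, taken literally."""
    t = token.strip()
    return t.isascii() and t.isupper() and 2 <= len(t) <= 20


def strip_dev_names(lines: Iterable[str], known_devs: Set[str]) -> List[str]:
    """Drop only lines that are exactly a known developer name."""
    return [ln for ln in lines if ln.strip() not in known_devs]


sample = [
    "SI cliente.id > 0 ALORS",
    "MARCO",
    "  ANDREY",
    "RENVOYER cliente.nom",
    "DANILO",
    "FIN",
]

# Tokens the literal heuristic would treat as developer names.
false_positives = [t for t in ("END", "WHEN EXCEPTION", "FIN") if looks_like_dev_name(t)]

cleaned = strip_dev_names(sample, {"MARCO", "ANDREY", "DANILO"})
```

in this sketch the heuristic also flags FIN, which closes the block in the example above. the exact-match set leaves it alone.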

twenty-three tools, one mcp server

the agent surface is a FastMCP("windev-pcs") server in pcs/mcp_server.py. twenty-three tools, all wrapped with a _tool decorator that catches exceptions because mcp clients hate panics:

  • recap — dumps an entire project (windows, schema, queries, entry points) as one markdown brief
  • code / get_procedure / list_procedures — read WLanguage source from any binary file
  • design / list_controls / get_control — inspect ui layouts
  • move_control — move a button. the only write operation. more on this in a second.
  • search_code / find_usages — grep across forty-four windows
  • project_graph — outputs graphviz dot for the dependency tree
  • diff — diff two versions of a .wdw
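the exception-catching wrapper is a small pattern worth showing. a minimal sketch, assuming the decorator turns any exception into a readable error string; the name _tool is from the post, the body and the error format are guesses. the FastMCP registration is left out on purpose so the example has no dependencies, and the get_procedure body is a stand-in that only exists to fail.

```python
import functools
from typing import Any, Callable


def _tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so a failure comes back as text instead of a raised exception."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # deliberately broad: the client should never see a crash
            return f"error in {fn.__name__}: {type(exc).__name__}: {exc}"

    return wrapper


@_tool
def get_procedure(path: str, name: str) -> str:
    # Stand-in body: the real tool would parse the binary file at `path`.
    raise FileNotFoundError(path)


result = get_procedure("missing.wdw", "Init")
```

functools.wraps keeps the original name and docstring on the wrapper, which matters if the server builds its tool descriptions from them.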

a two-tier cache (pcs/api.py) keeps reads cheap: in-memory LRU plus on-disk pickle keyed by (path, mtime), dropped in %LOCALAPPDATA%/wd-agent/. the LLM hits the same window thirty times in a single conversation, so without the cache every one of those hits would decompress and parse the file again.
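a minimal sketch of that two-tier idea, assuming nothing about pcs/api.py beyond the description above. TwoTierCache, its method names, and the sha256 file naming are all hypothetical; parse stands in for whatever turns a file into a parsed object.

```python
import hashlib
import os
import pickle
from collections import OrderedDict
from pathlib import Path
from typing import Any, Callable, Tuple


class TwoTierCache:
    """In-memory LRU in front of on-disk pickles, keyed by (path, mtime)."""

    def __init__(self, cache_dir: str, max_items: int = 64) -> None:
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._mem: "OrderedDict[Tuple[str, int], Any]" = OrderedDict()
        self._max = max_items

    def _key(self, path: str) -> Tuple[str, int]:
        # A changed mtime yields a new key, so stale entries are never served.
        return (os.path.abspath(path), os.stat(path).st_mtime_ns)

    def _disk_path(self, key: Tuple[str, int]) -> Path:
        digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.pkl"

    def get(self, path: str, parse: Callable[[str], Any]) -> Any:
        key = self._key(path)
        if key in self._mem:                 # tier 1: memory
            self._mem.move_to_end(key)
            return self._mem[key]
        disk = self._disk_path(key)
        if disk.exists():                    # tier 2: disk
            with disk.open("rb") as fh:
                value = pickle.load(fh)
        else:                                # miss: parse and persist
            value = parse(path)
            with disk.open("wb") as fh:
                pickle.dump(value, fh)
        self._mem[key] = value
        if len(self._mem) > self._max:
            self._mem.popitem(last=False)    # evict least recently used
        return value
```

pickle is only safe here because the cache directory holds files the tool wrote itself; loading pickles from anywhere else is a code-execution risk.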

why i can only move buttons

you cannot inject WLanguage code through this tool. i can't fix it.

each line of code lives in a variable-sized record inside a chained LZW stream. the record's length is a header field. the next record's offset is computed from this one's length. changing a line's byte length cascades through the entire file and invalidates every downstream record. PC Soft made this choice in approximately 2003 and never had any reason to revisit it.
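the cascade is easier to see on a toy layout. this sketch assumes a made-up record header (one tag byte plus a two-byte little-endian length), which is not PC Soft's real header; record and walk are hypothetical helpers. the only point is that every offset after an edited record moves.

```python
import struct
from typing import Iterator, List, Tuple

HEADER = struct.Struct("<BH")  # ASSUMED toy header: tag byte + uint16 payload length


def record(tag: int, payload: bytes) -> bytes:
    return HEADER.pack(tag, len(payload)) + payload


def walk(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (offset, tag, payload); each offset depends on all earlier lengths."""
    pos = 0
    while pos < len(buf):
        tag, length = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        yield pos, tag, buf[start:start + length]
        pos = start + length


def offsets(buf: bytes) -> List[int]:
    return [off for off, _, _ in walk(buf)]


before = record(0x02, b"a = 1") + record(0x02, b"b = 2") + record(0x02, b"c = 3")
after = record(0x02, b"a = 100") + record(0x02, b"b = 2") + record(0x02, b"c = 3")

offsets_before = offsets(before)  # [0, 8, 16]
offsets_after = offsets(after)    # [0, 10, 18]
```

two extra bytes in the first line shift every record behind it, so anything elsewhere in the file that remembered offset 8 now points into the middle of a record.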

so claude reads the code, generates new WLanguage, and i copy-paste it into the IDE like a man feeding cards into a 2015 turing machine. moving a control is fine — fixed-size write, four signed int32s for (x, y, w, h).

(the original parser used <I instead of <i and printed coordinates as 4294967292 instead of -4. one character fix. there are a lot of one-character fixes.)
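both points fit in a few lines of struct. a minimal sketch, assuming the four values sit back to back as little-endian int32s at a known offset; write_position is a hypothetical helper, and how the real tool finds that offset is not shown.

```python
import struct

RECT = struct.Struct("<4i")  # x, y, w, h as little-endian signed int32


def write_position(buf: bytearray, offset: int, x: int, y: int) -> None:
    """Overwrite x and y in place; the buffer length never changes."""
    _, _, w, h = RECT.unpack_from(buf, offset)
    RECT.pack_into(buf, offset, x, y, w, h)


# The one-character bug: the same four bytes read as unsigned vs signed.
raw = struct.pack("<i", -4)
as_unsigned = struct.unpack("<I", raw)[0]  # 4294967292
as_signed = struct.unpack("<i", raw)[0]    # -4

# Toy buffer: 8 filler bytes, one rectangle, 4 trailing bytes.
buf = bytearray(b"\x00" * 8 + RECT.pack(10, 20, 100, 30) + b"\xff" * 4)
size_before = len(buf)
write_position(buf, 8, -4, 20)
```

because pack_into writes exactly sixteen bytes over the old sixteen, nothing downstream moves, which is why this is the one write that is safe.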

the part where it stole my job

SecondaryDisplay_Implementation.md is a forty-two-kilobyte document in this repo. claude wrote it. it is the complete WLanguage implementation of a customer-facing display for the cash register — multi-monitor support, registry-backed config, the works. it includes the discovery that SysNbScreen() does not exist in WinDev 20, so you have to call API("user32","GetSystemMetrics",80) directly.

i did not write a line of WLanguage for that feature. i opened claude code, typed "add a customer display to dreepcash," and waited. it called recap. it called code against the existing display logic. it called search_code for MoveTo. it produced something that compiled.

this is, technically, the future of legacy software maintenance. it is also kind of bleak. the customer display works.

what i learned

  • you cannot inject code into a chained variable-length binary format from 2003. you can only move its fixed-size fields around
  • if your "is this a developer name" heuristic matches 'END', your heuristic is bad
  • LSB-first LZW with 9→16 bit growth is not actually that hard. it is just annoying.
  • the moment you give an LLM a recap tool, it stops asking what to read

the source isn't public yet. ask if you need it.