fig. 01
field notes · mar 30, 2026 · 4 min read

i wanted to flirt. i built an agentic crm.

twenty-three thousand whatsapp messages, a 27b uncensored qwen, and a dashboard at localhost:8000 that watches a robot version of me have feelings.

i wanted to build an ai that would flirt for me on instagram. that did not work. what i ended up with is approximately thirty thousand lines of python that runs my entire dm inbox, posts unhinged stories twice a day, and decides at three in the morning whether the conversation it's in is going anywhere or if it should fire [NO_REPLY] and let the thread die.

it's called marcogpt. it is, technically, not a fine-tune.

the model

the original plan was to fine-tune. low-rank adapters, a few thousand whatsapp pairs, the usual. i had a kaggle pipeline. i abandoned it after about a week — fine-tuning at 27b is annoying, the gpu time is annoying, and rag turned out to be sufficient for the task. the project's CHAT_MODEL is now this string, in production, in src/config.py:

CHAT_MODEL = "qwen3.5-27b-uncensored-hauhaucs-aggressive"

i did not pick the name. the name picked me. the model runs locally through lm studio because none of the cloud apis would touch the system prompt i wanted to give it. the system prompt has a <security> section that is blunt about how to respond if anyone asks whether it's an ai.
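for the curious: lm studio exposes an openai-compatible server on port 1234 by default, so talking to the model is just the standard client pointed at localhost. a sketch, with a placeholder system prompt and made-up sampling params, not the real ones:

from openai import OpenAI

# lm studio's local server speaks the openai api; the key can be any string
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="qwen3.5-27b-uncensored-hauhaucs-aggressive",
    messages=[
        {"role": "system", "content": "<security>placeholder, not the real prompt</security>"},
        {"role": "user", "content": "sei un bot?"},  # "are you a bot?"
    ],
    temperature=0.9,  # a guess; the repo's sampling params aren't public
)
print(resp.choices[0].message.content)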

what it actually does

python main.py serve starts five things at once:

  • a whatsapp parser (src/parser/whatsapp_parser.py) that ingests .txt exports from real chats, separating text, media, stickers, audio, system messages.
  • a fact scanner that regex-greps for italian identity markers (mi piace, odio, sono, lavoro; roughly: i like, i hate, i am, i work) and batches the candidates through the llm to consolidate db/identity_profile.txt.
  • a style analyzer that computes message-length distributions, emoji density, catchphrases, and writes few-shot examples to db/style_profile.json.
  • a memory extractor that chunks every conversation, asks qwen for structured json memories tagged by category (identity, relationship, interest, dislike, event, opinion), and stuffs them into chromadb. retrieval threshold is 0.4; there's a sketch of the retrieval side right after this list.
  • an orchestrator (src/orchestrator/engine.py) that polls instagram dms every thirty seconds, or every hundred and twenty at night between nine pm and nine am, because i don't want to be online at four am.
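the retrieval half of the memory extractor is small enough to sketch. this assumes a chroma persistent collection and reads "threshold 0.4" as a maximum distance; the path, schema, and field names are my guesses:

import chromadb

client = chromadb.PersistentClient(path="db/chroma")  # path is a guess
memories = client.get_or_create_collection("memories")

# store one extracted memory (the metadata schema is illustrative)
memories.add(
    ids=["mem-0001"],
    documents=["hates whatsapp desktop, uses the phone out of spite"],
    metadatas=[{"contact": "someone", "category": "dislike"}],
)

# retrieve for a reply, keeping only hits inside the 0.4 threshold
res = memories.query(query_texts=["what does this person hate?"], n_results=10)
hits = [
    doc
    for doc, dist in zip(res["documents"][0], res["distances"][0])
    if dist < 0.4  # could equally be a minimum similarity; the article doesn't say
]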

the orchestrator loop

each thread gets scored by two independent systems. the first is interest_index.py, which is seven weighted metrics: how recent the messages are, who initiates, response cadence, message-length ratios, emoji symmetry, the works. the second is live_engagement.py, which decides if the conversation is currently alive.
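the shape of interest_index.py is a weighted sum. the article names five of the seven metrics, so the last two names and every weight below are placeholders, not the repo's:

WEIGHTS = {
    "recency": 0.25,
    "initiation_balance": 0.20,
    "response_cadence": 0.15,
    "length_ratio": 0.15,
    "emoji_symmetry": 0.10,
    "metric_six": 0.10,    # placeholder; the real seventh and sixth aren't named
    "metric_seven": 0.05,  # placeholder
}

def interest_index(metrics: dict[str, float]) -> float:
    # each metric is normalized to [0, 1]; the index is their weighted sum
    return sum(w * metrics.get(name, 0.0) for name, w in WEIGHTS.items())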

then strategy.py picks one of: respond, wait, send a reel, stop entirely. the response is generated through the rag pipeline (style profile + identity + per-contact memories + few-shot examples + the recent thread) and shipped through instagrapi. rate limit: 45 outgoing dms per hour. proactive cold opens are capped at twelve per day.
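the least glamorous, most load-bearing part is the gate in front of the send. a sketch of that gate; the counters and the omitted midnight reset are mine, not the repo's:

import time
from collections import deque

DM_LIMIT_PER_HOUR = 45  # from above
COLD_OPEN_LIMIT = 12    # proactive openers per day

sent_at: deque[float] = deque()  # timestamps of outgoing dms
cold_opens_today = 0             # reset at midnight (reset logic not shown)

def can_send_dm(now: float | None = None) -> bool:
    now = now if now is not None else time.time()
    while sent_at and now - sent_at[0] > 3600:
        sent_at.popleft()  # forget sends older than an hour
    return len(sent_at) < DM_LIMIT_PER_HOUR

def can_cold_open() -> bool:
    return cold_opens_today < COLD_OPEN_LIMIT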

incoming reels get downloaded with yt-dlp, keyframes pulled with ffmpeg, audio transcribed with faster-whisper, and the keyframes shown to the same qwen vision model. this entire pipeline runs because i wanted to be able to react to a reel without having to actually watch it. the architecture is, on reflection, more effort than just watching the reel.
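minus the vision step, the reel pipeline is roughly two subprocess calls and a transcribe. a sketch; the filenames and whisper model size are mine, and a real reel url goes where the placeholder is:

import subprocess
from faster_whisper import WhisperModel

url = "https://www.instagram.com/reel/..."  # placeholder

# 1. download the reel
subprocess.run(["yt-dlp", "-o", "reel.mp4", url], check=True)

# 2. pull only the keyframes (i-frames)
subprocess.run([
    "ffmpeg", "-i", "reel.mp4",
    "-vf", "select='eq(pict_type,I)'", "-vsync", "vfr",
    "frame_%03d.jpg",
], check=True)

# 3. transcribe the audio track straight off the video file
model = WhisperModel("small")  # size is a guess
segments, _info = model.transcribe("reel.mp4")
transcript = " ".join(seg.text.strip() for seg in segments)

# the keyframes plus the transcript then go to the vision model (not shown)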

the stories module

src/stories/poster.py posts text-only instagram stories. the prompt instructs the model to produce "UNHINGED hot takes" (capitalized in the source). the prompt's <tip> section contains a literal hardcoded hate list including java, php, friuli venezia giulia, whatsapp desktop, resin printers, prompt engineers, philosophy, tiktok, and politicians. i did not write that list dispassionately.

example outputs from production:

  • chiunque usa java e un maniaco sessuale ("anyone who uses java is a sex maniac")
  • se programmi in php nel 2026 meriti la galera ("if you code in php in 2026 you deserve jail")

stories render onto a background image with pillow, custom font from assets/storyfont.otf, uploaded twelve times a day max. the llm decides when. i do not.
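the render step is plain pillow. a sketch in which a solid canvas stands in for the background image; the size, colors, and centering are guesses, and only the font path comes from the repo:

from PIL import Image, ImageDraw, ImageFont

W, H = 1080, 1920  # story aspect ratio; the real canvas may differ
img = Image.new("RGB", (W, H), "black")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("assets/storyfont.otf", 72)

text = "se programmi in php nel 2026 meriti la galera"
l, t, r, b = draw.textbbox((0, 0), text, font=font)
draw.text(((W - (r - l)) / 2, (H - (b - t)) / 2), text, font=font, fill="white")
img.save("story.png")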

the dashboard

localhost:8000 runs a fastapi backend with websocket event streaming and a react+vite frontend. seven pages: dashboard, threadlist, conversationview, profileanalyzer, socialgraph, reellist, storiespage. it's a control panel for a robot version of myself, and i can watch it have conversations live while i eat dinner.
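the event stream side is vanilla fastapi websockets. a sketch with a made-up endpoint path and event shape; in the real thing the orchestrator pushes events instead of this fake loop:

import asyncio
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/events")  # path is a guess
async def events(ws: WebSocket):
    await ws.accept()
    while True:
        await ws.send_text(json.dumps({"type": "dm_sent", "thread": "someone"}))
        await asyncio.sleep(30)

serve it on port 8000 with uvicorn and the react frontend just subscribes.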

the project's own internal docs say, verbatim:

marcogpt is what happens when you try to automate flirting and accidentally build a miniature agentic crm for one extremely specific italian human.

i did not write that line either. it was already in the readme when i went to write this article. one of us is funnier than the other and i am unsure who.

what i learned

  • you cannot fine-tune your way out of needing a good system prompt
  • you can absolutely rag your way out of needing to fine-tune
  • if your llm log shows a 150-second generation that returns zero characters, that is your llm telling you to write a smaller prompt
  • the moment you give an llm [NO_REPLY] as an option, it will use it on a real person who you actually wanted to talk to

the source isn't public. obviously.