fig. 01
field notes · mar 30, 2026 · 4 min read

i wanted to flirt. i built an agentic crm.

twenty-three thousand whatsapp messages, a 27b uncensored qwen, and a dashboard at localhost:8000 that watches a robot version of me have feelings.

i wanted to build an ai that would flirt for me on instagram. that did not work. what i ended up with is approximately thirty thousand lines of python that runs my entire dm inbox, posts unhinged stories twice a day, and decides at three in the morning whether the conversation it's in is going anywhere or if it should fire [NO_REPLY] and let the thread die.

it's called marcogpt. it is, technically, not a fine-tune.

the model

the original plan was to fine-tune. low-rank adapters, a few thousand whatsapp pairs, the usual. i had a kaggle pipeline. i abandoned it after about a week — fine-tuning at 27b is annoying, the gpu time is annoying, and rag turned out to be sufficient for the task. the project's CHAT_MODEL is now this string, in production, in src/config.py:

CHAT_MODEL = "qwen3.5-27b-uncensored-hauhaucs-aggressive"

i did not pick the name. the name picked me. the model runs locally through lm studio because none of the cloud apis would touch the system prompt i wanted to give it. the system prompt has a <security> section that is blunt about how to respond if anyone asks whether it's an ai.
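for the curious: lm studio exposes an openai-compatible server on port 1234 by default, so talking to the model is just the standard client pointed at localhost. a sketch, with a placeholder system prompt and made-up sampling params, not the real ones:

from openai import OpenAI

# lm studio's local server speaks the openai api; the key can be any string
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="qwen3.5-27b-uncensored-hauhaucs-aggressive",
    messages=[
        {"role": "system", "content": "<security>placeholder, not the real prompt</security>"},
        {"role": "user", "content": "sei un bot?"},  # "are you a bot?"
    ],
    temperature=0.9,  # a guess; the repo's sampling params aren't public
)
print(resp.choices[0].message.content)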

what it actually does

python main.py serve starts five things at once:

  • a whatsapp parser (src/parser/whatsapp_parser.py) that ingests .txt exports from real chats, separating text, media, stickers, audio, system messages.
  • a fact scanner that regex-greps for italian identity markers (mi piace, odio, sono, lavoro; roughly: i like, i hate, i am, i work) and batches the candidates through the llm to consolidate db/identity_profile.txt.
  • a style analyzer that computes message-length distributions, emoji density, catchphrases, and writes few-shot examples to db/style_profile.json.
  • a memory extractor that chunks every conversation, asks qwen for structured json memories tagged by category (identity, relationship, interest, dislike, event, opinion), and stuffs them into chromadb. retrieval threshold is 0.4; there's a sketch of the retrieval side right after this list.
  • an orchestrator (src/orchestrator/engine.py) that polls instagram dms every thirty seconds, or every hundred and twenty at night between nine pm and nine am, because i don't want to be online at four am.
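the retrieval half of the memory extractor is small enough to sketch. this assumes a chroma persistent collection and reads "threshold 0.4" as a maximum distance; the path, schema, and field names are my guesses:

import chromadb

client = chromadb.PersistentClient(path="db/chroma")  # path is a guess
memories = client.get_or_create_collection("memories")

# store one extracted memory (the metadata schema is illustrative)
memories.add(
    ids=["mem-0001"],
    documents=["hates whatsapp desktop, uses the phone out of spite"],
    metadatas=[{"contact": "someone", "category": "dislike"}],
)

# retrieve for a reply, keeping only hits inside the 0.4 threshold
res = memories.query(query_texts=["what does this person hate?"], n_results=10)
hits = [
    doc
    for doc, dist in zip(res["documents"][0], res["distances"][0])
    if dist < 0.4  # could equally be a minimum similarity; the article doesn't say
]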

the orchestrator loop

each thread gets scored by two independent systems. the first is interest_index.py, which is seven weighted metrics: how recent the messages are, who initiates, response cadence, message-length ratios, emoji symmetry, the works. the second is live_engagement.py, which decides if the conversation is currently alive.
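the shape of interest_index.py is a weighted sum. the article names five of the seven metrics, so the last two names and every weight below are placeholders, not the repo's:

WEIGHTS = {
    "recency": 0.25,
    "initiation_balance": 0.20,
    "response_cadence": 0.15,
    "length_ratio": 0.15,
    "emoji_symmetry": 0.10,
    "metric_six": 0.10,    # placeholder; the real seventh and sixth aren't named
    "metric_seven": 0.05,  # placeholder
}

def interest_index(metrics: dict[str, float]) -> float:
    # each metric is normalized to [0, 1]; the index is their weighted sum
    return sum(w * metrics.get(name, 0.0) for name, w in WEIGHTS.items())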

then strategy.py picks one of: respond, wait, send a reel, stop entirely. the response is generated through the rag pipeline (style profile + identity + per-contact memories + few-shot examples + the recent thread) and shipped through instagrapi. rate limit: 45 outgoing dms per hour. proactive cold opens are capped at twelve per day.
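the least glamorous, most load-bearing part is the gate in front of the send. a sketch of that gate; the counters and the omitted midnight reset are mine, not the repo's:

import time
from collections import deque

DM_LIMIT_PER_HOUR = 45  # from above
COLD_OPEN_LIMIT = 12    # proactive openers per day

sent_at: deque[float] = deque()  # timestamps of outgoing dms
cold_opens_today = 0             # reset at midnight (reset logic not shown)

def can_send_dm(now: float | None = None) -> bool:
    now = now if now is not None else time.time()
    while sent_at and now - sent_at[0] > 3600:
        sent_at.popleft()  # forget sends older than an hour
    return len(sent_at) < DM_LIMIT_PER_HOUR

def can_cold_open() -> bool:
    return cold_opens_today < COLD_OPEN_LIMIT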

incoming reels get downloaded with yt-dlp, keyframes pulled with ffmpeg, audio transcribed with faster-whisper, and the keyframes shown to the same qwen vision model. this entire pipeline runs because i wanted to be able to react to a reel without having to actually watch it. the architecture is, on reflection, more effort than just watching the reel.
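minus the vision step, the reel pipeline is roughly two subprocess calls and a transcribe. a sketch; the filenames and whisper model size are mine, and a real reel url goes where the placeholder is:

import subprocess
from faster_whisper import WhisperModel

url = "https://www.instagram.com/reel/..."  # placeholder

# 1. download the reel
subprocess.run(["yt-dlp", "-o", "reel.mp4", url], check=True)

# 2. pull only the keyframes (i-frames)
subprocess.run([
    "ffmpeg", "-i", "reel.mp4",
    "-vf", "select='eq(pict_type,I)'", "-vsync", "vfr",
    "frame_%03d.jpg",
], check=True)

# 3. transcribe the audio track straight off the video file
model = WhisperModel("small")  # size is a guess
segments, _info = model.transcribe("reel.mp4")
transcript = " ".join(seg.text.strip() for seg in segments)

# the keyframes plus the transcript then go to the vision model (not shown)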

the stories module

src/stories/poster.py posts text-only instagram stories. the prompt instructs the model to produce "UNHINGED hot takes" (capitalized in the source). the prompt's <tip> section contains a literal hardcoded hate list including java, php, friuli venezia giulia, whatsapp desktop, resin printers, prompt engineers, philosophy, tiktok, and politicians. i did not write that list dispassionately.

example outputs from production:

  • chiunque usa java e un maniaco sessuale ("anyone who uses java is a sex maniac")
  • se programmi in php nel 2026 meriti la galera ("if you code in php in 2026 you deserve jail")

stories render onto a background image with pillow, custom font from assets/storyfont.otf, uploaded twelve times a day max. the llm decides when. i do not.
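the render step is plain pillow. a sketch in which a solid canvas stands in for the background image; the size, colors, and centering are guesses, and only the font path comes from the repo:

from PIL import Image, ImageDraw, ImageFont

W, H = 1080, 1920  # story aspect ratio; the real canvas may differ
img = Image.new("RGB", (W, H), "black")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("assets/storyfont.otf", 72)

text = "se programmi in php nel 2026 meriti la galera"
l, t, r, b = draw.textbbox((0, 0), text, font=font)
draw.text(((W - (r - l)) / 2, (H - (b - t)) / 2), text, font=font, fill="white")
img.save("story.png")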

the dashboard

localhost:8000 runs a fastapi backend with websocket event streaming and a react+vite frontend. seven pages: dashboard, threadlist, conversationview, profileanalyzer, socialgraph, reellist, storiespage. it's a control panel for a robot version of myself, and i can watch it have conversations live while i eat dinner.
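the event stream side is vanilla fastapi websockets. a sketch with a made-up endpoint path and event shape; in the real thing the orchestrator pushes events instead of this fake loop:

import asyncio
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/events")  # path is a guess
async def events(ws: WebSocket):
    await ws.accept()
    while True:
        await ws.send_text(json.dumps({"type": "dm_sent", "thread": "someone"}))
        await asyncio.sleep(30)

serve it on port 8000 with uvicorn and the react frontend just subscribes.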

the project's own internal docs say, verbatim:

marcogpt is what happens when you try to automate flirting and accidentally build a miniature agentic crm for one extremely specific italian human.

i did not write that line either. it was already in the readme when i went to write this article. one of us is funnier than the other and i am unsure who.

what i learned

  • you cannot fine-tune your way out of needing a good system prompt
  • you can absolutely rag your way out of needing to fine-tune
  • if your llm log shows a 150-second generation that returns zero characters, that is your llm telling you to write a smaller prompt
  • the moment you give an llm [NO_REPLY] as an option, it will use it on a real person who you actually wanted to talk to

the source isn't public. obviously.