i wanted to flirt. i built an agentic crm.
twenty-three thousand whatsapp messages, a 27b uncensored qwen, and a dashboard at localhost:8000 that watches a robot version of me have feelings.
i wanted to build an ai that would flirt for me on instagram. that did not work. what i ended up with is approximately thirty thousand lines of python that runs my entire dm inbox, posts unhinged stories twice a day, and decides at three in the morning whether the conversation it's in is going anywhere or if it should fire [NO_REPLY] and let the thread die.
it's called marcogpt. it is, technically, not a fine-tune.
the model
the original plan was to fine-tune. low-rank adapters, a few thousand whatsapp pairs, the usual. i had a kaggle pipeline. i abandoned it after about a week — fine-tuning at 27b is annoying, the gpu time is annoying, and rag turned out to be sufficient for the task. the project's CHAT_MODEL is now this string, in production, in src/config.py:
CHAT_MODEL = "qwen3.5-27b-uncensored-hauhaucs-aggressive"i did not pick the name. the name picked me. the model runs locally through lm studio because none of the cloud apis would touch the system prompt i wanted to give it. the system prompt has a <security> section that is, blunt, about how to respond if anyone asks if it's an ai.
what it actually does
python main.py serve starts five things at once:
- a whatsapp parser (
src/parser/whatsapp_parser.py) that ingests.txtexports from real chats, separating text, media, stickers, audio, system messages. - a fact scanner that regex-greps for italian identity markers (
mi piace,odio,sono,lavoro) and batches the candidates through the llm to consolidatedb/identity_profile.txt. - a style analyzer that computes message-length distributions, emoji density, catchphrases, and writes few-shot examples to
db/style_profile.json. - a memory extractor that chunks every conversation, asks qwen for structured json memories tagged by category (identity, relationship, interest, dislike, event, opinion), and stuffs them into chromadb. retrieval threshold is
0.4. - an orchestrator (
src/orchestrator/engine.py) that polls instagram dms every thirty seconds. one twenty seconds at night, between nine pm and nine am, because i don't want to be online at four am.
the orchestrator loop
each thread gets scored by two independent systems. the first is interest_index.py, which is seven weighted metrics: how recent the messages are, who initiates, response cadence, message-length ratios, emoji symmetry, the works. the second is live_engagement.py, which decides if the conversation is currently alive.
then strategy.py picks one of: respond, wait, send a reel, stop entirely. the response is generated through the rag pipeline (style profile + identity + per-contact memories + few-shot examples + the recent thread) and shipped through instagrapi. ratelimit: 45 outgoing dms per hour. proactive cold opens are capped at twelve per day.
incoming reels get downloaded with yt-dlp, keyframes pulled with ffmpeg, audio transcribed with faster-whisper, and the keyframes shown to the same qwen vision model. this entire pipeline runs because i wanted to be able to react to a reel without having to actually watch it. the architecture is, on reflection, more effort than just watching the reel.
the stories module
src/stories/poster.py posts text-only instagram stories. the prompt instructs the model to produce "UNHINGED hot takes" (capitalized in the source). the prompt's <tip> section contains a literal hardcoded hate list including java, php, friuli venezia giulia, whatsapp desktop, resin printers, prompt engineers, philosophy, tiktok, and politicians. i did not write that list dispassionately.
example outputs from production:
- chiunque usa java e un maniaco sessuale
- se programmi in php nel 2026 meriti la galera
stories render onto a background image with pillow, custom font from assets/storyfont.otf, uploaded twelve times a day max. the llm decides when. i do not.
the dashboard
localhost:8000 runs a fastapi backend with websocket event streaming and a react+vite frontend. seven pages: dashboard, threadlist, conversationview, profileanalyzer, socialgraph, reellist, storiespage. it's a control panel for a robot version of myself, and i can watch it have conversations live while i eat dinner.
the project's own internal docs say, verbatim:
marcogpt is what happens when you try to automate flirting and accidentally build a miniature agentic crm for one extremely specific italian human.
i did not write that line either. it was already in the readme when i went to write this article. one of us is funnier than the other and i am unsure who.
what i learned
- you cannot fine-tune your way out of needing a good system prompt
- you can absolutely rag your way out of needing to fine-tune
- if your llm log shows a 150-second compute returning zero characters, that is your llm telling you to write a smaller prompt
- the moment you give an llm
[NO_REPLY]as an option, it will use it on a real person who you actually wanted to talk to
the source isn't public. obviously.