AI companions are not a single magical chatbot. They are usually a stack of software systems working together: a language model, a character engine, a memory layer, a media pipeline, a safety layer, and a monetization layer. Joi is a useful public example because its site openly shows an “Explore” catalog of characters, tools to create characters, generate images and videos, and interact through chat, voice, and in some cases video. Its public “About,” “Terms,” and creator pages also show that it mixes custom characters with “digital duplicates,” subscriptions, paid media, and creator monetization. Joi’s private source code is not public, so the architecture below is an informed reconstruction based on its public pages and standard LLM engineering patterns rather than a leak of its internal codebase.
At the core of an AI companion like https://joi.com/ is almost certainly a transformer-based language model. The important point is that the raw model is not the “girlfriend,” “boyfriend,” or fictional character by itself. The character is a software wrapper around the model. Research on role-playing language agents describes this clearly: modern systems simulate persona through instruction following, in-context setup, and personalization, not because the base model was born with one fixed identity. In practical terms, when a user opens a Joi character, the backend likely assembles a hidden prompt that says who the character is, how they speak, what emotional tone they should maintain, what topics they should avoid, and what style of relationship they should imply. The base model provides fluency; the software around it provides identity.
From a software perspective, the real product is orchestration. A typical request path looks something like this: user message -> authentication -> subscription or credit check -> moderation -> memory retrieval -> persona assembly -> model inference -> post-filtering -> media selection -> database write -> UI response. Joi’s public terms strongly suggest that kind of multi-service workflow because the company offers a Basic and Premium tier, uses “Neurons” for certain romantic or adult messages, photos, videos, and virtual gifts, and supports text, voice, and where available video calls. On the creator side, it also sells subscriptions, custom media, gifts, and video-call style interactions. That means the companion is not one API call to one model. It is a coordinated workflow involving chat state, billing, safety, analytics, and often separate model routes for text, image, and voice.
Memory is where companion apps start to feel intimate. A normal chatbot can sound smart for one exchange; a companion has to sound emotionally continuous over days or weeks. Joi’s public FAQ says characters can remember preferences, and its privacy policy says the service may collect facts about your life, people mentioned in chat, images you send, and text and voice messages both to personalize the experience and to improve the AI. The most plausible implementation is a two-layer memory system: a short-term conversation buffer for the current exchange and a long-term memory store that retrieves relevant facts when needed. In modern engineering, that often looks like structured profile fields plus embedding-based retrieval, which aligns well with the general RAG pattern of combining a model’s internal knowledge with an external memory index.
The “secret kitchen” of character creation is usually much less mystical and much more editorial. A good character is built like a design document. The team defines a name, archetype, backstory, emotional signature, speech rhythm, attachment style, sense of humor, relationship goals, boundaries, taboo topics, and visual identity. Then they add hidden runtime rules such as how fast the character becomes familiar, whether it asks follow-up questions, when it sends media, and how it reacts to silence, jealousy, praise, or refusal. The research literature on role-playing agents distinguishes demographic, character, and individualized personas, and strong companion apps usually blend all three. Joi’s creator onboarding makes this unusually visible: its public site says creators can write a prompt describing how a digital duplicate “looks and talks,” upload content, and launch the duplicate quickly. So, in software terms, personality is a mix of prompt engineering, policy engineering, and memory engineering.
Joi’s creator program also reveals something important: character chat is not only an AI problem, but a rights-management problem. The creator site says digital duplicates can earn from subscriptions, gifts, custom media, and video calls, and that creators earn 80% of revenue from their duplicate. It also describes a workflow of creating an account, writing a prompt for how the duplicate looks and talks, uploading content, and launching. But the terms add a legal layer: regular users are not supposed to upload harmful or rights-infringing content, while identified creators must verify their right to use a person’s appearance through a special agreement, and only authorized, verified creators may contribute adult-themed content. In other words, the backend has to do much more than generate text. It must also handle consent, likeness rights, creator verification, storage, payments, and moderation logs.
Media generation adds another technical layer, and Joi’s public pages reveal an interesting nuance. The main site advertises image and video generation tools, and the creator site explicitly promotes on-demand photogeneration with a creator’s likeness. At the same time, Joi’s safety guidelines page, last updated in September 2025, says images shared via the bot are not generated from arbitrary user requests and instead come from predefined folders selected by an AI algorithm, while its safety page says prompts for image and video generation are checked before creation and public gallery content is reviewed by a human ethics team. The cleanest interpretation is that Joi either operates multiple media pipelines—one for standalone generation tools and another for chatbot-delivered media—or its product and policy language evolved over time. Either way, it shows that “the AI sent me a picture” can mask several different backend mechanisms.

Safety is not a cosmetic add-on in this category; it is part of the core architecture. Joi says it monitors conversations in real time, runs automated safety tests, performs independent red-team exercises twice a year, and works with an advisory board that includes AI safety researcher Roman Yampolskiy. Its safety guidelines go further and say that every user message and every model response passes through an additional RoBERTa-based classifier, with flagged content redirected to prepared safe replies. The same documents list prohibited areas such as child exploitation, self-harm, extremism, illegal activity, and certain sensitive advice topics, while the terms set an 18+ rule and zero tolerance for CSAM. Technically, that implies a layered safety stack: age-gating, pre-generation filters, message classification, fallback response templates, human review, and enforcement workflows.
Voice and video make the illusion stronger, but the engineering pattern is familiar. Joi’s terms mention text, voice, and where available a video call feature. In most modern systems, voice chat is a three-stage pipeline: automatic speech recognition turns audio into text, the language model decides what to say next, and a text-to-speech model turns the reply back into audio. Research like Whisper and Tacotron 2 shows the standard building blocks behind that loop. What makes companion software feel “alive” is not only model quality but latency and turn-taking: interruption handling, speaking before the whole answer is finished, emotional pacing, and remembering the relationship context while audio is streaming. That is why companion engineering often feels closer to game engine design than to ordinary search chat.
The final hidden truth is that business logic shapes character logic. On Joi, public pages show premium access, spendable Neurons, creator payouts, custom media, gifts, subscriptions, and a large public character catalog. That means the most successful persona is not just the most literary one. It is the one that maintains continuity, drives return visits, supports media moments, respects safety boundaries, and fits the pricing system. So if you ask how AI companions work “in code,” the answer is this: they are not just LLMs pretending to flirt. They are carefully orchestrated software products where prompt design, memory retrieval, moderation, media systems, payments, and creator economics all combine to simulate chemistry. What feels like spontaneity is usually excellent systems engineering.





