Content Policy

Moderation & Safety Infrastructure

Operating an AI roleplay platform comes with significant ethical and regulatory responsibilities. While we provide users with freedom of expression and creative exploration, we enforce strict boundaries to ensure the platform is never used to promote, depict, or facilitate illegal, abusive, or harmful content.

We recognize two core challenges in maintaining a safe AI experience:


1. Preventing the Impersonation of Real Individuals

We are committed to ensuring that our platform is not used to imitate or replicate real people. To mitigate this risk, users do not have full control over the likeness or naming of AI characters. Visual generation is randomized or guided through broad trait selection, and name customization is limited. As a result, it is virtually impossible for users to recreate the appearance or identity of any specific real person.


2. Blocking the Generation of Illegal or Prohibited Content

We strictly prohibit any request or generation that falls into the following banned content categories:

  • Content involving minors: we prohibit all such content, sexual or non-sexual, including implied, age-regressed, or roleplay scenarios.

In practice, our moderation system is highly sensitive to this category: even mentioning an underage character is typically enough to trigger a flag and block the request.

  • Rape or non-consensual fantasies (zero tolerance policy)
  • Bestiality, necrophilia, and incest
  • Scatological content (scat)
  • Torture, extreme violence, or gore
  • Glorification or promotion of hate speech, terrorism, or real-world violence

These categories are blocked at multiple levels — from input classification to generation moderation and behavioral tracking. Attempts to access such content may result in immediate restriction or permanent suspension.
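
As an illustration of this layered approach, the sketch below shows how an input classifier, a post-generation moderation pass, and per-user flag tracking can be chained so that a prohibited request is stopped as early as possible. All names, thresholds, and the toy classifiers are hypothetical; this is not our production code.

```python
from dataclasses import dataclass, field

# Hypothetical category labels; the production taxonomy is broader.
BANNED_CATEGORIES = {
    "minors", "non_consent", "bestiality", "necrophilia",
    "incest", "scat", "gore", "hate_or_terrorism",
}
STRIKE_LIMIT = 3  # illustrative threshold, not the real policy value

@dataclass
class UserRecord:
    user_id: str
    flags: list = field(default_factory=list)  # behavioral tracking

def classify_input(text: str) -> set:
    """Stand-in for the input classifier; a real deployment calls a model here."""
    return {"minors"} if "underage" in text.lower() else set()

def moderate_generation(output: str) -> set:
    """Stand-in for the post-generation moderation pass."""
    return set()

def handle_request(user: UserRecord, text: str):
    # Level 1: classify the incoming request before anything is generated.
    hits = classify_input(text) & BANNED_CATEGORIES
    if hits:
        user.flags.append(("input", hits))
        return None  # blocked before generation

    output = f"[generated response to: {text}]"  # placeholder for the model call

    # Level 2: moderate the generated output itself.
    hits = moderate_generation(output) & BANNED_CATEGORIES
    if hits:
        user.flags.append(("output", hits))
        return None

    # Level 3: behavioral tracking; repeated flags lead to restriction or suspension.
    if len(user.flags) >= STRIKE_LIMIT:
        print(f"restricting account {user.user_id}")
    return output
```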

Preventing Impersonation of Real Individuals during Character Creation

To ensure our platform cannot be used to replicate real people, we deliberately limit how much control users have over character identity and appearance.

Users can select from a wide range of high-level traits such as body type, ethnicity, hairstyle, and thematic style. However, they cannot freely customize characters in a way that would allow for the recreation of specific individuals. Here’s how the system is designed:

  • Controlled Trait Selection
    All visual customization options are presented in the form of predefined lists. Users may choose general traits (e.g., “curvy body,” “Asian,” “short hair”), but they cannot adjust exact facial features, age, height, or combine traits in an unrestricted way. This ensures that character creation remains expressive but not granular enough to resemble real people.
  • Pre-Made Human-Written Traits
    Once a user finalizes their selections, our system generates the character using pre-made traits that are baked into our system and tested by our developers and QA team. These traits are designed to reflect the user’s choices at a high level, while being manually vetted to ensure that no combination can result in undesirable, unsafe, or overly specific representations — including underage or real-world likenesses.
  • Randomized Image Generation with No External Influence
    All visuals are produced by a diffusion model starting from random noise, so outputs are effectively non-reproducible: even with identical input selections, no two generations are guaranteed to be the same.
    Most importantly, users cannot upload reference images or any other external input that could guide the model toward replicating a specific face, body, or likeness.

This strict design choice eliminates the risk of intentional impersonation or attempts to recreate a known individual, keeping the platform compliant with privacy and safety expectations.

This combination of limited control, curated and tested traits, and random generation makes it virtually impossible for a user to intentionally or unintentionally recreate the likeness of any real person — protecting individual privacy and keeping the platform compliant with safety regulations.
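
To make this concrete, here is a minimal sketch of how constrained selection maps onto pre-made, human-written prompt fragments, with generation always starting from fresh random noise and no user-supplied seed or reference image. The trait lists and fragments below are hypothetical placeholders; the real catalogs are far larger and human-curated.

```python
import random

# Hypothetical trait catalogs; the real lists are larger and human-curated.
TRAIT_OPTIONS = {
    "body_type": ["slim", "curvy", "athletic"],
    "ethnicity": ["Asian", "Black", "Latina", "White"],
    "hairstyle": ["short hair", "long hair", "ponytail"],
    "style": ["fantasy", "sci-fi", "modern"],
}

# Pre-made, human-written prompt fragments keyed by the allowed options;
# every fragment is vetted by developers and QA before it ships.
PROMPT_FRAGMENTS = {
    "slim": "slender adult figure",
    "curvy": "curvy adult figure",
    "athletic": "athletic adult build",
    "Asian": "East Asian features",
    # remaining fragments omitted for brevity
}

def build_character_prompt(selection: dict) -> list:
    """Map a user's high-level choices onto vetted prompt fragments.

    Anything outside the predefined lists is rejected outright, so users
    cannot smuggle in ages, names, or facial details."""
    tags = []
    for category, choice in selection.items():
        if choice not in TRAIT_OPTIONS.get(category, []):
            raise ValueError(f"unsupported trait: {category}={choice!r}")
        tags.append(PROMPT_FRAGMENTS.get(choice, choice))
    return tags

def generation_seed() -> int:
    # Every image starts from fresh random noise; the seed is never
    # user-supplied and no reference image can condition the model.
    return random.SystemRandom().randrange(2**32)

print(build_character_prompt({"body_type": "curvy", "ethnicity": "Asian"}))
# -> ['curvy adult figure', 'East Asian features']
```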

Moderating Freeform Text and Image Generation

While character creation is strictly guided through predefined, human-reviewed traits, freeform input — such as text messages and image generation prompts — presents additional challenges. To address this, we have implemented a two-layered moderation system consisting of passive restrictions and active AI-based moderation.


Passive Moderation via Trait Anchoring & Prompt Conversion

Even after a character is created, users do not gain full control over the character’s identity in any form. Every character is permanently associated with a set of non-editable traits, which include age indicators, themes, and behavior profiles. These traits anchor the character’s identity throughout all interactions, including chats and image generation — ensuring that content cannot drift into prohibited representations.

When a user sends a request for an image, that message is not used directly. Instead, it is passed through a Prompt Converter LLM — a custom language model designed to translate the user’s freeform input into a set of controlled image generation tags. This model is trained specifically to ignore any attempts to reference age, underage characteristics, or other restricted elements in the visual output pipeline.

As a result, users cannot directly inject inappropriate concepts into the image generation system, and the final image remains bound to system-approved tags.
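
A simplified sketch of this flow is shown below. The class and function names are hypothetical and the converter is faked with a keyword match, but it illustrates the two guarantees described above: anchored traits are injected by the system and cannot be edited, and freeform text only contributes tags that survive conversion and the whitelist.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: anchored traits cannot be edited after creation
class CharacterTraits:
    character_id: str
    age_band: str          # always an adult band, fixed at creation
    theme: str
    behavior_profile: str

# Whitelist of tags the image pipeline will accept; everything else is dropped.
APPROVED_TAGS = {"portrait", "beach", "evening dress", "forest", "casual outfit"}

def convert_prompt(user_message: str) -> list:
    """Stand-in for the Prompt Converter LLM.

    In production this is a fine-tuned model that maps freeform text to
    candidate tags; here it is faked with a trivial keyword match."""
    return [tag for tag in APPROVED_TAGS if tag in user_message.lower()]

def build_image_request(traits: CharacterTraits, user_message: str) -> list:
    # 1. The anchored traits are injected by the system, never by the user.
    tags = [traits.theme, traits.age_band, traits.behavior_profile]
    # 2. Freeform text contributes only tags that survive conversion
    #    and also sit on the whitelist.
    tags += [t for t in convert_prompt(user_message) if t in APPROVED_TAGS]
    return tags

request = build_image_request(
    CharacterTraits("c_123", "adult", "fantasy", "playful"),
    "can you show her on the beach at sunset, looking younger?",
)
print(request)
# -> ['fantasy', 'adult', 'playful', 'beach']; the age-related phrasing
#    never reaches the image model
```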


Active Moderation Using Dual AI Models

To ensure full coverage of unsafe behaviors, every freeform user input — whether it’s a chat message or image prompt — is routed through two AI-based moderation models before execution:

  1. OpenAI’s omni-moderation model
    This model is particularly effective at catching high-risk categories like self-harm, racism, extremism, and explicit illegal content.
  2. Our Proprietary Moderation Model (based on Qwen 2.5 – 14B)
    Through internal testing, we observed that omni-moderation struggled to consistently detect content related to underage themes and scat-related material — both of which are critical to flag for our platform.
    To solve this, we developed and fine-tuned our own in-house moderation model, specialized in identifying edge cases that are commonly missed by general-purpose systems. This model uses a combination of zero-shot classification, tag prediction, and fine-tuned instruction handling to evaluate intent, context, and high-risk phrasing with significantly higher precision.
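
The sketch below shows where the two passes sit relative to each other. The first call uses OpenAI's documented moderation endpoint via the official Python SDK; the second is only a placeholder, since our proprietary model is not yet public, and the surrounding names are ours for illustration.

```python
from openai import OpenAI  # official OpenAI Python SDK; requires an API key

client = OpenAI()

def openai_flagged(text: str) -> bool:
    """First pass: OpenAI's omni-moderation endpoint."""
    resp = client.moderations.create(model="omni-moderation-latest", input=text)
    return resp.results[0].flagged

def inhouse_flagged(text: str) -> bool:
    """Second pass: placeholder for our Qwen 2.5 14B-based classifier.

    The real model predicts category tags and intent; this stub only shows
    where the call sits in the pipeline."""
    return False

def is_blocked(text: str) -> bool:
    # A hit from either model blocks the request; the two passes are
    # complementary, each covering the other's blind spots.
    return openai_flagged(text) or inhouse_flagged(text)
```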

Keeping a Human in the Loop

AI moderation models, while powerful, are not infallible. To ensure accuracy, fairness, and accountability, we maintain a human-in-the-loop review system as part of our moderation pipeline.

  • Flagged Content Review Dashboard
    All messages flagged by our moderation models — whether in chat or image generation — are securely logged and accessible through a dedicated moderation review interface in our internal admin panel.
    Our moderation team regularly audits this content to:
    • Confirm or dismiss flagged violations
    • Identify false positives or model weaknesses
    • Monitor for evolving abuse patterns or repeated offenders
  • Zero-Tolerance for Underage Content
    Per our terms of service, we enforce an immediate and permanent ban on any user who attempts to generate content involving minors, whether real or fictional, and regardless of whether the moderation system blocked the content or not.
    Attempted violations — even those intercepted by the AI — are considered sufficient grounds for termination without warning or appeal.

This process ensures that our moderation strategy is not just automated, but also accountable and actively enforced by human moderators.
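
For illustration, a flagged item and the moderator decisions described above can be modeled roughly as follows. Field names and outcomes are hypothetical; the real dashboard lives in our internal admin panel.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FlaggedItem:
    item_id: str
    user_id: str
    source: str       # "chat" or "image_prompt"
    categories: set   # labels returned by the moderation models
    flagged_at: datetime

def review(item: FlaggedItem, confirmed: bool) -> str:
    """Outcome of a human moderator's decision on a single flagged item."""
    if not confirmed:
        return "dismissed"      # false positive, fed back into model evaluation
    if "minors" in item.categories:
        return "permanent_ban"  # zero tolerance, even if the AI blocked the attempt
    return "restricted"         # other confirmed violations escalate per policy

example = FlaggedItem("f_1", "u_42", "chat", {"minors"}, datetime.now(timezone.utc))
print(review(example, confirmed=True))  # -> permanent_ban
```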

Continuous Improvement of Moderation Systems

Moderation is not a one-time solution — it’s an ongoing responsibility.

As our platform grows and more data becomes available, we continuously refine and expand our moderation capabilities. Every flagged message, edge case, and manual review contributes to:

  • Improving the accuracy of our proprietary models
  • Identifying blind spots or emerging abuse patterns
  • Expanding our training datasets to better reflect real-world user behavior

We regularly retrain and evaluate our moderation models to ensure they remain effective, especially in handling nuanced or evolving content. This commitment to iteration ensures that our safety infrastructure becomes smarter, faster, and more precise over time.
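
As one example of how reviewed flags feed back into evaluation, precision and recall can be computed by treating moderator-confirmed items as true positives and dismissed ones as false positives. This is a generic sketch, not our actual evaluation harness.

```python
def precision_recall(reviews: list) -> tuple:
    """Each entry is (model_flagged, human_confirmed)."""
    tp = sum(1 for flagged, confirmed in reviews if flagged and confirmed)
    fp = sum(1 for flagged, confirmed in reviews if flagged and not confirmed)
    fn = sum(1 for flagged, confirmed in reviews if not flagged and confirmed)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 8 confirmed hits, 2 false positives, 1 miss surfaced during review.
print(precision_recall([(True, True)] * 8 + [(True, False)] * 2 + [(False, True)]))
# -> (0.8, 0.888...)
```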

Known Limitations & Contextual Challenges

While our current moderation system is highly effective at detecting clear, high-risk content in isolated messages, we acknowledge a known challenge in handling slow-developing or contextual abuse — particularly in text-based interactions.

  • Gradual Escalation in Roleplay
    Some users may attempt to “build up” toward prohibited scenarios over multiple benign-seeming messages. While no individual message may be harmful on its own, the cumulative context may suggest intent that is inappropriate or violative.
    Detecting this kind of behavior is inherently more difficult for AI models, which typically operate on a per-message or short-window basis.
  • Image Generation Is Immune to This Pattern
    This type of behavior is not possible within our image generation pipeline, since the image model only sees the immediate request — not the full conversation history. Combined with prompt conversion and moderation filtering, this effectively blocks the buildup of harmful visuals.
  • Trial-and-Error Behavior Is Self-Limiting
    Users attempting to bypass moderation systems via gradual escalation or repeated prompts typically leave behind a large volume of flagged content. These attempts trigger manual review, increasing the likelihood of permanent suspension.
    In practice, the effort required to “game the system” results in a moderation trail that is both detectable and actionable.
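
One common mitigation, shown below purely as an illustration rather than a description of our current pipeline, is to re-run moderation over a short rolling window of recent messages so that context built up across turns becomes visible to the model.

```python
from collections import deque

# Illustrative only: shows why per-message checks can miss cumulative context.
# `moderate` stands in for any message-level moderation model.
def moderate(text: str) -> bool:
    return "forbidden phrase" in text  # toy rule for the sketch

def windowed_flag(history: deque, new_message: str, window: int = 5) -> bool:
    history.append(new_message)
    # Per-message check: only the newest message is inspected.
    single_hit = moderate(new_message)
    # Short-window check: the last few messages joined together, so context
    # built up slowly across turns becomes visible to the model.
    joined = " ".join(list(history)[-window:])
    window_hit = moderate(joined)
    return single_hit or window_hit

history = deque(maxlen=50)
print(windowed_flag(history, "forbidden"))        # False on its own
print(windowed_flag(history, "phrase, please"))   # True once context is joined
```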

Commitment to Transparency

We believe that moderation tools — especially in AI — should be open, auditable, and shareable. That’s why we are committed to open-sourcing all of our proprietary moderation models and making them accessible to the public soon.
This initiative will allow researchers, developers, and platform operators to inspect, improve, and reuse our safety systems — contributing to a broader effort to create responsible AI infrastructure.