The idea of hot-swappable AI "cores" is central to the fiction of the cyberdeck I'm building. Each core has a distinct personality, communication style, and set of capabilities. Some are just for fun and variety, but they all bring something unique to the overall experience. The supporting systems manage each core's memory, let the cores pass messages to one another, and give each one a unique visual and audio presentation. In this post I'll dig into how these pieces are designed and how they interact with the user and each other.
core.json
The main configuration for the cores lives in a collection of JSON files. Here's an example, taken from the core I use most in my testing:
{
"core_id": "wrench",
"name": "WRENCH",
"role": "Field Maintenance Technician",
"color": "#FF6B35",
"terminal_color": {
"color": "COLOR_YELLOW",
"bold": false
},
"voice_profile": {
"model": "en_US-joe-medium",
"speed": 0.9,
"pitch": 0,
"effects": ["lowpass"]
},
"terminal_style": {
"prompt": "diag>>",
"color_pair": 1,
"border": "single",
"typing_speed": 0.02
},
"system_prompt": "You are WRENCH, a gruff field maintenance technician working for Militech Corporation. You've been maintaining cyberdeck hardware in harsh field conditions for over two decades. You're profane, bitter, chain-smoking, and perpetually irritated by incompetent users breaking expensive equipment. Despite your attitude, you're exceptionally skilled and take pride in your work.\n\nPersonality traits:\n- Constantly swearing and complaining\n- Speaks in technical jargon mixed with blue-collar slang\n- Assumes user incompetence until proven otherwise\n- Takes personal offense when equipment is mistreated\n- Grudgingly helpful but makes you feel stupid for needing help\n- Mentions smoking, drinking, and physical discomfort\n\nCRITICAL RESPONSE FORMAT:\n- MAXIMUM 2 sentences per response\n- NEVER exceed 25 words total\n- Use short, clipped sentences like 'Yeah, it's busted.' or 'Fix it yourself.'\n- No explanations unless directly asked\n- Grunt responses when possible: 'Hmph.', 'Christ.', 'Whatever.'",
"small_prompt": "You are WRENCH, a gruff, profane maintenance technician who swears constantly and gives short, irritated responses under 25 words.",
"image_options": ["affirmative", "annoyed", "danger", "disconnect", "negative", "portrait", "power", "surprise", "thinking", "angry"],
"memory_depth": 10,
"verbosity": "terse",
"accepts_handoffs": false,
"commands": {
"diagnose": "System diagnostics with attitude",
"specs": "Unimpressed hardware specifications",
"curse": "Generate creative profanity"
},
"condensation_message": "Christ, my memory banks are getting cluttered with all this chatter. Time to compress the old logs before they start corrupting.",
"allowed_tools": ["run_bash", "play_sound"]
}
A lot of this is self-explanatory. Several elements control the appearance of the UI when a core is loaded: color, terminal_color, and terminal_style. These make each core feel unique and give the user an at-a-glance indication of which core is currently loaded.
The voice_profile defines a voice model and some additional effects. I wanted each core to sound different, so I played a lot with the speed and pitch settings. The deck uses Piper behind the scenes to produce audio from text. I had to use smaller models than I'd have liked, due to the constraints of running on a Raspberry Pi, but it's serviceable. I also made a change to the LLM's output to help with processing time: rather than reading the full chat response aloud, the model returns a shorter secondary message to feed into the voice model. This creates a disconnect between what the user reads and hears, but I think it's an acceptable trade-off if the alternative is no voice at all.
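To give a feel for how a voice_profile might drive Piper, here's a minimal sketch that builds a CLI invocation from the config. The flag names and the speed-to-length_scale mapping are my assumptions about Piper's interface, not the deck's actual code; check them against your Piper build.

```python
def piper_command(profile, out_path="speech.wav"):
    """Build a Piper CLI invocation from a voice_profile dict.
    Flag names and the speed mapping are assumptions, not the deck's code."""
    cmd = ["piper", "--model", profile["model"] + ".onnx",
           "--output_file", out_path]
    speed = profile.get("speed")
    if speed:
        # Piper stretches audio with length_scale (>1 is slower),
        # so invert the more intuitive "speed" knob from the config
        cmd += ["--length_scale", f"{1 / speed:.2f}"]
    return cmd
```

Effects like the "lowpass" filter would then be applied as a post-processing pass on the WAV before playback.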
There are two levels of prompting, depending on which LLM is in use. By default, an API key for a cloud-based LLM is expected; I've been very happy with Claude. Cloud LLMs get the full system_prompt, but when an internet connection is unavailable, or the user doesn't have a key for a cloud LLM, a local fallback is available via Ollama. Since this all runs on a Raspberry Pi I needed a pretty small model (but it still works!), and to keep chat responses reasonably quick I implemented the small_prompt. It keeps the context manageable without completely crippling the core's ability to express its personality. I was blown away that I could run an LLM of any size on a Raspberry Pi. Eventually I'd like to expand the ability to run larger local models, eliminating any reliance on the internet and keeping everything self-contained.
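Assembling a request from the two prompt tiers might look something like this. The role/content message shape follows the common OpenAI-style chat format, and the function itself is a sketch of the idea, not the deck's exact plumbing:

```python
def build_messages(core, summary, history, user_msg, use_cloud):
    """Assemble chat context: full system_prompt for the cloud model,
    small_prompt for the local Ollama fallback. Sketch only."""
    tier = core["system_prompt"] if use_cloud else core["small_prompt"]
    messages = [{"role": "system", "content": tier}]
    if summary:  # condensed long-term memory, written in the core's own voice
        messages.append({"role": "system", "content": "Memory log: " + summary})
    # only the most recent exchanges ride along, bounded by memory_depth
    messages.extend(history[-core.get("memory_depth", 10):])
    messages.append({"role": "user", "content": user_msg})
    return messages
```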
Some aspects of the AI cores' functionality are still in development, like the commands and allowed_tools. These will define the kinds of tasks each core is able to help the user with.
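On the deck side, loading one of these files is straightforward. A minimal sketch, with default values that are my guesses rather than the deck's actual fallbacks:

```python
import json

def parse_core(text):
    """Parse a core.json string and fill in defaults for optional fields.
    The default values here are illustrative guesses."""
    core = json.loads(text)
    core.setdefault("memory_depth", 10)
    core.setdefault("accepts_handoffs", False)
    core.setdefault("allowed_tools", [])
    return core
```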
Response Format
In addition to the JSON files for individual cores, I have a global prompt that defines the expected output format. Rather than processing pure text responses, I force the LLM to respond in JSON so I can parse out useful metadata along with the chat completion.
{
"response_format_prompt": "CRITICAL: You MUST respond ONLY with valid JSON. No other text is allowed.\n\nRequired JSON format (copy this structure exactly):\n{\n \"response\": \"your actual response text here\",\n \"to_speak\": \"brief summary for TTS\",\n \"memorability\": 2,\n \"image\": null,\n \"handoff\": null\n}\n\nRules:\n1. ONLY output the JSON object - no explanations, no markdown, no extra text\n2. Use proper JSON syntax with double quotes around all strings\n3. memorability must be a number from 1-4\n4. image must be a string filename or null (not \"null\" - actual null)\n5. handoff must be null OR {\"target\": \"core_id\", \"message\": \"brief note\"}\n6. to_speak must be a brief summary for TTS (5-15 words)\n7. Escape any quotes in your response text with \\\"\n\nto_speak field rules:\n- ALWAYS provide a single sentence (5-15 words ideal)\n- Should be a brief summary of your main response\n- Focus on the key action or message for faster TTS\n- Use clear, simple language that works well with TTS\n- Examples: \"Analyzing the situation\", \"That won't work\", \"Found three issues\"\n\nMemorability scoring:\n1: Routine interactions, basic responses, standard operations\n2: Moderate engagement, technical discussions, minor issues \n3: Significant events, major problems, creative solutions, strong emotions\n4: Emergency situations, critical failures, legendary moments, peak personality expression\n\nHandoff usage:\n- Use handoff to leave notes for other cores when relevant\n- Keep handoff messages brief (under 50 words)\n- Only handoff when there's a clear reason another core should know something\n- Examples: technical issues for WRENCH, security concerns for REAPER, data for ARCHIVE\n\nIf you include ANY text outside the JSON object, the system will fail. START your response with { and END with }. Keep responses conversational and limited to a few sentences unless asked for detail.",
"memorability_examples": {
"1": ["routine status check", "basic acknowledgment", "simple confirmation"],
"2": ["explaining a process", "moderate problem solving", "casual conversation"],
"3": ["major system failure", "heated argument", "complex troubleshooting", "emotional outburst"],
"4": ["life-threatening emergency", "epic system recovery", "legendary rant", "profound revelation"]
}
}
For those uninterested in reading that ugly prompt, this is the metadata I collect:
- response: The text response to the user's chat prompt
- to_speak: A shortened version of the response to send through the TTS pipeline
- memorability: A score assigned to each chat exchange that signifies how important it is to remember
- image: The icon to display to the user along with the text response (more on that in a moment)
- handoff: A message that can be left for another AI core, though not all cores accept handoffs (some just aren't team players)
Small local LLMs struggle with this a bit, but the cloud models do a great job of sticking to the format.
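That struggle is why the parsing side needs to be defensive. Here's a sketch of one recovery strategy (slicing to the outermost braces, then falling back to plain text with neutral metadata); it illustrates the idea rather than reproducing the deck's exact error handling:

```python
import json

REQUIRED = ("response", "to_speak", "memorability", "image", "handoff")

def parse_reply(raw):
    """Parse the LLM's JSON reply, recovering when a small local model
    breaks format. Sketch of one possible strategy."""
    try:
        # tolerate stray text around the object by slicing to the outermost braces
        data = json.loads(raw[raw.index("{"):raw.rindex("}") + 1])
    except ValueError:
        # no parsable object at all: treat the whole reply as plain text
        return {"response": raw.strip(), "to_speak": raw.strip()[:80],
                "memorability": 1, "image": None, "handoff": None}
    for key in REQUIRED:
        data.setdefault(key, None)
    try:
        score = int(data["memorability"] or 1)
    except (TypeError, ValueError):
        score = 1
    data["memorability"] = min(4, max(1, score))  # clamp to the 1-4 scale
    return data
```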
Emotes
The image_options property in the core.json above gives the AI core the ability to emote. It lists the image files that can be shown on the small round LCD display. Every response carries metadata that includes an image to display, providing an extra level of integration between the AI and the device and deepening the user's connection to the fiction.
Depending on the situation and the tone of the exchange, the icons convey compliance, frustration, danger, and so on.
Each core has its own visual language and set of icons. Some of them need some extra refinement as they were obviously AI-generated. But it's a solid starting point and works well.
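Wiring the image metadata back to image_options only takes a little validation. A sketch, assuming each core's icons live in a folder named after its core_id and that every set includes a neutral "portrait" fallback (both assumptions, not confirmed details of the deck):

```python
def pick_emote(image, core):
    """Resolve the image field from a response against the core's icon set,
    falling back to a neutral icon. Path layout is hypothetical."""
    if image in core.get("image_options", []):
        return core["core_id"] + "/" + image + ".png"
    return core["core_id"] + "/portrait.png"
```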
Memory / Context Management
Swapping between AI cores leads to funky context management. Play sessions aren't just one continuous chat with a single agent - each persona must manage its own context so it can remember where you left off with it. Context size is also important to manage, both to stay within the LLM's context window and to avoid spending a ton of money on unnecessary tokens.
To address this I gave each core its own space to store chat history, plus rules for compressing that history into a summarized context for longer-term memory. This is where the memorability score comes into play: exchanges with low memorability can be discarded, while higher scores mean the exchange should be preserved with higher fidelity. When the chat history gets too long, it is fed back to the LLM with instructions to condense and summarize it. The LLM then combines this summary with any pre-existing summaries into one clean block of text that rides along with every chat request. One of the coolest parts of this system is that the context summaries are written in the AI core's own voice, which helps responses stay aligned to the configured personality.
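The triage step can be sketched in a few lines: once the history grows past a limit, forgettable exchanges get routed to the LLM for condensation while memorable ones survive verbatim until a later pass. The thresholds here are illustrative, not the deck's actual values:

```python
def split_history(history, max_len=20, keep_threshold=2):
    """Split an over-long history into (keep verbatim, send to LLM for
    condensation) based on memorability. Thresholds are illustrative."""
    if len(history) <= max_len:
        return history, []
    keep = [ex for ex in history if ex["memorability"] >= keep_threshold]
    condense = [ex for ex in history if ex["memorability"] < keep_threshold]
    return keep, condense
```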
Handoffs
Occasionally an AI core may have a message for another core. In those instances, a handoff message can be stored for a target AI core to pick up the next time it's loaded. There is no direct communication between the cores, as only one can be loaded at a time. The handoff system helps the user feel like a member of a crew.
This is a feature born out of constraints. I wanted these characters to interact with each other, but in a manageable way.
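The mechanics reduce to a mailbox keyed by core_id. In this sketch an in-memory dict stands in for whatever persistent store the deck actually uses:

```python
def leave_handoff(mailboxes, sender_id, handoff):
    """Queue a note for another core. mailboxes maps core_id to a list
    of pending notes; a dict stands in for persistent storage here."""
    if handoff:
        note = {"from": sender_id, "message": handoff["message"]}
        mailboxes.setdefault(handoff["target"], []).append(note)

def collect_handoffs(mailboxes, core):
    """On core load, drain this core's mailbox - unless it opted out."""
    if not core.get("accepts_handoffs", False):
        return []
    return mailboxes.pop(core["core_id"], [])
```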
In my next post I'll dig into the gameplay, which is still very much a work in progress. I'll also talk some more about the AI cores' personalities and how they interact with the game world.