Creative Personal Assistant is a speculative prototype I made to explore Stable Diffusion, the open source text-to-image AI model that was released in August 2022.
As I started exploring Stable Diffusion for this prototype just a few weeks after its release, the infrastructure required to run it was slow, unstable, and costly, so no online demo is available for the time being.
As outlined in the original post on this blog, Week 82: Stable Diffusion as a Creative Personal Assistant, it’s been clear to me that this new wave of generative AI/ML technologies is going to change the technology product landscape and how we work. I prefer to experience these technologies up-close, hands-on. For this project, as usual, I tried to combine multiple motivations into a single project:
Gain hands-on experience with Stable Diffusion
Use prototyping to speculate how this technology could integrate into daily life
Explore John Carmack’s concept of AI personal assistants
A few weeks ago, I was listening to an episode of Lex Fridman’s podcast featuring John Carmack. It’s a whopping 5-hour conversation, but I found it all kinds of interesting. Especially the part about Artificial General Intelligence (AGI) caught my ear. I’m really curious about the idea that, in the future, we will have a collection of AIs as personal assistants.
Creative Personal Assistant
So why not start developing an AGI personal assistant today: a Creative Personal Assistant using Stable Diffusion.
Basically, the idea is to build a service you can email with a prompt, and receive a reply with the output from Stable Diffusion. An effortless, informal conversation with a personal assistant. The flow would look something like this:
Server receives e-mail. Registers sender and message as prompt
Server spawns the Stable Diffusion txt2img script
Server registers image file output from txt2img
Server sends e-mail back to sender containing image file
I created an e-mail address for the purpose and registered the IMAP/SMTP connection and authentication info. Then I built a Node.js server that monitors the inbox of that e-mail address. When it registers a new e-mail, it extracts the first line of the message body and uses that as the prompt. The server then spawns a text-to-image Stable Diffusion Python script, and when it detects a new output image, the image is e-mailed back to the sender.
10 hours after thinking of the idea, I had a working proof-of-concept prototype. Incredible what’s possible with technology nowadays.
Morphaweb is a free, open source, web-based application that allows you to drag-and-drop audio files into the browser, add markers and export in the correct format for the Make Noise Morphagene eurorack module. Fast, easy, free.
One of my hobbies is modular synthesis. One of the modules I use is Morphagene by Make Noise. By their own description:
The Morphagene music synthesizer module is a next generation tape and microsound music module that uses Reels, Splices and Genes to create new sounds from those that already exist. Search between the notes to find the unfound sounds.
It’s one of my favorite modules, but it can be a bit tricky to get sounds from your computer onto the Morphagene. The Morphagene has an SD-card slot for storing “reel” WAV files. The reels have to be in a particular format though, and if you want to configure your Morphagene “splices” up front, you have to use paid software.
To work around this, I have developed a free, open source, web-based app called Morphaweb. It allows you to build reels with splice markers and export them in the correct format. All without uploading anything to a server, to protect your privacy.
To use it, you simply import audio by dragging it into the app, then use the waveform editor and shortcuts to add/remove splice markers. When you’re done, you can download your reel as a Morphagene-compatible wave-file.
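Under the hood, splice markers end up in the WAV file as standard cue points. A rough sketch of how a RIFF “cue ” chunk could be encoded, assuming splices are stored as ordinary WAV cue points (the function name and details are illustrative, not Morphaweb’s actual code):

```javascript
// Sketch: encode splice positions (in samples) as a RIFF "cue " chunk.
function buildCueChunk(sampleOffsets) {
  const n = sampleOffsets.length;
  // 8 bytes chunk header + 4 bytes count + 24 bytes per cue point
  const buf = Buffer.alloc(8 + 4 + 24 * n);
  buf.write("cue ", 0, "ascii");
  buf.writeUInt32LE(4 + 24 * n, 4); // chunk body size
  buf.writeUInt32LE(n, 8);          // number of cue points
  sampleOffsets.forEach((offset, i) => {
    const o = 12 + 24 * i;
    buf.writeUInt32LE(i + 1, o);        // dwIdentifier
    buf.writeUInt32LE(i, o + 4);        // dwPosition (play order)
    buf.write("data", o + 8, "ascii");  // fccChunk the cue refers to
    buf.writeUInt32LE(0, o + 12);       // dwChunkStart
    buf.writeUInt32LE(0, o + 16);       // dwBlockStart
    buf.writeUInt32LE(offset, o + 20);  // dwSampleOffset: the splice position
  });
  return buf;
}
```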
Winters in Denmark are very brutal because of the lack of daylight, and the COVID-lockdown of 2020 only made it worse. I didn’t use to be affected by the darkness, but living with a foreigner experiencing her first Danish winters, I became much more aware of it.
Some years back, I started following @SunOfSeldo on Twitter. It’s an account that writes daily tweets about the advancement of daylight between winter and summer solstice, and strikes this wonderful balance between quantitative measures of minutes and seconds of added daylight and qualitative, emotionally uplifting statements about the sun.
This winter, being back in Denmark, I realized that Sun of Seldo tweets about the daylight in San Francisco don’t map that well to Denmark. Additionally, I felt lonely having just relocated back to Denmark and being stuck in COVID-lockdown. I thought maybe I could start my own account, and @SunOfDenmark was born.
I spent a bunch of time studying how to calculate the daylight of a given location, but didn’t seem to get the right numbers. I reached out to Laurie Voss, the person behind @SunOfSeldo, and he was very kind in giving me access to the tool he built to monitor the daylight for a given location, as well as providing some interesting statistics for it.
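For a sense of what such a calculation involves, here is a rough day-length approximation: solar declination via a cosine fit, fed into the standard sunrise hour-angle formula. This is an illustrative sketch and deliberately not the exact method Laurie Voss’s tool uses; it ignores atmospheric refraction, so it drifts by some minutes.

```javascript
// Rough day length in hours for a latitude (degrees) and day of year (1–365).
function dayLengthHours(latitudeDeg, dayOfYear) {
  const rad = Math.PI / 180;
  // Approximate solar declination in degrees (cosine fit, peak ±23.44°)
  const decl = -23.44 * Math.cos((2 * Math.PI / 365) * (dayOfYear + 10));
  // Cosine of the sunrise/sunset hour angle
  const cosH = -Math.tan(latitudeDeg * rad) * Math.tan(decl * rad);
  if (cosH >= 1) return 0;   // polar night: sun never rises
  if (cosH <= -1) return 24; // midnight sun: sun never sets
  // Hour angle in degrees, doubled for sunrise→sunset, 15° per hour
  return (2 * Math.acos(cosH) / rad) / 15;
}
```

For Copenhagen (latitude ~55.7°), this gives roughly 12 hours at the equinoxes and a bit over 17 hours at the summer solstice, which is in the right ballpark.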
As someone who hardly ever posts anything anywhere and normally works inside confidential structures, I found it intimidating to start tweeting. Finding my voice was a challenge, especially since Danish is a very different language from English, and at this point almost less preferable or comfortable for me to write in.
The first week or two, I wrote each tweet daily, whenever I reminded myself during the day. That quickly became unsustainable, mostly because I found it hard to be creative. Instead, I started writing a week’s worth of tweets every Sunday. Batching the writing made it a much more interesting creative process. Instead of thinking about each tweet separately, I could start building up narratives, and think about a weekly progression.
When I moved into my apartment in Boston in 2018, I wanted to decorate my place. In recent years, I have refrained from putting posters or paintings on my walls, because I feel like they become invisible to me over time.
As a reaction to this, I started thinking about modular, low-cost, parametric DIY-art. Made in such a way that it could be reshaped regularly.
An artist I have been very inspired by is HOTTEA who makes vibrant yarn installations, often creating gradients by having hundreds or thousands of suspended strings of yarn in slightly different hues.
My thinking was that by using Rhino + Grasshopper as a parametric CAD and sketching tool, and the idea of strings of yarn in different constellations, I could create recipes for artworks that could be mounted on my wall around specific anchor points.
I took a picture of my wall and put it in Rhino as a viewport background. Then I went into Grasshopper and started building up an anchor point generator setup that would allow me to control how many strings I’d like, and how long they would be. Color is assigned through an Image Sampler, which reads the colors of pixels in any bitmap you feed it and maps them onto the geometry in 3D.
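Conceptually, the Image Sampler maps a normalized coordinate to the nearest pixel’s color. A minimal sketch of that idea, assuming the bitmap is a 2D array of color values (this is an illustration of the concept, not Grasshopper’s internals):

```javascript
// Map a normalized (u, v) coordinate in [0, 1] to the nearest pixel's color.
// `bitmap` is assumed to be an array of rows, each a row of color values.
function sampleBitmap(bitmap, u, v) {
  const rows = bitmap.length;
  const cols = bitmap[0].length;
  // Clamp to the last pixel so u = 1 or v = 1 stays in bounds
  const x = Math.min(cols - 1, Math.floor(u * cols));
  const y = Math.min(rows - 1, Math.floor(v * rows));
  return bitmap[y][x];
}
```

Each string’s anchor position would be fed through something like this to pick its yarn hue.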
An important component in this project is the anchor point. I imagine:
Something that mounts into a wall with a single small screw
A center “spool” that you can tighten the string around
A cover that hides the screw and the spooled and loose end part of the string
The cover might have to have one or more outlets that let the string in/out
In this post, I dive into why I chose this project, the thoughts behind the design, the technical implementation and details, and how I could continue the work.
For the past few years, as also mentioned in last week’s post, I have been curious about the synergy of human and machine intelligence, especially in creative work. With the emergence of readily available large language models (LLMs), it’s never been easier to experiment with new experiences and interactions.
The goals I had for this project:
Become more familiar with LLMs, prototyping with OpenAI models
Improve my front-end development skills by learning Svelte
Play with the interaction design
Adventure Writer is fairly simple:
A few starting lines are generated by the AI
You continue writing
You can request a couple more sentences from the AI if desired
It’s a speculation on how machine intelligence may be able to help you stay in flow so you spend less time creatively blocked, while maintaining creative control.
I initially developed Adventure Writer for the desktop browser, since that allows me to easily share the prototype anywhere. The core interaction with the AI should be effortless and minimally disruptive to the writing flow. To me that meant using a keyboard key, but not one that is complex to reach or produces a character used in the writing. Since the Tab key is usually used to switch between interaction elements rather than to enter text, I decided to use it.
But once I sent the first prototype to a friend, his first reaction was “oh cool, but it doesn’t work on my phone, so I guess I’ll try it at home”. It was a little embarrassing. After a decade of mobile-first development, I should know better. I’m continuously impressed by how many typing activities are increasingly possible on mobile, like coding using Repl.it Mobile.
So I switched strategy to enable a mobile experience, but this introduced a new challenge: there’s no Tab key on the phone’s virtual keyboard, nor any other unused key that could be used instead. My solution is to put a floating button right above the keyboard that triggers the AI generation.
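Both triggers can funnel into the same decision. A minimal sketch, with illustrative names (the button id and event shape are assumptions, not the prototype’s actual code):

```javascript
// Decide whether an interaction event should trigger AI generation:
// the Tab key on desktop, or a tap on the floating button on mobile.
function shouldTriggerGeneration(event) {
  if (event.type === "keydown" && event.key === "Tab") return true;
  if (event.type === "click" && event.targetId === "generate-button") return true;
  return false;
}
// In the browser, the keydown handler would also call event.preventDefault()
// so Tab doesn't move focus away from the text area.
```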
On desktop, typing interactions like deleting a few words or lines are easy, but unfortunately on mobile they’re not. My solution is to add a slider that lets you scale the amount of AI-generated text you’d like to keep.
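One simple way to implement that slider, assuming it scales the kept text by word count (an illustrative sketch, not necessarily how the prototype does it):

```javascript
// Keep only the first `fraction` (0–1) of the generated text, by word count.
function keepPortion(generatedText, fraction) {
  const words = generatedText.split(/\s+/).filter(Boolean);
  const count = Math.round(words.length * fraction);
  return words.slice(0, count).join(" ");
}
```

Dragging the slider would re-run this against the last AI suggestion, so the thumb position maps directly to how much text survives.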
Although it’s a bit buggy due to lack of native ability to detect the on-screen keyboard or align things to it, I’m happy with this highly “thumbable” mobile interaction design concept.
To retrieve AI-generated text, I use OpenAI’s GPT-3 Text Completion model. I initially used the most capable, but also slower and more expensive Davinci model, but at the time of writing I’m using the faster and cheaper Curie model, which seems more than capable for this concept. To use the OpenAI API, I wrote a Node.js server that I run on a free Render instance.
When starting the adventure, I send OpenAI the prompt Start a fictional fairytale story:. When asking it to generate additional content, I send the prompt Continue this story:<TEXT>, with <TEXT> being what’s been typed so far.
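Sketched as code, building the request body for OpenAI’s completions endpoint might look like this. The max_tokens and temperature values are illustrative choices, not the prototype’s actual settings:

```javascript
// Build the request body for OpenAI's text completion endpoint,
// using the two prompts described above.
function buildCompletionRequest(textSoFar) {
  const prompt = textSoFar
    ? "Continue this story:" + textSoFar
    : "Start a fictional fairytale story:";
  return {
    model: "text-curie-001",
    prompt: prompt,
    max_tokens: 64,   // roughly "a couple more sentences"
    temperature: 0.8, // allow some creative variation
  };
}
// The Node.js server POSTs this body to https://api.openai.com/v1/completions
// with an Authorization: Bearer <API key> header.
```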
For the website itself, everything is written in Svelte.
Ideas for improvement
This was mostly meant to be a quick and fun learning and prototyping exercise. But if I were to continue the work, here are a few ideas that would be worth exploring:
Enable cycling through multiple suggestions instead of just one
Fine-tune the AI model on adventures
Provide AI story analysis, e.g. is this a “good” adventure?
Reflections on process
I would rather spend the time on creative exploration, so for future work I’ll consider either dropping mobile support, sharing demos as videos rather than working prototypes, or building native apps. None of these seem optimal to me, so I’ll keep thinking about it.