This second week of self-employment has been a really fun mix of interesting conversations and building prototypes. People have already been using the opportunity to book 30 minutes with me on Calendly or otherwise contact me. Thank you to everyone who has, and please keep doing so. I have been thoroughly enjoying conversations about life, technology, design, careers and potential collaborations. By request, I have also added some time slots that should work for folks over on the US west coast. If you still don’t find times that work for you, let me know.
Adventure Writer
As this year’s first side project, I decided to make an Adventure Writer that lets people write AI-assisted adventures. If you want to give it a try, visit: http://knandersen.github.io/adventure-writer/
Demo recording of Adventure Writer at 2x speed
In this post, I dive into why I chose this project, the thinking behind the design, the technical implementation and details, and how I could continue the work.
Background
For the past few years, as also mentioned in last week’s post, I have been curious about the synergy of human and machine intelligence, especially in creative work. With the emergence of readily available large language models (LLMs), it’s never been easier to experiment with new experiences and interactions.
The goals I had for this project:
Become more familiar with LLMs by prototyping with OpenAI models
Improve my front-end development skills by learning Svelte
Play with the interaction design
Concept
Adventure Writer is fairly simple:
A few starting lines are generated by the AI
You continue writing
You can request a couple more sentences from the AI if desired
Repeat steps 2 and 3
It’s a speculation on how machine intelligence may be able to help you stay in flow so you spend less time creatively blocked, while maintaining creative control.
Interaction Design
I initially developed Adventure Writer for the desktop browser, since that allows me to easily share the prototype anywhere. The core interaction with the AI should be effortless and minimally disruptive to the writing flow. To me, that meant triggering it with a keyboard key, but one that isn’t cumbersome to reach and doesn’t produce a character used in the writing itself. Since it’s usually used to switch between interface elements rather than to enter text, I decided on the Tab key.
But once I sent the first prototype to a friend, his first reaction was “oh cool, but doesn’t work on my phone, so I guess I’ll try it at home”. It was a little embarrassing. After a decade of mobile-first development, I should know better. I’m continuously impressed by how many typing-heavy activities are becoming possible on mobile, like coding using Repl.it Mobile.
So I switched strategy to enable a mobile experience, but this introduced a new challenge: there’s no Tab key on the phone’s virtual keyboard, nor any other unused key that could take its place. My solution is to put a floating button right above the keyboard that triggers the AI generation.
Screenshot of mobile interface controls
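To make this concrete, here is a rough sketch of how the two triggers could share one entry point. This is not the actual Svelte component; requestContinuation and the #generate-button id are placeholders for the prototype’s real generation call and floating button.

```ts
// Hypothetical entry point: in the prototype this would call the Node.js
// server and append the returned sentences to the text being written.
async function requestContinuation(): Promise<void> {
  // ...fetch a continuation and insert it at the end of the draft
}

// Desktop: the Tab key triggers generation instead of moving focus.
document.addEventListener("keydown", (event) => {
  if (event.key === "Tab") {
    event.preventDefault(); // keep the cursor in the writing area
    void requestContinuation();
  }
});

// Mobile: the floating button above the keyboard reuses the same entry point.
document.querySelector("#generate-button")?.addEventListener("click", () => {
  void requestContinuation();
});
```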
On desktop, typing interactions like deleting a few words or lines are easy, but unfortunately on mobile they’re not. My solution to this is to add a slider that allows you to scale the amount of AI-generated text you’d like to keep.
Slider interaction
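As a sketch of how the slider value could map to text: the fraction it reports could simply truncate the latest AI continuation. Splitting on words is an assumption on my part; the real prototype may well work per character or per sentence instead.

```ts
// fraction comes from the slider (0 = keep nothing, 1 = keep everything).
function keepPortion(generated: string, fraction: number): string {
  const clamped = Math.min(Math.max(fraction, 0), 1);
  const words = generated.trim().split(/\s+/);
  return words.slice(0, Math.round(words.length * clamped)).join(" ");
}
```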
Although it’s a bit buggy due to lack of native ability to detect the on-screen keyboard or align things to it, I’m happy with this highly “thumbable” mobile interaction design concept.
Technical implementation
To retrieve AI-generated text, I use OpenAI’s GPT-3 Text Completion model. I initially used the most capable, but also slower and more expensive, Davinci model, but at the time of writing I’m using the faster and cheaper Curie, which seems to be more than capable for this concept. To use the OpenAI API, I wrote a Node.js server that I run on a free Render instance.
When starting the adventure, I send OpenAI the prompt: Start a fictional fairytale story:. When asking to generate additional content I send the prompt: Continue this story:<TEXT> with <TEXT> being what’s been typed so far.
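A minimal sketch of what that server-side call could look like, assuming the Completions endpoint and the Curie model id text-curie-001; the max_tokens value is just a guess at “a couple more sentences”.

```ts
// Runs on the Node.js server; OPENAI_API_KEY is expected in the environment.
async function complete(textSoFar: string | null): Promise<string> {
  const prompt = textSoFar === null
    ? "Start a fictional fairytale story:"
    : `Continue this story:${textSoFar}`;

  const response = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "text-curie-001",
      prompt,
      max_tokens: 64, // roughly a couple of sentences
    }),
  });

  const data = await response.json();
  return data.choices[0].text;
}
```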
For the website itself, everything is written in Svelte.
Ideas for improvement
This was mostly meant to be a quick and fun learning and prototyping exercise. But if I were to continue the work, here are a few ideas that would be worth exploring:
Enable cycling through multiple suggestions instead of just one
Fine-tune the AI model on adventures
Provide AI story analysis, e.g. is this a “good” adventure?
Reflections on process
It’s generally been fun getting my hands dirty with GPT and Svelte. On the other hand, it was very frustrating spending a disproportionate amount of time debugging weird CSS/HTML/JavaScript issues across the mobile and desktop implementation.
I would rather spend the time on creative exploration, so for future work I’ll consider either dropping mobile, sharing demos as videos rather than working prototypes, or building native apps. None of these seem optimal to me, so I will keep thinking about this.
For the past nine years I’ve invented new technologies, products, tools, teams and processes. The only constant has been that the missions and their deadlines were “impossible”, so I’ve had to invent new ways of thinking and doing to make them happen. I’m still curious to do that in an environment where people are excited to learn and build together.
I have spent some time reflecting on the products I have built over the years, as well as the sketching and prototyping environments I built to make them happen. To me, products like LEGO Super Mario and LEGO Education SPIKE Prime are tools for play and learning. And a lot of the prototypes I built to bring them into the world were built as tools for sketching faster and more creatively across design, engineering and business.
At Bang & Olufsen I drove new thinking about redefining audiovisual products, patented a number of technologies to make interactions more intuitive and expressive, and managed a team of young talented designers for the first time. I removed abstractions between design and engineering through closer collaboration and an emphasis on learning and shared responsibility.
Now I’m searching for a new mission and group of people to do something exciting with. Searching for something relentless. Something that has the ambition to build products that are more like tools.
Something that takes the best of human and machine intelligence and treats them as complementary. With the new wave of AI technologies, it feels like Marc Andreessen’s old claim that “software is eating the world” is happening again. I’m still curious, though, about the interfaces to that software. And what could the physical entry points to that software be?
Gathering
For the past month, I have been focused on gathering. Listening, reading, talking to people.
I’ve dabbled in learning Svelte and connecting it with THREE.js through the Threlte library. I still build websites and have increasingly used web-based interfaces to rapidly sketch with hardware and software. Maybe I’m wrong on this, but Svelte feels easier and lighter than React. I’m considering respinning my portfolio using Svelte and THREE.js, since it could be a fun project for learning.
Like everyone else, I have also been dabbling in ChatGPT and playing with GPT-3. Although the hype will subside, I believe these AI technologies will change the field of technology products, how they are made, and even what kind of teams will be needed to make them.
As a learning project for GPT-3, I have been toying with the idea of an Adventure Writer application that supports writing adventures in a way that combines human and machine intelligence. The concept is that you enter a loop where you alternate between writing by yourself and using GPT-3 to generate a few sentences at a time as inspiration. Through a few key interactions like scrolling to increase/decrease the amount of text generated, and clicking words to generate synonyms, I’m hoping to build an interface that stimulates the creative process.
Based on user feedback, I have also been making updates to Morphaweb and added a version number to the website, which will help when debugging with users. I’m curious about automating the versioning through GitHub Actions or something clever; I might ask GPT for ideas.
12 years ago, while I was studying, I came across an article that described an ambient Justin Bieber song. Sounded like an oxymoron to me, but also intriguing. A few seconds in, I was completely hooked.
I remember taking note that the song was made using something called PaulStretch, but I guess I never investigated further back then. I’ve thought about PaulStretch several times since, and this week I decided to look into it. Apparently, it’s made by Paul Nasca, and he has even put C++ and Python versions on GitHub.
I’m always fascinated by people like Paul. He reminds me a bit of Tom Erbe. They seem like people who are able to operate at the intersection of mathematics, code and sound in an aesthetic way. And who, rather than building just for themselves, are able to provide beautiful sound tools to others.
One of my piano recordings before stretching:
After 8x stretching:
Next week
Tuesday I’ll be doing a Shader Prototyping workshop hosted by Patricio González Vivo. Really excited to learn from Patricio. He’s the creator of The Book of Shaders, and I am perpetually in awe of the work he shares in his Twitter feed.
I started writing this weeknote almost a month ago - so much for weeknotes. But it’s fine, this was never meant to be a chore.
Work has been incredibly busy, and I’ve underestimated how difficult it is to build something truly new while also building an organization and a culture, and while transitioning from individual contributor to leader. In a sense, I’m learning to be the diffuser of my way of thinking and doing, rather than the one doing the thinking and the doing.
Stable Diffusion and Whisper
It’s wild how much is happening every day with Stable Diffusion. I’m trying to keep up, and given my 2020 MacBook’s limitations, I have been curious to create an efficient sandbox for myself to try out some of these new models.
I found Iulia Turc’s How To Run a Stable Diffusion Server on Google Cloud Platform (GCP) helpful, and I’ve set up VSCode to be able to SSH into my GCP server. I should probably use Google Colab like any other sane person, but I’m intrigued by having my own cloud computer. Feels like less of an abstraction somehow.
Now that I have the server, I have also been taking other models, like Whisper, out for a spin. It’s fascinating: I can put in an audio clip of me speaking nonsense Danish, and it produces a perfect translation in a few seconds.
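For reference, this is roughly how I would invoke it from the same Node environment; the flags follow the openai/whisper command-line tool, and the file name is just an example.

```ts
import { execFileSync } from "node:child_process";

// Translate (not just transcribe) a Danish clip to English with the Whisper CLI.
const output = execFileSync(
  "whisper",
  ["nonsense-danish.m4a", "--model", "medium", "--task", "translate"],
  { encoding: "utf-8" },
);
console.log(output); // the translated segments, with timestamps
```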
Next, I’m curious to get DreamBooth up and running, though it’s proving difficult with the limited VRAM I have available on my GCP instance. By the time you’re reading this, I’m sure someone will have found a way to make it run on significantly less VRAM.
Reading
I finished the Build book some weeks back. It’s good… really good. My leadership team has read it, and other departments have read it too; it’s serving as a great common language and way of thinking.
I also finished the first book, A Wizard of Earthsea, from Ursula K. Le Guin’s Earthsea: The First Four Books. One thing I take away is the power of language and names. That a name is precious and reveals someone’s true nature, even forces one to surrender. Next is the second book, The Tombs of Atuan, and I’m curious to see if Ursula has another beautiful concept for me to digest.
I’m also reading Let My People Go Surfing by Yvon Chouinard as an audiobook. Not giving it my undivided attention, but it’s a good book filled with inspiring stories from a long life of a principled person.
Admittedly, the prospect of missing the generative AI/ML train that’s running at full speed these years gives me anxiety. I haven’t found a way to use it in my current job yet, but I’m curious.
This week I have been reading about Stable Diffusion, the open-source latent text-to-image diffusion model capable of generating photo-realistic images from any text input. The fact that it’s open source, malleable and can be run locally is exciting.
Three teddybears watching a sunset together, by Stable Diffusion
AI as Personal Assistant
A few weeks ago, I was listening to an episode of Lex Fridman’s podcast featuring John Carmack. It’s a whopping 5-hour conversation, but I found it all kinds of interesting. Especially the part around Artificial General Intelligence (AGI) caught my ear. I’m really curious about the idea that, in the future, we will have a collection of AIs as personal assistants.
Creative Personal Assistant
So why not start developing an AGI as a personal assistant today? A Creative Personal Assistant using Stable Diffusion.
Basically, the idea is to build a service you can e-mail with a prompt and receive a reply with the output from Stable Diffusion. It would look something like this:
I created an e-mail address for the purpose and registered the IMAP/SMTP connection and authentication info. Then I built a Node.js server that monitors that address’s inbox. When it registers a new e-mail, it extracts the first line from the message body and uses that as the prompt. A text-to-image Stable Diffusion Python script is then spawned from my server, and when it detects a new output image, the image is e-mailed back to the sender. In short (a rough sketch follows the steps below):
Server receives e-mail. Registers sender and message as prompt
Server spawns the Stable Diffusion txt2img script
Server registers image file output from txt2img
Server sends e-mail back to sender containing image file
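Here is a stripped-down sketch of that loop. The inbox polling is stubbed out (a real version would use an IMAP client), the txt2img path and flags assume the CompVis Stable Diffusion repo layout, the SMTP details are placeholders, and the output file name is just an example.

```ts
import { execFileSync } from "node:child_process";
import nodemailer from "nodemailer";

interface IncomingMail {
  from: string;      // sender address to reply to
  firstLine: string; // first line of the body, used as the prompt
}

// Stubbed for the sketch: a real version would poll the inbox over IMAP
// and return messages that arrived since the last check.
async function fetchUnseenMessages(): Promise<IncomingMail[]> {
  return [];
}

const transport = nodemailer.createTransport({
  host: "smtp.example.com", // placeholder SMTP host
  auth: { user: process.env.MAIL_USER ?? "", pass: process.env.MAIL_PASS ?? "" },
});

async function poll(): Promise<void> {
  for (const mail of await fetchUnseenMessages()) {
    // 1. Register sender and first line of the message as the prompt.
    const prompt = mail.firstLine;

    // 2. Spawn the Stable Diffusion txt2img script (flags are assumptions).
    execFileSync("python", [
      "scripts/txt2img.py",
      "--prompt", prompt,
      "--outdir", "outputs",
    ]);

    // 3./4. E-mail the output image back to the sender.
    await transport.sendMail({
      from: process.env.MAIL_USER,
      to: mail.from,
      subject: `Your image: ${prompt}`,
      attachments: [{ path: "outputs/grid-0000.png" }], // example output name
    });
  }
}

setInterval(() => void poll(), 30_000); // check for new e-mails every 30 seconds
```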
10 hours after thinking of the idea, I have a working proof-of-concept prototype. Incredible what’s possible with technology nowadays.
E-mail response from the Creative Personal Assistant