Week 82: Stable Diffusion as a Creative Personal Assistant

4 Sep, 2022

Admittedly, the thought of missing the generative AI/ML train, which has been running at full speed these past few years, gives me anxiety. I haven’t found a way to use it in my current job yet, but I’m curious.

This week I have been reading about Stable Diffusion, the open source latent text-to-image diffusion model capable of generating photo-realistic images from any text input. That it is open source, malleable and can run locally is exciting.

Stable Diffusion Hello World

I decided to take it out for a spin on my laptop, a 2020 Intel MacBook Pro. The article “Setting up Stable Diffusion for MacOS” by Craig Morten really helped. In less than 30 minutes I was generating my first images.

Three teddybears watching a sunset together, by Stable Diffusion

AI as Personal Assistant

A few weeks ago, I was listening to an episode of Lex Fridman’s podcast featuring John Carmack. It’s a whopping 5-hour conversation, but I found it all kinds of interesting. The part about Artificial General Intelligence (AGI) especially caught my ear. I’m really curious about the idea that, in the future, we will each have a collection of AIs as personal assistants.

Creative Personal Assistant

So why not start developing an AI personal assistant today? Specifically, a Creative Personal Assistant using Stable Diffusion.

Basically, the idea is to build a service you can email with a prompt, and receive a reply with the output from Stable Diffusion. It would look something like this:

  1. Send an email to personalassistant@domain.com with a prompt
  2. Wait a few minutes
  3. Receive a reply

I created an e-mail address for the purpose and noted its IMAP/SMTP connection and authentication details. Then I built a Node.js server that monitors the inbox of that e-mail address. When it registers a new e-mail, it extracts the first line of the message body and uses that as the prompt. The server then spawns a Stable Diffusion text-to-image Python script, and when it detects a new output image, it e-mails the image back to the sender.
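The prompt extraction is the simplest piece to illustrate. Here is a minimal sketch of how taking the first line of a message body could look in Node.js. The function name `extractPrompt` is hypothetical, and skipping leading blank lines is an assumption on top of "first line"; the actual server code is not shown in this post.

```javascript
// Hypothetical helper, not the original server code.
function extractPrompt(body) {
  // E-mail bodies commonly use CRLF line endings, so split on both forms.
  const lines = String(body).split(/\r?\n/);
  // Assumption: ignore leading blank lines and use the first line with text.
  const first = lines
    .map((line) => line.trim())
    .find((line) => line.length > 0);
  // Return null when the message contains no usable text at all.
  return first === undefined ? null : first;
}

console.log(
  extractPrompt("Three teddybears watching a sunset together\r\n\r\nSent from my phone")
);
```

Anything after the first line, such as a signature, is ignored, which keeps the prompt predictable.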

  1. Server receives e-mail. Registers sender and message as prompt
  2. Server spawns the Stable Diffusion txt2img script
  3. Server registers image file output from txt2img
  4. Server sends e-mail back to sender containing image file
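Steps 2 and 3 could be sketched roughly as follows, using only Node.js built-ins. This is an illustration under stated assumptions, not the code from my prototype: the function names are hypothetical, the script path and the `--prompt` and `--outdir` flags are assumed to match the `txt2img.py` script in the CompVis stable-diffusion repository, and the script is assumed to write at least one `.png` directly into the output directory. Instead of watching the folder for a new file, this version simply waits for the script to exit and then picks the newest image.

```javascript
const { spawn } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Assumption: script location and flags follow the CompVis txt2img.py script.
function buildTxt2imgArgs(prompt, outDir) {
  return ["scripts/txt2img.py", "--prompt", prompt, "--outdir", outDir];
}

// Return the most recently modified .png in dir, or null if there is none.
function newestImage(dir) {
  const images = fs
    .readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".png"))
    .map((name) => path.join(dir, name))
    .sort((a, b) => fs.statSync(b).mtimeMs - fs.statSync(a).mtimeMs);
  return images.length > 0 ? images[0] : null;
}

// Spawn the script and resolve with the path of the image it produced.
// Assumption: "python" is on the PATH; pass another command if it is not.
function generateImage(prompt, outDir, command = "python") {
  return new Promise((resolve, reject) => {
    // The prompt is passed as an array element, so no shell ever
    // interprets the untrusted text that arrived by e-mail.
    const child = spawn(command, buildTxt2imgArgs(prompt, outDir), {
      stdio: "inherit",
    });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code !== 0) {
        reject(new Error(`txt2img exited with code ${code}`));
        return;
      }
      const image = newestImage(outDir);
      if (image) {
        resolve(image);
      } else {
        reject(new Error(`no image found in ${outDir}`));
      }
    });
  });
}
```

Calling `generateImage(prompt, outDir)` with a fresh output directory per request means any image found there belongs to that request, and the resolved path is what gets attached to the reply in step 4.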

Ten hours after thinking of the idea, I had a working proof-of-concept prototype. It’s incredible what’s possible with technology nowadays.

E-mail response from the Creative Personal Assistant
