Week 82: Stable Diffusion as a Creative Personal Assistant

4 Sep, 2022

Admittedly, the potential of missing the generative AI/ML train that’s running full speed these years gives me anxiety. I haven’t found the way to use it in my current job yet, but I’m curious.

This week I have been reading about Stable Diffusion, the open source latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The fact that it is open source, malleable and runs locally is exciting.

Stable Diffusion Hello World

I decided to take it out for a spin on my laptop, a 2020 Intel MacBook Pro. The article Setting up Stable Diffusion for MacOS by Craig Morten really helped. In less than 30 minutes I was generating my first images.

Three teddybears watching a sunset together, by Stable Diffusion

AI as Personal Assistant

A few weeks ago, I was listening to an episode of Lex Fridman’s podcast featuring John Carmack. It’s a whopping 5 hour conversation, but I found it all kinds of interesting. The part about Artificial General Intelligence (AGI) especially caught my ear. I’m really curious about the idea that, in the future, we will have a collection of AIs as personal assistants.

Creative Personal Assistant

So why not start developing an AGI as a personal assistant today? A Creative Personal Assistant, using Stable Diffusion.

Basically the idea is to build a service you can email with a prompt, and receive a reply with the output from stable diffusion. It would look something like this:

  1. Send an email to personalassistant@domain.com with a prompt
  2. Wait a few minutes
  3. Receive a reply

I created an e-mail address for the purpose and registered the IMAP/SMTP connection and authentication info. Then I built a Node.js server that monitors the inbox of that e-mail address. When it registers a new e-mail, it extracts the first line of the message body and uses that as the prompt. The server then spawns a Stable Diffusion text-to-image Python script, and when it detects a new output image, it e-mails the image back to the sender.

  1. Server receives e-mail. Registers sender and message as prompt
  2. Server spawns the Stable Diffusion txt2img script
  3. Server registers image file output from txt2img
  4. Server sends e-mail back to sender containing image file
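Steps 1 to 3 above can be sketched roughly like this in Node.js. This is a minimal sketch, not the actual server: the function names are made up for illustration, the flags assume the `scripts/txt2img.py` script from the CompVis Stable Diffusion repository, and it takes the first non-empty line as the prompt. Receiving and sending the e-mails is left out, and new images are found by comparing the output folder before and after the run rather than by watching it.

```javascript
const { spawn } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Step 1: use the first non-empty line of the message body as the prompt.
function extractPrompt(body) {
  const line = body
    .split(/\r?\n/)
    .map((l) => l.trim())
    .find((l) => l.length > 0);
  return line || null;
}

// Step 2: arguments for the txt2img script.
// Assumption: the flags match CompVis' scripts/txt2img.py.
function buildTxt2ImgArgs(prompt, outdir) {
  return [
    "scripts/txt2img.py",
    "--prompt", prompt,
    "--outdir", outdir,
    "--n_samples", "1",
    "--n_iter", "1",
  ];
}

// Recursively collect image files below a directory.
function listImages(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) found.push(...listImages(full));
    else if (/\.(png|jpe?g)$/i.test(entry.name)) found.push(full);
  }
  return found;
}

// Step 3: images present after the run that were not there before.
function newImages(before, after) {
  const seen = new Set(before);
  return after.filter((file) => !seen.has(file));
}

// Steps 2 and 3 together: run the script, resolve with the new image paths.
// Assumption: "python" is on the PATH and the working directory is the
// Stable Diffusion checkout.
function generateImages(prompt, outdir, python = "python") {
  fs.mkdirSync(outdir, { recursive: true });
  const before = listImages(outdir);
  return new Promise((resolve, reject) => {
    const child = spawn(python, buildTxt2ImgArgs(prompt, outdir), {
      stdio: "inherit",
    });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code !== 0) {
        reject(new Error(`txt2img exited with code ${code}`));
        return;
      }
      resolve(newImages(before, listImages(outdir)));
    });
  });
}
```

Passing the prompt as a separate array element to `spawn` means it never goes through a shell, which matters when the text comes straight out of an e-mail from anyone.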

10 hours after thinking of the idea, I have a working proof-of-concept prototype. Incredible what’s possible with technology nowadays.

E-mail response from the Creative Personal Assistant

Week 81: Morephaweb and Forensic Architecture

28 Aug, 2022

It wasn’t until today, Sunday, that I found some time to work on side projects. Last week I added tracking to Morphaweb so I can see if people actually use the site to export reels - and it turns out people do! That gave me some motivation to work a bit more on Morphaweb (yes the title pun was terrible).

Morphaweb

One of the major challenges has been handling multiple files. After an hour or two of wrestling with nested JavaScript promises, I discovered the Crunker library, which lets me concatenate audio files into a single audio file.

Next up I would like to automatically add a marker between each of the files uploaded for convenience. I should also start tagging my releases like the real software developers do.
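The marker idea boils down to remembering where each file ends while joining them. Here is a small sketch of that logic on raw sample arrays. It does not use Crunker, and `concatWithMarkers` is a hypothetical name, not something from Morphaweb. It assumes all clips are mono and share the same sample rate.

```javascript
// Join several clips into one sample array and record a marker (in seconds)
// at every boundary between two clips.
function concatWithMarkers(clips, sampleRate) {
  const total = clips.reduce((sum, clip) => sum + clip.length, 0);
  const samples = new Float32Array(total);
  const markers = [];
  let offset = 0;
  clips.forEach((clip, index) => {
    // No marker before the first clip, one before every later clip.
    if (index > 0) markers.push(offset / sampleRate);
    samples.set(clip, offset);
    offset += clip.length;
  });
  return { samples, markers };
}
```

With three clips of 4, 2 and 2 samples at a sample rate of 4, the markers would land at 1 and 1.5 seconds.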

Forensic Architecture

Last Sunday I went to the Louisiana Museum of Modern Art and saw their Forensic Architecture exhibit.

The fifth exhibition in Louisiana’s series The Architect’s Studio presents Forensic Architecture, an interdisciplinary research agency, based at Goldsmiths, University of London. Working in the intersection of architecture, law, journalism, human rights and the environment, Forensic Architecture investigates conflicts and crimes around the world.

The exhibition itself was really interesting, but even more so was that they actually keep a GitHub repository of all the models and tools they’ve created.

Week 80: Blender and a bit of reading

21 Aug, 2022

Another light week of side project work, but I did manage to finish the Blender x Three.js tutorials and thereby also the last bit of Three.js Journey. It’s been a great course, and the best money I’ve ever spent on e-learning. Bruno is a great teacher, and he’s even added lessons since I bought the course, so maybe I’ll get even more for my money.

Blender x Three.js

I finished the UV unwrapping of the model a couple of weeks ago, so the last part of the lesson was how to optimize the model and export everything correctly from Blender.

Render of final blender model in three.js

Quite a lot of things to keep track of - somehow both easier and more difficult than I had imagined. Final result looks great though, and I’m excited and terrified to get started on my portfolio and working on the LEGO elements.

Moving around the scene

Reading

Hacking around with the ScotRail audio announcements — Simon Willison found all the sound files of a Scottish train operator and wrote this great blog post on how he scraped it and built fun prototypes with it.

Physically Based is a database of physically based values for CG artists.

CSS Grid and Custom Shapes pt 1 — I keep being surprised how much is possible to do in pure CSS

Color and Contrast — A comprehensive guide for exploring and learning about the theory, science, and perception of color and contrast.

Week 79: Morphagene reel splicer

14 Aug, 2022

This week was going to be a slow side project week due to lots of work and sunny evenings in Copenhagen. But then I saw a GitHub notification about a project I worked on a while back…

Morphaweb

One of my hobbies is modular synthesis. One of the modules I use is Morphagene by Make Noise. By their own description:

The Morphagene music synthesizer module is a next generation tape and microsound music module that uses Reels, Splices and Genes to create new sounds from those that already exist. Search between the notes to find the unfound sounds.

It’s one of my favorite modules, but it can be a bit tricky to get sounds from outside the modular system onto the Morphagene. It has a slot for an SD card where you can place reels. The reels have to be in a particular format though, and if you want to configure your splices up front, you have to download paid software.

Morphagene in my eurorack setup

To work around this, I have developed a free, open source, web-based app called Morphaweb. It allows you to build reels with splice markers and export them in the correct format. All without uploading anything to a server, to protect your privacy.

My app Morphaweb

To use it, you simply import audio by dragging it into the app, then use the waveform editor and shortcuts to add/remove splice markers. When you’re done, you can download your reel as a Morphagene-compatible wave-file.
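For the curious, this is roughly what writing such a file involves. The sketch below is not Morphaweb’s actual code, and `encodeReel` is a made-up name. It assumes the reel format is a 48 kHz, stereo, 32-bit float WAV file with each splice marker stored as a cue point, which is my understanding of what the Morphagene reads, and it has not been tested against the hardware. It also writes only the bare minimum of chunks and leaves out the `fact` chunk that stricter WAV parsers may expect for float audio.

```javascript
// Write a four-character chunk id into the view.
function writeAscii(view, offset, text) {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// samples: interleaved float samples; markerFrames: marker positions in frames.
function encodeReel(samples, markerFrames, sampleRate = 48000, channels = 2) {
  const bytesPerSample = 4;
  const fmtSize = 16;
  const dataSize = samples.length * bytesPerSample;
  const cueSize = 4 + 24 * markerFrames.length;
  // RIFF header + fmt chunk + data chunk + cue chunk
  const total = 12 + (8 + fmtSize) + (8 + dataSize) + (8 + cueSize);
  const view = new DataView(new ArrayBuffer(total));
  let o = 0;

  writeAscii(view, o, "RIFF"); o += 4;
  view.setUint32(o, total - 8, true); o += 4;
  writeAscii(view, o, "WAVE"); o += 4;

  writeAscii(view, o, "fmt "); o += 4;
  view.setUint32(o, fmtSize, true); o += 4;
  view.setUint16(o, 3, true); o += 2; // format 3 = IEEE float
  view.setUint16(o, channels, true); o += 2;
  view.setUint32(o, sampleRate, true); o += 4;
  view.setUint32(o, sampleRate * channels * bytesPerSample, true); o += 4; // byte rate
  view.setUint16(o, channels * bytesPerSample, true); o += 2; // block align
  view.setUint16(o, 8 * bytesPerSample, true); o += 2; // bits per sample

  writeAscii(view, o, "data"); o += 4;
  view.setUint32(o, dataSize, true); o += 4;
  for (const sample of samples) {
    view.setFloat32(o, sample, true); o += 4;
  }

  writeAscii(view, o, "cue "); o += 4;
  view.setUint32(o, cueSize, true); o += 4;
  view.setUint32(o, markerFrames.length, true); o += 4;
  markerFrames.forEach((frame, index) => {
    view.setUint32(o, index + 1, true); o += 4; // cue point id
    view.setUint32(o, frame, true); o += 4;     // position
    writeAscii(view, o, "data"); o += 4;        // chunk the cue points into
    view.setUint32(o, 0, true); o += 4;         // chunk start
    view.setUint32(o, 0, true); o += 4;         // block start
    view.setUint32(o, frame, true); o += 4;     // sample offset
  });

  return view.buffer;
}
```

In the browser, the returned `ArrayBuffer` could be wrapped in a `Blob` and offered as a download, which fits the idea of never uploading anything to a server.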

If you have a Morphagene or are just curious, please try it out at https://knandersen.github.io/morphaweb/

I also added Morphaweb as a project here on the blog. Now I’ll go enjoy the rest of the sunny and hot afternoon outside!

Week 78: RSS, Build, and other small bits

7 Aug, 2022

This week was a bit sporadic because work took up a lot of time and headspace, but I did find time to think about newspapers, work a bit more on Blender lessons, and read.

RSS Newspaper

I spent my holidays reflecting a bit about my use of time on social media, and how to stay up to date with “the industry” and people whose work I like.

Twitter has brought me so much joy, allowing me to digest an incredible amount of news and work from a wide range of organizations and people. But the product has devolved into so much suggested content that I didn’t ask for, yet that hits all the right brain chemicals to keep me scrolling. Same story for Instagram.

At the same time, RSS feeds are seeing a resurgence, and I have now switched to the same setup as Matt, combining NetNewsWire and Feedbin. Feedbin even allows for Twitter integration, so now I can get the tweets I’m interested in without any of the garbage. NetNewsWire also has a “star” feature that works like an archive, a great replacement for Twitter’s bookmarks.

Screenshot of my NetNewsWire

NetNewsWire is efficient, but not exciting. I miss the beautiful typography as graphic elements, the layout and the images of traditional media. Feedbin has an API, so I’m wondering if I should try and sketch an “RSS newspaper”, a more visceral, scrolling layout that would allow me to consume more news in a visually enticing way.

I’m still experimenting with my setup and building it out. I still check Twitter and Instagram, but through the web versions rather than the apps.

Blender lesson progress

The lessons continue, now on Bruno Simon’s lesson on unwrapping and baking a scene in Blender.

Blender model of a small grove with a portal

Blender model from last week, now with materials

I’m at the stage of UV unwrapping all the objects, which is fun and frustrating. Since the whole idea was to create a 3D world including LEGO elements for a portfolio, I’m not looking forward to unwrapping those.

Blender UV unwrapping setup

Reading

I’m reading Build by Tony Fadell together with colleagues. It’s curious, because it’s the kind of book I would have been ecstatic to read 5 years ago. I have a lot of respect for Tony Fadell and what he has accomplished in his career, but coming into this book I have to admit I thought I didn’t need to read it. My manager kept pushing it though, so now I am reading it, and learning a lot from it.

Also a hat tip to Linus’ article on designing with materials. It features work by Tyler, whom I met while I worked at the MIT Media Lab, and it’s the kind of material that makes me happy and hopeful for the future of software. It made me think of a lecture Lucas Ochoa and Gautam Bose gave on A Tinkerer’s Guide to the AI Galaxy. They also talk about technologies as materials. So did BERG when they were around.