I started writing this weeknote almost a month ago - so much for weeknotes. But it’s fine, this was never meant to be a chore.
Work has been incredibly busy, and I’ve underestimated how difficult it is to build something truly new while also building an organization and a culture, and transitioning from individual contributor to leader. In a sense, I’m learning to be the diffuser of my way of thinking and doing, rather than the one doing the thinking or the doing.
Stable Diffusion and Whisper
It’s wild how much is happening every day with Stable Diffusion. I’m trying to keep up, and given my 2020 MacBook’s limitations, I’ve been curious to create an efficient sandbox for myself to try out some of these new models.
I found Iulia Turc’s How To Run a Stable Diffusion Server on Google Cloud Platform (GCP) helpful, and I’ve set up VSCode to be able to SSH into my GCP server. I should probably use Google Colab like any other sane person, but I’m intrigued by having my own cloud computer. Feels like less of an abstraction somehow.
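For anyone curious, the VSCode Remote-SSH part mostly comes down to an entry in ~/.ssh/config — the host alias, IP and key path below are placeholders, not my actual setup:

```
Host gcp-sd
    HostName 203.0.113.10
    User your-username
    IdentityFile ~/.ssh/google_compute_engine
```

With that in place, the host shows up in VSCode’s Remote-SSH host list and you get a full editor session on the cloud machine.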
Now that I have the server, I have also been taking other models, like Whisper, out for a spin. It’s fascinating: I can feed it an audio clip of me speaking nonsense Danish, and it produces a perfect translation in a few seconds.
Next, I’m curious to get DreamBooth up and running, though it’s proving difficult with the limited VRAM I have available on my GCP instance. By the time you’re reading this, I’m sure someone will have found a way to make it run on significantly less VRAM.
Reading
I finished the Build book some weeks back. It’s good… really good. My leadership team has read it, and so have other departments, and it’s serving as a great common language and way of thinking.
I also finished the first book, A Wizard of Earthsea, from Ursula K. Le Guin’s Earthsea: The First Four Books. One thing I take away is the power of language and names: that a name is precious and reveals someone’s true nature, even forces one to surrender. Next is the second book, The Tombs of Atuan, and I’m curious to see if Ursula has another beautiful concept for me to digest.
I’m also reading Let My People Go Surfing by Yvon Chouinard as an audiobook. I’m not giving it my undivided attention, but it’s a good book filled with inspiring stories from the long life of a principled person.
Admittedly, the prospect of missing the generative AI/ML train that’s running at full speed these years gives me anxiety. I haven’t found a way to use it in my current job yet, but I’m curious.
This week I have been reading about Stable Diffusion, the open-source latent text-to-image diffusion model capable of generating photo-realistic images from any text input. The fact that it’s open source, malleable, and can run locally is exciting.
A few weeks ago, I was listening to an episode of Lex Fridman’s podcast featuring John Carmack. It’s a whopping 5-hour conversation, but I found it all kinds of interesting. Especially the part about Artificial General Intelligence (AGI) caught my ear. I’m really curious about the idea that, in the future, we will have a collection of AIs as personal assistants.
Creative Personal Assistant
So why not start developing an AGI as a personal assistant today? A Creative Personal Assistant using Stable Diffusion.
Basically, the idea is to build a service you can e-mail with a prompt and receive a reply with the output from Stable Diffusion. It would look something like this:
I created an e-mail address for the purpose and registered the IMAP/SMTP connection and authentication info. Then I built a Node.js server that monitors the inbox of that e-mail address. When it registers a new e-mail, it extracts the first line of the message body and uses that as the prompt. My server then spawns a text-to-image Stable Diffusion Python script, and when it detects a new output image, the image is e-mailed back to the sender.
Server receives e-mail. Registers sender and message as prompt
Server spawns the Stable Diffusion txt2img script
Server registers image file output from txt2img
Server sends e-mail back to sender containing image file
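For illustration, the steps above can be sketched with Python’s standard library alone (my actual implementation is the Node.js server described above, so treat this as a hedged sketch: the script name, its flags, and the SMTP host are placeholder assumptions, and the inbox polling itself is left out):

```python
import email
import smtplib
import subprocess
from email.message import EmailMessage

def extract_prompt(body: str) -> str:
    """The first non-empty line of the message body is the prompt."""
    for line in body.splitlines():
        if line.strip():
            return line.strip()
    return ""

def handle_email(raw_bytes: bytes, script: str = "txt2img.py"):
    """Parse one raw e-mail (assumed plain text, non-multipart),
    run the txt2img script on its prompt, return (sender, image path)."""
    msg = email.message_from_bytes(raw_bytes)
    sender = msg["From"]
    body = msg.get_payload(decode=True).decode(errors="replace")
    prompt = extract_prompt(body)
    outfile = "output.png"
    # Spawn the Stable Diffusion script; the flag names are assumptions.
    subprocess.run(
        ["python", script, "--prompt", prompt, "--outfile", outfile],
        check=True,
    )
    return sender, outfile

def reply_with_image(sender: str, image_path: str,
                     smtp_host: str = "smtp.example.com") -> None:
    """E-mail the generated image back to the original sender."""
    msg = EmailMessage()
    msg["To"] = sender
    msg["Subject"] = "Your Stable Diffusion image"
    with open(image_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="image", subtype="png")
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Taking the first line as the prompt keeps signatures and quoted replies out of the generation.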
10 hours after thinking of the idea, I have a working proof-of-concept prototype. Incredible what’s possible with technology nowadays.
It wasn’t until today, Sunday, that I found some time to work on side projects. Last week I added tracking to Morphaweb so I can see if people actually use the site to export reels - and it turns out people do! That gave me some motivation to work a bit more on Morphaweb (yes, the title pun was terrible).
Morphaweb
One of the major challenges has been handling multiple files. After an hour or two of wrestling with nested JavaScript promises, I discovered the Crunker library, which lets me concatenate audio files into a single audio file.
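Crunker does the concatenation with the Web Audio API in the browser. Conceptually, though, joining audio files is just appending their sample frames; here’s a minimal sketch of the same idea in Python using the stdlib wave module (assuming every input shares the same sample rate, channel count and sample width):

```python
import wave

def concat_wavs(in_paths, out_path):
    """Append the audio frames of several WAV files into one output file.
    Assumes every input shares the first file's format."""
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(in_paths):
            with wave.open(path, "rb") as w:
                if i == 0:
                    # Copy the first file's rate/channels/width; the wave
                    # module fixes the frame count in the header on close.
                    out.setparams(w.getparams())
                out.writeframes(w.readframes(w.getnframes()))
```

In the browser the same trick means decoding each file to an AudioBuffer and copying the channels into one larger buffer, which is essentially what Crunker wraps up for you.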
Next up, I would like to automatically add a marker between each of the uploaded files for convenience. I should also start tagging my releases like the real software developers do.
Forensic Architecture
Last Sunday I went to the Louisiana Museum of Modern Art and saw their Forensic Architecture exhibition.
The fifth exhibition in Louisiana’s series The Architect’s Studio presents Forensic Architecture, an interdisciplinary research agency based at Goldsmiths, University of London. Working at the intersection of architecture, law, journalism, human rights and the environment, Forensic Architecture investigates conflicts and crimes around the world.
The exhibition itself was really interesting, but even more so was the fact that they keep a GitHub repository of all the models and tools they’ve created.
Another light week of side-project work, but I did manage to finish the Blender x Three.js tutorials, and thereby also the last bit of Three.js Journey. It’s been a great course, and the best money I’ve ever spent on e-learning. Bruno is a great teacher, and he’s even added lessons since I bought the course, so maybe I’ll get even more for my money.
Blender x Three.js
A couple of weeks ago I finished the UV unwrapping of the model, so the last part of the lesson was how to do model optimizations and export everything correctly from Blender.
Quite a lot of things to keep track of - somehow both easier and more difficult than I had imagined. The final result looks great though, and I’m excited and terrified to get started on my portfolio and on the LEGO elements.
Reading
Hacking around with the ScotRail audio announcements — Simon Willison found all the sound files of a Scottish train operator and wrote this great blog post on how he scraped them and built fun prototypes with them.
Physically Based is a database of physically based values for CG artists.
This week was going to be a slow side-project week due to lots of work and sunny evenings in Copenhagen. But then I saw a GitHub notification about a project I worked on a while back…
Morphaweb
One of my hobbies is modular synthesis. One of the modules I use is Morphagene by Make Noise. By their own description:
The Morphagene music synthesizer module is a next generation tape and microsound music module that uses Reels, Splices and Genes to create new sounds from those that already exist. Search between the notes to find the unfound sounds.
It’s one of my favorite modules, but it can be a bit tricky to get sounds from outside the modular system onto the Morphagene. It has a slot for an SD card where you place your reels. The reels have to be in a particular format though, and if you want to configure your splices up front, you have to buy paid software.
To work around this, I have developed a free, open-source, web-based app called Morphaweb. It allows you to build reels with splice markers and export them in the correct format - all without uploading anything to a server, which protects your privacy.
To use it, you simply import audio by dragging it into the app, then use the waveform editor and shortcuts to add or remove splice markers. When you’re done, you can download your reel as a Morphagene-compatible WAV file.
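As far as I understand the format (this is my reading of the RIFF/WAVE conventions, not official Make Noise documentation), splice markers in a WAV file are ordinary cue points: a 'cue ' chunk listing sample offsets into the data chunk. A minimal sketch of building such a chunk:

```python
import struct

def build_cue_chunk(sample_offsets):
    """Build a RIFF 'cue ' chunk with one cue point per sample offset.
    Each cue point is 24 bytes: id, play-order position, the chunk it
    refers to ('data'), chunk start, block start, and the sample offset."""
    body = struct.pack("<I", len(sample_offsets))
    for i, offset in enumerate(sample_offsets):
        body += struct.pack("<II4sIII", i + 1, offset, b"data", 0, 0, offset)
    return b"cue " + struct.pack("<I", len(body)) + body
```

In practice this chunk gets spliced into the exported WAV alongside the fmt and data chunks, which is how the markers survive the trip to the SD card.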