Week 109: My AI Team

26 Mar, 2023

Just like week 108, I started writing this post late-march, but got so busy that it wasn’t finished and published until May.

On March 21, Morten Just posted a tweet that set my brain on fire:

Morten Just tweeted about his Slack-based AI team

Apparently, he had created a Slack bot with different personas based on which slack channel he uses, e.g. #ai-copywriter for copy writing, #ai-web-developer, #ai-lawyer, etc.

I thought the idea of having an AI team was genius, and started recreating it myself. But I had never built a slack bot before.

Iteration 1: Zapier

I googled a bit and found that I could use Zapier to create the necessary glue between OpenAI’s GPT and Slack:

Zapier monitors my slack channels
If message appears in #ai-copywriter, then…
Prompt GPT
Post GPT response back into slack channel

It got me started quickly, but I found Zapier’s GPT integrations to be buggy, and it costs money to have the amount of Zaps that I need running. Their no-code flow was barely easier, and surely more limited, than writing my own client.

Iteration 2: SlackGPT v1

Zapier seemed unnecessary middleware, so I asked ChatGPT if it could write me a Node.JS-based Slack bot that basically does the four-step process outlined above. It generated fairly decent code that got me 90% there, and I only had to modify a few dependencies and functions to get it working.

This is a great next iteration, but has its limitations. Technically, every GPT prompt is a single prompt-response pair, meaning it won’t hold in memory any previous conversation. In my experience, GPT rarely gets anything right the first time, the power is in the conversation.

Iteration 3: SlackGPT v2

So I decided to change the bot to create new threads every time I submit a prompt. That way, it will be able to hold the thread in memory as a conversation, but there’s also a benefit for me in that I have a mental model for what I want it to remember. I.e. it’s memory is thread-based.

Limitations

Multiple channels instead of bots

I wanted each role to have it’s own user, like @Copywriter and @WebDeveloper, but it’s only possible to have a single bot per Slack App. So rather than having multiple bots, I have a single bot that monitors each channel, and based on which channel it detects a mention in, it switches persona per my prompt design:

These prompts are ok, but I know there is a lot more that can be done on this aspect of SlackGPT to make it work a lot better.

I also need to mention @SlackGPT every time I want to prompt. This should be fairly trivial to solve if I bothered reading the Slack API, but even when I’m in a thread with @SlackGPT, I currently need to mention it to trigger it. Within a thread, that shouldn’t be necessary.

Perspectives

This was a really fun experiment done in a couple of days, and a powerful prototype. There are so many features that I can see being within reach:

Higher abstraction level tasks

Currently, each member of the team solve tasks for me based on a prompt. But what if I could instead give the team higher level tasks, such as:

“I want to develop a new feature for our website where clients can book consultancy hours with me. It should appear as a small link on the website, and when clicked, go to my Calendly site”

Perhaps a #ai-product-manager role could take this task, break it into separate parts, and assign it to the various team members who will need to write copy, design visual appearance and develop the code necessary to launch the feature. Maybe the team members could even have seniority or rank, e.g. #ai-web-developer takes orders from #ai-product-manager or when in conflict, #ai-designer outranks #ai-web-developer. Suddenly introducing social dynamics into a team.

How I spend my time

I’m curious how this approach could enable me to spend less time coding and doing production work, and more time at higher level problems, which is how I’d like to spend my time nowadays. The AI team can work countless iterations, around the clock, so I would also be able to run many more iterations significantly faster.

As we work together more, I could pick the best examples of work the team has done and include it in their prompt design or even fine-tune a model so that my team is refined over time.

Being able to ask a bot to generate applications or components is fantastic, already making me significantly more efficient. But the idea of having a team that works around the clock and solving things under my direction is, at the time of writing, a superpower.

Cost and Scale

I’m currently running the first iteration of an AI-team of three roles for a few dollars per month. For example, the NodeJS server that runs the SlackGPT bot cost 490 OpenAI tokens to generate, which translates to around $0.01. And then of course 20-30 min of my time to iterate and implement the code, which hopefully will decrease as the systems become more sophisticated.

But you get the idea, right? As a generalist who knows just enough to get by in full-stack design and engineering, I can single-handedly build products at speed.

Role of Automation

The idea isn’t to replace humans, I dream of having a human partner to build products with, but I see an AI-team as a great supplement, and potentially the main resource for lower-level production tasks relating to code, design and text.

A source of inspiration for me has been the No Man’s Sky team. They built a procedurally generated game so vast that it would be impossible for them to do quality control on every planet. So they built space probes that allows them to at least do quality control on a significantly larger number of planets to fine tune the parameters.

Presentation and Curation

This system of thinking seems critical for this age of AI-enabled design and development. With smaller constraints on the amount of work that can be produced, my role will be to direct and curate the work, so that process needs to be effective and constructive too.

I’m curious how the team could present the work, whether it be code or design or text. Maybe there could be a Notion integration so that I simply click onto a “Week 109” page, where multiple solutions are laid out in front of me, and I can give the team feedback and consolidate the work?

What’s next

In summary, I think what’s next is to:

Fix limitations outlined above if possible
Write better prompts for each team member
Add ability for team to collaborate on tasks
Add a designer to the team, and ability to generate visual results
Add integrations with systems that allow the team to present work

The code for SlackGPT, along with instructions for how to deploy it on an online hosting service like Render, is available at https://github.com/knandersen/slackgpt.

Week 108: CADGPT

21 Mar, 2023

This post was originally written March 21, but wasn’t posted until late April…

It’s been quiet for a few weeks! Once you miss a weeknote it’s easier to miss the next one too, unfortunately. But I’ve been busy.

CADGPT

A few weeks ago I thought, sure 2D bitmaps through generative AI is interesting, but I’m even more curious about how generative AI could be a useful tool in physical product making.

I’ve worked with parametric CAD modeling for years, but the problem there is that you still have to know which steps are needed to get from intention to result. GPT-style generation holds the promise that you don’t have to know how, in order to get to where you want. And allows for effortless mixing of concepts and mental models.

I’m a Rhino-user, so I decided to try and built “CADGPT”, a chatbot for generating Rhino-files. See an example here:

The chat conversation — The CADGPT chat conversation

The source code can be found here: https://github.com/knandersen/cad-gpt

How I made it

I saw a few different ways to get to the result above:

Write a standalone Python/JavaScript GPT client that generates Rhino-files
Write a Rhino integration that allows for querying GPT

Approach 2 seemed difficult and like I would spend unnecessary time figuring out how to write Rhino plug-ins, so I went for Approach 1.

For Approach 1, I was inspired by Nat Friedman’s natbot, where he uses a library called Playwright to navigate a browser based on an elaborate GPT prompt design (demo here).

I found a library from the Rhino developers called rhino3dm which allows access to basic Rhino functionality. Then, I tried writing my own elaborate prompt design a’la natbot: https://github.com/knandersen/cad-gpt/blob/main/cadgpt.py

Reflections

It was fun to go from idea to prototype in just a couple of hours, but I also quickly realised I might be solving the right problem with the wrong solution. What’s our language for describing shape? How would you tell CADGPT to generate anything but principal shapes, say, a dog. How would you describe a dog’s shape without using the word dog? Maybe text or chat isn’t the right modality for this. I’ll stop for now, but I have more thoughts on this which I might get to at a later time.

Editing this post a month later, I realized I could have just asked GPT how to think about approaching this task, i.e. approach 1 or 2. When I began using GPT, I was mostly thinking low-level tasks. Now that I have become more experienced and absorbed more approaches from people I follow on Twitter, I find myself giving GPT increasingly higher level abstraction tasks.

Week 105: LAVIS Examples and Self-Employment Status

26 Feb, 2023

LAVIS

I mentioned last week that I was trying out LAVIS, but that the Ubuntu machine I was running it on died. I spend some time reinstalling and now have a bit to share.

It’s really cool how easy it is to play with image analysis, and like I also mentioned last week, feels like reverse diffusion. Here’s a photo I found on unsplash:

“Photo by FETHI BOUHAOUCHINE” — Photo by FETHI BOUHAOUCHINE

If you like this photo, please check out the photographer FETHI BOUHAOUCHINE. Running BLIP Caption on it, I get the caption: “boy smiling while holding leaf”

That’s cool, but what is even cooler is that you can use GradCam to give you a heatmap of the parts of the image that are “boy smiling while holding leaf”:

And you can even get gradient maps for the individual keywords.

“Gradmap of individual keywords” — Gradmap of individual keywords

I think I’ll leave LAVIS there for now, but I have a feeling I will return to it at some point.

Self-Employment Status

I’m not a solo-entrepeneurial type, and I have always worked in teams and mostly in large organizations, so this is a journey filled with a lot of new learnings. About myself, about living with uncertainty MOREMOREMORE

Most people have responded ranging from curiosity and support to envy. For a lot of people, I’m living the dream. I don’t quite see it that way, or maybe it’s just about defining what dream means.

As I have also experienced in my work, working on a dream-product or dream-project doesn’t mean it’s fun all the time. “If you love what you do, you’ll never work a day in your life”, seems complete bullshit to me. A better way of framing it, in my opinion, is the Fun Scale-system that I learned about when I read The Ultimate Hiker’s Gear Guide:

Fun Scale

Type 1 fun is fun to do and fun to talk about later.
Type 2 fun is not fun to do but fun to talk about later.
Type 3 fun is not fun to do and not fun to talk about later.

Type 2 and 3, over time, yield the most memorable and significant kinds of fun. But they wear you out, so you need some type 1 fun during your day or week to keep you going.

Being my own manager now, although I have total freedom, setting that balance for myself is challenging. In a lot of ways I feel way more busy than when I had a regular job. Without the pressures of deadlines or other constraints, I am ultimately faced with the vastness of being able to do whatever I want.

I’m finding it hard to prioritize my projects and time. There are so many things I want to do. Should I prioritize doing:

What I feel most like doing?
What might make me unique from other people out there?
What might be the most exciting career path?
What might be the safest career path?

One thing that’s been helping is listening to the Pathless Path podcast, which is about the art of not having a regular job. I love how it ranges from talking about the more emotional aspects of loneliness, social pressures, etc., to things like how to think about financials, budget, or marketing.

Particularly, “The Art of Sabbaticals” episode was really helpful. I don’t know a lot of people who have taken sabbaticals or broken out as freelancers, and even fewer within my domain, so it’s nice to hear from someone with more experience.

A brief status from my side at this point after 6 weeks:

What’s going well

Cooking more, fermenting food and drinks again
Sleeping a lot better
Absorbing a lot of research and literature
Meeting lots of interesting people
Exploring Copenhagen more

Not going so well

Learnings feel scattered and unfocused
Starting to feel lonely working by myself
Struggle sitting at home

What I’m doing about it

To deal with the things not going so well, a few things are in the works. I already use cafees and workspaces a lot, but I can mostly do writing work.

To be able to do physical prototyping work and to feel less lonely, I am applying for a 12-week startup incubator course which will start in April. I think it could be a great opportunity to meet other people facing similar challenges, and have a place to work from.

To modularize my learning, I am creating timeboxes for myself and planning projects to work on during the coming months. I am trying to fold learning into projects. A few topics I’m wanting to learn and the preliminary plan:

March: Calculus using 3blue1brown
April: Deep Learning / GPT using Andrej Karpathy’s videos
May?: Transformers (not sure when/where yet)
May?: Procedural mechanisms (not sure when/where yet)

For Calculus, I did actually take (and surprisingly pass) it at uni, but I never really learned and it seems to be foundational for learning ML.

Volunteering and local clubs

I have also volunteered to help start up a local Coding Pirates club where kids can learn to code. I’m really excited to play and learn with kids, and hopefully reconnect with some of the products I’ve helped design.

I also want to be part of things, however, where I don’t have to lead or drive things, but can just have fun. Together with a friend, I recently started going to CPH MUSIC MAKER SPACE which is a sound hacking community that meets every week to build music machines.

This week, I soldered up a couple of Breadboard Friends, a set of helper modules designed by the legendary Eurorack synthesizer manufacturer Mutable Instruments aka. Émilie Gillet.

“A couple of Breadboard Friends. Top breaks mini-jacks for CV/audio out to breadboard format, bottom breaks potentiometers out to breadboard.” — A couple of Breadboard Friends. Top breaks mini-jacks for CV/audio out to breadboard format, bottom breaks potentiometers out to breadboard.

I have a number of MI modules, but what I especially love by Émilie is how incredibly thoughtful the modules are designed and that all hardware and software is open source.

My current big picture plan is to use the Raspberry Pico as my MCU and MicroPython for use in my eurorack synth, and write a set of modules for doing audio/CV stuff. The Pico W will soon have BLE support, and I was thinking I could make a sequencer app on a phone and use it to talk to a Pico W in a eurorack. Would also love to play with AI x Eurorack. More on this later probably.

Blog updates

Fixed an issue that content was overflowing on mobile. Didn’t show up when using responsive mode in my desktop browser. Good reminder to always test on device.

Week 104: Portfolio Refresh and LAVIS

19 Feb, 2023

Portfolio Refresh

Updating my portfolio always seems exciting to begin with and then I get stuck in a technical detail or find myself unhappy with the creative direction. But this week I did it! It felt necessary, as I can see a lot of people visit my portfolio and it doesn’t include any of my professional work the past 8 years.

“Screenshot of new portfolio at portfolio.kevinandersen.dk” — Screenshot of new portfolio at portfolio.kevinandersen.dk

When interviewing I have been building private portfolios which is fun and allows a more targeted approach, but it’s also a big timesuck. These days I’m really trying to optimize my time for research and creative pursuits, so updating my public portfolio felt right.

It’s always difficult. How many projects do you include? how many details? which ones? I made a decision to include what I feel most represent the kind of work I’m excited to do in the future. And I wanted the entire portfolio to just be a single page, no navigating in and out of pages except for contact. Others might not like this (feedback is welcome), but it’s based on my experience hiring designers and looking through hundreds of portfolios the past couple of years.

I see portfolios as being about catching someone’s attention, and I find that the faster I can navigate through the information and find something appealing, the better. The portfolio isn’t end, but the beginning of a conversation. For all the projects I have on display, I have tons more stories and details.

I already got some good feedback from friends, and based on that will likely make bigger changes, but I feel like I have a good foundation to work from now.

LAVIS

A friend recently mentioned LAVIS, an open source deep learning library for Language-Vision Intelligence by Salesforce. It provides really easy means to use a bunch of language-vision models. I decided to take it for a spin and try out BLIP.

BLIP is basically “stable diffusion in reverse”, in that you input an image, and it returns a textual description of the image. I was going to show some of the experiments here, but unfortunately the ubuntu partition I was running everything corrupted, so that’ll have to wait for another time.

Week 103: Read-Write Why

12 Feb, 2023

This week has been less about creating and more about gathering.

Weekly reminder, please keep booking 30 min with me to talk about anything. I’m having a lot of fun learning and exploring by myself, but also starting to miss the social interactions and sense of purpose from working with a team. So if you’ve got an interesting project and could use a hand - please do reach out.

Writing The Why

As I mentioned last week, I have started writing a book. The working title is “Tools & Interactions”, based on the name of my former team at Bang & Olufsen.

I haven’t written anything substantial since my Master’s thesis, and who knows if I’ll ever finish or publish it, but so far I’m excited to work on it.

Currently, it’s a way of looking back and reflecting on my experience working in the fuzzy 0-to-1 end of product design. On trying to bring interaction design to companies where it didn’t seem to propery exist previously, while still defining it for myself.

It’s also about how after all this time, I’m still not sure what the role of interaction design is or if it’s even understood. How the current state of “UX/UI Design” seems to mean making high-fidelity flows in Figma. How Figma seems to have become the hammer that makes every problem look like a nail. And how this single-tool, single-mind mentality has reduced design to being a styling discipline rather than using design to explore new mental models or material qualities of technology. Interaction Design doesn’t really seem to be concerned with coming up with new embodied interfaces anymore, settling for multi-touch glass rectangles. I think we can and should do better.

I think designers lack tools that let them use utilize computation and machine intelligence to augment their thinking and creating, and my hope is that the book will be successful at arguing why we should build them, and provide examples of how we can think and make our way there.

I’m also trying to argue that in the tool-building, you form a deep understanding of technology and information as material, which in turn allows you to be more creative with it.

Steve Jobs did a great job of articulating the potential of the computer as a “bicycle for the mind”. In the video he references the chart below from the Scientific American. We are indeed not the only animal that makes tools, but we are the animal most capable of making tools that augment our capabilities by orders of magnitude.

“Chart from Scientific American, 1973” — Chart from Scientific American, 1973

Reading

Thinking about my writing as a book is proving to have good side-effects. Books should be properly researched, and when I find myself wanting to recount history or make a certain claim, this forces me to do more research.

Thinking about computers as tools brought me back to reading MINDSTORMS by Seymour Papert. I first read it when we were building LEGO Education SPIKE Prime and forming a relationship with the Lifelong Kindergarten group at the Media Lab. In designing the product, the programming hub user interface, bringing Scratch Blocks as the visual programming language to LEGO, and forming the overall technology concept, it was a huge inspiration.

Photo of LEGO Education SPIKE Prime by LEGO Education via https://education.lego.com/en-us/products/lego-education-spike-prime-set/45678

The book is 30 years old this year, but more relevant than ever. Papert talks about using the computer as an “object-to-think-with”, and hammers it home with this quote:

One might say the computer is being used to program the child. In my vision, the child programs the computer and, in doing so, both acquires a sense of mastery over a piece of the most modern and powerful technology and establishes an intimate contact with some of the deepest ideas from science, from mathematics, and from the art of intellectual model building.
Seymour Papert, MINDSTORMS

I’m also looking forward to reading some recently published articles on “Toolformer: Language Models Can Teach Themselves to Use Tools” and “Theory of Mind May Have Spontaneously Emerged in Large Language Models”.