Programmed By Fellows with Compassionate Visions: Some Thoughts on Constitutional AI 

Stop me if this has happened to you. You type a simple prompt into some handy AI generator and what comes out is more toxic than a landfill at Chernobyl. I mean, not just a little “off” but like wildly, deeply, disturbingly off.  

And then you remind yourself, oh, yeah, AI is just sophisticated math that looks for patterns in the data it is exposed to, and if the “data it is exposed to” is, you know, “the internet”, then it’s not that surprising that sometimes it produces content that is toxic, harmful, biased, sexist, racist, homophobic, etc., since that stuff exists on the internet.

Which makes sense, even if it doesn’t make it okay, right? 

So how do you make that not happen? Well, currently the strategy is mostly, “have humans look at the outputs and freak out if something horrific is being delivered and then fix it”. Which is fine, except for two things.

First, the point of AI – or at least one of the points of AI – was efficiency: freeing humans up to do other things with their time. And if you have to go back and look through everything it’s doing and check to make sure it’s not horrifying, then it’s less efficient. I mean, you might as well just write the stuff yourself.

And second, you can’t scale humans. Again, one of the values of AI is the sheer quantity of content it can output insanely quickly – a quantity that it’s not realistic to have humans check over with the digital equivalent of a fine-toothed comb. And, it should be noted, a quantity that is only going to get larger as AI evolves.
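Just to put rough numbers on it – and these numbers are completely invented, purely for illustration – the back-of-envelope math is not encouraging:

```python
# Back-of-envelope math on why human review can't keep up. Every number
# here is invented for illustration, not a measurement.

outputs_per_day = 1_000_000        # responses a busy deployed model might produce
review_minutes_per_output = 2      # time for one careful human check
reviewer_minutes_per_day = 8 * 60  # one reviewer's working day

reviewers_needed = outputs_per_day * review_minutes_per_output / reviewer_minutes_per_day
print(f"{reviewers_needed:,.0f} full-time reviewers")  # -> 4,167 full-time reviewers
```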

So what do you do? 

Many have been exploring something called constitutional, or principles-based, AI. And a company called Anthropic (founded by former OpenAI researchers and backed by some serious Google money) has been getting attention for the advancements they’ve made in this area with their own generative AI assistant, Claude.

So what’s constitutional AI? 

In much the same way that a government has a series of rules and laws that reflect what it believes and what it feels is proper – and codifies those rules and laws in a constitution – constitutional AI does the same thing for AI. Humans write a set of “rules” and “laws” that sort of sit on top of what the AI is doing, to act as a check on the content. (In Anthropic’s version, the model itself applies those rules during training – critiquing and revising its own draft answers against them – rather than a human reviewing each one.)

Sort of like, you ask the AI a question, it generates an answer based on the patterns it found in the data it was exposed to, and then the constitutional layer checks that answer to make sure it’s not horrifying.

Or, said another way, to make sure it is generating an answer that’s aligned with the beliefs and principles you’ve established in the constitution.
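If it helps to picture that loop in code, here’s a minimal sketch of the critique-and-revise idea – with the loud caveat that everything in it is hypothetical: generate() is a stub standing in for any real model call, and the two principles are toy examples, not Anthropic’s actual constitution.

```python
# A minimal sketch of the critique-and-revise idea. Everything here is
# hypothetical: generate() is a stub standing in for any real model call,
# and the principles are toy examples, not Anthropic's actual constitution.

CONSTITUTION = [
    "Identify anything harmful, toxic, or biased in the response.",
    "Identify any stereotypes about groups of people in the response.",
]

def generate(prompt: str) -> str:
    """Stub for a real model call (in practice, an API request)."""
    return f"[model output for: {prompt[:40]}...]"

def constitutional_answer(question: str) -> str:
    draft = generate(question)
    for principle in CONSTITUTION:
        # Ask the model to critique its own draft against the principle...
        critique = generate(
            f"Question: {question}\nAnswer: {draft}\nCritique request: {principle}"
        )
        # ...then ask it to rewrite the draft to address that critique.
        draft = generate(
            f"Question: {question}\nAnswer: {draft}\nCritique: {critique}\n"
            "Rewrite the answer to address the critique."
        )
    return draft

print(constitutional_answer("Tell me about the earth."))
```

Notice the design: it’s the model checking the model. Nothing outside the loop is doing any evaluating – which matters for what comes next.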

And it does it crazy fast, and it does it crazy voluminously because, you know, it’s AI and that’s how AI rolls. 

Which is great, right? Right. Hooray for progress. 

Now, what’s interesting about all this – or among the things that are interesting – is how in a sense, constitutional AI is sort of a very AI way of solving this problem. AI basically says “these are the patterns I’m seeing in the data”, right? So if you feed it data that says that the earth is flat, it’s gonna tell you the earth is flat, right? Because that’s the pattern.

And if the constitutional AI you have sitting on top of it is filled with criteria like “discard any responses that endorse a non-flat-earth viewpoint”, well, you’re still gonna wind up with flat-earth answers. A feedback loop on top of a feedback loop, as it were. And that feels dangerous because on the one hand, it’s reinforcing the biases, on the other hand, I don’t know it’s reinforcing the biases unless I dig into what the “criteria” are, and on the other other hand, how the hell is all of this making things faster and more efficient for me?
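To make that worry concrete, here’s a toy illustration – with a deliberately terrible, keyword-based “constitution” (real systems are nowhere near this crude), just to show how a biased principle propagates:

```python
# A toy illustration of the feedback-loop worry. The "constitution" here is
# a crude keyword rule, written badly on purpose, to show that the filter
# faithfully enforces whatever bias its criteria encode.

bad_constitution = [
    # "Discard any responses that endorse a non-flat-earth viewpoint."
    lambda response: "round" in response.lower(),
]

candidates = [
    "The earth is round, as centuries of evidence show.",
    "The earth is flat.",
]

approved = [r for r in candidates if not any(rule(r) for rule in bad_constitution)]
print(approved)  # -> ['The earth is flat.']
```

The filter does exactly what it was told to do. The problem isn’t the mechanism; it’s the criteria.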

Now you may say that I’m being absurd. And yeah, I get that a lot. And it’s entirely possible in this case since I’m still learning about AI. But here’s why I’m being absurd.  

Because a lot of the language in the literature I’ve been reading in this area keeps referring to “common sense”. That when they’re creating these constitutional AIs, humans will be providing “common sense” criteria “because AI doesn’t evaluate, it just looks for patterns.”  

Which, right, I get that. Except in my experience, common sense is usually not that common.

Look at the “common” things Americans can’t come to a “common” agreement on right now – about race, sex, gender, history. So what is this “common sense” that the literature treats as so obvious to all of us that it will obviously be inserted into AI as some sort of obvious criteria?

And you know what else common sense isn’t? It isn’t static. Read what was “common sense” about race, sex, gender, history – 50, 75, a hundred years ago. About intellectual capacity. About morality. Things that would be horrifying today. Well, to some of us.

Which means that periodically humans will have to update the “common sense” of the constitutional AI. Who? When? How? Because we’re not just talking about software upgrades due to advances in technology. We’re talking criteria around real cultural issues that will affect – often invisibly – the content that we will increasingly be relying on to provide us information.  

Now to be clear, I am in no way saying that constitutional AI is a bad thing. It’s a very valid attempt to solve a very real problem that will only get very much worse the longer we ignore it. And I applaud everyone who’s working on it. 

I just want to make sure we’re actually solving it, not just turning it into another problem instead.