I do: AI alignment

Wednesday, May 20, 2026

AI alignment - I of V

AI is constantly in the news. More and more of the world is being turned over to various mathematical and computational models. Though they range widely in complexity, they are steadily replacing both human judgment and explicitly programmed software of the more traditional variety. Some do really cool things like discovering new molecules for medicine and some do really dark things like that AI meal planner app proposing a crowd-pleasing recipe featuring chlorine gas.

AI doesn’t mean one thing. There are chatbots, whose function is to output plausible-looking text. There are image generators, whose function is to create images from text input, and video generators that do the same for video. There are also systems designed to play games like chess or Go, systems designed to map sequences of amino acids to the predicted structure of the folded protein, and systems designed to decide what goes into algorithmic feeds.

When you open Google Maps, call Alexa or book an Uber, you are dealing with a form of AI. The content on your social feeds and the ads that are targeted at you are selected using AI. When you try to get a loan from a bank, you may be screened by AI. The price you pay for your home or your car insurance may be set by AI. When you interview for a job, your face and responses may be analysed by AI.

What all of these things do have in common is that they are the result of statistical processing over large data sets. But the input data used to create each system is different, and so are the kinds of statistical patterns being mapped. Just saying "AI" gives the impression that there's one thing out there and it knows "about the shape of folded proteins", and also about "how to play chess", and it knows the answer to whatever question you might put into the chatbot. That makes it seem like one super intelligent entity when it's actually a bunch of separate software programs designed by different people, trained on different data for different purposes.

There was a time when most artificial intelligence was programmed by hand by computer scientists. Then scientists figured out how to get an AI to learn how to do what we want: we still provide the instructions that define the goal of the AI model, but the system works out for itself how to reach it. In other words, they got a digital computer to improve of its own accord. By developing machines that could learn from human instruction or from their own experience, they removed the need to program every behaviour explicitly.

This gave rise to a new issue, the alignment problem, viz. whether the AI is reaching its intended goal or producing some unintended result. In the last five years or so, these fears have started coming to life. We are living in a world full of examples of this: image recognition software that captioned a selfie of two Black Americans as "gorillas", or self-driving cars that fail to identify jaywalking pedestrians and end up causing fatal collisions. Broadly, we can think of a machine learning system as having two halves. Each of these halves offers an opportunity for things to become misaligned:

There is the training data, the set of examples from which the system learns. The AI is at the mercy of the examples it is taught from. If a certain type of data is underrepresented in or absent from the training data but present in the real world, then things will go wrong.
There is the objective function, which is how we mathematically define success on each of those examples. It basically tells the system what we want it to do.
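
To make the two halves concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the numbers, the one-parameter model, the choice of mean squared error as the objective and the "real world" case at the end. It is a toy, not how any particular production system is built.

```python
# Half 1: the training data. Made-up (input, desired output) examples.
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


# Half 2: the objective function. Here, mean squared error: the average
# squared gap between the model's guess (w * x) and the desired output.
def objective(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(data, steps=200, learning_rate=0.01):
    """Adjust the single parameter w by gradient descent on the objective."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * gradient
    return w


w = train(training_data)
training_score = objective(w, training_data)

# A hypothetical real-world case that looks nothing like the training set:
# the input is far larger and the true outcome no longer follows the pattern.
real_world_case = [(10.0, 10.0)]
real_world_score = objective(w, real_world_case)

print("learned w:", round(w, 3))
print("error on training data:", round(training_score, 6))
print("error on the unseen case:", round(real_world_score, 3))
```

The system scores almost perfectly by its own measure of success and is still badly wrong on the case its examples never covered. Nothing in the training loop can notice that, because both halves were fixed before the real world showed up.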

Take the 2018 crash of the Uber self-driving car that killed a pedestrian in Arizona. The car relied on an object classification system with a very rigid set of categories, such as pedestrian, cyclist and debris, and thousands of examples of each. The system did not have any training data of jaywalkers, so it was unprepared to encounter someone crossing a road away from a crosswalk. On top of that, this particular woman was walking a bicycle across the street, which was something the system had never seen, and the result was a fatal crash. A model is only as good as the data that was put into it.
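
The "rigid set of categories" problem can be sketched with a toy classifier. To be clear, this is not a reconstruction of the software in that car: the three features, the prototype numbers and the nearest-prototype rule are all hypothetical, chosen only to show what happens when a system must pick from a fixed list.

```python
import math

# Hypothetical prototypes, one per category:
# (height in metres, number of wheels, speed in metres per second).
PROTOTYPES = {
    "pedestrian": (1.7, 0.0, 1.4),
    "cyclist": (1.7, 2.0, 5.0),
    "debris": (0.3, 0.0, 0.0),
}


def classify(features):
    """Return (label, distance) for the nearest prototype.

    There is no "none of the above" option: every input gets a label.
    """
    return min(
        ((label, math.dist(features, proto)) for label, proto in PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )


# Something close to what the categories were built from.
familiar = classify((1.7, 0.0, 1.5))

# A person walking a bicycle: pedestrian height and speed, but two wheels.
unfamiliar = classify((1.7, 2.0, 1.4))

print("familiar:", familiar)
print("unfamiliar:", unfamiliar)
```

Both inputs come back with a confident-looking label. Only the distance hints that the second one sits far from anything the categories were built around, and a system that never checks that distance has no way to say "I have not seen this before".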
