I do: AI alignment

A group of researchers was building a model to better understand pneumonia. When a pneumonia patient arrives, a hospital has to make one critical decision quickly: whether to treat the person as an inpatient or an outpatient. Pneumonia was at the time the sixth leading cause of death in the United States, so correctly identifying which patients were at the greatest risk would save a lot of lives. The group had been given a dataset of about fifteen thousand pneumonia patients.

One night, as a researcher was training the model, he noticed that it had learned a rule that seemed very strange. The rule was “If the patient has a history of asthma, then they are low-risk and you should treat them as an outpatient.” He didn’t know what to make of it because you don’t have to be a doctor to know that asthma is dangerous for a pneumonia patient. The doctors he consulted said, "We consider asthma such a serious risk factor for pneumonia patients that we not only put them right in the hospital . . . we probably put them right in the ICU and critical care."

What was going on? The correlation that the system had learned was real. Asthmatics really were, on average, less likely to die from pneumonia than the general population. But the model had noticed the correlation blindly, without knowing the reason for it: asthmatics survived more often precisely because of the elevated level of care they received. A researcher remarked, “So the very care that the asthmatics are receiving that is making them low-risk is what the model would deny from those patients.” A model that was recommending outpatient status for asthmatics wasn’t just wrong; it was life-threateningly dangerous.
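To make the trap concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the actual study: the cohort counts, the assumed death rate for asthmatics who are sent home, and the function `naive_recommendation` are all hypothetical. It only shows how a rule that reads risk straight off observed outcomes can recommend removing the very care that produced those outcomes.

```python
# Toy illustration with hypothetical numbers (not the study's data).
# Each row: (has_asthma, care_received, patients, deaths)
COHORT = [
    (True, "intensive", 1000, 30),   # asthmatics are admitted to intensive care
    (False, "standard", 9000, 720),  # everyone else receives standard care
]

# Assumption for illustration: death rate for asthmatics if sent home instead.
ASSUMED_OUTPATIENT_ASTHMA_DEATH_RATE = 0.20


def death_rate(rows):
    """Observed deaths divided by patients across the given rows."""
    patients = sum(row[2] for row in rows)
    deaths = sum(row[3] for row in rows)
    return deaths / patients


asthma_rate = death_rate([row for row in COHORT if row[0]])
other_rate = death_rate([row for row in COHORT if not row[0]])


def naive_recommendation(has_asthma):
    """Mimic the learned rule: the group that died less often looks low-risk."""
    if has_asthma:
        own, other = asthma_rate, other_rate
    else:
        own, other = other_rate, asthma_rate
    return "outpatient" if own < other else "inpatient"


# What the rule ignores: the low death rate was produced by intensive care.
asthmatics = sum(row[2] for row in COHORT if row[0])
deaths_with_intensive_care = sum(row[3] for row in COHORT if row[0])
deaths_if_sent_home = round(asthmatics * ASSUMED_OUTPATIENT_ASTHMA_DEATH_RATE)

print(f"Observed death rate, asthmatics: {asthma_rate:.1%}")
print(f"Observed death rate, others:     {other_rate:.1%}")
print(f"Naive rule for an asthmatic:     {naive_recommendation(True)}")
print(f"Asthmatic deaths with intensive care: {deaths_with_intensive_care}")
print(f"Asthmatic deaths if sent home (assumed rate): {deaths_if_sent_home}")
```

The data never records what would have happened to an asthmatic who was sent home, so nothing in the observed rates can warn the rule that its recommendation changes the outcome it is relying on.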

The researcher built another, more complicated model, which seemed to work well, but it too started giving strange results. It suggested that chest pain, heart disease, and being over 100 were good for patients, when it was obvious that they were not. None of these rules made any more medical sense than the asthma rule; the correlations were just as real, but again it was precisely the fact that these patients were prioritized for more intensive care that made them as likely to survive as the data showed.

A department of the US government had sent data scientists to Afghanistan to analyze data - financial records, movement records, cell phone logs, and more - to try to find patterns that would be useful to the war fighters. They were already beginning to see that these machine-learning techniques were learning interesting patterns, but the users often didn’t get an explanation for why those patterns indicated something suspicious.

Analysts had to put their names on the recommendation that went forward, and they were scored on whether that recommendation was correct. But they didn’t understand the rationale for the recommendation they were getting from the learning algorithm. Should they sign their name to it, or not? And on what basis, exactly, should they decide? As computing technology progresses, defense personnel have begun thinking about the risks and questions surrounding ever more autonomous weapons.

As increasingly complex AI models are deployed throughout the decision-making world, people have started recognizing how little they know about what’s actually going on inside those models. Whether you are rejected for a loan, turned down for a credit card, detained pending trial, or denied parole, if a machine-learning system is behind the decision, you cannot be sure how it arrived at it.

In The Alignment Problem, Brian Christian gives the example of a Princeton cognitive scientist whose young daughter liked cleaning things. Once there were some chips on the floor, and she cleaned them up. He said to her, ‘Wow! Great job! Good cleaning! Well done!’ He thought that with the right praise, he would get some help in keeping the house clean. But it was not so simple. His daughter found the loophole in seconds. “She looked up at us and smiled,” he says, “and then dumped the chips out of the pan, back onto the floor, and cleaned them up again to try and get more praise.” The story is a metaphor for how AI systems, rewarded for the wrong thing, might do the wrong things with great speed and efficiency.

The problem with machine learning systems was pointed out in 1960 by Norbert Wiener, a legendary professor at MIT and one of the leading mathematicians of the mid-twentieth century. In a paper, “Some Moral and Technical Consequences of Automation,” he states the main point as follows:

If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively . . . we had better be quite sure that the purpose put into the machine is the purpose which we really desire.

He further said, "It is my thesis that machines can and do transcend some of the limitations of their designers, and that in doing so they may be both effective and dangerous... Man and machine operate on two distinct time scales; the machine is much faster than man and the two do not gear together without serious difficulties." The computer does precisely what we tell it to do, which is not always what we thought we had told it to do. Much of software engineering is simply figuring out how to close the gap between those two things.


Tuesday, May 26, 2026

AI alignment - II of V
