Thinking and Problem-Solving, from the Ground Up
Our ability to surface problems, think them through, consider possible actions and come up with a solution is a defining characteristic of our species. It is also one of our greatest strengths.
In this fictional case study I describe the steps to build a thinking, problem-solving A.I. from the ground up. It’s written as the diary of an inventor who is developing a self-driving car. Each day the author adds one more concept to the overall picture. When you reach the last day, you will see how the car has gradually developed the ability to think through problems. Writing it in this style explains why each part is the way it is, and how it connects to the whole. It also clarifies why some of these approaches differ from standard neural networks.
Diary of an A.I. Developer
Day 1: Actions
I started with the basics, by picking which actions the car can do. I’ve covered essential actions — moving forward, reversing, speeding up, slowing down, braking, signalling left and right, and honking.
I’m testing the car in a driving simulator, and it’s my hope that it mimics real driving conditions. My car is currently in the simulator, cycling randomly through all actions it can do.
Day 2: Avoiding Bad Consequences
Since this car will end up sharing the road with others, there are a lot of dangerous behaviours I’d like it to avoid. The risks involved in putting an experimental car on the road mean that breaking the law, hurting anyone, hurting itself, or damaging property are all non-starters. It needs to learn not to do these.
While it’s in the simulator, I can safely monitor if it breaks a law or causes damage. When it does, I’m going to send it a negative ‘pain’ signal to discourage those actions.
I’ve added pain signals for when it drives on the sidewalk, goes through a red light, doesn’t stop at a stop sign, hits a pedestrian or a vehicle, speeds, etc. I had to add these by hand. It’s a lot of work, and I hope I won’t have to do this all the time...
Fortunately, if I see the car do something wrong later, I can add a new pain signal without retraining the car from scratch. A new signal is the same as an old signal it hasn’t seen yet. If the new signals are small enough changes, the car can slightly adjust its existing actions to fit the new requirements.
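As a minimal sketch of how this might work (all names here are hypothetical; the diary doesn't specify an implementation), the pain signals could live in a registry that accepts new conditions at any time, so a signal added later is checked exactly like the originals:

```python
# Hypothetical sketch: a registry of hand-written pain conditions that can
# grow at runtime, so new signals don't require retraining from scratch.

class PainMonitor:
    def __init__(self):
        self.conditions = {}  # name -> predicate over the simulator state

    def add_signal(self, name, predicate):
        """Register a new pain condition at any time."""
        self.conditions[name] = predicate

    def check(self, state):
        """Return the names of all pain signals active in this state."""
        return [name for name, pred in self.conditions.items() if pred(state)]

monitor = PainMonitor()
monitor.add_signal("on_sidewalk", lambda s: s.get("on_sidewalk", False))
monitor.add_signal("ran_red_light", lambda s: s.get("light") == "red" and s.get("moving", False))

# A signal added later is treated the same as the originals.
monitor.add_signal("speeding", lambda s: s.get("speed", 0) > s.get("limit", 50))
```

The car's learning loop only ever sees the list returned by `check`, so it makes no difference whether a signal was there from day one or added yesterday.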
Day 3: Motivation
How do I encourage the car to drive to its destination? I considered adding another type of signal like ‘pleasure’, and sending that when the car gets to its destination. But I’d rather keep it simple.
Instead, if it hasn’t made progress to its destination in the last, say, 20 seconds, I’m going to send it a pain signal. That’ll encourage it to move, and it will learn that staying still isn’t an option. It’s ultimately going to have to navigate all these signals and learn the action patterns that avoid them all.
Since pain signals tell the car what not to do rather than what to do, they leave open a range of possible ways the car can get to its destination.
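The progress check could be sketched like this, assuming a hypothetical history of timestamped distances to the destination (the 20-second window is the diary's number; everything else is an assumption):

```python
# Hypothetical sketch: fire a 'no progress' pain signal if the car hasn't
# gotten any closer to its destination within the last 20 seconds.

PROGRESS_WINDOW = 20.0  # seconds, per the diary

def no_progress_pain(history):
    """history: list of (timestamp, distance_to_destination), oldest first.
    Pain fires when the smallest distance seen inside the window is no
    better than the distance at the window's start."""
    if not history:
        return False
    latest_t = history[-1][0]
    window = [(t, d) for t, d in history if latest_t - t <= PROGRESS_WINDOW]
    if window[0][0] == latest_t:
        return False  # not enough history to judge yet
    start_d = window[0][1]
    return min(d for _, d in window) >= start_d
```

Standing still for 20 seconds triggers the signal; any net movement toward the destination clears it.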
Day 4: Learning
So far I’ve added pain signals and actions, but it’s still not getting any better at driving. It’s just cycling through random actions. It’s not reaching its goal or learning how to avoid pedestrians. It has to start selecting the right actions, as well as the right order to do them in. It has to learn.
A pain signal is an indication that something it did was somehow wrong. For example, I start sending one if it doesn’t move towards its destination. If it then does an action that gets slightly closer to the destination, I remove it.
The car should learn to repeat whatever actions helped it avoid the pain signal. So if accelerating got it closer to the destination and removed the pain signal, it should learn to accelerate more often, on the assumption that that was what removed it.
The simplest way to do this is to record and repeat whatever action sequence it did right before a pain went away. When it starts driving on the right-hand sidewalk, it gets a pain signal. If it happens to swerve left back onto the road and the pain signal goes away, it should record “swerve left”.
Day 5: Senses
The car is now swerving left all the time, regardless of whether it’s on the left-hand sidewalk, in an intersection, or in a park. It doesn’t seem to know the difference between these situations.
That’s because it’s blind. It has no idea where it is, or what’s going on around it.
To tell the difference between a road and an intersection, it needs some way to “see” what’s going on. It needs senses, like eyes and ears, a position sensor, or however else it can know where it is and what’s going on.
I’m going to start by adding sight (a camera), hearing (a microphone), and geolocation (GPS). I can add new senses later. As with pain signals, I’ll make sure I don’t have to retrain it every time I add a new sensor or remove an old one. (How this works is too complex to cover here; I explain it in depth in a separate article.)

This is everything I’ve added to the car so far:
Day 6: Using Senses to Learn
Now it needs to learn the right actions for a given situation. If it’s on the right-hand sidewalk, it should learn to swerve left; but if it’s on the left-hand sidewalk it should learn to swerve right.
On second thought, it seems like a bad idea to wait until the car is on the sidewalk before turning its wheels. The smarter thing to do is to avoid the danger preemptively. It should start turning when it sees that it’s veering towards the sidewalk. Veering towards the sidewalk is a signal that something bad is about to happen, and now is the time to act. When it sees that it’s veering, the car should preemptively turn away from the sidewalk, or do whatever action removes the pain signal.
The assumption is that whatever removes a pain signal would also prevent it from happening in the future.
In another scenario, when it sees a yellow light it has learned to slow down to avoid running a red light. But if it sees a yellow light and it is too close to the intersection to stop, slowing down only makes it stop inside the intersection when the light is red. It then gets out of the intersection by speeding up. So in this case, when it sees the yellow light and is close to the intersection I want it to speed up (not going too fast, of course) to make sure it goes through.
I’m starting to notice a general pattern in how the car learns:
- The car sees or hears a warning sign just before it experiences a pain signal.
- Once the pain signal is present, it does some action that removes it.
- It learns to connect the warning sign from (1) to the action from (2).
- Next time it sees or hears the warning sign from (1) it immediately does the action from (2).
This way it doesn’t get into a position where it is harmed.
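The four steps above could be sketched as a simple association table (hypothetical names, not the diary's actual design): the warning sign that preceded a pain gets linked to the action that later removed it, and from then on the sign alone triggers the action:

```python
# Hypothetical sketch of the four-step pattern: link the warning sign that
# preceded a pain to the action that removed it, then fire that action
# preemptively whenever the warning sign reappears.

class PreemptiveLearner:
    def __init__(self):
        self.last_warning = None
        self.reactions = {}  # warning sign -> learned preemptive action

    def observe(self, sign):
        self.last_warning = sign
        # Step 4: if a reaction was already learned, act before the pain arrives.
        return self.reactions.get(sign)

    def pain_removed_by(self, action):
        # Steps 1-3: connect the most recent warning sign to the remedial action.
        if self.last_warning is not None:
            self.reactions[self.last_warning] = action

learner = PreemptiveLearner()
learner.observe("veering_toward_sidewalk")  # step 1: warning precedes the pain
learner.pain_removed_by("turn_away")        # steps 2-3: the remedy is recorded
```

The next `observe("veering_toward_sidewalk")` returns `"turn_away"` immediately, before any pain signal fires.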
I’ve updated the diagram of all the parts of the car so far:
Day 7: Internal Motivators
For the last few days I’ve been manually adding pain signals to situations that the car should avoid. But there are too many situations, and I can’t add them all manually.
For example, when it’s turning away from a sidewalk, if it swerves too violently, it ends up on the other side of the street. This happens so fast it can’t correct itself mid-turn. To avoid getting into such a dangerous situation I want it to be “scared” of swerving violently so that it avoids doing it. But I can’t just trigger a pain signal any time it makes a wide turn, since there are many cases where a wide turn is reasonable, like making a U-turn, or turning left.
How do I separate the right situations from the wrong ones? An unsafe swerve could happen in many conditions: the road may be icy, or perhaps it was tailgating when the car in front of it stopped. There are too many different cases for me to add them all by hand. I have to find a way for it to learn on its own to avoid these situations.
I’m going to add a way for the car to trigger its own pain signals when it sees or hears something that is likely to be followed by pain. A violent swerve is a bad idea because the car ends up on the opposite sidewalk without being able to prevent it, and that will trigger a pain signal. I’m calling these internal pain signals “tensions”. As with pain, the car will learn to avoid these situations. But unlike pain, which is an external motivator, they are its internal motivations.
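A minimal sketch of tensions as self-triggered pain (hypothetical names): any percept the car has learned tends to precede pain becomes an internal trigger in its own right, checked the same way external pain signals are:

```python
# Hypothetical sketch: a tension is a self-generated pain signal, fired by
# percepts the car has learned tend to precede real pain.

class TensionSystem:
    def __init__(self):
        self.predictors = set()  # percepts learned to precede pain

    def learn_predictor(self, percept):
        self.predictors.add(percept)

    def tensions(self, percepts):
        """Return the currently active tensions: any present percept
        known to precede pain."""
        return sorted(self.predictors & set(percepts))

ts = TensionSystem()
ts.learn_predictor("violent_swerve")
```

From the learning loop's point of view a tension is indistinguishable from an external pain signal; only its origin differs.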
I’ve updated the diagram again:
Day 8: Failure Signals
Something seems to be out of balance with the tensions.
I made the decision yesterday that the car should create new tensions when it sees something that “predicts” pain. But this rule is too vague, because any time the car learns something new, there’s a pain signal involved in the process. Now it’s over-reacting. It’s become excessively afraid of everything, even in cases where it manages to solve the problem.
At one point it made a mistake while turning left and overshot the road. It had a moment of pain, but then quickly adjusted and learned the correct way to turn. Despite solving the problem, it’s now afraid of turning left and won’t do it.
There has to be a way to separate situations in which it can’t solve the problem, and which it should therefore avoid, from those in which it can. But how would it know that it can’t solve the problem? If removing a pain means it solved the problem, then the car would only be afraid of situations where the pain lasted forever. That’s unrealistic. Even when it takes a long time to find a solution, if it ultimately finds one, that should count as a success.
Being frustrated is a good clue that it’s failing. For example, if it’s stuck in traffic and not making progress, I send it a pain signal. Seeing the line of cars in front of it is now a situation that needs to be addressed. It would look at the cars and try to find a way around them, all while feeling stressed for not making progress.
Regardless of what it does it is unlikely to get around the cars. Within a few seconds, it would find itself still staring at the pileup of cars in front of it.
This was the original situation it was trying to address. To you or me, this would be frustrating. The car knows it has failed when it encounters the same situation again while it is still dealinging with it the first time. This pileup of cars is now something to be avoided, since it can’t be solved. That should become a new tension.
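This failure test, re-encountering a situation while still working on it, could be sketched like this (a hypothetical structure):

```python
# Hypothetical sketch: a situation counts as a failure (and becomes a tension)
# if the car runs into it again while it is still trying to resolve it.

class FailureDetector:
    def __init__(self):
        self.in_progress = set()  # situations currently being addressed
        self.failed = set()       # situations to avoid from now on

    def encounter(self, situation):
        """Returns True if this encounter counts as a failure."""
        if situation in self.in_progress:
            # Seen again before it was resolved: mark it as unsolvable.
            self.failed.add(situation)
            return True
        self.in_progress.add(situation)
        return False

    def resolved(self, situation):
        self.in_progress.discard(situation)

fd = FailureDetector()
fd.encounter("traffic_pileup")  # first encounter: start working on it
```

A situation that gets `resolved` before it reappears never enters `failed`, which is exactly the distinction the diary is after: long struggles that end in success don't become fears.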
Day 9: Tensions in Practice
It’s been a day since the car started creating its own tensions, and learning what situations it should avoid. Getting too close to a cliff, tailgating, going fast when there are kids playing nearby: these are all tensions, since they lead to failure.
Although it did learn to avoid traffic jams when it saw them, unfortunately it still kept driving towards roads that are frequently backed up with traffic. Then, once it sees the traffic jam, it makes a costly detour. This makes it habitually late to its destination. It should learn to avoid the busy road itself.
Tensions should create other tensions, just like pain signals do. It makes sense for the car to be scared of getting into a scary situation, and to avoid doing so.
As another example, right now it’s afraid of swerving left in a narrow road. It should also be afraid of going too fast in a narrow road, since that could lead it to a situation where it would swerve dangerously if, say, an obstacle jumps into its way.
Day 10: Generalizing From Specific Examples
The car has been training for a while and it’s learned how to act in many specific cases. But it rarely encounters the exact same scenario twice, so what it learns doesn’t get applied much. It should generalize what it knows, so it can apply its knowledge to new and varied scenarios.
It can do this by finding what is common between the specific cases and responding to it. For example, it learned to slow down and stop when there’s a pedestrian in front of it. It learned this from situations like these:
What’s common across all these cases is the person standing in front of it. I’d like it to realize this and always slow down when it sees a pedestrian, regardless of what else is around. The pedestrian is the signal, everything else is noise.
When it removes all inputs that don’t appear in every case (the other cars, trees, sky, signs, etc.), this is what the overlap looks like:
Only the road and the shape of a person in front of it are common to all cases. Things that are common in all cases may be related to the action of slowing down.
But before it jumps to conclusions, it should test its theory. The next time it has an experience that matches those common features, followed by a tension, it should try slowing down. If that solves the problem and removes the tension, then it’s valid. It can then forget all the original, specific cases and only store the general case.
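The overlap step can be sketched as a set intersection over the feature sets of the specific cases (a toy sketch; the feature names are hypothetical):

```python
# Hypothetical sketch: generalize by intersecting the features of several
# specific cases; everything that doesn't survive the overlap is noise.

def generalize(cases):
    """cases: list of feature sets from specific experiences.
    Returns the features common to all of them (the 'signal')."""
    common = set(cases[0])
    for case in cases[1:]:
        common &= set(case)
    return common

cases = [
    {"road", "pedestrian_ahead", "parked_cars", "trees"},
    {"road", "pedestrian_ahead", "stop_sign", "sky"},
    {"road", "pedestrian_ahead", "cyclist", "clouds"},
]
rule = generalize(cases)  # only the road and the pedestrian survive
```

Per the diary, `rule` is only a theory at this point; it becomes a stored general case once one more matching experience confirms that the usual action still removes the tension.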
Day 11: Priority
By now my car has learned a lot of actions to do in various situations, as well as created its own internal motivations. They’re starting to overlap a bit and the car is experiencing decision-paralysis. Yesterday it saw a red light in front of it, as well as a green light in the next intersection. On seeing both at the same time, it didn’t know how to act.
I’m gratified that it prepares to stop when it sees a distant red light. I just want it to prioritize more immediate situations and their related actions. The light further away fits most of the criteria it has learned to respond to, but the closer traffic light is a better fit to its previous situations, precisely because it’s closer.
So for now it should prioritize whichever response most closely matches its current situation.
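A minimal sketch of this priority rule, assuming situations and triggers are represented as feature sets (all names hypothetical):

```python
# Hypothetical sketch: when several learned responses match, prefer the one
# whose trigger features overlap most with the current situation.

def best_response(situation, responses):
    """responses: list of (trigger_features, action) pairs. Returns the
    action whose trigger shares the most features with the situation."""
    return max(responses, key=lambda r: len(set(r[0]) & set(situation)))[1]

responses = [
    ({"red_light", "near_intersection"}, "stop"),
    ({"green_light"}, "go"),
]
# The nearby red light matches on two features; the distant green on one.
situation = {"red_light", "near_intersection", "green_light"}
choice = best_response(situation, responses)
```

The closer light wins simply because more of its learned criteria are satisfied at once.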
Day 12: Exceptions
Sometimes one of its actions needs an adjustment in exceptional situations. For example:
- When it arrives at a left turn lane where the light is green, it’s learned to turn left.
- But in intersections where there is an advance turn signal, even if the regular light is green, it should stop, wait for the advance turn signal, then turn.
The problem is that all the criteria of the first case are also satisfied in the second case. A green light is still a green light, even where there’s an advance turn light. Since they both match, it doesn’t know which one to prioritize.
Even though both the responses are a good match, the one that has more specific information should take priority. So seeing the advance turn light makes this case an “exception”.
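A sketch of the exception rule: among the responses whose criteria are all satisfied, the one with the most criteria wins (the names here are hypothetical):

```python
# Hypothetical sketch: when two learned responses both match fully, the more
# specific one (more required features) takes priority -- the "exception".

def pick_rule(situation, rules):
    """rules: list of (required_features, action). Among the rules whose
    requirements are all present, return the most specific one's action."""
    matched = [(feats, act) for feats, act in rules if set(feats) <= set(situation)]
    return max(matched, key=lambda r: len(r[0]))[1]

rules = [
    ({"left_turn_lane", "green_light"}, "turn_left"),
    ({"left_turn_lane", "green_light", "advance_turn_signal"}, "wait_for_advance_signal"),
]
```

With no advance turn light in sight, only the general rule matches and the car turns; when the extra feature is present, both rules match and the exception wins.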
Day 13: Goals
Yesterday, I noticed an unexpected interaction between two different parts of the car’s motivations. Some situations that would usually trigger a tension, given additional information, are no longer triggering a tension. Its ability to make exceptions based on additional information is also affecting how its tensions get activated.
For example, at some point in the past it had learned how to change lanes before making a left turn. It had only learned how to do this safely if the left lane was clear.
When there was traffic, however, it tried to merge left without making sure the lane was clear. This almost always resulted in a crash. Ultimately it became afraid of changing lanes when there were cars, since it always failed.
But it still had to change lanes to get to where it needed to be, so it was compelled to keep trying. In some cases, it saw the left lane was clear before it started moving. In those cases it merged successfully. This additional check now helps it merge reliably and safely, and it became the “exception”. This time the exception let it distinguish between failure, i.e. a tension, and a successful merge.
Now, even though it merged into a lane with cars, which used to scare it, it no longer experiences fear if it does this extra check.
Such distinctions give the car a goal to aim towards. Up till now it only knew how to run away from fearful situations. Now it has a way to address the situation without running away. It gained a skill.
Day 14: Paying Attention
This morning, it was in the middle of a left turn. Then it happened to see a speed limit sign in the opposite lane and started checking its speed. This was a distraction from the task at hand.
Despite everything I tried in the last few days, there are still a lot of distractions for it to sort through. Even if it only reacts to the most prominent things around it, that leaves it at the mercy of the ever-changing stimuli in the world.
If it starts to carry out an action, it should focus on the stimuli that are directly related to that action until it finishes. Otherwise it frequently fails to finish.
When it originally learned to turn left, it learned to expect certain things to happen at each stage of the turn. Only when it saw these did it move to the next stage:
- Once it saw a green light, it moved forward
- Once it saw its front hood aligned with the lane, it stopped
- Once it saw a clearing in the oncoming traffic, it turned its wheels and sped up
- Once it saw it was almost lined up with the lane, it straightened its wheels
The order of the stages matters. Before it moves to the next it expects to see certain things that let it know the time is right. These are what it should focus on, and make sure it’s doing each part properly.
My current plan is to boost the inputs it expects to see at each stage, so it only focuses on those and doesn’t get distracted. When it’s doing a left turn, I’m going to boost the curb, the lane dividers, the oncoming traffic, so that they get priority. This artificial boost will make sure it doesn’t get distracted, except by the most shocking sights and sounds, such as an ambulance siren.
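The boost could be sketched as a salience multiplier on expected inputs, with a separate startle term for inherently shocking stimuli like sirens (all names and numbers here are hypothetical):

```python
# Hypothetical sketch: while an action sequence is under way, multiply the
# salience of its expected inputs so that only truly startling stimuli
# can break through and grab attention.

BOOST = 5.0
STARTLE_BASE = {"ambulance_siren": 10.0}  # always-prominent stimuli

def attend(stimuli, expected):
    """stimuli: dict of input -> raw salience. Boost the expected inputs
    and return the single most salient input to focus on."""
    scored = {
        s: v * BOOST if s in expected else v + STARTLE_BASE.get(s, 0.0)
        for s, v in stimuli.items()
    }
    return max(scored, key=scored.get)

# During a left turn, these are the inputs the car expects to see.
expected = {"curb", "lane_dividers", "oncoming_traffic"}
```

A speed limit sign in the opposite lane now loses to the boosted curb, but an ambulance siren still wins.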
Day 15: Observing
In many situations, I’ve found that the car doesn’t even have to do anything when it encounters a problem. If it just sits and watches, the problem goes away of its own accord.
When it hears an ambulance, it normally has to move right. If the ambulance is in a separated lane, it doesn’t need to do anything different. It only has to observe, and make sure the ambulance passes it completely, as expected. In rare cases where the ambulance doesn’t pass as expected, it may have to take action.
Where no action is needed to solve a problem, I still want to boost the expected sights and sounds, so it keeps an eye on them without getting distracted. The only difference between this and yesterday’s case is that the car’s response to these situations requires no actions.
The boosted signal seems to be working. It’s making sure the car stays on task by focusing its attention on what it’s doing.
Day 16: Thinking
Something strange has happened.
I appear to have boosted the expected sights and sounds too much. The car is now “seeing” and reacting to what it expects to see or hear, even when it isn’t present. In other words, it’s ‘hallucinating’ everything it expects to happen, whether or not it actually happens. It sees what it expects to see.
For example, it learned that when it hears an ambulance siren approaching from ahead, it should expect to see the ambulance pass it on its left.
But now, when it hears a siren ahead, even if the other lane isn’t visible due to a tall divider, it ‘hallucinates’ an ambulance passing on its left.
I’m going to keep an eye on this, to see what effects these hallucinations have.
Day 17: Predictions
These hallucinations are having some unexpected benefits. In many cases the car is reacting to an event that it expects to happen before it actually happens. It is anticipating what will happen, and acting in response to that.
The expectations are similar to memories. The car is “re-creating” or reimagining events it has previously seen, as though they were happening now.
Here’s a typical case. When another car’s left signal was blinking, it predicted that the car would turn left. It had previously learned to expect this kind of behaviour. Until now it would wait for the other car to actually turn before reacting. Now my car hallucinates that the other car is turning left, and so it clears a space to the left of the other one, in order to avoid an accident.
Day 18: Observations as Predictions
I had a small issue yesterday. The car was waiting at a red light. It usually waits for the light to turn green, then moves ahead. This time, however, it hallucinated what it expected to see. Even though the light was still red, the car imagined the light turning green, then started driving through the intersection.
It’s too dangerous for the car to hallucinate its predictions in cases where it will actually take action. I’ve decided to only leave this feature on in situations where it doesn’t take any action, as in the case of waiting for the ambulance to pass. It should only remember and imagine things that it observed, that is, what happened naturally without its interference. It should then expect those things to happen again.
It can still act on this expectation. If the hallucinations in one case trigger a new or amended response to its current situation, one which does involve action, that’s ok.
Day 19: Knowledge from Memories
Every time the car forms a new prediction, it’s specific to the situation it learned it in. These predictions are like memories, specific and detailed. But just as with actions, it has started to generalize its memories by finding common features and testing them. This turns a specific memory into knowledge, which is general.
For example, on one occasion it saw a 16-wheel truck do a giant turn across 2 lanes. It remembered that event as a specific memory, with all the details of where and when it happened. This was an exceptional case, with no generalizability.
After a few more instances of seeing 16-wheel trucks make wide turns, it was able to generalize and expect any large truck to make a similarly wide turn. It found the common pieces across all these experiences, and validated them with one more observation.
Its memory has now generalized, to become knowledge.
Day 20: Acting on its Predictions
Piece by piece, the car is using these “re-creations” to build models of how the world works. Each new expectation becomes part of a larger model it uses to understand the world.
This model of the world is starting to take on a life of its own, as predictions interact with other predictions.
For example, this morning it saw another car signal left, and predicted that it would turn left. At the same time a child was running towards the left of the car. In its predictions it saw that they would both soon be in the same place. Normally, if another car is headed for a pedestrian, my car honks its horn to warn that car. In this case it honked its horn to warn the turning car even before it had started to turn. As one expectation leads to another, predictions are causing chain reactions of other predictions.
Chains of imagined expectations grow longer, as one prediction leads to another. It’s creating a world of interconnected expectations. When one memory or expectation changes it has ripple effects across the others.
Day 21: Perception
I’ve been monitoring the car’s thoughts as it drives around. When it looks around, it not only sees what is there, it also layers on what it expects to see based on what it’s observed in the past.
All the other pieces are functioning just as they had before. The internal tensions get triggered by its thoughts as much as by sights. Yesterday, it had accidentally moved into the middle of an intersection, before space had opened up on the opposite side. It was now stuck in the intersection, waiting for space to open up.
While in the intersection, it heard the siren of an ambulance getting louder. It expected the ambulance to come close, given its past experience with sirens. It connected that with the fact that it was blocking the intersection. This caused it to trigger a tension, even before the ambulance had arrived.
Some of its expectations would cause the car stress if they were really happening. When this happens its own thoughts cause a tension to appear.
Day 22: Problem Solving
Today I saw the culmination of all the abilities. The car solved a problem in its own thoughts. Until now, the only way it could figure out what solves its problems in a given situation was to try multiple actions until one of them removed the pain or tension. Now it can experience the consequences of different possibilities. Some of them cause internal tensions, while others can override those tensions.
Yesterday, when it found itself stuck in an intersection with an oncoming ambulance, it began to go through possible car movements for which it knew what to expect.
- It imagined what would happen if it moved forward. It realized that would cause an accident, which is a tension.
- It imagined what would happen if it moved backwards. It realized that would also cause an accident behind it, which is a tension.
- It imagined moving back and to the left. This both avoided accidents and made space for the ambulance to get through. All tensions and pains were avoided.
Once the car thought of the third option, and the imagined tension of the oncoming ambulance was removed, it learned this new thought, this new desired outcome. Hearing the siren caused it to automatically expect (i.e. imagine) itself moving back and to the left. It now had a plan for what to do going forward, if it ever found itself in this situation again.
Most importantly, it didn’t have to crash itself into the cars in front and behind before it learned that lesson. By thinking through the consequences, the car saved itself from having an accident in the moment. This is a giant leap forward in safety, efficiency, and speed.
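The decision loop from the intersection episode could be sketched like this (hypothetical names; `imagine` stands in for the car's learned world model):

```python
# Hypothetical sketch: mental simulation -- evaluate each candidate action
# against the imagined outcomes and pick one that triggers no tension.

def plan(candidates, imagine):
    """candidates: possible actions. imagine(action) -> set of predicted
    tensions. Returns the first action whose imagined outcome is
    tension-free, or None if every option leads to trouble."""
    for action in candidates:
        if not imagine(action):
            return action
    return None

# Toy world model for the stuck-in-the-intersection scenario.
outcomes = {
    "forward": {"collision_ahead"},
    "backward": {"collision_behind"},
    "back_and_left": set(),
}
choice = plan(["forward", "backward", "back_and_left"], outcomes.get)
```

The crashes happen only inside `outcomes`; the car learns the plan without ever touching the cars ahead of or behind it.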
Are you also interested in applying Artificial Intelligence to human creativity, human understanding, even human values? Do you feel that our current goals with A.I. are limited? I’m looking to connect with others who have a similarly ambitious vision of the future of A.I., whose goal is to tap the full creative potential of human intelligence through software.