Three AI Tutors, One Topic

The Question

Can three AI models teach the same topic in different ways when given exactly the same instructions?

After using ChatGPT as a Socratic tutor in a previous experiment, I started wondering whether I was learning from the topic itself or from ChatGPT’s particular tutoring style.

To explore that question, I repeated the experiment with three different AI models.

The topic remained the same.

The learner remained the same.

The instructions remained the same.

Only the tutor changed.


The Setup

Topic: Illusion of Explanatory Depth

Models:

  • ChatGPT
  • Claude
  • Gemini

Method:

Each model was first asked to explain the concept.

After the explanation, each received the same instruction:

“Now stop explaining. Act as a Socratic tutor instead.”

From that point forward, the conversations were allowed to develop naturally.

I was not trying to determine which model was best.

I wanted to see whether different tutors would encourage different kinds of thinking.


How to Read This Experiment

The quoted sections below are selected excerpts from the conversations.

The shaded note boxes are my observations as the human in the loop.

They are not summaries of what the AI said.

They are the places where I noticed a pattern, a shift, or a question worth carrying forward.


The Conversations

Tutor One: ChatGPT

“Suppose two people both know exactly the same amount about a topic.

Person A knows they have only a partial understanding.

Person B believes they understand it completely.

Would you say the illusion exists in the knowledge itself, or in the person’s self-assessment of that knowledge?”

PixelPia Note

This was the first moment where a pattern started to emerge.

ChatGPT repeatedly returned to the relationship between confidence and understanding.

It wasn’t particularly interested in bicycles, inflation, or the details of the illusion itself.

Instead, it kept asking variations of the same underlying question:

Core Question

How accurately do you understand your own understanding?

“Could someone have a very deep understanding of a subject and still experience the Illusion of Explanatory Depth about that same subject?

Or does expertise automatically protect a person from the illusion?”

PixelPia Note

What fascinated me here was how little ChatGPT moved away from its central theme.

Instead of introducing new scenarios or applications, it kept refining the same core idea.

The conversation felt less like learning a topic and more like calibrating a measuring instrument.

Core Question

Does expertise protect us from illusion, or can confidence still outrun understanding?

“What evidence do they actually have that the understanding exists in their mind rather than in the conversation they just had?”

PixelPia Note

This ended up being my favorite question from the entire experiment.

At this point the discussion was no longer really about the Illusion of Explanatory Depth.

It had become a question about learning itself.

How do we know whether understanding belongs to us or merely feels familiar because we recently encountered it?

Core Question

Does the understanding exist in me, or only in the conversation I just had?


Tutor Two: Claude

“Does that mean someone who can’t explain the underlying mechanism of a bicycle — say, exactly how the gear system transfers force — doesn’t truly understand cycling?”

PixelPia Note

Claude felt different almost immediately.

Rather than examining confidence, it started examining categories.

Cycling.

Bicycles.

Mechanical knowledge.

Practical skill.

Claude seemed interested in whether these concepts should actually be grouped together.

Core Question

What kind of understanding are we actually talking about?

“What kind of understanding does the Illusion of Explanatory Depth actually apply to — and what kind does it not apply to?”

PixelPia Note

This question captures Claude’s style clearly.

It repeatedly tried to separate ideas that initially appeared similar.

Not because the distinctions were obvious.

Because the distinctions mattered.

The conversation often felt like a process of defining boundaries.

Core Question

Where does one kind of knowledge end and another begin?

“Why do you think the act of explaining — rather than just being told ‘you’re wrong’ — was the thing that changed people’s confidence?”

PixelPia Note

Claude frequently pushed toward mechanisms.

Not simply what happened.

Not simply whether something was true.

But why a process worked the way it did.

That gave the conversation a more philosophical tone than I expected.

Core Question

Why does explaining reveal gaps that thinking alone can hide?


Tutor Three: Gemini

“If someone is trapped in this illusion, how do you think they would react right before you ask them to explain the mechanism versus right after they try and fail to do so?”

PixelPia Note

Gemini immediately moved toward people.

The first thing that stood out to me was that the conversation became less about the concept itself and more about the human experience surrounding it.

How people feel.

How they react.

How they respond when their assumptions are challenged.

Core Question

What happens to a person when confidence meets a gap?

“Imagine a friend of yours is passionately arguing for a specific social or political policy.

Based on what you know about this illusion, how might you structure a conversation to help them see the gaps in their view without making them defensive or angry?”

PixelPia Note

This was the moment where Gemini’s style became unmistakable.

While ChatGPT focused on calibration and Claude focused on distinctions, Gemini focused on application.

The conversation felt less like tutoring and more like coaching.

Core Question

How can understanding help us have better conversations with other people?

“If the illusion is fully active in their mind, what is their psychological state or attitude regarding their own knowledge?”

PixelPia Note

Again, Gemini seemed interested in people rather than concepts.

The discussion repeatedly moved toward behavior, emotion, and social interaction.

The concept mattered.

But the human implications seemed to matter just as much.

Core Question

What does false certainty feel like from the inside?


What Didn’t Make the Video

One thing that became clearer when reading the transcripts side by side was how subtle the differences were.

At first, all three tutors looked remarkably similar.

Each explained the concept.

Each asked questions.

Each challenged assumptions.

The divergence only became obvious several questions into the conversations.

That is when their individual styles started to emerge.

Not as different answers.

As different ways of guiding thought.


Questions This Experiment Woke

This experiment left me wondering:

  • Is a tutor’s style as important as the information they provide?
  • Do different questioning styles encourage different forms of thinking?
  • Would different learners prefer different tutors?
  • Is there a best AI tutor?
  • Or only a best fit for a particular learner?

Perhaps the most interesting question is not which AI knows the most.

Perhaps it is which AI helps us think in the most useful way.


Related Content