
Identifying Pokemon Cards & Geographical Locations with OpenAI Image APIs

Analyzing images with GPT-4o and GPT-5.

5 min read · Sep 26, 2025


Photo by Erik Mclean on Unsplash

I wrote a Python script that uses OpenAI’s Image APIs to identify Pokemon, count objects, and infer geographical locations from photos.

It did a scary good job — especially with photo locations.

In this blog post, I will share my findings with GPT-4o and GPT-5, as well as give you the code so that you can run your own experiments!

Let’s get started.

Sending Images to OpenAI

There are two primary ways to send your images to OpenAI when asking questions:

1. Pass an image URL
2. Pass a Base64 encoded image

In my opinion, the first option is significantly more scalable.

Most visual applications store photos in an object store like S3 that exposes shareable URLs. That makes it easy to send these public S3 image URLs to OpenAI alongside your prompt.
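For example, if your bucket is private, you can hand OpenAI a short-lived presigned URL instead of a public one. Here’s a minimal sketch with boto3; the bucket and object names are placeholders I made up:

import boto3

s3 = boto3.client("s3")

# Generate a temporary, shareable URL for a private S3 object
# ("my-photo-bucket" and "cards.jpg" are placeholder names)
image_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-photo-bucket", "Key": "cards.jpg"},
    ExpiresIn=3600,  # link stays valid for one hour
)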

So your request to OpenAI has two parts:
- Prompt — your question about the image
- Image URL — a publicly accessible URL to the image

After that, it’s as simple as writing these few lines of code:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# CARD_URL is the publicly accessible URL of the image you want analyzed
response = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Identify all unique pokemon from their cards. For each Pokemon, list one attack move visible on the card.",
                },
                {
                    "type": "input_image",
                    "image_url": CARD_URL,
                },
            ],
        },
    ],
)

print(response.output_text)

Here’s the code if you want to try it out yourself.
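And if your images live on disk rather than behind a public URL, the second option (Base64) works too. Here’s a minimal sketch, assuming a local file called cards.jpg (the filename is a placeholder):

import base64

from openai import OpenAI

client = OpenAI()

# Read the local image and Base64-encode it
with open("cards.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Identify all unique pokemon from their cards.",
                },
                {
                    # Base64 images are passed as a data URL instead of an http(s) URL
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{b64_image}",
                },
            ],
        },
    ],
)

print(response.output_text)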

Who’s That Pokemon?

Being a huge Pokemon fan, I couldn’t stop myself from using OpenAI to play the famous childhood game:

Who’s that Pokemon?

How did OpenAI do?

100% accuracy. Crushed it!

I passed it images of Bulbasaur, Squirtle, Pikachu, Arcanine, Zapdos, Cyndaquil, and Mudkip. It correctly identified all of them. Not only that, it also offered accurate descriptions of each Pokemon.

The most challenging task, however, was when I passed the model a photo of a bunch of cards spread across a table, some fully visible, others partially obscured, and asked it to name every Pokemon along with an attack move extracted from each card.

Photo by Mick Haupt on Unsplash

At first I tried with GPT-4o, and got the following results:

(public_api_env) irtiza@irtiza-dev openai_tutorials % python3 analyze_images.py

Sure! Here are the unique Pokémon and one attack move from their cards:

1. **Whiscash** - Rising Lunge
2. **Braviary** - Sky Drop
3. **Clauncher** -
4. **Camerupt** - Eruption
5. **Bellsprout** - Vine Whip
6. **Mudbray** - Stomp
7. **Meowth** - Turmoil Strike
8. **Nidorina** -
9. **Beldum** - Tackle
10. **Durant** - Scrap Down
11. **Skorupi** - Wing Attack
12. **Electabuzz** - Thunder Punch
13. **Ursaring** - Hammer Arm
14. **Cubchoo** -
15. **Leavanny** - Cutting Arm

Some moves or Pokémon names might be obscured or not fully visible.

Although it gave me an answer within a few seconds, there were some interesting misses:

  • Missed some obvious cards, such as Mantine and Rufflet
  • Skorupi’s card had the attack move “Pin Missile”, but the model listed “Wing Attack”, which is incorrect. It pulled the information from its prior knowledge instead of the image.

Overall, I was quite unimpressed by the number of Pokemon it missed.

Next, I tried the same prompt with GPT-5.

Here are the Pokémon I can read in the image, with one visible attack from each card:

- Braviary — Sky Drop
- Whiscash — Amnesia
- Mudbray — Stomp
- Bellsprout — Vine Whip
- Vulpix — Confuse Ray
- Dugtrio — Earthquake
- Butterfree — Psy Bolt
- Skorupi — Pin Missile
- Durant — Mountain Munch
- Ursaring — Drag Off
- Meowth — Turmoil Strike
- Nidorina — Bite
- Beldum — Allure
- Electabuzz — Thunder Punch
- Leavanny — Cleave
- Tympole — Get Loud
- Sliggoo — Bubble
- Mantine — Wave Splash
- Oranguru — Psychic
- Golduck — Double Jet

If you want, I can try to pick out more by focusing on specific regions of the photo.

The results were far superior: it identified more Pokemon and didn’t hallucinate any attack moves. However, it took significantly longer and spent far more tokens, making it more expensive.
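If you want to measure that trade-off yourself, the response object includes token counts, and you can time the call with the standard library. A rough sketch (I believe the Responses API reports usage as input_tokens/output_tokens; double-check the field names against the current SDK):

import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
response = client.responses.create(
    model="gpt-5",  # swap in "gpt-4o" to compare
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Identify all unique pokemon from their cards."},
                {"type": "input_image", "image_url": CARD_URL},  # same URL as the earlier example
            ],
        },
    ],
)
elapsed = time.perf_counter() - start

# Token counts are what drive the cost difference between the two models
print(f"Latency: {elapsed:.1f}s")
print(f"Input tokens:  {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")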

Counting Problems

Moving on from simple identification, I next challenged the model with counting problems.

To be more specific, I gave it photos of groups of people hanging out and asked it to count the number of people in each image.

It did quite well for simple family or group photos, but failed when the photos became more complex.
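Since only the prompt and image URL change between these experiments, the call can be wrapped in a small helper. This is just a sketch; ask_about_image and GROUP_PHOTO_URL are placeholder names I made up, not part of the original script:

from openai import OpenAI

client = OpenAI()

def ask_about_image(prompt: str, image_url: str, model: str = "gpt-5") -> str:
    """Send one prompt plus one image URL to the model and return its text reply."""
    response = client.responses.create(
        model=model,
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    {"type": "input_image", "image_url": image_url},
                ],
            },
        ],
    )
    return response.output_text

# GROUP_PHOTO_URL is a placeholder for the public URL of a group photo
print(ask_about_image("How many people are in this photo? Answer with just the number.", GROUP_PHOTO_URL))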

Photo from Unsplash (https://images.unsplash.com/photo-1656370465119-cb8d6735bda3)

For photos like these, it consistently underestimated or overestimated.

Infer Location From Photos

I found this next problem set the scariest.

I gave the model random location photos from Unsplash and my personal Google Photos and asked it to infer their locations.
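The prompt itself was nothing special. Reusing the hypothetical ask_about_image helper from the counting section, it could look like this:

# PHOTO_URL is a placeholder for any of the location photos
print(ask_about_image("Where was this photo taken? Be as specific as you can.", PHOTO_URL))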

It did so with 100% accuracy.

It identified each of these locations correctly:

  • Istanbul
  • Brazil
  • Dhaka
  • Great Smoky Mountains National Park
  • AutoCamp Yosemite in Midpines, near Yosemite National Park

I will focus on two examples that I found especially interesting.

First, even though the photo from Great Smoky Mountains National Park looked like a generic national park shot, the model correctly identified which park it was.

Photo by Frank Thiemonge on Unsplash

The next photo was taken by me a few months ago in California.

We stayed a night at AutoCamp Yosemite near Yosemite National Park.

Taken by Author

The GPT-4o model failed, giving a generic response:

I’m sorry, but I can’t tell where this photo was taken based on just the image.

GPT-5, however, got it right!

Looks like AutoCamp Yosemite in Midpines, California (the Airstream glamping site near Mariposa).

Once again, this shows that the more expensive models can take longer but are generally more accurate.

Closing Thoughts

If you are still reading, I hope you found this post valuable and worth your time. As a reminder, here’s the code if you want to do your own experiments.

For similar content, check out my YouTube channel or follow me here on Medium.

If you would like to get a copy of the illustration or extra notes, join my newsletter and I will email you a copy.

Written by Irtiza Hafiz

Engineering manager who writes about software development and productivity https://irtizahafiz.com
