Object Detection with YOLO Part 1 (Ultralytics, YOLO11)
Introduction
In this post, I tried using YOLO, a well-known object detection algorithm.
Since this is my first time working with object detection, I started with something easy to execute while doing some research. I haven't really mounted cameras on my robots before, so I’d like to incorporate this into my future designs.
I began researching with the goal of detecting tomatoes from images, and I found that the latest version is already up to v11. With such frequent version updates, I do get a bit worried about package dependencies.
▼Previous articles are here:
Related Information
Overview
YOLO stands for "You Only Look Once," and it allows for high-speed object detection from images.
▼Although it only goes up to YOLO v7, this article provides a clear summary of YOLO's evolution. There are projects derived from YOLO, and the primary development entities seem to have changed over time.
【YOLO】A Simple Summary of the Differences Between Each Version【Object Detection Algorithms】 #beginner - Qiita
The latest YOLO11 is developed by a company called Ultralytics.
▼Ultralytics page is here. Demos are also available.
https://www.ultralytics.com/yolo
▼The page regarding the latest YOLO11 is here. It seems speed and accuracy have improved significantly.
YOLO11 🚀 NEW - Ultralytics YOLO Docs
There is an app called "Ultralytics YOLO" that allows you to try YOLO on your smartphone.
▼I tried it on my iPhone 8, but the app crashed when using a large model, likely due to heavy processing. It seems the iPhone 8 might finally be getting too old…
https://apps.apple.com/jp/app/ultralytics-yolo/id1452689527
Datasets
Objects are detected based on a dataset; if there isn't one that fits your specific purpose, you'll need to create your own. (A rough training sketch follows after the links below.)
▼Ultralytics has a page regarding datasets. There are datasets for African animals, drones, and more.
https://docs.ultralytics.com/ja/datasets
▼I found a dataset that looks useful for the agricultural field.
https://blog.roboflow.com/top-agriculture-datasets-computer-vision
▼This is a tomato dataset on Kaggle.
https://www.kaggle.com/datasets/andrewmvd/tomato-detection
▼I found an article about creating a tomato detector! I definitely want to use this as a reference.
https://farml1.com/tomato_yolov5
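To get a feel for what training on a custom dataset would involve, here is a minimal sketch using the Ultralytics Python API. The "tomato.yaml" dataset config is my own hypothetical placeholder; it would list the train/val image paths and class names in the Ultralytics dataset format.

from ultralytics import YOLO

# Start from a pretrained YOLO11 nano checkpoint
model = YOLO("yolo11n.pt")

# Train on a custom dataset described by a hypothetical "tomato.yaml"
# (train/val image paths plus class names, in the Ultralytics format)
model.train(data="tomato.yaml", epochs=100, imgsz=640)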
Setting Up the Environment
For this setup, I am using a Windows 10 laptop.
I had previously tried setting up a YOLO v10 environment while looking at the GitHub repository, but I encountered errors due to package version conflicts when running samples.
▼It seemed like packages requiring Python 3.9+ and 3.10+ were mixed together.


This time, when creating the Python virtual environment, I specifically designated Python version 3.9.
▼Information on Python virtual environments is here:
I used the following commands to create and activate the virtual environment:
py -3.9 -m venv yolo39
cd yolo39
.\Scripts\activate

Next, I installed and executed the Ultralytics package.
▼The Ultralytics repository is here:
https://github.com/ultralytics/ultralytics
Install ultralytics into the virtual environment:
pip install ultralytics

That's all you need to install. Running the following command saved an image with the objects detected.
yolo predict model=yolo11n.pt source='https://ultralytics.com/images/bus.jpg'

▼The processed image was saved as shown below.

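The same prediction can also be run from Python instead of the CLI. Here is a minimal sketch using the Ultralytics Python API, mirroring the command above:

from ultralytics import YOLO

# Load the pretrained YOLO11 nano model (weights download on first use)
model = YOLO("yolo11n.pt")

# Run prediction on the sample image and save the annotated result,
# just like the CLI command above
results = model.predict(source="https://ultralytics.com/images/bus.jpg", save=True)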
Trying Real-time Object Detection
When actually mounting this on a robot, you need to process camera data in real-time. I tried running a simple demo.
▼I referred to this page:
https://docs.ultralytics.com/ja/guides/streamlit-live-inference
Just run the following command:
yolo streamlit-predict

The app launched, and I accessed the URL displayed in the command prompt to see the interface.
▼From the left sidebar, you can select the model, target objects, and start the detection.

I selected "person," "keyboard," and "book" as targets and performed the detection.
▼This is YOLO11s.
▼This is YOLO11n. The FPS is higher than YOLO11s.
While there are finer parameters that could be tuned, I felt it detected objects quite well. It was even able to detect a TV or keyboard when only half of it was in the frame.
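If you eventually want to process camera frames in your own code rather than through the Streamlit app, a loop like the following should work. This is a minimal sketch assuming OpenCV and a webcam are available; the COCO class IDs for person/keyboard/book (0, 66, 73) are my assumption based on the standard COCO class ordering.

import cv2
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture(0)  # default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Restrict detection to person/keyboard/book
    # (COCO class IDs 0, 66, 73 -- assumed from the standard COCO list)
    results = model(frame, classes=[0, 66, 73])
    annotated = results[0].plot()  # draw the detected boxes on the frame
    cv2.imshow("YOLO11", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()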
Finally
I've heard that when you actually mount this on a robot, you match the detected object's coordinates against the depth camera's coordinates to measure the distance to the object. Next, I plan to work on obtaining coordinates and enabling communication.
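As a rough idea of what that pairing could look like, here is a hypothetical sketch. It assumes a depth frame already aligned with the color image, holding the distance in meters per pixel (as a RealSense-style depth camera can provide); none of the names here come from a real robot setup.

import numpy as np
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

def report_distances(color_frame: np.ndarray, depth_frame: np.ndarray) -> None:
    # color_frame: BGR camera image; depth_frame: same-sized array of
    # distances in meters, aligned to the color image (hypothetical inputs)
    results = model(color_frame)
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        distance_m = float(depth_frame[cy, cx])  # depth at the box center
        print(f"{model.names[int(box.cls)]}: {distance_m:.2f} m")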
I also need to learn how to create my own datasets for future projects. I’ve heard that color information can vary significantly depending on ambient light, which is something to keep in mind.