[Python Roboflow] How to Use ResNet-50

Translated from: (Mar 4, 2024). How to Use ResNet-50. Roboflow Blog.

Introduction

ResNet-50, introduced in the 2015 paper "Deep Residual Learning for Image Recognition", is an image classification architecture developed by Microsoft Research. The default ResNet-50 checkpoint was trained on the ImageNet-1k dataset, which contains images from 1,000 classes.


In this guide, we are going to walk through how to install ResNet-50 and use it to classify images.


By the end of this guide, we will have code that assigns the class “forklift” to the following image:

Forklift image

What is ResNet-50?


ResNet-50 is an image classification model architecture. It belongs to the ResNet family introduced in 2015, which won first place in the ILSVRC 2015 image classification task. While many new model architectures that achieve strong performance have since been introduced, ResNet-50 is still a notable architecture in the history of computer vision.

The default ResNet-50 checkpoint can classify an image into any of the 1,000 classes in the ImageNet-1k dataset.

How to Install ResNet-50

You can install and run ResNet-50 using the Hugging Face Transformers Python package.

To get started, first install Transformers, along with PyTorch and Pillow, which the code in this guide imports:

pip install transformers torch pillow

Once you have installed Transformers, you can load the microsoft/resnet-50 model in your code with the ResNetForImageClassification class.


To get started, create a new Python file and add the following code:


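from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from PIL import Image

# Load the image and make sure it has three color channels
image = Image.open("image.jpg").convert("RGB")

# Download (or load from cache) the preprocessing config and model weights
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

# Resize and normalize the image into a batch of PyTorch tensors
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Index of the highest-scoring class, mapped to its ImageNet-1k label
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])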

In this code, we first open an image called image.jpg. Then, we load the image processor and the model. The processor converts the image into the tensor format the model expects, and the model(**inputs) call runs inference. Finally, we retrieve the class with the highest score returned by the model.


In the code above, replace image.jpg with the name of the image on which you want to run inference.
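
The argmax step keeps only the single best class. If you also want to see the runner-up classes and how likely each one is, a softmax over the logits followed by a sort is one way to do it. The sketch below is a minimal illustration, not part of the original guide: the function name top_k_labels and the example scores and labels are hypothetical stand-ins, so the snippet runs without downloading the model.

```python
import math


def top_k_labels(scores, id2label, k=5):
    """Return the k most likely (label, probability) pairs for one image."""
    # Softmax: subtract the max score first for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices from most to least probable and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical stand-in values so the sketch runs without the model.
# With the real model, pass logits[0].tolist() and model.config.id2label.
example_scores = [1.0, 3.0, 0.5]
example_labels = {0: "pallet", 1: "forklift", 2: "crane"}

for label, prob in top_k_labels(example_scores, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

If you would rather stay in PyTorch, torch.softmax(logits, dim=-1).topk(5) should give the same ranking directly on the tensor.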


Conclusion and the Current Classification Landscape


ResNet-50 is an image classification architecture introduced in 2015, and its default checkpoint was trained on the ImageNet-1k dataset. You can also train a model with the ResNet architecture on a custom dataset if you want to identify your own classes.
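
As a rough sketch of what the starting point for custom training might look like with Transformers, the snippet below builds label mappings for your own classes and loads the pretrained checkpoint with a classification head resized to match. The helper names build_label_maps and load_resnet_for_finetuning and the class names are hypothetical, and the training loop itself is not shown.

```python
def build_label_maps(class_names):
    """Create the id2label / label2id dictionaries for a list of class names."""
    unique_names = sorted(set(class_names))
    id2label = dict(enumerate(unique_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_resnet_for_finetuning(class_names):
    """Load the pretrained backbone with a new head sized for class_names."""
    # Imported here so the label helper above works without Transformers installed.
    from transformers import ResNetForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return ResNetForImageClassification.from_pretrained(
        "microsoft/resnet-50",
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
        # The checkpoint's 1,000-class head does not match the new head,
        # so let Transformers re-initialize that layer instead of raising.
        ignore_mismatched_sizes=True,
    )


# Hypothetical class names for illustration only.
id2label, label2id = build_label_maps(["forklift", "pallet", "person"])
print(id2label)
```

The model returned by load_resnet_for_finetuning has a freshly initialized head, so its predictions are not meaningful until it has been trained on labeled images of your classes.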


While ResNet is several years old, it remains a well-established image classification architecture. Since its release, many new architectures have been introduced that you can also fine-tune on a custom dataset, including the Vision Transformer, FastViT, Ultralytics YOLOv8, and ResNeXt.

There are also zero-shot classification models, which you can apply to arbitrary classes without any fine-tuning.


For example, you can use OpenAI CLIP to assign labels to images without fine-tuning the model. This is possible because CLIP was trained on a large dataset of images paired with a wide range of text descriptions.
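
To make the zero-shot idea concrete, here is a minimal sketch using the CLIP classes in Transformers. It is an illustration under assumptions rather than a recommended setup: the openai/clip-vit-base-patch32 checkpoint, the "a photo of a ..." prompt template, the helper names, and the class list are all choices made for this example.

```python
def build_prompts(class_names):
    """Turn class names into the text prompts CLIP compares the image against."""
    return [f"a photo of a {name}" for name in class_names]


def classify_zero_shot(image_path, class_names):
    """Return the class name whose prompt best matches the image."""
    # Imported here so build_prompts can be used without these packages.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    checkpoint = "openai/clip-vit-base-patch32"  # assumed checkpoint
    model = CLIPModel.from_pretrained(checkpoint)
    processor = CLIPProcessor.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompts(class_names),
        images=image,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        # logits_per_image holds one similarity score per prompt.
        probs = model(**inputs).logits_per_image.softmax(dim=1)
    return class_names[probs.argmax(dim=1).item()]


# Hypothetical class list; call classify_zero_shot("image.jpg", classes) to run it.
classes = ["forklift", "pallet", "person"]
print(build_prompts(classes))
```

Unlike the ResNet-50 example, the set of classes here is just a Python list, so you can change it without retraining anything.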

Zero-shot models like CLIP can be used on their own (e.g., for classification or content moderation), or with an auto-labeling framework like Autodistill to label data for training a faster, fine-tuned vision model.
