[Python Roboflow] 如何使用ResNet-50 How to Use ResNet-50

Translated from: How to Use ResNet-50 (Mar 4, 2024). Roboflow Blog

Introduction 引言

ResNet-50, introduced in the paper "Deep Residual Learning for Image Recognition" in 2015, is an image classification architecture developed by Microsoft Research. The default ResNet-50 checkpoint was trained on the ImageNet-1k dataset, which contains data on 1,000 classes of images.

ResNet-50是在2015年的论文《深度残差学习用于图像识别》中介绍的,由微软研究院开发的图像分类架构。默认的ResNet50检查点是在包含1000个类别图像的ImageNet-1k数据集上训练的。

In this guide, we are going to walk through how to install ResNet-50 and classify images using ResNet-50.

在本指南中,我们将逐步介绍如何安装ResNet-50并使用ResNet-50对图像进行分类。

By the end of this guide, we will have code that assigns the class “forklift” to the following image: 通过本指南,我们将获得一段代码,该代码将类别“叉车”分配给以下图像:

Forklift 叉车图像

What is ResNet-50?

ResNet-50是什么?

ResNet-50 is an image classification model architecture. Introduced in 2015, ResNet-50 won first place on the ILSVRC 2015 image classification task. While many new model architectures that achieve strong performance have since been introduced, ResNet-50 is still a notable architecture in the history of computer vision.

ResNet-50是一种图像分类模型架构。它于2015年提出,并在ILSVRC 2015图像分类任务中获得了第一名。尽管此后出现了许多性能强大的新模型架构,但ResNet-50仍然是计算机视觉历史上值得关注的架构。

The default ResNet checkpoint can identify any of the 1,000 classes in the ImageNet-1k dataset. 默认的ResNet检查点可以识别ImageNet-1k数据集中1,000个类别中的任意一个。

How to Install ResNet-50 如何安装ResNet-50

You can install ResNet-50 using the HuggingFace Transformers Python package. 你可以使用HuggingFace Transformers Python包来安装ResNet-50。

To get started, first install Transformers, along with PyTorch and Pillow, which the code below uses: 要开始,请先安装Transformers,以及下面代码会用到的PyTorch和Pillow:

pip install transformers torch pillow

Once you have installed Transformers, you can load the microsoft/resnet-50 model in your code with the ResNetForImageClassification class.

安装完Transformers后,你可以使用ResNetForImageClassification类在代码中加载microsoft/resnet-50模型。

To get started, create a new Python file and add the following code:

要开始,请创建一个新的Python文件并添加以下代码:

from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from PIL import Image

# Open the image to classify (convert to RGB in case it is grayscale or has an alpha channel).
image = Image.open("image.jpg").convert("RGB")

# Load the image processor and the pretrained ResNet-50 checkpoint.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

# Preprocess the image into the tensor format the model expects.
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# Retrieve and print the highest-scoring ImageNet-1k class.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

In this code, we first open an image called image.jpg. Then, we load our model. We run inference on our model with the model(**inputs) function call. Finally, we retrieve the class with the highest confidence returned by our model.

在这段代码中,我们首先打开一个名为image.jpg的图像,然后加载模型,并通过model(**inputs)调用运行推理。最后,我们获取模型返回的置信度最高的类别。

In the code above, replace image.jpg with the name of the image on which you want to run inference.

在上述代码中,将image.jpg替换为你想要运行推理的图像的名称。
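If you also want confidence scores rather than only the top class, you can apply a softmax to the logits. The following is a minimal sketch that continues from the code above (it reuses the logits and model variables); taking the top five classes is an arbitrary choice.

如果你还想获得置信度分数而不只是最高分类别,可以对logits应用softmax。下面是对上述代码的一个简单扩展示例(沿用其中的logits和model变量);取前五个类别只是一个任意选择。

# Convert logits to probabilities and print the five highest-scoring classes.
probabilities = torch.softmax(logits, dim=-1)[0]
top5 = torch.topk(probabilities, k=5)

for score, idx in zip(top5.values.tolist(), top5.indices.tolist()):
    print(f"{model.config.id2label[idx]}: {score:.3f}")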

Conclusion and the Current Classification Landscape

结论与当前的图像分类领域

ResNet-50 is an image classification architecture introduced in 2015; its default checkpoint was trained on the ImageNet-1k dataset. You can train a model on a custom dataset using the ResNet architecture if you want to identify your own classes.

ResNet-50是2015年提出的图像分类架构,其默认检查点在ImageNet-1k数据集上训练。如果你想识别自己的类别,可以使用ResNet架构在自定义数据集上训练模型。
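As a rough illustration, here is a minimal fine-tuning sketch using the Hugging Face Trainer. It assumes your images are arranged in an imagefolder layout (my_dataset/<class_name>/<image>.jpg); the dataset path, output directory, and epoch count are placeholders, not values from the original post.

作为一个粗略的示例,下面是使用Hugging Face Trainer进行微调的简单示例。它假设你的图像按imagefolder结构存放(my_dataset/<class_name>/<image>.jpg);数据集路径、输出目录和训练轮数均为占位值,并非原文内容。

from transformers import AutoImageProcessor, ResNetForImageClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load a custom dataset from a folder of images grouped by class name.
dataset = load_dataset("imagefolder", data_dir="my_dataset")
labels = dataset["train"].features["label"].names

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")

def preprocess(examples):
    # Convert each image to the pixel tensor format the model expects.
    examples["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in examples["image"]
    ]
    return examples

dataset = dataset.map(preprocess, batched=True, remove_columns=["image"])

# Replace the 1,000-class ImageNet head with one sized for our own labels.
model = ResNetForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    num_labels=len(labels),
    ignore_mismatched_sizes=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="resnet50-custom", num_train_epochs=3),
    train_dataset=dataset["train"],
)
trainer.train()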

While ResNet is several years old, it remains a well-established image classification architecture. Since its release, many new architectures that you can fine-tune on a custom dataset have been introduced, including the Vision Transformer, FastViT, Ultralytics YOLOv8, and ResNeXt.

尽管ResNet已有数年历史,它仍然是一个成熟的图像分类架构。此后出现了许多可以在自定义数据集上微调的新架构,包括Vision Transformer、FastViT、Ultralytics YOLOv8和ResNeXt。

There are also zero-shot classification models that you can use on arbitrary classes without any fine-tuning.

还有一些零样本分类模型,无需微调即可对任意类别进行分类。

For example, you can use OpenAI CLIP to assign labels to images without fine-tuning the model. This is because CLIP has been trained on a large dataset with a wide range of descriptions.

例如,你可以使用OpenAI CLIP在不微调模型的情况下给图像分配标签。这是因为CLIP已经在包含广泛描述的大型数据集上进行了训练。
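For example, here is a minimal zero-shot classification sketch using the Transformers zero-shot image classification pipeline with a CLIP checkpoint; the candidate labels are placeholders you would replace with your own classes.

例如,下面是一个使用Transformers零样本图像分类pipeline加载CLIP检查点的简单示例;候选标签是占位值,你可以替换为自己的类别。

from transformers import pipeline

# Load a CLIP checkpoint through the zero-shot image classification pipeline.
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

# Score the image against arbitrary candidate labels, with no fine-tuning required.
results = classifier("image.jpg", candidate_labels=["forklift", "truck", "person", "warehouse shelf"])
print(results)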

Zero-shot models like CLIP can be used on their own (i.e., for classification, content moderation), or used with an auto-labeling framework like Autodistill to train a faster, fine-tuned vision model.

零样本模型(如CLIP)可以单独使用(例如用于分类、内容审核),也可以配合Autodistill等自动标注框架使用,以训练更快的微调视觉模型。