Building and Developing Websites and Mobile Apps in the AI Era, Part 28: Object Detection with a YOLO Model
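· Summary
Besides the object detection model built into Microsoft ML.NET (see Building and Developing Websites and Mobile Apps in the AI Era, Part 27: ML.NET and Object Detection), an application can provide real-time object detection with algorithms such as SSD (Single Shot MultiBox Detector) or YOLO (You Only Look Once).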
Figure: YOLO meme
· Understanding the SSD and YOLO algorithms
SSD and YOLO are both widely used object detection algorithms. Table 1 compares their main characteristics:
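| Feature | SSD | YOLO |
|---|---|---|
| Detection speed | Fast (faster than R-CNN, slower than YOLO) | Faster than SSD |
| Accuracy | More accurate on small objects | More accurate on large objects |
| Complexity | Medium | |
| Supported image sizes | Supported | Support improves with newer versions |
Table 1: Comparison of the main characteristics of the SSD and YOLO algorithms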
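Where the SSD and YOLO object detection algorithms fit
· YOLO is a good fit for:
✓ Real-time applications such as surveillance cameras, self-driving vehicles, and robots
✓ Scenarios where detection speed is critical
· SSD is a good fit for:
✓ Scenarios that need high accuracy on small objects, such as document analysis or detecting tiny objects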
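· Preparing the pre-trained YOLO model
First, open the following link (model download URL: https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/end-to-end-apps/ObjectDetection-Onnx/OnnxObjectDetection/ML/OnnxModels) and download the TinyYolo2_model.onnx file into a folder named Models in the project. Then, in the [Properties] window, set the [Copy to Output Directory] property of TinyYolo2_model.onnx to: Copy if newer.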
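· Detecting objects with the pre-trained YOLO model
First, define the BoundingBox class, which describes the position of a detected object:
public class BoundingBox
{
    public float X { get; set; }      // X coordinate of the top-left corner
    public float Y { get; set; }      // Y coordinate of the top-left corner
    public float Height { get; set; } // Height
    public float Width { get; set; }  // Width
}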
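Define the ImageData class, which describes an image to run detection on:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML.Data;

public class ImageData
{
    [LoadColumn(0)]
    public string ImagePath; // Path of the image to run detection on
    [LoadColumn(1)]
    public string Label;     // Label of the image (its file name)

    // Builds one ImageData per file in the folder, skipping .md files
    public static IEnumerable<ImageData> ReadFromFile(string imageFolder)
    {
        return Directory
            .EnumerateFiles(imageFolder)
            .Where(filePath => Path.GetExtension(filePath) != ".md")
            .Select(filePath => new ImageData { ImagePath = filePath, Label = Path.GetFileName(filePath) });
    }
}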
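ReadFromFile above treats every file that is not a .md file as an image. The following is a minimal sketch of a stricter alternative, assuming that only .jpg, .jpeg, .png, and .bmp files should be scored. The ImageFileFilter class and its extension list are illustrative assumptions, not part of the original sample:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ImageFileFilter
{
    // Hypothetical allow-list; adjust it to the formats your images actually use.
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png", ".bmp" };

    // True when the path ends with one of the allowed image extensions (case-insensitive).
    public static bool IsImage(string path)
    {
        return Extensions.Contains(Path.GetExtension(path));
    }

    // Lazily enumerates only the image files directly inside the folder.
    public static IEnumerable<string> EnumerateImages(string folder)
    {
        return Directory.EnumerateFiles(folder).Where(IsImage);
    }
}
```

If this approach suits your project, the Where clause in ReadFromFile could call ImageFileFilter.IsImage instead of comparing against ".md".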
Define the ImageSettings struct, which describes the size of the image passed to the model:
public struct ImageSettings
{
    public const int imageHeight = 416; // Image height
    public const int imageWidth = 416;  // Image width
}
Define the ModelSettings struct, which holds the names of the YOLO model's input and output columns:
public struct ModelSettings
{
    public const string ModelInput = "image"; // Input column name
    public const string ModelOutput = "grid"; // Output column name
}
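The 416 × 416 input size and the "grid" output are linked by simple arithmetic: the model reduces the image by a factor of 32 to a 13 × 13 grid, and each cell carries 5 boxes × (5 box values + 20 class scores) = 125 channels. The following is a minimal sketch of that arithmetic, using the same constant values that the YoloOutputParser class in this article defines. The GridLayout class itself is an illustrative assumption:

```csharp
public static class GridLayout
{
    public const int ImageSize = 416;           // ImageSettings.imageWidth / imageHeight
    public const int CellSize = 32;             // CELL_WIDTH / CELL_HEIGHT in the parser
    public const int BoxesPerCell = 5;          // BOXES_PER_CELL
    public const int BoxInfoFeatureCount = 5;   // x, y, width, height, confidence
    public const int ClassCount = 20;           // CLASS_COUNT

    // Number of cells along one side of the grid.
    public static int GridSize => ImageSize / CellSize;

    // Number of values the model emits for each grid cell.
    public static int ChannelCount => BoxesPerCell * (BoxInfoFeatureCount + ClassCount);

    // Total length of the flattened float[] output.
    public static int OutputLength => ChannelCount * GridSize * GridSize;

    // Index of (x, y, channel) in the flattened output, channel-major like GetOffset.
    public static int Offset(int x, int y, int channel)
    {
        return channel * GridSize * GridSize + y * GridSize + x;
    }
}
```

With these values, GridSize is 13, ChannelCount is 125, and OutputLength is 21125, so the last valid index, Offset(12, 12, 124), is 21124.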
Define the OnnxModel class, which wraps loading and scoring the ONNX model:
// Requires the Microsoft.ML, Microsoft.ML.ImageAnalytics, and Microsoft.ML.OnnxTransformer NuGet packages
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML;

public class OnnxModel
{
    private readonly string imagesFolder;
    private readonly string modelLocation;
    private readonly MLContext mlContext;

    // Constructor
    public OnnxModel(string imagesFolder, string modelLocation, MLContext mlContext)
    {
        this.imagesFolder = imagesFolder;
        this.modelLocation = modelLocation;
        this.mlContext = mlContext;
    }

    // Loads the YOLO model and builds the scoring pipeline
    private ITransformer LoadModel(string modelLocation)
    {
        Trace.WriteLine("Read model");
        Trace.WriteLine($"Model location: {modelLocation}");
        Trace.WriteLine($"Default parameters: image size=({ImageSettings.imageWidth},{ImageSettings.imageHeight})");
        // An empty data view is enough to supply the input schema to Fit
        var data = mlContext.Data.LoadFromEnumerable(new List<ImageData>());
        // Pipeline: load the image, resize it to 416 x 416, extract pixels, run the ONNX model
        var pipeline = mlContext.Transforms.LoadImages(outputColumnName: "image",
                imageFolder: "",
                inputColumnName: nameof(ImageData.ImagePath))
            .Append(mlContext.Transforms.ResizeImages(outputColumnName: "image",
                imageWidth: ImageSettings.imageWidth,
                imageHeight: ImageSettings.imageHeight,
                inputColumnName: "image"))
            .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "image"))
            .Append(mlContext.Transforms.ApplyOnnxModel(modelFile: modelLocation,
                outputColumnNames: new[] { ModelSettings.ModelOutput },
                inputColumnNames: new[] { ModelSettings.ModelInput }));
        var model = pipeline.Fit(data);
        // Return the object detection model
        return model;
    }

    public IEnumerable<float[]> Score(IDataView data)
    {
        // Load the object detection model
        var model = LoadModel(modelLocation);
        // Call PredictDataUsingModel to run detection and return the results
        return PredictDataUsingModel(data, model);
    }

    private IEnumerable<float[]> PredictDataUsingModel(IDataView testData, ITransformer model)
    {
        Trace.WriteLine($"Images location: {imagesFolder}");
        Trace.WriteLine("");
        Trace.WriteLine("=====Identify the objects in the images=====");
        Trace.WriteLine("");
        // Run object detection and get the scored data
        IDataView scoredData = model.Transform(testData);
        // Read the raw model output (one float[] per image) from the output column
        IEnumerable<float[]> probabilities = scoredData.GetColumn<float[]>(ModelSettings.ModelOutput);
        // Return the raw model output
        return probabilities;
    }
}
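Define the YoloBoundingBox class, which describes a detected object:
using System.Drawing;

public class YoloBoundingBox
{
    // Position and size of the detected object
    public BoundingBox Dimensions { get; set; }
    // Class label of the detected object
    public string Label { get; set; }
    // Confidence score of the detection
    public float Confidence { get; set; }
    // The same position and size expressed as a RectangleF
    public RectangleF Rect
    {
        get { return new RectangleF(Dimensions.X, Dimensions.Y, Dimensions.Width, Dimensions.Height); }
    }
    // Color used to draw the box
    public Color BoxColor { get; set; }
}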
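Define the YoloOutputParser class, which interprets the output that the YOLO model produces for an image:
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

public class YoloOutputParser
{
    public const int ROW_COUNT = 13;             // Grid rows
    public const int COL_COUNT = 13;             // Grid columns
    public const int CHANNEL_COUNT = 125;        // Values per grid cell: 5 boxes x (5 + 20)
    public const int BOXES_PER_CELL = 5;         // Bounding boxes predicted per cell
    public const int BOX_INFO_FEATURE_COUNT = 5; // x, y, width, height, confidence
    public const int CLASS_COUNT = 20;           // Number of object classes
    public const float CELL_WIDTH = 32;          // Cell width in pixels (416 / 13)
    public const float CELL_HEIGHT = 32;         // Cell height in pixels (416 / 13)
    // Number of values in one channel of the flattened output
    private int channelStride = ROW_COUNT * COL_COUNT;
    // Anchor box (width, height) pairs, one pair per box in a cell
    private float[] anchors = new float[]
    {
        1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F
    };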
    // Object classes that the model can detect
    private string[] labels = new string[]
    {
        "aeroplane", "bicycle", "bird", "boat", "bottle",
        "bus", "car", "cat", "chair", "cow",
        "diningtable", "dog", "horse", "motorbike", "person",
        "pottedplant", "sheep", "sofa", "train", "tvmonitor"
    };
    // Colors used to draw each object class
    private static Color[] classColors = new Color[]
    {
        Color.Khaki, Color.Fuchsia, Color.Silver, Color.RoyalBlue, Color.Green,
        Color.DarkOrange, Color.Purple, Color.Gold, Color.Red, Color.Aquamarine,
        Color.Lime, Color.AliceBlue, Color.Sienna, Color.Orchid, Color.Tan,
        Color.LightPink, Color.Yellow, Color.HotPink, Color.OliveDrab, Color.SandyBrown,
        Color.DarkTurquoise
    };
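    // Maps a raw value to the range (0, 1)
    private float Sigmoid(float value)
    {
        var k = (float)Math.Exp(value);
        return k / (1.0f + k);
    }
    // Computes the probability distribution over the values in the array
    private float[] Softmax(float[] values)
    {
        // Subtract the maximum before exponentiating for numerical stability
        var maxVal = values.Max();
        var exp = values.Select(v => Math.Exp(v - maxVal));
        var sumExp = exp.Sum();
        return exp.Select(v => (float)(v / sumExp)).ToArray();
    }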
    // Index of (x, y, channel) in the flattened, channel-major model output
    private int GetOffset(int x, int y, int channel)
    {
        return (channel * this.channelStride) + (y * COL_COUNT) + x;
    }
    // Reads the raw x, y, width, and height values of one box
    private BoundingBox ExtractBoundingBoxDimensions(float[] modelOutput, int x, int y, int channel)
    {
        return new BoundingBox
        {
            X = modelOutput[GetOffset(x, y, channel)],
            Y = modelOutput[GetOffset(x, y, channel + 1)],
            Width = modelOutput[GetOffset(x, y, channel + 2)],
            Height = modelOutput[GetOffset(x, y, channel + 3)]
        };
    }
    // Gets the confidence score of one box
    private float GetConfidence(float[] modelOutput, int x, int y, int channel)
    {
        return Sigmoid(modelOutput[GetOffset(x, y, channel + 4)]);
    }
    // Maps the cell-relative box values to coordinates in the 416 x 416 image
    private BoundingBox MapBoundingBoxToCell(int x, int y, int box, BoundingBox boxDimensions)
    {
        return new BoundingBox
        {
            X = ((float)x + Sigmoid(boxDimensions.X)) * CELL_WIDTH,
            Y = ((float)y + Sigmoid(boxDimensions.Y)) * CELL_HEIGHT,
            Width = (float)Math.Exp(boxDimensions.Width) * CELL_WIDTH * anchors[box * 2],
            Height = (float)Math.Exp(boxDimensions.Height) * CELL_HEIGHT * anchors[box * 2 + 1],
        };
    }
    // Gets the class probabilities of one box
    public float[] ExtractClasses(float[] modelOutput, int x, int y, int channel)
    {
        float[] predictedClasses = new float[CLASS_COUNT];
        // The class scores follow the five box values
        int predictedClassOffset = channel + BOX_INFO_FEATURE_COUNT;
        for (int predictedClass = 0; predictedClass < CLASS_COUNT; predictedClass++)
        {
            predictedClasses[predictedClass] =
                modelOutput[GetOffset(x, y, predictedClass + predictedClassOffset)];
        }
        return Softmax(predictedClasses);
    }
    // Gets the most probable class and its score
    private ValueTuple<int, float> GetTopResult(float[] predictedClasses)
    {
        return predictedClasses
            .Select((predictedClass, index) => (Index: index, Value: predictedClass))
            .OrderByDescending(result => result.Value)
            .First();
    }
    // Computes the intersection over union (IoU) of two boxes
    private float IntersectionOverUnion(RectangleF boundingBoxA, RectangleF boundingBoxB)
    {
        var areaA = boundingBoxA.Width * boundingBoxA.Height;
        if (areaA <= 0)
            return 0;
        var areaB = boundingBoxB.Width * boundingBoxB.Height;
        if (areaB <= 0)
            return 0;
        var minX = Math.Max(boundingBoxA.Left, boundingBoxB.Left);
        var minY = Math.Max(boundingBoxA.Top, boundingBoxB.Top);
        var maxX = Math.Min(boundingBoxA.Right, boundingBoxB.Right);
        var maxY = Math.Min(boundingBoxA.Bottom, boundingBoxB.Bottom);
        var intersectionArea = Math.Max(maxY - minY, 0) * Math.Max(maxX - minX, 0);
        // Union = area A + area B - intersection
        return intersectionArea / (areaA + areaB - intersectionArea);
    }
    // Parses the raw model output into candidate bounding boxes
    public IList<YoloBoundingBox> ParseOutputs(float[] yoloModelOutputs, float threshold = .3F)
    {
        var boxes = new List<YoloBoundingBox>();
        for (int row = 0; row < ROW_COUNT; row++)
        {
            for (int column = 0; column < COL_COUNT; column++)
            {
                for (int box = 0; box < BOXES_PER_CELL; box++)
                {
                    // First channel of this box within the cell
                    var channel = (box * (CLASS_COUNT + BOX_INFO_FEATURE_COUNT));
                    BoundingBox boundingBoxDimensions = ExtractBoundingBoxDimensions(yoloModelOutputs, row, column, channel);
                    float confidence = GetConfidence(yoloModelOutputs, row, column, channel);
                    BoundingBox mappedBoundingBox = MapBoundingBoxToCell(row, column, box, boundingBoxDimensions);
                    // Skip boxes whose confidence is below the threshold
                    if (confidence < threshold)
                        continue;
                    float[] predictedClasses = ExtractClasses(yoloModelOutputs, row, column, channel);
                    var (topResultIndex, topResultScore) = GetTopResult(predictedClasses);
                    // Final score = class probability x box confidence
                    var topScore = topResultScore * confidence;
                    if (topScore < threshold)
                        continue;
                    boxes.Add(new YoloBoundingBox()
                    {
                        // Convert the box center to the top-left corner
                        Dimensions = new BoundingBox
                        {
                            X = (mappedBoundingBox.X - mappedBoundingBox.Width / 2),
                            Y = (mappedBoundingBox.Y - mappedBoundingBox.Height / 2),
                            Width = mappedBoundingBox.Width,
                            Height = mappedBoundingBox.Height,
                        },
                        Confidence = topScore,
                        Label = labels[topResultIndex],
                        BoxColor = classColors[topResultIndex]
                    });
                }
            }
        }
        return boxes;
    }
    // Removes overlapping boxes, keeping at most limit boxes with the highest confidence
    public IList<YoloBoundingBox> FilterBoundingBoxes(IList<YoloBoundingBox> boxes, int limit, float threshold)
    {
        var activeCount = boxes.Count;
        var isActiveBoxes = new bool[boxes.Count];
        for (int i = 0; i < isActiveBoxes.Length; i++)
            isActiveBoxes[i] = true;
        // Sort the boxes from highest to lowest confidence
        var sortedBoxes = boxes.Select((b, i) => new { Box = b, Index = i })
            .OrderByDescending(b => b.Box.Confidence).ToList();
        var results = new List<YoloBoundingBox>();
        for (int i = 0; i < boxes.Count; i++)
        {
            if (isActiveBoxes[i])
            {
                var boxA = sortedBoxes[i].Box;
                results.Add(boxA);
                if (results.Count >= limit)
                    break;
                // Deactivate every remaining box that overlaps boxA too much
                for (var j = i + 1; j < boxes.Count; j++)
                {
                    if (isActiveBoxes[j])
                    {
                        var boxB = sortedBoxes[j].Box;
                        if (IntersectionOverUnion(boxA.Rect, boxB.Rect) > threshold)
                        {
                            isActiveBoxes[j] = false;
                            activeCount--;
                            if (activeCount <= 0)
                                break;
                        }
                    }
                }
                if (activeCount <= 0)
                    break;
            }
        }
        return results;
    }
}
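FilterBoundingBoxes is a form of non-maximum suppression: boxes are visited from highest to lowest confidence, and any later box whose intersection over union (IoU) with a kept box exceeds the threshold is discarded. The following is a minimal sketch of the same idea on plain values, without System.Drawing. The Box struct and the NmsSketch class are illustrative assumptions, not the article's types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Box
{
    public float X, Y, Width, Height, Confidence; // X, Y = top-left corner

    public Box(float x, float y, float width, float height, float confidence)
    {
        X = x; Y = y; Width = width; Height = height; Confidence = confidence;
    }
}

public static class NmsSketch
{
    // Intersection area divided by union area; 0 when either box is empty.
    public static float IoU(Box a, Box b)
    {
        float areaA = a.Width * a.Height;
        float areaB = b.Width * b.Height;
        if (areaA <= 0 || areaB <= 0) return 0;

        float minX = Math.Max(a.X, b.X);
        float minY = Math.Max(a.Y, b.Y);
        float maxX = Math.Min(a.X + a.Width, b.X + b.Width);
        float maxY = Math.Min(a.Y + a.Height, b.Y + b.Height);
        float intersection = Math.Max(maxX - minX, 0) * Math.Max(maxY - minY, 0);

        return intersection / (areaA + areaB - intersection);
    }

    // Keeps at most `limit` boxes, skipping any box that overlaps an already kept one
    // by more than `threshold`.
    public static List<Box> Filter(IEnumerable<Box> boxes, int limit, float threshold)
    {
        var kept = new List<Box>();
        foreach (var candidate in boxes.OrderByDescending(b => b.Confidence))
        {
            if (kept.Count >= limit) break;
            if (kept.All(k => IoU(k, candidate) <= threshold))
                kept.Add(candidate);
        }
        return kept;
    }
}
```

As a worked case, two 10 × 10 boxes offset by 5 pixels horizontally share an area of 50 out of a union of 150, so their IoU is 1/3. With a threshold of 0.3 the lower-confidence box is dropped; with the 0.5 threshold used in this article both are kept.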
· Implementing object detection with the YOLO model
// Selects the image to run detection on
private void btnSelect_Click(object sender, EventArgs e)
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        picOriginal.ImageLocation = openFileDialog1.FileName;
    }
}
// Runs object detection
private void btnDetect_Click(object sender, EventArgs e)
{
    var modelFilePath = "Models/TinyYolo2_model.onnx"; // Location of the YOLO model
    MLContext mlContext = new MLContext(); // Create the MLContext object
    try
    {
        // Load every image in the images folder for object detection
        IEnumerable<ImageData> images = ImageData.ReadFromFile("images");
        IDataView imageDataView = mlContext.Data.LoadFromEnumerable(images);
        // Create the OnnxModel object
        var modelScorer = new OnnxModel("images", modelFilePath, mlContext);
        // Score the images and get the raw detection results
        IEnumerable<float[]> probabilities = modelScorer.Score(imageDataView);
        // Parse the detection results
        YoloOutputParser parser = new YoloOutputParser();
        // Get the position and size of each detected object
        var boundingBoxes = probabilities
            .Select(probability => parser.ParseOutputs(probability))
            .Select(boxes => parser.FilterBoundingBoxes(boxes, 5, .5F));
        // Draw the objects detected in each image
        for (var i = 0; i < images.Count(); i++)
        {
            string imageFileName = images.ElementAt(i).Label;
            IList<YoloBoundingBox> detectedObjects = boundingBoxes.ElementAt(i);
            DrawBoundingBox("images", imageFileName, detectedObjects);
            LogDetectedObjects(imageFileName, detectedObjects);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
}
// Draws a rectangle around each detected object
// Requires: using System.Drawing; using System.Drawing.Drawing2D; using System.IO;
void DrawBoundingBox(string inputImageLocation, string imageName, IList<YoloBoundingBox> filteredBoundingBoxes)
{
    Image image = Image.FromFile(Path.Combine(inputImageLocation, imageName));
    var originalImageHeight = image.Height;
    var originalImageWidth = image.Width;
    foreach (var box in filteredBoundingBoxes)
    {
        // Get Bounding Box Dimensions
        var x = (uint)Math.Max(box.Dimensions.X, 0);
        var y = (uint)Math.Max(box.Dimensions.Y, 0);
        var width = (uint)Math.Min(originalImageWidth - x, box.Dimensions.Width);
        var height = (uint)Math.Min(originalImageHeight - y, box.Dimensions.Height);
        // Resize To Image
        x = (uint)originalImageWidth * x / ImageSettings.imageWidth;
        y = (uint)originalImageHeight * y / ImageSettings.imageHeight;
        width = (uint)originalImageWidth * width / ImageSettings.imageWidth;
        height = (uint)originalImageHeight * height / ImageSettings.imageHeight;
        // Bounding Box Text
        string text = $"{box.Label} ({(box.Confidence * 100).ToString("0")}%)";
        using (Graphics thumbnailGraphic = Graphics.FromImage(image))
        {
            thumbnailGraphic.CompositingQuality = CompositingQuality.HighQuality;
            thumbnailGraphic.SmoothingMode = SmoothingMode.HighQuality;
            thumbnailGraphic.InterpolationMode = InterpolationMode.HighQualityBicubic;
            // Define Text Options
            Font drawFont = new Font("Arial", 12, FontStyle.Bold);
            SizeF size = thumbnailGraphic.MeasureString(text, drawFont);
            SolidBrush fontBrush = new SolidBrush(Color.Black);
            Point atPoint = new Point((int)x, (int)y - (int)size.Height - 1);
            // Define BoundingBox options
            Pen pen = new Pen(box.BoxColor, 3.2f);
            SolidBrush colorBrush = new SolidBrush(box.BoxColor);
            // Draw text on image
            thumbnailGraphic.FillRectangle(colorBrush, (int)x, (int)(y - size.Height - 1), (int)size.Width, (int)size.Height);
            thumbnailGraphic.DrawString(text, drawFont, fontBrush, atPoint);
            // Draw bounding box on image
            thumbnailGraphic.DrawRectangle(pen, x, y, width, height);
        }
    }
    // Save the annotated image and show it in the picture box
    image.Save(imageName);
    picDetected.ImageLocation = imageName;
}
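The parser reports boxes in the 416 × 416 coordinate space of the resized image, so DrawBoundingBox multiplies each value by the original image size and divides by 416 before drawing. The following is a minimal sketch of that scaling step in isolation. The BoxScaler class and its method name are illustrative assumptions, and it uses floating-point arithmetic rather than the unsigned integer arithmetic of the method above, so results can differ by a pixel:

```csharp
public static class BoxScaler
{
    public const int ModelSize = 416; // ImageSettings.imageWidth / imageHeight

    // Maps a coordinate or length measured in model pixels to original-image pixels.
    public static int Scale(float modelValue, int originalSize)
    {
        return (int)(modelValue * originalSize / ModelSize);
    }
}
```

For example, a box edge at x = 208 in model space lands at x = 416 in an image that is 832 pixels wide, because 208 × 832 / 416 = 416.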
// Writes the detection results to the trace output
void LogDetectedObjects(string imageName, IList<YoloBoundingBox> boundingBoxes)
{
    Trace.WriteLine($"The objects in the image {imageName} are detected as below....");
    foreach (var box in boundingBoxes)
    {
        Trace.WriteLine($"{box.Label} and its Confidence score: {box.Confidence}");
    }
    Trace.WriteLine("");
}
Running the code above displays the detection results for each test image, as shown in Figure 1:
Sample download:

