Computer Vision: Algorithms and Applications: Reading Notes

娄德运

2023-12-01

Computer Vision: Algorithms and Applications (Chinese title: 计算机视觉算法与应用)

Second-year undergraduate in the UK, Artificial Intelligence major, Computer Vision course

Look at Chapter 14 section 1 and section 4 of Computer Vision: Algorithms and Applications for more information on “classical object detection” and image classification (1-3 hours).
Week 6, object detection: reading recommended by the lecturer

https://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf

Study notes

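It happens to be a Tuesday with no classes during the day, I don't feel like gaming, and since I'm idle anyway I might as well write up some study notes.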

14.1 Object detection

14.1.1 Face detection

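Before face recognition can be applied to a general image, the locations and sizes of any faces must first be found. In principle, we could apply a face recognition algorithm at every pixel and scale, but such a process would be too slow in practice.
According to the taxonomy of Yang, Kriegman, and Ahuja (2002), face detection techniques can be classified as feature-based, template-based, or appearance-based.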

Feature-based techniques attempt to find the locations of distinctive image features such as the eyes, nose, and mouth, and then verify whether these features are in a plausible geometrical arrangement.

Template-based approaches, such as active appearance models (AAMs) (Section 14.2.2), can deal with a wide range of pose and expression variability. Typically, they require good initialization near a real face and are therefore not suitable as fast face detectors.

Appearance-based approaches scan over small overlapping rectangular patches of the image searching for likely face candidates, which can then be refined using a cascade of more expensive but selective detection algorithms (Sung and Poggio 1998; Rowley, Baluja, and Kanade 1998a; Romdhani, Torr, Schölkopf et al. 2001; Fleuret and Geman 2001; Viola and Jones 2004).
In order to deal with scale variation, the image is usually converted into a sub-octave pyramid and a separate scan is performed on each level. Most appearance-based approaches today rely heavily on training classifiers using sets of labeled face and non-face patches.
(Two things I'm unsure about here: what exactly the overlapping rectangular patches are, and what a sub-octave pyramid is.)
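
A minimal sketch of the scanning procedure described above, written in plain Python so that it has no dependencies. Everything named here is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a trained face/non-face classifier, the nearest-neighbour `downsample` replaces proper pyramid filtering, and the window size, stride, and threshold are arbitrary. "Sub-octave" is taken to mean that the scale shrinks by less than a factor of two per level, here by 2^(-1/2).

```python
def downsample(image, factor):
    """Nearest-neighbour resize of a 2D list by `factor` (factor < 1 shrinks)."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, int(h * factor)), max(1, int(w * factor))
    return [[image[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)] for r in range(new_h)]


def build_pyramid(image, levels_per_octave=2, min_size=4):
    """Yield (scale, level) pairs; more than one level per octave = sub-octave."""
    step = 2 ** (-1.0 / levels_per_octave)
    scale = 1.0
    while True:
        level = image if scale == 1.0 else downsample(image, scale)
        if len(level) < min_size or len(level[0]) < min_size:
            break
        yield scale, level
        scale *= step


def sliding_windows(image, size, stride):
    """Yield (top, left, patch) for overlapping square patches."""
    h, w = len(image), len(image[0])
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield top, left, [row[left:left + size] for row in image[top:top + size]]


def toy_score(patch):
    """Hypothetical classifier stand-in: mean brightness of the patch."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


def detect(image, size=4, stride=2, threshold=0.9):
    """Scan every pyramid level and report hits in original-image coordinates."""
    detections = []
    for scale, level in build_pyramid(image, min_size=size):
        for top, left, patch in sliding_windows(level, size, stride):
            if toy_score(patch) >= threshold:
                detections.append((round(top / scale), round(left / scale),
                                   round(size / scale)))
    return detections


# A 16x16 dark image with a bright 8x8 square standing in for a "face".
image = [[1.0 if 4 <= r < 12 and 4 <= c < 12 else 0.0 for c in range(16)]
         for r in range(16)]
print(detect(image))
```

Because the window size is fixed, large faces are only found on the coarser pyramid levels, which is why a separate scan is run on each level. In a real detector the cheap scan would only propose candidates, to be refined by the more expensive stages of the cascade.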

Clustering and PCA ~~? what is PCA~~
Neural networks: directly output the likelihood of a face at the center of every overlapping patch in a multi-resolution pyramid. ~~directly outputs the likelihood of a face at the center of ??~~
Support vector machines: SVMs have been used by other researchers for both face detection and face recognition (Heisele, Ho, Wu et al. 2003).
Boosting: ...
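
Since the notes above ask what PCA is: principal component analysis finds the directions along which a set of samples varies the most. The sketch below is only an illustration under stated assumptions. It recovers the first principal component of a handful of 2D points using power iteration on the covariance matrix; the function name and the toy data are made up here, and this is not the specific clustering-plus-PCA pipeline the book refers to.

```python
import math


def first_principal_component(samples, iterations=100):
    """Return (mean, direction) where direction is the unit vector of largest variance."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    # Assumes the starting vector is not orthogonal to the answer.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


# Toy data lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction = first_principal_component(points)
print(mean, direction)
```

For face patches each sample would be a flattened patch rather than a 2D point, and projecting onto the leading few components gives a compact description of the patch.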

14.1.2 Pedestrian detection

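Pedestrian detectors are an example of more general object detection ... they can be used in automotive safety applications, for example to detect pedestrians and other cars from a moving vehicle.
An example of a well-known pedestrian detector is the algorithm developed by Dalal and Triggs (2005), who use a set of overlapping histogram of oriented gradients (HOG) descriptors fed into a support vector machine.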
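
To get a feel for what a histogram of oriented gradients is, here is a minimal sketch for a single cell. It is an assumption-laden simplification, not the Dalal and Triggs implementation: it uses central differences, nine unsigned orientation bins, and hard binning, and it leaves out the overlapping blocks, block normalization, and the SVM that the full detector relies on. The function name `cell_hog` is made up for this example.

```python
import math


def cell_hog(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one cell (2D list of grey values)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# An 8x8 cell containing a vertical edge: dark on the left, bright on the right.
cell = [[0 if c < 4 else 1 for c in range(8)] for r in range(8)]
print(cell_hog(cell))
```

Each pixel votes for its gradient direction with a weight equal to the gradient magnitude, so a cell containing a strong vertical edge puts nearly all of its weight into the bin around 0 degrees.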
Detectors based on local receptive fields and SVMs perform best: the book cites a comparison of a number of pedestrian detectors which concludes that those based on local receptive fields and SVMs perform the best, with a boosting-based approach coming close.

——————————————————————————————————————

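It feels like a lot of algorithms can be applied to face detection, but they are a bit hard to follow, mainly because I don't really understand the technical terms (sob).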

14.4 Category recognition

14.4.1 Bag of words

One of the simplest category recognition algorithms: it simply computes the distribution (histogram) of visual words found in the query image and compares this distribution to those found in the training images.
The biggest difference from instance recognition is the absence of a geometric verification stage, since individual instances of generic visual categories, such as those shown in Figure 14.35, have relatively little spatial coherence to their features.
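
The histogram comparison described above can be sketched as follows. This is a toy illustration under several assumptions: the visual vocabulary is given by hand (in practice it would usually come from clustering feature descriptors), descriptors are 2D tuples rather than real feature vectors, and histogram intersection is just one possible similarity measure. All function names are made up for this example.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(descriptor, vocabulary[k])))


def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word counts; feature positions are ignored."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [x / total for x in hist] if total else hist


def histogram_intersection(p, q):
    """Similarity in [0, 1] for two normalised histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def classify(query_descriptors, training_set, vocabulary):
    """Return the label of the training image whose histogram is most similar."""
    q = bow_histogram(query_descriptors, vocabulary)
    best_label, _ = max(
        training_set,
        key=lambda item: histogram_intersection(q, bow_histogram(item[1], vocabulary)))
    return best_label


vocabulary = [(0.0, 0.0), (10.0, 10.0)]
training_set = [
    ("cat", [(0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]),
    ("dog", [(9.0, 9.0), (10.0, 11.0)]),
]
print(classify([(1.0, 1.0), (0.0, 0.0), (9.0, 10.0)], training_set, vocabulary))
```

Nothing in the code looks at where a feature was found in the image, which is exactly the missing geometric verification stage mentioned above.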
~~Why can't I upload the figure? Anyway, it contains thirty assorted images from everyday life~~

14.4.2 Part-based models

Using pictorial structures to locate and track a person
Recognizing motorcycles from their parts; the book provides a figure of the motorcycle parts. The top figure shows the mean relative locations for each part along with their position covariances and likelihood of occurrence.

14.4.3 Recognition with segmentation

The most challenging version of generic object recognition is to simultaneously perform recognition with accurate boundary segmentation.

For more complex (flexible) object models, such as those for humans (Figure 14.1f), a different approach is to pre-segment the image into larger or smaller pieces (Chapter 5) and then match such pieces to portions of the model (Mori, Ren, Efros et al. 2004; Mori 2005; He, Zemel, and Ray 2006; Gu, Lim, Arbelaez et al. 2009).
A more holistic approach to recognition and segmentation is to formulate the problem as one of labeling every pixel in an image with its class membership, and to solve this problem using energy minimization or Bayesian inference techniques.
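
To make "labeling every pixel by energy minimization" a little more concrete, here is a minimal sketch. It assumes an energy made of a per-pixel cost for each label plus a Potts penalty whenever two 4-connected neighbours disagree, and it minimizes that energy with iterated conditional modes (ICM), a simple greedy method chosen here only because it is short. It is not the specific solver used in the work the book cites, and the function name and toy costs are made up.

```python
def icm_labeling(unary, smoothness=1.0, sweeps=5):
    """Greedy pixel labeling. unary[r][c][k] is the cost of label k at pixel (r, c)."""
    h, w, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the label that is cheapest for each pixel on its own.
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k])
               for c in range(w)] for r in range(h)]
    for _ in range(sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < h and 0 <= cc < w]

                def local_energy(k):
                    # Data cost plus Potts penalty for each disagreeing neighbour.
                    return unary[r][c][k] + smoothness * sum(1 for n in neighbours if n != k)

                best = min(range(n_labels), key=local_energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


# 3x3 image, two classes: every pixel prefers class 0 except a weakly "class 1" centre.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]
print(icm_labeling(unary, smoothness=1.0))
print(icm_labeling(unary, smoothness=0.0))
```

With the smoothness term switched on, the weak evidence at the centre pixel is overruled by its neighbours; with it switched off, each pixel is labeled independently. ICM only finds a local minimum of the energy.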

14.4.4 Application: Intelligent photo editing

A different application of image recognition and segmentation is to infer 3D structure from a single photo by recognizing certain scene structures.

Most of these techniques rely either on a set of labeled training images, or the even more recent explosion in images available on the Internet.

Face detection and localization can also be used in a variety of photo editing applications.

——————————————————————————————————————
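Four hours have gone by and I still haven't really understood it ... working through the original English edition really is hard going (sob). I'll take it slowly.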