Learning Deep Architectures for AI

谭云瀚
2023-12-01

1. Introduction

The pixels of an image are lower-level, while the content of the image is highly abstract. For a machine to recognize images the way humans do, it must abstract from the lower level to higher levels, layer by layer.

1.1: Deep learning realizes the idea above and improves accuracy.
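To make the layer-by-layer abstraction concrete, here is a minimal sketch in plain NumPy of a deep feed-forward stack: each layer re-represents the output of the previous one, so raw pixel values are gradually transformed into a small, more abstract feature vector. The layer sizes, random weights, and the tanh nonlinearity are illustrative assumptions, not anything specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One layer: an affine map followed by a nonlinearity (tanh here)."""
    return np.tanh(x @ w + b)

# Illustrative sizes: 784 raw pixel values -> 256 -> 64 -> 10 increasingly abstract features.
sizes = [784, 256, 64, 10]
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.random(784)            # stand-in for a flattened image (pixel level)
h = x
for w, b in params:            # each layer re-represents the previous layer's output
    h = layer(h, w, b)
print(h.shape)                 # (10,) -- the most abstract level of the stack
```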

1.2-1.3:

• Ability to learn complex, highly-varying functions, i.e., with a number of variations much greater than the number of training examples.
• Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.
• Ability to learn from a very large set of examples: computation time for training should scale well with the number of examples, i.e. close to linearly.
• Ability to learn from mostly unlabeled data, i.e. to work in the semi-supervised setting, where not all the examples come with complete and correct semantic labels.
• Ability to exploit the synergies present across a large number of tasks, i.e. multi-task learning. These synergies exist because all the AI tasks provide different views on the same underlying reality.
• Strong unsupervised learning (i.e. capturing most of the statistical structure in the observed data), which seems essential in the limit of a large number of tasks and when future tasks are not known ahead of time.

 

2

The main point of this section is that some functions cannot be efficiently represented (in terms of number of tunable elements) by architectures that are too shallow.

We say that the expression of a function is compact when it has few computational elements, i.e., few degrees of freedom that need to be tuned by learning.

Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth-(k − 1) architecture.

The basic conclusion that these results suggest is that when a function can be compactly represented by a deep architecture, it might need a very large architecture to be represented by an insufficiently deep one.

There are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k − 1 (Håstad, 1986).
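A standard illustration of this depth/size trade-off is the n-bit parity function: a chain of 2-input XOR gates computes it with n − 1 gates, while a depth-2 sum-of-products (DNF) representation needs one AND term for every odd-weight input pattern, i.e. 2^(n−1) terms. The sketch below (hypothetical helper names) simply counts both sizes.

```python
import itertools

def parity_deep(bits):
    """Deep representation: a chain of 2-input XOR gates, len(bits) - 1 gates in total."""
    acc = bits[0]
    for b in bits[1:]:
        acc ^= b
    return acc

def parity_dnf_terms(n):
    """Shallow (depth-2, sum-of-products) representation: one AND term per
    odd-weight input pattern, i.e. 2**(n - 1) terms."""
    return [t for t in itertools.product((0, 1), repeat=n) if sum(t) % 2 == 1]

n = 10
print(parity_deep((1, 0, 1, 1)))              # 1 (odd number of ones)
print(len(parity_dnf_terms(n)), 2**(n - 1))   # 512 512 -- exponential in n
print(n - 1)                                  # 9 gates suffice for the deep circuit
```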

Depth of architecture is connected to the notion of highly-varying functions. We argue that, in general, deep architectures can compactly represent highly-varying functions which would otherwise require a very large size to be represented with an inappropriate architecture. We say that a function is highly-varying when a piecewise approximation (e.g., piecewise-constant or piecewise-linear) of that function would require a large number of pieces.
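As a rough illustration of "highly-varying", the sketch below counts how many constant pieces a greedy piecewise-constant approximation of sin(2πkx) needs on [0, 1] for a fixed tolerance; the piece count grows with k, the number of variations. The greedy splitting rule and the tolerance are my own illustrative choices, not the paper's.

```python
import numpy as np

def pieces_needed(f, xs, eps):
    """Greedily count the constant pieces needed to approximate f on the grid xs
    within +/- eps: start a new piece whenever the current value span exceeds 2*eps."""
    count, lo, hi = 1, f(xs[0]), f(xs[0])
    for x in xs[1:]:
        y = f(x)
        lo, hi = min(lo, y), max(hi, y)
        if hi - lo > 2 * eps:
            count += 1
            lo = hi = y
    return count

xs = np.linspace(0.0, 1.0, 100_000)
for k in (1, 10, 100):   # more oscillations = a more highly-varying function
    print(k, pieces_needed(lambda x: np.sin(2 * np.pi * k * x), xs, eps=0.1))
```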

 

3

3.1

An estimator that is local in input space obtains good generalization for a new input x by mostly exploiting training examples in the neighborhood of x.

Local estimators implicitly or explicitly partition the input space into regions (possibly in a soft rather than hard way) and require different parameters or degrees of freedom to account for the possible shape of the target function in each of the regions. When many regions are necessary because the function is highly varying, the number of required parameters will be large, and so will the number of examples needed to achieve good generalization.
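A k-nearest-neighbor regressor is a prototypical local estimator in this sense: the prediction at x is an average over the training examples nearest to x, so each region of input space needs its own examples. A minimal sketch with made-up data follows.

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=5):
    """A local estimator: the prediction for x depends only on the k training
    examples nearest to x in input space."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
X_train = rng.random((200, 2))                 # made-up 2-D inputs
y_train = np.sin(4 * np.pi * X_train[:, 0])    # a target that varies along one axis
print(knn_predict(np.array([0.3, 0.7]), X_train, y_train))
```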

Therefore, correct classification or prediction requires abstracting the raw input step by step into higher-level representations, avoiding purely local representations.

3.2

A cartoon local representation for integers i ∈ {1, 2, ..., N} is a vector r(i) of N bits with a single 1 and N − 1 zeros, i.e. with j-th element r_j(i) = 1 if i = j and 0 otherwise, called the one-hot representation of i. A distributed representation for the same integer could be a vector of log2(N) bits, which is a much more compact way to represent i. For the same number of possible configurations, a distributed representation can potentially be exponentially more compact than a very local one.
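A small sketch of the two encodings (indexing from 0 for simplicity, whereas the text uses 1..N): the one-hot vector spends N bits on a single integer, while the binary code reuses ceil(log2 N) bits across all integers.

```python
import math

def one_hot(i, N):
    """Local representation: N bits with a single 1 at position i."""
    return [1 if j == i else 0 for j in range(N)]

def binary(i, N):
    """Distributed representation: ceil(log2 N) bits; each bit participates in
    representing many different integers."""
    n_bits = max(1, math.ceil(math.log2(N)))
    return [(i >> b) & 1 for b in reversed(range(n_bits))]

N = 16
print(one_hot(5, N))   # 16 bits, exactly one of them set
print(binary(5, N))    # 4 bits: [0, 1, 0, 1]
```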
