Question:

How does the input image map to the neurons of the first conv layer in a CNN?

文德曜

2023-03-14

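I have just finished an ANN course and started learning about CNNs. I have a basic understanding of how padding and strides work in a CNN.

However, I have difficulty mapping the input image to the neurons of the first conv layer, although I do understand how input features map to the first layer of an ordinary ANN.

What is the best way to understand the mapping between the input image and the neurons of the first conv layer?

How can I clear up my doubts about the following code example? The code is taken from Coursera's DL course.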

    def initialize_parameters():
        """
        Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                            W1 : [4, 4, 3, 8]
                            W2 : [2, 2, 8, 16]
        Returns:
        parameters -- a dictionary of tensors containing W1, W2
        """

        tf.set_random_seed(1)                              # so that your "random" numbers match ours

        ### START CODE HERE ### (approx. 2 lines of code)
        W1 = tf.get_variable("W1",[4,4,3,8],initializer = tf.contrib.layers.xavier_initializer(seed = 0))
        W2 = tf.get_variable("W2",[2,2,8,16],initializer = tf.contrib.layers.xavier_initializer(seed = 0))
        ### END CODE HERE ###

        parameters = {"W1": W1,
                      "W2": W2}

        return parameters


    def forward_propagation(X, parameters):
        """
        Implements the forward propagation for the model:
        CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

        Arguments:
        X -- input dataset placeholder, of shape (input size, number of examples)
        parameters -- python dictionary containing your parameters "W1", "W2"
                      the shapes are given in initialize_parameters

        Returns:
        Z3 -- the output of the last LINEAR unit
        """

        # Retrieve the parameters from the dictionary "parameters" 
        W1 = parameters['W1']
        W2 = parameters['W2']

        ### START CODE HERE ###
        # CONV2D: stride of 1, padding 'SAME'
        Z1 = tf.nn.conv2d(X,W1, strides = [1,1,1,1], padding = 'SAME')
        # RELU
        A1 = tf.nn.relu(Z1)
        # MAXPOOL: window 8x8, stride 8, padding 'SAME'
        P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
        # CONV2D: filters W2, stride 1, padding 'SAME'
        Z2 = tf.nn.conv2d(P1,W2, strides = [1,1,1,1], padding = 'SAME')
        # RELU
        A2 = tf.nn.relu(Z2)
        # MAXPOOL: window 4x4, stride 4, padding 'SAME'
        P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
        # FLATTEN
        P2 = tf.contrib.layers.flatten(P2)
        # FULLY-CONNECTED without non-linear activation function (do not call softmax).
        # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None" 
        Z3 = tf.contrib.layers.fully_connected(P2, 6,activation_fn=None)
        ### END CODE HERE ###

        return Z3

    with tf.Session() as sess:
        np.random.seed(1)
        X, Y = create_placeholders(64, 64, 3, 6)
        parameters = initialize_parameters()
        Z3 = forward_propagation(X, parameters)
        init = tf.global_variables_initializer()
        sess.run(init)
        a = sess.run(Z3, {X: np.random.randn(1,64,64,3), Y: np.random.randn(1,6)})
        print("Z3 = " + str(a))

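How do 8 filters, each of size 4*4*3, process an input image of size 64*64*3?

Stride = 1, padding = same, batch_size = 1.

What I understand so far is that each neuron in the first conv layer has 8 filters, each of size 4*4*3. Each neuron in the first conv layer takes a portion of the input image that is the same size as the filter (here 4*4*3), applies the convolution operation, and produces 8 feature maps of size 64*64.

If my understanding is correct, then:

If not, then: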


Or only a part of the input image? How do we know which part of the input image maps to which neuron in the first layer?


长孙宜

2023-03-14

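I found an answer to my own question and am posting it here.

First, the notion of a neuron also exists in a conv layer, but only indirectly. Basically, each neuron in a conv layer processes a part of the input image that is the same size as the kernel used in that layer.

Each neuron looks only at a specific part of the input image (in a fully connected network, each neuron looks at the whole image), and each neuron uses n filters/kernels to gain more insight into that part of the image.

These n filters/kernels are shared by all neurons in a given conv layer. Because these weights (kernels/filters) are shared, a conv layer has far fewer parameters to learn. In a fully connected ANN, by contrast, each neuron has its own weight vector, so there are many more parameters to learn.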
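To put rough numbers on the effect of weight sharing, here is a minimal sketch that counts the weights of the first conv layer from the question (8 filters of size 4*4*3). The fully connected comparison is a hypothetical layer, not part of the original code: it assumes a dense layer that maps the flattened 64*64*3 image to the same 64*64*8 outputs. The function names are illustrative only.

```python
def conv_param_count(f, n_c_prev, n_c, use_bias=True):
    """Parameters of a conv layer with n_c filters of size f x f x n_c_prev."""
    weights = f * f * n_c_prev * n_c
    return weights + (n_c if use_bias else 0)


def dense_param_count(n_in, n_out, use_bias=True):
    """Parameters of a fully connected layer: every output has its own weights."""
    weights = n_in * n_out
    return weights + (n_out if use_bias else 0)


# First conv layer from the question: W1 has shape [4, 4, 3, 8], no explicit bias.
conv_params = conv_param_count(f=4, n_c_prev=3, n_c=8, use_bias=False)

# Hypothetical dense layer producing the same number of outputs (64*64*8)
# from the flattened 64*64*3 image.
dense_params = dense_param_count(64 * 64 * 3, 64 * 64 * 8, use_bias=False)

print(conv_params)   # 384
print(dense_params)  # 402653184
```

The conv layer needs 384 weights regardless of the image size, whereas the dense layer's count grows with the product of input and output sizes.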

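Now, the number of neurons in a given conv layer L depends on input_size (the output of the previous layer L-1), the Kernel_size used in layer L, the padding used in layer L, and the stride used in layer L.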
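The dependency on these four quantities can be written as a small helper. This is a sketch that assumes the same amount of zero padding on both sides of each axis (the same formula the NumPy `conv_forward` example in this post uses); `conv_output_size` is an illustrative name, not a library function.

```python
def conv_output_size(n_prev, f, pad, stride):
    """Number of output positions (neurons) along one axis.

    n_prev -- input size along the axis (output of layer L-1)
    f      -- kernel size
    pad    -- zeros added on EACH side of the axis (symmetric padding assumed)
    stride -- step of the sliding kernel
    """
    return (n_prev - f + 2 * pad) // stride + 1


# Along one axis of the 64-pixel input with a 4-wide kernel and stride 1:
n_valid = conv_output_size(64, f=4, pad=0, stride=1)  # no padding
n_pad1 = conv_output_size(64, f=4, pad=1, stride=1)   # one zero on each side

print(n_valid, n_pad1)  # 61 63
```

The total number of neurons in the layer is then the product of the two spatial sizes and the number of filters, for example 61 * 61 * 8 with no padding.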

Now let us answer each of the questions asked above.

    From the code example above, for conv layer 1:
    Batch size = 1
    Input image size = 64*64*3
    Kernel size = 4*4*3 ==> taken from W1
    Number of kernels = 8 ==> taken from W1
    Padding = same
    Stride = 1

    Stride = 1 means that the kernel slides one pixel at a time. Consider the x axis, with pixels numbered 1, 2, 3, 4, ..., 64.

    With valid padding, the first neuron sees pixels 1, 2, 3 and 4; the kernel is then shifted by one pixel, so the next neuron sees pixels 2, 3, 4 and 5, and the last neuron sees pixels 61, 62, 63 and 64.

    With same padding, the first neuron sees pixels 0, 1, 2 and 3 (pixel 0 is zero padding), the second neuron sees pixels 1, 2, 3 and 4, and the last neuron sees pixels 63 and 64 plus two zero-padded positions (for a 4-wide kernel TensorFlow pads 1 zero before and 2 after).

    With same padding, the output has the same spatial size as the image (64 x 64 x 8). With valid padding, the output is 61 x 61 x 8.

    The 8 in the output is the number of filters.
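The pixel-to-neuron mapping along one axis can be checked with a short sketch. It assumes TensorFlow's SAME rule, where the total padding is split with the smaller half before the data (for a 4-wide kernel at stride 1: 1 zero before, 2 after). `same_padding` and `receptive_field` are illustrative helpers, not library functions; pixels are numbered from 1 and `None` marks a zero-padded position.

```python
def same_padding(n_prev, f, stride):
    """(pad_before, pad_after) along one axis under TensorFlow's SAME rule."""
    n_out = -(-n_prev // stride)  # ceil(n_prev / stride)
    total = max((n_out - 1) * stride + f - n_prev, 0)
    return total // 2, total - total // 2


def receptive_field(out_index, f, stride, pad_before, n_prev):
    """1-based input pixels seen by output position out_index (0-based)."""
    start = out_index * stride - pad_before  # 0-based index into the unpadded input
    return [k + 1 if 0 <= k < n_prev else None for k in range(start, start + f)]


pad_before, pad_after = same_padding(64, f=4, stride=1)

# SAME padding: 64 output positions (indices 0..63)
first_same = receptive_field(0, 4, 1, pad_before, 64)
last_same = receptive_field(63, 4, 1, pad_before, 64)

# VALID padding: no zeros added, 61 output positions (indices 0..60)
first_valid = receptive_field(0, 4, 1, 0, 64)
last_valid = receptive_field(60, 4, 1, 0, 64)

print(pad_before, pad_after)  # 1 2
print(first_same, last_same)  # [None, 1, 2, 3] [63, 64, None, None]
print(first_valid, last_valid)  # [1, 2, 3, 4] [61, 62, 63, 64]
```

The same mapping applies independently along the y axis, and every one of the 8 filters reads the same 4*4*3 patch at a given output position.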

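A neuron looks only at a part of the input image. Refer to the answer to the first question and you will be able to map between the input image and the neurons.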


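This is the total number of kernels in the layer. All neurons of the layer share the same kernels, which are used to learn about different parts of the input image. Hence a convnet has fewer parameters to learn than a fully connected ANN.

It depends on input_size (the output of the previous layer L-1), the Kernel_size used in layer L, the padding used in layer L, and the stride used in layer L. See the answer to the first question above for more detail.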
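As a worked illustration of how these quantities determine the layer sizes, the sketch below traces the activation shapes through the `forward_propagation` network from the question. It assumes TensorFlow's rule that SAME padding yields `ceil(n / stride)` outputs per axis for both convolution and pooling; `same_out` and `trace_shapes` are illustrative names.

```python
import math


def same_out(n, stride):
    """Output size along one axis under SAME padding: ceil(n / stride)."""
    return math.ceil(n / stride)


def trace_shapes(h=64, w=64):
    """Per-example activation shapes for the network in the question."""
    shapes = [("input", (h, w, 3))]
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("conv1, W1 [4,4,3,8], stride 1", (h, w, 8)))
    h, w = same_out(h, 8), same_out(w, 8)
    shapes.append(("maxpool 8x8, stride 8", (h, w, 8)))
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("conv2, W2 [2,2,8,16], stride 1", (h, w, 16)))
    h, w = same_out(h, 4), same_out(w, 4)
    shapes.append(("maxpool 4x4, stride 4", (h, w, 16)))
    shapes.append(("flatten", (h * w * 16,)))
    shapes.append(("fully connected", (6,)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)
```

For the 64*64*3 input this gives 64*64*8 after the first conv layer, 8*8*8 after the first pooling layer, 8*8*16 after the second conv layer, 2*2*16 after the second pooling layer, 64 values after flattening, and 6 outputs.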

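There is no direct relationship, but each neuron uses n filters/kernels (shared among all neurons of that layer) to learn more about a specific part of the input image.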

The example code below helps clarify how the convolution operation is implemented internally.

    import numpy as np


    def zero_pad(X, pad):
        """Zero-pad height and width of X, shape (m, n_H, n_W, n_C).
        Assumed minimal helper; it is not shown in the original post."""
        return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                      mode='constant', constant_values=0)


    def conv_single_step(a_slice_prev, W, b):
        """Apply one filter W (f, f, n_C_prev) and bias b (1, 1, 1) to one slice.
        Assumed minimal helper; it is not shown in the original post."""
        return np.sum(a_slice_prev * W) + float(np.squeeze(b))


    def conv_forward(A_prev, W, b, hparameters):
        """
        Implements the forward propagation for a convolution function

        Arguments:
        A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
        b -- Biases, numpy array of shape (1, 1, 1, n_C)
        hparameters -- python dictionary containing "stride" and "pad"

        Returns:
        Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
        """

        # Retrieve dimensions from A_prev's shape
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

        # Retrieve dimensions from W's shape
        (f, f, n_C_prev, n_C) = W.shape

        # Retrieve information from "hparameters"
        stride = hparameters['stride']
        pad = hparameters['pad']

        # Compute the dimensions of the CONV output volume (int() floors the division)
        n_H = int(np.floor((n_H_prev - f + 2 * pad) / stride)) + 1
        n_W = int(np.floor((n_W_prev - f + 2 * pad) / stride)) + 1

        # Initialize the output volume Z with zeros
        Z = np.zeros((m, n_H, n_W, n_C))

        # Create A_prev_pad by padding A_prev
        A_prev_pad = zero_pad(A_prev, pad)

        for i in range(m):                       # loop over the batch of training examples
            a_prev_pad = A_prev_pad[i]           # select the ith training example's padded activation
            for h in range(n_H):                 # loop over vertical axis of the output volume
                for w in range(n_W):             # loop over horizontal axis of the output volume
                    for c in range(n_C):         # loop over channels (= #filters) of the output volume

                        # Find the corners of the current "slice"
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f

                        # Use the corners to define the (3D) slice of a_prev_pad
                        a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                        # Convolve the (3D) slice with filter c and its bias to get one output neuron
                        Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

        return Z


    A_prev = np.random.randn(1, 64, 64, 3)
    W = np.random.randn(4, 4, 3, 8)
    # Don't worry about the bias; TensorFlow will take care of this.
    b = np.random.randn(1, 1, 1, 8)
    # Note: pad=1 on each side gives a 63 x 63 output here, not 64 x 64;
    # TensorFlow's 'SAME' pads 1 zero before and 2 after for a 4-wide kernel.
    hparameters = {"pad": 1,
                   "stride": 1}

    Z = conv_forward(A_prev, W, b, hparameters)
    print("Z.shape =", Z.shape)  # (1, 63, 63, 8)
