
[CS131] Lecture 3 & 4 (Convolutions, Cross-Correlation, Separable Filters)

수수킴 2021. 7. 18.

 

This post is based on my study of lecture materials from Stanford University (Department of Computer Science).


The images we deal with in computer vision are digital, i.e., discrete representations of photographed scenes.

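An image can also be treated as a function $f : \mathbb{R}^2 \to \mathbb{R}^N$, mapping a pixel location to N values (N = 1 for grayscale, N = 3 for RGB).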

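Here is a minimal sketch of this idea, assuming NumPy. A digital image is an array that samples f on an integer grid. A grayscale image (N = 1) is an (H, W) array, and an RGB image (N = 3) is an (H, W, 3) array. The tiny arrays below use made-up values, and the helper `f` is hypothetical.

```python
import numpy as np

# Grayscale image: f maps each pixel location to one intensity (N = 1).
gray = np.array([[0.0, 0.5],
                 [0.5, 1.0]])        # shape (H, W) = (2, 2)

# RGB image: f maps each pixel location to a 3-vector (N = 3).
rgb = np.zeros((2, 2, 3))            # shape (H, W, 3)
rgb[0, 1] = [1.0, 0.0, 0.0]          # pixel in row 0, column 1 is pure red

def f(image, x, y):
    # Hypothetical helper: evaluate the image "function" at column x, row y.
    # NumPy indexes rows first, so the lookup is image[y, x].
    return image[y, x]

print(f(gray, 1, 1))   # intensity at the bottom-right pixel
print(f(rgb, 1, 0))    # RGB vector of the red pixel
```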


Part 1 : Convolutions

Convolution: a system that uses information from neighboring pixels to filter the target pixel. Its key properties:

    1. Commutative property

    2. Shift invariance

    3. Linearity
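The three properties can be checked numerically. This is a minimal sketch using NumPy's 1D `np.convolve`; the 2D case behaves the same way. The signals are arbitrary made-up values.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([2.0, -1.0, 5.0, 0.5])
h = np.array([0.25, 0.5, 0.25])        # small smoothing kernel

# 1. Commutativity: f * h == h * f
commutative = np.allclose(np.convolve(f, h), np.convolve(h, f))

# 2. Shift invariance: delaying the input by one sample delays the output by one sample.
shifted_f = np.concatenate([[0.0], f])
shift_invariant = np.allclose(np.convolve(shifted_f, h),
                              np.concatenate([[0.0], np.convolve(f, h)]))

# 3. Linearity: (a*f + b*g) * h == a*(f * h) + b*(g * h)
a, b = 2.0, -3.0
linear = np.allclose(np.convolve(a * f + b * g, h),
                     a * np.convolve(f, h) + b * np.convolve(g, h))

print(commutative, shift_invariant, linear)  # True True True
```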

Kernel (image processing): In image processing, a kernel, convolution matrix, or mask is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image.


https://en.wikipedia.org/wiki/Kernel_(image_processing)
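A few common 3×3 kernels from the Wikipedia page above (identity, box blur, sharpen, edge detection) are applied below with a small hand-written, zero-padded convolution. The helper name `convolve2d_same` is hypothetical and is not a library function. At the center of a linear ramp image, the first three kernels return the pixel value unchanged, and the edge kernel returns 0.

```python
import numpy as np

KERNELS = {
    "identity": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
    "sharpen":  np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "edge":     np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
}

def convolve2d_same(image, kernel):
    # Hypothetical helper: zero-padded 2D convolution, output the same size as the input.
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = np.flip(kernel)                      # convolution flips the kernel
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + kh, n:n + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # linear ramp; center pixel is 12
for name, k in KERNELS.items():
    print(name, convolve2d_same(image, k)[2, 2])
```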


[filters.py]

1) conv_nested

import numpy as np

def conv_nested(image, kernel):
    """A naive implementation of convolution filter.

    This is a naive implementation of convolution using 4 nested for-loops.
    This function computes convolution of an image with a kernel and outputs
    the result that has the same shape as the input image.

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk). Dimensions will be odd.

    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

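    ### YOUR CODE HERE
    # Offset of the kernel center (kernel dimensions are odd).
    # Hard-coding 1 here would only be correct for 3x3 kernels.
    ch, cw = Hk // 2, Wk // 2
    for m in range(Hi):
        for n in range(Wi):
            total = 0.0  # avoid shadowing the built-in sum()
            for i in range(Hk):
                for j in range(Wk):
                    # Convolution flips the kernel: kernel[i, j] pairs with image[m + ch - i, n + cw - j]
                    y, x = m + ch - i, n + cw - j
                    # Pixels outside the image count as zero (implicit zero padding).
                    if 0 <= y < Hi and 0 <= x < Wi:
                        total += kernel[i, j] * image[y, x]
            out[m, n] = total
    ### END YOUR CODE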

    return out

Result

 

2) zero_pad

def zero_pad(image, pad_height, pad_width):
    """ Zero-pad an image.

    Ex: a 1x1 image [[1]] with pad_height = 1, pad_width = 2 becomes:

        [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]         of shape (3, 5)

    Args:
        image: numpy array of shape (H, W).
        pad_width: width of the zero padding (left and right padding).
        pad_height: height of the zero padding (bottom and top padding).

    Returns:
        out: numpy array of shape (H+2*pad_height, W+2*pad_width).
    """

    H, W = image.shape
    out = None

    ### YOUR CODE HERE
    out = np.zeros((H+2*pad_height, W+2*pad_width))
    out[pad_height: H+pad_height, pad_width: W+pad_width] = image
    ### END YOUR CODE
    return out
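For reference, NumPy's built-in `np.pad` produces the same result. Here is a minimal check. The `zero_pad` logic above is repeated compactly so the snippet runs on its own.

```python
import numpy as np

def zero_pad(image, pad_height, pad_width):
    # Same logic as zero_pad above.
    H, W = image.shape
    out = np.zeros((H + 2 * pad_height, W + 2 * pad_width))
    out[pad_height:H + pad_height, pad_width:W + pad_width] = image
    return out

image = np.array([[1.0]])
manual = zero_pad(image, 1, 2)
# np.pad takes (before, after) widths per axis; mode="constant" pads with zeros.
builtin = np.pad(image, ((1, 1), (2, 2)), mode="constant")

print(manual.shape, np.array_equal(manual, builtin))  # (3, 5) True
```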

Result

 

3) conv_fast

def conv_fast(image, kernel):
    """ An efficient implementation of convolution filter.

    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.

    Hints:
        - Use the zero_pad function you implemented above
        - There should be two nested for-loops
        - You may find np.flip() and np.sum() useful

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk). Dimensions will be odd.

    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    ### YOUR CODE HERE
    image = zero_pad(image, Hk//2, Wk//2)
    kernel = np.flip(kernel, 0)
    kernel = np.flip(kernel, 1)
    for m in range(Hi):
        for n in range(Wi):
            out[m, n] =  np.sum(image[m: m+Hk, n: n+Wk] * kernel)
    ### END YOUR CODE

    return out
    
# def conv_faster(image, kernel):
#     """
#     Args:
#         image: numpy array of shape (Hi, Wi)
#         kernel: numpy array of shape (Hk, Wk)
#     Returns:
#         out: numpy array of shape (Hi, Wi)
#     """
#     Hi, Wi = image.shape
#     Hk, Wk = kernel.shape
#     out = np.zeros((Hi, Wi))

#     ### YOUR CODE HERE
#     image = zero_pad(image, Hk//2, Wk//2)
#     kernel = np.flip(np.flip(kernel, 0), 1)
#     # The trick is to lay out all the (Hk, Wk) patches and organize them into a (Hi*Wi, Hk*Wk) matrix.
#     # Also consider the kernel as (Hk*Wk, 1) vector. Then the convolution naturally reduces to a matrix multiplication.
#     mat = np.zeros((Hi*Wi, Hk*Wk))
#     for i in range(Hi*Wi):
#         row = i // Wi
#         col = i % Wi
#         mat[i, :] = image[row: row+Hk, col: col+Wk].reshape(1, Hk*Wk)
#     out = mat.dot(kernel.reshape(Hk*Wk, 1)).reshape(Hi, Wi)
    
#     ### END YOUR CODE

#     return out
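The commented-out `conv_faster` builds the (Hi*Wi, Hk*Wk) patch matrix with a Python loop. Below is a loop-free sketch of the same idea. It assumes NumPy ≥ 1.20 for `np.lib.stride_tricks.sliding_window_view`. This is an alternative I wrote for illustration, not the assignment's reference solution, and `conv_vectorized` / `conv_loops` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_vectorized(image, kernel):
    """Zero-padded 2D convolution with output shape (Hi, Wi), like conv_fast."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    # windows[m, n] is the (Hk, Wk) neighborhood centered on pixel (m, n).
    windows = sliding_window_view(padded, (Hk, Wk))
    # Weighted sum of every window with the flipped kernel at once.
    return np.einsum("mnij,ij->mn", windows, np.flip(kernel))

def conv_loops(image, kernel):
    # Reference: the two-loop conv_fast approach, for comparison.
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = np.flip(kernel)
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

rng = np.random.default_rng(0)
img = rng.random((6, 7))
k = rng.random((3, 5))
print(np.allclose(conv_vectorized(img, k), conv_loops(img, k)))  # True
```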

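By zero-padding the image first, we no longer need per-pixel boundary checks. Each output pixel can then be computed with one vectorized element-wise multiplication and np.sum, which makes the convolution much faster.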

[Algorithm of the faster version]
1. Zero-pad the image.
2. Flip the kernel horizontally and vertically.
3. Compute the weighted sum of the neighborhood at each pixel.

np.flip(image, axis)

  • axis = 0: same as np.flipud() (vertical flip; reverses the order of rows)
  • axis = 1: same as np.fliplr() (horizontal flip; reverses the order of columns)
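Here is a quick check of the axis behavior on a small array. Calling `np.flip` with no axis flips every axis. For a 2D kernel, that is the same as flipping vertically and then horizontally, i.e., a 180° rotation.

```python
import numpy as np

k = np.array([[1, 2, 3],
              [4, 5, 6]])

assert np.array_equal(np.flip(k, 0), np.flipud(k))   # rows reversed: [[4, 5, 6], [1, 2, 3]]
assert np.array_equal(np.flip(k, 1), np.fliplr(k))   # columns reversed: [[3, 2, 1], [6, 5, 4]]

both = np.flip(np.flip(k, 0), 1)
print(np.array_equal(both, np.flip(k)), np.array_equal(both, np.rot90(k, 2)))  # True True
```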

Result

In practice, conv_nested and conv_fast differ greatly in running time: conv_fast is much faster.

 

Part 2 : Cross-Correlation

Cross-correlation: the same as convolution, except that the filter kernel is not flipped. It can be used for template matching. The template g is multiplied with regions of a larger image f to measure how similar each region is to the template.

https://primo.ai/index.php?title=Convolution_vs._Cross-Correlation_(Autocorrelation)
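The same difference written as formulas is shown below. This is a sketch; index conventions vary between texts. Here f is the image, h a convolution kernel, g a template, and k, l range over the kernel/template offsets.

```latex
\begin{aligned}
\text{Convolution:}\quad (f * h)[m,n] &= \sum_{k}\sum_{l} h[k,l]\; f[m-k,\, n-l] \\
\text{Cross-correlation:}\quad (f \star g)[m,n] &= \sum_{k}\sum_{l} g[k,l]\; f[m+k,\, n+l]
\end{aligned}
```

Substituting h[k, l] = g[−k, −l] (g rotated by 180°) into the first line and renaming k → −k, l → −l gives the second line. This is why cross-correlation can be computed as a convolution with a flipped template.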


[filters.py]

1) cross_correlation

def cross_correlation(f, g):
    """ Cross-correlation of image f and template g.

    Hint: use the conv_fast function defined above.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """

    out = None
    ### YOUR CODE HERE
    g = np.flip(np.flip(g, 0), 1)
    out = conv_fast(f, g)
    ### END YOUR CODE

    return out

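[g = np.flip(np.flip(g, 0), 1)] flips g vertically, then horizontally.

As a result, g is flipped upside-down and left-right (rotated 180°).

[out = conv_fast(f, g)] convolution of f with the flipped g.
Convolution flips its kernel again, so the already-flipped g is flipped back before the weighted sums are computed.
As a result, f is multiplied with the original, unflipped g, which is exactly cross-correlation.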
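The double-flip argument can be checked in 1D with NumPy. `np.correlate` slides v over a without flipping, while `np.convolve` flips it first. For real-valued inputs, correlating with v therefore equals convolving with v reversed. The arrays are made-up example values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])

corr = np.correlate(a, v, mode="full")                  # no flip
conv_of_flipped = np.convolve(a, v[::-1], mode="full")  # flip v first; convolve flips it back

print(corr)                                # values 0.5, 2.0, 3.5, 3.0, 0.0
print(np.allclose(corr, conv_of_flipped))  # True
```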

 

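First, convert shelf.jpg (f) and template.jpg (g) to grayscale, then cross-correlate the two images.

Why grayscale: grayscale is sufficient for classification. It also turns each image into a single 2D array of shape (H, W), which is what the filter functions above expect.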
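Below is a minimal sketch of an RGB-to-grayscale conversion. It assumes the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114). Library helpers may use slightly different weights, and this is not necessarily what the assignment's image loader does. The name `to_grayscale` is hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    # rgb: float array of shape (H, W, 3) with values in [0, 1].
    weights = np.array([0.299, 0.587, 0.114])   # assumed BT.601 weights (sum to 1)
    # Weighted sum over the channel axis -> shape (H, W).
    return rgb @ weights

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 1.0, 1.0]   # white pixel
rgb[1, 1] = [0.0, 1.0, 0.0]   # pure green pixel
gray = to_grayscale(rgb)
print(gray.shape)              # (2, 2)
```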

shelf.jpg (f), template.jpg (g)

Result

Q : How does the output of cross-correlation filter look? Explain what problems there might be with using a raw template as a filter.

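A: It not only failed to detect the template properly, but so many regions were correlated with it that the result was inaccurate. Every pixel of the raw template is non-negative, so the response is largest wherever the image is simply bright, not necessarily where it matches the template's pattern.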

Solution : zero-mean cross-correlation

 

2) zero_mean_cross_correlation

def zero_mean_cross_correlation(f, g):
    """ Zero-mean cross-correlation of image f and template g.

    Subtract the mean of g from g so that its mean becomes zero.

    Hint: you should look up useful numpy functions online for calculating the mean.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """

    out = None
    ### YOUR CODE HERE
    g = g - np.mean(g)
    out = cross_correlation(f, g)
    ### END YOUR CODE

    return out

Result

With zero-mean cross-correlation, the template was detected accurately.

(after appropriate scaling and thresholding)

def check_product_on_shelf(shelf, product):
    out = zero_mean_cross_correlation(shelf, product)
    
    # Scale output by the size of the template
    out = out / float(product.shape[0]*product.shape[1])
    
    # Threshold output (this is arbitrary, you would need to tune the threshold for a real application)
    out = out > 0.025
    
    if np.sum(out) > 0:
        print('The product is on the shelf')
    else:
        print('The product is not on the shelf')

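Why does zero-mean cross-correlation detect the template accurately? After its mean is subtracted, the template's values sum to zero. A region of uniform brightness therefore produces a response of zero. The response is driven by how well the region's pattern of bright and dark pixels matches the template, not by the region's overall brightness.

https://martin-thoma.com/zero-mean-normalized-cross-correlation/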
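Here is a toy illustration of this reason, using made-up random values. With a raw template whose values are all non-negative, a featureless bright patch can score higher than a patch that matches the template exactly. After the mean is subtracted, the flat patch scores zero. The helper `correlate_at` is hypothetical.

```python
import numpy as np

def correlate_at(patch, template):
    # Cross-correlation response at a single location: sum of element-wise products.
    return float(np.sum(patch * template))

rng = np.random.default_rng(0)
template = rng.random((5, 5))           # made-up template, values in [0, 1)
bright_flat = np.ones((5, 5))           # featureless, uniformly bright patch
matching = template.copy()              # patch identical to the template

zm_template = template - template.mean()   # zero-mean template (sums to ~0)

raw_flat = correlate_at(bright_flat, template)     # = sum(template)
raw_match = correlate_at(matching, template)       # = sum(template**2), smaller here
zm_flat = correlate_at(bright_flat, zm_template)   # = sum of a zero-mean array, ~0
zm_match = correlate_at(matching, zm_template)     # = sum of squared deviations, > 0

print(raw_flat > raw_match)   # True: the raw template prefers the bright flat patch
print(abs(zm_flat) < 1e-9)    # True: the zero-mean template ignores uniform brightness
print(zm_match > 0)           # True: the real match still gives a positive response
```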

3) normalized_cross_correlation

def normalized_cross_correlation(f, g):
    """ Normalized cross-correlation of image f and template g.

    Normalize the subimage of f and the template g at each step
    before computing the weighted sum of the two.

    Hint: you should look up useful numpy functions online for calculating 
          the mean and standard deviation.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """

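    Hf, Wf = f.shape
    Hg, Wg = g.shape
    out = np.zeros((Hf, Wf))
    ### YOUR CODE HERE
    # Normalize the template once: zero mean, unit standard deviation (std, not variance).
    g = (g - np.mean(g)) / np.std(g)
    # Zero-pad f so that every pixel has a full (Hg, Wg) neighborhood.
    f_pad = zero_pad(f, Hg // 2, Wg // 2)
    for m in range(Hf):
        for n in range(Wf):
            # Normalize each subimage separately (not the whole image at once).
            patch = f_pad[m:m + Hg, n:n + Wg]
            std = np.std(patch)
            if std == 0:
                continue  # flat patch: correlation is undefined, leave 0
            out[m, n] = np.sum((patch - np.mean(patch)) / std * g)
    ### END YOUR CODE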

    return out

Zero-mean cross-correlation is not robust to changes in lighting conditions (e.g., shading). If a region becomes darker or brighter, its response scales with it.

Solution: normalized cross-correlation (normalize the image patch and the template at every step before comparing them).
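Here is a small sketch of why normalization helps, using a made-up template and a hypothetical lighting change (scale by 0.5, add 0.1). The zero-mean response halves, while the normalized score is unchanged. For a perfect match, the normalized score equals the number of pixels N; dividing it by N maps scores to [-1, 1]. The helper names are hypothetical.

```python
import numpy as np

def ncc_score(patch, template):
    # Normalize both to zero mean and unit standard deviation, then sum the products.
    p = (patch - patch.mean()) / patch.std()
    t = (template - template.mean()) / template.std()
    return float(np.sum(p * t))

def zero_mean_score(patch, template):
    # Zero-mean cross-correlation response at a single location.
    return float(np.sum(patch * (template - template.mean())))

rng = np.random.default_rng(1)
template = rng.random((7, 7))    # made-up template
patch = template.copy()          # a perfect match
darker = 0.5 * patch + 0.1       # same pattern under hypothetical dimmer lighting

zm_bright, zm_dark = zero_mean_score(patch, template), zero_mean_score(darker, template)
ncc_bright, ncc_dark = ncc_score(patch, template), ncc_score(darker, template)

print(np.isclose(zm_dark, 0.5 * zm_bright))   # True: the zero-mean response halves
print(np.isclose(ncc_dark, ncc_bright))       # True: the normalized score is unchanged
print(np.isclose(ncc_bright, template.size))  # True: a perfect match scores N = 49
```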

Result

Part 3 : Separable Filters

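In separable convolution, the kernel operation can be split into several steps. Write a convolution as y = conv(x, k), where x is the input image, y is the output image, and k is the kernel. Suppose k can be computed as k = k1.dot(k2), where k1 is a column vector and k2 is a row vector (so k is their outer product). Then, instead of performing one 2D convolution with k, we can perform two 1D convolutions with k1 and k2 and get the same result. This is why it is called separable convolution.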

 

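Separable convolution simply splits the kernel into two smaller kernels. It then performs two convolutions with 3 multiplications per pixel each, which gives the same result as the original 3×3 kernel with its 9 multiplications per pixel. Fewer multiplications mean lower computational complexity, so the computation (for example, a neural network) runs faster. In general, a K×K kernel needs K² multiplications per pixel, while the separable version needs 2K (25 vs. 10 for the 5×5 kernel below).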

 

(Source: https://eehoeskrap.tistory.com/431)
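One way to test whether a 2D kernel is separable, and to recover k1 and k2, is a rank check with the singular value decomposition: a kernel is separable exactly when it has rank 1. Below is a minimal sketch using the 5×5 kernel from the next example. The function name `separate_kernel` and the tolerance are hypothetical choices. The recovered factors match [1, 4, 6, 4, 1] only up to scale and sign.

```python
import numpy as np

def separate_kernel(kernel, tol=1e-10):
    """Return (k1, k2) with k1 @ k2 == kernel if the kernel has rank 1, else None."""
    U, S, Vt = np.linalg.svd(kernel)
    # Rank 1 means only the first singular value is (numerically) non-zero.
    if np.any(S[1:] > tol * S[0]):
        return None
    k1 = U[:, :1] * np.sqrt(S[0])     # column vector, shape (H, 1)
    k2 = Vt[:1, :] * np.sqrt(S[0])    # row vector, shape (1, W)
    return k1, k2

kernel = np.array([[1, 4, 6, 4, 1],
                   [4, 16, 24, 16, 4],
                   [6, 24, 36, 24, 6],
                   [4, 16, 24, 16, 4],
                   [1, 4, 6, 4, 1]], dtype=float)

k1, k2 = separate_kernel(kernel)
print(k1.shape, k2.shape, np.allclose(k1 @ k2, kernel))   # (5, 1) (1, 5) True
print(separate_kernel(np.eye(3)))                          # None (identity has rank 3)
```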


1) Normal Convolution (2D)

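import numpy as np
from time import time

# Assumes conv_nested (from filters.py above) and a grayscale image `img` were loaded earlier.
# 5x5 Gaussian blur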
kernel = np.array([
    [1,4,6,4,1],
    [4,16,24,16,4],
    [6,24,36,24,6],
    [4,16,24,16,4],
    [1,4,6,4,1]
])

t0 = time()
out = conv_nested(img, kernel)
t1 = time()
t_normal = t1 - t0

Gaussian kernel: widely used for blurring images.
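The integer kernel above is the outer product of the binomial coefficients [1, 4, 6, 4, 1] (row 4 of Pascal's triangle), a common integer approximation of a Gaussian. Its entries sum to 256, so it is often divided by 256 so that it does not change the overall image brightness. The snippet below only checks these facts and is not part of the assignment code.

```python
import numpy as np

row = np.array([1, 4, 6, 4, 1])
kernel = np.outer(row, row)          # 5x5 integer Gaussian approximation

print(kernel.sum())                  # 256 (= 16 * 16)
normalized = kernel / kernel.sum()   # weights sum to 1, so a flat image stays flat
print(np.isclose(normalized.sum(), 1.0))  # True
```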

Result

 

2) Separable Convolution (two 1D)

# The kernel can be written as outer product of two 1D filters
k1 = None  # shape (5, 1)
k2 = None  # shape (1, 5)

### YOUR CODE HERE
k1 = np.array([[1],[4],[6],[4],[1]])
k2 = np.array([[1,4,6,4,1]])
### END YOUR CODE

# Check if kernel is product of k1 and k2
if not  np.all(k1 * k2 == kernel):
    print('k1 * k2 is not equal to kernel')
    
assert k1.shape == (5, 1), "k1 should have shape (5, 1)"
assert k2.shape == (1, 5), "k2 should have shape (1, 5)"
# Perform two convolutions using k1 and k2
t0 = time()
out_separable = conv_nested(img, k1)
out_separable = conv_nested(out_separable, k2)
t1 = time()
t_separable = t1 - t0

Result

We can clearly see that separable convolution is faster than normal convolution.
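The speedup follows from the multiplication count per output pixel. A K×K kernel needs K² multiplications, while a K×1 pass followed by a 1×K pass needs 2K. The sketch below counts these for the 5×5 kernel on an example 300×400 image. It also checks on a random image that the two-pass result matches the full 2D convolution. The helper `conv2d` is hypothetical (zero-padded and same-size, like conv_nested) and assumes NumPy ≥ 1.20 for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel):
    # Hypothetical helper: zero-padded, same-size 2D convolution over all windows at once.
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    windows = sliding_window_view(padded, (Hk, Wk))
    return np.einsum("mnij,ij->mn", windows, np.flip(kernel))

k1 = np.array([[1], [4], [6], [4], [1]], dtype=float)   # shape (5, 1)
k2 = np.array([[1, 4, 6, 4, 1]], dtype=float)           # shape (1, 5)
kernel = k1 * k2                                         # broadcasting -> (5, 5) outer product

H, W = 300, 400                                  # example image size
full_mults = H * W * kernel.size                 # 25 multiplications per pixel
separable_mults = H * W * (k1.size + k2.size)    # 5 + 5 = 10 per pixel
print(full_mults, separable_mults)               # 3000000 1200000

img = np.random.default_rng(0).random((H, W))
out = conv2d(img, kernel)
out_separable = conv2d(conv2d(img, k1), k2)
print(np.allclose(out, out_separable))           # True
```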
