This post summarizes what I studied from lecture materials of Stanford University (Department of Computer Science).
Images that we deal with in computer vision are digital (discrete representations of the photographed scenes).
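An image can also be treated as a function $f: \mathbb{R}^2 \to \mathbb{R}^N$ that maps a pixel position to its N channel values (e.g., N = 1 for a grayscale image, N = 3 for an RGB image).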
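Here is a minimal sketch of the function view. It assumes images are stored as NumPy arrays, as in the assignment code. A grayscale image is an (H, W) array (N = 1) and an RGB image is an (H, W, 3) array (N = 3), so evaluating f at a pixel is just indexing. The helper `f` is hypothetical.

```python
import numpy as np

gray = np.zeros((4, 6))          # grayscale: each pixel holds 1 value (N = 1)
gray[1, 2] = 0.8

rgb = np.zeros((4, 6, 3))        # RGB: each pixel holds 3 values (N = 3)
rgb[1, 2] = [1.0, 0.5, 0.0]

def f(image, row, col):
    """Hypothetical helper: evaluate the image 'function' at an integer pixel position."""
    return np.atleast_1d(image[row, col])

gray_value = f(gray, 1, 2)   # array of length 1
rgb_value = f(rgb, 1, 2)     # array of length 3
```

For a digital image, f is only defined on the integer pixel grid, which is why it is an array lookup rather than a formula.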
Part 1 : Convolutions
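Convolution: a system that computes each output pixel as a weighted sum of the input pixels in its neighborhood, with the kernel flipped: $(f * h)[m, n] = \sum_{k}\sum_{l} f[k, l]\, h[m-k, n-l]$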
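- Commutative property: $f * h = h * f$
- Shift invariance: shifting the input shifts the output by the same amount
- Linearity: $(a f_1 + b f_2) * h = a (f_1 * h) + b (f_2 * h)$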
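The three properties can be checked numerically. The sketch below uses a hypothetical helper `conv2d_full`, which computes the full (uncropped) 2D convolution by direct summation. The random test arrays are arbitrary.

```python
import numpy as np

def conv2d_full(a, b):
    """Full 2D convolution: out[m, n] = sum_{i, j} a[i, j] * b[m - i, n - j]."""
    Ha, Wa = a.shape
    Hb, Wb = b.shape
    out = np.zeros((Ha + Hb - 1, Wa + Wb - 1))
    # Each input value scatters a scaled copy of b into the output
    for i in range(Ha):
        for j in range(Wa):
            out[i:i + Hb, j:j + Wb] += a[i, j] * b
    return out

rng = np.random.default_rng(0)
x = rng.random((6, 7))
y = rng.random((6, 7))
h = rng.random((3, 3))

# Commutative: x * h == h * x
commutative = np.allclose(conv2d_full(x, h), conv2d_full(h, x))

# Linearity: (2x + 3y) * h == 2(x * h) + 3(y * h)
linear = np.allclose(conv2d_full(2 * x + 3 * y, h),
                     2 * conv2d_full(x, h) + 3 * conv2d_full(y, h))

# Shift invariance: shifting x down by one row shifts the output down by one row
x_shifted = np.pad(x, ((1, 0), (0, 0)))
shift_invariant = np.allclose(conv2d_full(x_shifted, h),
                              np.pad(conv2d_full(x, h), ((1, 0), (0, 0))))

print(commutative, linear, shift_invariant)
```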
Kernel (image processing): In image processing, a kernel, convolution matrix, or mask is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image.
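Here are a few classic 3×3 kernels, applied with a hypothetical `apply_kernel` helper (zero-padded, same-size convolution). They are standard textbook examples, not kernels taken from the assignment. On a flat image, kernels whose weights sum to 1 leave the interior unchanged. The edge kernel's weights sum to 0, so it responds with 0.

```python
import numpy as np

# A few classic 3x3 kernels
box_blur = np.ones((3, 3)) / 9.0                # average of the 3x3 neighborhood
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)   # boosts the center relative to its neighbors
edge = np.array([[-1, -1, -1],
                 [-1, 8, -1],
                 [-1, -1, -1]], dtype=float)    # Laplacian-style edge detector

def apply_kernel(image, kernel):
    """Hypothetical helper: same-size convolution with zero padding (odd kernel sizes)."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

flat = np.full((5, 5), 10.0)
interior = (slice(1, 4), slice(1, 4))   # pixels not touched by the zero-padded border

blurred = apply_kernel(flat, box_blur)
sharpened = apply_kernel(flat, sharpen)
edges = apply_kernel(flat, edge)
```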
[filters.py]
1) conv_nested
```python
import numpy as np

def conv_nested(image, kernel):
    """A naive implementation of convolution filter.

    This is a naive implementation of convolution using 4 nested for-loops.
    This function computes convolution of an image with a kernel and outputs
    the result that has the same shape as the input image.

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk). Dimensions will be odd.

    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    ### YOUR CODE HERE
    # Offset of the kernel center; works for any odd kernel size, not just 3x3
    ch, cw = Hk // 2, Wk // 2
    for m in range(Hi):
        for n in range(Wi):
            total = 0
            for i in range(Hk):
                for j in range(Wk):
                    # Input pixel paired with kernel[i, j] (the kernel is flipped)
                    y, x = m + ch - i, n + cw - j
                    # Pixels outside the image are treated as zero
                    if 0 <= y < Hi and 0 <= x < Wi:
                        total += kernel[i, j] * image[y, x]
            out[m, n] = total
    ### END YOUR CODE

    return out
```
Result
2) zero_pad
```python
def zero_pad(image, pad_height, pad_width):
    """ Zero-pad an image.

    Ex: a 1x1 image [[1]] with pad_height = 1, pad_width = 2 becomes:

        [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]         of shape (3, 5)

    Args:
        image: numpy array of shape (H, W).
        pad_width: width of the zero padding (left and right padding).
        pad_height: height of the zero padding (bottom and top padding).

    Returns:
        out: numpy array of shape (H+2*pad_height, W+2*pad_width).
    """
    H, W = image.shape
    out = None

    ### YOUR CODE HERE
    out = np.zeros((H+2*pad_height, W+2*pad_width))
    out[pad_height: H+pad_height, pad_width: W+pad_width] = image
    ### END YOUR CODE

    return out
```
Result
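NumPy's built-in `np.pad` in constant mode produces the same padding. This sketch restates the `zero_pad` logic above so that it runs on its own, then checks that the two agree. The random test image is arbitrary.

```python
import numpy as np

def zero_pad(image, pad_height, pad_width):
    """Same logic as zero_pad above: copy the image into the center of a zero array."""
    H, W = image.shape
    out = np.zeros((H + 2 * pad_height, W + 2 * pad_width))
    out[pad_height:H + pad_height, pad_width:W + pad_width] = image
    return out

example = zero_pad(np.array([[1.0]]), 1, 2)   # the 1x1 example from the docstring

rng = np.random.default_rng(0)
img = rng.random((4, 6))
matches = np.array_equal(zero_pad(img, 2, 3),
                         np.pad(img, ((2, 2), (3, 3)), mode="constant", constant_values=0))
```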
3) conv_fast
```python
def conv_fast(image, kernel):
    """ An efficient implementation of convolution filter.

    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.

    Hints:
        - Use the zero_pad function you implemented above
        - There should be two nested for-loops
        - You may find np.flip() and np.sum() useful

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk). Dimensions will be odd.

    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    ### YOUR CODE HERE
    # Pad by half the kernel size so every output pixel has a full neighborhood
    image = zero_pad(image, Hk//2, Wk//2)
    # Flip the kernel vertically and horizontally (this is what makes it a convolution)
    kernel = np.flip(kernel, 0)
    kernel = np.flip(kernel, 1)
    for m in range(Hi):
        for n in range(Wi):
            # Weighted sum of the (Hk, Wk) neighborhood centered on (m, n)
            out[m, n] = np.sum(image[m: m+Hk, n: n+Wk] * kernel)
    ### END YOUR CODE

    return out

# def conv_faster(image, kernel):
#     """
#     Args:
#         image: numpy array of shape (Hi, Wi)
#         kernel: numpy array of shape (Hk, Wk)
#
#     Returns:
#         out: numpy array of shape (Hi, Wi)
#     """
#     Hi, Wi = image.shape
#     Hk, Wk = kernel.shape
#     out = np.zeros((Hi, Wi))
#
#     ### YOUR CODE HERE
#     image = zero_pad(image, Hk//2, Wk//2)
#     kernel = np.flip(np.flip(kernel, 0), 1)
#     # The trick is to lay out all the (Hk, Wk) patches and organize them into a (Hi*Wi, Hk*Wk) matrix.
#     # Also consider the kernel as (Hk*Wk, 1) vector. Then the convolution naturally reduces to a matrix multiplication.
#     mat = np.zeros((Hi*Wi, Hk*Wk))
#     for i in range(Hi*Wi):
#         row = i // Wi
#         col = i % Wi
#         mat[i, :] = image[row: row+Hk, col: col+Wk].reshape(1, Hk*Wk)
#     out = mat.dot(kernel.reshape(Hk*Wk, 1)).reshape(Hi, Wi)
#     ### END YOUR CODE
#
#     return out
```
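The commented-out `conv_faster` above describes the im2col trick. The sketch below is a runnable version of the same idea. It assumes NumPy >= 1.20 for `numpy.lib.stride_tricks.sliding_window_view`, a swapped-in helper that is not part of the assignment. The patch matrix of shape (Hi*Wi, Hk*Wk) is built without a Python loop, and the result is compared against a plain two-loop version. Both function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def conv_im2col(image, kernel):
    """Hypothetical im2col convolution: one matrix-vector product, same output size.

    Assumes odd kernel dimensions, like conv_fast.
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = kernel[::-1, ::-1]
    # windows[m, n] is the (Hk, Wk) patch centered on output pixel (m, n)
    windows = sliding_window_view(padded, (Hk, Wk))
    # Lay the patches out as rows of a (Hi*Wi, Hk*Wk) matrix
    mat = windows.reshape(Hi * Wi, Hk * Wk)
    # Multiply by the flattened kernel and fold the result back into an image
    return (mat @ flipped.reshape(Hk * Wk)).reshape(Hi, Wi)

def conv_loop(image, kernel):
    """Reference: zero-pad, flip, and take one weighted sum per pixel."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 9))
kernel = rng.random((3, 5))
same_result = np.allclose(conv_im2col(image, kernel), conv_loop(image, kernel))
```

This design trades memory for speed: the patch matrix holds Hk*Wk values per pixel, but the whole sum becomes one BLAS-backed product.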
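Zero padding makes the convolution faster to compute. Each pixel no longer needs a bounds check for every kernel element, so each output pixel becomes a single element-wise multiply-and-sum over a padded patch.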
[Algorithm of the faster version]
1. Zero-pad an image.
2. Flip the kernel horizontally and vertically.
3. Compute weighted sum of the neighbor at each pixel.
np.flip(image, axis)
- axis = 0 : flipud() (vertical flipping)
- axis = 1 : fliplr() (horizontal flipping)
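Here is a quick check of the axis convention on a small, arbitrary array. Flipping along both axes is the same as a 180-degree rotation, which is the flip convolution applies to the kernel.

```python
import numpy as np

k = np.array([[1, 2, 3],
              [4, 5, 6]])

vertical = np.flip(k, 0)          # same as np.flipud(k): rows reversed
horizontal = np.flip(k, 1)        # same as np.fliplr(k): columns reversed
both = np.flip(np.flip(k, 0), 1)  # both flips, i.e. a 180-degree rotation

print(vertical)    # [[4 5 6]
                   #  [1 2 3]]
print(horizontal)  # [[3 2 1]
                   #  [6 5 4]]
print(both)        # [[6 5 4]
                   #  [3 2 1]]
```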
Result
In practice, conv_fast runs much faster than conv_nested.
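Here is a minimal timing sketch. It restates the two functions so it runs on its own, with `np.pad` standing in for `zero_pad`, and uses an arbitrary random 40×50 image with a 5×5 kernel. Absolute times depend on the machine. Both functions produce the same output; the fast version replaces the two innermost Python loops with one vectorized multiply-and-sum.

```python
import time
import numpy as np

def conv_nested(image, kernel):
    """Four nested loops; pixels outside the image count as zero."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            total = 0.0
            for i in range(Hk):
                for j in range(Wk):
                    y, x = m + Hk // 2 - i, n + Wk // 2 - j
                    if 0 <= y < Hi and 0 <= x < Wi:
                        total += kernel[i, j] * image[y, x]
            out[m, n] = total
    return out

def conv_fast(image, kernel):
    """Zero-pad once, flip the kernel, then one multiply-and-sum per pixel."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = np.flip(kernel)
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

rng = np.random.default_rng(0)
image = rng.random((40, 50))
kernel = rng.random((5, 5))

results = {}
for fn in (conv_nested, conv_fast):
    start = time.perf_counter()
    results[fn.__name__] = fn(image, kernel)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")

# A centered delta kernel must return the image unchanged
delta = np.zeros((5, 5))
delta[2, 2] = 1.0
```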
Part 2 : Cross-Correlation
Cross-correlation : same as convolution, except that the filter kernel is not flipped. It can be used for template matching. Template g is multiplied with regions of a larger image f to measure how similar each region is to the template.
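Here is a small sketch of template matching with raw cross-correlation. It uses a hypothetical `correlate_same` helper and a synthetic image in which the template is planted on a zero background. The response peaks at the template's center.

```python
import numpy as np

def correlate_same(f, g):
    """Hypothetical helper: same-size cross-correlation with zero padding (g is NOT flipped)."""
    Hg, Wg = g.shape
    padded = np.pad(f, ((Hg // 2, Hg // 2), (Wg // 2, Wg // 2)))
    out = np.zeros(f.shape)
    for m in range(f.shape[0]):
        for n in range(f.shape[1]):
            # Similarity between the template and the patch centered on (m, n)
            out[m, n] = np.sum(padded[m:m + Hg, n:n + Wg] * g)
    return out

template = np.array([[1.0, 2.0, 1.0],
                     [2.0, 4.0, 2.0],
                     [1.0, 2.0, 1.0]])
scene = np.zeros((10, 10))
scene[5:8, 3:6] = template          # template centered at (6, 4)

scores = correlate_same(scene, template)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(scores), scores.shape))
```

On a zero background this works. The raw template's weakness shows up on real images with bright regions, as discussed below.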
[filters.py]
1) cross_correlation
```python
def cross_correlation(f, g):
    """ Cross-correlation of image f and template g.

    Hint: use the conv_fast function defined above.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """
    out = None
    ### YOUR CODE HERE
    g = np.flip(np.flip(g, 0), 1)
    out = conv_fast(f, g)
    ### END YOUR CODE

    return out
```
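`g = np.flip(np.flip(g, 0), 1)`: a horizontal flip of a vertical flip of g
▷ As a result, g is flipped both vertically and horizontally.
`out = conv_fast(f, g)`: convolution of f and the flipped g
▷ Convolution flips the kernel vertically and horizontally again, undoing the first flip.
▷ As a result, f is multiplied with the original, unflipped g.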
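First, convert shelf.jpg (f) and template.jpg (g) to grayscale, then cross-correlate the two images.
Reason for using grayscale: for classification, grayscale is sufficient. It also turns each image into a single 2D array, which is the input shape these functions expect.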
(Figures: shelf.jpg (f) | template.jpg (g))
Result
Q : How does the output of cross-correlation filter look? Explain what problems there might be with using a raw template as a filter.
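A: The template is not detected correctly. So many regions are strongly correlated with it that the result is inaccurate. Because every template pixel is non-negative, the raw score grows with the brightness of a region, so bright areas score high whether or not they resemble the template.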
→ Solution : zero-mean cross-correlation
2) zero_mean_cross_correlation
```python
def zero_mean_cross_correlation(f, g):
    """ Zero-mean cross-correlation of image f and template g.

    Subtract the mean of g from g so that its mean becomes zero.

    Hint: you should look up useful numpy functions online for calculating the mean.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """
    out = None
    ### YOUR CODE HERE
    g = g - np.mean(g)
    out = cross_correlation(f, g)
    ### END YOUR CODE

    return out
```
Result
With zero-mean cross-correlation, the template was detected correctly.
With appropriate scaling and thresholding, the response can be used to check whether the product is on the shelf:
```python
def check_product_on_shelf(shelf, product):
    out = zero_mean_cross_correlation(shelf, product)

    # Scale output by the size of the template
    out = out / float(product.shape[0]*product.shape[1])

    # Threshold output (this is arbitrary, you would need to tune the threshold for a real application)
    out = out > 0.025

    if np.sum(out) > 0:
        print('The product is on the shelf')
    else:
        print('The product is not on the shelf')
```
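Why does zero-mean cross-correlation detect the template accurately? Subtracting the mean makes the template's entries sum to zero, so a region of constant brightness c scores $c \sum g' = 0$. Uniformly bright regions therefore no longer produce large responses. A region scores high only when it is bright where the template is above its mean and dark where it is below.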
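The effect shows up on two 3×3 patches scored against an arbitrary, made-up template. With the raw template, a uniformly bright patch outscores an exact match. With the zero-mean template, the flat patch scores 0 and the match wins.

```python
import numpy as np

template = np.array([[0.2, 0.9, 0.2],
                     [0.9, 0.9, 0.9],
                     [0.2, 0.9, 0.2]])   # a bright "plus" shape on a darker background
bright_flat = np.full((3, 3), 1.0)       # a uniformly bright patch (not a match)
match = template.copy()                  # a patch that looks exactly like the template

def score(patch, g):
    """Correlation score of one patch against template g (sum of element-wise products)."""
    return np.sum(patch * g)

g0 = template - template.mean()          # zero-mean template: its entries sum to 0

print("raw:      ", score(bright_flat, template), score(match, template))
print("zero-mean:", score(bright_flat, g0), score(match, g0))
```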
3) normalized_cross_correlation
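```python
def normalized_cross_correlation(f, g):
    """ Normalized cross-correlation of image f and template g.

    Normalize the subimage of f and the template g at each step
    before computing the weighted sum of the two.

    Hint: you should look up useful numpy functions online for calculating
    the mean and standard deviation.

    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).

    Returns:
        out: numpy array of shape (Hf, Wf).
    """
    Hf, Wf = f.shape
    Hg, Wg = g.shape
    out = np.zeros((Hf, Wf))
    ### YOUR CODE HERE
    # Normalize the template once: zero mean, unit standard deviation
    g = (g - np.mean(g)) / np.std(g)
    # Pad f so every output pixel has a full (Hg, Wg) subimage around it
    f = zero_pad(f, Hg // 2, Wg // 2)
    for m in range(Hf):
        for n in range(Wf):
            patch = f[m:m + Hg, n:n + Wg]
            std = np.std(patch)
            # Normalize each subimage separately; a flat subimage (std 0) keeps score 0
            if std > 0:
                out[m, n] = np.sum((patch - np.mean(patch)) / std * g)
    ### END YOUR CODE

    return out
```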
Zero-mean cross-correlation is not robust to changes in lighting conditions (e.g., shading).
→ Solution: normalized cross-correlation (normalize the pixels of each image patch and of the template at every step before comparing them)
Result
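The sketch below illustrates the lighting argument. It compares single-patch scores for a random template and a dimmed, offset copy of it, using hypothetical helpers `ncc_score` and `zero_mean_score`. The zero-mean score halves along with the contrast. The normalized score stays at its maximum, which equals the number of pixels in the template.

```python
import numpy as np

def ncc_score(patch, g):
    """Normalized cross-correlation of one patch with template g (same shape).

    Both are converted to zero mean and unit standard deviation before the
    element-wise products are summed; a flat patch (std 0) scores 0.
    """
    patch_std = np.std(patch)
    if patch_std == 0:
        return 0.0
    p = (patch - np.mean(patch)) / patch_std
    t = (g - np.mean(g)) / np.std(g)
    return float(np.sum(p * t))

def zero_mean_score(patch, g):
    """Zero-mean cross-correlation score of one patch with template g."""
    return float(np.sum(patch * (g - np.mean(g))))

rng = np.random.default_rng(1)
template = rng.random((4, 5))
darker = 0.5 * template + 0.1    # same pattern under dimmer, offset lighting

print(zero_mean_score(template, template), zero_mean_score(darker, template))
print(ncc_score(template, template), ncc_score(darker, template))
```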
Part 3 : Separable Filters
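In separable convolution, the kernel operation can be split into several steps. Write a convolution as y = conv(x, k), where x is the input image, y is the output image, and k is the kernel. Suppose k can be computed as k = k1.dot(k2), the outer product of a column vector k1 and a row vector k2. Then two 1D convolutions with k1 and k2 give the same result as one 2D convolution with k, which is why this is called separable convolution.
In other words, separable convolution simply splits the kernel into two smaller kernels. A 3×3 kernel normally needs 9 multiplications per output pixel. Separable convolution gets the same effect with two convolutions of 3 multiplications each, 6 in total. Fewer multiplications mean lower computational cost, so the network can run faster.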
(Source: https://eehoeskrap.tistory.com/431)
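One common way to test whether a kernel is separable, though not something the assignment requires, is to check that it has rank 1. The two 1D factors can then be read off its singular value decomposition. Here is a sketch with NumPy, using the same 5×5 Gaussian kernel as below.

```python
import numpy as np

kernel = np.array([[1, 4, 6, 4, 1],
                   [4, 16, 24, 16, 4],
                   [6, 24, 36, 24, 6],
                   [4, 16, 24, 16, 4],
                   [1, 4, 6, 4, 1]], dtype=float)

# A 2D kernel is separable exactly when it has rank 1
rank = np.linalg.matrix_rank(kernel)

# The top singular triple gives the two 1D factors: kernel = S[0] * u0 v0^T
U, S, Vt = np.linalg.svd(kernel)
k1 = (U[:, 0] * np.sqrt(S[0])).reshape(-1, 1)   # column filter, shape (5, 1)
k2 = (Vt[0, :] * np.sqrt(S[0])).reshape(1, -1)  # row filter, shape (1, 5)

# Multiplications per output pixel: one 2D pass vs. two 1D passes
Hk, Wk = kernel.shape
cost_2d, cost_separable = Hk * Wk, Hk + Wk
print(rank, np.allclose(k1 * k2, kernel), cost_2d, cost_separable)
```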
1) Normal Convolution (2D)
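```python
import numpy as np
from time import time

from filters import conv_nested  # conv_nested from filters.py above

# img: the grayscale test image loaded earlier in the notebook

# 5x5 Gaussian blur
kernel = np.array([
    [1, 4, 6, 4, 1],
    [4, 16, 24, 16, 4],
    [6, 24, 36, 24, 6],
    [4, 16, 24, 16, 4],
    [1, 4, 6, 4, 1]
])

t0 = time()
out = conv_nested(img, kernel)
t1 = time()
t_normal = t1 - t0
```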
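Gaussian kernel: widely used for blurring images. The 5×5 kernel above is the outer product of the binomial row [1, 4, 6, 4, 1] with itself, a common integer approximation of a Gaussian. Its entries sum to 256, so dividing by 256 keeps the overall brightness unchanged.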
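The binomial row itself can be generated by repeatedly convolving [1, 1] with itself, which produces the rows of Pascal's triangle. Here is a short sketch that rebuilds the 5×5 kernel above and normalizes it.

```python
import numpy as np

# Repeatedly convolving [1, 1] with itself gives binomial coefficients
row = np.array([1, 1])
for _ in range(3):
    row = np.convolve(row, [1, 1])
print(row)                       # [1 4 6 4 1]

gaussian_5x5 = np.outer(row, row)
print(gaussian_5x5.sum())        # 256

# Dividing by the sum makes the weights add up to 1, preserving brightness
normalized = gaussian_5x5 / gaussian_5x5.sum()
```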
Result
2) Separable Convolution (two 1D)
```python
# Same notebook: np, time, conv_nested, img and kernel come from the previous cell

# The kernel can be written as outer product of two 1D filters
k1 = None  # shape (5, 1)
k2 = None  # shape (1, 5)
### YOUR CODE HERE
k1 = np.array([[1], [4], [6], [4], [1]])
k2 = np.array([[1, 4, 6, 4, 1]])
### END YOUR CODE

# Check if kernel is product of k1 and k2
# ((5, 1) * (1, 5) broadcasts to the (5, 5) outer product)
if not np.all(k1 * k2 == kernel):
    print('k1 * k2 is not equal to kernel')

assert k1.shape == (5, 1), "k1 should have shape (5, 1)"
assert k2.shape == (1, 5), "k2 should have shape (1, 5)"

# Perform two convolutions using k1 and k2
t0 = time()
out_separable = conv_nested(img, k1)
out_separable = conv_nested(out_separable, k2)
t1 = time()
t_separable = t1 - t0
```
Result
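Separable convolution is clearly faster than the normal 2D convolution. With this 5×5 kernel, each output pixel needs 25 multiplications in the 2D case but only 5 + 5 = 10 with the two 1D passes.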
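The two 1D passes also reproduce the 2D result exactly, not just faster. Here is a sketch with a hypothetical vectorized `conv_same` helper, zero padding, and a random test image. With zero padding, the equality holds at the borders too, because each 1D pass only mixes pixels along one axis.

```python
import numpy as np

def conv_same(image, kernel):
    """Hypothetical helper: same-size convolution with zero padding (odd kernel dimensions)."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = np.flip(kernel)
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

k1 = np.array([[1], [4], [6], [4], [1]], dtype=float)   # (5, 1) column filter
k2 = np.array([[1, 4, 6, 4, 1]], dtype=float)           # (1, 5) row filter
kernel = k1 * k2                                        # (5, 5) outer product

rng = np.random.default_rng(0)
img = rng.random((20, 30))

out_2d = conv_same(img, kernel)                        # one 2D pass
out_separable = conv_same(conv_same(img, k1), k2)      # vertical pass, then horizontal pass
print(np.allclose(out_2d, out_separable))
```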