
Patch Partition

Source: 易榕旅网

1️⃣ Code

import numpy as np

def patch_partition(image, patch_size=4):
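    """
    Split an image into non-overlapping patch_size x patch_size patches and flatten each patch into a vector.
    The input is assumed to be a 3-channel RGB image of shape [H, W, 3].

    :param image: input image of shape [H, W, 3]
    :param patch_size: side length of each square patch, default 4 (4x4 patches)
    :return: partitioned image of shape [H/patch_size, W/patch_size, patch_size*patch_size*C]
             (48 values per patch for the default 4x4 RGB case)
    """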
    H, W, C = image.shape  # H: height, W: width, C: channels (RGB)

    # Check that H and W divide evenly into patches
    assert H % patch_size == 0 and W % patch_size == 0, "H and W must be divisible by patch_size"

    # List that collects the flattened patches
    patches = []

    # Walk over the image in steps of patch_size
    for i in range(0, H, patch_size):
        for j in range(0, W, patch_size):
            # Extract the current patch
            patch = image[i:i+patch_size, j:j+patch_size, :]
            # Flatten the patch into a vector of length patch_size * patch_size * C
            patch_flattened = patch.reshape(-1)
            # Append the flattened patch to the list
            patches.append(patch_flattened)

    # Stack the patches into a numpy array of shape [num_patches, patch_size*patch_size*C]
    patches = np.array(patches)
    # Reshape to [H/patch_size, W/patch_size, patch_size*patch_size*C]
    patches = patches.reshape(H // patch_size, W // patch_size, patch_size * patch_size * C)
    
    return patches

#-------------------------------------------------
# Create an RGB test image (8x8 here)
H, W = 8, 8
image = np.zeros((H, W, 3), dtype=np.uint8)
# Fill the three channels with distinct value ranges
# Red channel: 0-63
image[:, :, 0] = np.arange(0, 64).reshape((H, W))
# Green channel: 64-127
image[:, :, 1] = np.arange(64, 128).reshape((H, W))
# Blue channel: 128-191
image[:, :, 2] = np.arange(128, 192).reshape((H, W))
# Print the original image, one channel at a time
print("Original image shape:", image.shape)
print(image[:, :, 0])
print(image[:, :, 1])
print(image[:, :, 2])


#-------------------------------------------------
# Run Patch Partition
patches = patch_partition(image, patch_size=4)

#-------------------------------------------------
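# Print the shape and contents after Patch Partition
print("\nPatches shape:", patches.shape)
# Each iteration prints a 2x2 grid: the i-th value of every patch vector
for i in range(patches.shape[-1]):
    print(patches[:, :, i])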
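
The loop-based function above can also be written without Python loops. The following is a minimal sketch, not part of the original post: the name `patch_partition_vectorized` is hypothetical, and it assumes the same `[H, W, C]` layout and the same flattening order (row by row inside a patch, channel values of each pixel adjacent).

```python
import numpy as np

def patch_partition_vectorized(image, patch_size=4):
    """Loop-free variant of patch_partition (illustrative name, assumed layout [H, W, C])."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0, "H and W must be divisible by patch_size"
    h, w = H // patch_size, W // patch_size
    # Split each spatial axis into (patch index, offset inside the patch)
    x = image.reshape(h, patch_size, w, patch_size, C)
    # Move both patch-index axes to the front: [h, w, patch_size, patch_size, C]
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each patch in row-major order, with the channel axis varying fastest
    return x.reshape(h, w, patch_size * patch_size * C)

# Rebuild the 8x8 test image: R = 0-63, G = 64-127, B = 128-191
image = np.stack(
    [np.arange(c * 64, (c + 1) * 64).reshape(8, 8) for c in range(3)], axis=-1
).astype(np.uint8)

patches_vec = patch_partition_vectorized(image, patch_size=4)
print(patches_vec.shape)        # (2, 2, 48)
print(patches_vec[0, 0, :6])    # [  0  64 128   1  65 129]
```

The reshape and transpose only rearrange axes, so for large images this should avoid the per-patch Python overhead of the nested loops while producing the same array.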

2️⃣ Explained with one figure

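1. Splitting into patches:

The input image is 8×8 and the patch size is 4×4.
The image is therefore split into $\frac{8}{4}\times\frac{8}{4}=2\times2$ patches, as shown in the figure above.
Each patch is 4×4 pixels and every pixel has 3 channels (RGB), so flattening a patch with the three channel values of each pixel kept adjacent gives a vector of size 3×16=48.
The 8×8 RGB image thus becomes 4 patches, each of which is a 48-dimensional vector.
For example, for the figure above, the first patch is 0 64 128 1 65 129……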
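
The arithmetic above can be checked with a short sketch. It is only an illustration: the variable names `num_patches`, `vector_length`, and `first_patch` are assumptions introduced here, and the test image is filled the same way as in the code section (R = 0-63, G = 64-127, B = 128-191).

```python
import numpy as np

H, W, C, patch_size = 8, 8, 3, 4

# Number of patches and length of each flattened patch vector
num_patches = (H // patch_size) * (W // patch_size)
vector_length = patch_size * patch_size * C
print(num_patches, vector_length)   # 4 48

# Test image: channel c holds the values c*64 .. c*64+63
image = np.zeros((H, W, C), dtype=np.uint8)
for c in range(C):
    image[:, :, c] = np.arange(c * 64, (c + 1) * 64).reshape(H, W)

# Top-left patch, flattened row by row with R, G, B of each pixel adjacent
first_patch = image[:patch_size, :patch_size, :].reshape(-1)
print(first_patch[:6])      # [  0  64 128   1  65 129]  (first two pixels of row 0)
print(first_patch[12:18])   # [  8  72 136   9  73 137]  (first two pixels of row 1)
```

Each patch row contributes 4 pixels × 3 channels = 12 values, which is why the second row of the patch starts at index 12 of the vector.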
