In a Convolutional Neural Network, each layer has its own set of filters, called kernels, which are used to extract features from images. The hyperparameters to tune in a CNN include the number and size of kernels, the stride of the convolution, and the size of the pooling kernels. Kernel size matters because it determines the receptive field of a layer (the size of the input region that contributes to each output feature). Choosing the optimal kernel size is a problem-specific task that depends on many factors, including the characteristics of the input images, the desired output, the features to be extracted, the available computing resources, and the dataset size and resolution.
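The growth of the receptive field with depth can be sketched with the standard recurrence: each layer widens the field by (k − 1) times the current "jump" (the spacing, in input pixels, between adjacent outputs), and stride multiplies the jump. The layer specs below are hypothetical examples, not from any particular architecture.

```python
# Sketch: receptive-field growth through a stack of conv layers.
# (kernel, stride) pairs are illustrative assumptions.
layers = [(3, 1), (3, 2), (5, 1)]

rf = 1    # the receptive field of a raw input pixel is 1x1
jump = 1  # distance in input pixels between adjacent outputs

for k, s in layers:
    rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
    jump *= s             # stride multiplies the effective step size
    print(f"kernel={k}, stride={s} -> receptive field {rf}x{rf}")
```

Running this shows how a stride greater than one amplifies the effect of every later kernel: the final 5x5 kernel, sitting behind a stride-2 layer, adds 8 pixels to the receptive field rather than 4.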
In practice, an odd-sized kernel is preferred because it has a well-defined center pixel, so each output pixel sits directly over a pixel of the input. This reduces distortion between layers, since the previous layer's pixels are symmetric around the output pixel; an even-sized kernel has no center and forces a half-pixel offset.
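One practical consequence: with stride 1, "same" padding of (k − 1)/2 preserves the spatial size exactly only when k is odd. A minimal check using the standard output-size formula for convolution:

```python
# Sketch: "same" padding pad = (k-1)//2 preserves spatial size only for
# odd kernels (stride 1). Uses the standard conv output-size formula.
def same_pad_output_size(n, k):
    pad = (k - 1) // 2
    return n - k + 2 * pad + 1  # output size for stride-1 convolution

for k in (3, 5, 7):
    print(k, same_pad_output_size(32, k))  # odd kernels: stays 32

print(4, same_pad_output_size(32, 4))     # even kernel: shrinks to 31
```

For an even kernel, no integer amount of symmetric padding recovers the input size, which is why frameworks either shift the output by half a pixel or pad asymmetrically.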
Low-level features in an image, such as edges, blobs, and other high-frequency details, are assumed to be local, so they are captured best by small kernels. The extreme case is the 1x1 kernel: the features it extracts are purely local and fine-grained, incorporating no information from neighboring pixels, since it only mixes channels at a single spatial location. High-level features are formed by combining low-level features into representations that can cover the entire scene. In scenarios where large objects and features need to be extracted, we use large kernels to achieve the required receptive field.
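The 1x1 case can be made concrete with a toy example in plain Python (no framework assumed): each output pixel is a weighted sum over the input channels at that same pixel, and nothing else.

```python
# Sketch: a 1x1 convolution mixes channels per pixel, using no spatial
# neighborhood. Toy data: 2 input channels, 1 output channel, 2x2 image.
image = [  # image[channel][row][col]
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[5.0, 6.0], [7.0, 8.0]],  # channel 1
]
weights = [0.5, 0.25]  # one 1x1 kernel: one weight per input channel

out = [[sum(w * image[c][r][col] for c, w in enumerate(weights))
        for col in range(2)]
       for r in range(2)]
print(out)  # each output pixel depends only on that pixel's channels
```

Swapping any neighboring pixel in the input would leave the corresponding output pixel unchanged, which is exactly the "no learning from neighbors" property described above.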
For most applications, the initial layers extract low-level features, and as we go deeper into the network the gap between low-level features and high-level representations narrows. Increasing kernel size increases both the computational cost and the total number of parameters in the network, producing a more complex model for a given problem. Input image characteristics are also an important factor in choosing kernel size: sharp images, for example, need only small kernels to detect edges, whereas blurry images require larger kernels.
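The parameter cost grows quadratically with kernel size, which a one-line count makes obvious. The channel counts below (64 in, 128 out) are illustrative assumptions:

```python
# Sketch: parameter count of a single conv layer grows as k^2.
# Channel counts are illustrative, not from any specific model.
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # weights plus biases

small = conv_params(64, 128, 3)  # 3x3 kernel
large = conv_params(64, 128, 7)  # 7x7 kernel
print(small, large, large / small)
```

Here moving from 3x3 to 7x7 multiplies the layer's parameters by roughly 5.4x, while the receptive field of that single layer only grows from 3 to 7 pixels.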
Over the years, popular CNN architectures have evolved to be deeper, which inflates the parameter count and with it the risk of overfitting. One common way to combat this is to reduce kernel size: a stack of small kernels can cover the same receptive field as a single large kernel while using fewer parameters.
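The trade-off can be sketched numerically: two stacked 3x3 convolutions reach the same 5x5 receptive field as one 5x5 convolution but need fewer weights (biases ignored; the channel count C is an illustrative assumption).

```python
# Sketch: two stacked 3x3 convs vs one 5x5 conv, C channels throughout.
# Weight counts ignore biases; C is an illustrative assumption.
C = 64
two_3x3 = 2 * (C * C * 3 * 3)  # two layers of 3x3 kernels
one_5x5 = C * C * 5 * 5        # a single layer of 5x5 kernels

# Receptive field of the stack at stride 1: 1 + (3-1) + (3-1) = 5
rf = 1
for _ in range(2):
    rf += 3 - 1

print(two_3x3, one_5x5, rf)  # fewer weights, same 5x5 receptive field
```

The stacked version also interposes an extra nonlinearity between the two layers, which is often cited as a second advantage of small kernels in deep networks.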