Since my last post on EfficientNet on LinkedIn, I have been reading further about the building blocks of the EfficientNet architecture. They can be divided into Architecture (Mobile Inverted Residual Block, Squeeze-and-Excitation optimization) and Training (Swish activation / quantization-friendly hard-Swish activation, Stochastic Depth with survival probability, etc.).
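Of these building blocks, Squeeze-and-Excitation is the easiest to sketch in isolation. Here is a minimal NumPy version, just to illustrate the squeeze (global average pool) and excitation (reduce, expand, sigmoid gate) steps; the weight matrices `w1` and `w2` stand in for the learned dense layers and are assumptions of this sketch, not EfficientNet's actual parameters:

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Squeeze-and-Excitation over a feature map x of shape (C, H, W).

    Squeeze: global average pool to a per-channel descriptor.
    Excitation: two small dense layers (reduce then expand) with a
    sigmoid gate, used to rescale each channel of x.
    """
    s = x.mean(axis=(1, 2))                 # squeeze: (C,)
    e = np.maximum(w1 @ s, 0.0)             # reduce + ReLU: (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ e)))  # expand + sigmoid: (C,)
    return x * gate[:, None, None]          # channel-wise rescaling
```

The output has the same shape as the input; the block only reweights channels, which is why it slots into the MBConv block without changing the rest of the architecture.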
The EfficientNet architecture was recently explained very well by Aman Arora, so if you don't want to go through the paper yourself, you should at least read that, and of course last year my friend Connor Shorten published a video explaining the paper.
Fused Convolution, Superpixels, and Conditional Convolution (CondConv) are not mentioned in the paper; they are found only in the code implementation.
An implementation of CondConv is also available in the TensorFlow GitHub repository.
There is also a mobile-friendly EfficientNet architecture, EfficientNet-Lite, which removes the Squeeze-and-Excitation network and replaces the Swish activation with ReLU6, to support post-training quantization and heterogeneous hardware.
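To see why this swap helps quantization, compare the activations directly. A quick NumPy sketch (hard-Swish is included as the ReLU6-based approximation of Swish popularized by MobileNetV3):

```python
import numpy as np

def swish(x):
    # Swish: x * sigmoid(x), used by the original EfficientNet.
    return x / (1.0 + np.exp(-x))

def relu6(x):
    # ReLU6: min(max(x, 0), 6). Its output range is bounded, which
    # makes fixed-point (post-training) quantization straightforward.
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    # Hard-Swish: a piecewise-linear approximation of Swish built
    # entirely from ReLU6, cheap on mobile hardware.
    return x * relu6(x + 3.0) / 6.0
```

Swish is smooth but unbounded below zero and relies on `exp`, while ReLU6 and hard-Swish are built from clamps and multiplies, so their ranges are known ahead of time.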
Criticism and improvements of EfficientNet Architecture:
Here’s an article, How Efficient is EfficientNet?, which argues that EfficientNet’s performance on small datasets leaves much to be desired, and the ML community on Reddit and Twitter seems to agree.
There is also an article from late July, A more parameter-efficient bottleneck for EfficientNet!, which introduces Efficient Channel Attention (ECA). It shows that replacing EfficientNet’s linear-bottleneck Squeeze-and-Excitation (SE) layer with a linear-bottleneck ECA layer gives slightly better accuracy while reducing the number of parameters by 12%.
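The parameter saving is easy to see in a sketch: where SE learns two dense layers over the channel descriptor, ECA gates channels with a single small 1D convolution across neighbouring channels. A minimal NumPy version, with a uniform averaging kernel standing in for the learned 1D kernel (an assumption of this sketch):

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention (ECA) over x of shape (C, H, W).

    Like SE, it gates channels from a global-average-pooled
    descriptor, but instead of two dense layers it slides a single
    1D convolution of kernel size k across the channels, so the
    learned parameter count is just k.
    """
    s = x.mean(axis=(1, 2))                        # squeeze: (C,)
    kernel = np.ones(k) / k                        # stand-in for the learned kernel
    padded = np.pad(s, k // 2, mode="edge")        # keep output length C
    conv = np.convolve(padded, kernel, mode="valid")
    gate = 1.0 / (1.0 + np.exp(-conv))             # sigmoid gate: (C,)
    return x * gate[:, None, None]
```

Because the kernel size is independent of the channel count, the cost stays constant as the network widens, which is where the reported parameter reduction comes from.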