In this article, we develop a method of linear and exponential quantization of neural network weights. We improve it by maximizing the correlation between the initial and quantized weights, taking into account the weight density distribution in each layer. We perform the quantization after the neural network has been trained, without any subsequent post-training, and compare our algorithm with plain linear and exponential quantization. The quality of the VGG-16 neural network is already satisfactory (top-5 accuracy of 76%) with 3-bit exponential quantization. The ResNet50 and Xception neural networks show top-5 accuracies of 79% and 61%, respectively, at 4 bits.
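As background for the two baseline schemes compared above, a minimal sketch (not the paper's exact implementation; the function names and the power-of-two spacing are illustrative assumptions) of linear versus exponential weight quantization:

```python
import numpy as np

def quantize_linear(w, bits):
    # linear quantization: 2**bits levels spaced uniformly
    # between the minimal and maximal weight of the layer
    levels = np.linspace(w.min(), w.max(), 2 ** bits)
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

def quantize_exponential(w, bits):
    # exponential quantization: levels spaced geometrically
    # (here powers of two, an illustrative choice) in each sign half
    n = 2 ** (bits - 1)                      # levels per sign
    mags = np.abs(w).max() * 2.0 ** (-np.arange(n))
    levels = np.concatenate([-mags, mags])
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

w = np.random.default_rng(0).normal(0.0, 0.05, size=1000)
wq = quantize_exponential(w, 3)
assert len(np.unique(wq)) <= 2 ** 3          # at most 8 distinct values
```

With geometric spacing, small weights (which dominate a typical layer's density) receive finer resolution than with a uniform grid, which is the motivation for the exponential scheme.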
CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.
FUNDING
The work was financially supported by the Russian Foundation for Basic Research, project no. 19-29-03030.
# accessory functions
import numpy as np
from scipy.optimize import minimize


def f(x, func, kde, X, x_min, x_max):
    # objective: the covariance between the initial and quantized weights
    y, px, cov, p = func(x, kde, X, x_min, x_max)
    return cov


def grad(x, func, kde, X, x_min, x_max, alpha=10):
    # gradient of the objective with respect to the thresholds x
    y, px, cov, p = func(x, kde, X, x_min, x_max)
    step = alpha * px * (y[1:] - y[:-1]) * (y[1:] + y[:-1] - 2 * x) / 2
    return step


def cov_kde(x0, kde, X, x_min, x_max):
    """Calculate the distribution function, quantized values and covariance
    on the set x0.

    X -- weights in this layer,
    x_min and x_max -- minimal and maximal weight values in the layer,
    x0 -- current set of thresholds (variable values only).
    """
    p = np.zeros(len(x0) + 1)  # number of weights in each bin
    C = np.zeros(len(x0) + 1)  # sum of the weights in each bin
    x_ext = sorted(np.append(x0, [x_min, x_max]))
    for i in range(len(x_ext) - 1):
        mask = np.logical_and(x_ext[i] < X, X <= x_ext[i + 1])
        p[i] = len(X[mask])
        C[i] = np.sum(X[mask])
        if p[i] == 0:  # empty bin: avoid division by zero
            C[i] = 0
            p[i] = 1
    y = C / p                  # quantized value of each bin is its mean
    px = kde.evaluate(x0)      # weight density at the thresholds
    cov = np.linalg.norm(C / np.sqrt(p))  # / sigma_kde
    return y, px, cov, p


def results(kde, w, x0, x_min, x_max, func, bits, kde_std, ans_case='CG'):
    """Correlation maximization procedure for the initial set x0
    (variable values only).

    w -- weights of the layer,
    kde -- kernel density estimation on a random sample from the weights,
    x_min and x_max -- minimal and maximal weight values.
    """
    n_d = 2 ** bits
    alpha = 10
    tol_curr = 1e-4
    fx = lambda x: -f(x, func, kde, w, x_min, x_max)
    gradx = lambda x: -grad(x, func, kde, w, x_min, x_max, alpha)
    ans = minimize(fun=fx, x0=x0, jac=gradx, method=ans_case, tol=tol_curr)
    solutions = ans['x']
    correlations = -ans['fun']
    gradients = np.linalg.norm(gradx(ans['x'])) / alpha / n_d
    return solutions, correlations, gradients
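The procedure above expects a fitted kernel density estimate and an initial set of thresholds. A self-contained sketch of preparing those inputs (the subsample size, the quantile-based initialization, and all variable names here are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=20000)   # stand-in for one layer's weights
x_min, x_max = w.min(), w.max()

# kernel density estimation on a random subsample of the weights
kde = gaussian_kde(rng.choice(w, size=2000, replace=False))

# initial thresholds: the 2**bits - 1 interior quantiles of the
# weight distribution (variable values only; x_min and x_max are fixed)
bits = 3
x0 = np.quantile(w, np.linspace(0, 1, 2 ** bits + 1)[1:-1])

px = kde.evaluate(x0)                   # density at the thresholds
assert x0.shape == (2 ** bits - 1,)
```

Quantile initialization starts the optimizer with bins of equal weight count, which is a natural starting point for the density-aware correlation maximization.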
Pushkareva, M.M., Karandashev, I.M. Exponential Discretization of Weights of Neural Network Connections in Pre-Trained Neural Network. Part II: Correlation Maximization. Opt. Mem. Neural Networks 29, 179–186 (2020). https://doi.org/10.3103/S1060992X20030042
Keywords:
- weight quantization
- correlation maximization
- exponential quantization
- neural network
- neural network compression
- reduction of bit depth of weights