**Abstract**

In this article, we develop a method of linear and exponential quantization of neural network weights. We improve it by maximizing the correlation between the initial and quantized weights, taking into account the weight density distribution in each layer. We perform the quantization after neural network training, without subsequent post-training, and compare our algorithm with plain linear and exponential quantization. The quality of the VGG-16 neural network is already satisfactory (top-5 accuracy of 76%) with 3-bit exponential quantization. The ResNet50 and Xception neural networks reach top-5 accuracies of 79% and 61%, respectively, at 4 bits.
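As a point of reference for the two baseline schemes mentioned above, the sketch below quantizes a synthetic Gaussian weight sample onto a uniformly spaced (linear) grid and a geometrically spaced (exponential) grid and reports the correlation with the original weights. This is an illustration only: the function names, the base-2 spacing, and the synthetic weights are assumptions, not the paper's code.

```python
import numpy as np

def linear_levels(w_max, bits):
    """Uniformly spaced quantization levels in [-w_max, w_max] (no zero level)."""
    n = 2 ** (bits - 1)
    step = w_max / n
    return np.concatenate((-step * np.arange(n, 0, -1), step * np.arange(1, n + 1)))

def exponential_levels(w_max, bits, base=2.0):
    """Geometrically spaced levels +/- w_max / base**k: denser near zero."""
    n = 2 ** (bits - 1)
    mags = w_max / base ** np.arange(n)
    return np.concatenate((-mags, mags[::-1]))

def quantize(w, levels):
    """Snap each weight to the nearest available level."""
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, 1000)           # synthetic layer weights
for name, levels in [("linear", linear_levels(np.abs(w).max(), 3)),
                     ("exponential", exponential_levels(np.abs(w).max(), 3))]:
    q = quantize(w, levels)
    print(name, "correlation:", round(np.corrcoef(w, q)[0, 1], 3))
```

Because the exponential grid concentrates levels near zero, where most weights of a trained layer lie, it tends to track the weight distribution better at the same bit width.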



## CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest.

## Funding

The work was financially supported by the Russian Foundation for Basic Research, project no. 19-29-03030.


## APPENDIX A

```python
# accessory functions
import numpy as np
from scipy.optimize import minimize


def f(x, func, kde, X, x_min, x_max):
    y, px, cov, p = func(x, kde, X, x_min, x_max)
    return cov


def grad(x, func, kde, X, x_min, x_max, alpha=10):
    y, px, cov, p = func(x, kde, X, x_min, x_max)
    step = alpha * px * (y[1:] - y[:-1]) * (y[1:] + y[:-1] - 2 * x) / 2
    return step


def cov_kde(x0, kde, X, x_min, x_max):
    '''
    Calculate the distribution function, the quantized values and the
    covariation on the set x0.
    X -- weights in this layer,
    x_min and x_max -- minimal and maximal weight values in the layer,
    x0 -- current set (variable values only).
    '''
    p = np.zeros(len(x0) + 1)
    C = np.zeros(len(x0) + 1)
    y = np.zeros(len(x0) + 1)
    x_ext = sorted(np.append(x0, [x_min, x_max]))
    for i in range(len(x_ext) - 1):
        mask = np.logical_and(x_ext[i] < X, X <= x_ext[i + 1])
        p[i] = len(X[mask])
        C[i] = np.sum(X[mask])
        if p[i] == 0:  # empty bin: avoid division by zero
            C[i] = 0
            p[i] = 1
    y = C / p
    px = kde.evaluate(x0)
    cov = np.linalg.norm(C / np.sqrt(p))  # / sigma_kde
    return y, px, cov, p


def results(kde, w, x0, x_min, x_max, func, bits, kde_std, ans_case='CG'):
    '''
    Correlation maximization procedure for the initial set x0
    (variable values only).
    w -- layer weights,
    kde -- kernel density estimation on a random sample from the weights,
    x_min and x_max -- minimal and maximal weight values,
    ans_case -- optimization method passed to scipy.optimize.minimize.
    '''
    n_d = 2 ** bits
    alpha = 10
    tol_curr = 1e-4
    fx = lambda x: -f(x, func, kde, w, x_min, x_max)
    gradx = lambda x: -grad(x, func, kde, w, x_min, x_max, alpha)
    ans = minimize(fun=fx, x0=x0, jac=gradx, method=ans_case, tol=tol_curr)
    solutions = ans['x']
    correlations = -ans['fun']
    gradients = np.linalg.norm(gradx(ans['x'])) / alpha / n_d
    return solutions, correlations, gradients
```
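The appendix functions are not self-contained: they expect a trained layer's weights, a fitted KDE, and a starting set of bin boundaries. For a runnable end-to-end illustration of the same correlation-maximization idea, here is a compact sketch on synthetic Gaussian weights, where the inner bin boundaries are optimized directly with `scipy.optimize.minimize`; `quantize` and `neg_corr` are illustrative names, not part of the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.05, 10_000)   # synthetic "layer weights"
x_min, x_max = X.min(), X.max()
bits = 3
n_levels = 2 ** bits

def quantize(thresholds, X):
    # map every weight to the mean of its bin; bins are set by the thresholds
    edges = np.concatenate(
        ([x_min], np.clip(np.sort(thresholds), x_min, x_max), [x_max]))
    q = np.zeros_like(X)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (lo <= X) & (X <= hi)
        if mask.any():
            q[mask] = X[mask].mean()
    return q

def neg_corr(thresholds):
    # objective: the negative correlation between X and its quantized copy
    return -np.corrcoef(X, quantize(thresholds, X))[0, 1]

# start from uniformly spaced inner bin boundaries
x0 = np.linspace(x_min, x_max, n_levels + 1)[1:-1]
res = minimize(neg_corr, x0, method='Nelder-Mead', options={'maxiter': 200})
print(f"correlation after optimization: {-res.fun:.3f}")
```

A gradient-free method is used here because this toy objective is piecewise constant in the thresholds; the appendix code instead supplies an analytic gradient built from the KDE, which is what makes the 'CG' method applicable.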

## About this article

### Cite this article

Pushkareva, M.M. and Karandashev, I.M., Exponential Discretization of Weights of Neural Network Connections in Pre-Trained Neural Network. Part II: Correlation Maximization, *Opt. Mem. Neural Networks*, 2020, vol. 29, pp. 179–186. https://doi.org/10.3103/S1060992X20030042


### Keywords

- weight quantization
- correlation maximization
- exponential quantization
- neural network
- neural network compression
- reduction of bit depth of weights