1 Introduction

In the digital age, virtual reality (VR) has defined a new form of human–computer interaction by providing an immersive 3D environment. However, as VR becomes more widespread, users’ expectations for visual quality are also rising [1, 2]. Jagged images and blurred textures in complex scenes degrade the visual experience, while hardware and network limitations can hinder smoothness and clarity [3]. These issues affect user immersion and limit the potential of VR. To address them, it is crucial to study VR visual enhancement [4,5,6,7,8]. By optimizing image rendering and texture mapping, a more realistic virtual world can be provided to meet user needs, promote the development of VR, and expand its applications.

For this reason, it is necessary to study more effective methods for VR scene detail enhancement [9]. Sandoub et al. observed that existing bright-channel-prior and maximum-color-channel enhancement algorithms introduce halo artifacts and color distortion during VR scene detail enhancement, which degrades the overall enhancement quality. To address these challenges, they proposed an effective fusion-based approach for enhancing VR scene details. The method estimates the illuminance of VR scene details from the brightest and largest color channels to mitigate halo artifacts and color distortion, and then applies an effective refinement step to enhance the scene reflectivity of the initial VR scene detail image. Experimental results demonstrate that this method reduces halo artifacts and color distortion while preserving the natural appearance of VR scene details [10]. However, it ignores local-area information and easily loses VR scene detail, which weakens the enhancement effect. Aguirre-Castro et al. first decomposed the input color image into three VR scene components according to the R, G, and B values of each pixel, representing the intensity of reflected light at different wavelengths (long, medium, and short wave) in the VR scene. They then computed the relative light–dark relationship between pixels in the long-, medium-, and short-wave bands to determine the color of each pixel, and finally mapped the colors in Retinex chromaticity space linearly to RGB space to obtain the enhanced VR scene details. The VR scenes obtained by this method exhibit color fidelity and a large dynamic range, and the method achieves a very good VR scene detail enhancement effect [11]. However, although it retains VR scene details well, it does so at the cost of higher computational complexity and exhibits an obvious checkerboard effect. Rao et al. adopted the Bicubic Histogram-Based Image Enhancement (BHBI) technique to enhance the contrast and resolution of VR scene details. The method incorporates a two-pronged strategy: double histogram equalization to amplify contrast and bicubic interpolation to upscale resolution. The double histogram equalization process segments the histogram of the input VR scene at its mean and equalizes the resulting parts independently, while bicubic interpolation generates enlarged or sharper VR scenes from their smaller or lower-resolution counterparts. Experimental findings demonstrate that this approach excels in key metrics such as discrete entropy (DE), image quality index (IQI), normalized correlation coefficient (NCC), contrast improvement index (CII), and absolute mean brightness error (AMBE), as documented in [12]. However, it ignores the overall characteristics of VR scenes, resulting in a lack of hierarchy. Jeong et al. proposed a VR scene detail enhancement method based on maximum gray-frequency suppression and dynamic histogram equalization. They first segmented the VR scene detail image by analyzing the gray distribution within the histogram and, based on the gray range of interest to the user, determined the gray mapping scope and frequency-control threshold for each sub-level VR scene detail image.
Subsequently, they redefined the sub-level histograms using their respective thresholds and employed the histogram equalization algorithm to devise a gray-scale conversion function that mapped each sub-layer’s grayscale to a designated range. The final step synthesized the enhancement outcomes from each sub-layer to produce the ultimate enhanced VR scene detail image. Experimental data indicate that this approach effectively curbs over-enhancement and allows users to control the grayscale within their areas of interest, meeting users’ demands for controllable contrast enhancement in VR scene detail images [13]. Nonetheless, the technique is computationally demanding and may deviate from natural object-imaging principles, potentially leading to issues such as VR scene detail loss and color distortion. Prasath and Kumanan put forth an innovative hybrid technique, termed the population-update dragonfly algorithm, that blends particle swarm optimization (PSO) with the dragonfly algorithm (DA). This method aims to pinpoint the best histogram for VR scene detail amplification, boosting contrast and color richness in VR images. Testing outcomes reveal that the approach has unique benefits in refining VR visuals [14]. However, it cannot guarantee good results in dynamic-range compression and contrast enhancement at the same time. Peng et al. proposed an image enhancement method with adaptive color compensation and detail optimization that considers the attenuation level of each optical channel to guide color correction based on the attenuated images, and introduced a brightness adjustment step to give the output image a natural appearance. The color-corrected images are further processed with gradient-oriented local contrast enhancement and multi-scale edge optimization [15]. However, this method suffers from high computational complexity, as it involves multiple complex image processing steps, including color compensation, local contrast enhancement, and edge optimization, resulting in long processing times, especially for high-resolution images. Vijayalakshmi et al. proposed a two-dimensional histogram equalization enhancement method based on total variation decomposition. Total variation (TV)/L1 decomposition is used to retrieve information from low-contrast images, and detail images created by an iterative method are used to construct a two-dimensional histogram from which the cumulative distribution function (CDF) is determined. The CDF is then used to redistribute intensities across the dynamic range, yielding the enhanced image [16]. However, this method is prone to introducing an unnatural appearance or color shift during processing, reducing the naturalness of the image.

To solve the problems of color shift and high noise in existing visual image enhancement methods, a VR visual detail enhancement method based on a deep reinforcement learning algorithm is proposed. Deep reinforcement learning can handle complex, high-dimensional state and action spaces [17]. VR scene detail enhancement requires processing large amounts of image data and user interaction information, which are typically high-dimensional; deep reinforcement learning algorithms can process these data effectively and improve the detailed representation of the scene.

2 VR Scene Detail Enhancement Method

2.1 VR Scene Denoising

To refine the intricacies of VR scenes, the Total Variation (TV) denoising algorithm is first employed to suppress noise interference. The TV approach decomposes the noisy VR scene \(f\) into two components: a clean scene \(c\) and noise \(n\), i.e., \(f = c + n\). The TV denoising model is mathematically represented as

$$c = \arg \mathop {\min }\limits_{c} \left\{ {\int\limits_{\Omega } {\left| {\nabla c} \right|dc + \frac{1}{2}\lambda \left\| {f - c} \right\|_{2}^{2} } } \right\},$$
(1)

where \(TV(c) = \int\nolimits_{\Omega } {\left| {\nabla c} \right|dc}\) is the total variation of \(c\), \(\Omega\) represents the entire VR scene, and \(\lambda\) represents the scale parameter [18].

Because the denoising model is difficult to solve directly, it is usually converted into its Euler–Lagrange equation, which gives

$$div\frac{\nabla c}{{\left| {\nabla c} \right|}} + \lambda (f - c) = 0.$$
(2)

Decomposing \(div\frac{\nabla c}{{\left| {\nabla c} \right|}}\) gives

$$\begin{aligned} div\frac{{\nabla c}}{{\left| {\nabla c} \right|}} & = \frac{1}{{\left| {\nabla c} \right|}}\nabla (\nabla c) + \nabla c\nabla \left( {\frac{1}{{\left| {\nabla c} \right|}}} \right) \\& = \frac{{c_{{xx}} + c_{{yy}} }}{{\left| {\nabla c} \right|}} - \frac{{c_{x}^{2} c_{{xx}} + 2c_{x} c_{y} c_{{xy}} + c_{y}^{2} c_{{yy}} }}{{c_{x}^{2} + c_{y}^{2} }}\frac{1}{{\left| {\nabla c} \right|}} \\ & = {\text{ }}\frac{{c_{{\xi \xi }} + c_{{\eta \eta }} }}{{\left| {\nabla c} \right|}} - \frac{{c_{{\eta \eta }} }}{{\left| {\nabla c} \right|}}.\end{aligned}$$
(3)

Here, \(c_{\eta \eta } = \frac{{c_{x}^{2} c_{xx} + 2c_{x} c_{y} c_{xy} + c_{y}^{2} c_{yy} }}{{c_{x}^{2} + c_{y}^{2} }}\) is the second derivative along the pixel gradient direction, and \(c_{\xi \xi } = \frac{{c_{y}^{2} c_{xx} - 2c_{x} c_{y} c_{xy} + c_{x}^{2} c_{yy} }}{{c_{x}^{2} + c_{y}^{2} }}\) is the second derivative along the direction orthogonal to the gradient. In flat regions of the VR scene, the diffusion coefficient \(\frac{1}{{\left| {\nabla c} \right|}}\) is large and diffusion is strong, so noise is removed effectively; in edge regions of the VR scene, \(\frac{1}{{\left| {\nabla c} \right|}}\) is small and diffusion is weak, so the edges are protected. The above analysis also shows that the algorithm may mistake noise for VR scene edges, producing false edges in flat regions, i.e., the “staircase effect”. The TV denoising model therefore needs to be improved.

If the \(L_{1}\)-norm \(\left| {\nabla c} \right|\) of the gradient in the TV denoising model is replaced by the \(L_{2}\)-norm \(\left| {\nabla c} \right|^{2}\) of the gradient, the improved TV model can be expressed as

$$c = \arg \mathop {\min }\limits_{c} \left\{ {\int\limits_{\Omega } {\left| {\nabla c} \right|^{2} dc + \frac{1}{2}\lambda \left\| {f - c} \right\|_{2}^{2} } } \right\}.$$
(4)

The corresponding Euler–Lagrange equation is

$$2div(\nabla c) + \lambda (f - c) = 0.$$
(5)

The decomposition of \(div(\nabla c)\) is

$$div(\nabla c) = \nabla (\nabla c) = \frac{{c_{y}^{2} c_{{xx}} - 2c_{x} c_{y} c_{{xy}} + c_{x}^{2} c_{{yy}} }}{{c_{x}^{2} + c_{y}^{2} }} + \frac{{c_{x}^{2} c_{{xx}} + 2c_{x} c_{y} c_{{xy}} + c_{y}^{2} c_{{yy}} }}{{c_{x}^{2} + c_{y}^{2} }} = c_{{\xi \xi }} + c_{{\eta \eta }}.$$
(6)

As can be seen from Formula (6), the coefficients of \(c_{\xi \xi }\) and \(c_{\eta \eta }\) are both 1, indicating that the improved TV model is an isotropic diffusion model; this isotropic diffusion effectively avoids the staircase effect.

Due to the slow speed of traditional iterative methods, this paper employs the Split Bregman iterative framework to solve the proposed denoising model, which can significantly reduce computational complexity. As a result, Formula (5) can be converted to

$$2div(\nabla c) + \lambda (f + n_{k - 1} - c) = 0$$
(7)
$$c_{k} = 2div(\nabla c_{k - 1} ) + \lambda_{k} (f + n_{k - 1} - c_{k - 1} )$$
(8)
$$\lambda_{k} = \frac{1}{{\sigma^{2} \left| \Omega \right|}}\int\limits_{\Omega } {2div(\nabla c_{k - 1} )(c_{k} - f)} dc.$$
(9)

In the above formulas, \(c_{0} = f\); \(n_{0} = 0\); \(k = 1,2,3,...\); \(\sigma^{2}\) represents the variance of the VR scene noise, and \(\left| \Omega \right|\) represents the area of the VR scene.
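As an illustration, the following is a minimal gradient-descent sketch for solving Eq. (5), the improved model behind the iterations in Eqs. (7)–(9). It is not the exact Split Bregman scheme of the paper; the step size, fidelity weight, iteration count, and test image are assumptions.

```python
import numpy as np

def laplacian(c):
    """Discrete Laplacian div(grad c) with replicate (Neumann) boundaries."""
    p = np.pad(c, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * c

def tv_l2_denoise(f, lam=0.5, dt=0.1, n_iter=200):
    """Gradient-descent sketch for Eq. (5): 2*div(grad c) + lam*(f - c) = 0.

    f      : noisy image (2-D float array)
    lam    : fidelity weight (lambda in the text)
    dt     : step size (an assumption, not from the paper)
    n_iter : number of iterations
    """
    c = f.astype(np.float64).copy()
    for _ in range(n_iter):
        c += dt * (2.0 * laplacian(c) + lam * (f - c))
    return c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))   # smooth synthetic test image
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    denoised = tv_l2_denoise(noisy)
    print("noise std before:", np.std(noisy - clean))
    print("noise std after :", np.std(denoised - clean))
```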

Following noise decomposition, denoising is carried out using the Non-Local Means (NLM) algorithm. The NLM algorithm operates on the premise that VR scene pixels can be described through their adjacent pixel blocks. Correlation calculations between pixel blocks determine the weight allocation, and these weights are then used to form a weighted average of neighborhood pixel values, thereby achieving denoising.

Let the noisy VR scene be \(f = \left\{ {f(i,j)\left| {(i,j) \in U} \right.} \right\}\), and let \(K\) denote the filtered VR scene obtained by the NLM algorithm. Each pixel of \(K\) is the weighted average of the pixels in its neighborhood:

$$K(i,j) = \sum\limits_{(u,v) \in I} {w(i,j,u,v)f(u,v)} .$$
(10)

Here, \(w(i,j,u,v)\) denotes the similarity weight between pixel \(f(i,j)\) and pixel \(f(u,v)\) within its search window \(\{ (u,v) \in I\}\). The weights must satisfy the following conditions:

$$\left\{ \begin{gathered} 0 \le w(i,j,u,v) \le 1 \hfill \\ \sum\limits_{(u,v) \in I} {w(i,j,u,v) = 1} \hfill \\ \end{gathered} \right..$$
(11)

The calculation formula of weight \(w(i,j,u,v)\) is

$$w(i,j,u,v) = \frac{1}{C(u,v)}\exp \left( { - \frac{{G_{\sigma } \otimes S^{2} (i,j,u,v)}}{{h^{2} }}} \right),$$
(12)

where

$$C(u,v) = \sum\limits_{(u,v) \in I} {\exp \left( { - \frac{{G_{\sigma } \otimes S^{2} (i,j,u,v)}}{{h^{2} }}} \right)} .$$
(13)

In the above formulas, \(G_{\sigma }\) denotes a Gaussian kernel with standard deviation \(\sigma\), \(S^{2} (i,j,u,v)\) is the Euclidean distance between pixel blocks in the search window, \(h\) is the smoothing parameter of the VR scene, and \(C(u,v)\) is the normalization coefficient of the weights.
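The following is a minimal, unoptimized sketch of the NLM filter in Eqs. (10)–(13). The patch size, search-window size, and parameter values are illustrative assumptions.

```python
import numpy as np

def nlm_denoise(f, patch=3, search=7, h=0.1, sigma=1.0):
    """Non-local means sketch following Eqs. (10)-(13).

    The patch distance is Gaussian-weighted (G_sigma) and mapped to a weight
    with exp(-S^2 / h^2); the weight sum plays the role of C(u, v).
    """
    pr, sr = patch // 2, search // 2
    pad = pr + sr
    fp = np.pad(f, pad, mode="reflect")
    out = np.zeros_like(f, dtype=np.float64)

    # Gaussian kernel G_sigma over the patch
    ax = np.arange(-pr, pr + 1)
    gx, gy = np.meshgrid(ax, ax)
    G = np.exp(-(gx**2 + gy**2) / (2.0 * sigma**2))
    G /= G.sum()

    H, W = f.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = fp[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            wsum, acc = 0.0, 0.0
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    ui, uj = ci + di, cj + dj
                    cand = fp[ui - pr:ui + pr + 1, uj - pr:uj + pr + 1]
                    s2 = np.sum(G * (ref - cand) ** 2)    # G_sigma-weighted patch distance
                    w = np.exp(-s2 / h**2)
                    wsum += w
                    acc += w * fp[ui, uj]
            out[i, j] = acc / wsum                        # normalization by C(u, v)
    return out
```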

2.2 VR Scene Feature Point Extraction

Upon completing the denoising of the VR scene, the Speeded Up Robust Features (SURF) algorithm is utilized to extract salient features of the VR environment. SURF is a robust approach for detecting and describing local features and is resilient to translation, rotation, scaling, and other VR scene variations. The procedure for locating these distinctive features comprises the following steps:

(1) Hessian matrix construction

Let \(X = (x,y)\) be a pixel in the VR scene at scale \(\tau\). Its Hessian matrix \(H(X,\tau )\) is defined as follows:

$$H(X,\tau ) = K\left[ {\begin{array}{*{20}c} {L_{xx} (X,\tau )} & {L_{xy} (X,\tau )} \\ {L_{xy} (X,\tau )} & {L_{yy} (X,\tau )} \\ \end{array} } \right].$$
(14)

A convolution template is used to compute \(L(X,\tau )\). To improve the computation speed, box filters can be used as an approximation; with them, an approximate Hessian matrix can be constructed quickly, and its determinant is

$$Det(H_{approx} ) = D_{xx} D_{yy} - (\omega D_{xy} )^{2} .$$
(15)

Here, \(D_{xx}\), \(D_{xy}\), and \(D_{yy}\) respectively denote the responses of pixel \((x,y)\) in the VR scene to the box filters approximating the corresponding second-order derivatives, and \(\omega\) is a weighting parameter used to balance the determinant of the Hessian matrix.

(2) Scale space construction

Unlike the classical SIFT (Scale-Invariant Feature Transform) algorithm, which constructs the scale space by downsampling the image, the SURF algorithm constructs the scale space by gradually increasing the size of the filter template. SURF is faster [19] because it does not need to downsample the image and can process multiple layers of the scale space simultaneously.

To construct the scale space, the number of pyramid octaves of the VR scene and the number of layers (intervals) per octave are first determined; the size \(N\) of the box filter for any octave and layer is then given by Formula (16):

$$N = 3 \times (2^{octave} \times interval + 1)$$
(16)

(3) Feature point positioning

Initially, non-maximum suppression is performed in a 3 × 3 × 3 neighborhood. Because of the large scale differences in the first layer of each group, the detected maxima of the Hessian determinant must be interpolated in both scale space and VR scene space to improve the accuracy of feature point positioning. Finally, by locating extreme-value samples within the neighborhood and applying the three-dimensional quadratic fit of Formula (17), local maxima with sub-pixel and sub-scale accuracy are obtained and designated as feature points:

$$L(X) = L + \frac{{\partial L^{T} }}{\partial X}X + \frac{1}{2}X^{T} \frac{{\partial^{2} L}}{{\partial X^{2} }}X.$$
(17)

Among them, \(X = (x,y,s)^{T}\) is the scale space coordinate, and \(L(X)\) is the Laplace approximation.
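In practice, SURF detection and description are available through OpenCV’s contrib module. The following sketch assumes an opencv-contrib build (the non-free algorithms may need to be enabled) and uses a placeholder file name; ORB is used as a fallback detector if SURF is unavailable in the installed build.

```python
import cv2

def extract_features(gray_image, hessian_threshold=400):
    """Detect and describe feature points; prefers SURF, falls back to ORB."""
    try:
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    except (AttributeError, cv2.error):
        detector = cv2.ORB_create(nfeatures=1000)  # fallback, not SURF
    keypoints, descriptors = detector.detectAndCompute(gray_image, None)
    return keypoints, descriptors

if __name__ == "__main__":
    img = cv2.imread("vr_scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
    if img is None:
        raise FileNotFoundError("vr_scene.png not found")
    kps, desc = extract_features(img)
    print(f"detected {len(kps)} feature points")
```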

2.3 Details Enhancement Method Based on Deep Reinforcement Learning

After feature extraction of the VR scene is completed, a deep reinforcement learning algorithm is used to enhance the details of the VR scene. The algorithm constrains the learning behavior of the policy network by using the objective function of the value network in the Actor-Critic (AC) framework as a penalty term of the policy gradient algorithm, improving the stability and performance of the algorithm. This effectively limits the error accumulation caused by executing suboptimal strategies when the value network’s estimates are highly inaccurate, and improves the VR scene detail enhancement effect [20,21,22].

The process of VR scene detail enhancement based on deep reinforcement learning is modeled as a sequential decision problem [23,24,25]. This framework defines an operator set comprising 8 kinds of retouching operations for VR scene detail images: exposure, contrast, saturation, color curve, hue curve, black-and-white adjustment, white balance, and gamma correction. The agent selects an operation from the set and determines its parameter value, which is applied to the input VR scene detail image. The system evaluates the merit of this action and enters the next retouching state. This process is repeated until a VR scene detail image with good visual quality is obtained; a minimal sketch of such an operator set is given below.
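To make the action space concrete, the following sketch implements the 8 retouching operators as simple parameterized functions on an RGB image with values in [0, 1]. The parameterizations are illustrative assumptions for this sketch, not the paper’s exact operator definitions.

```python
import numpy as np

def exposure(img, ev):
    return np.clip(img * (2.0 ** ev), 0.0, 1.0)

def gamma_correction(img, g):
    return np.clip(img, 1e-6, 1.0) ** g

def contrast(img, a):
    return np.clip(0.5 + (1.0 + a) * (img - 0.5), 0.0, 1.0)

def saturation(img, s):
    gray = img.mean(axis=-1, keepdims=True)
    return np.clip(gray + (1.0 + s) * (img - gray), 0.0, 1.0)

def white_balance(img, t):
    out = img.copy()
    out[..., 0] *= 1.0 + t          # warm/cool shift on the R and B channels
    out[..., 2] *= 1.0 - t
    return np.clip(out, 0.0, 1.0)

def black_and_white(img, m):
    gray = np.repeat(img.mean(axis=-1, keepdims=True), 3, axis=-1)
    return np.clip((1.0 - m) * img + m * gray, 0.0, 1.0)

def color_curve(img, k):
    # simple per-channel S-curve around mid-gray
    return np.clip(img + k * img * (1.0 - img) * (img - 0.5), 0.0, 1.0)

def hue_curve(img, k):
    # crude hue shift implemented as channel mixing
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return np.clip(np.stack([r + k * (g - r), g + k * (b - g), b + k * (r - b)], axis=-1), 0.0, 1.0)

OPERATORS = [exposure, contrast, saturation, color_curve,
             hue_curve, black_and_white, white_balance, gamma_correction]

def apply_action(state, op_index, param):
    """One decision step: a discrete operator choice and its continuous parameter."""
    return OPERATORS[op_index](state, param)
```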

Denote the VR scene detail enhancement model as \(P = \left( {Q,B} \right)\). Here, \(Q\) is the state space, encompassing the initial VR scene detail image and all intermediate states throughout the enhancement process, and \(q_{t} \in Q\) is the agent’s state at step \(t\). \(B\) is the action space, containing the operators available for decision-making during VR scene detail enhancement, and \(b_{t} \in B\) is the action taken in the current state. The probability of the agent moving to state \(q_{t + 1}\) after performing action \(b_{t}\) in state \(q_{t}\) is represented as

$$p\left( {q_{t + 1} \left| {q_{t} ,b_{t} } \right.} \right),$$
(18)

where \(p\left( \cdot \right)\) is the state transition function of VR visual detail image. Each time an action is performed, the environment gives an immediate reward, known as the reward function \(r\) in reinforcement learning. By applying a series of actions to the original VR visual detail image, a trajectory \(\phi\) composed of states, actions and rewards is formed, which is expressed as

$$\phi = \left\{ {q_{0} ,b_{0} ,r_{0} ,q_{1} ,b_{1} ,r_{1} , \cdots ,q_{T - 1} ,b_{T - 1} ,r_{T - 1} ,q_{T} } \right\},$$
(19)

where \(T\) is the total number of steps. Define the sum of rewards earned after the \(q_{t}\) state as the cumulative discount return \(r_{t}^{\gamma }\), that is

$$r_{t}^{\gamma } = \sum\limits_{\tau = 0}^{T - t} {\left( {\lambda \gamma } \right)^{\tau } r\left( {q_{t + \tau } ,b_{t + \tau } } \right)} .$$
(20)

The discount factor indicates how much the agent values future rewards. The VR visual detail enhancement strategy \(z\) is defined as the probability density function over the action space in the current state \(q_{t}\). The purpose of the enhancement algorithm is to find the optimal VR visual detail enhancement strategy in the sequential decision process, i.e., the one that maximizes the expected return \(J\left( z \right)\) over all possible trajectories under that strategy. This optimization process can be expressed as

$$\mathop {\arg \max }\limits_{z} J\left( z \right) = \mathop {\arg \max }\limits_{z} \mathop E\limits_{{q \sim \rho^{z} ,\phi \sim z}} \left[ {V_{0}^{\gamma } \left| z \right.} \right],$$
(21)

where \(V_{0}^{\gamma }\) is the cumulative discounted return obtained by the agent from the initial state \(q_{0}\), and \(\rho^{z}\) is the discounted state visitation distribution, defined as

$$\rho^{z} = \sum\limits_{t = 0,\phi \sim z}^{\infty } {H\left( {q_{t} = q} \right)\left( {\lambda \gamma } \right)^{t} } ,$$
(22)

where \(H\) is the probability density function. Similarly, the state value function \(V^{z} \left( q \right)\) represents the expected cumulative discounted reward obtained when starting from state \(q_{t}\) and interacting with the environment according to the VR visual detail enhancement strategy \(z\), namely

$$V^{z} \left( q \right) = \mathop E\limits_{{q_{t} = q,\phi \sim z}} \left[ {r_{t}^{\lambda \gamma } } \right].$$
(23)

The VR visual detail state-action value function \(G^{z} \left( {q_{t} ,b_{t} } \right)\) can be represented by the state value function \(V^{z} \left( q \right)\) as

$$G^{z} \left( {q_{t} ,b_{t} } \right) = \mathop E\limits_{{q \sim \rho^{z} ,b \sim b_{t} \phi \sim z}} \left[ {r\left( {q_{t} ,b_{t} } \right) + \lambda \gamma V^{z} \left( {H\left( {q_{t} ,b_{t} } \right)} \right)} \right].$$
(24)

The advantage function \(B^{z} \left( {q_{t} ,b_{t} } \right) = G^{z} \left( {q_{t} ,b_{t} } \right) - V^{z} \left( {q_{t} } \right)\) is used to evaluate how appropriate it is to perform the VR scene detail enhancement action \(b_{t}\) in state \(q_{t}\).
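To make the trajectory, return, and advantage definitions above concrete, the following is a minimal sketch that computes the cumulative discounted return of Eq. (20) for every step of a trajectory and a one-step TD-style advantage estimate (formalized later in Eq. (25)). The `discount` argument stands in for the \(\lambda\gamma\) factor, and the reward values are placeholders.

```python
import numpy as np

def discounted_returns(rewards, discount):
    """Cumulative discounted return r_t^gamma of Eq. (20) for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns

def advantage_estimate(reward, value_t, value_next, discount):
    """One-step TD-style estimate of the advantage of taking b_t in q_t."""
    return reward + discount * value_next - value_t

if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 2.0], discount=0.9))  # [2.62, 1.8, 2.0]
```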

To simulate the post-retouching process of VR visual detail images, the action space is divided into a discrete action space \(B_{1}\) (selection of the retouching operator) and a continuous action space \(B_{2}\) (the value range of the operator’s random variable). The VR visual detail enhancement strategy \(z\) therefore consists of two parts: a stochastic strategy \(z_{1}\) and a deterministic strategy \(z_{2}\). \(z_{1}\) is the probability distribution over the VR scene detail enhancement actions \(b_{1}\) in the current state, and \(z_{2}\) selects the optimal parameter \(b_{2}\) within the value range of the chosen action. The advantage actor-critic (A2C) algorithm is used to optimize the above VR visual detail enhancement strategy. The framework mainly consists of two policy networks (a stochastic policy network and a deterministic policy network) and a value network. The value network (critic) approximates the state value function \(V^{z}\), while the dual-policy actor updates the VR visual detail enhancement strategy according to the state-action value function \(G^{z}\) and the advantage function \(B^{z}\), respectively, yielding a reasonable selection probability for each action and the optimal parameter value in each state.

The state value function \(V\) and the policy function \(z = \left( {z_{1} ,z_{2} } \right)\) are approximated by convolutional neural networks \(V^{\omega }\) and \(z^{{\left( {\theta_{1} ,\theta_{2} } \right)}}\), respectively, where \(\omega\) and \(\theta = \left( {\theta_{1} ,\theta_{2} } \right)\) are the learnable parameters of the value network and the dual-policy network. The temporal-difference (TD) error is used as an unbiased estimate of the advantage function to approximate the gain of a given state. In practice, the TD error \(\delta\) is computed with the value network to reduce the number of parameters and improve training stability, that is,

$$\delta \left( {q_{t} ,q_{t + 1} ;\omega } \right) = r\left( {q_{t} ,b_{t} } \right) + \lambda \gamma V\left( {H\left( {q_{t} ,b_{t} } \right);\omega } \right) - V\left( {q_{t} ;\omega } \right).$$
(25)

The value network is optimized by minimizing \(l_{\omega }\):

$$l_{\omega } = 0.5\mathop E\limits_{{q \sim \rho^{z} ,b \sim z\left( q \right)}} \left[ {\delta \left( {q_{t} ,q_{t + 1} ;\omega } \right)} \right]^{2} .$$
(26)

Since the actions are divided into discrete and continuous actions, the parameters are updated by the stochastic and deterministic strategy gradient algorithms, respectively. The strategy gradients are

$$\begin{gathered} \nabla_{{\theta_{1} }} J\left( {z_{\theta } } \right) = \mathop E\limits_{{q \sim \rho^{z} ,b_{1} \sim z_{1} \left( q \right),b_{2} = z_{2} \left( {q,b_{1} } \right)}} \left[ {\nabla_{{\theta_{1} }} \log z_{1} \left( {b_{1} \left| {q;\theta_{1} } \right.} \right)B\left( {q,\left( {b_{1} ,b_{2} } \right);\omega } \right)} \right] \hfill \\ \nabla_{{\theta_{2} }} J\left( {z_{\theta } } \right) = \mathop E\limits_{{q \sim \rho^{z} ,b_{2} = z_{2} \left( {q,b_{1} } \right)}} \left[ {\nabla_{{\theta_{2} }} z_{2} \left( {q,b_{1} ;\theta_{2} } \right)\nabla_{{b_{2} }} G\left( {q,\left( {b_{1} ,b_{2} } \right);\omega } \right)} \right]. \hfill \\ \end{gathered}$$
(27)

Here, the advantage function \(B\) can be computed from the TD error, the action value function \(G\) is substituted using Eq. (24), and its gradient is obtained by the chain rule. The update formula for each parameter is as follows:

$$\begin{gathered} \theta_{1,t + 1} = \theta_{1,t} + \nabla_{{\theta_{1} }} J\left( {z_{\theta } } \right) \hfill \\ \theta_{2,t + 1} = \theta_{2,t} + \nabla_{{\theta_{2} }} J\left( {z_{\theta } } \right) \hfill \\ \omega_{t + 1} = \omega_{t} - \nabla_{\omega } l_{\omega } . \hfill \\ \end{gathered}$$
(28)
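The following is a minimal single-update sketch of the critic and the stochastic (discrete) actor corresponding to Eqs. (25)–(28), written with PyTorch. The networks, optimizers, and tensor shapes (a batch-shaped reward and a value network returning one value per sample) are assumptions; the deterministic (continuous) branch of Eq. (27) additionally requires a differentiable action-value estimate (Eq. (24)) and is omitted here for brevity.

```python
import torch

def a2c_update(value_net, policy_net, opt_value, opt_policy,
               state, next_state, reward, discount=0.95):
    """One critic + stochastic-actor update; shapes: reward (batch,), values (batch,)."""
    # Critic: TD error delta (Eq. 25) and value loss l_omega (Eq. 26)
    v_t = value_net(state)
    with torch.no_grad():
        v_next = value_net(next_state)
    td_error = reward + discount * v_next - v_t
    value_loss = 0.5 * td_error.pow(2).mean()
    opt_value.zero_grad()
    value_loss.backward()
    opt_value.step()

    # Stochastic actor: log-probability of the sampled operator weighted by the
    # detached TD error, used as the advantage estimate B in Eq. (27)
    logits = policy_net(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    advantage = td_error.detach()
    policy_loss = -(dist.log_prob(action) * advantage).mean()
    opt_policy.zero_grad()
    policy_loss.backward()
    opt_policy.step()
    return value_loss.item(), policy_loss.item()
```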

However, the parameter updates in Eq. (28) can easily lead to overfitting of the deep reinforcement learning algorithm. Therefore, the idea of meta-learning is introduced into the gradient descent procedure to update the parameters of the deep reinforcement learning algorithm and improve the VR scene detail enhancement effect. The specific steps are as follows:

Step 1: The parameters to be updated are \(\theta_{1}\), \(\theta_{2}\), and \(\omega\). Assume there are \(n\) VR scene detail enhancement training tasks, and sample a training set \(U_{j}\) and a test set \(D_{j}\) from each task, with \(1 \le j \le n\). Updating the parameters with gradient descent gives

$$\begin{gathered} \theta_{1,t + 1} = \theta_{1,t} - a\nabla_{{\theta_{1} }} l_{{U_{j} }} \left( {F_{{\theta_{1} }} } \right) \hfill \\ \theta_{2,t + 1} = \theta_{2,t} - a\nabla_{{\theta_{2} }} l_{{U_{j} }} \left( {F_{{\theta_{2} }} } \right) \hfill \\ \omega_{t + 1} = \omega_{t} - a\nabla_{\omega } l_{{U_{j} }} \left( {F_{\omega } } \right), \hfill \\ \end{gathered}$$
(29)

where \(a\) is the learning rate for learning a specific task; \(F\) is the deep-reinforcement-learning-based Exposure framework for VR visual detail enhancement; and \(l_{{U_{j} }} \left( {F_{{\theta_{1} }} } \right)\), \(l_{{U_{j} }} \left( {F_{{\theta_{2} }} } \right)\), and \(l_{{U_{j} }} \left( {F_{\omega } } \right)\) are the losses of the training sample computed with the current parameters \(\theta_{1,t}\), \(\theta_{2,t}\), and \(\omega_{t}\).

Step 2: Calculate the losses \(l_{{D_{j} }} \left( {F_{{\theta_{1,t + 1} }} } \right)\), \(l_{{D_{j} }} \left( {F_{{\theta_{2,t + 1} }} } \right)\), and \(l_{{D_{j} }} \left( {F_{{\omega_{t + 1} }} } \right)\) of the test sample based on \(\theta_{1,t + 1}\), \(\theta_{2,t + 1}\), and \(\omega_{t + 1}\), and record the cumulative losses \(\sum\nolimits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\theta_{1,t + 1} }} } \right)}\), \(\sum\nolimits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\theta_{2,t + 1} }} } \right)}\), and \(\sum\nolimits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\omega_{t + 1} }} } \right)}\) over all tasks.

Step 3: Calculate the gradients \(\nabla_{{\theta_{1} }} l_{{D_{j} }} \left( {F_{{\theta_{1,t + 1} }} } \right)\), \(\nabla_{{\theta_{2} }} l_{{D_{j} }} \left( {F_{{\theta_{2,t + 1} }} } \right)\), and \(\nabla_{\omega } l_{{D_{j} }} \left( {F_{{\omega_{t + 1} }} } \right)\) with respect to the parameters \(\theta_{1}\), \(\theta_{2}\), and \(\omega\).

Step 4: The gradient descent method is used to update the parameters \(\theta_{1}\), \(\theta_{2}\), \(\omega\) again, to obtain

$$\begin{gathered} \theta_{1,t + 1} = \theta_{1,t} - \beta \nabla_{{\theta_{1} }} \sum\limits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\theta_{1,t + 1} }} } \right)} \hfill \\ \theta_{2,t + 1} = \theta_{2,t} - \beta \nabla_{{\theta_{2} }} \sum\limits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\theta_{2,t + 1} }} } \right)} \hfill \\ \omega_{t + 1} = \omega_{t} - \beta \nabla_{\omega } \sum\limits_{j = 1}^{n} {l_{{D_{j} }} \left( {F_{{\omega_{t + 1} }} } \right)} . \hfill \\ \end{gathered}$$
(30)

Here, \(\beta\) denotes the meta learning rate shared across the different learning tasks.
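As an illustration of the two-level update in Eqs. (29)–(30), the following sketch performs a first-order MAML-style meta-update for a single parameter group (the same scheme would apply to \(\theta_1\), \(\theta_2\), and \(\omega\)). The `tasks` iterable, the `loss_fn` placeholder, and the learning rates are assumptions.

```python
import copy
import torch

def meta_update(model, tasks, loss_fn, alpha=1e-3, beta=1e-4):
    """First-order approximation of the meta-update in Eqs. (29)-(30).

    tasks    : iterable of (train_batch, test_batch) pairs, i.e. (U_j, D_j)
    loss_fn  : placeholder for the task loss l(F_theta)(batch)
    alpha    : inner (task-specific) learning rate, Eq. (29)
    beta     : meta learning rate shared across tasks, Eq. (30)
    """
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for train_batch, test_batch in tasks:
        # Inner step (Eq. 29): adapt a copy of the model on the training sample
        adapted = copy.deepcopy(model)
        inner_loss = loss_fn(adapted, train_batch)
        grads = torch.autograd.grad(inner_loss, adapted.parameters())
        with torch.no_grad():
            for p, g in zip(adapted.parameters(), grads):
                p -= alpha * g
        # Outer loss on the test sample with the adapted parameters
        outer_loss = loss_fn(adapted, test_batch)
        grads = torch.autograd.grad(outer_loss, adapted.parameters())
        for m, g in zip(meta_grads, grads):
            m += g  # first-order: gradient taken w.r.t. the adapted parameters
    # Meta step (Eq. 30): update the original parameters with the accumulated gradient
    with torch.no_grad():
        for p, m in zip(model.parameters(), meta_grads):
            p -= beta * m
    return model
```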

After the parameters are updated, the VR scene detail image to be enhanced is input to the Exposure deep reinforcement learning framework, which outputs the enhanced VR scene detail image, completing the VR scene detail enhancement.

3 Experimental Analysis

A public VR scene detail image dataset is taken as the experimental object; it contains 5000 original VR scene detail images and 25,000 retouched VR scene detail images produced by 5 experts, each of whom retouched every original image. The experiment uses 2000 original VR scene detail images as the training set, 2000 expert-retouched VR scene detail images as the labels, and 250 original VR scene detail images as the test set. During training, the label order is randomly shuffled, forming an unpaired mapping between the training set and the labels and realizing weakly supervised learning, as sketched below.
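A minimal sketch of this weakly supervised split is shown below; the file names and the fixed random seed are placeholders.

```python
import random

# Training inputs, expert-retouched targets, and held-out test images
originals = [f"original_{i:04d}.png" for i in range(2000)]        # training inputs
retouched = [f"expert_{i:04d}.png" for i in range(2000)]          # training labels
test_set  = [f"original_{i:04d}.png" for i in range(2000, 2250)]  # test inputs

random.seed(0)
random.shuffle(retouched)   # break the input/label pairing (unpaired, weak supervision)
train_pairs = list(zip(originals, retouched))
```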

A relatively simple VR scene detail image is randomly selected from the test set. The comparison methods are the image enhancement method based on the bright channel prior and the maximum color channel in literature [6], the Retinex-based image enhancement method in literature [7], the bicubic-interpolation image enhancement method based on bi-histogram equalization in literature [8], the gray-scale image enhancement method in literature [9], and the optimal-histogram image enhancement method based on hybrid particle swarm optimization and the dragonfly algorithm in literature [10]. These five methods and the proposed method are used to enhance this VR scene detail image. The VR scene detail enhancement results are shown in Fig. 1.

Fig. 1 VR visual detail enhancement results of different methods

According to Fig. 1a, the original VR visual image is relatively blurry, and its internal details cannot be presented clearly. According to Fig. 1b, for simple VR scene details, the brightness of the VR scene detail image is significantly improved after applying the method of literature [6], but over-enhancement also appears. According to Fig. 1c, the method of literature [7] produces no overexposure in the VR scene detail enhancement result, but the overall brightness, contrast, and visual effect are poor. According to Fig. 1d, after applying the method of literature [8], the brightness and color saturation of the enhancement result are appropriate and the image quality is good, but the contrast is too high and the image is distorted. According to Fig. 1e, the method of literature [9] yields more natural colors without distortion, but images with lower brightness are not significantly enhanced. According to Fig. 1f, the method of literature [10] produces a more realistic enhanced VR scene detail image, but the contrast is too high and the color saturation is low. According to Fig. 1g, the proposed method significantly improves the exposure and contrast of the VR scene detail image, and its color distribution is also the closest to that of the original image. Therefore, the proposed method achieves the best subjective visual effect and color naturalness for VR scene detail images. This is because the proposed method first uses the TV denoising algorithm to effectively remove noise from the VR scene and improve image quality, and then uses the SURF algorithm to accurately extract features from the denoised VR scene. These features are then processed by the deep reinforcement learning algorithm, combined with meta-learning to accelerate training, so that the model learns precise image adjustment strategies. Finally, by designing multiple loss functions, including the L1 loss, structural similarity loss, content perception loss, and sharpness loss, the detailed representation of the image is comprehensively optimized, ensuring that the color distribution remains highly consistent with the original image while exposure and contrast are improved.

A more complex VR scene detail image is randomly selected from the test set, and Gaussian noise is added to it. The proposed method is used to enhance this VR scene detail image, and its enhancement effect is analyzed. The enhancement result of the complex VR scene detail image is shown in Fig. 2.

Fig. 2 Enhancement results of a complex VR scene detail image

It can be seen from Fig. 2a that, after Gaussian noise is added, the details of the VR scene detail image cannot be presented clearly and cannot provide users with a smooth, realistic experience. According to Fig. 2b, after enhancement by the proposed method, the definition of the VR scene detail image is significantly improved without overexposure, and the contrast and color distribution are uniform, which meets the aesthetic needs of the human eye and helps provide users with a better sense of realism. The experiment proves that the proposed method can still effectively enhance VR scene details even for more complex VR scene detail images. This is because the method, by constructing the Hessian matrix and using the SURF algorithm, can accurately extract the key features of the VR scene, which is crucial for processing complex image details and thus improves the quality of complex VR scene detail images.

The peak signal-to-noise ratio (PSNR) is used to measure the VR scene detail enhancement effect of the proposed method; a higher value indicates a better enhancement effect (a sketch of the computation is given below). A VR scene detail image is randomly selected from the test set, and Gaussian blur and additive white Gaussian noise are added to it, respectively. The proposed method is then used to enhance the degraded VR scene details, and its enhancement effect is analyzed at different VR scene detail magnifications. The analysis results are shown in Fig. 3.
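For reference, PSNR can be computed as follows; the 8-bit peak value of 255 is an assumption about the image format.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```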

Fig. 3 VR visual detail enhancement effects at different magnifications

As shown in Fig. 3, as the magnification increases, the PSNR of the enhanced VR scene detail image keeps rising for both Gaussian blur and additive white Gaussian noise. The minimum PSNR of the Gaussian-blurred VR scene detail image is about 35, and the minimum PSNR of the image with additive white Gaussian noise is about 34; neither falls below the PSNR threshold, indicating that the VR scene detail enhancement effect of the proposed method is good. The experiment shows that the method can effectively enhance VR scene details under different noise types and different magnifications, with high PSNR values, i.e., a good VR scene detail enhancement effect. This is because the method first uses the TV denoising algorithm to decompose the VR scene and remove noise, providing a clean base image for subsequent detail enhancement and effectively reducing the impact of noise on image quality. Most importantly, the method uses a deep reinforcement learning algorithm to train on the extracted VR visual features; its strong learning and optimization capabilities allow it to adaptively learn complex image adjustment strategies and make intelligent decisions under different scenes and conditions, achieving the best VR visual detail enhancement effect.

Using a VR scene detail image as the experimental subject, the proposed method is applied to enhance it, and the gray values before and after VR scene detail enhancement are presented as histograms (a sketch of this computation is given below). Higher pixel gray values and a more uniform gray-value distribution in the VR scene detail image reflect a better VR scene detail enhancement effect. The outcomes of the analysis are shown in Fig. 4.
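A minimal sketch of the gray-level histogram comparison follows; the entropy summary is an added illustrative measure of distribution uniformity, not a metric defined in this paper.

```python
import numpy as np

def gray_histogram(image, bins=256):
    """Gray-level histogram of an 8-bit image, as used for the Fig. 4 comparison."""
    hist, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    return hist

def histogram_entropy(hist):
    """Entropy of the normalized histogram; higher means a more uniform distribution."""
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```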

Fig. 4 Distribution of gray values before and after VR visual detail enhancement

After carefully comparing Fig. 4a and b, it is apparent that the utilization of the technique proposed in this study for VR scene detail enhancement has led to increased pixel gray values within a similar pixel environment. Furthermore, the spread of pixel gray values is now more consistent. This observation substantiates the effectiveness of the approach outlined in this study for improving VR scene details. This is because the method designed in this article includes L1 loss, structural similarity loss, content perception loss, and sharpness loss functions. The comprehensive application of these loss functions ensures that the enhanced image is optimized in terms of structure, content, and clarity, thereby achieving an improvement in pixel gray-scale values and a more uniform distribution.

4 Conclusion

The virtual environment is the representation space of VR technology, and building a realistic virtual environment is an indispensable step in applying virtual reality technology. The main factor affecting the realism of a virtual environment is the VR visual detail information. Therefore, a VR scene detail enhancement method based on a deep reinforcement learning algorithm was studied. Through TV denoising, feature extraction, and deep reinforcement learning, the detail quality of VR scenes is effectively improved. The experimental results show that the enhanced VR scenes have higher gray values, a more uniform gray distribution, and a significantly improved signal-to-noise ratio, which demonstrates the effectiveness and practicality of this method for enhancing VR scene details and provides technical support for the further development of VR technology. However, when dealing with high-resolution VR scenes, the method faces challenges in computational efficiency and performance: high-resolution images contain more pixels and details and therefore require more computing power for real-time processing and enhancement. In future research, super-resolution methods will be combined with the proposed approach to design enhancement methods that can enhance high-resolution VR visual details in real time.