1 Introduction

In recent years, rapid advances in technology have transformed various aspects of education, including music learning. One notable innovation in the music education field is the development of remote piano-teaching platforms based on deep learning and cloud computing technology [1, 2]. These platforms combine artificial intelligence, cloud infrastructure, and online connectivity to change the way piano lessons are delivered and experienced [3]. Traditional piano learning has often been confined to in-person lessons, restricting access for individuals with geographical constraints or time limitations [4]. However, with the emergence of deep learning algorithms and cloud computing capabilities, a new era of piano education has begun [5]. This approach enables students around the world to connect with highly skilled piano instructors, breaking barriers and democratizing music education [6]. At the core of these platforms lies the implementation of deep learning algorithms. Deep learning, a subset of artificial intelligence, enables the software to comprehend and interpret piano music in a manner akin to human cognition [7, 8]. Neural networks are trained on large datasets of piano performances to learn patterns, dynamics, and stylistic nuances [9]. As a result, these algorithms can analyze a student's performance and provide personalized feedback and recommendations for improvement [10]. Cloud computing technology is a crucial component of these remote piano-teaching platforms [11]. Leveraging the computational power and storage capabilities of the cloud, the platforms can efficiently process and store the large amount of data generated during piano lessons [12]. Cloud-based infrastructure also enables seamless scalability, ensuring that the platform can serve a large number of students concurrently without compromising performance [13]. 
Remote piano-teaching platforms offer an interactive virtual learning environment, wherein students and instructors can engage in real-time piano lessons from the comfort of their homes [14]. The platforms typically include features like video conferencing, live chat, and digital sheet music sharing, fostering a collaborative and immersive learning experience [15]. The amalgamation of deep learning and cloud computing technology has ushered in a new era of piano education, making it more accessible, interactive, and personalized than ever before [16, 17]. Remote piano-teaching platforms offer a dynamic and enriching learning environment, fostering a love for music and piano proficiency among students from diverse backgrounds [18, 19]. As these platforms continue to evolve, they hold the potential to revolutionize music education as a whole and inspire the next generation of virtuoso pianists worldwide.

The ability of students to enjoy and learn music can be improved by incorporating multimedia technologies into college and university piano instruction. The Multimedia-based Piano Teaching Model (MPTM) is particularly effective in larger classes, as it emphasizes practical and deliberate challenges to improve learning outcomes. In this context, various supervised, unsupervised, and semi-supervised deep learning algorithms have generated considerable interest. This work presents a deep learning method to explore the potential connection between piano instruction and students' progress. Machine learning techniques offer opportunities to enhance piano lessons for diverse learning types by looking at two essential elements of piano music learning. One such application automates the development of lesson plans for music enthusiasts, enabling them to learn to play their preferred instruments in accordance with their learning preferences, musical backgrounds, and talents. Moreover, deep learning plays a significant role in automated music recording, enabling the development of automatic piano recording systems based on sound implementation principles and legal considerations. The consolidation of piano music technology holds promise for music, rhythm, and instruction in the context of piano learning. Overall, this study focuses on leveraging multimedia technologies and deep learning to improve piano-teaching quality, create tailored lesson plans, and enhance students' musical experiences.

The major contributions of this manuscript are summarized below:

  • In this work, the Piano Triads Wavset dataset [16] is utilized, which contains information regarding learning activity, learning behavior, learning skills, student performance, and teaching analysis.

  • Then, the input data are passed to an adaptive distorted Gaussian matched filter [17] to clean the data.

  • Finally, the preprocessed data are fed to an Attention-Induced Multi-Head Convolutional Neural Network (AIMCNN) [18] optimized with Hunter–Prey Optimization (HPO) [19] for piano-teaching quality analysis.

  • The proposed approach is evaluated and compared with existing state-of-the-art approaches.

The remainder of this manuscript is structured as follows: Sect. 2 presents the literature survey, Sect. 3 describes the proposed methodology, Sect. 4 presents the results, and Sect. 5 gives the conclusion.

2 Literature Survey

Among the numerous research works on remote piano teaching based on deep learning in a cloud computing environment, certain recent works are discussed in this section.

Zheng et al. [20] presented MPTM to elevate the quality of piano teaching significantly. MPTM creates a thorough music network organization, supporting resource sharing and advancing amateur music literacy in society by integrating Internet education mode for teacher evaluation and adopting a systematic strategy that integrates various music educational materials. Leveraging machine learning, the model enhances concrete piano instruction for students, revolutionizing contemporary piano teaching and elevating overall instructional standards. The neural network's role in this context involves efficiently detecting piano notes in a given set. It provides lower learning skill analysis and higher learning activity analysis.

Pang [21] presented real-time information storage with remote piano teaching using a Bayesian approach. It integrates big data into industrial equipment examination and introduces a specialized data processing function based on XML. By investigating the correlation between XML and the database, the study enables seamless switching between different data types. To address data inaccuracies resulting from sensor errors, the Bayesian prediction calculation method was enhanced to achieve higher data accuracy and minimize errors. Key sensor data were extracted and transmitted to a cloud platform for further analysis and processing. Processed data were stored on monitors or computers to facilitate subsequent steps. To enhance system efficiency and extend its lifespan, the integration of online piano instruction as a component of online education was explored. This additional feature allows for a comprehensive and dynamic learning experience within the system. It provides lower performance ratio and higher behavior analysis.

Zhang [22] presented piano-teaching model that leverages the power of machine learning with artificial intelligence to enhance the quality of piano instruction. It begins by examining machine learning theories, emphasizing the data processing capabilities of neural networks, and illustrating how the integration of machine learning and AI can be advantageous for music education. Furthermore, the study identifies the interactive requirements between intelligent pianos and learners, proposing intelligent piano-teaching assistance methods. It delves into a comprehensive analysis of the impact of artificial intelligence on piano teaching from various perspectives, highlighting the potential benefits and implications for the field. It provides lower learning activity analysis and higher learning skill analysis and higher learning activity analysis.

Sun [23] presented a specific machine learning model to assess relationships in piano teaching. The data analysis was conducted at the network edge to increase efficiency. The model utilizes an association rule mining technique in combination with an enhanced T-test method. This improved T-test measures association rules and introduces a novel measure for evaluating their influence degree. The results demonstrate the feasibility of using this influence degree as a measure to determine the significance of multimedia-assisted piano training evaluation data. It minimizes the generation of redundant rules. It provides higher learning activity analysis and lower learning skill analysis and higher learning activity analysis.

Wang [24] presented a quality assessment model for piano information classroom teaching, integrating deep learning algorithms and hierarchical analysis techniques. To calculate the weights for quality assessment of piano information classroom teaching, a genetic algorithm (GA) was employed to optimize the weights and thresholds of the BP neural network in the deep learning algorithm. The model improves the accuracy and effectiveness of the assessment by considering the hierarchical relationships between the evaluation indices and optimizing the neural network's parameters through the genetic algorithm. It provides higher learning activity analysis and higher learning skill analysis and lower learning activity analysis.

Zi [25] presented an intelligent piano teaching system and method based on a cloud platform, which offers an approach to mastering the piano with ease, similar to learning guitar, drums, and other instruments. The beginning of the "Internet plus" era has paved the way for intelligent piano series products, combining piano education with the power of the Internet and making intelligent piano teaching a popular trend in piano education. As digital music teaching gains momentum, numerous auxiliary music teaching software products have emerged; however, only a few are genuinely tailored to the needs of music classroom teaching. Hence, research into and implementation of auxiliary teaching software specifically designed for music classroom instruction is of considerable significance. It provides lower teaching evaluation analysis and higher performance ratio.

Lei and Liu [26] presented an online piano education approach incorporating dual neural networks to construct effective learning models. The suggested method focuses on the teaching environment, including the intelligent piano's capabilities. Specifically, the method uses a Dual Neural Network (DNN) to detect the onset of piano notes. The DNN transforms the original time-domain waveform into a time–frequency representation, enabling in-depth analysis of the input signal. Deep learning is combined with artificial intelligence to optimize student learning outcomes and provide a better learning experience for aspiring piano players. It provides higher behavior learning analysis and lower performance ratio. Table 1 presents the literature comparison.

Table 1 Literature comparison table

3 Proposed Methodology

In this work, the design and implementation of remote piano teaching using AIMCNN in a cloud computing environment (RPT-AIMCNN-HPO) is proposed. Figure 1 shows the block diagram of the RPT-AIMCNN-HPO method. A detailed description of remote piano teaching based on AIMCNN in cloud computing is given below.

Fig. 1 Block diagram of RPT-AIMCNN-HPO method

3.1 Remote Piano Teaching

The field of education has witnessed a significant shift towards providing students with a more engaging and interactive learning experience, breaking free from traditional classroom constraints. This transformation has been largely facilitated by advancements in technology and the widespread availability of the internet. As a result, learners now have the opportunity to access education and receive personalized instruction and feedback, regardless of their physical location. One of the key components of this engaging and interactive learning experience is the integration of digital tools and platforms into the educational process. These tools, such as online learning management systems, virtual classrooms, and interactive educational apps, allow students to actively participate in their learning journey. They can access a wide range of educational resources, multimedia content and interactive exercises that cater to their unique learning styles and preferences. Moreover, with the advent of video conferencing and real-time communication tools, students can engage with teachers and peers in virtual classrooms, simulating the experience of a traditional classroom setting. This not only fosters collaboration and teamwork but also facilitates immediate feedback and clarification of doubts, promoting a deeper understanding of the subject matter. The rise of adaptive learning technologies has revolutionized the way students receive personalized instruction. These tools utilize data analytics and artificial intelligence algorithms to assess students' strengths and weaknesses, tailoring the learning experience to their individual needs. By identifying areas of improvement, adaptive learning systems can recommend specific learning resources, offer targeted practice exercises, and adjust the pace of instruction accordingly, enabling students to progress at their own speed. 
Another crucial aspect of an engaging and interactive learning experience is the emphasis on self-directed learning and active participation. Students are encouraged to take charge of their education, exploring topics of interest and pursuing inquiry-based learning. Through gamification and interactive learning activities, they can approach complex concepts in a more enjoyable and memorable way, fostering intrinsic motivation and a passion for learning. Furthermore, the flexibility and accessibility of remote learning allow students to balance their academic pursuits with other commitments and responsibilities, making education more inclusive and accommodating for diverse learners.

3.2 Dataset Description

In the process of developing a Virtual Studio Technology (VST) plug-in for Digital Signal Processing (DSP) in music teaching, we recognized the potential benefits of leveraging deep learning techniques. To create our product, we gathered a collection of 432 WAV files. These files consist of piano triads spanning six octaves and cover the twelve major, minor, and diminished triads in root position, including the first inversion. The WAV files adhere to the following specifications: 32-bit encoding, a 44 kHz sampling rate, a mono channel, a size of approximately 520 K each, and a duration of about three seconds. The overall size of this WAV file dataset is around 200 MB. Additionally, a CSV file called "Trios.csv" complements the WAV files. This CSV file contains essential information about the chords, including their names, octaves, and inversions. Each chord's constituent notes are also included in the file, separated by two underscores. Notably, a lowercase "s" denotes sharp notes or chords, while a lowercase "b" represents flat notes or chords. For further organization and reference, we compiled a document listing the chord locations together with the chords themselves. This document is structured as Chord, Note 1, Note 2, Note 3. The "Chord" column comprises a string that includes the chord's name, its sharp or flat notation, and its keyboard position. The "Note" columns list the separate notes that constitute each triad, organized in ascending order.
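The notation convention above can be illustrated with a short sketch. It assumes that the note columns are comma-separated under the header Chord, Note 1, Note 2, Note 3 and that an accidental is a single trailing letter; the two sample rows and the helper name `decode_note` are hypothetical and are not taken from the actual file.

```python
import csv
import io

# Lowercase "s" marks a sharp and lowercase "b" marks a flat, as described above.
ACCIDENTALS = {"s": "#", "b": "\u266d"}


def decode_note(token):
    """Turn a token such as 'Cs' or 'Bb' into conventional notation."""
    letter, suffix = token[0], token[1:]
    return letter + ACCIDENTALS.get(suffix, suffix)


# Hypothetical two-row excerpt; the real file contents are not reproduced here.
sample = io.StringIO(
    "Chord,Note 1,Note 2,Note 3\n"
    "example_a,C,E,G\n"
    "example_b,Cs,F,Gs\n"
)
triads = [
    [decode_note(row[f"Note {i}"]) for i in (1, 2, 3)]
    for row in csv.DictReader(sample)
]
```

If the real tokens carry additional information (for example an octave number), the decoding rule would need to be adjusted accordingly.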

3.3 Preprocessing Using Adaptive Distorted Gaussian Matched Filter

Data cleansing and data transformation are essential preprocessing steps in data mining and deep learning. They prepare the data to improve its quality and suitability for further analysis. Data cleansing, also referred to as data cleaning or data scrubbing, involves identifying and correcting errors, discrepancies, and inaccuracies in the dataset. The goal is to remove noise and unrelated information, ensuring that the data are accurate and reliable. Noise in data can come from various sources, such as human errors in data entry, sensor inaccuracies, or data transmission issues. Data cleansing techniques involve handling missing values, smoothing noisy data, correcting inconsistencies, and dealing with outliers. Data transformation involves converting the data into a format suitable for data mining or machine learning algorithms. Its intention is to enhance the performance of these algorithms and make the data more amenable to analysis. One common data transformation technique is normalization, which scales the numerical features in the dataset to a standard range (between 0 and 1). It ensures that features with diverse scales do not dominate the analysis and that each feature contributes equally. The adaptive distorted Gaussian matched filter technique is a specific approach to data transformation that is used for enhancing the signal-to-noise ratio in certain applications. It is employed when dealing with signals that are embedded in noise and have undergone certain types of distortion. It uses a Gaussian filter that is adaptively adjusted depending on the characteristics of the input data and the noise. The Gaussian filter is described in Eq. (1)

$$G(x) = \frac{1}{{\sigma \sqrt {2\pi } }}\exp \left( { - \frac{{(x - \mu )^{2} }}{{2\sigma^{2} }}} \right),$$
(1)

where \(x\) denotes the input data, \(\mu\) denotes the mean of the Gaussian distribution, \(\sigma\) denotes the standard deviation of the Gaussian distribution, \(\pi\) is the mathematical constant, and \(\exp\) is the exponential function. The matched filter is a common technique used in signal processing to maximize the signal-to-noise ratio (SNR) for detecting a known signal in the presence of noise. In the continuous domain, the matched filter response \(R(t)\) for a known signal \(s(t)\) and an input signal \(x(t)\) is given by Eq. (2).

$$R(t) = \int\limits_{ - \infty }^{\infty } {s(\tau )x(t - \tau ){\text{d}}\tau } ,$$
(2)

where \(R(t)\) is the matched filter response at time \(t\), \(s(\tau )\) is the known signal, and \(x(t - \tau )\) is the received signal (input signal) delayed by \(\tau\). Normalization is a common data transformation that scales numerical features to the standard range between 0 and 1. The normalized value \(x_{norm}\) for an input value \(x\) in the range \([x_{\min }, x_{\max } ]\) is calculated as in Eq. (3).

$$x_{norm} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }},$$
(3)

where \(x_{\min }\) specifies the minimum value of the feature in the dataset and \(x_{\max }\) specifies the maximum value of the feature in the dataset. The adaptive distorted Gaussian matched filter technique adjusts the parameters \(\mu\) and \(\sigma\) depending on the characteristics of the data and the noise. This adaptive adjustment ensures that the filter can effectively target and reduce the noise while preserving the essential signal components.
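A minimal sketch of Eqs. (1) to (3) on a sampled signal may help to fix the notation. The text does not state how \(\mu\) and \(\sigma\) are adapted, so the rule in `adaptive_sigma` (the standard deviation of the input, with a lower bound) is purely an illustrative assumption, as are the kernel length and all function names.

```python
import math


def gaussian_kernel(length, mu, sigma):
    """Sample G(x) of Eq. (1) at x = 0, 1, ..., length - 1."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            for x in range(length)]


def matched_filter(template, signal):
    """Discrete form of Eq. (2): R[t] = sum over tau of s[tau] * x[t - tau]."""
    response = []
    for t in range(len(signal) + len(template) - 1):
        acc = 0.0
        for tau, s_val in enumerate(template):
            idx = t - tau
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += s_val * signal[idx]
        response.append(acc)
    return response


def min_max_normalize(values):
    """Eq. (3): rescale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def adaptive_sigma(signal, floor=0.5):
    """Assumed adaptation rule: standard deviation of the input, bounded below."""
    mean = sum(signal) / len(signal)
    variance = sum((v - mean) ** 2 for v in signal) / len(signal)
    return max(math.sqrt(variance), floor)


def preprocess(signal, kernel_length=5):
    """Filter with an adaptively sized Gaussian template, then normalize."""
    sigma = adaptive_sigma(signal)
    mu = (kernel_length - 1) / 2.0  # centre the kernel on its window
    kernel = gaussian_kernel(kernel_length, mu, sigma)
    return min_max_normalize(matched_filter(kernel, signal))


cleaned = preprocess([0.0, 0.2, 1.0, 0.1, 0.0, 0.9, 0.1])
```

The output is longer than the input because the discrete sum in Eq. (2) is evaluated at every overlap position; a practical implementation would typically crop it back to the input length.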

3.4 Attention-Induced Multi-Head Convolutional Neural Network (AIMCNN) to Improve the Piano Teaching Quality

AIMCNN is designed explicitly for improving piano-teaching quality, which is an innovative application of deep learning in music education. AIMCNNs are well suited for tasks involving image and pattern recognition, and in the context of piano teaching, they can be applied to the analysis of sheet music and piano playing techniques. The first step involves converting sheet music (notated as musical symbols) into a digital format that a computer can understand. This can be achieved by using Optical Music Recognition (OMR) techniques. OMR algorithms are designed to detect and recognize musical symbols such as notes, rests, clefs, and key signatures from sheet music images. Once the sheet music is converted into a digital format, the next step is to recognize individual musical symbols using AIMCNN. Each musical symbol corresponds to a specific class (e.g., whole note, half note, quarter note). The AIMCNN is trained on a dataset of labeled musical symbols, and its task is to classify each symbol correctly. A significant aspect of playing the piano is not only recognizing individual notes but also understanding the sequence of notes and their timing. After a student plays a piece on the piano, the performance can be recorded and converted into a sequence of musical symbols. The AIMCNN can then compare the student's rendition with the original sheet music to identify errors in timing, note accuracy, and expression. Feedback can be provided to the student based on these detected errors, helping them improve their performance. AIMCNN-based systems can provide quantitative assessments of a student's progress over time. This information can be used to create gamified learning experiences, where students can track their progress, achieve milestones, and compete with themselves or others, encouraging them to practice more and improve their piano skills. AIMCNNs are deep learning models primarily used for image-related tasks. 
They are well known for their ability to automatically learn hierarchical features from input data. AIMCNN contains multiple convolutional layers, followed by pooling and fully connected layers, which enable it to recognize patterns in images. The attention mechanism is widely used in neural networks to enhance their ability to focus on relevant parts of the input data. In the context of sequences (such as music notes or piano scores), attention mechanisms can help the model pay more attention to important sections while processing the data. Multi-Head Attention is an extension of the basic attention mechanism. It uses multiple sets of attention weights to capture different patterns or features simultaneously. This permits the model to attend to multiple parts of the input sequence in parallel, potentially improving its learning capacity and performance. If the concepts of CNN and the attention mechanism are combined to create a model for improving piano-teaching quality, the following architecture can be envisioned. Input Layer: The piano-teaching data, which could be represented as a sequence of music notes or chords, are fed into the model. Convolutional Layers: These layers can capture local patterns in the musical sequence. They may not be strictly necessary for music-related tasks, but they could potentially capture certain short-term dependencies. The 1D convolution operation takes an input sequence \(X\) and a filter \(W\) and produces an output sequence \(Y\), as described in Eq. (4)

$$Y[i] = \sum\limits_{k = 0}^{K - 1} {X[i + k] \cdot W[k]} ,$$
(4)

where \(Y[i]\) denotes the output at position \(i\), \(X[i]\) denotes the input at position \(i\), \(W[k]\) denotes the filter weight at position \(k\), and \(K\) is the size of the filter. Multi-Head Attention Layers: These layers help the model focus on important sections of the musical sequence, considering both short-term and long-term dependencies. The attention mechanism enables the model to learn which notes or parts of the music are crucial for effective teaching. Multi-Head Attention combines multiple attention heads, each capturing different aspects of the data. The attention layer output is a weighted sum of the values \((V)\) computed with the attention weights \((\alpha )\), as described in Eq. (5).

$$Head_{j} = Soft\max \left( {\frac{{QW_{Q}^{T} .KW_{K}^{T} }}{{\sqrt {d_{k} } }}} \right).VW_{V}^{T} ,$$
(5)

where \(Head_{j}\) is the \(j{\text{th}}\) attention head; \(Q,K,V\) denote the queries, keys, and values, which are learned linear transformations of the input data; \(W_{Q} ,W_{K} ,W_{V}\) are the learned weight matrices for queries, keys, and values; and \(d_{k}\) specifies the dimension of the key vectors. Fully Connected Layers: The output from the attention layers is flattened and passed through fully connected layers to make predictions or further process the data. Output Layer: The final layer produces the desired output, such as the next recommended notes, feedback for the learner, or an assessment of the learner's performance. The \(j{\text{th}}\) feature map on the \(l{\text{th}}\) layer of the \(h{\text{th}}\) head for the multi-head CNN is described in Eq. (6).

$$\alpha_{i,j}^{i} = \left( {y_{i,j} + \sum\limits_{q = 1}^{E} {\sum\limits_{p = 1}^{{K^{h} }} {W_{i,j}^{qp} } \alpha_{(l - 1)q}^{i + p,h} } } \right),\quad \forall h = 1,2,3,$$
(6)

where \(y_{i,j}\) denotes the bias of the feature map, \(K^{h}\) denotes the kernel size for the \(h^{\rm th}\) head, \(W_{i,j}^{qp}\) denotes the weight matrix in layer \(l\), and \(p\) is the index of the feature map at the \((l - 1)^{\rm th}\) layer.
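The two building blocks introduced so far, the 1D convolution of Eq. (4) and a single attention head as in Eq. (5), can be sketched in plain Python as follows. The attention head uses the standard scaled dot-product form, which is what the notation of Eq. (5) appears to intend; this reading, the helper names, and the toy inputs are assumptions made for illustration.

```python
import math


def conv1d(x, w):
    """Eq. (4): Y[i] = sum_{k=0}^{K-1} X[i + k] * W[k], valid positions only."""
    K = len(w)
    return [sum(x[i + k] * w[k] for k in range(K))
            for i in range(len(x) - K + 1)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q, k, v, w_q, w_k, w_v):
    """One head: softmax((Q W_Q)(K W_K)^T / sqrt(d_k)) (V W_V)."""
    qp, kp, vp = matmul(q, w_q), matmul(k, w_k), matmul(v, w_v)
    d_k = len(kp[0])
    scores = matmul(qp, transpose(kp))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, vp), weights


# Toy usage: a difference filter, then self-attention with identity projections.
y = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
tokens = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
head, alpha = attention_head(tokens, tokens, tokens,
                             identity, identity, identity)
```

Each row of `alpha` sums to one, so every output row of `head` is a weighted average of the value vectors. A multi-head layer would run several such heads with different weight matrices and concatenate their outputs.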

In a Convolutional Neural Network, the convolutional layer is a key component that employs a collection of learnable kernels to perform the convolution operation. The primary purpose of this layer is to extract meaningful features from the input time series data. The input, represented as timestamps × features, is passed through the convolutional layer denoted as \(l\), resulting in the generation of a feature map denoted as \(\alpha_{i,j}^{i}\). To address the vanishing gradient problem in the teaching prediction, a non-linear function σ is applied to each element in the feature map \(\alpha_{i,j}^{i}\). The activation layer is responsible for transforming the output of the convolutional layer. Specifically, any negative input values are set to zero, whereas positive input values remain unchanged in the output. This activation function ensures that the teaching quality model can handle non-linear relationships effectively by introducing non-linearity into the model, as described in Eq. (7).

$$R_{j}^{h} = \sigma \left(\alpha_{i,j}^{i} \right).$$
(7)

The ReLU activation function, denoted by \(\sigma\), is defined as \(f(x) = \max (0,x)\) for an input \(x\). To enhance the piano-teaching quality, the feature map \(R_{j}^{h}\) is passed through the batch normalization (BN) layer \(B\) for the \(h^{\rm th}\) head, resulting in the normalized feature map \(R_{norm}\). Unlike a conventional convolutional block, the BN layer is positioned after the activation layer, which has led to higher teaching prediction accuracy in the proposed model, as described in Eq. (8).

$$R_{norm}^{h} = B\left(R_{j}^{h} \right).$$
(8)
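The ordering of activation followed by batch normalization in Eqs. (7) and (8) can be sketched as follows. The scale `gamma`, shift `beta`, and stabilizer `eps` follow the usual batch-normalization formulation and are assumptions, since their values are not given in the text; the statistics are computed over a single toy feature map rather than over a training batch.

```python
import math


def relu(values):
    """Eq. (7): negative inputs become zero, positive inputs pass through."""
    return [max(0.0, v) for v in values]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Eq. (8): standardize to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta
            for v in values]


feature_map = [-1.5, 0.2, 3.0, -0.4]   # toy convolution output
activated = relu(feature_map)          # R = sigma(alpha)
normalized = batch_norm(activated)     # R_norm = B(R)
```

Because the normalization is applied after ReLU, the normalized map can again contain negative values, which is one practical difference from the more common convolution, normalization, activation ordering.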

The feature set \(X\) is obtained by passing the output of the BN layer to the convolutional layer in the second convolutional block. Batch normalization is applied to the previous layer's output to standardize and stabilize the activations, which helps to improve the convergence and performance of the teaching quality model. The subsequent convolutional layer then uses these normalized features to extract relevant patterns from the input data, resulting in the generation of the feature set \(X\). This sequential transformation ensures that the teaching quality model can effectively learn and represent complex patterns in the data during the training process, as described in Eq. (9)

$$X_{i,j}^{h} = \sigma \left(y_{l,j} + W_{{k^{s} }}^{h} *R_{norm}^{h} \right).$$
(9)

Here, \(W\) denotes the filter of the \(h^{\rm th}\) head with kernel size \(1 \times k^{s}\). In the second convolutional block, the batch normalization layer is replaced with a max-pooling layer using a kernel of size \(1 \times k^{p}\), which decreases the size of the output map \(X_{i,j}^{h}\) and hence the computation required by the teaching quality model. Generally, an AIMCNN does not adopt any optimization technique to determine the ideal parameters and ensure proper prediction of teaching quality. Therefore, this work proposes HPO to optimize the weight parameters of AIMCNN.

In this work, the Hunter–Prey Optimization (HPO) algorithm is used to optimize the parameters of the AIMCNN classifier. HPO is a meta-heuristic optimization method inspired by the predator–prey interaction in nature, where hunters (predators) chase prey to survive. When applied to optimize the weight parameters of an Attention-Induced Multi-Head Convolutional Neural Network, HPO offers several advantages. HPO is a population-based optimization algorithm that explores multiple potential solutions simultaneously. This allows it to search the solution space extensively, increasing the likelihood of finding a globally optimal set of weight parameters for the complex AIMCNN. The HPO algorithm maintains diversity within its population of solutions. This diversity helps to prevent premature convergence to sub-optimal solutions and promotes a more thorough exploration of the solution space. HPO is designed to strike a balance between exploration and exploitation, making efficient use of both the computation time and the number of evaluations required to find a good solution. This is particularly advantageous when dealing with large and complex neural networks such as the AIMCNN. The HPO algorithm can adapt to various problem domains and does not rely on any specific mathematical model or gradients. This adaptability makes it suitable for optimizing the non-linear and high-dimensional parameter space of neural networks. HPO is used to find the optimal weight parameters \(R_{j}^{h}\) and \(X_{i,j}^{h}\) of AIMCNN. The stepwise procedure of HPO is given below.

3.5 Stepwise Procedure of Hunter–Prey Optimization Algorithm (HPO)

Here, the step-by-step process for deriving the optimum values of AIMCNN using HPO is described. HPO creates a uniformly distributed population to optimize the parameters of AIMCNN. The ideal solution is promoted by the HPO approach, and the corresponding flowchart is depicted in Fig. 2.

Fig. 2 Flowchart of HPO for optimizing the weight parameter of AIMCNN

Step 1: Initialization.

Initially, the population is set randomly to \((\vec{y}) = \{ \vec{y}_{1} ,\vec{y}_{2} ,...\vec{y}_{n} \}\), and the objective function is determined as \((\vec{z}) = \{ z_{1} ,z_{2} ,...z_{n} \}\) for every member of the population. The population is managed and guided within the search space as described in Eq. (10).

$${\text{y}}_{{\text{i}}} = {\text{rand (1,d)}}{\text{.*(ub - lb)}} + {\text{lb}}$$
(10)

In the given context, \({\text{y}}_{{\text{i}}}\) represents either the position of the hunter or the prey in the problem. The variables \({\text{lb}}\), \({\text{ub}}\) denote lower and upper boundaries for the problem's variables. The letter \({\text{d}}\) corresponds to the number of dimensions or variables in the problem.
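A minimal sketch of the initialization in Eq. (10) is given below. It assumes per-dimension bounds supplied as lists and uses a seeded generator only to make the illustration reproducible; the function name and the example bounds are hypothetical.

```python
import random


def init_population(n, d, lb, ub, seed=None):
    """Eq. (10): y_i = rand(1, d) .* (ub - lb) + lb for each of n agents."""
    rng = random.Random(seed)
    return [[rng.random() * (ub[j] - lb[j]) + lb[j] for j in range(d)]
            for _ in range(n)]


# Ten agents in a three-dimensional search space bounded by [-1, 1] per dimension.
lower, upper = [-1.0] * 3, [1.0] * 3
population = init_population(n=10, d=3, lb=lower, ub=upper, seed=0)
```

Each agent is a candidate parameter vector; in the proposed method, every coordinate would correspond to one tunable weight of AIMCNN.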

Step 2: Random generation.

After initialization, the input parameters are generated randomly. Here, the best fitness value is chosen according to the explicit hyper-parameter setting.

Step 3: Evaluation of Fitness Function.

A random solution is produced from the initialized values. It is evaluated using the fitness function for optimizing the weight parameters \(R_{j}^{h}\) and \(X_{i,j}^{h}\) of AIMCNN, which is expressed in Eq. (11),

$$Fitness\;Function = optimizing\,\,\left[R_{j}^{h} \,\,and\,\,X_{i,j}^{h} \right].$$
(11)

Step 4: Update the hunter position for optimizing \(R_{j}^{h}\).

The introduction of variations in solutions prompts a thorough exploration of search space, leading to the identification of promising areas. Once these promising regions have been identified, it becomes essential to minimize random behaviors within the algorithm. This reduction in randomness allows the algorithm to focus on searching around the identified promising regions, a process commonly referred to as exploitation. The adaptive parameter is calculated by Eq. (12).

$$\begin{gathered} P = \vec{S}_{1} < D;\quad IDY = (P = = 0) \hfill \\ O = S_{2} \otimes IDY + \vec{S}_{3} \otimes (\sim IDY). \hfill \\ \end{gathered}$$
(12)

where \(\vec{S}_{1}\) and \(\vec{S}_{3}\) are random vectors in the range [0,1], \(P\) is a random vector of 0 and 1 values, \(S_{2}\) is a random number in the range [0,1], \(IDY\) holds the indices of vector \(\vec{S}_{1}\) that fulfil \(P = = 0\), and \(\sim IDY\) denotes its complement, as in the standard HPO formulation.
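A minimal Python sketch of the adaptive parameter in Eq. (12) is given below. It assumes that the second term is applied on the complement of \(IDY\), as in the standard HPO formulation, and it takes the balance parameter \(D\) as an argument. The function name `adaptive_parameter` and the `seed` argument are hypothetical.

```python
import random

def adaptive_parameter(d, D, seed=None):
    """Return the vector O of Eq. (12) for a d-dimensional problem."""
    rng = random.Random(seed)
    s1 = [rng.random() for _ in range(d)]  # random vector S1 in [0, 1)
    s2 = rng.random()                      # random number S2 in [0, 1)
    s3 = [rng.random() for _ in range(d)]  # random vector S3 in [0, 1)
    p = [x < D for x in s1]                # P = S1 < D
    idy = [not flag for flag in p]         # IDY marks the entries where P == 0
    # O takes S2 where IDY holds and the matching S3 entry elsewhere.
    # Using the complement of IDY for the S3 term is an assumption of this sketch.
    return [s2 if use_s2 else s3_k for use_s2, s3_k in zip(idy, s3)]

# With D = 1 every entry of S1 is below D, so O is simply S3 (fully random).
o_explore = adaptive_parameter(d=4, D=1.0, seed=1)
# With D = 0 no entry of S1 is below D, so every entry of O equals S2.
o_exploit = adaptive_parameter(d=4, D=0.0, seed=1)
```

The two calls illustrate the limiting cases: a large \(D\) yields independent random components, whereas a small \(D\) collapses them to a single shared value.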

Step 5: Exploration and exploitation for optimizing \(X_{i,j}^{h}\).

The parameter \(D\) balances exploration and exploitation. Its value decreases from 1 to 0.02 over the course of the iterations, as given by Eq. (13).

$$D = 1 - it\left( {\frac{0.98}{{MaxIt}}} \right).$$
(13)

Here, \(it\) is the current iteration and \(MaxIt\) is the maximum number of iterations. The position of the prey \(P_{pos}\) is determined as follows: first, the average \(\mu\) of all positions is computed using Eq. (14). Subsequently, the distance of every search agent from this mean position is calculated.

$$\mu = \frac{1}{n}\sum\limits_{i = 1}^{n} {\vec{y}_{i} } .$$
(14)

The search agent with the maximal distance from the mean of the positions is taken as the prey \(P_{pos}\).
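The schedule in Eq. (13) and the prey selection based on Eq. (14) can be sketched as follows. The function names are hypothetical, and the Euclidean distance is an assumption, since the text only refers to "distance" without naming a metric.

```python
import math

def balance_parameter(it, max_it):
    """D of Eq. (13): decreases linearly from 1 (it = 0) to 0.02 (it = max_it)."""
    return 1.0 - it * (0.98 / max_it)

def select_prey(population):
    """Return the search agent farthest from the mean position (Eq. 14)."""
    n = len(population)
    d = len(population[0])
    # mu = (1 / n) * sum of all positions, computed per dimension
    mu = [sum(y[k] for y in population) / n for k in range(d)]
    # Euclidean distance is an assumption of this sketch.
    return max(population, key=lambda y: math.dist(y, mu))

# Hypothetical two-dimensional agents; the outlier is selected as the prey.
agents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
prey = select_prey(agents)
d_start = balance_parameter(0, 100)
d_end = balance_parameter(100, 100)
```

In this toy population the mean is (1.5, 1.5), so the agent at (5, 5) lies farthest from it and is returned as the prey.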

Step 6: Termination.

The weight parameters \(R_{j}^{h}\) and \(X_{i,j}^{h}\) of AIMCNN are optimized with the HPO algorithm, which repeats from Step 3, incrementing the counter \(y = y + 1\), until the halting criterion is fulfilled. Finally, AIMCNN optimized with HPO assesses teaching quality with higher accuracy.

4 Results and Discussion

This section discusses the simulation of the proposed remote piano teaching design based on the Attention-Induced Multi-Head Convolutional Neural Network optimized with Hunter–Prey Optimization on a cloud platform. The proposed technique is implemented on the CloudSim platform on a machine with 6 GB RAM, an 8 GB graphics card and a 500 GB SSD. The performance metrics evaluated are accuracy, computational time, learning skill analysis, learning activity analysis, learning behaviour analysis, student performance ratio and teaching evaluation analysis. The proposed RPT-AIMCNN-HPO method is compared with the existing RPT-MPTM [20], RPT-BA [21] and RPT-ANN [22] models.

4.1 Performance Metrics

To validate the robustness of the proposed method, the following performance metrics are evaluated.

4.1.1 Accuracy

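Accuracy is the ratio of correct predictions to the total number of instances in the dataset. With \(T_{P}\), \(T_{N}\), \(F_{P}\) and \(F_{N}\) denoting the true positives, true negatives, false positives and false negatives, it is computed through Eq. (15),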

$$Accuracy = \frac{{T_{P} + T_{N} }}{{T_{P} + T_{N} + F_{P} + F_{N} }}$$
(15)
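As an illustration of Eq. (15), the following sketch computes accuracy from the four confusion-matrix counts. The counts in the usage line are hypothetical and are not taken from the experiments reported here.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy of Eq. (15): correct predictions over all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total

# Hypothetical counts: 85 of 100 predictions are correct
example_accuracy = accuracy(tp=45, tn=40, fp=8, fn=7)
```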

4.1.2 Learning Skill Analysis

Analyzing learning skills helps to identify areas for improvement and to optimize the learning process. A basic learning skill analysis is expressed in Eq. (16).

$$LSA = (P_{i} - E_{xe} )*imp,$$
(16)

where \(LSA\) denotes the learning skill analysis, \(P_{i}\) denotes the performance, \(E_{xe}\) denotes the expectation and \(imp\) denotes the importance.
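Eq. (16) amounts to a weighted gap between performance and expectation. A short sketch follows; the scales of the three inputs are not stated in the text, so the numbers below are purely hypothetical.

```python
def learning_skill_analysis(performance, expectation, importance):
    """LSA of Eq. (16): (performance - expectation) weighted by importance."""
    return (performance - expectation) * importance

# Hypothetical values: a score of 85 against an expected 70, weighted by 0.5
lsa = learning_skill_analysis(performance=85, expectation=70, importance=0.5)
```

A positive value indicates performance above expectation and a negative value indicates a shortfall, scaled in both cases by the importance weight.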

4.1.3 Learning Activity Analysis (LAA)

Learning Activity Analysis (LAA) is a systematic approach used in education to assess the effectiveness of instructional activities in achieving specific learning outcomes. It involves analyzing the components of a learning activity and evaluating its alignment with desired learning objectives and goals.

4.1.4 Learning Behavior Analysis

Behavior analysis is the scientific study of behavior, focusing on understanding how the environment influences and shapes behavior. It involves analyzing and modifying behavior through systematic observation and experimentation. This model helps to identify the factors that influence and maintain specific teaching behaviors.

4.1.5 Student Performance Ratio

The student performance ratio is a metric used to evaluate the academic performance of students or educational institutions. It is often expressed as a percentage and represents the proportion of students who achieve a certain level of academic success compared to the total number of students in a given group or population. The formula for calculating the student performance ratio is described in Eq. (17).

$$SPR = \frac{{N_{i}^{p,q} }}{{Total\,\,N_{S} }}*100,$$
(17)

where \(SPR\) denotes the Student Performance Ratio, \(N_{i}^{p,q}\) denotes the Number of Students Achieving Desired Level of Performance and \(Total\,\,N_{S}\) denotes the Total Number of Students.
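The ratio in Eq. (17) can be computed as in the sketch below. The function name and the input validation are assumptions added for the example, and the student counts are hypothetical.

```python
def student_performance_ratio(num_achieving, total_students):
    """SPR of Eq. (17): share of students reaching the desired level, in percent."""
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    if not 0 <= num_achieving <= total_students:
        raise ValueError("num_achieving must lie between 0 and total_students")
    return 100.0 * num_achieving / total_students

# Hypothetical class: 42 of 50 students reach the desired level
spr = student_performance_ratio(num_achieving=42, total_students=50)
```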

4.1.6 Teaching Evaluation Analysis

Teaching evaluation is an important process that assesses the effectiveness of a teacher's performance in the classroom. It involves gathering feedback from students, colleagues, or administrators to identify areas of strength and areas that need improvement.

Figures 3–9 show the simulation results of Remote Piano Teaching Based on Attention-Induced Multi-Head Convolutional Neural Network Optimized with Hunter–Prey Optimization. The performance is compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods.

Fig. 3 Performance analysis of learning skill analysis

Figure 3 shows the learning skill analysis. Here, the proposed approach attains 26.98%, 38.08% and 36.77% better learning skill analysis for student 10; 34.73%, 29.94% and 22.33% for student 30; and 24.86%, 16.97% and 33.86%, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Students who receive piano instruction through multimedia have the advantage of practicing individually while simultaneously emulating the teacher's explanations, resulting in a significant enhancement in their learning efficacy. It is believed that learning to play the piano helps the hands and brain form neural connections that promote better coordination and control. The key to knowing music theory is to first master the piano’s first scale or chord. In the case of one-on-one teaching, teachers can dedicate more time to tailor lesson plans according to every pupil's unique requirements and skills in the piano. This flexibility allows them to adjust their teaching approach based on real-time progress and results. The learning skills ratio (%) depicted in Fig. 3 reflects the positive impact of such teaching methods. Furthermore, as educators' music playing skills advance, they become better at recognizing and appreciating beauty in music, fostering a virtuous cycle that motivates students to excel in their piano learning journey.

Figure 4 shows the learning activity analysis. Here, the proposed approach attains 26.98%, 38.08% and 36.77% greater learning activity analysis for student 10; 25.75%, 35.76% and 24.65% for student 30; and 24.75%, 25.64% and 31.54%, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Fig. 4 Learning activity analysis

Remote piano teaching most often takes the form of individual instruction, in which students and instructors interact in a relatively closed-off environment. However, this traditional teaching model has spatial limitations. To overcome these constraints and enhance piano theory comprehension and practical skills, multimedia technology can play a vital role. By employing multimedia tools, students can practice with relevance and benefit from a comprehensive audio–video impact that enriches their emotional and knowledge experiences within the music classroom. Embracing this interactive approach requires instructors to build upon established knowledge and explore innovative ideas. While incorporating multimedia technology may not necessarily simplify access to a broad array of higher-quality piano resources and educational materials, it does promote independent study among students. Figure 4 illustrates the learning activity rate (%) obtained on the dataset.

Figure 5 shows the learning behavior analysis. Here, the proposed method attains 33.76%, 27.98% and 32.06% better learning behavior analysis for student 10; 37.86%, 25.06% and 34.96% for student 30; and 26.95%, 25.64% and 20.65%, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Fig. 5 Learning behavior analysis

This study focuses on using neural networks to enhance piano playing and education through a recommendation scheme. The scheme recommends piano presentation and training schemes based on both music content and user history. Historical data on students' music listening habits is gathered by the student modelling module to create a user preference feature model. Non-utilitarian musical values play a significant role in piano music education. Piano instructors often rely on students' nonverbal cues to gauge their level of commitment and motivation. Behavioral indications, such as the amount of time pupils spend on a task and how they express their feelings and ideas, are all crucial factors. Figure 5 illustrates the learning behavior analysis ratio (%). Overall, the study aims to leverage neural networks and user data to optimize piano education and enhance the learning experience for students by providing personalized music recommendations based on their preferences and historical behavior.

Figure 6 shows the student performance ratio analysis. Here, the proposed technique reaches a 24.56%, 34.56% and 54.86% better performance ratio for student 10; 36.06%, 29.65% and 26.75% for student 30; and 26.44%, 38.95% and 25.86%, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Fig. 6 Student performance ratio analysis

Learning the piano is a continuous and demanding process, requiring students to acquire comprehensive knowledge and consistent recognition of specific skills to reach a higher level. Teachers who incorporate multimedia technology into the classroom give their pupils a flexible learning environment that helps them fully understand piano theory. Because this strategy favors emotional resonance and promotes an energizing interchange, learners can flourish and enhance their capacity for learning. Figure 6 illustrates the student performance ratio (%). In the context of Polyhymnia, piano performances can be automated, and expressive playing can be guided by various musical signals, allowing for adaptable parametric modes that are interpreted automatically. Due to the prevalence of polyphony in piano compositions, the expressive quality of a piano performance depends significantly on its polyphonic qualities.

Figure 7 depicts the teaching evaluation analysis. Here, the proposed method attains 36.94%, 29.05% and 44.86% greater teaching evaluation analysis for student 10; 25.86%, 22.85% and 34.64% for student 30; and 21.34%, 31.85% and 32.09%, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Fig. 7 Teaching evaluation analysis

Typical models of piano instruction lead to a lack of student initiative and interest in learning. Consequently, students make less effort to learn the piano, which slows their development toward professional accomplishment. To address this issue and raise students' enthusiasm for learning, incorporating multimedia and leveraging students' cognitive capacities during piano learning is highly beneficial. Figure 7 presents the teaching evaluation analysis (%). The output of the output-layer node reflects the teaching evaluation ratio and student performance. By integrating remote technology, the teaching process can be simplified. The visual display presented through multimedia helps to prevent false cognition among students and to dispel preconceptions that might hinder the effectiveness of piano learning.

Figure 8 shows the accuracy analysis. The proposed RPT-AIMCNN-HPO method provides 34.73%, 29.94% and 22.33% higher accuracy for student 10; 23.14%, 21.14% and 11.47% for student 30; and 45.47%, 55.69% and 25.47% for student 50, compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

Fig. 8 Accuracy analysis

Figure 9 displays computational time analysis. The proposed RPT-AIMCNN-HPO method provides 24.73%, 39.94%, and 32.33% lower computational time compared with existing methods, like RPT-MPTM, RPT-BA and RPT-ANN, respectively.

Fig. 9 Computational time analysis
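The text does not state how the "lower computational time" percentages are computed. One plausible reading, assumed here, is the relative reduction with respect to the baseline method. The sketch below illustrates that arithmetic only; the timings are hypothetical and are not measurements from this study.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`.

    Taking the baseline as the reference value is an assumption of this sketch.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical timings in seconds: 200 s for a baseline, 150 s for the proposed method
reduction = relative_reduction(baseline=200.0, proposed=150.0)
```

Under this reading, a proposed method that needs 150 s where a baseline needs 200 s would be reported as having 25% lower computational time.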

4.2 Discussion

This study explores multimedia network teaching technology, analyzing its concepts, characteristics, modules, categorizations, present stage of development and probable future applications. The research highlights the critical technologies incorporated into developing a networked music education system, focusing on two essential aspects: the format of the teaching content and the applications that deliver it. The streaming teaching system comprises various logically organized subsystems, which address numerous technological challenges inherent in multimedia utilization. This study proposes a networked multimedia teaching system that provides teachers and students with educational services by utilizing the already installed campus network infrastructure software. Multimedia network technology in the classroom has the potential to improve the effectiveness of education while also fostering independent learning and student participation in a pleasant learning environment. However, its implementation may present difficulties at three levels: instructors, students and school administration. To overcome these challenges, the article suggests that educators familiarize themselves with the educational potential of multimedia networks and assemble a capable team to oversee their classroom implementation. In the context of piano instruction, multimedia technology offers students the opportunity to transcend limitations of time and distance, enabling deeper study of piano playing and related data. According to the experimental results, the proposed method attains a learning skills ratio of 98.45%, a learning activity ratio of 98.89%, a student performance rate of 98.47%, a teaching evaluation rate of 90.3% and a learning behavior rate of 99.35%, which compare favorably with the other methods.

5 Conclusion

In this work, Remote Piano Teaching Based on Attention-Induced Multi-Head Convolutional Neural Network Optimized with Hunter–Prey Optimization is successfully implemented. The RPT-AIMCNN-HPO method is implemented on the CloudSim platform and evaluated under the performance metrics mentioned above. The RPT-AIMCNN-HPO technique attains 12.566%, 12.075% and 15.993% greater prediction accuracy and 15.86%, 15.26% and 16.25% higher learning skill compared with the existing RPT-MPTM, RPT-BA and RPT-ANN methods, respectively.

One limitation of remote piano teaching based on deep learning methods is the challenge of real-time feedback and interaction. Current technology may introduce latency issues, hindering the immediate correction of students' mistakes and impeding the natural flow of the learning process. Additionally, the reliance on deep learning models may face constraints in accurately capturing the nuances of individual playing styles or interpreting subtle musical expressions. Future work could focus on developing interactive platforms that minimize latency, incorporating more hybrid optimization algorithms to better understand and adapt to diverse playing styles, and integrating multimodal approaches that capture not only audio but also visual cues for a more comprehensive learning experience. Moreover, exploring ways to address the potential limitations of deep learning models, such as refining their interpretability and adaptability, could further improve the efficacy of remote piano teaching.