On the Numerical Modelization of Moving Load Beam Problems by a Dedicated Parallel Computing FEM Implementation

The present work outlines an original numerical modelization approach for Moving Load (ML) beam problems, by a dedicated object-oriented C++ parallel computing FEM implementation, with the purposes of performing efficient numerical analyses resolving the complete dynamic response of beams under the effect of a high-velocity ML. Alongside, main framing state-of-the-art reviews are attempted, on the principal involved issues of: ML context and physical description, numerical FEM modelization, parallel computing implementation. Running ML example cases are explored, for a (long) finite beam on a (visco)elastic foundation and for a continuous beam of a historic railway iron bridge, with per se interesting engineering outcomes. The contribution may serve as a guideline paradigm to readers that may be novel to the treated topics, though motivated in adventuring in the computational challenges involved in the present mechanical research context.


Generalities and Contextualization
Many recurring practical dynamical engineering applications may be modeled as Moving Load (ML) problems, namely as those peculiar mechanical problems occurring when the point of application of concentrated or distributed loads acting on a certain structural element changes along time, at a constant or variable velocity. For example, in transportation engineering, the need of correctly predicting the dynamic response of structural systems under (growing) high-speed moving vehicles is becoming crucial, for guaranteeing vehicle stability, monitoring maintenance costs and avoiding possible passenger discomfort. Thereby, railways, bridges, cable ways, overhead cranes constitute selected characteristic contexts of application of ML problems (and other moving object problems, see e.g. Metrikine and Dieterman [110], Metrikine and Verichev [111], Mazilu [108,109], Dimitrovová [35][36][37], and Rodrigues et al. [131], with therein quoted references, even, very recently, in the nonlinear range of mechanical response, Fărăgău et al. [60], Sanches et al. [139]).
For instance, among the most modern and pioneering projects, the TransPod company has recently proposed a nextgeneration mass tube-based vacuum transportation system consisting of a ultra-high-speed vehicle designed to carry passengers at a speed exceeding 1000 km/h, as declared by Janzen [88]. The tube infrastructure that would interconnect cities has been designed as a canted spiral-welded steel tube segment, supported by a substructure anchored to the foundation. In such a context, it becomes fundamental to appreciate the response of the whole supporting structure, when an ultra-high-speed vehicle is moving along. Although the TransPod project shall still look a bit futuristic, high-speed trains traveling at 300 ÷ 500 km/h are almost becoming commonplace in everyday life, bringing to light the importance of the present ML dynamical context at a fast travelling velocity, and of achieving relevant consistent modelizations and attached efficient computational simulations.
It appears fundamental to investigate how the interaction between the structure and the ML influences the amplitude of the resulting structural vibration, in terms of different kinematic components, and in which manner the latter may depend on the source ML characteristics, and specifically on its propagating velocity. As it had been theoretically demonstrated by Timoshenko [148] and then by various other researchers, as discussed in detail in the following, an excessive dynamic amplification of structural vibrations induced 1 3 by MLs may become apparent, when the ML velocity attains a certain characteristic value, named "critical velocity" [38,39].
Hence, demands for safe infrastructures and cost-effective design solutions are becoming necessary, in such an increasingly challenging ML context. A successful design shall strongly depend on the availability of robust and efficient computational tools, as conceived to achieve accurate, reliable and accessible numerical simulations of the dynamic response induced by MLs. By the analysis of the outcomes of the numerical simulations, potential practical implications in contemporary transportation engineering, especially in terms of lowering down the admissible ranges of high-speed vehicle velocities, shall be revealed, and possible remediation measures for the mitigation of the induced system vibrations may be explored.

General Framing
The determination of such a numerical dynamic response under ML constitutes a rather challenging task, since it shall involve a quite sophisticated modelization and a rather intensive computation. Toward that, the Finite Element Method (FEM) shall represent a wide-spread and powerful numerical method by which an accurate description of the physical structural response may numerically be obtained.
The multi-fold history of the FEM began long time ago, likely back to the '50s, and shall be hard to be reported in complete form, especially in general terms (thus, exceeding the scope of describing modelization approaches specifically devoted to ML problems). However, four main researchers may here be mentioned, for framing, referring to the first origins of the FEM in modern terms. Firstly, Argyris [8] demonstrated the efficacy of his numerical techniques, by providing a constant matrix formulation for the principles of virtual displacements and forces. He managed to apply that method to a rectangular planar elasticity "element", even though, this term had not actually been coined yet. As reported in Gupta and Meek [73] and debated during the 5th World Congress on Computational Mechanics (WCCM V, 2002), professor Clough first baptized the method as follows (see Clough [27]): «On the basis that a deflection analysis done with these new 'pieces' (or elements), of the structure is equivalent to the formal integration procedure of integral calculus, I decided to call the procedure the FEM because it deals with finite components rather than differential slices.» However, Turner et al. [153] was likely the first scientist who exported the FEM to everyday use, while he was working at the Boeing company, over period [1952][1953][1954][1955][1956][1957][1958][1959][1960][1961][1962][1963][1964]. Initially, Turner et al. [153] demonstrated the displacement convergence characteristics of planar elements, and then, they generalized and improved the so-called Direct Stiffness Method, in order to study aircraft structures. During WCCM V, Clough also reported an important circumstance that had given a positive turn to the history of the FEM (see Clough [27]): «A 'red letter' event that occurred during this very early history of FEM was my visiting Northwestern University [in 1958] to give a seminar lecture on finite elements. When I received this invitation from Olek Zienkiewicz, who was teaching at Northwestern at that time, I expected we would have some arguments about the relative merits of finite elements versus finite differences because Olek had been brought up in the tradition of Professor Southwell. It is true that we did have some such discussions, but Olek recognized very quickly the advantages of the finite element approach. In fact I would say that my visit to Northwestern yielded a tremendous dividend in the conversion of Olek from finite differences to finite elements.» It may also be noted that Zienkiewicz likely was the first author who published a book popularizing the FEM, see Zienkiewicz and Taylor [165]. Thus, he should be considered as one of the main proponents of the consequent "technology transfer", namely the process of exporting FEM concepts from aerospace industry to a wider range of engineering applications. The FEM had then turned out to constitute a fundamental tool to acquire an in-depth knowledge on the behaviour of different physical and engineering systems. Specifically, the effectiveness of the method in describing the dynamic behaviour of mechanical systems and structural elements has been realized and considerably developed.
Up to then, the FEM had been treated without considering an essential ingredient that truly supports the efficiency of the method: digital computation. Computers have been considered as a scarce and precious resource, for several years. In fact, during the '50s, only large industrial companies managed to afford mainframe computers, because of the required dedicated and expensive infrastructures. It shall not be a coincidence that all of the mentioned FEM pioneers were likely working in the aerospace industry, at least during a part of their careers. As chronicled by Felippa [43], military companies first moved from desk calculators and punched-tape accounting machines to digital computers during the so-called "cold war". Another barrier to the diffusion of the FEM was probably represented by the secretiveness on the proprietary FEM codes developed by companies. In spite of that, the distribution of early open-source FEM software began, thanks to researchers and universities, but only around the '70s. Since then, the FEM approach has become more and more popular, within a wider amount of engineering applications.
The increasing complexity of FEM models has then required faster and faster electronic components, to deliver the demanded processing speed. As predicted by the Cofounder of the Intel Corporation, Moore [114], the number of components per integrated circuit has rapidly grown for years, inducing an increase in the available computational resources. Otherwise, this same positive trend may not last forever. Nowadays, it may be getting harder and harder to achieve further performance improvements at the same pace, because of present physical and technological limits induced by the design of micro-or nano-components. Despite that, the computational time may significantly be reduced by introducing different efficient coding strategies such as parallelism, also allowing to reach and take benefit of modern distributed resources of High Performance Computing (HPC).

Paper Scope
For the above briefly mentioned contextualization reasons, the main subject of the present work is constituted by setting a FEM structural dynamic analysis of a single slender finite Euler-Bernoulli elastic beam excited by a transverse concentrated ML, traveling at a high constant velocity along the beam. The beam is assumed to lie on either a distributed continuous (visco)elastic foundation (linear or nonlinear) or on firm discrete supports.
In particular, within the context of a small displacement (geometrically linear) theory, various numerical analyses are performed, focusing on the numerical determination of the physical response of the mechanical system, and specifically of the critical ML velocities, leading to large beam deflections, by revealing all the associated kinematic characteristics.
The scope of the paper is to investigate the physical response of such a beam-support structural system and to identify all its peculiar features, by first developing a literature framing on the several involved aspects of the relevant modelization approach and then producing a campaign of numerical simulations relying on a parallel computing paradigm. The latter philosophy is briefly introduced, as a first guideline to readers that may be involved in similar parallelization projects. Then, numerical results are obtained through an autonomous object-oriented C++ FEM implementation, specifically coded within a parallel programming environment, aiming at drastically reducing the computational cost of possible parametric analyses of the ML mechanical dynamical problem under consideration.

Work Summary
The basics of FEM and its applications to the field of ML dynamical problems are first presented, after a general discussion about competent analytical and numerical approaches developed in the last decades for studying the response of railway tracks or bridges under ML. Then, the FEM is employed to derive the characteristic matrices and vectors of a beam finite element, with or without modelling an underlying distributed foundation. Moreover, the HTTmethod is considered, as an efficient numerical integrator for determining the transient structural response, within both linear and nonlinear contexts.
FEM applications frequently require a fine, often multidimensional, discretization, resulting into several degrees of freedom and attached mechanical system unknowns. Thereby, the opportunity of exploring efficient parallel computing environments in FEM calculations has been investigated. First of all, main generalities and strategies of parallelization are analyzed, for a useful guideline. Then, an algorithm that could be employed in a generic FEM analysis is here developed and described, in order to examine the potential advantages of employing distributed computing systems. Subsequently, the algorithm is implemented within an object-oriented C++ FEM formulation, and its fundamental features are sketched, to start and to compare with a previous sequential MatLab FEM coding of the same ML modelization. Furthermore, some benchmark performance results are reported, to illustrate achieved speed-up, efficiency and scalability of the devised C++ parallel computing implementation.
At a first validation stage, the implemented parallel code is employed to carry out the analysis of the transient response of a simply supported finite beam lying on a (visco) elastic foundation subjected to a transverse concentrated ML, with either a constant or a harmonic-varying amplitude in time. A successful validation with previous recent outcomes by a standard FEM approach is demonstrated, with a much superior computational performance. Then, the dynamic response of a 1D multi-span FEM model of the upper continuous beam of a historic railway riveted iron arch bridge, the Paderno d'Adda Bridge (1889), is evaluated, by considering the structure subjected to a constant-magnitude and constant-velocity ML, referring to the passage of a locomotive along the upper continuous beam. Numerical tests allow to reveal the effects induced by the bending stiffness and the damping of the beam on the critical velocity of the ML, and to observe the whole dynamic response of the structure in space and time.
The present investigation shows, after an appropriate literature framing, that an original, efficient C++ parallel computing FEM implementation devoted to analyze ML problems allows to derive representative numerical results, which truly reveal the physical characteristics of the dynamic response of the structural system and become apt to consistently quantify the amount of vibration, in view of assessing the structural performance of the mechanical system in such a challenging dynamical context.
The literature framing and the combination of a new object-oriented C++ parallel computing FEM implementation with the mechanical and engineering implications derived from the numerical tests on beam ML problems shall represent the main contributions of this work. Furthermore, the present approach shall reveal to be feasible not only for inspecting the dynamic response of those physical systems subjected to MLs but also for representing an efficient tool to carry out complex FEM analyses, by distributed computing, independently from the specific field of application, as here considered.

Manuscript Organization
The presentation in the manuscript is organized into seven main sections, as it is outlined below, as a structural guideline toward a convenient reading. State-of-the-art reviews will be presented along the next three sections, concerning the relative topics.
Section 2 presents a discussion about several approaches developed in the last decades for studying the response of railway tracks or bridges under the passage of a fast travelling vehicle. These kinds of analyses usually concern the response of finite beams under ML and the investigation of their resonance conditions. Several criteria for classifying research works in the field of ML problems may be selected, but, in the following exposition, a methodological classification is basically adopted, based on the solution strategy employed for solving various ML problems. Therefore, literature contributions are analyzed and presented by organizing them into two main categories: analytical and numerical approaches. Accordingly, the section is subdivided into two corresponding main parts. Both subsections focus onto two different kinds of finite beam models, namely beams without underlying foundation, constrained only at the extremes, and beams also supported by a (visco)elastic foundation, with either a linear or a nonlinear behavior. Section 3 deals with the basics of the FEM and its specific applications to the field of ML dynamical problems of planar one-dimensional beams. After a brief introduction to the Weighted Residuals Method (WRM) and, in particular, to the Galerkin approach, the specific application of the FEM to ML structural problems is described, together with the derivation of the matrices and vectors of a beam finite element, with or without underlying foundation. Furthermore, the HTT-method is presented, as an efficient numerical integrator scheme for determining the transient structural dynamic response, within both the linear and nonlinear contexts.
Section 4 examines the opportunities for exploiting parallelism in FEM calculations. Three different approaches to achieve a parallel implementation of the FEM are herein presented, together with the models of computation that shall allow to select the most suitable parallel computing architecture. In the end, an algorithm to be employed in a generic FEM analysis is described, in order to take advantage of the potentialities of distributed computing systems.
Section 5 reports how the previous algorithm could be devoted to the analysis of the considered ML beam problem (and related ones). First of all, the fundamental features of an implemented object-oriented C++ parallel FEM code are briefly sketched, in order to introduce the comparison of performances between a previous sequential MatLab FEM coding, earlier developed by Froio et al. [50,51,54,56], and the current C++ parallel version of the program. Moreover, the numerical transient response of a simply supported beam lying on a (visco)elastic foundation and subjected to a ML, with either a constant or a harmonic-varying amplitude in time, is considered. In order to validate the devised C++ code, the obtained numerical outcomes in terms of critical velocities are depicted, and compared with recent reference results from the literature (Castro Jorge et al. [19], Froio et al. [56]). Finally, some benchmark performance results of the achieved parallel computing strategy are reported, to illustrate the achieved speedup, efficiency and scalability of the present object-oriented C++ parallel implementation. Section 6 outlines the final adoption of the implemented algorithm to evaluate the dynamic response of a 1D multispan FEM model of the upper continuous beam of a historical railway riveted iron arch bridge, namely the Paderno d'Adda Bridge (1889), which has recently been made the target of a wide research project concerning the determination of its present structural capacity (see Ferrari et al. [44][45][46]), in both static and dynamic mechanical contexts. The technique originally considered by designer engineer Jules Röthlisberger to conceive and calculate the bridge is briefly introduced and some technical aspects are summarized, to provide an introductory view about the architecture and structural functioning of such a monumental railway bridge. The upper box-beam of the Paderno d'Adda Bridge has been modeled as a 1D multi-span continuous beam on nine firm supports. Finally, the results of the numerical tests carried out by varying the ML velocity display the effects induced by the bending stiffness and the damping of the beam on the resulting critical velocities of the physical system. Moreover, the time history of the system subjected to a moving locomotive is depicted, through a sequence of frames and figures reporting the full dynamic response in terms of displacements, velocities and accelerations. Section 7, eventually, delivers the most prominent conclusions of the present work, by outlining its main thesis. Also, potential future developments are briefly mentioned.

Further Information
Concerning notation, terminology and adopted computational tools and resources, the following specific information may be reported, on behalf of the interested reader, possibly helping for independent reproductions of the present developments.
Statements similar to those of a typical structured programming language, such as C++, are employed in outlining the developed algorithms. Examples of such statements include: if…then…else…end if, loop…end loop, for each…do…end for. Algorithms will also here be reported as pseudocodes, which constitute an informal high-level description of the underlying operating principle of a computer program. In fact, those algorithms have been sketched during the coding phase, to assemble the present C++ parallel implementation, but they could also be followed in creating a new computational program, independently from the selected programming language and implementation platform.
All plots in this work have been generated by matplotlib [84], which is an open-source Python 2D plotting library, producing publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Instead, illustration sketches have been created by LatexDraw [9], a graphical drawing editor interface for LaTeX, developed in Java.
The final parallel implementation has been compiled and run on GALILEO and MARCONI HPC platforms of the Cineca consortium, Italy.

State of the Art on Moving Load Analysis
This section presents a brief review on the subject of the dynamic response of beams under "Moving Load" (ML), and attempts to provide a useful introductory information to this topic, by focussing on some specific aspects involved in the modelization and the relevant treatment, as detailed below.
It is common to classify a "ML problem" as that particular mechanical problem occurring when the point of application of concentrated or distributed loads acting on a certain structural element changes along time, at a constant or variable velocity. This characteristic makes the ML problem a fundamental topic of structural dynamics. Timoshenko [149] reported that in the middle of the nineteenth century there seemed to be no agreement among engineers about the effect of a ML on a beam. It likely was the collapse of Stephenson's Bridge in 1847 that accelerated the experimentation on dynamical problems involving MLs. The first approach was to build a railroad track and a carriage with two axles, in order to simulate the force acting on the rails, as reported and sketched in Timoshenko [149]. Such a test was conducted for the first time by Willis et al. [156], with the purpose of discovering the dynamic deflections of a beam under a given ML. They discovered that displacements were larger than those produced by a static load with an equal magnitude. It was pointed out that the bars could be critically damaged by a ML, while structural integrity was not compromised by the same static load. It was probably Stokes [144] the first researcher who proposed an exact solution and then provided a possible physical interpretation of the ML problem.
Since that time, many different approaches to the ML problem and more accurate modelizations have been developed. ML problems, which specifically arose in the railway framework, have found their application within several various engineering contexts, such as in the analysis of the interaction between a vehicle and a road ground, the simulation of a tool acting on a rotating craft or the examination of the interaction between brake calipers and discs. In the present computational analysis, only those cases which can be modeled as a beam subjected to a ML will be considered, and the possible presence of different kinds of underlying foundation will be taken into account.
Concerning the analysis of infinite beams, namely beams of an unbounded spatial extension, one of the most common approaches has been that of considering the steadystate vibration response of the mechanical system, by virtue of its great theoretical and practical appeal. In fact, during the analysis and design stages, a railway track may be well modeled as an infinite beam under support, in order to correctly represent the large extension of the rail and the corresponding arising wave propagation phenomena. When a steady-state approach is considered, the mathematical model may lead to a deeper understanding of the wave propagation occurrence within the analyzed structure. By supposing that the effects of transient motion due to the initial conditions have vanished, the result of such a simplification is a beam response that only depends on the action of the ML. In fact, as stated by Frýba [59], the dynamic response, in terms of transverse displacement, is depicted as a traveling wave, moving at the load velocity. Thus, by further considering a moving coordinate system, called "convected coordinate", attached to the ML position, the dynamical problem becomes a time-invariant one, and may be analyzed as in statics (see e.g. Froio et al. [55]).
Between the latter, Bogacz et al. [15] proved the existence of four sinusoidal traveling waves gathered into two groups, one ahead and one behind the load position. When the load displays a certain frequency and a velocity equal to one of the two groups of waves, the energy is accumulated in the load vicinity. So, in the ideal undamped case, the response of the beam approaches infinity, for those velocities defined to be as critical, the so-called "critical velocities".
Recently, a new Discontinuous Least-Squares FEM formulation was developed in the modelling of the steady-state response of an infinite supported taut string under ML (2ndorder differential problem), see Froio et al. [57], and then of an infinite beam under ML (4th-order differential problem), see Froio et al. [58]. These works also properly consider the effects of absorbing boundaries in terms of an efficient Perfectly Matching Layer (PML) implementation, displaying results that are fully consistent with available analytical solutions, and resolving several issues of inconsistency in existing traditional FEM formulations for steady-state problems, as in overcoming numerical instabilities that may be linked to high-velocity propagating loads and in correctly handling the far-field conditions.
Of course, the steady-state approach turns out to be unsuitable, in principle, for analyzing the response of a finite beam. In fact, an infinite beam may display a time-unlimited response while, in the case of a finite beam, the solution shall always be bounded, since, at a certain time instant, the travelling load has to leave the beam, then setting to zero the external excitation. Despite the absence of an unbounded response, in the analysis of the resonance condition, the critical velocity of a finite beam may still be defined. Two different definitions may be found in the literature. According to Chen and Huang [23], the critical velocity is then defined as the lowest between the modal resonant velocities of the system. On the other hand, it may also be defined as the load velocity inducing the maximum transverse displacement, as stated by Dimitrovová and Varandas [39].
Another characteristic feature of the dynamic response of finite beams under ML is the phenomenon of wave reflection, which is absent for the wave propagation within unbounded media. The reflection of waves induces important increments of beam displacements. As explained by Steele [143], the head wave front reflects from the far end before the ML reaches the beam extreme. So, at a certain instant of time, near the beam ends, interference phenomena occur between the incident and the reflected waves. The possibility of a constructive (build-up) interference among such waves implies that the maximum displacement for a finite beam shall be higher than that for the steady-state maximum response of an infinite beam.
During the sequel of the present section, the variety of solution approaches adopted by different authors in dealing with the analysis of the response of finite beams under ML and the investigation of their resonance conditions will be illustrated. Several criteria for classifying the research works in the field of ML problems may be adopted, see for instance the review information already included in Froio et al. [55,56]. In the following exposition, a methodological classification is chosen, namely that based on the solution strategy employed toward solving the ML problem. Therefore, literature contributions will be analyzed and presented by organizing them into two main categories: • Analytical approaches; • Numerical approaches.
Due to the large amount of publications on the topic, the interested reader may as well consider dedicated comprehensive literature reviews, as proposed e.g. by Wang et al. [155], Ouyang [123] and Beskou et al. [13].
Accordingly, the present section is divided into two main subsections: Sect. 2.1 presents the analytical formulations developed to solve ML problems involving finite beams, while Sect. 2.2 discusses the numerical approaches adopted for the solution of the same ML dynamical problem. Further, both sections will focus onto two different kinds of finite beam ML problem modelizations: • Beam without foundation, constrained at its ends; • Beam resting on a (visco)elastic foundation, constrained at its ends.

Analytical Approaches
As above mentioned, dynamic analyses on ML problems started in the nineteenth century, while during the 1980s, the first numerical experimentations began. Analytical investigations continued because of the crucial need of gaining a better understanding of the problem and making a benchmark reference for numerical tests. A collection of analytical solutions about several different ML problems may be found in the classical monograph by Fryba [59], while some others are presented in the following discussion.

Analytical Finite Beam Without Foundation
Out of many alternatives adopted in the modelization of road and railway bridges, a widespread one is the simply supported beam model, under a ML travelling with a constant velocity, as in Fig. 1; though, other well-posed types of boundary conditions, even elastically compliant ones, may similarly be considered. This model was probably first solved by Krylov [99] and, shortly after, by Timoshenko [147], who applied the method of harmonic analysis. Other solutions worthy to be mentioned are those by Jeffcot [89], Inglis [85], Lowan [102], Kolousek [98]. Some years later, Fryba [59] obtained the dynamic response of the beam by using the Finite Fourier sine transformation and dwelling on the concept of critical velocities, as those values leading to a resonance effect recorded within the structure.
Such pioneering works encouraged further investigations on the subject, such as the simulation of vehicular traffic loads on a bridge. Iwankiewicz and Sniady [87] developed a technique that was able to predict the variability of the beam deflection from a new stochastic viewpoint. Force arrivals were assumed to be described as a Poisson process of events. They argued about the validity of the Poisson process to idealize the traffic load and sought for an explanation of the apparent induced effects. In fact, they found that the value of beam deflections decreases with an increasing force speed. It was clear that the average of the total loads on the beam decreased with an increasing speed. Actually, it was well-known yet that the dynamic response to a single ML is amplified by an increasing velocity.
Olsson [121] discussed the dynamical problem of a simply supported beam subjected to a constant-amplitude force moving at a constant speed. Throughout that paper, an analytical solution based on the separation of variables was presented and compared with those numerically obtained by the FEM. He noticed that the ratio between the transverse deflection, w(x, t), and the same displacement evaluated in the middle of the beam, w(L/2, t), depends on three non-dimensional parameters: x/L, t∕ and , where is the traversing time of the moving force and = T 1 ∕(2 ) in which T 1 is the period of the lowest vibration mode of the beam. Typically = 1 corresponds to a vehicle velocity of v = 400 ÷ 1500 km∕h , depending on the structural flexibility/rigidity, while = 0 corresponds to the static case. Time histories for mid-span displacement and moment were displayed, highlighting a similar behavior. However, the moment histories seemed to be more irregular, and a noticeable negative mid-span moment was obtained, when the force entered the beam. In the end, the size of discretization errors introduced by a FEM approach were pointed out, in contrast with the exactness of the analytical solution.
The vibration of simply supported beams subjected to the passage of high speed trains was investigated by Yang et al. [161]. The analytical solution was obtained by modeling the train as the composition of two subsystems, traveling with a constant speed, one for the front and the other for the rear assembly of the wheels. A closed-form solution for the dynamic response of the beam was derived, the resonance condition was found and some criteria were given to select the span length and the cross-section of the beam.
Abu-Hilal and Zibdeh [2] analyzed an elastic homogeneous isotropic beam with general boundary conditions traversed by a ML, with accelerating, decelerating and constant velocity types of load motion. The dynamic response in a closed-form solution employing the modal form of the transverse displacement was obtained. Different cases of boundary conditions and damping were presented. It was shown how transverse vibrations were amplified by a uniformly accelerated or decelerated ML. Moreover, the influence of boundary conditions on the value and position of the maximum displacement along the beam was highlighted.
In a next article, Abu-Hilal and Mohsen [1] introduced a harmonic ML, to study how its frequency affected the maximum dynamic deflection. When the frequency was equal to the first natural frequency of the beam, displacements reached their maximum values. Also, the velocity of the load turned out to be significant: the maximum response decreased by an increasing load velocity, because the acting time of the harmonic load became shorter.
For the purposes of track maintenance, the phenomenon of resonance was also investigated by Yau and Yang [162]. The vertical acceleration response was taken into account, in case of a simply supported beam, under a series of MLs at a high constant speed. Considering a train as a series of MLs with regular intervals, a closed-form solution was developed, by a superposition method. The results indicated that the contributions of the higher modes to the beam acceleration could not be neglected, for beams with a light structural damping. In particular, the maximum acceleration of the beam did not occur at the mid point, when resonance was excited by the higher mode frequencies, which corresponded to the characteristic frequency of occurrence of the MLs.
Museros et al. [116] studied not only the beam free vibrations but also the effects of resonance and cancellation phenomena. If the resonance phenomenon occurs because of the continuous build-up of the modal responses of the bridge, the cancellation phenomenon implies that waves may annihilate each other. A new closed-form expression was provided, for the cancellation speed of a generic mode of a simply supported beam. It was of a great interest to a priori know the speeds of the maximum free vibrations or cancellations, in order to conduct experimental tests on real structures. In fact, with this information, it would be made possible to obtain refined measurements of the dynamic properties of bridges, such as for estimating the amount of inherent structural damping.
A closed-form solution for evaluating the dynamic behavior of a general multi-span Euler-Bernoulli beam was derived by Johansson et al. [90]. A bridge was modeled as a certain number of simply supported beams whose behavior was controlled by a Partial Differential Equation (PDE), connected to adjoining beams by means of boundary conditions. The PDE was first transformed into a modal equation, assuming it to be a linear combination of normal modes, thanks to the expansion theorem. Then, the natural frequencies and the normal modes were determined. Furthermore, the modal equation was solved within the frequency domain, using a Laplace transform, to arrive at a closed-form solution. Finally, the displacement, velocity and acceleration responses were obtained within the time domain by an inverse Laplace transform, and subsequent derivation with respect to time.
Xia et al. [159] investigated the mechanism of vibration resonance and cancellation for a simply supported bridge subjected to a ML series. The analysis was theoretically led, by using free vibrations, and then verified through a FEM simulation. It was explained that the superposition of vibrations induced by a series of MLs could cause a resonant vibration, when the natural period of the bridge turns out equal to the time interval between the adjacent loads. Two types of cancellation effects were detected in the observed case. The first one dealt with the free vibration of the bridge induced by a single moving force. This free vibration may cancel out by itself, whether the load speed satisfies a relationship between the span length and the natural frequencies of the bridge. The second one implied that, under a certain relationship between the load speed, the load interval and the natural frequencies of the bridge, the free vibrations of the beam induced by a series of MLs may cancel out. Furthermore, an analysis of the case in which both resonance and cancellation occur was provided. Proofs were produced about the fact that the cancellation plays a dominant role in getting rid of the resonance condition and then giving some important parameters toward bridge design.
Kumar et al. [100] studied the cancellation phenomenon characterizing the free vibration response of a simply supported beam, after the transition phase of a single concentrated ML, providing a simple closed-form expression for both lightly and heavily damped beams. Expressions for the cancellation speeds were derived, finding a relationship between the free-vibration amplitude, the phase angle and the initial conditions. Di Lorenzo et al. [32] dealt with the vibration response under ML of uniform Euler-Bernoulli beams, with translational supports and rotational joints obeying to a Kelvin-Voigt viscoelastic behavior. The aim of that paper was that of developing an analytical method that could handle loads traveling on multi-span beams. A novel modal superposition approach was introduced, to provide an analytical expression of the beam response. The discontinuities due to the supports and joints were handled using the theory of generalized functions. Two examples of application were presented by the authors: a three-span beam, held by two viscoelastic translational supports, assuming clamped-clamped boundary conditions, with rotational dampers at the ends of the beam; a simply supported four-span beam, with Kelvin-Voigt viscoelastic translational supports and rotational joints. For both tests, the deflection profile of the beam at different time instants was determined.
An original approach to the ML problem on an Euler-Bernoulli beam with Kelvin-Voigt viscoelastic translational supports and rotational joints was proposed by Adam et al. [3]. Additionally, the structure was equipped with viscoelastic Tuned Mass Damper (TMD) control devices (see, e.g., Salvi and Rizzi [134][135][136], in the context of non-stationary and seismic excitation; Pioldi et al. [126] and Salvi et al. [133], in the field of Soil-Structure Interaction and modal dynamics system identification within that; Salvi et al. [137,138], in the realm of controlling the mechanical response to pulse-like excitations), as a mean of reducing the vibration amplitude induced by MLs. In that paper, a closed-form analytical response was achieved, within the time domain, for those mechanical systems subjected to MLs travelling with a constant velocity, in the presence of any number of TMDs, supports and joints. In order to treat the discontinuities given by the rotational joints, the theory of generalized functions was used; so, the exact complex eigenvalues and eigenfunctions of the beam were obtained, by imposing an orthogonality condition. In that way, all response variables were expressed by time-domain convolution integrals. Furthermore, no numerical integration was required, as instead performed in other FEM approaches. The proposed exact solution may be employed as a benchmark one for applications with standard FEM codes.

Analytical Finite Beam with Foundation
The models without foundation could be enriched by the insertion of an underlying (visco)elastic supporting medium, as portrayed in Fig. 2.
Among the existing mechanical models, such as those of Pasternak or of Kerr, or that of the underlying elastic continuum, the Winkler foundation one owns a leading position among engineers and researchers, because of its mathematical simplicity and applicability to several practical situations. As described in Winkler [157], Winkler's assumption was to consider the foundation as an array of infinitely closed linear elastic springs with a symmetric behavior under tension and compression (see also competent review information provided in Froio and Rizzi [52]). Winkler's elastic stiffness coefficient k(x) (N∕m 2 ) , also known as foundation modulus, was introduced as where r(x) (N∕m) is the transverse reaction force acting on the beam, w(x) (m) the deflection of the beam and x (m) a longitudinal spatial coordinate. Thanks to this model, it was possible to investigate the interaction between a beam and the underlying foundation, as elastically deformed by different kinds of static and dynamic loadings.
Regarding the Winkler model, the stiffness of the elastic springs is usually assumed to be constant but, in the literature, some examples of a variable stiffness coefficient could be found, either in the static or in the dynamic analysis. A static analytical solution was presented by Froio and Rizzi [52], for a simply supported Euler-Bernoulli elastic beam lying on a variable Winkler elastic foundation. The analytical solution, in a nonlinear case describing the Winkler foundation modulus as was presented in closed form, and a complete parametric study was carried out to inspect the influence of the mechanical properties on the behavior of the beam-foundation system. The analytical solution was in perfect agreement with numerical predictions but the advantage of owning an exact solution shall become evident, when a numerical method shall need to be validated or also in view of practical applications.
Similarly to such a previous work, Froio and Rizzi [53] later considered the analytical static bending response of a finite uniform free-free Euler-Bernoulli beam resting on a Winkler elastic foundation with a linear variation of the support coefficient, as e.g. resembling the configuration of a vertical pile embedded into the ground. The stiffness coefficient was then described through the following relation in which k 0 and k L (N∕m 2 ) are the point-wise values of the spring stiffness, respectively, at the top and bottom ends of the beam. That paper dealt with the explicit closed-form analytical solution of a beam-foundation system loaded by a force and a moment applied at one beam's end.
Pavlovic and Wylie [125] investigated the natural vibration response of a beam restrained by an elastic foundation of a linearly varying modulus along the beam span, by using the infinite power series method. A completion of the previous method for an arbitrary variation of the foundation modulus, even non-analytical, if the singularity is of a regular type, was presented by Foyouzat et al. [49], by using the Frobenius theorem.
Dealing with the analytical investigation of the ML problem with underlying foundation, Dimitrovová and Varandas [39] implemented two analytical approaches, in order to study a beam, supported by a foundation presenting a stiffness change, subjected to a force moving with a constant velocity. The analysis of that case shall become fundamental, because of the constant diffusion growth of high-speed lines. A strong variation in the stiffness of the track-foundation system causes excessive ground vibration, which may compromise vehicle stability and cause passenger discomfort. When the supporting structure changes, additional waves are generated and the dynamic response may significantly become much severe. Two different ways of analytically solving the problem are collected in that paper: a generalized method of integral transformations and a combination of the differential equations used to describe the two half beams. The analysis was performed with a parametric approach and it displayed that the amplitudes of the maximum displacements are highly amplified passing from the softer to the harder region.
The response of a simply supported beam on an elastic foundation subjected to repeated moving concentrated loads was obtained by Amiri and Onyango [6], by means of a Fourier sine transformation. The procedure and the effects of some parameters were illustrated with numerical examples, adopting two concentrated loads moving along the beam. It could be seen that the results were affected by the foundation stiffness, the traveling speed and the magnitude of the MLs. In particular, an increase in speed could lead to amplify the dynamic deflections, for a characteristic range of velocities, and an increment of foundation stiffness could bring to higher values, either the natural frequencies or the critical speed of the mechanical system. Dimitrovová [33] meant to deepen the effect of changes in stiffness of the foundation. For instance, the effects on the dynamic response of the beam were analyzed, to achieve a better understanding on tunnel transitions or on phenomena related to entering or leaving bridges. In that paper, the transverse vibration was examined, as induced by a load moving at a constant speed along a finite or almost infinite (very long) beam resting on a piece-wise homogeneous visco-elastic foundation. The changes in stiffness foundation were considered, together with the discontinuities in the supporting structure created by track degradation or alteration of structural design. The governing equations were solved by normal-mode analysis and Laplace-Carson integral transform. Using the dynamic stiffness matrix method introduced by Chen et al. [24], the natural frequencies were determined and the solution was then established, adding an intermediate region of adaptable foundation stiffness.
Dimitrovová and Rodrigues [38] carried out an investigation on a finite or almost infinite (very long) simply supported beam possibly composed of two subdomains. The governing equations were solved by a modal expansion technique, both considering Euler-Bernoulli and Timoshenko beam theories. Moreover, the critical velocity and the influence of damping on it were taken into account. It was demonstrated that the critical velocities of a finite beam constitute an overestimation of those for an infinite one, because of the effects of wave reflections. For the finite beam composed of two sub-domains, the harder sub-domain acquires another displacement peak, which corresponds to the critical velocity of the softer sub-domain. Finally, it was stated that also a low level of damping is enough to smoothen the resulting peaks, for both the upward and downward displacements.
Investigating the behavior of another type of foundation, different from the Winkler one, Dimitrovová [34] derived a new formulation for the critical velocity related to a beam supported by a foundation of a finite depth, instead of a continuously supported beam resting on a uniform layer of closely spaced, mutually independent linear elastic springs. A simplified plane model of the foundation was built and some results were presented, either for the finite beam or for the infinite one, using also some different values of damping. Thanks to a parametric analysis and to the introduction of three dimensionless parameters (velocity ratio, shear coefficient and mass ratio), the new formulation was validated and it was found that the mass ratio, defined as the square root of the fraction of foundation mass to beam mass, strongly influences the resulting critical velocity.

Numerical Finite Beam Without Foundation
To the Authors' knowledge, the FEM was first applied to a ML problem by Yoshida and Weaver [163]. An analysis technique was developed to predict the dynamic response of lightweight highway bridges due to the passage of semitrucks and other large vehicles. In that paper, the FEM was used to model continuous beams, with different support conditions, traversed by constant-velocity and constant-acceleration MLs, with or without the influence of a mass associated with the load. The moving-force problem was converted into a finite number of differential equations of motion, for the discretized structure, and then the normal-mode method was employed to obtain the displacements of the nodes in terms of a Duhamel integral. The numerical integration of the second-order equations was carried out with the linear acceleration method based on the assumption that the nodal accelerations linearly vary during a small time interval.
A general bridge-vehicle finite element was defined by Olsson [120], in order to investigate the arising interaction phenomenon. Assuming a linear structural behavior, that specific finite element displayed time-dependent and unsymmetric matrices and it was used to obtain a modal formulation of the bridge motion. The advantage of such a technique was that of reducing the number of the motion equations, truncating the higher vibration modes. So, although the modal equations were coupled, the set of linear equations with a time-dependent coefficient was made solvable through the Newmark method.
Rieker et al. [130] studied the discretization for FEM models related to a ML acting on an elastic beam. That work investigated the relationship between model accuracy and number of elements used to represent the beam. The discretization turned out to be important in establishing the size of the computational error and the elapsed time in the developed simulation. It was stated that the interpolation error potentially affects the calculation of the equivalent nodal reactions, of the transmitted force, from a moving sprung mass, and of the overall system response. A comparison was made, in order to establish the number of finite elements required to achieve an acceptable static or dynamic deformation analysis. The results showed that for a MLtype problem, the mesh density must be increased, when the traveling speed increases. Moreover, adaptive meshing may be required in the area around the moving force, to ensure high accuracy, within a reasonable computational cost.
An exact dynamic stiffness element under the framework of a FEM approximation was presented by Henchi et al. [75]. That paper was about the dynamic response of multi-span structures under a convoy of MLs. Using the virtual work expression for the transverse vibrations of Euler-Bernoulli beams, the weak formulation for a discretized structure was obtained. Instead of employing the traditional Hermitetype polynomial functions, they introduced a characteristic nodal approximation, which led to an exact evaluation of the frequencies and the modes. The dynamic response of multi-span beam-structures could be reduced to a numerical solution, using the discrete Fourier transform implemented as a FFT algorithm.
Wu et al. [158] studied how to improve the control of mobile gantry cranes used in ports and rail-head freight yards. A one-dimensional model of a gantry crane was developed, employing a simply supported beam subjected to a single ML or a pair of forces. A computer program was written, to calculate the time-variant external forces on the whole structure, providing the equivalent loads that move around it. In that way, a model was assembled and it was able to reveal how the forces acted on the structure, in order to correctly size the crane.
A dynamical simulation of multi-span bridges modeled as simply supported beams was developed by Ju and Lin [91]. The three-dimensional model represented a vehicle-bridge system and it analyzed the passage of a high-speed train. Three-dimensional beam elements were adopted and the Least-Squares method was employed to calculate the stiffness, the damping and the mass matrices. A SKS-700 highspeed train transit was simulated, using the non-linear Newmark direct integration method, which converged, averagely, within two Newton-Raphson iterations. Then, the resonance effect was investigated; so, an analysis was performed to find the bridge natural frequencies. It was proven that, to avoid resonance effects, the dominating train frequencies should be as much different as possible from the bridge natural frequencies. In particular, if the two first frequencies are similar, bridge resonance may cause serious damages.
Martínez-Castro et al. [105] presented a semi-analytical solution of the ML problem applied to multi-span uniform and non-uniform beams. The beam was spatially discretized using conventional Euler-Bernoulli finite elements with two nodes, individually owning two degrees of freedom. Linear variations of the area and depth of the cross-section were assumed, leading to a linear variation of the mass per unit length and a cubic variation of the second moment of inertia of the cross-section. When the element equations of motion were built, the assembly of each contribution led to a system of linear differential equations with constant coefficients and an analytical right-hand term. In fact, in that method, the load term is a vector, with an analytical polynomial expression, defined on a temporal interval equivalent to the time spent by the ML on a single element. Then, the differential equation was defined for the nth mode and the analytical expression was obtained in closed form, piecewise defined for every different time interval. So, the only approximation introduced in the procedure came from the spatial FEM discretization of the beam, while the solution was expressed in terms of ten coefficients per element and per mode, the values of whose were independent from the travelling speed of the ML. This semi-analytic procedure was applied to the analysis of a set of simply supported as well as continuous multi-span beams, by implementing a FORTRAN computer code.
Salcher and Adam [132] proposed a three-dimensional mechanical model for analyzing the dynamic interaction of a train passing on a railway bridge at high speed. For the complexity of the system, the equations of motion were separately derived, for the bridge and the vehicle subsystems. Subsequently, a coupling of the bridge with the train was achieved through a Component Mode Synthesis (CMS) methodology, which is an efficient sub-structuring technique based on modal analysis. The full FEM bridge model was built using 3D elements with six degrees of freedom per node, while Euler-Bernoulli finite beams were chosen to represent the rails. They considered the train made up of a certain number of wagons, modeled in five different parts: the primary and the secondary suspension systems and the passenger, the bogie and the wheel stages. Once the equations of motion for each subsystem were obtained by employing the Lagrange equations, the CMS method was applied. It consisted of three different steps. In the first step, the equations of motion of the subsystems were separately derived, in nodal or generalized spaces. Then, the component mode coordinate transformation was formulated, according to the compatibility conditions between the train and the bridge. Finally, the coupling was fulfilled through a coordinate transformation, imposing the equilibrium conditions defined at the interface. Moreover, using an ABAQUS FEM model, they displayed the dynamic response of a simply supported single-span steel railway bridge, taking into account also the irregularities of the rail profile. Additionally, the maximum absolute deflection was presented at different locations of the bridge, as a function of the travelling speed; also, they displayed some mode shapes of the bridge model.

Numerical Finite Beam with Foundation
A FORTRAN program was implemented by Thambiratnam and Zhuge [145], for the dynamic analysis of a beam on an elastic foundation subjected to MLs with a constant propagating velocity. Using the Lagrange's equation, the element matrices and vectors were obtained, and the equations of motions were integrated within the time domain, by the Newmark method. Some parameters such as the length of the beam, the speed of the ML, and the magnitude of the foundation stiffness were studied, to investigate their influence on the recorded dynamic response. The procedure was applied to the analysis of railway tracks, approximating the response of an ideal infinite beam with a (long) finite beam FEM structure. Furthermore, the effects of two consecutive MLs were taken into account. The reported beam deflections were similar to those detected for a single ML, even though the time duration of the beam response was significantly increased.
Chang and Liu [22] performed a deterministic and random vibration analysis on a nonlinear beam on an elastic foundation, excited by a ML travelling with a constant velocity. In order to obtain the dynamic response of the beam, the Galerkin method was employed, together with the FEM. The nonlinear system of differential equations was linearized, by using the incremental method and it was solved by the implicit form of the Newmark time-integration algorithm. Using a Monte Carlo approach, the randomness of the beam profile, in terms of beam axis variability was studied. Then, the responses coming from the linear and nonlinear beam models were compared: it was displayed that the standard deviation of the midpoint deflection of the beam for the linear model was always the largest among those of the nonlinear models.
The dynamic analysis of a pre-stressed Euler-Bernoulli beam resting on a two-parameter elastic foundation was described by Kien and Hai [95]. The FEM was used to investigate the response of the beam under a moving harmonic load. The finite beam stiffness matrix was found, by employing cubic Hermitian polynomials, as interpolation functions for the deflection. Then, the expression of the potential energy due to beam bending, the foundation stiffness and the potential due to the axial load were considered. The foundation stiffness was characterized by either the stiffness of Winkler springs or the contribution of a shear layer. So, that model showed a greater accuracy in modelling the effect of the foundation support on the structure. The dynamic response of a simply supported beam was obtained, with the aid of Newmark direct integration method. Using the above-mentioned formulated element, the dynamic analysis of the beam with different values of axial force, ML frequencies, foundation parameters and velocity was performed, in order to investigate the changes in terms of natural frequencies. Two years later, Kien [96] carried out the same kind of analysis but employing a Timoshenko beam model.
Ansari et al. [7] treated the vibration of a finite Euler-Bernoulli beam supported by a nonlinear viscoelastic foundation traversed by a ML. The Galerkin method was used to discretize the system, by obtaining a local expression of the transverse displacements. The nonlinear foundation presented in that article followed a constitutive law different from classical linear Winkler's one. It was expressed as where r nl [w(x, t)] is the force generated by displacements w(x, t) from the equilibrium configuration, k l is the linear part of the foundation stiffness, while k nl is the nonlinear one. The dynamic response was obtained for different harmonics using the Multiple Scales Method (MSM), as one of the perturbation techniques. Then, the effects of the damping, of the ML magnitude and of the nonlinear stiffness on the achieved solution were examined through a parametric analysis. Like that, the resonance condition and the frequency responses of different harmonics were displayed.
Ding et al. [40] dealt with the convergence of the Galerkin method as applied to the dynamic response of an elastic beam resting on a nonlinear dissipating foundation. The latter was represented by a constitutive equation that was similar to that in Eq. (4) but with an additional parameter, in order to introduce a coupled presence of damping: in which c is the damping coefficient of the foundation. So, using Hamilton principle and Euler-Bernoulli theory, the governing differential equation of motion was written for the beam. Then, adopting the Galerkin truncation method, the system was discretized. By a fourth-order Runge-Kutta algorithm, the spatially discretized equation of motion was solved, for different types of beam constraints: clampedclamped, free-free and simply supported. Considering the UIC60 European high-speed rail for their tests, the effects of foundation stiffness, beam length and damping coefficient on the dynamic response were investigated. It was found that the effects of the boundary conditions became smaller, when longer span lengths of the beam were chosen. Moreover, the convergence of the method increased with the growing modulus of elasticity of the beam, nonlinear foundation parameter k nl and span length, while decreased with greater k l and c parameters.
Castro Jorge et al. [19] focused on the dynamics of simply supported beams on a non-uniform nonlinear foundation acted upon by a moving concentrated force. The effects of the load's intensity and velocity, of the damping and of the foundation stiffness on the critical velocities were investigated. The analysis was based on a FEM formulation of the problem, built on the Lagrange equations. Considering the elastic strain energy of the beam and the foundation, and then the kinetic energy of the finite beam, the element matrices were built. Dealing with a nonlinear foundation governed by Eq. (4), the potential due to the foundation forces acting on a finite element was found. It was possible to describe the nonlinear behavior of the system, from the equations of motion, which contained the nonlinear force vector expressed as a function of the generalized displacements of the finite element. Viscous damping was taken into account by considering classical Rayleigh damping. The governing differential equations of motion were solved by means of the HHT -method, implemented within a MatLab environment [146].
Some meaningful conclusions were derived, for both uniform and non-uniform foundations, which consisted of two sub-domains, with different stiffness constants. A reduction of the maximum displacements and a higher critical velocity could be observed if the foundation stiffness increased. Instead, the damping had no apparent effect on the critical velocity, even though it reduced the maximum values of the resulting displacement. Furthermore, in case of a two subdomain foundation, independently from the position of the stiffer region, the maximum displacement took place when the load was on the second sub-domain. Then, it was noticed that the second sub-domain exhibited two critical velocities: its critical speed and that of the first sub-domain, as shown by Dimitrovová and Rodrigues [38]. The important aspect of the presence of friction and its effects were further considered by Toscano Corrêa et al. [151,152]. Froio et al. [51] investigated the transient response of a simply supported Euler-Bernoulli beam resting on a homogeneous in space Winkler elastic foundation. Two types of foundation constitutive law, linear and cubic, were taken into account. Moreover, a beam subjected to a transverse concentrated load moving at a constant velocity with a harmonicvarying magnitude was considered where F is the reference amplitude, Ω the angular frequency of the ML variation and t the generic time instant. For the purpose of solving the governing linear/nonlinear partial differential equation of motion, a FEM approach was employed, and the system was discretized in space using cubic Hermitian polynomials, as interpolation functions for the unknown transverse displacements. Finally, the dynamic response was obtained by a HTT-algorithm. A code was implemented within MatLab and the output was validated using the results earlier reported by Castro Jorge et al. [19]. Furthermore, the effects given by the velocity and the frequency of the load were reported, portraying the results for the damped and the undamped cases. It was noticeable that the critical velocities might be one or two, in case of a harmonic ML. The lower critical velocity approaches zero at a very high load frequency, while the higher critical velocity moves toward higher values. Then, using appropriate analytical bifurcation curves, the relationship between the ML frequency and the critical velocity of the beam was derived. The resulting curves showed a very good agreement to those provided by Chen et al. [25], who employed an analytical approach.
Froio et al. [54] further extended the results earlier obtained by Froio et al. [51], through a parametric analysis on the dynamic response of a simply supported elastic beam resting on a spatially homogeneous nonlinear cubic (visco)elastic foundation. Thanks to extensive numerical simulations implemented into an autonomous FEM code, the dependency of the critical velocities on the mean value of a harmonic ML was explored. This time, the harmonic force was described as where the mean value of the force was defined as F 0 = F , F being the reference amplitude of the force and a parameter employed within the parametric analysis. First of all, the linear and nonlinear foundation results were compared. It was revealed that, for a constant-magnitude ML ( Ω = 0 ), for the linear model, the peak corresponding to the critical velocity was proven to be independent from ratio ; however, for the nonlinear foundation, it was recorded that the peak moves to higher values of critical velocities, at increasing . On the other hand, for a zero-mean harmonic ML ( = 0 ), the peak bifurcates into two distinct peaks, considering either the linear or the nonlinear model. Then, the much salient result was the manifestation of three different critical velocities, when mean value F 0 of the acting ML was different from zero ( ≠ 0 ). It was also noticed that the lowest and highest critical velocities tend to separate, as the load frequency increases, while the central one remains stationary for every Ω , at the value corresponding to Ω = 0.
Up to now, the cited papers were concerning beams lying on a foundation with a symmetric constitutive law, so with the same behavior under tension and compression. In the following, a bilinear foundation is further presented, in order to obtain more realistic models, which not only rely on the amount of displacements induced by the forces on the beam but also on their versus. A bilinear foundation exhibits two distinguished linear force-displacement branches, for the upward and downward displacements, obeying to the law where w(x, t) is the transverse displacement of the beam, r[w(x, t)] the reaction of the foundation, H(⋅) the Heaviside function, k + the tension stiffness and k − the compression stiffness, as shown in Fig. 3.
Very few researchers seem to have addressed the problem of a bilinear foundation. Between them, Castro Jorge et al. [18] extended the analysis presented in Castro Jorge et al. [19], to the beam lying on a bilinear elastic foundation. Using a previous FEM approach, it was explained how to take into account the contribution of the foundation to the internal forces, so as to generate the correct tangent stiffness matrix, employed within the direct integration of the governing equations. Several analyses were performed, testing different values of load velocity and foundation stiffness. It was observed that when the stiffness to the upward motion decreases, the upward displacements of the beam increase, while the critical velocity decreases.
Froio et al. [56] concerned with the numerical modelization of the transient dynamic response of a simply supported Euler-Bernoulli beam resting on a bilinear Winkler-type elastic foundation. In that paper, the effect of a transverse concentrated load with a zero-mean harmonicvarying magnitude in time was displayed. Employing a FEM approach coupled with a direct integration algorithm, extensive numerical analyses were performed, to investigate the influence of the ML frequency and of the foundation moduli on the transverse deflections of the beam. Several numerical simulations were carried out, after convergence studies designed to establish the proper space and time discretizations. Analyzing the system response to a frequency variation, it was found that if the ML displays a constant magnitude, only one critical velocity appeared. Increasing the value of the load frequency, two critical velocities were instead observed. The two critical velocities tended to separate: the higher one increased, starting from the value obtained for a constant-magnitude ML, while the second one decreased, until it reached zero for a ML frequency equal to the first natural frequency of the beam.

FEM Formulation in the Context of Moving Load Problems
The Finite Element Method (FEM) is one of the most common and widespread numerical methods for determining useful approximate solutions to Partial Differential Equations (PDEs), in various fields of engineering and mathematical physics. Indeed, when intricate geometries, complicated loading conditions and complex material constitutive laws, often nonlinear, may be involved, the derivation of analytical solutions may turn out prohibitive or impossible. Such difficulties become even more accentuated if the sought quantities, which could be a temperature, a displacement or a stress field, may display a multi-dimensional space distribution and a temporal variation. The FEM is based on the concept of approximating the unknown solution as a linear combination of given "shape functions", thus traducing the weak form of an initialboundary value problem for a given PDE, or a system of PDEs, into an ordinary differential/algebraic system of equations (linear or nonlinear), becoming amenable to effective numerical solutions. Based on a convenient approximation, the real power of this method is the ability to transform a continuous differential problem into a discrete algebraic one, endowed with a finite set of unknowns.
In this section, the basics of the FEM, as related to its application to the field of ML dynamical problems, with specific reference to the context of planar one-dimensional beams are outlined. A brief introduction to the Weighted Residuals Method (WRM) is first provided in Sect. 3.1. Then, the general FEM formulation is overviewed in Sect. 3.2, through a classical Galerkin's approach. The specific application of the FEM to ML structural problems is described in Sect. 3.3, where matrices and vectors related to a beam finite element, with or without underlying foundation, are derived. Finally, in Sect. 3.4, the HTTmethod is introduced, as an efficient numerical integrator, useful for determining the transient beam dynamic response, within both the linear and nonlinear contexts, with pertinent algorithm representations.

A Brief Introduction to the Weighted Residuals Method
With the purpose of deriving a FEM formulation for a given system of differential equations, a variational or weak statement of the problem is needed. In order to obtain a weak formulation, different approaches could be followed. In that sense, the Virtual Work Principle shall constitute a powerful tool, endowed of a physical appeal, providing a convenient framework for producing a general FEM approximation.
Other approaches for deriving a weak form are based on variational principles, according to which specific functionals are minimized (for instance, stationary of the total potential energy, Euler-Lagrange equations, stationary of Least-Squares functionals, see e.g. Froio et al. [57,58]). Furthermore, the Weighted Residual Method (WRM) may be applied. This latter mathematical framework is adopted and discussed, through the classical concept of weight functions, in order to derive the weak governing equations from which FEM matrices and vectors may be derived, as illustrated in the following subsections. As e.g. introduced by Cook [28], a mathematical statement of a physical problem in its strong form may be formulated as where D and B are two differential operators, x is the independent variable or a set of them, u is the unknown solution of the problem (for instance, a displacement field), and f and g are two given functions of x. The domain of definition of the problem has been labeled as Ω , while Ω marks the frontier of Ω , on which boundary conditions are set. In the context of transverse bending of a beam, Eq. (9) becomes in which q(x) is a distributed transverse load applied onto the beam, w(x) is the beam's bending deflection and M(x) and V(x) are the prescribed values of bending moment and transverse shear force at the ends of the beam.
Using the appropriate boundary conditions and finding the correct expression of u(x), the differential relation in Eq. (9) shall exactly be satisfied at each point of the physical domain. Let instead ũ(x) be an approximation of u(x). Approximate solution ũ(x) could be conceived as a linear combination of appropriate "shape functions" Ψ i (x) , with coefficients a 1 , a 2 , … , a n to be determined. So, ũ(x) turns out to be a linear combination of n terms, where the ith term is multiplied by coefficient a i : The effect of such an approximation implies that the left hand side of Eq. (9), evaluated for ũ(x) , may result in nonzero residuals When such residuals approach zero, the approximation improves its quality, as ũ ≃ u.
According to the WRM, the best solution satisfies the governing equations in their following weak form: where each v i (x) is a weight or test function, which may be defined as Indeed, in the Bubnov-Galerkin method, commonly known as Galerkin method (see e.g. Bathe [12]), weight functions (11), so that Eq. (13) turns out to be Otherwise, the Petrov-Galerkin method may be considered, where test functions v i (x) differ from shape functions Ψ i (x) . Equation (13) means that, over a region of interest, the weighted residual takes a vanishing average value, because the best approximate solution is chosen from trial family ũ =ũ(a i , x).

FEM Problem Formulation
It is here shown how Eq. (15) could be solved through a standard Galerkin FEM formulation. As e.g. reported by Zienkiewicz and Taylor [165], the FEM is based on a discretization of the physical domain ( Ω ) into appropriate (finite) subdomains ( Ω e ) called finite elements. As Fig. 4 ideally illustrates, the original physical domain is represented by discretized domain Ω , defined as the assembly of elements Ω e where Ne is the overall number of finite elements in the discretization. Similarly, domain boundary Ω is approximated as where Nb is the total number of elemental edges belonging to the boundary.
The quality of the approximation depends on the type of employed finite elements and their overall number. In the limit, it shall be possible to provide an exact representation of a continuum by using an infinite number of finite elements. Now, suppose to handle the following differential problem, stated in general matrix form where is the sought function, which may be a scalar quantity or a vector of variables, and A i is a linear or linearized differential operator. The boundary conditions of Eq. (18) may also be stated as where B i is a generic differential operator for the boundary conditions. The FEM representation seeks the approximate solution as where is the vector of the shape functions, usually locally defined on the elements or subdomains, and is the vector of the unknown coefficients. Using the WRM presented in Sect. 3.1 and considering a single element (e), the scalar product between e and Eq. (18) has to vanish at each point of Ω e . So The boundary conditions in Eq. (19) simultaneously need to be satisfied; then, it shall be required that Actually, the previous system of equations is satisfied if the following relation turns out to be fulfilled: Remembering the previous discretization of the domain stated in Eqs. (16)-(17), Eq. (23) has to be solved on each subdomain; so, it could globally be rewritten as

Problem Formulation
The ML problem could be formulated in the simplest case as follows: a transverse concentrated force, F(t), moves with a constant speed v along a 1D beam without foundation, for instance a simply supported beam, of a finite length L, as earlier shown in Fig. 1.
The following hypotheses are assumed throughout the present formulation: 1. The beam behavior is described by Euler-Bernoulli's theory. It has been deduced within small deformation theory, Hooke's law, Navier's hypothesis and de Saint-Venant's principle; 2. The beam displays a constant cross section and uniform mass per unit length; 3. Only gravitational effects associated with the moving object are considered; 4. The load moves, pointing downward, at a constant speed, e.g. from left to right; 5. The deflection and the bending moment are zero at both ends of the beam (simply supported case). 6. The beam is considered at rest until first force arrival.
The first solution to this problem was likely found by Krylov [99]. Under the above assumptions, such beam ML problem is described by the following 4th-order differential equation of motion: where the symbols used in Eq. (28) and throughout the present section display the following meaning: Moreover, a beam lying on a (visco)elastic smeared foundation could be considered (Fig. 2). If the balance of the forces acting on an infinitesimal chunk of the beam is carried out, as in Fig. 5, a differential equation with an additional term is found, and it could be used to further explore the ML problem: In Eq. (29), k l is introduced as the constant coefficient of a classical linear elastic Winkler foundation (see for instance the discussion, with historical perspective, in Froio and Rizzi [52]). The third term on the left hand sides of Eqs. (28) and (29) represents the contribution of structural damping, coming from either the beam itself and/or the external damping due to an underlying foundation. Now that the equation of motion of the mechanical system has been defined, it is possible to proceed through the WRM, as earlier introduced in Sect. 3.2.

Application of the WRM
Consider the system earlier shown in Fig. 2, consisting of a finite 1D beam lying on a Winkler (visco)elastic foundation under the action of a transverse concentrated force moving along with a constant velocity.
Suppose to abandon the idea of achieving an exact analytical solution and think about a discretized beam made of finite elements of length h, interconnected at the nodal points. First of all, it is essential to define generic beam element e. It could be seen as a finite piece of a beam, obeying to Euler-Bernoulli beam theory, lying between two nodes (i and j), as depicted in Fig. 6.
This finite element displays constant cross section area A, constant mass density per unit length and constant bending stiffness EI along its length. Each node is endowed of two degrees of freedom: the transverse translations ( q 1 and q 3 ) and the rotations about an axis normal to the plane of the sheet ( q 2 and q 4 ). The resulting vector of such four generalized coordinates is defined as follows: Providing four generalized coordinates, the simplest approximation of the field of transverse displacements w e (x, t) is given in cubic form: where e (x) is the vector of the shape functions. Hence, the displacement field along the beam element can be approximated by the linear combination of these shape functions with the generalized coordinates: where the interpolating shape functions finally display the following classical cubic expressions: Going back to the equation of motion (Eq. (29)), it is now possible to apply the WRM. First of all, that equation has to be transformed into its weak form, as in Eq. (13). Following the Galerkin method, the weight functions to be employed are given by Eq. (36). Then, the scalar product between the test function vector e (x) and Eq. (29) has to be computed on each single element. Additionally, the whole expression has to be integrated over domain Ω e . Using subscripts x and t for the spatial and temporal derivatives, respectively, and remembering that Equation (29) becomes Analyzing each term in Eq. (38), the equation of motion can be obtained in the following way. The first terms, integrated twice by parts, become where e is the contribution given by the external forces or by the connection of the element with other parts of the structure. If a single beam element (e) is considered, domain So, the stiffness matrix is given by From the second term, the mass matrix of the element is found: The third term in Eq. (38) introduces the damping of the system and its contribution will be included into global damping matrix . For each element: From the fourth term, the foundation stiffness matrix associated to the beam element (e) is found: in which The last term deals with the nodal forces generated by a moving concentrated force placed at x c = vt . Remembering the translation property of the Dirac delta function (sometimes referred to as either the shifting or the sampling property), the contribution of the moving force on a finite beam can be evaluated as: The analysis is complete and now the equations of motion can be displayed. Remembering the previous steps, and expressing for convenience structural damping in classical Rayleigh form, instead of in the general form given by Eq. (43), by using a linear combination of the mass and stiffness matrices (e) where a 0 and a 1 are constants, the next equation is valid for each element: Then, by assembling the contributions from all the finite elements by means of the direct stiffness method and imposing the boundary conditions at the two extreme nodes of each finite element, the global equations of motion are obtained in the following form: where , and are, respectively, the global mass, damping and stiffness matrices of the structure; tt , t and are the global unknown vectors of generalized accelerations, velocities and displacements; is the vector defined as where ê is the element containing the ML at instant t.

Beams Lying on a Nonlinear Elastic Foundation
Up to now, a linear foundation has been considered, where the elastic restoring force and the displacement are related by a direct proportionality law, as shown in Fig. 7a.
To account for a nonlinear elastic foundation, a cubic term may be added. The law of the foundation describing reaction f nl per unit length (see Fig. 7b), may be taken as in Castro Jorge et al. [19]: Recalling the WRM, the nonlinear reaction force exerted by the foundation can be obtained from The numerical integration of Eq. (54) adopted in the present analysis requires the definition of a tangent stiffness matrix at each time instant, which will be shown in the next section.

Numerical Integration
The analytical procedures for solving systems of Ordinary Differential Equations (ODEs), such as Eqs. (49) and (54), are very demanding for transient problems. It is seldomly possible to analytically solve a nonlinear differential equation. That is why, in a FEM implementation, a direct integration approach may be adopted to resolve a problem in the time domain. It consists of finding a sequence of solutions at a discrete number of time steps (time discretization); so, the equations of motion are not exactly satisfied at each time instant, but the time evolution of the field quantities is traced, in an approximate form. Several direct integration methods have been developed. Here, the HHT-method (Hilber-Hughes-Taylor method) is described, since it is going to be employed in the subsequent ML analysis.

Linear Systems
Consider the equations of motion in Eq. (49). Mathematically, they represent a system of linear second-order differential equations with constant coefficients. An initialvalue problem has to be solved, and its solution is (t) , where t ∈ [0, ] , > 0 . The initial conditions are given as Hilber et al. [80] and Hughes [83] provided the following equation to solve the dynamical problem in which the various symbols stand for n number of the time step Δt size of the time step, which is uniformly taken as ∕N t n , t n+ t n = nΔt and constants of the HTT-method Coefficients , , are numerical parameters that govern the stability, accuracy and numerical damping of the algorithm. In order to achieve an unconditionally stable and second-order accurate intregration scheme, these parameters have to be chosen through the following rules: Decreasing increases the amount of numerical dissipation. For = 0 the method reduces to the classical Newmark method, with = 1∕2 , stating for no numerical dissipation.
Thus, when numerical dissipation is required, the HTTmethod provides a numerical damping, without degrading the accuracy order. Assume that in Eqs. (56)-(58) vectors marked with subscript n are known from the previous step or given by the initial conditions (Eq. (55)). Otherwise, n+1 , n+1 and n+1 are the unknowns that have to be determined at the nth step. An implicit formulation of the HHT-method is reported in the following. Rearranging Newmark Eqs. (57), (58) for n+1 and n+1 gives: This can be rewritten in a more compact form as: where ̃ is the effective stiffness matrix and ̃ is the effective load vector within the time integration method. Algorithm 1 below shows how the HTT-direct time integration method may be implemented within a FEM code.

Nonlinear Systems
The HTT-method was adapted by Geradin [66], in order to study the transient response of the solution for a nonlinear system. It is possible to rewrite Eq. (54) in a more general form: Looking at Eq. (65), at each time step, either n or n+1 has to be known. A possible technique is to employ the Newton-Raphson method that, thanks to an iterative process, allows to find n+1 , despite for the presence of unknown nonlinear term nl ( n+1 ).
This iterative approach is based on the definition of a residual vector, ( n+1 ) , that is: The aim of the iterations is to find out the value of n+1 that makes vanishing the residual in Eq. (66). Now, replacing Eqs. (60), (61) into Eq. (66), one obtains that In order to identify the number of the present iteration, apex (i) is employed. So, at each time step, (i) n+1 should be known, possibly using a minimum number of iterations.
Before starting to iterate, the Jacobian matrix has to be defined as If Eq. (67)  Finally, an appropriate rule has to be chosen in order to stop the iterative process. For instance, the maximum number of iterations or an appropriate norm of the residual vector could be established. Moreover, an appropriate tolerance ( ) could be set as in the following relation The solution will converge to the exact one with a quadratic rate, as stated in Bathe [12]. Chang [21] proves that the solver is unconditionally stable if , and are chosen as suggested in Eq. (59).
At the end of the section, Algorithm 2 for nonlinear HTTtime integration, implemented inside the developed FEM code, is listed.
if then initialize q(0) and q ,t (0) initialize d 0 and v 0 a 0 using d 0 and v 0 : The first approach likely adopted on that by the computer industry was to develop faster and faster electronic components and devices, to attempt delivering the demanded processing speed. Nowadays, it may be getting harder and harder to achieve further performance improvements, because of present physical and technological limits. In any case, at one present stage of development of the hardware components, the computational time may significantly be reduced by introducing different implementation strategies such as the one put forward by parallelism. Parallelism consists of simultaneously performing several operations, instead of only sequentially processing the data. Thus, parallelism may be similar to a vision of an organized society in which several people of comparable skills are cooperating, in completing a complex job, in a fraction of the total time that would be taken by just a single individual. The concurrency of the tasks is not meant to speed up the execution of each individual task, but to increase the global byproduct of the whole system.
FEM applications frequently require a fine, often multidimensional, discretization, resulting into several dofs and associated unknowns. The efficient solution of such largescale FEM problems may be accomplished on parallel machines (see e.g. Yagawa et al. [160], Dupros et al. [41], Ullah et al. [154]).
This section aims at framing the exploitation of the parallelism concept into FEM calculations, within the present Moving Load context. Section 4.1 presents three different approaches to achieve efficient parallel FEM implementations. Different models of computation are explained in Sect. 4.2, as a guideline to understand which kind of parallel algorithm may result to be more suitable for an available computing architecture. Finally, Sect. 4.3 provides an algorithm developed within the present attempt, for the analysis of ML structural dynamics problems, but it could also be employed in a generic FEM analysis carried out on any parallel distributed computing system.

Different Strategies for FEM Implementation
From a mere computational viewpoint, a general FEM analysis shall be composed of the following main phases: 1. Input phase: problem characteristics, element and nodal data, boundary conditions and geometry information are provided to set up an appropriate model description of a physical problem; 2. Evaluation phase: element characteristics are evaluated from the information collected in phase 1; 3. Assembly phase: element contributions are assembled, to create the global matrices and vectors that constitute the solving discretized system of governing equations; 4. Solving phase: nodal unknowns of the problem at hand are obtained by solving the system built in phase 3; 5. Post-processing phase: calculated data are post-processed, in order to display the various outcomes sought for the analysis, toward interpretation and critical evaluation, and possible reiteration, by refinement or adaptation of the model (model updating) and identification of its underlying parameters. It could be seen that the input phase can be parallelized, as well as the evaluation of the element characteristics (e.g. stiffness matrix and load vector), since each element brings in its own information, independently from the other parts of the discretization. Moreover, the most computationally expensive phases are constituted by the assembly and solution of the resulting system of discretized equations; so, parallel solvers may drastically reduce the time required to evaluate the unknowns of the physical problem.
As described by Noor [119], a number of strategies, based on a "divide and conquer" approach, could be used to increase the degree of parallelism within a given FEM implementation. They consist of splitting a large and complex problem into a number of smaller and simpler subproblems, which may independently be solved. Towards that, three main parallelization strategies are presented in the following:

Models of Computation
Before implementing a parallel code, it is worthwhile to design algorithms toward achieving a good understanding of performance and efficiency. Such design phase should be focused on the available resources and facilities, which could be described through different models of computation, as suggested by Selim [140]. Any computer, either sequential or parallel, works by executing instructions on data. An algorithm, thanks to a stream of instructions, tells the machine what to do at each step. The instructions dictate to the Central Processing Unit (CPU) how to process the input of the algorithm, which could be seen as a stream of data. Depending on the number and arrangement of these streams, computers may be distinguished among four main classes: 1. Single Instruction stream, Single Data stream (SISD): a SISD computer consists of a single CPU receiving a single stream of instructions that operates on a single stream of data, as shown in Fig. 8. The control unit sends an instruction to the CPU, which executes it on a data obtained from the memory. An algorithm designed for a SISD computer is defined as sequential or serial.

Multiple Instruction stream, Single Data stream
(MISD): a MISD computer consists of a certain num-ber of CPUs, each with its own control unit. A common shared memory provides a unique datum on which the CPUs have to simultaneously work, each performing different operations, as displayed in Fig. 9.

Single Instruction stream, Multiple Data stream
(SIMD): in this class of calculators, as depicted in Fig. 10, a parallel computer consists of N identical CPUs, all controlled by a central control unit. At each step of the algorithm, all CPUs synchronously execute the same instruction on different data. The memory providing data to the processors could be organized in two different ways. The first option is that the CPUs shall share a common memory, which they could simultaneously access. Multiple-read access to the same memory address should in principle endow no problem, because each processor makes a local copy of the content. Instead, multiple writing requires a deterministic rule to achieve memory access, in order to avoid conflicts. The second solution is to divide the memory into blocks in which only a processor could be accessed. A data could be employed by an external process only using an explicit request of communication.

Multiple Instruction stream, Multiple Data stream
(MIMD): this class of computers turns out to be the most general and powerful, among the parallel architectures. As sketched in Fig. 11, this parallel system is made up of N CPUs ruled by N streams of instructions, sent by their dedicated control units, in order to elaborate N streams of data. As for SIMD, the communication between processors is performed through a shared memory or an interconnecting network. In the first case, processors are linked to a globally available memory, while the second alternative requires that each processor displays its own individual memory location. It is possible to refer to the latter as multicomputers or distributed systems because of their physical distance separating the processors. The main difference between SIMD and MIMD computers is the presence of N control units, instead of a shared one. This allows for MIMD computers to asynchronously execute different programs on different data. The MIMD model is employed to build clusters. Clusters are sets of machines (also called nodes) working together as a unified resource and connected over a high-bandwidth network. These computational architectures are based on the absolute and incremental scalability of the system. This means that clusters could be made up with a very large number of nodes and they could be expanded, if necessary. Furthermore, the ratio between price and performance becomes favorable, because great results may be achieved even if cheap commodity nodes (i.e. workstations) would be employed. Moreover, if each node were equipped with a multiprocessor, a great amount of computational power could be made available.
It is crucial to distinguish between the notion of processor and that of process, to clarify terms that will be used in the following. A process is a computational task and a collection of them builds up an algorithm. Many processes may simultaneously be executed on a number of available processors. A parallel algorithm starts its execution on an arbitrarily chosen processor, which is responsible to create other computational processes and to distribute them between processors. A processor is said to be free when it is waiting for a process. So, if all processors are busy, a process is queued and waits for a free processor.
In order to exploit the performance of a cluster, an efficient mean of communication between nodes has to be developed. Processes owned by different nodes could exchange data and perform synchronizing actions using messages; hence, the denomination message passing. In the most general form, message-passing paradigms are able to coordinate the execution of several programs distributed between nodes. Different Application Programming Interfaces (APIs) (e.g. Message Passing Interface (MPI), Parallel Virtual Machine (PVM)) support four basic operations. The interactions between nodes are accomplished by sending and receiving messages calling send and receive functions. Additionally, the send and receive operations must specify a target address, so, a unique identification has to be assigned to every process, through a function such as whoami. Typically, the number of processes participating to the ensemble is provided by numprocs, in order to carry out collective computations, which are operations involving all the available processors.
Among the protocols allowing for communication between nodes, MPI has become widely used for its: • Universality: the message-passing model fits well on separate processors connected by a fast or slow communication network. Thus, it matches the hardware of today parallel supercomputers, as well as the workstation networks and the dedicated PC clusters that are beginning to compete with them. Furthermore, MPI runs either on shared or distributed memory architectures; • Portability: no need to modify the source code, when an application has to be ported to a different platform, and a variety of implementations may be available (i.e. Open MPI [115], MPICH [70]); • Functionality: many built-in routines are available and collective calls are implemented, yet to simplify communication and synchronization between machines; • Scalability: the protocol allows to obtain a speed up, by simply increasing the number of processes that execute a program.
As stated by Grama et al. [69], the MPI library contains over 125 routines, but it is possible to write fully functional message-passing programs by using only six routines, as gathered in Table 1. These routines are able to initialize and terminate the use of a MPI library, to get information about the parallel computing environment and to send and receive messages. So, using a few routines, it is possible to achieve a great performance boost. The interested reader may deepen some other aspects on MPI architecture, for instance, in Snir and Gropp [141] and Gropp et al. [71].

A Parallel FEM Algorithm for Distributed Systems
In this section, a general parallel algorithm employed to implement the FEM, for the targeted ML dynamical problem context described in Sect. 3.3.2, by adopting an operator splitting approach on distributed systems (MIMD), is presented. The general rule that should be followed to build a wellperforming parallel code is to split a job into smaller tasks, and to distribute them between the available processes. This does not mean that a process shall run only a little part of the code, but that simultaneous processes concurrently work together to achieve the same aim.
The most important task of a FEM code is to efficiently assemble the global matrices and vectors, and then to solve the obtained system of equations using optimized parallel solvers. It is for this reason that dedicated libraries, such as, for instance, PaStiX [78] and PETSc [10], have been built.
During the assembly phase, some local matrices have to be computed from each finite element and their components have to be inserted at specific positions of the global matrices, according to the numbering of the degrees of freedom. By using a sequential code, element matrices are evaluated and collocated, one by one.
On the contrary, a parallel program should be able to split the mesh information between the various processes; so, the information related to the geometry of the problem could simultaneously be read and processed. For instance, if a problem is discretized by ten finite elements and one second would be spent to read each of them, the elapsed time would be ten seconds, for a serial code, while two seconds should instead be needed by a parallel program executed through five concurrent processes.
An essential preprocessing operation is to consider a FEM mesh and to create a partition of it, with the purpose of teaching the processes which elements they have to take care of. Since the finite elements are connected to each other at the nodes of the mesh, it results unavoidable to assign adjacent finite elements to different processes. Consequently, for handling such "process interfaces" during a parallel FEM solution procedure, communications between the processes are needed.
The problem of partitioning elements into roughly equal sets, under the condition of minimizing the communication, may be identified as the well-known graph-partitioning problem. For instance, libraries such as METIS [92] are able to elaborate a mesh file and decide how to split it, based on graph theory.
In the following, some basic concepts of graph theory are presented, in order to obtain a partitioned mesh.

Creation of Partition
As explained by Koijo [93], in mathematics a graph shall formally be defined as a pair of sets (V, E), where V is the set of vertices that represent the extremities of the edges, which are the elements belonging to E. An edge could also be seen as the line linking two vertices. In order to briefly illustrate what a graph is, Fig. 12 may be considered.
In such drawn graphs, V = {v 1 , v 2 , v 3 , v 4 , v 5 } and E = {e 1 , e 2 , e 3 , e 4 , e 5 , e 6 } are, respectively, the sets of vertices and edges. An edge can also be written as e 1 = (v 1 , v 3 ) . In general, two types of graph exist. The first one is the category of directed graphs. Their edges display an orientation so that (v 1 , v 3 ) is different from (v 3 , v 1 ) . In this case, edges display an arrow specifying their direction, as show in Fig. 12a. On the other hand, as depicted in Fig. 12b, the second class is represented by an undirected graph, for which the order of the vertex of the pair is not influent.
Weights could be associated to both kinds of graph. Weights are numerical values associated to each edge or node, with a different meaning depending on the studied case. For instance, if vertices represent cities, edges may stand for the existence of connecting roads. So, by using weights on edges, it is possible to specify the length of each road.
An undirected graph of a FEM mesh may be built, starting from its connectivity information and may be used for the creation of mesh partitions. This means that it has to be known which nodes belong to each element. For instance, two node numbers have to be provided to define a beam element, three for a triangular element and so forth. The graph may be represented as a cell structure in which the index of each cell states for the element number and each cell is filled with the numbers of the nodes owned by the element. An example may be visualized in Fig. 13. The next step is to set weights, in order to ensure that the computed graph partition satisfies some specified constraints. A proportional weight could be assigned, depending on the local communication volume and the amount of computations performed by each task. In fact, if two edges straddle partitions, data exchanges are required. As a matter Fig. 13 The resulting graph from a mesh of fact, physical quantities associated to nodes and degrees of freedom should be shared between partitions, when elements belonging to different partitions share, at least, a node. Thus, requiring the same data in more than one process, some kind of communication is demanded between different nodes, in order to guarantee that information is locally stored. Therefore, communication is a time-consuming part of the program, due to the low performance of the means that allow communication between processes. Nowadays, it is recommended to use less messages as possible, to achieve the desired speedup. This is the reason bringing the need of attaching importance to partitioning.
As explained by Karypis and Kumar [92], the graph partitioning problem, known also as k-way, could be formulated as follows:

k-way graph partitioning problem
Given a graph G = (V, E) with |V | = n the number of vertices, the kway problem consists of partitioning V into k subsets, V 1 , V 2 , . . . , V k such that V i ∩ V j = ∅, ∀i = j, |V i |= n/k and i V i = V , and the number of edges of E (minimization of the edge-cut).

Fig. 16
Example of a partitioned mesh: four different colors are employed to distinguish the partitions. The triangular mesh has been obtained through GMSH and the partitions have been generated by employing METIS For those graphs that display weights associated to the vertices and edges, the partition goal is to find k disjoint subsets such that the sum of the vertex-weights in each subset is the same, and the sum of the edge-weights whose incident vertices belong to different subsets is minimized. The final result of a k-way partition of V is commonly depicted as a vector P of length n, such that for every vertex v ∈ V, P[v] is an integer between 1 and k, indicating the partition at which vertex v belongs.
The k-way graph partitioning problem is classified as NP-complete, where NP stands for the Nondeterministic Polynomial time required to verify the correctness of a given solution. This means that no efficient algorithm has been found yet for this class of problems, but nobody has proven that NP-complete problems are unsolvable (see Garey and Johnson [63], to deepen the argument). However, many algorithms have been developed to find reasonably good graph partitions. Three different kinds of partitioning algorithms could be found in the literature (see Karypis and Kumar [92]): 1. Spectral methods: they rely on the computation of the eigenvalues of the Laplacian matrix related to a graph. Then, the eigenvectors associated to the sought eigenvalues may be used to compute good separators in graphs.
These methods, above all the multilevel spectral bisection methods, provide good partitions for a wide class of problems, but they turn out to be very computationally expensive, requiring a large amount of time. Pothen et al. [127] and Hendrickson and Leland [76] presented two different ways of interpreting the spectral method and they could also be adapted as introductory references on the topic. 2. Geometric methods: these approaches use the geometric information of the graph to find a good partition, and for this reason, they are applicable only if coordinates are available for the vertices of the graph. Due to the randomized nature of this class of algorithms, multiple trials are often required, and the obtained solution may display a lower quality, as compared to that of spectral partitions. One of the most prominent algorithms has been described by Miller et al. [113]. 3. Multilevel methods: the algorithms belonging to this class reduce the size of the graph by collapsing vertices and edges, partition a smaller graph, and then uncoarsen it to construct partitions for the original graph. A multilevel algorithm has been developed by Hendrickson and Leland [77].
In order to figure out the graph partitioning problem, the multilevel graph algorithm by Karypis and Kumar [92] is exposed. The knowledge of this method is deepened because it has been employed during the development of the FEM code presented in the next section. The basic concept of this algorithm is to coarsen a graph G 0 down to a few hundred of vertices, to partition the new smaller graph and then to project it back toward the original finer graph, as displayed in Fig. 14. During the coarsening phase, a sequence of smaller graphs G i is constructed. Each of them displays a decreasing number of vertices, such that A set of adjacent vertices of G i is combined to form a single vertex, called multinode of coarser graph G i+1 . The edge between two vertices is collapsed using the matching procedure. A matching of a graph is said to be a set of edges, no two of which share a vertex. Thus, coarser graph G i+1 is built by finding a matching M of G i and collapsing the vertices belonging to M into multinodes. The unmatched vertices are simply copied over to G i+1 . The size of a coarse graph is minimized when a maximal matching is found. It means that any edge in the graph that is not matching M has at least one of its endpoints matched. Some algorithms to discover maximal matchings are presented by Papadimitriou and Steiglitz [124]. An example of coarsening may be found in Fig. 15.
The second phase of a multilevel algorithm computes a partition P m of coarsest graph G m = (V m , E m ) by splitting V m into k parts, each containing 1/k of vertices belonging to original graph G 0 . This could be achieved by employing 2-way algorithms, such as the spectral bisection, the geometric bisection and combinatorial methods, which are able to split V m into two partitions. Instead, if k partitions are desired, the k-way problem could be solved by recursively applying the 2-way algorithms on each partition already created through the bisection approach.The final result is that the amount of time spent to split coarse graph G m turns out to be much smaller than the time spent for the original graph.
To sum up the partitioning procedure, Algorithm 3 is presented.
When the partitioning phase ends, it is known which elements of the mesh are owned by a process, thanks to vector P. A possible intuitive result can be seen in Fig. 16, in which four different colors discriminate the element ownership. Notice that the four partitions are not divided along the two diagonals of the external square, because that configuration would lead to a larger number of shared nodes among the partitions. Partition P i could be obtained from P i+1 , by assigning vertices v collapsed in a multinode u ∈ G i+1 to partition P i+1 [u] . So, P i [v] = P i+1 [u] for all v in multinode u. A smaller edgecut could be obtained by implementing a heuristic refinement algorithm.

Algorithm 3 Mesh (graph) partitioning
for each element do read its number read its connectivity update graph with the numbers of owned nodes end for coarsen graph split graph using a k-way algorithm uncoarsen graph

Reading the Input Files
Ideally, in a parallel FEM code, it may be supposed to dispose of three main input files: one containing the information about the element partitioning, one gathering the physical parameters of the simulation, and one including the nodal coordinates and the connectivities of the various elements.
First of all, a data structure has to be built, in order to contain information about the type, the mechanical properties and the number of nodes that characterize each element. The coordinates and the connectivity of each node have to be provided, to fulfill the description of each element. This information may be communicated to each process in two different ways. A unique copy of the same input files may be available for each process. Those files contain the information of the whole system; so, a process may spend time to read unnecessary data that have to be rejected. Otherwise, a specific file should be written, storing only the parts of the mesh information that are demanded by the process during the FEM simulation.
Dealing with the unknowns of the problem, it is fundamental to allocate the correct amount of local memory to store the nodal values that, in typical displacement-based formulations in solid mechanics, are nodal displacements (and velocities and accelerations). Additionally, each node has a specified number of degrees of freedom, depending on the element type. Each degree has to be labeled with a unique number (see the procedure described in Sect. 4.3.3), so that the unknowns of the problem could be defined and enumerated.
Finally, the boundary conditions have to be set, to complete this phase. Some degrees of freedom are marked by the boundary conditions to be tagged as constrained (e.g. with −1 , instead as with a positive number). Those degrees of freedom are kept out from the search of the nodal unknowns, because their value is imposed from the user, at the initial definition of the mathematical problem.
Algorithm 4 describes how input files could be read in parallel.

Allocating Memory
The exact amount of memory to store global matrices and vectors should be allocated, in order to obtain a good performance from a FEM code, without wasting computational resources. So, the number of the unknowns of the physical problem needs to be determined.
Seeking the global dimension of the problem, every degree of freedom must be enumerated. Some troubles may arise in their numbering, since some nodes shall be shared among partitions. It has to be remembered that every machine can only read information saved on its local memory; so, no data is shared without communication in distributed systems. Actually, a degree of freedom must display the same number in all the processes, even though it is shared between them. Differently from a serial code in which nodes are numbered one by one, another strategy must be sought, for a parallel algorithm but first the matrix and vector storage format has to be introduced. The matrices and vectors could be stored in parallel, by employing the operator splitting approach, so that each process only owns a part of these structures.
Suppose to have four processes, so global matrices and vectors may be split into four different parts, each of them locally stored by a process, as displayed in Fig. 17. Every row or column index matches with the number of a degree of freedom; so, if the dimension of a matrix were N = 80 , the first process would own the rows and degrees of freedom from 1 to 20, the second from 21 to 40, and so forth.
This means that the ownership of the degrees of freedom is set among processes, also for those at the interface between cells. Conventionally, if a node is shared between partitions, its degrees of freedom may be assigned to the processor with the smaller index; so, communication is required to acquire the global numeration on all the other processes. Algorithm 5 describes a possible implementation based on that suggested by Heister et al. [74].
These steps turn out fundamental, to find out the global dimension of the problem and the nonzero patterns. When the enumeration of the degrees of freedom is terminated, size N of the vector of unknowns represents the characteristic dimension of the global matrices. Then, the nonzero pattern is sought, because, usually, matrices related to a FEM analysis are sparse; so, most of their entries are zeros.
The first approach is to store sparse matrices using triplets, which are sets of three numbers representing the nonzero entries by the row and the column indices and a numerical value. So, if nonzeros are nnz, 3 values have to be stored, for each global matrix.
The Compressed Row Storage (CRS) format could be adopted, decreasing the amount of required memory. The CRS format saves in memory only 2 + N + 1 numbers. This storing format puts the subsequent nonzeros of the matrix rows in contiguous memory locations and save their indices using two vectors, as below explained and depicted in Fig. 18  dynamical problem have to be built, together with the nodal force vector. Some information is necessary to complete this task: the type of the elements (usually stated with a code or an acronym), the physical properties of the element (e.g. Young's modulus, mass density, area moment of inertia, stiffness and damping coefficients), the coordinates of the nodes and the numbers of every degree of freedom that the element owns. Now, it is requested to link every dof, from the local numeration to the global one, as Fig. 19 shows. Each dof owned by an element has to be mapped to those related to the referential element.
This step is carried out to correctly insert the values associated with the element matrices and vectors into their global counterparts. In this way, each row/column index of the element matrix corresponds to a row/column index in the global matrix; similarly, the location of a value in the local vector matches with one in the global vector. So, each value can be stored in a precise cell of memory and, possibly, if the cell is not empty, the quantity will be added up to the previous value.
Remembering how matrices and vectors are distributed among processes (see Fig. 17), the contributions to the dofs are locally computed by the process owing the dofs, but other contributions may also come from other processes by In order to allocate the right amount of memory, for a CRS matrix, it is enough to know how many nonzeros are present in each row. So, it is necessary to correctly resize the values vector, because deleting or adding single entries can take an amount of time proportional to the number of nonzeros. In fact, since no gaps are existing between two nonzero entries, for instance, it is required to shift up by one the position all the stored numbers of values, when a new nonzero is inserted in the first cell of the vector.
Finally, the pattern may be expressed by employing the indices of the nonzero entries. These indices could be found by performing a fake assembly of the matrices, without actually inserting values in them. When it is not possible to discover the exact number of the nonzeros, remember that extra space is cheaper than an inadequate amount of memory. So, it is better to allocate extra resources, which should be able to avoid the above-mentioned unnecessary copy and paste operations.

Assembling Global Matrices and Vectors
Once the necessary amount of memory is allocated, the global matrices and vectors could be initialized. For each element, the stiffness, damping and mass matrices of a 1 3 communications. This is the case when the dofs are referred to nodes shared by elements ascribed to different processes. An advice to optimize the implementation of the code may be to generate most of the entries on the right process, in order to avoid communication. Note that, however, it is fine to produce some entries on another "wrong" process, because this can lead to cleaner, simpler and less buggy codes, thus accepting a certain amount of communication between processes.
Once structural dynamical FEM matrices , and are assembled with the contributions from all the elements of the mesh, load vector has to be completed yet. First of all, external concentrated forces applied to the nodes should be considered, reading the proper input file in which their direction and magnitude should be written. The value of force magnitude could simply be added to the right position of , depending on the degree of freedom on which the force is imposed. Finally, is completed when the contributes of the distributed forces on corresponding element sets are taken into account.
The assembly procedure is summarized in Algorithm 6.

Time-Marching Solution
A dynamic solution should be obtained during a simulation, in order to describe the system evolution in time. It is the aim of a FEM code to solve the governing system of equations, as previously presented in Eq. (49): The response history of the physical system may be obtained by performing a direct time integration. The solution may be obtained using a step-by-step time integration, without changing the form of Eq. (75).
Above all, an appropriate time step has to be set, depending on the nature of the numerical simulation. Then, the temporal duration or the total number of the steps of the simulation has to be set. Suddenly, the implemented parallel code should display an overall view of the shared global matrices and vectors. The role of processes is to efficiently and concurrently solve a sequence of linear or nonlinear systems.
The interested reader may consider Davis [30], as an introduction to the direct methods involved in the solution of sparse linear system. The typical steps to solve a generic sparse linear system = may be resumed as follows: 1. Find a permutation in order to reduce the fill-in associated with factorization; 2. Factorize permuted matrix using Cholesky, LU, or QR algorithms; 3. Permute right-hand side ; 4. Solve for using forward/backsolve; 5. Permute solution .
Several packages perform a parallel factorization: some examples of Cholesky and LU algorithms for distributed systems are provided by Amestoy et al. [5] and Hénon et al. [79]. However, in case of system nonlinearities, an iterative procedure shall be employed to recover the approximate solution at every time step. The SNES library of PETSc provides powerful numerical routines to solve nonlinear problems, by using methods such as the nonlinear Richardson, GMRES, Gauss-Seidel and Krylov algorithms.
The solution of the problem is evaluated at instants separated by time increments Δt ; so, nodal values are computed at times Δt , 2Δt , ..., nΔt . At the nth time step Eq. (75) becomes: Discretization in time is accomplished by using Finite Differences approximations of time derivatives. The direct integration computes the unknowns at time step n + 1 from the equations of motion and the known conditions at one or more of the preceding time steps. Algorithms can be classified as being explicit or implicit.
An explicit algorithm uses expressions of the general form so, the solution is generated by the combination of Eq. (77) with Eq. (76).
Instead, an implicit procedure adopts expressions of the general form (76) n,tt + n,t + n = n (77) n+1 = f n , n,t , n,tt , n−1 , ...   In practical applications, the choice between explicit or implicit solver is related to the numerical stability and economy of the method.
Dealing with explicit methods, attention should be paid to the selection of the time step. Usually, these approaches reveal to be only conditionally stable; so, a stability analysis should be carried out before using them. On the other hand, implicit methods are often unconditionally stable, which means that calculations remain stable, regardless of the size of time step Δt.
However, the amount of calculations of each method should be challenged. Simple algebraic operations are necessary for an explicit method to pass from step n to n + 1 , because all data are known at the nth step but the number of time steps increases, since a small Δt may be employed to guarantee stability.
On the contrary, for an implicit method, the number of the time steps usually decreases, thanks to the possibility to employ larger time increments but, at each time step, linear or nonlinear systems have to be solved. So, the computational duty proves to be harder and harder, especially for those problems with a large number of degrees of freedom.

Printing the Solution
When the evolution in time of the solution has to be displayed, it is a good practice to write a proper file during the simulation and then visualize it through a useful external viewer. The solution file should contain the solution values, e.g. of displacements (velocities and accelerations) for each node of the mesh and at each time instant.
As described in Gropp et al. [72] and displayed in Fig. 20, different ways appear to write down the solution for distributed systems: 1. Merging method: it consists of creating a file for each process, containing only a part of the solution. Then, the designated process takes over all stored files and rejoins them in the right order in a single file; 2. Collecting method: before creating any file, one among the processes collects information from all the members of the distributed system and then writes the solution file; 3. Concurrent method: it is based on the creation of a specific format of the file shared between processes, which have the ability to simultaneously write, by specifying the precise location where data have to be stored.
Obviously, the third method turns out to be the most efficient one (but also the most difficult) to be implemented, because no sequential operations are involved and it minimizes the time for data transfer.
Algorithm 7 presents a possible solution to the printing problem using the concurrent method.

Computational Features of the Implemented FEM Code
Throughout the previous sections, the ML problem concerning an elastic Euler-Bernoulli beam on a (visco)elastic foundation, the theoretical bases of its FEM modelization and a strategy to implement it within a parallel computing environment have been presented. In this section, an objectoriented C++ parallel FEM code developed in that framework is assembled and its computational advantages and performances are illustrated. In Sect. 5.1, the fundamental features of the implemented FEM code are briefly sketched. Section 5.2 pertains to the comparison of performance between an earlier sequential MatLab code, implemented by Froio et al. [51], Froio [50] and the current object-oriented C++ parallel version of the algorithm employed in the present ML analyses. In Sect. 5.3, the numerical transient response of a simply supported beam posed on a (visco)elastic foundation and subjected to a ML, with either a constant or a harmonic-varying amplitude is considered. In order to validate the C++ code, the obtained numerical outcomes in terms of critical velocities are successfully compared with corresponding results from the literature [19,51,54,56]. Finally, some benchmark performance results are reported in Sect. 5.4, in order to illustrate the achieved speedup, efficiency and scalability of the proposed C++ parallel code implementation, in the considered ML context.

Brief Description of the Implemented Code
The implemented code is made up of a set of functions and data structures that allow the user to set a general FEM problem on both a serial and a distributed system, by using some specific routines. The parallel FEM code is based on the following computational tools: • C++ [86]: although the conceived algorithm could certainly be implemented using other coding languages, e.g. FORTRAN or Julia [14], C++ offers an object-oriented structure that accommodates the problem into a collection of data structures and operations involving such structures. C++ consists of a core language specification containing basic data types and constructs, and a collection of built-in functions gathered in libraries. In this context, the concept of functions has been introduced, to ensure the modularity of such a programming language, and the concept of class has been created, to encapsulate data and operations on them. • GMSH [67]: it is a free 3D finite element mesh generator with a built-in CAD engine and a post-processor. Its design goal is to provide a fast, light and user-friendly meshing tool, with parametric input and advanced visualization capabilities. Message Passing Interface (MPI) implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI offers advantages for system and software vendors, application developers and computer science researchers and is installed on many of the TOP500 [150] supercomputers. • PETSc [10]: it is a Portable, Extensible Toolkit for Scientific computation developed at the Argonne National Laboratory. PETSc provides data structures and routines for the scalable parallel solution of scientific applications modeled by PDEs. PETSc employs the MPI standard for all the message-passing communication and is intended for the use within large-scale application projects. Fur-  thermore, this library provides many of the mechanisms needed within a parallel application code, such as distributed matrix and vector assembly routines, which allow to overlap communication and computation.
The implemented code requires data from three entering information files, to arrive at the solution of a FEM problem. The first is a mesh file generated using GMSH and includes the nodal coordinates and the finite element connectivities. Furthermore, the description of the elements is fulfilled, with additional information about the acronym (i.e a code or a string), the type and the physical sets created to impose the material properties. The second file is an input file, which contains general data about the simulation, such as the number of degrees of freedom per each node, the properties of the materials and several parameters used to control the numerical routines employed to solve the ensuing FEM system. Furthermore, different kinds of boundary conditions are defined, like those on nodes or physical sets of elements. The third file is a partition file created by METIS, which reports the information to redirect each element to the correct process, thus performing calculations on a restricted amount of data. These three initialization files allow to initialize, on each process, two main data structures, built using a C++ native data type, together with those provided by Eigen. Similarly to the approach proposed by Bonnet et al. [17], a data structure is built up, to describe the elements (see Fig. 21) and the other one gathers the information about the nodes (see Fig. 22).   The data structure called element is filled with the details on the type and the material of which an element is made up. Furthermore, it contains the connectivity, which is employed to store the nodes number owned by the element.
Structured data node is devised to collect the node number, the spatial coordinates deduced by the mesh file, the global numeration of the degrees of freedom and, above all, the unknown nodal values (in the present case, displacements U, and velocities Ud and accelerations Udd).
Additionally, an integer (isShared) is declared to mark if a node is shared between two or more partitions.
When all input files have been read, each process initializes the above-mentioned data structures, per each owned element and owned node, as described by the partitioning file. Then, the simulation goes ahead, following the steps described in Sect. 4, by executing implemented Algorithms 3-7. The entire procedure is summarized in Algorithm 8.

Comparison Between Sequential MatLab and Parallel C++ codes
The present parallel C++ version of the FEM code for ML problems has been based on a previous MatLab [146] implementation [51]. The necessity of porting that project from MatLab to C++ has been derived from the quest of achieving a higher performance, in the demanding solution of the ML dynamical problem, and specifically in order to allow for the development of complete parametric analyses, involving really time-consuming ML simulations. The choice of the programming language has fallen on C++ due to various computational advantages, mainly: execution speed, efficiency, availability of open source implementations (i.e. GCC G++ [64], Microsoft Visual C++ [112]) and widespread exploitation on parallel systems (e.g. workstation clusters, High Performance Computing (HPC) consortia, etc.).
As explained in Fangohr [42], the main difference between C++ and MatLab is that the first one is a low-level compiled programming language, while the second one is a high-level interpreted scripting conceived to develop and to execute programs. This means that C++ needs to be compiled in order to generate an executable program, whereas MatLab reads the source code line by line and translates it on the fly into machine instructions. Interpreted languages usually present degraded execution speeds, due to the dynamic typing of variables, the possibility to have interactive sessions (i.e. the debug mode) and the lack of memory management. Some results that compare the efficiency of interpreted and compiled languages have been discussed, e.g., by Prechelt [128], showing the runtime execution and memory consumption to decrease, by employing C++. On the other hand, the total working time for assembling a C++ program is dilated by the quest of achieving a clean coding. Thus, as exposed by Ousterhout [122], different programming languages may be conceived, as different tools for different tasks. MatLab may be used to rapidly prototype the algorithm, thanks to its built-in toolboxes, routines and facilities for data visualization. However, the need of a High Performance Computing capability may be complied with C++, even though the programmer/user may have to carry out several skillful tasks: choose the correct compiler, include external libraries, allocate and deallocate memory, choose the correct data type and debug the code in the absence of a Graphical User Interface (GUI), such as in many clusters devoted to HPC.
Some benchmark results showing how interpreted languages such as MatLab and Python may turn out to display worse performances than compiled languages like C/C++ and FORTRAN for some standard algorithms are displayed in Table 2.
Similar results have been obtained in the present work, from the comparison of the MatLab and C++ implementations, used to solve linear and nonlinear ML problems. As it could be seen from Table 3, a serial C++ task generates a program that is much faster than a MatLab code.
Tests have been performed on a single core, in order to compare the efficiency of the serial algorithm by only changing the programming environment. The C++ implementation demonstrates to save a considerable amount of time, which is crucial during the numerical tests. Parametric analyses, such as those presented in next Sect. 5.3, consist of a sequence of ML simulations, for which some parameters, like velocity and magnitude frequency of the ML are varied. Thus, saving time on each single simulation allows to complete the entire work in a few hours, instead as of several days.

Validation of the Implemented FEM Code
The present general FEM implementation has been developed and employed to solve the dynamical problem of a simply supported beam lying on a (visco)elastic foundation subjected to a ML, as previously described in Sect. 3.3.
Two types of foundation constitutive law have been herein considered: the classical linear law reported in Eq. (1) and the nonlinear cubic law stated in Eq. (4). The validation of the code has been carried out by comparing the maximum and minimum beam displacements and the critical velocities obtained by Castro Jorge et al. [19] and Froio et al. [51,54,56], by varying either only the ML velocity or both the velocity and the magnitude frequency of the ML. The critical velocities are figured out by seeking those values of ML velocity corresponding to the maximum attained transverse displacement (see e.g. Dimitrovová and Varandas [39]).
The considered type of beam is a UIC60 Vignole rail, one of the most diffused steel profiles adopted in railway tracks, in order to allow for a comparison with the results from the above-mentioned literature. The rail profile is shown in Fig. 23a, while its mechanical properties are reported in the array in Fig. 23b.
A finite rail length (L) of 200 m has been selected, in order to reasonably approach the limit case of a beam with an infinite length, at least in the subcritical velocity regime. For the analyses, the finite beam has been spatially discretized using 200 finite elements, each 1 m long. The selfweight of the beam has not been taken into account, since it seems not to effect much the dynamic response of the mechanical system and the resulting critical velocities.
The assumed load magnitude is F = 83.4 kN, the load acting downward. That value corresponds to a locomotive of the Thalys high-speed train (EU), which displays a total axle mass of about 17,000 kg.
Regarding the aspects of numerical integration, the HHTmethod has been implemented, either for linear or nonlinear systems (see Sect. 3.4). Here, the parameter expressing the numerical dissipation rate is chosen equal to −0.1 , according to Eq. (59).

Constant-Magnitude Moving Load
The first numerical analysis is concerned with the simply supported beam response under a constant-magnitude ML. The beam lies on a (visco)elastic foundation characterized by two different values of Winkler elastic coefficient k l : 250 kN∕m 2 and 500 kN∕m 2 . Both undamped and damped behaviors are taken into account. Damping factor has been introduced to define constant a 0 in Rayleigh-damping Eq. (47) as Damping matrix in Eq. (47) was evaluated by assuming either = 0% or = 0.02% and a 1 = 0.
Computations have been performed for velocities of the ML varying from 50 m/s up to 300 m/s, with a step variation of 1 m/s. Regarding aspects of numerical integration, the time span taken throughout the process corresponds to the amount of time along which the ML is really acting along the supported beam, by moving along it. The adopted time step corresponds to the time taken by the load to travel a distance of 0.2 m, namely a fifth of a finite element length.
Then, maximum upward w + max and downward w − max displacements of the beam have been recorded, for each simulation performed at a certain value of ML velocity. These values are subsequently plotted as a function of ML velocity.
The results dealing with the linear foundation are displayed in Fig. 24 and the consistent comparisons between present and previous [19] outcomes are gathered in Tables 4  and 5.
Within these tables, the percentage relative errors upon the values of critical velocity and minimum/maximum displacements are reported. A very good agreement may be noticed, for both the damped and undamped case. In particular, the retrieved small discrepancies are solely due to round-off differences. The results obtained with the present parallel C++ code turn out to be practically the same as those earlier generated within MatLab. These results are 2 k l expected, since the present code is based on such a previous MatLab implementation, though reformulated within another programming environment and with the insertion of a parallel paradigm, and they demonstrate the reliability of the implementation. By inspecting Fig. 24, increasing the stiffness of the foundation, it appears clear that a shift in the position of the critical velocities toward higher values and a decrease of the deflection amplitude occur. Moreover, it could be seen that damping does not sensibly affect the values of the critical velocities (peak position along horizontal axis) but it only decreases the amplitude of the beam response (peak elevation along vertical axis) .
Concerning the nonlinear cubic foundation, Fig. 25 presents the response of the beam and depicts the displacement peak corresponding to the critical velocities, whose values are portrayed in Tables 6, 7.
During the numerical tests, the nonlinear foundation has been characterized by two parameters: linear stiffness coefficient k l , assumed to be constant and set at 250 kN∕m 2 ; nonlinear coefficient k nl considered equal to 2500 or 25,000 kN∕m 4 .
Then, it has been demonstrated that increasing the nonlinear coefficient, the critical velocities move to higher values, while the maximum displacements of the beam is reduced. Differently from the linear problem, the amount of damping influences either the values of beam maximum response or the critical velocity. In particular, the critical velocities decrease, introducing the effects of damping by using = 2%.

Harmonic Moving Load
Up to now, the ML acting on the beam-foundation system has been characterized by a constant amplitude. In this section, the concentrated ML travels at a constant velocity along the beam, displaying also a harmonic-varying magnitude in time t, which is described by the following equation: where F 0 is defined as the mean value of the ML, F and Ω the magnitude and the frequency of oscillation, respectively. During the numerical tests, ratio = F 0 ∕F has been introduced, to inspect the response of the system to its variation. Parameter is varied from 0 to 1, with steps of 0.25, in order to create different harmonic ML magnitude trends, as depicted in Fig. 26.
Additionally, the time step used during the numerical integration has been set as Δt = min 10 −3 s, h∕(5v) , where h is the referential size of a beam element in the mesh and v the velocity of the ML. So, adopted Δt ensures a sufficient accuracy of the results, for both the linear and nonlinear (80)  As explained by Chen and Huang [23], the obtained figures display the existence of two resonant peaks, corresponding to two different critical velocities, in case of a harmonic ML. They collapse into a unique peak, as the frequency of the harmonic amplitude of the ML approaches zero, leading to the results obtained by considering a constant-magnitude ML. Furthermore, the two critical velocities tend to separate, as the load frequency increases, giving rise to a third peak. In particular, the central peak corresponding to the second critical velocity remains stationary, even though the frequency of the ML increases. The growth of the Ω frequency induces the lowest and the highest critical

Achieved Computational Performance of the Parallel Implementation
The execution time of a sequential code is usually employed to evaluate its computational performance, changing the size of the input data to be processed. The elapsed time of a parallel algorithm relies not only on the input size of the problem but also on the number of processes involved and on their ability to efficiently communicate between themselves. Hence, the performance analysis of a parallel program has to be related to the parallel architecture (see Sect. 4.2), allowing for the execution of the algorithm, in order to evaluate the hardware platforms and to examine the benefits achieved by the scheduled parallelization. Some fundamental concepts have to be introduced, on the purpose to carry out such a kind of benchmark analysis. They are briefly presented in the following; more detailed information could be found in Hill [81], Kumar and Gupta [101] and Grama et al. [69]. Serial runtime T S of a program is the elapsed time between the beginning and the end of the execution on a sequential computer (SISD). Instead, parallel runtime T P is the amount of time that passes from the moment a parallel computation is launched, to the moment in which the last process finishes the execution of all the instructions given by its control unit.
One of the most intuitive parameters that may be employed to capture the benefit of solving a problem in a parallel form is the so-called speedup. Speedup is the ratio of the serial execution time of the fastest known serial algorithm to the runtime of a parallel program that involves a number p of processes. It is well known that the speedup does not keep on increasing with the number of (81) S = T S T P employed processes but it saturates at increasing p, as schematically shown in Fig. 28. This could be explained by different considerations. If s is the serial fraction, defined as the number of components of an algorithm that cannot be parallelized, the speedup is bounded by 1/s for all the values of p, as stated by Amdahl [4]. Furthermore, a wide variety of overheads associated with parallelization may be detected. During the execution of a parallel program, processes spend time not only at performing essential computations but also while carrying out interprocess communication or idling, due to required synchronization. The total sum of all the overhead is mathematically defined as In practice, each process is not able to devote the whole time to computations. So, efficiency E may constitute a measure of the fraction of time for which a process is usefully employed. It could be expressed as The analyses of the implemented parallel code have been led by using the previous parameter, in order to investigate the scalability achieved by the computational algorithm. The scalability of a parallel program on a parallel architecture may be defined as the ability to efficiently exploit the resources that are made available by the increasing number of processes. Kumar and Gupta [101] suggested that scalability may be used to select the best algorithm-architecture combination for a given problem. Moreover, it may be employed to predict the performance for a given set of processes or to determine the optimal number of them, to be used to achieve a certain value of speedup.
The performance analyses of the parallel algorithm involved in solving ML problems have been conducted by creating a referential challenge. The simply supported beam lying on a (visco)elastic foundation has been discretized by 20,000 finite elements, for a total amount of 59,999 degrees of freedom and the numerical integration has been performed on 1000 time steps.

3
The mesh has been refined, in order to challenge the capabilities of the parallel code. In fact, parallelism may completely be exploited when the physical problem presents a great number of unknowns. On the other hand, if the global dimensions of the matrices and vectors are relatively small, the overheads linked to the communication and the synchronization of the processes may kill the resulting performance, nullifying the benefits coming from the parallelization of the code. Further details could be found in the PETSc documentation or in Knepley [97].
The numerical experiments have been carried out on GALILEO, one of the three main clusters owned by Cineca, which is a non-profit consortium, made up of seventy Italian universities, four national research centers, and the Ministry of Education, University and Research (MIUR). GALILEO is composed of 516 nodes, each made up of two 8-cores Intel Haswell 2.40 GHz and 128 GB of RAM running the 64-bit CentOS 7.
Thus, a series of tests has been carried out, by varying the number of processes. As predicted by Amdahl law, the presented algorithm displays a sublinear speedup, which is depicted in Fig. 29a.
Regarding the achieved efficiency of the parallel algorithm, Fig. 29b reports the typical phenomenon that occurs in all parallel systems. Fixing the problem size, the efficiency of the parallel code goes down, due to the increasing overheads. Regarding the current implementation for solving linear ML problems, an important source of overheads comes from the necessity of communication. In fact, the global matrices and vectors are split among an increasing number of processes; so, during the time integration, a growing amount of messages has to be exchanged, in order to solve the series of linear systems.
For the given size of the problem, from Fig. 29 it may be seen that a number of processes greater than about 60 does not bring any further substantial improvements of performance. Furthermore, the maximum achievable speedup tends to be about 25 and, less than 20 processes should be employed to maintain a high level of efficiency.
In many parallel systems, it is possible to observe that an algorithm maintains efficiency E, fixed at a certain value, by simultaneously increasing the number of processes and the size of the problem. These combinations of algorithm and parallel architectures are defined as scalable parallel systems. The achieved scalability of the implemented code is reported in Table 8.
The size of the problem has been increased using a refined mesh; so, the number of degrees of freedom grows from 29,999 to 89,999 passing from a mesh with 10,000 elements to that with 30,000 elements. It could be seen that, increasing the size of the problem and the number of processes, the parallel algorithm maintains stable the resulting efficiency. It reflects the ability of the parallel algorithm-architecture combination to efficiently exploit increasing processing resources.
The same speedup and efficiency analyses have also been carried out for the parallel algorithm used to solve ML problems with a nonlinear cubic foundation. The computational resources required to obtain the solution of the nonlinear system increase, comparing to those for the linear case. In fact, the nonlinear HHT-method (see Sect. 3.4) demands to find an approximate solution per each time step. This is very computationally expensive, due to the internal running of an iterative Newton-Raphson algorithm. So, numerical tests have been performed on a mesh with 20,000 elements, as in the previous case but the number of time steps has been significantly decreased to 5, in order not to waste the available computational resources, without losing accuracy.
GALILEO ended its production phase starting November 2017; so, the following results have been obtained on MAR-CONI A1, another cluster owned by Cineca. MARCONI A1 is composed of 1512 nodes, each made up of two 18-cores Intel Xeon E5-2697 v4 (Broadwell) 2.30 GHz and 128 GB of RAM running the 64-bit CentOS 7.
The speedup and the efficiency of the nonlinear parallel algorithm are displayed in Fig. 30. It could be seen that the reachable speedup for the nonlinear code is greater than that for the linear one, whereby S tends to about 35. Furthermore, even for this algorithm-architecture combination, the maximum number of processes that reduces the walltime approaches 60, and a good level of efficiency may be obtained up to 50 processes. Finally, Table 9 demonstrates the achieved scalability of the parallel code, presenting the values of efficiency, for different meshes, with an increasing amount of elements, employing a growing number of processes.
In November 1995, the European Rail Research Institute (ERRI) decided to founding a committee of experts (ERRI D-124), in charge of studying problems connected to high-speed trains. Final report drafted by ERRI D-124 [29] explained that the excessive vibration of a bridge deck may cause ballast destabilization, in some bridges along the highspeed railway line connecting Paris to Lyon. That destabilization occurred due to a resonance phenomenon caused by the correspondence between the loading frequency of the train and the natural frequency of the bridge. After further investigations on the bridge response, the Comité Européen de Normalization (CEN) has established the maximum admissible bridge deck acceleration to 3.5 m/s 2 , for a ballasted track (see CEN [20]).
Museros [117] upheld the importance of focusing on some specific aspects during the creation of ML models. He pointed out the significance of studying the distribution of the load through the sleepers and the ballast layer, using a train-bridge interaction model. Furthermore, various issues should be taken into account, such as the real value of structural damping during the passage of a train, the boundary restrictions exerted by the rail in the transition over the abutments, the vibration of the ballast layer and the effects of the excitation induced by track irregularities and wheel flats.
Ramondenc [129] argued about the importance of controlling the dynamic behaviour of railway bridges. In fact, without that, bridges and viaducts may experience rupture phenomena by the crossing trains. Some physical models have been developed, in order to investigate how the characteristics of the bridge shall affect the transient response.
Possibly, the degradation of the mechanical properties of the structure should be considered, in order to schedule proper maintenance programs. The universal train model, also called High Speed Load Model (HSLM), was incorporated into the Eurocodes, with the aim of elaborating new regulations based on the resistance and structural durability and the serviceability of bridges.
Goicolea-Ruigómez and Castillo [68] reviewed and commented on the analysis methods, from a practical point of view. They explained how some parameters could be introduced during the investigation of the bridge response. Impact factor Φ may be exploited, to obtain the dynamic response, by amplifying standard static results. The values of Φ could be found in engineering codes published by CEN, Union International des Chemis de Fer (UIC) and Ferrovie dello Stato [48]. However, the impact factor does not include resonance effects possibly caused by the passing of a train. So, some other models have been developed. The dynamic train signature expresses the response of simply supported bridges on a combination of harmonic series involving the dynamic signature, namely the distribution of the train axles, and the dynamic influence line, which is a parameter depending on the span, the first natural frequency and the damping. Moreover, the consideration of a dynamic vehicle-structure interaction may lead to more realistic predictions of the bridge behavior.
Further insights on the dynamic analyses and practical design of bridges subjected to moving vehicles could be found in the collection of articles edited by Delgado et al. [31]. The interested reader should also be dwelt, e.g., on the works by Bogaert [16], Gabaldón Castillo et al. [61] and Hoorpah [82].
Together with the methods mentioned in this section and in Sect. 2, ML FEM models, even describing the train-bridge interaction and the resonance phenomena, provide a general methodology to obtain the dynamic response of a structure, like that of a bridge. In this section, the FEM algorithm and its implementation developed in Sects. 3 and 5 has further been employed to evaluate the dynamic response of a 1D multi-span model of the upper beam of the Paderno d'Adda Bridge (1889), a historical riveted iron arch bridge built nearby Milano, Italy, for railway and road traffic uses [44-47, 118, 142]. Here, the main scopes are those of providing an estimation of its dynamic response, ideally caused by a ML as traveling along the viaduct even at a theoretically high constant velocity. Section 6.1 briefly introduces the technique exploited by engineer Jules Röthlisberger to design the Paderno d'Adda Bridge at the Società Nazionale delle Officine di Savigliano (SNOS), Cuneo. Furthermore, some technical aspects are summarized, to provide a brief description of the structure of the iconic bridge. In Sect. 6.2, the upper box-beam of the Paderno d'Adda Bridge is then modeled as a 1D multispan continuous beam on nine firm supports. Additionally, the parameters employed during the numerical tests are presented. Finally, Sect. 6.3 displays the effects caused by varying the ML velocity, the bending stiffness and the structural damping of the mechanical system. Moreover, several pictures of the time history of the beam response, once subjected to a concentrated ML are depicted, through a sequence of frames displaying its displacements, velocities and accelerations.

Brief Description of the Paderno d'Adda Bridge (1889)
The Paderno d'Adda Bridge is a railway metallic viaduct that crosses the Adda river between Paderno d'Adda and Calusco d'Adda, to a height of approximately 85 m from water, allowing to connect the two provinces of Lecco and Bergamo, near Milano, in the Lombardia region, northern Italy. The riveted iron bridge was rapidly built between 1887 and 1889, to comply with the needs of the growing industrial activities within the region. The viaduct was designed by Swiss engineer Jules Röthlisberger (1851-1911), head of the Technical Office of the Società Nazionale delle Officine di Savigliano (SNOS), Cuneo, the company that built the bridge. As reported in Ferrari and Rizzi [47], the bridge was designed through a graphical-analytical method of structural analysis, by applying the so-called "Theory of the Ellipse of Elasticity". The latter analyzes the flexural elastic response of a structure, by relying on an intrinsic discretization of a continuous beam into a series of elements, each with a proper "elastic weight", directly proportional to its length and inversely proportional to its bending stiffness. So, the problem of determining the flexural elastic response of a beam is reduced to a problem of pure geometry of masses, of a more convenient solution and direct interpretation in terms of the design of the structure. The bridge, depicted in Fig. 31, displays a 266 m long upper flyover made by a continuous box girder with nine equally distributed supports, at a 33.25 m relative distance. Four of the supports are sustained by a marvellous doubly built-in parabolic arch, a beautiful characteristic feature of the bridge; two of them bear directly on the same arch masonry abutments; a seventh, on the Calusco bank, rests on a smaller masonry foundation placed between the arch shoulder and the higher bridge supports; the last two, in masonry work as well, are the two direct beam bearings at its two ends, on top of the two river banks. The four piers resting on the arch are symmetrically placed, in between  the bridge may be accessed in Ferrari et al. [46], and therein quoted references.

1D FEM Modelization of the Bridge Beam
The FEM implementation presented in the previous sections has been employed to analyze the dynamic response of a simplified 1D model of the upper continuous beam of the Paderno d'Adda Bridge. Furthermore, the ideal critical velocity of the beam of the bridge subjected to a concentrated transverse ML has been investigated. The bridge has been modeled as a unique uniform continuous beam resting on nine equally spaced firm supports, as depicted in Fig. 32. The total length of the beam is 266 m.
Although the real 3D box-form beam's structure is reduced to a 1D model, the equivalent mechanical characteristics of the 1D beam have been defined, in order to provide a realistic approximation of the global beam behavior. Thus, due to the complexity of the upper flyover, global bending stiffness EI of the continuous beam may be derived in different ways, for instance according to the following two different approaches: • FEM approach: it consists in defining equivalent bending stiffness EI and mass per unit length of a 1D beam, by extracting them from a complete 3D FEM model of the upper box-beam of the bridge [46]. A possible strategy may be that of considering a unique simply supported span of length l s . Two moments of a unitary magnitude may be applied at the extremes of the equivalent beam, inducing a relative rotation Δ of the two cross-sections at the ends of the beam. From well-known elastic relation (84) Δ = M l s EI reading the relative cross-section rotation through a 3D FEM simulation, the value of equivalent EI may be extrapolated. • Geometric approach: supposing Young's modulus to be constant and known over the beam, it relies on the definition of geometrical property I of the beam's crosssection, i.e. the area moment of inertia. This is considered, to find out the area moment of inertia of an equivalent beam representing the flyover of the Paderno d'Adda Bridge. Owing to a CAD representation of the flyover cross-section, the value of I may be evaluated. For simplicity, the cross-section of the bridge is concentrated into two main areas, to that purpose, representing the running elements within the upper and lower decks of the bridge, neglecting the contributions of the structural elements connecting the two decks. Thus, the area moment of inertia may be calculated by neglecting the contributions of vertical struts and St. Andrew's crosses, and by considering only the most recurring beam longitudinal elements and relevant cross-sections, involved in the assembly of the upper flyover.
The mechanical properties of the beam corresponding to the bridge cross-section in Fig. 33a are gathered in the table sketched in Fig. 33b. The value of equivalent bending stiffness EI has been compared with that provided by the Final Technical Report edited by the SNOS [142], and a reasonable agreement has been recovered among the two results. At design stage, the SNOS has set the minimum value of the bending stiffness equal to EI SNOS = 0.9846 × 10 10 kg m 2 . Thus, considering a possible effect of a considered appropriate safety factor, the values obtained as from above, from the FEM approach ( EI FEM = 1.617 × 10 10 kg m 2 ) and from the geometric approach ( EI CAD = 1.625 × 10 10 kg m 2 ) may be considered as reasonably correct (being also much reciprocally homologous, at about EI = 1.62 × 10 10 kg m 2 ).

3
The multi-span 1D FEM model has been employed for the examination of the dynamic response of the structure subjected to a ML. A locomotive of a weight of 83 t, traveling at a constant velocity is considered for the numerical tests. It corresponds to a transverse (vertical) ML of a magnitude value F = 8.14 × 10 5 N , acting downward. That value corresponds to one of the six special locomotives adopted to test the bridge in the first original try-outs in 1889 (see SNOS [142] and Nascè et al. [118]).
Similarly to the numerical analyses presented in the previous section, the mesh size has been conceived as by imposing an element size of nearly 1 m. However, since each span is 33.25 m long, the finite element size has been adjusted to about h = 0.9779 m , in order to obtain a uniform mesh, with an even number of elements (34) along each intermediate span. Thus, the mesh of the continuous 1D beam turns out to be composed of 34 × 8 = 272 beam finite elements.
Regarding the aspects of numerical integration, the parameter expressing the numerical dissipation rate in the time integration is still taken equal to − 0.1 , according to Eq. (59). The time step involved in the numerical procedure corresponds to Δt = h∕ (5 v) , where v is the ML velocity. The numerical integration is performed until the ML, starting from one support at one extreme of the whole beam, reaches the last support at the other end of the continuous beam.
Since here the effect of a distributed underlying elastic support along the beam is not present, the damping matrix cannot be evaluated according to Eq. (79) of Sect. 5. Thus, the effects induced by structural damping have been considered by employing Eq. (47), still for mass-proportional Rayleigh damping, this time imposing meaning that damping shall approximately affect the first mode of vibration of the multispan beam only. A more detailed description of structural damping could also be employed, although the adopted choice is believed to be sufficient, especially in relation to the considered simplified 1D mechanical model of the bridge beam.

Numerical Results
In this section, the effects of the ML velocity, of the beam bending stiffness and of the structural damping, on the maximum downward and upward displacements, velocities and accelerations of the continuous beam are presented. Given a value of the beam bending stiffness (with under-or overestimation to such a reference value) and of the structural damping ratio, for each simulation performed at a certain value of the ML velocity, the maximum upward (positive) and downward (negative) displacements, velocities and (85) accelerations of the beam are scored. Then, such outcomes of the FEM simulations are plotted as a function of the ML velocity. From such curves, the critical velocities of the beam-foundation system, for a finite beam, may be detected, as the velocity of the ML at which a maximum response is attained.
Moreover, considering only the first span, as just a single-span simply supported beam, a first reference estimation of the critical velocity may be obtained by employing the analytical equation provided by Dimitrovová and Rodrigues [38]: The relationships between beam maximum upward and downward displacements, velocities, accelerations and ML velocities, are shown in the sequence of plots in Fig. 34, respectively, for three values of bending stiffness (reference value plus/minus 20% of it, see table in Fig. 33) and for two values of structural damping, namely = 0% (undamped) and = 1% (lightly damped; light damping shall anyway be expected for such a wrought iron slender structure, as first assessed by experimental campaigns on the bridge, see Gentile and Saisi [65]).
The effect of increasing/decreasing the equivalent bending stiffness of the multi-span beam, as expected, is that of increasing/decreasing the value of the critical velocity. An opposite effect is displayed by the response amplitudes, which reduce by increasing the bending stiffness.
The effect of viscous damping results more complex. Regarding the maximum displacements (Fig. 34a, b), the effect is that of reducing the maximum response only, while keeping unchanged the value of the critical velocities, as also reported by Castro Jorge et al. [19] and Froio et al. [51]. It may also be noticed that the reduction of the response induced by viscous damping is effective only in the neighborhood of the critical velocity. This is due to the specific choice of the viscous damping matrix in Eq. (1). A richer structure of the damping matrix may lead to a structural damping acting on a broader range of the ML velocity. As an interesting feature of the present analysis, it may be remarked that by even adding a small amount of structural damping, as considered here, differently from the displacement response, the positions of the peaks of velocities and accelerations also vary. Figures 35, 36 aim at representing the beam deflection when the ML is located at the middle point of each span, at no structural damping ( = 0 ). The plots have been made by collecting the required data from the output file of the simulation, which gathers displacements, velocities and accelerations, for each time step and for each node of the FEM model. Detecting the correct time instant, the coordinates of (86) v cr = l s √ l s 4 EI ≃ 463 m/s ≃ 1667 km/h each node of the discretized and deformed beam have been stored in a different file, in order to plot the time shots. This procedure has been carried out for two different values of locomotive velocity: v = 12.5 m/s = 45 km/h and v = 431 m/s = 1552 km/h . The first value corresponds to a realistic much subcritical velocity, which was employed in 1889 in the try-out tests on the bridge, although the present speed of the train crossing the flyover shall now be limited to about 10÷ 15 km/h. The second value coincides with the critical velocity detected thanks to the numerical tests depicted in Fig. 34a (see also Eq. (86)).
Observing the subcritical simulation, it could be noticed how the deflection of the beam appears to move together with the ML. The shape of the deformed beam shifts without changes, passing from a span to the next one. Furthermore, the maximum (upward) and minimum (downward) displacements remain fixed at constant values. The two main upward peaks are symmetrically situated within the previous and the next span regarding the ML position. Instead, the maximum downward displacement location coincides with the ML position.
Dealing with the critical velocity, the regularity of the beam deformation recorded in the previous case disappears, due to the arising resonance effects. The maximum upward and downward displacements are not always located at the same position, with respect to the ML location. The maximum upward displacement stands behind the ML in the shots from (b) to (e), while a local maximum grows ahead. Thus, in frame (f), that local maximum becomes the global one for the remaining duration of the ML passage, due to a constructive interference between the rightward wave and the leftward (reflected) wave. Regarding the maximum downward displacement, only shots (a), (b), (c), (d) and (f) display a behaviour similar to that in the subcritical case. In fact, frames (e), (g) and (h) depict the maximum downward value in a span preceding the ML position.
Another way of exploiting the output file is displayed in Fig. 37. The obtained plots have been drawn by gathering the displacement, the velocity and the acceleration values at a single node of the multi-span 1D FEM model of the beam of the bridge. In particular, it has been decided to monitor the kinematic response of the node located at the middle of the fifth span, which corresponds to the keystone of the underlying arch supporting the bridge. The results show the dynamic response of the beam in time, as if a sensor would be placed at that precise point.
Finally, Table 10 presents the maximum upward and downward displacements, velocities and accelerations, varying the velocity of the ML in a range of realistic values (that may theoretically explore the possibility of an increase of the present train speed). These data, coupled with the previous plots, may be employed, in order to analyze useful data at design-stage condition for the historical bridge, to guarantee security standards and passenger comfort as imposed by international regulations, or to schedule Structural Health Monitoring or maintenance operations.
Obviously, the ranges of running speed apt to contain the amount of acceleration response (last column of Table 10) should not go beyond what was originally scheduled at design stage (try-outs in 1889 carried out at a maximum crossing speed of about 45 km/h). This testifies the hint of not conceiving possible extra uses of the historical infrastructure that may go beyond the limited range of velocity around 10 ÷ 15 km/h currently scheduled on the bridge, and this even disregarding possible ageing and damage effects that the monumental viaduct may have experienced in the 130 years of duty, as a combined railway and road bridge.

Conclusions
The present work has aimed at presenting a review framework in terms of computational Moving Load problem analysis and of parallelism implementation strategy in that context, by then developing a novel object-oriented FEM parallel implementation in C++, with the purposes of performing efficient numerical analyses for the dynamic response of beams under a high-velocity ML. This shall be of interested to people that may approach the field of Moving Load problem analysis, with the idea of possibly setting efficient numerical computations based on distributed computing, allowing to use HPC resources that may be accessed, e.g. on computer networks or consortia.
By employing the developed computer code, as briefly described in Sect. 4, it has been possible to draw down a complete C++ parallel implementation based on an earlier sequential FEM coding developed in Froio et al. [51], Froio [50] within a MatLab environment. The present implementation shall allow for a further exploitation and use in the context of HPC resources, toward achieving complete parametric studies, and in view of theoretical investigations on the physics of the underlying dynamic phenomena and of practical implications on the engineering response of structures subject to important ML regimes (for instance, for railway infrastructures and tracks).
The considered finite 1D beam ML problem has been regarded within two main configurations. Firstly, a simply supported beam lying on a distributed (visco)elastic foundation has been examined, in order to investigate the beamfoundation response, as resembling that of railway tracks. Secondly, a multi-span bridge beam has been modeled as a continuous beam on nine equally spaced firm supports. Both structural configurations have been subjected to a transverse concentrated ML traveling with a constant velocity along the beam. The magnitude of the ML has been considered as either constant (in both cases) or as harmonic-varying in time (in the first case).
The literature examination first presented in Sect. 2 has revealed the most significant dynamic characteristics of the above-mentioned structural systems subjected to MLs. Thus, by outlining the distinctive traits of this topic, many different approaches have been presented, attempting to provide an overall picture concerning the investigation of ML problems, specifically for ML beam problems. Both analytical and numerical approaches have been overviewed, in order to gather the essential terminology and all the basic modeling principles and physical features of such mechanical systems.
In Sect. 3, the specific application of the FEM on ML structural problems has been reviewed, to obtain the structural matrices and vectors of a finite beam element, possibly introducing the contribution of modeling an underlying (visco)elastic foundation. In particular, two different types of elastic foundation have been considered: a linear and a nonlinear cubic one, the latter likely leading to a more accurate description of the real mechanical behaviour for railway lines. As a result, it has been possible to investigate the dynamic response of a railway track and its dependence upon the characteristic mechanical parameters of the system, and to validate the reliability of the implemented FEM program, by comparing the present results with reference results from the recent literature. In particular, an extensive campaign of numerical simulations has been carried out, to investigate the effects of variations of the velocity of the ML and of its magnitude frequency of oscillation. The FEM distributed computing formulation has been successfully implemented within a Unix-like environment. First of all, the code exploits a linear algebra library, Eigen, providing the needed data structures for assembling the governing matrices and vectors, and for the numerical solvers. In addition, the program interfaces with a GMSH tool, for the mesh generation and post-processing activities. At a later stage, the object-oriented C++ code has been overhauled, adopting a parallel implementation paradigm. Section 4 has set the basics needed to develop a parallel algorithm implementation, by presenting three different kinds of a "divide and conquer" approach, toward taking advantage of distributed computing systems. After a summary review on the peculiarities of the existing computing architectures, the developed algorithm based on the "operator splitting" strategy has been described. Furthermore, the implemented code has indeed revealed to be able to exploit the computational resources provided by HPC distributed systems, such as GALILEO and MARCONI, two of the main clusters owned and available from Cineca, in Italy.
The parallel algorithm consists of five key stages: the creation of the mesh partitions, the allocation of the proper amount of memory to store the data sets, the assembly of the global distributed matrices and vectors, the solution of the governing algebraic system of linear or nonlinear equations and the post-processing phase. Coupled with the already mentioned packages, Eigen and GMSH, the developed parallel FEM program exploits the characteristics of three external packages. The pre-processing phase has been enhanced with METIS, a set of programs for partitioning FEM meshes. Open MPI has also been employed, to allow for the communication between the processes made available by distributed systems. Finally, PETSc has provided a series of tools to generate distributed matrices and vectors and to solve large sparse systems.
Thus, as presented in Sect. 5, the object-oriented C++ parallel code has demonstrated far better performances than for a previous sequential MatLab implementation, in terms of execution time, and achieved computational efficiency. Moreover, the code has proven to be well suited to run on distributed computing systems, by manifesting a satisfactory behaviour with regard to scalability. This should encourage further developments, in order to completely exploit the code potentialities by further accomplishing, through extensions and generalizations, multi-dimensional 2D or 3D FEM analyses involving several degrees of freedom.
Section 5 has presented complete parametric validation analyses on a beam-foundation system under ML. The foundation has been represented as a homogeneous Winkler elastic support, one of the most common existing mechanical models, while the behavior of the beam has been assumed according to classical Euler-Bernoulli theory for slender beams of a uniform cross-section. The dynamic transient response of a (long) finite, simply supported beam, lying on a linear and a nonlinear cubic elastic foundation, subjected to a concentrated ML traveling at a high constant velocity, with either a constant or harmonic-varying magnitude in time, has been considered. Then, its dynamic response has been investigated, by employing the above-mentioned homemade FEM implementation. The latter includes some routines dedicated to the numerical time integration, namely those relative to the HHT-algorithm. The implemented FEM code has managed to numerically detect those characteristic velocities defined to be critical ("critical velocities"), leading to the largest displacements of the beam-foundation system. The reliability of the analyses and the quality of the recorded dynamic responses has been proven, by performing a consistent validation comparison, showing an adequate degree of agreement with results earlier reported in the very recent dedicated literature [19,51,54,56].
The simulations performed within this stage of the work have offered useful insights on some physical characteristics of ML problems. First of all, the finite beam lying on an elastic foundation and subjected to a constant-magnitude ML has been taken into account. The effects of the linear or the nonlinear parameters defining the foundation constitutive law have been examined. For all the analyzed foundation models, a sensible amplification of the beam response amplitude has been systematically detected, when the ML moves at the critical velocity. It has been identified that the value of the foundation stiffness affects the critical velocity. Furthermore, the damping provided by the beam-foundation system only induces a reduction of the recorded maximum and minimum displacements, basically without displacing the peak of the critical velocity.
Once the critical velocities have been proven to be fairly accurate in estimation for a constant-amplitude ML, a harmonic law has been considered, to describe a time-varying magnitude of the ML. Exciting the beam with a harmonic ML, with different mean values and frequencies, two further critical velocities have been detected from the dynamic response of the beam-foundation system. Thus, three peaks have been recorded for a harmonic ML with a non-zero mean magnitude, depending on both the frequency and the mean magnitude of the ML. The values of critical velocities tend to separate, as the loading frequency increases. Such findings shall display relevant implications in the modeling of the dynamic analysis of railway tracks, since the load harmonic amplitude variation may lead to a dangerous decrease of the lower critical velocity.
In Sect. 5, the preceding sequential MatLab coding and the present parallel object-oriented C++ implementation have been widely compared. It has been highlighted how the current FEM code has drastically reduced the execution time of each simulation. Thus, although the current implementation has still required a great amount of time, the vast numerical campaign has been completed in just some hours, instead as of several days. Furthermore, the reduction of the total time needed to carry out a simulation has not influenced the acquired accuracy of the results. Section 6 has pertained to a simplified though convenient modelization of the dynamic response of the upper continuous beam of the historical railway riveted iron arch bridge at Paderno d'Adda (1889), as a 1D multi-span structure under a ML. The FEM model consists of a continuous beam lying on nine firm supports. Once the equivalent bending stiffness of the continuous beam has been estimated, a complete campaign of numerical tests has been carried out. Similarly to the previous analyses regarding railway tracks, simulations have been performed at a certain value of ML velocity, by recording the maximum upward and downward displacements, velocities and accelerations of the beam, given a certain value of the beam bending stiffness (sceneries of degraded or increased stiffness have been considered, with respect to the expected bending rigidity) and of the structural damping ratio (for zero or light inherent damping). Moreover, the time history of the system subjected to a moving locomotive has been illustrated, through a sequence of time frames, describing the evolution of the beam deflection due to the passage of the ML. Finally, the recorded displacements, velocities and accelerations at the middle point of the fifth span of the beam, corresponding to the keystone of the underlying parabolic arch of the bridge have been displayed, in order to show how the implemented parallel code successfully managed to generate information that could be employed toward assessing the current dynamic behaviour of the structure, in view of possible Structural Health Monitoring strategies, devoted to conservation and possible intervention.
The combination of the new C++ parallel implementation with the mechanical and engineering implications derived from the developed ML numerical tests, with all the competent literature framing, shall consistently show that the present approach reveals to be rather feasible, toward inspecting the physical response of such challenging dynamic systems. In contrast with a previous sequential MatLab code, the present C++ parallel implementation manages to exploit the computational resources made available by clusters and other kinds of distributed systems, specifically in the context of HPC machines. Although some implemented C++ routines could still be refined and further dedicated to given specific needs, the present parallel algorithm shall represent a fundamental computational tool, allowing to deepen the understanding of ML problems, extending then the analysis to even more realistic situations, such as those involving complex multi-dimensional structures.
Finally, the mentioned parallel implementation shall also be considered as an important starting point for further enlargements toward achieving a more comprehensive computer code toward FEM analysis, independently from the specific field of application (here ML dynamics). In fact, the parallel algorithm turns out to be extremely customizable, giving users the possibility to implement additional features and computational routines, by employing the already developed main implant.

Future Developments
This study shall contribute in referencing a fundamental starting point to enhance the understanding of the dynamic behaviour of the considered mechanical problems, which could be roughly classified as ML dynamical problems, such as concerning high-speed railway tracks or multi-span bridges under traffic loads. Future studies on the current subjects could be considered, in order to provide further refined and detailed modelizations, to achieve an even more realistic dynamic response prediction of the mentioned systems, specifically in practical terms.
Regarding the analyses of railway tracks, real applications usually require the extension to an infinite beam length. Thus, it is necessary to avoid the effects of the beam supports, by preventing the perturbations induced by the spurious reflections of the traveling waves at the ends of a finite modelled beam. This aim may be achieved by introducing enhanced boundary conditions, which should be able to absorb the incoming waves at the extremes of the beam (see e.g. Froio et al. [57,58]). Furthermore, a better characterization of the track behaviour may be obtained, for instance by involving a more detailed representation of the underlying foundation, as a continuum, not implemented yet in the current version of the C++ parallel code, for not saying about the modelization of the moving object and the local interaction with the structure and the foundation.
Furthermore, the simplified 1D multi-span FEM model of the upper beam of the Paderno d'Adda Bridge could be improved by the extension to a true 3D beam model, whose computational effort of analysis may be held up by the parallel implementation provided within this work. Same for the whole 3D discretization of the complete structure of the bridge [46]. This may allow to undertake possible modelupdating processes, may be based on experimental evidences linked to present-state conditions of the infrastructure. It would require the development of new finite elements, an innovative way to manage the ML path and the implementation of a database to collect and query the mechanical properties of all types of beam cross-section. Further investigations on the effects induced by the time and space discretizations and the time integration parameters should be carried out, to analyze possible spurious oscillations of the solution that may come out from the dynamic computation.
Both situations may also require more truthful vehicle models, in order to replace the single moving concentrated force. Firstly, trains may be represented as a sequence of concentrated MLs, but then, the vehicle-structure interaction could also be described by involving moving spring-massdamper systems. Moreover, the vibration produced by the irregularities of the railway tracks and the wheel flats could be taken into account, to succeed in realistically describing the examined mechanical systems.
Nonetheless, the present work shall serve as a fundamental basis for future wider studies aimed at collecting useful data, such as critical velocities and vibration amplitudes, which could be employed in the investigated challenging structural dynamics vibration context and at a design or evaluation stage of structures and infrastructures subjected to MLs, at increasing speeds of the considered moving objects.