High intra- and inter-observer reliability of planning implant size in MRI-based patient-specific instrumentation for total knee arthroplasty

Purpose Patient-specific instrumentation (PSI) in total knee arthroplasty (TKA) uses individually designed disposable guides to determine intraoperative bone cuts. The manufacturer provides the surgeon with a proposed plan, which can be modified and should be approved by the surgeon before the guides are produced. This study aims to assess the intra- and inter-observer reliability of preoperative planning by orthopaedic surgeons using PSI. The authors hypothesize a high intra- and inter-observer reliability in planning TKA using PSI.
Methods Four orthopaedic surgeons modified and approved 40 preoperative MRI-based PSI plannings three times. The surgeons were blinded to their own and each other’s results. Intra- and inter-observer reliability was obtained for planned implant size, resection, and position of the implant.
Results Intra-observer reliability Intraclass Correlation Coefficients (ICC) were excellent for femoral and tibial implant size with a range of 0.948–0.995 and 0.919–0.988, respectively. Inter-observer reliability for femoral and tibial implant size showed an ICC range of 0.953–0.982 and 0.839–0.951, respectively. Besides implant size, intra- and inter-observer reliability demonstrated good to excellent agreement (ICC > 0.75) for 7 out of 12 remaining parameters and 6 out of 12 remaining parameters, respectively.
Conclusion Preoperative planning of TKA implant size using MRI-based PSI showed excellent intra- and inter-observer reliability. Further research comparing preoperatively predicted implant size with intraoperative results is needed.


Introduction
Patient-specific instrumentation (PSI) in total knee arthroplasty (TKA) uses individually designed, disposable guides to determine intraoperative bone cuts. Patient-specific preoperative 3D models of the femur and the tibia can be generated from either preoperative magnetic resonance imaging (MRI) or computed tomography (CT) scans. A technician uses these data to make a default plan for the implant size and position. The surgeon can adjust all settings of the femoral and tibial components, keeping in mind each patient's anatomical variations. After the case is approved, the manufacturer produces disposable guides for intraoperative use. Previous literature has shown that the plan provided by the technician can differ from the plan approved by the surgeon [4,10,12]. Consequently, differences between the suggested and appropriate component size may occur. Therefore, the expertise of the surgeon is essential for evaluating and approving the planning provided by the manufacturer. Nonetheless, none of these studies evaluated the intra- or inter-observer reliability of the planning made by the surgeon.
Multiple other studies have assessed the radiographic and clinical outcome of conventional TKA compared to PSI [6,7,14]. Other studies compared CT- to MRI-based PSI for TKA [1,13,16]. However, no literature exists on the reliability of the planning method itself, evaluated by comparing repeated preoperative planning within or between orthopaedic surgeons. This comparison is of added value since it demonstrates whether TKA planning using PSI is itself reliable. Therefore, the present study is designed to assess the intra- and inter-observer reliability of preoperative planning by orthopaedic surgeons using PSI. The authors hypothesize that there is a high intra- and inter-observer reliability in planning TKA using PSI.

Materials and methods
The study group consists of all patients who underwent TKA in 2015 using PSI (Signature™ system, Zimmer-Biomet Inc., Warsaw, IN) based on a preoperative MRI in the Zuyderland Medical Center (Sittard-Geleen, The Netherlands). A total of 309 patients were included. From this cohort, 40 patients were randomly selected and anonymised by Materialise NV (Leuven, Belgium). The preoperative plan, in the default setting as suggested by Materialise NV, had to be evaluated, adjusted where necessary, and approved by the surgeon.
Institutional review board (METC Z, Heerlen, the Netherlands) approval was obtained for this study (trial number 13-N-117).

Measurements
Four orthopaedic surgeons were each given three folders, each containing the 40 selected cases in random order. As a result, each surgeon performed standard preoperative planning three times per case within 2 weeks. All surgeons were senior surgeons and had a minimum of 3 years of experience with PSI for TKA.
Only the manufacturer had information regarding matching case numbers until the evaluation of all approved plannings. For each case, the following 14 parameters were planned: femoral size, femoral posterior medial resection, femoral mediolateral displacement, femoral distal medial resection, femoral flexion-extension, femoral varus-valgus, femoral rotation from the epicondylar axis, tibial size, tibial anteroposterior displacement, tibial mediolateral displacement, tibial resection from the highest point, tibial posterior slope, tibial varus-valgus, and tibial rotation.

Outcome measurements
The primary outcome measurements were intra- and inter-observer reliability of the planned component size for the femur and tibia. The secondary outcome measurements were intra- and inter-observer reliability of all remaining planned parameters as described above.

Statistical analysis
All statistical analyses were performed using SPSS software version 25 (SPSS Inc., Chicago, Illinois).
The intra- and inter-observer reliability of all measurements was determined by Intraclass Correlation Coefficients (ICCs), using an absolute-agreement two-way mixed-effects model for intra-observer reliability, and an absolute-agreement two-way random-effects model for inter-observer reliability.
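As an illustration of how an absolute-agreement two-way ICC can be computed from a cases-by-ratings table, the following minimal sketch implements the single-measure form, often written ICC(A,1) or ICC(2,1). The study itself used SPSS and does not state whether single or average measures were reported, so the single-measure choice, the function name `icc_a1`, and the example data are assumptions made here for illustration only.

```python
def icc_a1(ratings):
    """Single-measure, absolute-agreement, two-way ICC.

    ratings: list of n rows (cases); each row holds k scores, one per
    rater (inter-observer) or per repeated planning (intra-observer).
    Assumption: single-measure form; the source does not specify this.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # mean square between cases
    msc = ss_cols / (k - 1)              # mean square between raters
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical example: 5 cases, each planned three times with the same size
example = [[4, 4, 4], [6, 6, 6], [5, 5, 5], [7, 7, 7], [3, 3, 3]]
print(icc_a1(example))
```

Because absolute agreement penalises systematic offsets between raters, a rater who consistently plans one size larger lowers this coefficient even when the ranking of cases is identical. The function divides by zero if a parameter has no variation at all across cases and ratings, so such input would need separate handling.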
ICC values less than 0.5 are indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability [9].
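The interpretation thresholds above can be written as a small lookup. This is a sketch only; the function name `classify_icc` is hypothetical, and assigning the boundary values 0.5, 0.75 and 0.9 themselves to the moderate, good and good bands, respectively, is an assumption, since the text does not say which band a boundary value belongs to.

```python
def classify_icc(icc):
    """Map an ICC value to the reliability label used in this article.

    Assumption: boundary values fall in the band that starts at them,
    except 0.9, which is treated as 'good' ('excellent' is > 0.90).
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Lower and upper ends of ICC ranges reported in this article
for value in (0.948, 0.919, 0.839, 0.995):
    print(value, classify_icc(value))
```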

Results
Determination of femoral and tibial implant size showed excellent agreement with ICCs for intra-observer reliability within a range of 0.948-0.995 and 0.919-0.988, respectively, as well as excellent ICCs for inter-observer reliability within a range of 0.953-0.982 and 0.839-0.951, respectively (Table 1). When an implant size was changed, the maximum change compared to other plannings within the same case was 1 size for the femoral component and 2 sizes for the tibial component. The number of adjusted implant sizes and the differences between the implant sizes within the same case per surgeon are shown in Table 3.
Furthermore, intra- and inter-observer reliability demonstrated good to excellent agreement (ICC > 0.75) for 7 out of 12 remaining parameters, and 6 out of 12 remaining parameters, respectively. Agreement differed per surgeon: intra-observer reliability ranging from moderate to excellent (ICC from > 0.5 to > 0.9) was found in 2 out of 12 parameters, as was reliability ranging from poor to excellent (ICC from < 0.5 to > 0.9) (Figs. 1 and 2).
For 3 parameters (femoral rotation from the epicondylar axis, posterior tibial slope, tibial rotation) no changes to the proposed planning were made by any of the surgeons, resulting in an intra- and inter-observer reliability ICC of 1.00. Table 2 shows an overview per case of changes from proposed planning and alterations within adjusted plannings. All modifications per surgeon are listed in Table 3.

Discussion
This study shows that planning of TKA using PSI by different surgeons results in excellent agreement for implant sizes between surgeons as well as in repeated planning by the same surgeon. Besides implant size, intra- and inter-observer reliability demonstrated good to excellent agreement (ICC > 0.75) for 7 out of 12 remaining parameters and 6 out of 12 remaining parameters, respectively. Hence, it may be stated that PSI is a reliable method for planning a TKA.
Previous studies have shown that PSI planning accurately predicts the implant size used intraoperatively [5,8,12]. The current study shows that planning of the implant size, within and between orthopaedic surgeons, is reliable. The maximum size difference was 1 implant size for the femur, and 2 implant sizes for the tibia, compared to the other plannings of the same patient.
Changes to the default plan can result in different implant sizes. Overall, more changes were made to the tibia than the femur. This may explain the greater difference in default and approved implant size for the femur and tibia: 1 size versus 2 sizes, respectively. TKA surgery can be planned more effectively by understanding the size change frequency of implants, in combination with intraoperative concordance to the preoperative plan. Consequently, when implant sizes can be accurately predicted, and planning implant sizes itself is reliable, the operating team will be able to minimize intraoperative implant size errors in advance. This may lead to improved operating room efficiency due to a decreased
Previous studies have emphasized that changes in the initial technician's plan were necessary to get an accurate preoperative planning of the implant sizes. Intraoperative alterations in implant size were significantly lower for the plans approved by the surgeon compared to the default plans provided by the technician [4,10,12]. Based on this previous literature, the expertise of the surgeon is thus essential for evaluating and approving the default planning provided by the manufacturer.
Intra- and inter-observer reliability ICCs were 1.00 for femoral rotation from the epicondylar axis, posterior tibial slope, and tibial rotation because none of the surgeons modified these parameters. When there is general consensus on a parameter, less variation occurs, resulting in higher agreement. When there is less consensus on a certain parameter, more changes are made, which results in lower agreement. Surgeon 3 made the fewest alterations to the proposed plannings, resulting in excellent agreement (ICC > 0.90) for intra-observer reliability for all settings. Thus, a high ICC can be caused either by good agreement between adjusted plannings or by the absence of alterations to the proposed plan. Additionally, the adjustment of one parameter can necessitate alterations of other parameters. For example, an increase in resection might result in the need for a smaller implant size and adjustments in placement of the newly chosen implant size. Awareness of this effect is essential when interpreting the results of this article. This 'snowball effect', as well as less consensus on certain parameters with more changes to the default planning and therefore more differences within and between surgeons, may explain why parameters such as femoral flexion/extension and tibial displacement showed less agreement.
The mechanical alignment technique is considered well performed when the overall limb alignment is within 3° of neutral. Varus and valgus angles for both femur and tibia showed modifications with a maximum of 0.5° and 2°, respectively. Given that the maximum difference in varus/valgus angles is 2° within the same case, these changes are presumably of no clinical importance. Moreover, adjustments to varus/valgus alignment are known to depend on the surgeon's philosophy for an anatomical, (adjusted) mechanical, or (restricted) kinematic alignment technique [2,11]. A patient's specifications, such as findings from a physical examination (for example preoperative leg axis, body mass index, and laxity) and previous medical history, can be decisive in the choice of a certain alignment.
Patient-specific characteristics that cannot be controlled by planning software, namely ligamentous balancing and lower limb alignment, can require intraoperative changes. Therefore, correct matching of the preoperative plan and intraoperative observations is a crucial factor in PSI-assisted TKA. In case of a mismatch, it is the surgeon's responsibility to consider a switch to conventional instrumentation. In previous literature, intraoperative modifications were made to the preoperative plan in 23% and up to 36% of MRI-based and CT-based PSI-assisted TKA, respectively. Most of these changes occurred due to a poor match between the preoperative plan and intraoperative observations for the tibial component [3,4,15]. Furthermore, each surgeon has a 'personal touch' not only in planning, but also intraoperatively, with his or her own preferences regarding additional releases, the decision whether or not recuts are needed, or the consideration to select a thicker insert in patients with a high BMI. Nonetheless, excellent agreement for implant sizes between surgeons and within surgeons was found in this study. Also, the agreement of implant size did not differ between the surgeons who made multiple changes to the proposed plans and the surgeon who made very few changes to the proposed plan.
This study has some limitations. Firstly, no power analysis for the number of surgeons, the number of patient cases, and the number of repeated measurements has been conducted. However, Koo et al. suggested as a rule of thumb that researchers should obtain at least 30 heterogeneous samples and involve at least 3 observers whenever possible when conducting a reliability study [9]. Therefore, in the present study 40
This study represents the first assessment of intra- and inter-observer reliability in PSI TKA. The study showed excellent intra- and inter-observer reliability, including for implant sizes. This may contribute to more optimal and potentially more effective preoperative planning of TKA surgery in the future. Therefore, this topic can be of interest for further research. Future research, with larger datasets and different types of both MRI- and CT-based PSI, is necessary to further evaluate these results.

Conclusions
Preoperative planning of TKA implant size using MRI-based PSI showed excellent intra- and inter-observer reliability. Future research comparing preoperatively predicted implant size with intraoperative results is needed.
Funding The author(s) received no specific funding for this work.

Compliance with ethical standards
Conflict of interest Nanne P Kort is a paid consultant for Zimmer-Biomet, Stryker, BodyCad, Supportho, and Bioventus. Nanne P Kort is a principal investigator at Zimmer-Biomet. Nanne P Kort is a committee member of the Dutch Orthopedic Society (NOV). All other authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent There is no informed consent text since this study was based on database research only and without direct contact with patients.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.