Applied error propagation from misinterpretation of 13C chemical shift data
The tetrahydrofuranoid lignan (-)-berchemol was isolated as a triacetate from the stems of Berchemia racemosa SIEB. et ZUCC.  and its structure was elucidated using 1H, 13C, HH-COSY, and HC-COSY NMR measurements. The 13C NMR spectral data of berchemol and its triacetate, together with the signal assignments, are given in Table II of ; the 1H spectrum as well as the HH-COSY and HC-COSY spectra of the triacetate are shown in Figs. 1, 2, and 3 of . Berchemol contains two 1,2-dihydroxylated benzene moieties with four characteristic quaternary carbons resonating at 144.2, 145.8, 146.6, and 146.8 ppm. This is in full agreement with tabulated shift values taken from NMR textbooks  and is easily distinguished from 1,3-dihydroxylated benzene derivatives, which typically show resonances in the range between 155 and 160 ppm.
In a later paper published by Hu et al. , the obviously correct spectral data of berchemol from  are used to elucidate the structure of a further lignan derivative isolated from Saussurea cordifolia (compound 5 in ) having one 1,2-dihydroxylated and one 1,3-dihydroxylated benzene moiety (Table 1). The spectral data for the relevant quaternary carbons are given as 145.9, 147.2, 148.6, and 149.0 ppm, in good agreement with the literature data of berchemol, but they are now used to derive a 1,2- as well as a 1,3-dihydroxylated benzene fragment, giving (2R,3S,4S)-4-(4-hydroxy-3-methoxybenzyl)-2-(5-hydroxy-3-methoxyphenyl)-3-(hydroxymethyl)tetrahydrofuran-3-ol (CAS-RN: 1227937-39-4). This wrong structure is probably based on a misleading interpretation of the 1H NMR data as well as the HMBC data. The electronic version of this publication contains no supplementary information, which prevents a reinterpretation of the experimental data.
Using the above-mentioned wrong structure (CAS-RN: 1227937-39-4) as a query for searching the CAS-Registry File also gives the “Absolute stereo mirror image” with CAS-RN 1427038-08-1, showing a 2S,3R,4R configuration. This compound appears exactly once in the chemical literature (SciFinder, Dec 31st, 2018), published in  and named deltoignan A (compound 9 in ). The spectral data of deltoignan A (1H, 13C, HH-COSY, HC-HMBC) are in good agreement with the data given in ; therefore the same wrong conclusion with respect to the constitution of the compound was drawn, based on an already wrong structure proposal. The Fitoterapia paper  again has no supplementary information that would allow reinterpretation of the spectral data.
The wrong structure from  was further used as reference material to elucidate the structure of compound 2 in , creating an additional wrong example of a 1,3-dihydroxylated benzene derivative with 13C NMR chemical shift values of 147.2 and 148.6 ppm for the quaternary carbons. Furthermore, another new compound named vibruresinol, isolated from the stems of Viburnum erosum , was again elucidated as having a 1,2- as well as a 1,3-dihydroxylated benzene fragment; its four relevant quaternary carbons appear at 147.5 (2×), 145.6, and 144.4 ppm, which is incompatible with this structure proposal. The known compounds 4–10 described in  always contain 1,2-dihydroxylated benzene moieties with resonance lines within the expected range from 142.9 to 152.0 ppm.
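The plausibility check applied in the paragraphs above can be sketched in a few lines of Python. This is only an illustration of the textbook rule of thumb, not the procedure used by any of the cited authors or by CSEARCH. The function and variable names are hypothetical, and the window boundaries are taken directly from the ranges quoted in the text (142.9 to 152.0 ppm for the 1,2-pattern, 155 to 160 ppm for the 1,3-pattern); real assignments require full spectral interpretation.

```python
# Shift windows (ppm) for oxygen-bearing aromatic carbons, as quoted in the text.
# Assumption: the two windows are treated as hard limits purely for illustration.
ORTHO_WINDOW = (142.9, 152.0)  # 1,2-dihydroxylated benzene moieties
META_WINDOW = (155.0, 160.0)   # 1,3-dihydroxylated benzene moieties


def in_window(shift, window):
    """Return True if a chemical shift (ppm) lies inside the given window."""
    low, high = window
    return low <= shift <= high


def count_by_pattern(shifts):
    """Count how many quaternary carbon shifts fall into each window."""
    return {
        "ortho": sum(in_window(s, ORTHO_WINDOW) for s in shifts),
        "meta": sum(in_window(s, META_WINDOW) for s in shifts),
    }


# Quaternary carbon shifts exactly as reported in the text above.
reported = {
    "berchemol": [144.2, 145.8, 146.6, 146.8],
    "compound 5 (Hu et al.)": [145.9, 147.2, 148.6, 149.0],
    "vibruresinol": [147.5, 147.5, 145.6, 144.4],
}

for name, shifts in reported.items():
    print(name, count_by_pattern(shifts))
```

For all three data sets, every reported shift falls into the 1,2-window and none into the 1,3-window, which is the inconsistency described above for the two structures proposed with a 1,3-dihydroxylated fragment.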
Summarizing the above given detailed analysis leads to the following conclusions:
In 1989 , an obviously correct structure perfectly compatible with the 13C NMR data was determined.
In 2010 , the correct data from 1989  were used as reference material to derive a wrong structure proposal, ignoring basic textbook-level knowledge.
The paper published in 2012  relies on the previous paper  and introduces another wrong example, making the error appear statistically better supported.
The paper published in 2015  also relies on  and introduces further examples of a substitution pattern incompatible with the given 13C NMR data.
All three papers successfully passed the peer-reviewing process.
None of the publications provides “Supplementary Information”.
In all publications with wrong structure proposals, only already tabulated chemical shift data are given and 2D correlations are visualized in the structural diagrams [19, 20]; there is no possibility to access the raw data and redo the processing and/or interpretation from scratch.
Automatic peer reviewing of the 13C NMR data taken from  using the “CSEARCH-Robot-Referee” shows massive deviations between experimentally determined and predicted chemical shift values in the 1,3-dihydroxylated benzene moiety of deltoignan A. Systematic variation of the questionable positions using a structure generator program (Table 2) able to create similar structures leads to 3571 proposals, 2 of which are known compounds, either within the CSEARCH database or within the PUBCHEM collection [22, 23]. The original proposal is found at position 298 of the sorted hitlist with an average deviation of 3.06 ppm. A real-world alternative is located at position 4 with an average deviation of 1.61 ppm, and a 1,4-dihydroxylated derivative is proposed at position 1 with an average deviation of 1.39 ppm. This result clearly shows that the originally proposed structure is wrong; the alternatives have either a 1,2- or a 1,4-dihydroxylated benzene fragment. The positions of the hydroxy and methoxy groups have to be established from the 2D-NMR data, which makes these data essential; the raw data should therefore be made available within the supplementary information. It should be mentioned that both wrong structure proposals [19, 20] are contained in the knowledge base used by the “CSEARCH-Robot-Referee” for this evaluation; despite that, the applied algorithms recognize the inconsistency between the given spectral data and the substitution pattern.
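The ranking step described above, sorting candidate structures by the average deviation between experimental and predicted shifts, can be illustrated with a minimal Python sketch. It is not the CSEARCH implementation: the function names are hypothetical, the shift values in the example are invented toy numbers rather than data from any of the cited papers, and the sketch assumes that experimental and predicted shifts are already paired carbon by carbon.

```python
from statistics import mean


def average_deviation(experimental, predicted):
    """Mean absolute difference (ppm) between paired shift lists."""
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must have the same length")
    return mean(abs(e - p) for e, p in zip(experimental, predicted))


def rank_candidates(experimental, candidates):
    """Return (name, deviation) pairs sorted by increasing deviation."""
    scored = [
        (name, average_deviation(experimental, predicted))
        for name, predicted in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy values (ppm), invented for illustration only.
experimental = [146.0, 147.0]
candidates = {
    "1,3-pattern": [157.0, 158.0],  # hypothetical predicted shifts
    "1,2-pattern": [145.0, 148.0],  # hypothetical predicted shifts
}

ranked = rank_candidates(experimental, candidates)
for position, (name, deviation) in enumerate(ranked, start=1):
    print(position, name, f"{deviation:.2f} ppm")
```

In this toy case the 1,2-pattern ranks first with an average deviation of 1.00 ppm and the 1,3-pattern second with 11.00 ppm, mirroring in miniature how a sorted hitlist exposes a poorly fitting proposal.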
Identical spectral data—different structures determined
In , the isolation and structure elucidation of 4-allylresorcinol by IR, EIMS, 1H, 13C, and 2D-NMR spectra at 400 MHz was described and the interpretation of the spectral data was mainly based on a comparison of the 1H NMR data (Table 3) with the chemical shift values published in .
The 1H NMR data in all three publications are nearly identical, whereas the 13C NMR data in  fit very well to the chemical shift values given in , which proposes a different structure. Obviously, the structure elucidation of 4-allylresorcinol in  is based on an incomplete characterization, because the 13C NMR data have not been published. The conclusions given in  therefore lead to a wrong structure proposal. Based on the 13C NMR data from , the structure of this compound is 4-allylbenzene-1,2-diol instead of the given 4-allylbenzene-1,3-diol (Table 4). A comparison of the chemical shift values with reference material from basic NMR textbooks clearly shows that the 1,2-diol gives two signals in the region of 140–145 ppm, whereas the 1,3-pattern requires two signals in the region of 155–160 ppm. The wrong structure proposed in  is based on the incomplete characterization of a compound in , whereas the obviously correct data from Ref.  are neglected.
The knowledge base of the CSEARCH-Robot-Referee  contains both entries from  and . Despite the wrong structure proposal being available in the knowledge base, large discrepancies between the experimental and the predicted chemical shift values are found. The carbons within the benzene ring show deviations ranging from 7.7 to 18 ppm (Fig. 1).
The different structure proposal in , which has nearly identical 13C NMR data, is detected via a spectral similarity search and shown as an alternative structure fitting the query spectrum. Subsequent structure generation starting from the wrong proposal  creates 425 topologies, 24 of which are real-world structures known within the CSEARCH and/or the PUBCHEM database [22, 23]. The original structure proposal (1,3-diol pattern) from  is found at position 67 with an average deviation of 6.02 ppm, whereas the best-fitting real-world alternative (1,2-diol pattern) is at position 1 with a deviation of 1.39 ppm between experimental and predicted chemical shift values. This example shows the immense power of the CSEARCH-Robot-Referee in performing a fully automatic structure revision that is in full agreement with a detailed analysis of the public domain literature.