The Chinese Expeditionary Force (CEF) was one of two Chinese military expeditions sent beyond sovereign Chinese territory to repel the Japanese forces occupying Burma during World War II. By 1941, World War II had entered its darkest days, and China was embroiled in a desperate struggle against the Japanese invasion. Japanese troops, aiming to cut China's supply routes, had left only one line of communication open between China and the West: the Yunnan-Burma Road in the southwest [1]. To support the British campaigns against Japan in Burma and to defend China's southwestern border areas, the CEF was dispatched to Burma twice. Eventually, following victorious counterattacks launched by the Chinese India Garrison Army and the CEF, the China-Burma-India transport route was reopened, allowing supplies from the international community to reach China. Because Myitkyina lay on this “Anti-Japanese Lifeline”, the Myitkyina Battle became known as one of the bloodiest battles of the war [2]. It grew into the largest-scale operation of Sino-US military cooperation in Burma and lasted three months through the monsoon season. The Allied command recorded heavy casualties: 4,344 Chinese and 2,207 American [3].

The identification of soldiers lost in battle is of the utmost importance to the Chinese Government and to the families and friends who must bear the burden of grief. During World War II and the Korean War, millions of Chinese soldiers were reported missing or killed in action. When bodies cannot be found or identified, families are left with the long-term pain of not knowing with certainty how, or whether, their loved ones died. The Chinese Government is therefore making intensive, ongoing efforts to determine the fate of every Chinese soldier still unaccounted for. As a consequence of these efforts, hundreds of remains have been recovered and returned to China.

In the past fifteen years, DNA analysis has played a significant role in identifying victims of crimes, disasters, and wars. The Y chromosome is male specific and is inherited patrilineally. It is transmitted without recombination, changing only through mutation, which makes Y-STRs and Y-SNPs very useful for paternal lineage matching [4]. According to the assessment of the CEF remains by physical anthropologists and to written records in a personal reminiscence [5], the CEF soldiers involved in the Myitkyina Battle were mostly young men, and most of them were unmarried and childless. In this circumstance, the Y chromosome may be particularly useful for tracing paternal lineages and seeking potential male paternal relatives [6]. Instead of using standard forensic methods, we carried out an improved procedure (Doc. S1 online) to identify 27 CEF human specimens found in the Myitkyina area, Myanmar (Fig. 1a). The procedure comprises the extraction, purification, and amplification steps commonly used in ancient DNA studies, followed by Y-filer STR typing, haplogroup prediction, and Y-SNP confirmation.

Fig. 1

(Color online) Geographic and genetic information of the CEF remains. a Geographic location of the sampling site in this study. Myitkyina, the capital city of Kachin State, lies in the northern part of Burma. The samples for DNA identification were taken from 27 skeletons in two closely located hidden mass graves in Tahkawng district, Myitkyina city, Burma, exhumed in April 2015. The CEF remains had been buried with weapons and were severely stained by rust (photo by Yaxin Liu). b Y-chromosome haplogroup frequencies of the CEF samples. The panels used were as follows: Coreset panel: M145, P143, L15, M214, and M45; Haplogroup D panel: M174, M15, P99, and P47; Haplogroup C panel: M130, M217, F2613, F1396, and M48; Haplogroup O panel: M175, M324, KL1, 002611, P201, M7, P164, M134, M117, M119, M110, M268, PK4, and M95; Haplogroup N panel: M231 and LLY22g

We first encountered several problems in our attempts to use Y-chromosome DNA for forensic analysis. The first and crucial step in DNA analysis is the extraction of genomic DNA. Because the remains had been exposed to field conditions (heat, humidity, and other unknown factors) for approximately 70 years, substantial DNA loss and degradation were evident, and standard forensic DNA extraction was unsuccessful. We then met inhibition problems: a very strong inhibitor co-purified with the DNA and inhibited PCR amplification. We were able to overcome these difficulties by adopting an ancient DNA extraction protocol and by purifying the initial DNA extract with a QIAquick spin column (QIAGEN, CA, USA). The absence of PCR product in the DNA-free negative control reactions suggested that the PCR products used for DNA analysis did not derive from contamination.

The Y-STR profiles of the 27 skeletal remains are shown in Table S1 (online). The number of detected loci per profile varied between nine and fifteen (Table S2 online). Because two samples shared the same Y-STR profile across the detected loci, our sample set yielded at least 26 different Y-STR haplotypes; further autosomal STR testing or mtDNA sequencing is therefore needed to distinguish these two samples. A high discrimination capacity (DC = 0.9630) was observed in the CEF samples from the Myitkyina Battle, demonstrating that the Y-filer system was useful for forensic identification in this study.
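The reported discrimination capacity can be checked with a short calculation. The sketch below assumes the usual definition of DC as the number of distinct haplotypes divided by the number of samples. The haplotype labels are placeholders, not the actual profiles from Table S1, and treating each profile as a single label glosses over the partial profiles (nine to fifteen loci) described above.

```python
from collections import Counter

def discrimination_capacity(haplotypes):
    """Return (number of distinct haplotypes, DC) for a list of haplotype labels.

    DC is taken here as distinct haplotypes / total samples (assumed definition).
    """
    counts = Counter(haplotypes)
    return len(counts), len(counts) / len(haplotypes)

# Placeholder labels: 27 samples, two of which share one profile ("H26").
profiles = [f"H{i}" for i in range(1, 27)] + ["H26"]

n_distinct, dc = discrimination_capacity(profiles)
print(n_distinct, f"{dc:.4f}")  # 26 0.9630
```

With 26 distinct haplotypes among 27 samples, the ratio 26/27 reproduces the stated value of 0.9630.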

Using our database of 37,754 Y-SNP/STR records [7], we obtained seven inferred haplogroups with the corresponding highest probabilities (Table S2 online). Considering the phylogenetic tree of each haplogroup and the results of the core-set panel testing, we chose specific mini-panels of two to six informative markers for further genotyping of each sample (Table S2 online). Eventually, 29 of the 30 genotyped Y-SNPs were observed in the derived state, defining 14 haplogroups in our samples, which belong to major clades C, D, N, and O. By typing Y-chromosome DNA in this hierarchical fashion, we ultimately obtained 26 Y-STR haplotypes and 14 Y-SNP haplogroups. The frequency distribution of the Y-chromosome SNP haplogroups detected in this study is shown in Fig. 1b.
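The hierarchical typing step can be illustrated with a small lookup. In this sketch the marker lists are copied from the Fig. 1 caption, while the function name `panel_for` and the rule of choosing a panel by the first letter of the predicted haplogroup are assumptions made for illustration. How the two to six markers of each mini-panel were chosen is not described here, so the sketch returns the whole clade panel.

```python
# Marker panels as listed in the Fig. 1 caption.
CORE_SET = ["M145", "P143", "L15", "M214", "M45"]
PANELS = {
    "D": ["M174", "M15", "P99", "P47"],
    "C": ["M130", "M217", "F2613", "F1396", "M48"],
    "O": ["M175", "M324", "KL1", "002611", "P201", "M7", "P164",
          "M134", "M117", "M119", "M110", "M268", "PK4", "M95"],
    "N": ["M231", "LLY22g"],
}

def panel_for(predicted_haplogroup):
    """Return the clade panel matching a predicted haplogroup label.

    Hypothetical helper: the major clade is read from the first letter of
    the label; an empty list is returned for clades without a panel.
    """
    return PANELS.get(predicted_haplogroup[:1].upper(), [])

# Total markers across the core set and the four clade panels.
total_markers = len(CORE_SET) + sum(len(markers) for markers in PANELS.values())
print(total_markers)        # 30
print(panel_for("N1"))      # ['M231', 'LLY22g']
```

The five panels together contain 30 markers, which matches the 30 genotyped Y-SNPs mentioned in the text.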

To obtain clues to individual origins, we performed a haplotype-sharing analysis on Y-STR haplotypes belonging to the same haplogroups (Table S3 online). Although we could not infer the likely native places of five of the CEF samples because their closest Y-STR haplotypes were patchily distributed, we found that six individuals most likely came from northwestern China, ten from southwestern China, five from southern China, and one from eastern China. Fortunately, we acquired an incomplete CEF listing maintained by a private website (http://www.yuanzhengjun.cn/zhengji/zhengji.html), which comprises about 600 soldiers. Classified by province, the soldiers from Sichuan (18.5%), Hunan (17%), Henan (7.2%), Zhejiang (5.8%), Chongqing (5.7%), Guangdong (5.3%), Hubei (5.2%), Guangxi (5%), Yunnan (4.5%), and Anhui (3.7%) ranked in the top ten; grouped by geographic region, those from southwestern (31.7%), central (29.3%), eastern (19.8%), and southern (10.5%) China together accounted for 91.3%. These statistics are consistent with the results of our haplotype-sharing analysis, except that our analysis probably overestimated the proportion of soldiers from northwestern China. It is noteworthy, however, that the closest haplotypes of some CEF samples occurred in both northwestern and southwestern populations, suggesting substantial later migration or gene flow between northwestern and southwestern China. One proposed migration route for Tibetan ancestors is the so-called Tibeto-Burman corridor, along which Tibetans are thought to have first migrated from Qinghai to the Tibetan Plateau and subsequently spread throughout the surrounding area [8]. We therefore could not always distinguish whether a sample originated from the northwest or the southwest, as such migrations could have blurred the genetic backgrounds of these two regions.
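The idea behind the haplotype-sharing analysis can be sketched as a nearest-haplotype search. The distance metric is not specified in the text, so the sketch assumes a simple stepwise distance, that is, the sum of absolute repeat-number differences over the loci typed in both profiles. The allele values, sample identifiers, and region labels are invented for illustration, and `step_distance` and `closest_matches` are hypothetical helper names.

```python
def step_distance(query, reference):
    """Sum of absolute repeat differences over loci present in both profiles.

    Returns (distance, number of shared loci).
    """
    shared = [locus for locus in query if locus in reference]
    distance = sum(abs(query[locus] - reference[locus]) for locus in shared)
    return distance, len(shared)

def closest_matches(query, references, min_shared=3):
    """Return (best distance, references at that distance).

    References sharing fewer than min_shared loci with the query are skipped.
    """
    scored = []
    for ref in references:
        distance, n_shared = step_distance(query, ref["alleles"])
        if n_shared >= min_shared:
            scored.append((distance, ref))
    if not scored:
        return None, []
    best = min(distance for distance, _ in scored)
    return best, [ref for distance, ref in scored if distance == best]

# Illustrative partial profile (DYS392 failed to amplify) and toy references.
query = {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS393": 12}
references = [
    {"id": "R1", "region": "southwestern",
     "alleles": {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 12}},
    {"id": "R2", "region": "northwestern",
     "alleles": {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 14, "DYS393": 13}},
    {"id": "R3", "region": "northwestern",
     "alleles": {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS392": 14, "DYS393": 12}},
]

best, matches = closest_matches(query, references)
regions = sorted({ref["region"] for ref in matches})
print(best, regions)  # 0 ['northwestern', 'southwestern']
```

In this toy case the closest haplotypes fall in both northwestern and southwestern populations, which mirrors the ambiguity described above: a sample whose nearest matches span two regions cannot be assigned to either with confidence.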

It is important to note that standard forensic identification requires a reference sample database containing living relatives. In this study, all 27 profiles remain unidentified pending suitable reference samples. A wider media campaign may therefore help prompt living relatives to come forward, donate samples, and provide additional information. Lacking a relevant database, we carried out a combined Y-SNP and Y-STR haplotype-sharing analysis to infer the CEF soldiers' possible origins and possible living patrilineal relatives. We hope that these inferences will aid in finding potential living relatives and, in turn, allow us to build a reference sample database. In the near future, our lab will create a dedicated website containing the detailed DNA typing results of the revolutionary martyrs' remains. We hope this website will help ensure that uncovered skeletal remains can be assigned to a deceased martyr, and will offer hope to families that their loved ones will be given a dignified resting place.