Introduction

Deep learning (DL) algorithms are a subdomain of artificial intelligence (AI) that use a highly generalizable approach to recognize and interpret images,1 enabling efficient identification of material properties.2 AI has been applied to two-dimensional (2D) materials to analyze their optical,2,3 physical,4 and electronic properties.5,6,7 Electronic properties, such as bandgaps and electron affinities, have been predicted using machine learning (ML) and DL models based on the structure–property relationship. Segmentation,8,9 thickness identification, and point defects10,11,12,13 have been analyzed based on DL modeling of the crystal structures and bandgaps of materials. A three-dimensional (3D) convolutional neural network called DL-enabled atomic layer mapping (DALM)14 has been used to identify and segment MoS2 flakes with mono-, bi-, tri-, and multilayers. An encoder-decoder semantic segmentation network15 has been configured for pixel-wise identification of optical images of 2D materials along with graphical features, such as contrast, color, edges, shapes, flake sizes, and their distributions. Similarly, a DL-based atomic defect detection framework (DL-ADD)13 has been demonstrated to efficiently detect atomic defects in MoS2 and generalize to defect detection in other transition-metal dichalcogenide (TMD) materials. The three DL architectures DenseNet,16 U-Net,17 and Mask-region convolutional neural network (RCNN)18 have been studied to classify, segment, and detect microscopic images of 2D materials for automated atomic layer mapping,3 although they demand many data points to train the networks for characterizing optical images. An ML-based solution19 has been modeled to map simulation results from indentation pillar-splitting experiments and predict the critical indentation load of fracture instability using Gaussian process regression. Notably, image-to-image translation20,21,22,23 using conditional generative adversarial networks (cGANs) has been studied for translating optically sectioned structured illumination microscopy (SIM) images, semantic segmentation,24,25 and image processing.26 A game theory-based cGAN26 has also been demonstrated to predict physical fields, such as stress or strain, from the material microstructure geometry. While cGANs work well with limited data to capture complex information from pixels, the application of pix2pix to characterizing TMDs remains unexplored to date.

Here, we demonstrate a DL-based image-to-image translation approach with cGANs, trained on labeled optical images to enable intelligent characterization of mechanically exfoliated and CVD-grown TMDs. Unlike other AI-based research on TMDs, this method requires only limited data to train and evaluate the model. To ensure that our DL model effectively learns the complex variations of pixels and accurately maps them to TMD thicknesses, we utilize experimental data obtained from Raman and PL spectroscopy. These data associate layer information with individual pixels and assign specific colors to represent different layers. We preprocess the data for training and train a pix2pix model to generate labeled images from optical images, thereby identifying the number of layers in TMDs. To assess the performance of the model, we conduct quantitative measurements using structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and mean squared error (MSE) scores. We further investigate the generalization ability of the model by training it on MoS2 and WS2 samples and successfully testing it on WSe2 samples, demonstrating its capability to adapt to different materials. Finally, we apply the model to characterize heterostructures, highlighting its ability to analyze complex material structures.

Results and discussion

Synthesis and characterizations of TMDs

Figure 1 illustrates the workflow of multimodal analysis of TMDs using DL-based cGANs. TMDs were transferred (via mechanical exfoliation) or synthesized (via low-pressure CVD [LPCVD]) on 300-nm SiO2/Si substrates (see the “Materials and Methods” section) with varying numbers of layers. We characterized the samples using Raman spectroscopy, photoluminescence (PL) spectroscopy, and atomic force microscopy (AFM) to verify the growth of the materials and determine the number of layers. Figure 1c–d shows, as an example, the Raman and PL spectra of three- and four-layer as well as bulk (more than four layers) mechanically exfoliated MoS2. All spectra were taken with 532-nm excitation. Figure 1c shows the E12g (in-plane vibration of Mo and S atoms) and A1g (out-of-plane vibration of S atoms) phonon modes of mechanically exfoliated MoS2. The phonon modes of three-layer (3L) MoS2 are located at 382.02 cm−1 (E12g) and 405.54 cm−1 (A1g).

Figure 1

Process of multimodal analysis of transition-metal dichalcogenides. (a) Optical image of mechanically exfoliated MoS2. (b) Labeled image of MoS2. (c) Raman spectra of three and four layers, and bulk MoS2. (d) PL spectra of three and four layers, and bulk MoS2. (e) Workflow of image-to-image translation using conditional generative adversarial networks.

Similarly, the phonon modes of bulk MoS2 (more than four layers) are located at E12g = 381.62 cm−1 and A1g = 407.09 cm−1. The out-of-plane A1g peak blue-shifts from 405.54 cm−1 to 407.09 cm−1 in going from three layers to the bulk, consistent with the increased number of layers.27 Accordingly, the Raman shift between the in-plane and out-of-plane peaks increases from ∼23.52 cm−1 for three layers (3L) to ∼25.47 cm−1 for the bulk (more than four layers).27 Figure 1d presents the PL spectra of mechanically exfoliated MoS2. As observed by others, the PL intensity of the three-layer sample was much higher than that of the other two samples (four layers or thicker).27,28 Figure 1e shows the architecture of the cGAN model used to characterize TMDs. The labeled-images section shows images labeled according to the number of layers identified from the Raman and PL spectra. A cGAN comprises a generator and a discriminator. The generator takes optical images as input and generates images, which are subsequently fed to the discriminator along with the labeled images. The discriminator then compares both images and returns its output to the generator.

Data preprocessing

Multiple preprocessing steps were applied to the collected optical images to improve the image quality before they were fed to the model. This procedure addressed the potential deterioration of images captured by an optical microscope, including uneven lighting and gradual degradation of the camera sensor. First, we applied median filtering,29 a denoising technique, to smooth the optical images and generate denoised images, followed by Gaussian filtering30 to smooth the images further. The Gaussian average of the neighboring pixels of each pixel is calculated by

$$\text{Gaussian}\left( x,y \right) = \frac{1}{2\pi \sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$
(1)

where x and y refer to the pixel location in the image and σ is the standard deviation of the Gaussian kernel. This process generated blurred images and removed high-frequency noise from the images. Figure 2a, d shows the denoised images of CVD-grown WS2 and exfoliated MoS2 after applying the median and Gaussian filtering operations. We produced sharpened images by blending the denoised images with a positive weight of 1.5 and the blurred images with a negative weight of 0.5. This process enhanced the edges and other details in the images. We then normalized the pixel values of the sharpened images to the range [0, 255], calculated per pixel by

$$pixel\,Normalized\, = \,\frac{pixel\,Val - pixel\,Min}{{pixel\,Max - pixel\,Min}},$$
(2)

where pixel Val is the actual value of the pixel, and pixel Min and pixel Max are the minimum and maximum values of all pixels in the images, respectively. Figure 2b, e shows the final normalized images after applying median filtering, Gaussian filtering, sharpening and normalization processes. This process ensures that the data are within a consistent scale for a faster convergence during training, leading to less training time to improve the efficacy of the model’s generalization.
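For reference, this preprocessing pipeline can be sketched compactly with OpenCV. This is a minimal sketch, assuming illustrative kernel sizes, Gaussian σ, and file names (none of which are reported above); the blending weights of 1.5 and −0.5 follow the values stated in the text.

```python
# Minimal preprocessing sketch (assumed parameters: kernel sizes, sigma, and
# file names are illustrative; blending weights follow the text).
import cv2
import numpy as np

def preprocess(optical_bgr: np.ndarray) -> np.ndarray:
    # Median filter to remove salt-and-pepper noise from the optical image.
    denoised = cv2.medianBlur(optical_bgr, 5)
    # Gaussian filter (Equation 1) to remove remaining high-frequency noise.
    blurred = cv2.GaussianBlur(denoised, (5, 5), 2.0)
    # Unsharp masking: blend denoised (+1.5) and blurred (-0.5) images.
    sharpened = cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)
    # Normalize pixel values to [0, 255] (Equation 2, rescaled to 8 bits).
    pix = sharpened.astype(np.float32)
    normalized = (pix - pix.min()) / (pix.max() - pix.min() + 1e-8) * 255.0
    return normalized.astype(np.uint8)

image = cv2.imread("mos2_optical.png")  # hypothetical file name
cv2.imwrite("mos2_preprocessed.png", preprocess(image))
```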

Figure 2

Optical image preprocessing steps of chemical vapor deposition (CVD)-grown WS2 and mechanically exfoliated MoS2. (a, d) Denoised images of CVD-grown WS2 and exfoliated MoS2, respectively. (b, e) Normalized images of CVD-grown WS2 and exfoliated MoS2, respectively. (c, f, i) HSV (H-hue, S-saturation, V-value) color space of the optical images in (a), (d), and (g). (g) Optical image with a reference line across pixels containing substrate, monolayers, and bilayers. (h) Detection of flakes. (j) Color profile along the reference line shown in (g). (k) Area distribution of the optical image (top right).

Color-based segmentation for detecting TMDs

We further performed color-based segmentation to verify the presence of TMDs in the optical images by converting them from RGB (R-red, G-green, B-blue) to HSV31 (H-hue, S-saturation, V-value) color space, separating each image into three components—hue, saturation, and value. We then created a mask based on the hue component of the HSV color space, retaining the color information of the original images for pixels that meet the hue criteria while setting all other pixels to zero. Figure 2c, f, and i shows the HSV color space of the optical images, with the scale bars displaying the hue component of the images. Figure 2h shows the detected flakes bounded by rectangles, and Figure 2g shows a reference line crossing pixels containing substrate, monolayers, and bilayers. We generated color profiles of the red, green, and blue channels along the reference line. Figure 2j shows these color profiles, where the deviation of the red channel within the small circle indicates the presence of a bilayer.
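A minimal sketch of this segmentation step with OpenCV is shown below; the hue and value thresholds and the file name are assumptions (they depend on the substrate and illumination), and only the masking logic follows the description above.

```python
# Color-based segmentation sketch; hue/value thresholds are placeholders.
import cv2
import numpy as np

def flake_mask(optical_bgr: np.ndarray,
               hue_lo: int = 90, hue_hi: int = 150, val_lo: int = 40) -> np.ndarray:
    hsv = cv2.cvtColor(optical_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Keep pixels whose hue falls inside the flake range and that are bright
    # enough; all other pixels are set to zero.
    mask = ((h >= hue_lo) & (h <= hue_hi) & (v >= val_lo)).astype(np.uint8)
    return cv2.bitwise_and(optical_bgr, optical_bgr, mask=mask)

segmented = flake_mask(cv2.imread("ws2_optical.png"))  # hypothetical file name
```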

Data labeling

For layer identification in TMDs, the collected optical images were manually annotated using Labelbox,32 a web-based labeling tool for annotating data with a well-defined ontology. It provides a set of built-in web services that can be used to automate the process on batches of data. Five classes were used to label each image pixel as mono-, bi-, tri-, or four layers, or bulk (more than four layers), and each class was assigned one specific color. Pixels belonging to monolayers, bilayers, three layers, four layers, and more than four layers were colored blue, green, red, cyan, and light gray, respectively. Figure 3a shows a set of labeled images. Figure 3b–c shows an optical image and the corresponding labeled image of mechanically exfoliated MoS2. Figure 3d shows the mask images for each class and the legend used for coloring the image pixels. Figure 3e–f depicts 3D plots of pixel intensities to visualize the pixels before and after labeling. Once labeled, each image was paired with its respective labeled image to be fed into the model (Figure 4).
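The class-to-color convention can be expressed as a small lookup table that converts a per-pixel class map into a labeled image. The sketch below is illustrative: the integer class indices and the background color are assumptions, while the layer colors follow the convention stated above.

```python
# Class-to-color lookup; indices and background color are assumptions.
import numpy as np

LAYER_COLORS = {
    0: (0, 0, 0),        # background/substrate (assumed black)
    1: (0, 0, 255),      # monolayer -> blue
    2: (0, 255, 0),      # bilayer -> green
    3: (255, 0, 0),      # trilayer -> red
    4: (0, 255, 255),    # four layers -> cyan
    5: (211, 211, 211),  # bulk (more than four layers) -> light gray
}

def class_map_to_rgb(class_map: np.ndarray) -> np.ndarray:
    """Convert a 2D per-pixel class-index map into an RGB labeled image."""
    rgb = np.zeros((*class_map.shape, 3), dtype=np.uint8)
    for idx, color in LAYER_COLORS.items():
        rgb[class_map == idx] = color
    return rgb
```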

Figure 3

Optical image labeling. (a) Grid of labeled optical images. (b) Optical image of MoS2. (c) Manually labeled image of MoS2. (d) Mask images of each layer in the image (c). (e) Scatterplot of pixels of the optical image in red-green-blue color space shown in the image (c). (f) Scatterplot of pixels of the labeled image.

Figure 4

Model architecture of the conditional generative adversarial network. The architecture includes a generator and a discriminator in an adversarial training framework trained on an NVIDIA GeForce GTX 1080 graphics processing unit (GPU). The generator (based on the U-Net architecture) transforms input images into labeled images with a resolution of 780 × 588 pixels. The discriminator (based on the PatchGAN architecture) distinguishes between actual labeled images and generated images and provides an output of 0 (fake) or 1 (real) that is subsequently backpropagated to the generator.

Model architecture and training

The pix2pix model, a cGAN designed explicitly for image-to-image translation, comprises two models: a generator and a discriminator. The generator takes an optical image as input and transforms it into another image, which, along with the corresponding labeled image, is fed into the discriminator model, which compares the similarity between the two images. The generator is an encoder-decoder model based on the U-Net17 architecture. The encoder encodes the input image and extracts its features, while the decoder maps the extracted features back to an image of the original size. The discriminator is based on the PatchGAN23 architecture, which provides a binary output of 0 or 1 to indicate whether the generated image is fake or real. PatchGAN discriminates local image patches rather than the entire image. This approach allows for finer-grained analysis of image details and provides more precise feedback to the generator. In image-to-image translation tasks like ours, where optical features, including contrast variations, thickness variations, and colors, are crucial, PatchGAN's localized discrimination helps capture intricate features accurately. Operating on image patches also enables the discriminator to handle high-resolution outputs. In our case, where the goal is to generate detailed annotations for optical images of TMDs, the ability to produce high-resolution output maps is essential for preserving image quality and capturing optical features. The generator and discriminator are stacked together and updated in alternation during training. The generator is updated to minimize a loss that combines the adversarial loss from the discriminator with an L1 term between the generated image and the labeled image, calculated by

$$\begin{gathered} G_{loss} = \text{Adversarial loss} + \left( \lambda \times L1\ \text{loss} \right) \\ G_{loss} = \frac{1}{m}\sum\limits_{i = 1}^{m} \log \left( 1 - D\left( G\left( x_{i} \right) \right) \right) + \left( \lambda \times MAE\left( G\left( x_{i} \right), y_{i} \right) \right), \end{gathered}$$
(3)

where λ is a hyperparameter, the L1 loss is the mean absolute error (MAE) between the generated and labeled images, y is the labeled image, G(x) represents the generated image, D(·) is the discriminator output, Gloss represents the generator loss, m is the number of training samples, and the adversarial loss is a sigmoid cross-entropy loss. Similarly, the discriminator loss is calculated by

$$D_{loss} = \frac{1}{m}\sum\limits_{i = 1}^{m} \log \left( D\left( y_{i} \right) \right) + \log \left( 1 - D\left( G\left( x_{i} \right) \right) \right),$$
(4)

where Dloss is the discriminator loss, y is the labeled image, and G(x) represents the generated image. Overall, the conditional generative adversarial network simultaneously maximizes the discriminator loss and minimizes the generator loss to generate the required result. The combined loss function is given by

$$Combined\;loss_{G,D} = E_{x,y}\left[ \log D\left( x,y \right) \right] + E_{x}\left[ \log \left( 1 - D\left( x,G\left( x \right) \right) \right) \right],$$
(5)

where Ex,y denotes the expectation over pairs of input images and their corresponding labeled images, D(x,y) is the discriminator output for such a pair, Ex denotes the expectation over input images, and D(x,G(x)) is the discriminator output for a pair consisting of an input image and the generator output. The final loss function of the network is calculated by

$$G^{*}, D^{*} = \arg \mathop {\min }\limits_{G} \mathop {\max }\limits_{D} Combined\,loss_{G,D} + \left( \lambda_{2} \times MSE\left( G \right) \right),$$
(6)

where MSE(G) is the mean squared error loss of the generator, λ2 is a hyperparameter, and the min and max operators represent simultaneously minimizing the objective with respect to the generator and maximizing it with respect to the discriminator.
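In a TensorFlow implementation, the objectives in Equations 3–5 correspond to the standard pix2pix losses. The sketch below is not the authors' code; it assumes sigmoid cross-entropy on logits and an illustrative λ = 100 (the exact value of λ is not reported here).

```python
# Standard pix2pix losses (sketch); the lambda value is an assumption.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_fake_output, generated, labeled, lam=100.0):
    # Adversarial term: push the discriminator to rate generated images as real.
    adversarial = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # L1 term: mean absolute error between generated and labeled images.
    l1 = tf.reduce_mean(tf.abs(labeled - generated))
    return adversarial + lam * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    # Real labeled images should be classified as real (1),
    # generated images as fake (0).
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real_loss + fake_loss
```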

Prediction and performance evaluation

A set of 100 preprocessed optical images and their corresponding manually labeled images was fed to the model for training. Before feeding the images to the model, we resized them to 512 × 512 pixels to ensure compatibility with the model architecture. Additionally, we employed augmentation techniques, such as flipping and rotation, to further enhance the training data set's diversity and robustness. We chose the adaptive moment estimation (Adam)33 optimizer, an efficient stochastic optimization method that requires only first-order gradients with minimal memory requirements. We set a learning rate of 0.0002, a first momentum term (β1) of 0.5, and a second momentum term (β2) of 0.999 for the Adam optimizer. We initially set the total number of training iterations (epochs) to 250 with a batch size of 2 and stopped training at 200 epochs after observing a negligible difference between the manually labeled image and the final predicted image. Figure 5a shows the input image, the manually labeled image as ground truth, and the generated image at different epochs. At epoch 10, the generated image differs noticeably from the ground truth and is of poor quality.
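A sketch of the corresponding training configuration is given below, using the input size, batch size, epoch count, and Adam hyperparameters reported above; the specific augmentation operations (random horizontal flip and 90° rotation) are simple illustrative choices for "flipping and rotation."

```python
# Training configuration sketch; augmentation choices are illustrative.
import tensorflow as tf

IMG_SIZE, BATCH_SIZE, EPOCHS = 512, 2, 200

def augment(optical, labeled):
    # Resize both images to the model input resolution.
    optical = tf.image.resize(optical, [IMG_SIZE, IMG_SIZE])
    labeled = tf.image.resize(labeled, [IMG_SIZE, IMG_SIZE])
    # Random horizontal flip applied identically to both images.
    if tf.random.uniform(()) > 0.5:
        optical = tf.image.flip_left_right(optical)
        labeled = tf.image.flip_left_right(labeled)
    # Random 90-degree rotation applied identically to both images.
    k = tf.random.uniform((), minval=0, maxval=4, dtype=tf.int32)
    optical, labeled = tf.image.rot90(optical, k), tf.image.rot90(labeled, k)
    return optical, labeled

# Adam optimizers with the reported learning rate and momentum terms.
gen_opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
disc_opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
```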

Figure 5

Model training, loss, and accuracy. (a) Training examples showcasing input images, corresponding ground truth, and model-generated output images. (b) Loss curves depicting the training progression of the generator and discriminator in a pix2pix model. (c–e) Evaluation metrics to compare the generated result with the ground truth using structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and mean squared error (MSE), respectively.

Similarly, at epoch 100, the generated image looks similar to the ground truth, but there are still some overlaps between the predicted annotation layers. Training was stopped at epoch 200 because the difference between the manually labeled and generated images had become negligible. Figure 5b depicts the loss curves during training, where the top curve illustrates the generator GAN loss, which is the adversarial loss of the generator. The generator aims to minimize this loss by generating images indistinguishable from the real labeled images. The second curve shows the feature matching loss, which indicates the stability and quality of the generated images. The bottom two curves represent the discriminator's fake loss and real loss, which measure how well the discriminator classifies generated (fake) images as fake and real labeled images as real.

To quantitatively assess the quality of the generated images, we computed SSIM,34 PSNR,35 and MSE36 scores for each pair of labeled and generated images during training. Figure 5c–e illustrates the plots of these scores for each training iteration. The SSIM compares the structural information in the labeled and generated images by considering luminance, contrast, and structure. It produces a value between −1 and 1, where 1 indicates a perfect match and 0 indicates no similarity. The PSNR compares the noise level and image distortion between images; a higher value indicates better image quality. The MSE calculates the mean squared difference between the labeled and generated images, where a value near 0 indicates a perfect match and a higher value indicates dissimilarity between the images. Table S1 in the Supporting Information compiles exemplary ML-based studies8,14,15,23,37,38 and a numerical comparison of the methods utilized.
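These three metrics can be computed with scikit-image. The sketch below assumes uint8 image arrays of equal shape and scikit-image ≥ 0.19 (for the channel_axis argument); variable names are illustrative.

```python
# Evaluation-metric sketch using scikit-image (assumed uint8 RGB arrays).
import numpy as np
from skimage.metrics import (structural_similarity,
                             peak_signal_noise_ratio,
                             mean_squared_error)

def evaluate_pair(labeled: np.ndarray, generated: np.ndarray) -> dict:
    return {
        "SSIM": structural_similarity(labeled, generated, channel_axis=-1),
        "PSNR": peak_signal_noise_ratio(labeled, generated),
        "MSE": mean_squared_error(labeled, generated),
    }
```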

Additionally, to find the minimum amount of data needed for training, we trained and evaluated the cGAN model using different data set sizes, including 50, 75, and 100 optical images. Figure 6 shows the impact of data set size on the performance of the model. Figure 6a–c displays the discriminator loss curves of the model trained on 50, 75, and 100 optical images, respectively. Notably, when trained with 100 images, there is a clearer overlap between the discriminator’s fake and real loss curves compared to the other data set sizes. This overlap suggests that the generator has effectively learned to produce images that closely resemble real ones, making them challenging for the discriminator to distinguish.

Figure 6

Impact of data set size on model performance. (a–c) Loss curves depicting the training progression of the discriminator in a pix2pix model trained with 50, 75, and 100 optical images, respectively. (d–f) Evaluation metrics to compare the generated result with the ground truth in a model trained with 50, 75, and 100 optical images using the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and mean squared error (MSE), respectively.

Figure 6d–f shows the computed PSNR, SSIM, and MSE scores for each pair of labeled and generated images during training, corresponding to models trained on 50, 75, and 100 optical images. For the SSIM metric, denoted in red, the model trained with 100 images exhibits the highest values, approaching 1. Similarly, the PSNR metric demonstrates its peak value for the model trained with 100 images. Regarding the MSE metric, its value should decrease during training, ideally approaching 0. This trend is observed in the model trained with 100 images, where the MSE values are closer to 0, indicating superior performance in image translation compared to models trained on fewer images.

Model generalization

We further investigated the model's generalization by training it on MoS2 and WS2 and testing it on WSe2. The model successfully identified the layers of WSe2, demonstrating its capability to accurately determine the number of layers across different materials. To evaluate the model's performance, we tested it on multiple test images of CVD-grown and mechanically exfoliated samples of MoS2, WS2, and WSe2. Figure 7a displays input images and their corresponding predicted images from the model. The generated images are labeled by the model with different colors, each indicating a specific number of layers. Figure 7b shows the ability of the model to analyze heterostructures, where the white dotted triangles in the input image indicate the stacking of WS2 on top of MoS2, which our model classified as bilayers. The bottom part of Figure 7c displays the Raman spectra, where the phonon modes of MoS2 are located at E12g = 380.35 cm−1 and A1g = 399.98 cm−1, and the phonon modes of WS2 are observed at E12g = 349.89 cm−1 and A1g = 414.22 cm−1. As part of the training data set, Raman and PL data were utilized to identify the number of layers in different regions of the optical images. The Raman data confirm that the trained model can identify the number of layers in the heterostructures.

Figure 7

Model-generated results for mono-, bi-, tri-, four-layer, and bulk flakes, as well as heterostructures (WS2/MoS2). (a) Generated output for three different examples from top to bottom. (b) Generated output for heterostructures (WS2/MoS2) along with Raman spectra. (c) WS2/MoS2 data show four peaks, indicating a bilayer (a monolayer of WS2 stacked on a monolayer of MoS2). Similarly, the MoS2 and WS2 data show the presence of MoS2 and WS2 monolayers, respectively.

Unlike other AI-based research on TMDs, the DL-based image-to-image translation method introduced here does not require a large amount of data to train and evaluate the model. As shown in Figure 7, despite being trained with limited data, our model can identify the number of layers in various TMD types, demonstrated with MoS2, WS2, and WSe2. Moreover, our method works on heterostructures, demonstrating the generalizability of the model. This approach can be further extended to characterize other TMDs, illustrating the adaptability and scalability of the model, as demonstrated in Figure 7. The adversarial relationship between the generator and discriminator helps achieve better results in multimodal transformation tasks and enables the model to extract information from complicated data distributions.

Figure 8 illustrates the comparison between model-generated results and manually labeled images. We utilized various CVD-grown and mechanically exfoliated samples as test samples and produced the results accordingly. In Figure 8, the first column displays the input optical image, the second column depicts the ground truth, which is the manually labeled image, the third column exhibits the model-generated image, and the fourth column showcases the absolute error, representing the difference between the images in columns 2 and 3.

Figure 8

Model prediction results and comparison with ground truth (manually labeled optical image) for mono-, bi-, tri-, four-, and bulk flakes. The absolute error, representing the difference between manually labeled (ground truth) images and model-generated images, is negligible.

Conclusion

We have demonstrated a DL-based pix2pix cGAN network to identify and characterize TMDs with different layer numbers, sizes, and shapes. The network was trained using a small set of labeled optical images, translating optical images of TMDs into labeled images that map each layer to a specific color and give a visual representation of the number of layers. As part of the data preprocessing, multiple segmentation techniques were implemented to extract graphical features from the optical images, including contrast, color, shapes, flake sizes, and their distributions. Furthermore, the trained model was adapted to characterize 2D materials not initially included in the data set. The performance of the model was assessed by multiple metrics, including SSIM, PSNR, and MSE scores. In contrast to deep convolutional neural networks, our model overcomes the lack of generalization typically encountered when training with smaller data sets. Our model is based solely on optical images, capturing complex pixel variations, categorizing layers into five classes, and demonstrating adaptability across a diverse range of materials.

Materials and methods

Sample preparation

TMDs were synthesized via LPCVD. Prior to the growth of MoS2, a thin MoO3 layer was prepared by physical vapor deposition of MoO3 onto a Si substrate with 300-nm-thick thermal oxide. Another SiO2/Si substrate was placed face-to-face in contact with the MoO3-deposited substrate, and MoS2 was grown onto this SiO2/Si substrate. For the growth, the furnace was heated at a ramping rate of 18°C min−1 and held for 15 min at 850°C. During the heating procedure, argon gas (30 sccm) was supplied at 300°C, and hydrogen gas (15 sccm) was delivered at 760°C. Sulfur was supplied when the furnace temperature reached 790°C. After the growth, MoS2 monolayers a few millimeters in size were obtained. Similarly, for the growth of WS2, we used WO3 instead of MoO3. As the furnace was ramped at 15°C/min, the reaction proceeded via reduction of WO3 by hydrogen and subsequent sulfurization of the WO3. The growth temperature was 900°C. Ar gas was introduced from 150°C to reduce moisture and ambient gas, and H2 gas was supplied from 650°C (increasing temperature) to 700°C (decreasing temperature). Before the growth of WSe2, the SiO2 substrate was dipped in a 10% KOH solution for 3 min to increase the surface energy, followed by a deionized water treatment. The growth temperature was set at 850°C. Ar gas was introduced at 50°C, and H2 gas was supplied at 650°C. Selenium was supplied when the furnace temperature reached 590°C. After the growth, WSe2 flakes a few millimeters in size were obtained. MoS2 and WSe2 crystals were also mechanically exfoliated onto SiO2/Si substrates using adhesive tape. WS2/MoS2 heterostructures were fabricated by transferring WS2 onto CVD-grown MoS2: as-grown WS2 flakes on a Si/SiO2 substrate were coated with a thin layer of PMMA 950 A4 using a dropper and then left in air at room temperature (RT) for 2 h to drive off the solvent. The chips were floated in 30% KOH (aq); after 10–40 min, the Si chip fell to the bottom, leaving the PMMA + WS2 square floating on the surface. The PMMA was cleaned in filtered DI water and blow-dried with RT air. The PMMA + WS2 was then transferred onto CVD-grown MoS2 by placing the WS2 side down, and the PMMA was removed with warm acetone at 60°C for 30 min. The samples were rinsed with warm acetone and then annealed in an ultrahigh-vacuum chamber on top of a button heater (Heatwave Laboratories) for 4 h at 350°C to fully remove any residual polymer contamination.

Data acquisition

We utilized an optical microscope and PL and Raman spectroscopy to characterize the layers of the deposited TMDs. Specific areas consisting of crystals with different layer numbers were captured using the optical microscope. Raman and PL spectra were obtained with a 532-nm excitation laser at a laser power of <500 μW to avoid damage to the samples. A spectral grating with 1800 lines/mm was used for both measurements. We chose a 100× objective lens to capture the images.14

Data processing

The mask of the optical image is calculated by

$$mask\, = \,uppermask\, \times \,lowermask\, \times \,valuemask,$$
(7)

where the values of the uppermask and lowermask depend on the hue component, and the valuemask depends on the value component of the HSV images. The resulting mask was converted to gray scale to create binary images: pixels with values greater than 0 were set to true, while all other pixels were set to false. Each connected component was then assigned a unique label, and the resulting labeled image was displayed using a colormap to visualize the different flakes present in the image.
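A minimal sketch of this masking and flake-labeling step is shown below. The hue and value thresholds are placeholders; only the multiplication of the three masks (Equation 7) and the connected-component labeling follow the description above.

```python
# Masking (Equation 7) and connected-component labeling sketch;
# hue/value thresholds are placeholders.
import cv2
import numpy as np
from scipy import ndimage

def label_flakes(optical_bgr: np.ndarray,
                 hue_lo: int = 90, hue_hi: int = 150, val_lo: int = 40):
    hsv = cv2.cvtColor(optical_bgr, cv2.COLOR_BGR2HSV)
    h, v = hsv[..., 0], hsv[..., 2]
    upper_mask = (h <= hue_hi).astype(np.uint8)
    lower_mask = (h >= hue_lo).astype(np.uint8)
    value_mask = (v >= val_lo).astype(np.uint8)
    mask = upper_mask * lower_mask * value_mask      # Equation 7
    binary = mask > 0                                # gray scale -> binary
    labeled, num_flakes = ndimage.label(binary)      # unique label per flake
    return labeled, num_flakes
```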

Model setup and training

The pix2pix model was implemented using the TensorFlow open-source DL package. In total, 35 optical images of WS2 and 65 images of MoS2 were used for the training data set. A set of optical images and their corresponding manually labeled images was used to train the model. The training was performed on a system with an NVIDIA GeForce GTX 1080 graphics card with CUDA version 10.1. It took 3 h to train the model for 200 epochs on 100 image data pairs, and the training was stopped once the difference between the actual labeled image and the generated image became negligible. Based on this observation, we chose 200 epochs as the optimal training duration.
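For completeness, a minimal pix2pix training step in TensorFlow is sketched below. It assumes that generator, discriminator, the loss functions, and the Adam optimizers are defined as in the earlier sketches; it illustrates the alternating generator/discriminator updates rather than reproducing the exact implementation used here.

```python
# Alternating pix2pix update sketch; `generator`, `discriminator`,
# `generator_loss`, `discriminator_loss`, `gen_opt`, and `disc_opt` are
# assumed to be defined as in the earlier sketches.
import tensorflow as tf

@tf.function
def train_step(optical, labeled):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        generated = generator(optical, training=True)
        disc_real = discriminator([optical, labeled], training=True)
        disc_fake = discriminator([optical, generated], training=True)
        g_loss = generator_loss(disc_fake, generated, labeled)
        d_loss = discriminator_loss(disc_real, disc_fake)
    # Backpropagate each loss to its own network and apply the Adam updates.
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    gen_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    disc_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    return g_loss, d_loss
```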