1 Introduction

In her overview of modern visual communication, Johanna Drucker notes that systems for representing three-dimensional forms evolve alongside broader developments in technological production [1]. In the nineteenth and early twentieth centuries, as production shifted from small workshops to mechanized assembly lines, theories of drawing and design shifted their reference from the human body to abstract geometric reductions, and artists such as Paul Klee and Wassily Kandinsky became increasingly focused on geometric idealizations of 3D form. Our interdisciplinary experiment with voxel-GANs and the 3D Tree Dataset can be contextualized similarly. As machine learning finds an increasing number of applications within visual culture, we were interested in observing how such systems might influence our cultural conception of 3D objects.

When we commenced this research in early 2020, many artistic experiments used machine learning and generative adversarial networks to produce images and video, from human faces [2] to anime characters [3] and various forms of image-to-image translation [4]. Following the release of publicly accessible text-to-image generators such as DALL-E in 2021 [5] and Midjourney in 2022 [6], and more recently the text-to-3D generator Point-E [7], the impact of machine learning on contemporary visuality has become even more pronounced. In this project, we were interested in observing how machine learning generates 3D objects from a 3D dataset (rather than interpolating them from 2D images), and how the visual articulation of this process might aid public understanding of this emerging form of visuality. This corresponds to what composer Jennifer Walshe [8] describes as the 'conceptual art' implementation of AI, where the creative output seeks to build a pedagogical bridge that allows audiences to understand and speculate on the nature and function of machine learning-based creative tools. When we commenced this research, we found that relatively few projects had explored how machine learning produces 3D objects, and even fewer had transitioned into creative tools for public use. Four years later, we recognize the growing importance of diffusion models; at the time, however, we found the most relevant architecture and development in the voxel-GAN space, and our implementation of the 3D Tree Dataset therefore focused on a voxel-based GAN approach. Future investigations could certainly examine the efficacy of diffusion models, but we note the repeated call for novel 3D datasets in diffusion model publications [9] and are thus encouraged to publish our dataset and implementation. This paper contributes a new 3D dataset of trees, produced specifically for machine learning training, with a focus on aesthetically interesting forms.
We describe the logic and production of the 3D Tree Dataset and the GAN configurations that we used to train and produce new trees. We also describe creative works made using this system, the public presentation and qualitative evaluation of our generated outputs, and the development of tools that allow other artists to generate trees from the different conditions and training epochs in our dataset. In Sect. 2, we give an overview of relevant research in machine learning for 3D objects. In Sect. 3, we introduce the 3D Tree Dataset, how it was generated, and its scale and diversity. In Sect. 4, we introduce our use of a voxel-GAN in conjunction with the 3D Tree Dataset, and in Sect. 5 we provide a qualitative evaluation of our results.

2 Related work

There have been several published approaches for using machine learning to produce 3D objects. Achlioptas et al. [10] describe how point clouds can be used with neural network models, where the network produces vertex position information for a new mesh. This output lacks the face and edge information required to render or 3D print the mesh, so if traditional algorithms are used to recover face and edge information, the shape and appearance of unique objects such as trees would be excessively influenced by the choice of remeshing algorithm. For example, if edge information is recovered using a connected-graph algorithm with distance metrics to determine edge weight and branch thickness, trees with very different vertex information could end up looking unintentionally similar, which would defeat the purpose of generating trees using machine learning methods and a conditional dataset comprising different types of trees. In the MeshRenderer project [11], the logic of mesh generation is to keep the face and edge information of a unit sphere but to change the vertex information to generate a new mesh shape. As the objective of MeshRenderer is single-image 3D reconstruction, its training method aims to match the rendered images of the mesh to the 2D image input. Although the 3D mesh generation component described in MeshRenderer applies to our neural network model, fixed faces and edges might make it difficult for the model to reproduce small details, such as the branches of trees. For recreating a 3D mesh from rendered 2D images, the network requires additional learning and processing to extract spatial information, which in the case of trees would result in lower-quality branch details.
More recent developments in diffusion models for 3D generation have continued apace since our research, notably DreamFusion [12], which circumvents the paucity of 3D datasets by using a 2D diffusion model to optimize a Neural Radiance Field (NeRF). Despite these innovations, we believe the 3D Tree Dataset will still be of utility to researchers, as the dearth of 3D datasets remains a common factor used to contextualize such research. At the time of our research, the mesh reconstruction and rendered-image methods mentioned above would have struggled with the complex organic distribution of branches, so we decided to use a voxel network as our primary approach to testing the dataset. For voxel networks, if the voxel resolution is high enough, small features such as branches can easily be reproduced. Furthermore, it is easier for the discriminator to learn new shapes and spatial features in voxel format. The voxel-based approach also matched our desire to observe the formal process of a GAN learning to generate a complex organic form, from the earliest cubic volumes of random voxels, to the abstract column-like forms of the middle epochs, to the individually recognizable conditions in the final trained model.

Voxel-based GANs have been used both in the classification of 3D objects [10] and in the generation of 3D objects [11]. The datasets used in these studies, such as the IKEA [14] and ModelNet [15] datasets, most commonly comprise geometric industrial forms such as chairs, tables, and cars. We wanted to focus on trees for two reasons. First, we were interested in seeing how a machine learning system handles organic and heterogeneous objects compared to the regular industrial forms we identified in previous studies. Trees are a difficult domain due to the logarithmic relationship between trunks, branches, and small tributaries [16]. Second, we wanted to generate objects of a visually expressive nature that could be used to communicate the broader paradigm shift of machine learning and creative production with the general public.

2.1 3D-GAN & 3D-VAEGAN

Wu et al. [13] proposed a 3D voxel-GAN that combines generative-adversarial modeling and volumetric convolutional networks. Depending on the resolution selected, the input mesh is voxelized to a 32 × 32 × 32 or 64 × 64 × 64 voxel model. The voxel model from the dataset and the model of the same size from the generator are then processed through the discriminator for training. Wu et al. also proposed a 3D VAE-GAN as an extension of the voxel-GAN using an image encoder. The voxel-GAN and 3D VAE-GAN detailed in that paper were the starting points from which we sought to make initial improvements. The first improvement was to use a 3D model encoder to help the generator construct more realistic trees. The second and major improvement was applying their generative approach to the less predictable and more complex forms of the 3D Tree Dataset.

2.2 CVAE GAN

Bao et al. [17] proposed CVAE-GAN as a general learning framework for combining a variational auto-encoder with a GAN under a conditioned generative process. They combined CVAE and CGANs with mean feature matching to make the final model more stable for generating a realistic and diverse set of objects. In our project, we implemented CVAE-GAN to generate trees in voxel space, but in addition to mean feature matching, we also tested different machine learning optimizations to improve the training results. These minor optimizations did not significantly improve the results, and the final generated trees in this project are made from a trained CVAE-GAN model.
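The mean feature matching used in CVAE-GAN penalizes the distance between the batch-averaged discriminator features of real and generated samples. The following numpy snippet is an illustrative reduction of that loss term only; the feature arrays stand in for intermediate discriminator activations and are not taken from our actual network:

```python
import numpy as np

def mean_feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between batch-mean discriminator features.

    real_feats, fake_feats: (batch, feature_dim) arrays standing in
    for intermediate discriminator activations (hypothetical values).
    """
    diff = real_feats.mean(axis=0) - fake_feats.mean(axis=0)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
# Identical feature statistics give zero loss.
assert mean_feature_matching_loss(feats, feats) == 0.0
```

Because only batch statistics are matched rather than individual samples, this term stabilizes training without forcing the generator to memorize specific dataset objects.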

2.3 WGAN & WGAN-GP

Arjovsky et al. [18] proposed the Wasserstein GAN (WGAN), which replaces the log loss of standard GAN training with an approximation of the Earth Mover's (EM) distance between the predicted scores of dataset objects and those of generated objects. During training, the discriminator of the WGAN acts as a critic that aims to give higher scores to dataset objects than to generated objects. WGAN applies simple weight clipping to restrict the maximum weight value so that the critic satisfies the 1-Lipschitz constraint. Building on WGAN, Gulrajani et al. [19] implemented a gradient penalty (GP) to replace weight clipping when enforcing the Lipschitz constraint: as the hard clipping implemented in WGAN may lead to undesired behavior, WGAN-GP introduces a soft gradient penalty in the loss function that forces the gradient norm toward one. Much like our approach with the 3D-GAN, 3D-VAE-GAN, and CVAE-GAN, we also implemented WGAN and WGAN-GP to generate trees, which we evaluated visually with our artistic team to judge the baseline quality of the outputs. We found that WGAN-GP has the advantage of helping the generator learn even when the discriminator learns very quickly, which can help the model produce a higher-quality result in a shorter amount of time; in the end, however, the discriminator could still learn so quickly that the generator simply yielded an empty output. We also found that WGAN-GP was not compatible with our need for conditional training: adding a classifier to the WGAN-GP to convert the model from unconditional to conditional was problematic, because the classification loss for the classifier and the EM loss are not mathematically comparable.
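The WGAN-GP penalty described above has the form λ(‖∇D(x̂)‖₂ − 1)², evaluated at interpolates between real and generated samples. A minimal numpy sketch of the penalty term itself, with the autograd step omitted and the gradient vector supplied directly as a stand-in:

```python
import numpy as np

def gradient_penalty(grad, weight=10.0):
    """WGAN-GP penalty term: weight * (||grad||_2 - 1)^2.

    grad: gradient of the critic output w.r.t. an interpolated sample
    (supplied directly here; in practice it is obtained via autograd
    on x_hat = eps * x_real + (1 - eps) * x_fake).
    """
    norm = np.linalg.norm(grad)
    return weight * (norm - 1.0) ** 2

# A gradient already at unit norm incurs no penalty.
assert gradient_penalty(np.array([1.0, 0.0])) == 0.0
```

Unlike hard weight clipping, this soft penalty leaves the critic's weights unconstrained and only pushes the gradient norm toward one, which is why it tends to preserve useful gradients for the generator.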

3 The 3D tree dataset

Machine learning research in 3D object generation often cites the paucity of labeled 3D datasets as a constraint when compared to the wide availability of tagged 2D image datasets [9, 12]. At the time of our research, related studies for voxel-GANs tended to focus on the geometrically simple forms of available datasets, such as furniture and industrial objects. Trees presented us with the opportunity to observe the development of complex organic forms across the training epochs of the GAN, and thus to explore on a formal visual and cultural level how machine learning operates in three dimensions. The procedural generation of trees using Lindenmayer systems has existed in computer graphics since the late 1960s and provides a convenient way to make a large and diverse dataset. The artists on our team also valued trees as a visual motif that is commonly perceived as having particularly expressive qualities. In his 2003 essay on the sign system in Chinese landscape painting, Cliff McMahon observes that spatial arrangements of trees can symbolize human family relationships [20], and in his 2002 essay on intimacy and painting in Ming Dynasty China, Craig Clunas observes that in both poetry and painting, the shapes of trees can also represent individual human values such as self-sufficiency, afflicted genius, or neglected greatness [21]. In European landscape painting, Simon Schama presents the forest and the oak tree as critical metaphors, tracing the modern industrial need for English oak to build ships back to the spiritual symbolism of the two trees of Eden, the Romantic desire to see God and the human spirit projected in nature, and the various forms of tree-worship found in Norse, Sumerian, and Mesopotamian cultures [22].
To take one specific example, Robert Rosenblum describes the leafless trees of German Romantic painter Caspar David Friedrich as combining the generic symbolism of age and youth with the historical symbolism of German nationalism and the religious symbolism of the crucifixion [23]. Due to this combination of factors, we decided that trees offered an interesting technical challenge for a voxel GAN and would result in an artistic output that could fit within an existing language of visual communication and be useful when it came to the public presentation of our outputs. The 3D Tree Dataset and our trained models can be downloaded from github.com/buganart/BUGAN.

3.1 Generation method

Using an array of art-historical images as source material, we used the tree generation software TreeIt to design a series of tree templates that would be visually interesting without foliage, because we wanted to focus on how the voxel-GAN could develop the shape and gesture of trunks and branches rather than the amorphous shapes of clumped foliage, which would obscure the branches in a voxel process. From the oak trees of German Romantic painter Caspar David Friedrich, to the upright pine trees of Chinese literati painter Ni Zan, to the gnarled trunks illustrated in Japanese bonsai manuals, we designed templates whose random seed could be altered to provide a large scope for variation whilst maintaining an interesting set of expressive forms.

3.2 Scale and diversity

We designed 76 tree templates and exported 26,000 unique .obj models of trees. Changing the random seed parameter in TreeIt affects the branch and branchlet distribution but not the trunk shape. Because we created the tree templates by hand, we were able to focus on visually interesting variations in trunk and branch shape, deliberately pushing for an aesthetically diverse dataset with a focus on particularly expressive trunk and branch combinations.
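The seed behaviour described above, where reseeding varies branch placement while the trunk stays fixed, can be illustrated with a toy occupancy-grid sketch. This is not TreeIt's actual algorithm; the function, resolution, and branch count are all hypothetical:

```python
import numpy as np

def make_tree_voxels(seed, res=32, n_branches=40):
    """Toy stand-in for TreeIt's seed parameter (not its actual
    algorithm): the trunk column is fixed, while branch voxel
    positions depend on the random seed."""
    grid = np.zeros((res, res, res), dtype=bool)
    c = res // 2
    grid[c, c, : res // 2] = True              # fixed central trunk
    rng = np.random.default_rng(seed)
    for _ in range(n_branches):                # seed-dependent branches
        x, y = rng.integers(0, res, size=2)
        z = rng.integers(res // 4, res)
        grid[x, y, z] = True
    return grid

a, b = make_tree_voxels(1), make_tree_voxels(2)
# Trunk voxels are present in both models regardless of seed.
assert a[16, 16, :16].all() and b[16, 16, :16].all()
```

Reseeding a deterministic generator in this way is what allowed 76 hand-made templates to expand into 26,000 distinct models while preserving each template's characteristic trunk.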

4 Application method

As our GAN used voxels, we first processed our trees using the Trimesh library, which voxelized the trees at a set resolution, in our case 32 × 32 × 32 or 64 × 64 × 64 voxels. The voxelized objects were then processed and remeshed by the Trimesh marching cubes method, which adds vertices and triangular faces to each region of connected voxels so that the output mesh is consistently connected. Table 1 shows the processing results from sample trees representing 8 of the 76 conditions of our tree dataset (the 3D Tree Dataset). Initially, the variation within the dataset made it difficult to generate trees of sufficient quality. To overcome this, we generated a second dataset of 22,000 tree .obj models from a single tree template (named Friedrich 1), which we refer to as the 'Friedrich Single Condition Dataset'. In Table 3, 'Friedrich 1' refers to a single condition of 350 .obj tree models sampled from the entire dataset. Using the Friedrich Single Condition Dataset at 32 × 32 × 32 resolution, we achieved reasonable results with 3D-GAN and 3D-VAE-GAN; however, when we trained the same GAN models using the full 3D Tree Dataset, the output meshes looked much less like trees [Table 2]. On visual inspection, fewer branches are connected to the trunk, more amorphous blobs surround the branches, and more floating voxels are disconnected from the main form. There are two explanations for this. First, the shapes of trees represented by each class are different and the variation between classes is not standardized; furthermore, a small number of classes have groups of trunks whereas most other classes have a single trunk, which caused a data imbalance across the whole dataset. Second, the 3D Tree Dataset includes many trees with dramatically different shapes, including variations in the number of trunks, upright and windswept gestures, and the number and position of branches.
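The voxelization step at the start of this pipeline can be sketched in numpy as a simplified stand-in for the Trimesh calls we actually used (Trimesh also rasterizes face interiors and remeshes via marching cubes; this toy version only quantizes vertices into an occupancy grid):

```python
import numpy as np

def voxelize_points(vertices, res=32):
    """Quantize mesh vertices into a res^3 occupancy grid: a
    simplified stand-in for Trimesh's voxelizer that marks only the
    cells containing vertices, ignoring face interiors."""
    v = np.asarray(vertices, dtype=float)
    lo = v.min(axis=0)
    extent = max((v.max(axis=0) - lo).max(), 1e-9)  # uniform scale keeps aspect ratio
    idx = np.floor((v - lo) * (res - 1) / extent).astype(int)
    idx = np.clip(idx, 0, res - 1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Two opposite corners land in opposite corner cells.
g = voxelize_points([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], res=32)
assert g[0, 0, 0] and g[31, 31, 31]
```

The fixed grid resolution is the bargain at the heart of this approach: it bounds the detail a branch can carry, but gives the discriminator a regular spatial format to learn from.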
Because the number of voxels occupying the trunk is considerably larger than the number of voxels occupying the branches for any tree model, the unconditional network focused mainly on trunk generation while largely ignoring branch details, except for the VAE-GAN, whose VAE reconstruction loss alleviates this influence. Put another way, the conditional network can focus on differences in branch position when the trunk is reasonably consistent, whereas the unconditional network has to accommodate large differences across both of these parameters. Therefore, the unconditional GAN struggled to learn the key characteristics of each tree template when compared to the CGAN. Because we wanted more control over specifying which type of trees are generated, we also experimented with CVAE-GAN, the conditional version of VAE-GAN, on the multi-class 3D Tree Dataset. Using supervised learning, we managed to produce good-quality trees for each condition label from the complete 3D Tree Dataset. To improve the performance of our GAN, we then used WGAN and WGAN-GP. However, it was difficult to add a classifier to the WGAN or WGAN-GP structure and train it on the labeled dataset, as the loss calculation of the classifier and the Wasserstein loss are dramatically different. WGAN and WGAN-GP use the Wasserstein loss to train the model, which speeds up training as the gradient magnitude remains sufficient even when the discriminator outperforms the generator. If the dataset has only small variations, such as models belonging to the same class, both the generator and discriminator can capture and generate a similar result. However, if the variation in the data is larger, as occurs between different classes, the discriminator performs poorly and derails the learning process.
Detailed parameter fine-tuning may help the model function; however, we did not achieve such a result in our experiments and confined our use of WGAN and WGAN-GP to the Friedrich Single Condition Dataset. As illustrated in [Table 2], all tested models perform reasonably well on a dataset comprising a single tree condition. With the full 3D Tree Dataset of all 76 tree templates, however, GAN, WGAN, and WGAN-GP struggled with the diversity of the dataset: they were only able to learn the consistent feature of the trunk and struggled with the complex variability of branches. In this case, the discriminators of WGAN and WGAN-GP also outperformed the generators to such an extent that the models got stuck in a bad local optimum of generating no voxels at all. To reach our goal of a tree generation system that implements all 76 tree templates, we settled on the CVAE-GAN and repeated our training at 64 × 64 × 64 resolution. Selected results are shown in [Table 3].
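The conditioning that made the CVAE-GAN tractable amounts to appending a class label to the generator's latent input, so the network can treat each tree template as a separate mode. A minimal sketch of this mechanism (the latent size and one-hot encoding are illustrative, not our exact configuration):

```python
import numpy as np

N_CONDITIONS = 76   # one condition per tree template
LATENT_DIM = 128    # illustrative latent size, not our exact setting

def conditional_latent(z, condition):
    """Concatenate a latent vector with a one-hot condition label,
    forming the input of a conditional generator."""
    one_hot = np.zeros(N_CONDITIONS)
    one_hot[condition] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.default_rng(0).normal(size=LATENT_DIM)
x = conditional_latent(z, condition=3)
assert x.shape == (LATENT_DIM + N_CONDITIONS,)
```

With the condition supplied explicitly, the generator no longer has to infer which template a trunk-and-branch configuration belongs to, which is why the conditional models coped with the full dataset where the unconditional ones collapsed.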

Table 1. Note: This table lists art historical images of trees alongside their corresponding 3D models from the research dataset
Table 2. Voxelized tree generated by conditional and unconditional GAN models

5 Qualitative evaluation

Our best results were derived from conditional training, and our final implementation used CVAE-GAN. To produce a functional tool for others to use, we trained our CVAE-GAN on a small number of tree conditions separately. Because we were interested in observing the evolution from the less distinct 3D models of early epochs to the more refined trees of later epochs, we created separate checkpoint files so that artists could generate and download the early abstract voxel clusters as well as the more refined tree forms. Using Google Drive, Google Colab, and Wandb, non-specialist artists can generate and download 3D trees from our trained models. Users can generate trees from a particular tree template, or randomly generate trees from the entire dataset. We believe that the geometric inconsistency of the 3D Tree Dataset, combined with its visually familiar forms, makes it a valuable and challenging dataset that we hope will be useful for other researchers.

Table 3. Conditional generated results of the trained CVAEGAN on different categories

Following the artistic motivation behind this project, we were most interested in the visual quality of the generated trees and their ability to function as communicative tools for exploring the topic of machine learning and 3-dimensional forms with students and the general public. After our initial success in training the CVAE-GAN to produce visibly recognizable trees, we mounted a public exhibition and survey to qualitatively evaluate our outputs. The exhibition 'Machine Visions' (November 2022-February 2023) communicated our project using 30 3D printed sculptures (Fig. 1), five animations (Fig. 2), and a didactic panel describing our process. The sculptures (Fig. 1) used generated .obj files from the CVAE-GAN at various training epochs to illustrate the change from abstract vertical pillars to recognizable tree forms. The animations, presented alongside the sculptures, used mesh-morphing techniques to show the same transition from noisy cubic volumes of voxels, to abstract pillars, to recognizable trees. For legibility and aesthetic beauty, we added a biome of grass and flowers, clear directional lighting, and a wireframe mesh overlay so that audiences could visibly understand the forms. This artistic presentation aimed to answer our original research question: how might machine learning influence industrial and cultural processes of form and figuration? The slow morphing between epochs, situated within the visual language of landscape art, communicated our voxel-GAN project visually to a non-specialist audience. The exhibition attracted over 1,000 in-person visitors, including primary, secondary, and tertiary school groups, and an online audience of over 20,000. Staff at the museum were able to use the artworks and the explanatory didactic panel to explain the TreeGAN project to our visitor cohort, as demonstrated in the cogent descriptions of the exhibition later published in print and online media.

Fig. 1. 3D printed outputs from the VAE-GAN [left]

Fig. 2. Animation showing the transition between training epochs of the VAE-GAN [right]

In addition to public exhibitions and presentations, we conducted a qualitative survey with 39 respondents. Due to the geometric complexity of the 3D Tree Dataset, our output meshes can easily be distinguished from voxelized dataset inputs by their disconnected branches and variations in branch thickness, so simple comparisons between the two would have had predictable outcomes. Instead, we focused our qualitative evaluation on asking respondents which trees they deemed more realistic, which had the best form, which were more expressive, and which were more artistically interesting. These questions aimed to combine an evaluation of visual interest with more empathetic and aesthetic identifications, bringing us closer to the artistic motivations of the project. The CVAE-GAN model was universally favored by respondents over the unconditional VAE-GAN model (represented by the category 'RANDOM_100'), which scored lowest in our survey results [Table 4]. The 'Maple' condition scored highest for both the 'expressive' and 'artistically interesting' questions, and was visually distinct from the 'Formal_upright' and 'Tall_straight' conditions, which scored highest for the 'realistic' and 'best form' questions respectively. The TreeIt input template for the maple condition was a twisted, gnarled trunk, referencing the leaning or semi-cascade styles often found in bonsai manuals or classical Chinese paintings; despite the more abstract nature of the CVAE-GAN outputs from this condition, we found it relevant that it consistently attracted the aesthetic interest of respondents.

Table 4 Note: This table displays qualitative survey results from 39 respondents on the visual qualities of the generated trees

6 Discussion and future work

In this study, we explored how machine learning functions in a 3D domain by generating a novel dataset of organic forms combined with a voxel-GAN implementation. This shed light on the intersection of machine learning and artistic expression, whilst also contributing a novel, tagged 3D dataset for the use of other researchers.

6.1 Dataset design and representation

The 3D Tree Dataset was created to address the scarcity of tagged 3D datasets as well as to provide a novel geometrical challenge for researchers working in the field. We hope that this dataset provides a useful contrast to existing 3D datasets, and for researchers interested in artistic applications, we hope that the aesthetic range of our dataset will yield interesting applications and results. We also hope that the public presentation of this project through creative works will lead to a further exploration of how voxel-GANs and other 3D machine-learning approaches can be integrated into production pipelines and that our dataset might aid other artists.

6.2 Model performance and challenges

Our experiments with different GAN architectures revealed various challenges in generating realistic 3D models of trees. The diversity of shapes and forms within the dataset posed challenges for unconditional models, and the intricacy of reproducible detail in our voxel-based approach is limited by the resolution of the voxel space. Both findings underscore the importance of addressing dataset variability and imbalance when training machine learning architectures for 3D object generation.

6.3 Impact of conditional training

By adopting conditional training, particularly in the case of CVAE-GAN, we were able to overcome some of the challenges we encountered in unconditional models. By providing explicit conditioning information, CVAE-GAN enabled us to better control the generation process and facilitated the production of higher-quality 3D tree outputs. This also allowed for a more nuanced exploration of the dataset’s expressive potential, aligning with our artistic and cultural motivations.

6.4 Evaluation and visual communication

Our qualitative evaluation methods, from public exhibition to survey, provided useful insights into the visual quality and communicative potential of our generated trees, and their ability to function as visual aids for exploring the broader evolution of machine learning as a component of contemporary creative workflows.

6.5 Future directions and potential improvements

Looking ahead, future research in this domain will most likely explore alternative model architectures as well as benefit from a wider array of dataset designs. In addition, the development of accessible interactive tools for users to manipulate and customize generated 3D models will likely further democratize and diversify the use of machine learning in artistic creation and visual communication.

In conclusion, our study offers a novel 3D dataset for machine learning as well as insights into the challenges and opportunities of using machine learning for 3D object generation. We hope that our interdisciplinary engagement with broader artistic and cultural contexts also contributes to the ongoing exploration of the impact and utility of machine learning in visual culture and expression.