Predicting Synaptic Connectivity for Large-Scale Microcircuit Simulations Using Snudda

Simulation of large-scale networks of neurons is an important approach to understanding and interpreting experimental data from healthy and diseased brains. Owing to the rapid development of simulation software and the accumulation of quantitative data of different neuronal types, it is possible to predict both computational and dynamical properties of local microcircuits in a ‘bottom-up’ manner. Simulated data from these models can be compared with experiments and ‘top-down’ modelling approaches, successively bridging the scales. Here we describe an open source pipeline, using the software Snudda, for predicting microcircuit connectivity and for setting up simulations using the NEURON simulation environment in a reproducible way. We also illustrate how to further ‘curate’ data on single neuron morphologies acquired from public databases. This model building pipeline was used to set up a first version of a full-scale cellular level model of mouse dorsal striatum. Model components from that work are here used to illustrate the different steps that are needed when modelling subcortical nuclei, such as the basal ganglia.


Introduction
Neuroscientists are producing data at an ever growing rate, and sharing the data in public databases. Within the computational neuroscience field, hypothesis-driven modelling has over many decades generated new ideas that in turn have been tested via experiments. Recently a data-driven mechanistic modelling approach has also gained ground thanks to new technologies allowing the collection of large quantities of useful data. In particular, large-scale spiking neural network models have been reconstructed in a data-driven manner and simulated Gratiy et al., 2018;Migliore et al., 2018;Casali et al., 2019;Einevoll et al., 2019;Kanari et al., 2019;Billeh et al., 2020;Hjorth et al., 2020). Collecting data from the brain at multiple biological scales from mouse, nonhuman primates, and human, are important goals of several of the big brain initiatives (Insel et al., 2013;Amunts et al., 2019;Okano et al., 2015;Grillner et al., 2016), and will further facilitate and speed up this modelling process. In parallel, various brain simulation tools have been optimized to capitalize on supercomputers (Hepburn et al., 2012;Plesser et al., 2015;Carnevale & Hines, 2006;Hines et al., 2009;Kumbhar et al., 2019;Ray & Bhalla, 2008;Gleeson et al., 2010;Jordan et al., 2020;Akar et al., 2019). In this respect, the principles of FAIR -Findable, Accessible, Interoperable, Reusable (Wilkinson et al., 2016) are important for catalysing this process, both with regard to the experimental data, the data-driven models as well as the software used during the modelling and simulation process. We also believe that to be able to reproduce the actual model reconstruction process, given the same or new additional data, is one important aspect of the FAIR criteria when making the modelling process transparent, repeatable, reusable and comparable.
Here we present our open source modelling pipeline that facilitates a reproducible, data-driven reconstruction of cellular level network/microcircuit models. This pipeline inspired by the cortical column microcircuit  has been applied to predict a full-scale microcircuit model of the mouse dorsal striatum (Hjorth et al., 2020). Snudda is a software to create a detailed network of connected neurons, where the connectivity is derived from reconstructed neuronal morphologies as well as from more qualitative experimental knowledge (see also Reimann et al., 2015). 'Snudda' means 'touch' in Swedish, and it supports the creation of a network with connectivity based on touch detection. If detailed morphological data exist, the algorithm looks for close appositions between axons and dendrites, which are locations for putative synapses. Thus the morphology restricts where connections can be positioned. Snudda can also define the axon using a probability cloud if a reconstructed axon is missing. This is an extension of the method where the connection probability is proportional to the overlap of two spheres representing axons and dendrites (Humphries et al., 2009). Based on a set of rules, as described below, the putative synapses are then pruned to match the connectivity seen from pairwise experimental recordings, or other types of connectivity experiments. The same technique can be applied to also place gap junctions. The generated network can then be simulated using parallel NEURON (Carnevale & Hines, 2006). Similar approaches have been used to build the somato-sensory cortex microcircuit Colangelo et al., 2019), visual cortex model Dai et al., 2020), cerebellar network (Sudhakar et al., 2017;Casali et al., 2019;Wichert et al., 2020) and hippocampal neurons (Migliore et al., 2018).
The reconstruction of a local microcircuit model (such as striatum) consists of the following steps: a) experimental data acquisition of the electrophysiological and morphological properties of neuronal types, and also characterisation of synapses, b) optimization of neuron and synapse models, c) placement of the model neurons in the brain volume to be modelled, d) prediction of microcircuit connectivity in silico, e) constraining and emulating inputs for the model, and finally f) simulating the microcircuitry. Our software Snudda is used for steps c)-f). Software Treem for improving the morphological reconstructions in preparatory step b) is described at the end. The code is publicly available on GitHub (https://github.com/Hjorthmedh/Snudda/) and (https://github.com/a1eko/treem). Below we will go through the different steps and provide code to set up an example network using Snudda, followed by explanation of the configuration files, network building and simulation process, as well as some preprocessing options. The network example corresponds to a 0.5 mm cube within the mouse striatum. We will assume that we already have a set of electrophysiologically optimized neurons and synapses, e.g. using the optimization tool BluePyOpt (Van Geit et al., 2016). Examples of neuron and synapse models relevant for striatum are provided on GitHub (in the snudda/examples folder with scripts and notebooks).
Our approach offers novel contributions in the following aspects: (i) we design and present a complete, free and open source toolchain for building and simulating anatomically constrained biologically detailed neural networks including morphology-based neuron touch detection; (ii) we illustrate the use of this platform on the example of the striatal microcircuit, implemented at a very detailed level and accuracy; (iii) we include all tools and parameters in the source code repository, enabling other labs to reproduce as well as reconstruct our striatal model with new data when it becomes available.

Getting Started with Snudda
Snudda is available for download from GitHub (https://github. com/hjorthmedh/Snudda) or from PyPi through pip3 install snudda. Both source code and the data files necessary to set up a striatal network are provided, Table 1 provides an overview of the Snudda directory structure. Snudda is compatible with Linux, Mac and Windows 10.
In the directory snudda/data/neurons/<region> there are separate subdirectories for each neuron type (<region> in our use case is striatum). Each of those directories contains multiple subdirectories, one for each unique morphology from that neuron type. The neuron directories include the morphology in SWC format, a JSON parameter file with one or more sets of optimised neuron parameters from BluePyOpt, a JSON mechanism file specifying which mechanisms are present in each compartment, and a JSON modulation file which specifies the neuron modulation of the neuron. The JSON file format was chosen as it is a standardised and human readable way to store structured data. The neurons folder also has a mechanisms folder containing the NEURON model description language .mod files with definitions of ionic mechanisms.
To keep the networks separate, each generated network has its own directory which contains a network.json file that links together all the different components that make up the network. The network.json file can be manually created, or in the case of the striatal network there is a way to automatically generate a network.json file of user specified size. The script init.py can be extended to create networks of other brain structures. A Jupyter notebook in examples/notebooks shows an alternative example for how to define brain slices and other structures.
In the main Snudda directory there is an examples folder with useful scripts and Jupyter notebooks for generating and running networks. The directory snudda/plotting contains scripts to plot simulation results as well as visualise the network or parts of it using Blender (https://www.blender.org/).

Use Case: Striatal Microcircuit
First we create an example striatal network, then further down we go through all configuration details. The network-config.json can be generated using the snudda init shell command. In the below example a homogeneous cube with 0.5 mm side length in the mouse striatum is generated (Fig. 1a). This corresponds to 10,062 neurons ( Fig. 1b) using the estimated average density of striatal neurons (Rosen & Williams, 2001), but the number of neurons can be varied depending on computational resources and research questions. The neurons are taken from the data/neurons/striatum directory, where every neuron type has its own directory, e.g. dspn or ispn. A neuron type is represented by one or more single-cell models, each in its own subdirectory as described above. When the network is initialised, the init.py code will look in the folders of the different neuron types and instantiate single-cell models in random order. No modifications of the neuron models other than rotations are applied at runtime. In order to improve the cell diversity, one should populate the neuron types directories with a sufficient number of different single-cell models.
The following commands can be used in a terminal window to generate the example network that we have used for the figures in this article. We go into more detail in the sections later in the article. There are additional Jupyter notebook examples in examples/notebooks and examples/ Neuroinformatics2021. The first step is to create a configuration file network.json specifying e.g. 10,062 neurons in a directory called smallSim.
Next we need to place the neurons in a specified volume. For large simulations the neurons are placed inside a volume representing the mouse striatum (Fig. 1a), while smaller networks use a simple cube (Fig. 1b, c). This is done to preserve physiological neuron densities in the simulations. The mesh definition of the striatal volume, or other structures, can be extracted from databases such as the Allen Brain Atlas of the mouse brain. The command to place the neurons inside the volume defined by the mesh is: The next step is the touch detection (Fig. 2a). Here the algorithm voxelizes the space, and looks for overlaps (within a certain predefined distance) between axons and dendrites from different neurons (Fig. 3a, b) (Hellwig, 2000).
The touch detection will create a putative set of synapses at all the close appositions between axons and dendrites.    b Corresponding two neurons in the hypervoxel representation. The neurites are traced by taking small steps along Δx, Δy, Δz corresponding to the direction of the neurite. Where axon and dendrites occupy the same voxel a putative synapse is detected, here marked by a green dot. The volume of the soma is also voxelized. c Ball-and-stick neurons arranged so their synapses are on a grid, with four synapses connecting every neuron pair. Pruning with parameter f1 = 1, 0.5 and 0.25 keeping 100%, 50% and 25% of all synapses, respectively. Number of synapses retained shown under each network (including a numeric estimate of mean and standard deviation, n = 1000). d Combination of f1 and mu2 pruning. Here P = f1 · 1.0 / (1.0 + exp(−8.0 / mu2 · (nSynapses -mu2))), where P is the probability to keep a synapse, nSynapses is the total number of synapses connecting the neuron pre-post pair. e softMax pruning step. If there are more than softMax synapses then the probability of keeping synapses between that pair is P = 2 · softMax / ((1 + exp(−(nSynapses -softMax) / 5)) · nSynapses). f Pruning a3 = 1, 0.5 and 0.25 removing all synapses between a connected pair in 0%, 50% and 75% of the cases, respectively. g Distance dependent pruning of proximal, medial and distal synapses. Jupyter Notebooks to generate this figure are available on Snudda GitHub in the examples/ Neuroinformatics2021 folder) Next we need to generate external synaptic input for the network simulation. Here we specify how much time we want to generate inputs for. The parameters for the synaptic input from cortex and thalamus are defined in a separate JSON file: The last step involves compiling the .mod files, and then running the simulation.
There are two functions in the plotting directory that allow the user to plot either the spike raster or the voltage traces, plotting/ plot_spike_raster.py (Fig. 4) and plotting/plot_traces.py, the latter requires the user to have run the simulation with the -voltOut parameter to also save the voltage traces. The simulation output files are stored in the $simName/simulations directory.

Validation Against Anatomical Data
The network building results in Fig. 2 have been validated in our previous study. Figures 8, S4-S7 in Hjorth et al. (2020) showed experimental pair-wise connection probability between neuron types and how the Snudda generated network matched the data, and matched estimations of number of synapses between connected pairs.
To validate the mean synaptic density in the simulated network, we use the data from the recent studies with genetic labeling and electron microscopy techniques. Santuy et al.

Connectivity Configuration
Running the command snudda init mysimulation -size N generates the Snudda network configuration file networkconfig.json where N is the size of the network to create. The network-config.json contains the blueprints for the striatal network hierarchically organised into the blocks RandomSeed, Volume, Units, Connectivity and Neurons. All parameters in the configuration file are specified in SI units. Below we go into more detail how Snudda is configured.
The RandomSeed block specifies the random seeds used for the different steps of the network creation.
The Volume block can contain several named volumes, such as for example Striatum, Cortex, Thalamus. For each volume block we define type (e.g. mesh), dMin (minimum distance between somas in the volume), meshFile (this specifies the Wavefront OBJ file that defines the mesh enclosing the volume) and meshBinWidth (voxelization size of the mesh for determining what is inside and outside the mesh during cell placement). Future versions of Snudda will allow for density variations within the volume and directional gradients for the neurons.

FG
An example block for PopulationUnits looks as follows.
In the Connectivity block we define the rules guiding how the different neuron populations are connected together. Each connection pair has its own block (e.g. "iSPN,dSPN"). In the example below this is illustrated with the iSPN-dSPN pair (indirect-pathway and direct-pathway striatal projection neurons, respectively), which are connected by GABA synapses.
Each connection has a conductance parameter, which specifies the mean and standard deviation of the conductance. The channelParameters gives flexibility by specifying a dictionary with the channel specific parameters, in this case it is tau1, tau2, failRate (the synapse failure rate) that are passed directly to the NEURON channel model. Next parameterFile (JSON file with additional channel parameters) and modFile (NEURON channel .mod file). The final two blocks pruning and pruningOther specify the pruning parameters for neurons within the same population unit, and neurons in different population units. The pruning parameters help shape the connectivity by parameterising the rules that define which putative connections should be removed, and which ones should be kept. The probability to keep a synapse is equal to the product of the individual pruning steps: The examples given in Fig. 3c-g are synthetic, their purpose is to illustrate pruning rules. The f1 parameter defines how large a fraction of the putative synapses should be kept, a value of 1.0 or None means that this pruning step is not used (Fig. 3c). For f1 = 0.5 we would expect on average 0.5 · 400 = 200 synapses kept, and for f1 = 0.25 we expect 0.25 · 400 = 100 synapses (c.f. 203 and 104 synapses randomly selected for f1 = 0.5 and f1 = 0.25, respectively, in Fig. 3c).
The mu2 defines a sigmoid curve used to decide whether to keep or remove all synapses between a coupled pair of neurons (Fig. 3d): With mu2 = 3, we have P mu (n = 4) = 93.5%, P mu (n = 3) = 50%, P mu (n = 2) = 6.5%, P mu (n = 1) = 0.5%. Thus for f1 = 1 we expect 400 · 0.935 = 374 synapses left. For f1 = 0.5 we have two parts to the pruning, first f1 where half the synapses are removed, then mu2 which operates on all the synapses between a coupled pair. We thus look at the 100 possible neuron pairs. The expected number of synapses E syn ¼ 100 Á ∑ 4 1 P n Á P mu n ð Þ Á n, with P n the probability of a neuron pair having n synapses after the f1 pruning. We then get synapses.
The softMax specifies at which value we start applying a soft cap to the total number of synapses between the pair: where P SM is probability to keep a synapse, and n is the initial number of synapses between the pair of neurons (Fig. 3e). For n = 4 and softMax = 3 yields P SM = 82.5% resulting in on average 400 · 0.825 = 330 synapses left. For softMax = 2 around 239 synapses will remain, and for softMax = 1 on average 129 synapses are kept. The a3 parameter specifies which fraction of all connected pairs to keep, e.g. 0.8 means that 20% of all connected pairs will have all their synapses removed (Fig.  3f). Here a3 = 1, 0.5 and 0.25 result in on average 400, 200 and 100 synapses left, correspondingly. The distPruning defines a distance d dependent function P dist : d → [0,1], where d is the distance from the soma along the dendrites (Fig. 3g). The expected number of synapses for the distance dependent pruning is given by E syn ¼ 20 Á ∑ 20 k¼1 P d k ð Þ where P(d) is one of the equations in Fig. 3g. In this example the distances to the soma are d = 56, 65,86,95,116,125,146,155,176,185,206,215,236,245,266,275,296,305,326 and 335 μm (with 20 putative synapses at each distance). The expected number of synapses in the three cases are thus 107, 134 and 91, respectively.
Continuing our look at the network configuration file structure, the final block Neurons defines the different neuron populations. Here each neuron template has its own block. For each neuron template we specify four files: morphology (a SWC file defining the soma, axon and dendrites), parameters (neuron parameters optimised using BluePyOpt), mechanisms (NEURON mechanisms), modulation (a JSON file defining the neuromodulation). The template can be used to define multiple neurons, the number defined by num. The hoc parameter is optional, and intended to be used in the future when e (Williams et al., 2018). The neuronType can be either neuron or virtualNeuron, the latter can be used to define axons from other structures providing input to the striatum. The rotationMode lets us specify if the neurons should be left unrotated, or rotated in some manner. The volumeID defines which volume the neurons belong to.

x p o r t i n g t o S O N A T A f o r m a t f o r u s e w i t h Neurodamus
The $DATA keyword is a shorthand for the snudda/data folder.

Configuring External Synaptic Input
The input spikes to the network are generated as prescribed in the input configuration file input.json in JSON format. Below we will give a simple example of how to set up input, and there are more examples available on Github in the examples/ notebooks directory.
In this example the dSPN will each receive 200 inputs, with 1 Hz Poisson random spikes. The configuration also specifies the conductance and the tmGlut mod file that is used by NEURON to simulate the input synapses.
To complement the cortical (Ctx) input with thalamic, add a second input block with parameters inside the dSPN target block and give it a name e.g. Thalamic. The entire dSPN population will then receive both cortical and thalamic inputs.
Only one target block is applied to each neuron. When Snudda generates input for the network it iterates through all the different neurons in the simulation and picks the most specific target block that matches that neuron. In the example below a dSPN with neuron ID 5 and morphology dSPN_0 will match all three blocks, but the neuron ID is most specific so the "5" block will be used. A dSPN_1 morphology neuron will only match the dSPN block and will use that.
In the dSPN target configuration the start, end and frequency are specified as vectors. Here the input is 4 Hz at 2-3 s, and 2 Hz at 5-7 s. We can also use population units to specify heterogenous external input to the target volume, see examples/notebooks on Github.
To create advanced inputs not supported by Snudda the custom spike times can be read from a CSV file (with one spike train per row) by using "generator": "csv" and "csvFile":"path/to/your/csvfile".
A more complex example using additional input generation functionality is given below. The type defines what sort of input the synapses form, e.g. AMPA_NMDA or GABA. The number of inputs to each neuron can either be defined directly using nInputs or indirectly by specifying the density of inputs synapseDensity along the dendrites. If both parameters are given, the code will use the density but scale it so that the nInputs are created. There is also an optional parameterFile that can be used to define a set of parameters for the synaptic channel.
The populationCorrelation describes how correlated the Poisson input is that is generated by mixing a shared mother process (each spike is included with probability P = sqrt(C)) and a number of independent child processes (inclusion probability 1 -P) to get the resulting input trains (Hjorth et al., 2009).
We also include the functionality of virtual neurons, which are neurons that are not simulated, instead their activity is driven by a predefined spike train. This can be used to model, for example, the activation of reconstructed cortical axons in the striatum, which after touch detection will drive the neurons they connect to.
When using synapse density to place excitatory input onto the neurons, larger neurons will receive more input than smaller neurons of the same type. However, size does not necessarily correlate with excitability of the neuron, or steepness of the I-V curve which depends on intrinsic channels. To handle this variation of excitability Snudda allows the user to scale the number of synapses reaching a neuron, a process which in a real network might be regulated by neuronal homeostatic processes.

What Happens under the Hood?
For each volume modelled the cell placement is restricted to be inside the mesh specified. The neurons are placed one by one, with coordinates randomly sampled from a uniform distribution. If a neuron position is inside the mesh, and there are no other neurons within a distance dMin from it, the position is accepted. To avoid an artificial increase of neuron density at the border, the neuron positions placed outside the mesh are also tracked. These padding positions are not counted towards the total, and are discarded afterwards. Orientation of the neurons in the striatum is completely random, but it is possible to specify other ways to sample the orientation.
For touch detection the space is divided into voxels of 3 μm side length (Fig. 3b). Synapses are only detected when axon and dendrite are present in the same voxel. The maximal interaction distance is thus decided by the voxel size. To parallelise the touch detection the voxels are grouped into hypervoxels, containing 100 3 voxels each. The mouse dorsal striatum occupies about 26.2 mm 3 (https://mouse.brain-map. org/), and contains almost 2 million neurons (Rosen & Williams, 2001). The first step is to identify which neurons belong to which hypervoxels. For a large portion of the neurons they will be present in more than one hypervoxel. This procedure is done in parallel where the worker nodes of the parallel computer get allocated a subset of the neurons and based on the vertex coordinates of the neurites calculate which hypervoxel the neurons are in. The results are gathered, creating a list of neurons for each hypervoxel. The hypervoxels are then sorted based on the number of neurons inside, and those with most neurons are processed first for better load balance. To perform the touch detection, a line parsing algorithm takes small steps Δx, Δy, Δz along all line segments of the dendrites, marking the voxels they intersect. The voxels contained within the soma are also marked. It then repeats the procedure for the axon line segments of the morphologies. Voxels that contain both axons and dendrites are considered to have a putative synapse if the two neuron types are allowed to have a connection between them (Fig. 3a, b).
The purpose of the touch detection is to find the potential locations where neurons can connect to each other based on the restrictions set by the morphologies. The result of the above touch detection is a set of putative synapses, which is larger than the set of actual synapses. In the pruning step, the set of putative synapses is reduced to match the connectivity statistics from experimental pairwise recordings.
In experiments it is common to report only the binning size and the number of connected neuron pairs, and the total number of pairs. It would be beneficial for circuit modelling if the distance for each pair was also recorded, we could then extract distance dependent connectivity profiles and compare those to what the computer models predict. The pruning is divided into multiple steps, described above. The touch detection for a cubic millimeter can be run in a couple of hours on a desktop, and the whole striatum can be created in a couple of hours on a supercomputer (Fig. 5). As an example, creating a striatal network with 10,000 neurons (6.4 million synapses and 1468 gap junctions) on a desktop Intel Xeon W-2133 CPU @ 3.60GHz with 6 cores and 64GB RAM took: init~1 s, place 10 s, detect 8 min and prune 6 min. For 20,000 (50,000) neurons the corresponding times are~1 s (~1 s), 16 s (32 s), 15 min (38 min), 12 min (36 min) to detect 14.3 million (39.9 million) synapses and 3462 (9395) gap junctions.
In addition to JSON configuration files, the resulting network data is stored in HDF5 files.

The Challenge of Limited Morphology Data
A big challenge of the biologically detailed anatomically constrained simulations of the neural microcircuits is availability of the high quality morphological reconstructions of the main neuron types, in sufficient numbers and variability. Open public morphometric repositories, similar to ModelDB for models present in the world-wide web since 1996 (McDougal et al., 2017), were pioneered in 2006 by G. Ascoli with NeuroMorpho.Org (Akram et al., 2018; http:// neuromorpho.org/) and continued by other research centers like Allen Brain Institute (Jones et al., 2009; https://portal. brain-map.org), Janelia Research Campus (Gerfen et al., 2018;Economo et al., 2019; http://mouselight.janelia.org/), eBRAINS Knowledge Graph (https://kg.ebrains.eu/), to name a few, have become increasingly popular among computational neuroscientists.
Module Treem provides data structure and command-line tools for accessing and manipulating the digital reconstructions of the neuron morphology in Stockley-Wheal-Cannon format, SWC (Cannon et al., 1998). Access to morphological data from the source code is supported by several Python classes. Common operations with SWC files are possible from the user-written scripts or via the command-line tool swc. For the detailed description of the user interface, see API and CLI references in the online documentation (https://treem. readthedocs.io).
A common reconstruction error is so called "z-jump" (Brown et al., 2011) when a part of the neurite gets shifted along the z-axis by a few micrometers as shown in Fig. 6a (top panel). These can result from an accumulated error during the manual reconstruction or as a mistake in automatic procedure. Possible z-jumps can be eliminated in Treem by the repair command using one of the four methods, align, split, tilt or join, as illustrated in Fig. 6a. Choice of the repair method as well as the assessment of the result should ideally be left to the author of the reconstructed data; if this is not possible the preference is given to the method which better preserves cell symmetry.
Since neuronal tissue can shrink due to dehydration during histological preparation, correction factors are required before a reconstructed morphology can enter the simulation pipeline. Shrinkage correction involves scaling of the entire reconstruction in (x, y)-plane, expansion in z-direction, as well as decreasing contraction of selected neurites, e.g. dendrites, by stretching along their principal axes (termed "unravelling" in  Markram et al., 2015) or length-preserving spatial filtering (not shown, see online documentation).
Another important omission is that the neurons located close to the slice surface often have their neurites cut and thus missing in the digital reconstruction. Cut neurites can be replaced using the intact branches of the same topological order from the inner part of the slice, assuming spherical or axial symmetry of the neuron morphology as shown in Fig. 6b. In Treem this is achieved with the repair command (see online documentation for the example commands to reproduce Fig. 6b).
One of the aspects of the large-scale simulations is realistic variability of the model parameters mimicking the natural spread of morpho-electric characteristics in live neurons. To enforce variability in the simulation based on the limited number of reconstructed neurons, we apply random manipulations to the morphological reconstructions. Examples of the lengthpreserving modifications implemented in Treem are shown in Fig. 7a. Methods jitter, twist and rotate do not change the length of the dendritic branches and thus do not affect electrophysiological features of the optimized models but help to recover spatial symmetry of the morphological reconstructions as shown in Fig. 7b. To distribute excitability of the single-cell models, digital reconstructions can be scaled randomly in 3D, as was done in the large-scale simulation by Fig. 6 Repairing digital reconstruction of the neuron morphology. a Correcting "z-jump" reconstruction errors (top panel). Dots illustrate reconstructed points, soma is black, dendrites are blue, the orange dot labels the node at the point of presumed discontinuity. Four correction methods implemented in Treem (Python module treem) are shown below (align, split, tilt and join). b Repairing the dendrites cut at the slice border. Orange dots in a 3D plot show the cut points of the dendrites. Red lines in 2D projections show "repaired" dendrites, i.e. extended neurites using undamaged reconstructions of the same topological order as the cut branches Fig. 7 Adding variability to the morphological reconstructions. a Examples of modification methods used in Treem (modifications preserving the total length are shown). b Distribution of the coordinates of the dendritic terminations of the fast-spiking interneurons at different stages of morphology processing -original reconstructions (n = 3), repaired reconstructions (n = 3) and "jittered", i.e. randomly manipulated reconstructions, nine variants per each cell (n = 27). Here, only random twisting of the dendritic branches at the bifurcation points was applied which proved to be sufficient to restore the symmetry Hjorth et al. (2020) and illustrated in the online documentation of Treem (https://treem.readthedocs.io).

Discussion
In this paper we show how to use our open source modelling pipeline to build microcircuit models. An important goal is that the model building process should be transparent and possible to reproduce by other labs, and the model should be extendable when new data accumulate. The pipeline is developed for setting up large-scale simulations of subcortical nuclei, such as striatum. In our current pipeline, based on the software Snudda, neurons can be placed in a defined volume, and then prediction of the location of synapses (as well as gap junctions) can be made using neuron morphologies and the specified pruning rules. Also synaptic data for short-term synaptic plasticity or failure rates can be represented. Finally a simulation using the NEURON simulation environment can be launched. A challenge when using cellular level data from public databases is that sometimes the data for the reconstructed neuronal morphologies only include soma and dendrites, missing the axon entirely. Therefore Snudda supports the prediction of synapses using different approaches. If the detailed morphology is available, the reconstructed axons and dendrites can be used to constrain which neurons are within reach of one another. If the axon is missing, the user can instead specify an axonal density which is then used for the synapse detection. Also our pipeline provides the opportunity to 'repair' the dendritic morphologies, and we have illustrated ways to do this using the software Treem. In addition, Treem can provide jittering of morphological parameters to increase the variability in the modelled population of neurons, which is useful to avoid artefacts when there are too few available morphologies for each neuron type. The current version of Snudda does not treat spines separately from the rest of the dendrites, a future improvement would be to allow the user to specify requirements to target spines separately, e.g. if spines are already specified in the reconstruction data.
Models of neocortical microcircuits have been built with similar approaches (see above), however, not all elements of their workflow were available or open-source at the time of Snudda development. We believe that our open source pipeline might become useful when building biophysically detailed microcircuit models of other subcortical brain regions, such as the other basal ganglia nuclei.
When using the current modelling workflow, it is assumed that one has a collection of quantitatively detailed neuron models for each neuron type to be used in the modelled microcircuit. Such neuron models might come from public databases (see above). But most likely several of the neuron types in the selected microcircuit to be modelled might have to be built from scratch. Here the challenges are several as described above. Data on electrophysiological recordings published might be incomplete in such a way that only a few selected traces, as shown in the published manuscripts, exist. Also transcriptional, electrophysiological and morphological data might come from different experiments. Ideally, however, it would be best to have recordings from the neurons that were morphologically reconstructed, such as in patch-seq technique (e.g. Fuzik et al., 2016). Although the electrophysiological properties are well studied, one might lack the knowledge of which ion conductances are expressed in the neurons. Fortunately, such data are starting to emerge, and for example, for striatum transcriptomics data already exist (Muñoz-Manchado et al., 2018;Ho et al., 2018;Gokce et al., 2016;Saunders et al., 2018). If one has a good hypothesis of which channels are expressed, characterisation as well as models are starting to be collected at resources such as the Channelpedia (Ranjan et al., 2011;http://channelpedia.net) and Ion Channel Genealogy (Podlaski et al., 2017; https://icg.neurotheory.ox. ac.uk/). Although still not trivial, if one has both the morphology and electrophysiological data of a particular neuron type, workflows have already been developed for optimizing neuron models (Van Geit et al., 2016;Migliore et al., 2018;Masoli et al., 2020).
A natural future goal would, however, be to link microcircuit models built in different labs, e.g. a cortical microcircuit connected to a striatal microcircuit. Then interoperability between models as well as model specification, such as SONATA, would be crucial. We have on our road map for Snudda to support the SONATA standard  and work has already started on it to become interoperable with the EBRAINS infrastructure (https://ebrains.eu/). Data Availability Data used for single-cell neuron models as well as synaptic connectivity within the example network of the striatal microcircuit is available on GitHub (in the snudda/data and examples folders at https://github.com/hjorthmedh/Snudda).

Declarations
Conflicts of interest/Competing Interests The authors have no relevant financial or non-financial interests to disclose.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.