# Design principles for accurate folding of DNA origami

Tural Aksel, Erik Navarro, Nick Fong, Shawn M. Douglas
Department of Cellular and Molecular Pharmacology, University of California, San Francisco

Email: shawn.douglas@ucsf.edu or turalaksel@gmail.com

Author Contributions: T.A. and S.M.D. conceptualized the project. T.A. performed research, collected data, wrote software, and analyzed data. E.N. performed research and collected data. N.F. provided software support. S.M.D. provided resources, supervised the project, wrote software, analyzed data, and wrote the manuscript with input from all authors.

Competing Interest Statement: The authors declare no competing interests.
Classification: Biological sciences
Keywords: DNA origami, thermodynamics, self-assembly.

## Abstract

We describe design principles for accurate folding of three-dimensional DNA origami. To evaluate design rules, we reduced the problem of DNA strand routing to the known problem of shortest-path finding in a weighted graph. To score candidate DNA strand routes we used a thermodynamic model that accounts for enthalpic and entropic contributions of initial binding, hybridization, and DNA loop closure. We encoded and analyzed new and previously reported design heuristics. Using design principles emerging from this analysis, we redesigned and fabricated multiple shapes and compared their folding accuracy using electrophoretic mobility analysis and electron microscopy imaging. Redesigned shapes showed 6- to 30-fold improvements in yield compared to original designs. We demonstrate accurate folding can be achieved by optimizing staple routes using our model and provide a computational framework for applying our methodology to any design.

## Significance Statement

This study presents a novel approach for designing DNA origami nanostructures with superior folding accuracy. We developed a global optimization algorithm with a thermodynamic scoring function to determine optimal DNA strand routing. This computational framework enables systematic implementation of design principles that enhance folding accuracy. Redesigned DNA origami shapes showed 6- to 30-fold improvements in correctly folded structure yields. By providing both fundamental insights into DNA origami self-assembly and practical tools for improved designs, this work significantly advances the precision and efficiency of DNA origami fabrication. This breakthrough paves the way for more sophisticated applications in nanomedicine, nanoelectronics, and related fields.

## Introduction

In 2006, Rothemund pioneered the DNA origami technique, which utilizes a single-stranded DNA (ssDNA) 'scaffold' and numerous short DNA oligonucleotide 'staples' to orchestrate the self-assembly of predetermined shapes (1). This versatile approach to bottom-up fabrication (2–4) holds the potential to revolutionize manufacturing of nanoscale materials and devices (5–8), including tiny mechanical instruments for biological and biophysical measurements (9–13), and programmable therapeutics (14–16). A bottleneck to realizing many applications with current design approaches is folding accuracy, or how closely the structure of a folded DNA origami shape agrees with the structure of the intended design (17).

Determining generalizable design principles that increase the folding accuracy of DNA origami has been recognized as an important and unsolved challenge (18–23). In 2012, Martin and Dietz identified the "sequence-specific thermal stability of dsDNA domains formed" as an important parameter in folding accuracy, and suggested a design heuristic of maximizing the number of staples with "at least one contiguous scaffold-binding region with Tm > 45°C" (18). In 2015, Dunn et al. used a scaffold containing two tandem copies of a 2,646-nucleotide (nt) sequence to show that DNA origami folding is highly cooperative and can be kinetically controlled via staple design (19). In 2019, Schneider et al. used FRET assays to systematically evaluate the folding sequence and kinetics of 3,172 pairs of dye-labeled staple termini in a 140-staple 42-helix bundle (21). They found further evidence to support cooperative folding, but no simple local design rules emerged for explaining or controlling the sequence of folding events.

With the goal of unlocking high-precision applications of the method, we developed a computational framework that addresses three related problems: how to express design principles as rules that are both human- and machine-readable, how to score the quality of a candidate staple route, and how to implement design rules in a fast, global, and unbiased fashion. We applied our framework to explore several aspects of DNA strand design, and report design principles emerging from this analysis.

Our approach maps an input design onto a weighted graph in which smaller edge weights correspond to better-scoring staple routes, and therefore a shortest-path algorithm can be applied to find a globally optimized set of staple routes. We considered several schemes for scoring the quality of a staple route and observed favorable results using a thermodynamic model that accounts for staple-to-scaffold binding, scaffold loop closure, and hybridization. We used our algorithm to refine several complex lattice-based structures and noted enhanced folding accuracy across all designs when assessed by gel electrophoresis and transmission electron microscopy (TEM). We encapsulated our design principles and algorithm in a computational notebook to enable researchers to analyze and optimize any design.

### Reduction to a known problem

A DNA origami design can be understood as a directed graph in which each node represents a nucleotide and edges represent phosphate-backbone or base-pairing connections between nucleotides. Cadnano, a computer-aided design tool, provides a graphical interface to enable users to manipulate a structured layout of this directed graph (24). Although the directed graph layout provides a useful molecular abstraction, the format is ill-suited for analyzing the underlying design principles. Therefore, we mapped the directed graph onto a separate weighted graph in which nodes represent staple precursor phosphates. Edges in the second graph represent possible design choices that would remove the phosphates at the connected nodes. Edge weights are assigned by a scoring function that estimates the quality of the staple flanked by the removed phosphates. In this framework, paths are sets of design choices that can be mapped back to the directed graph to generate staple paths, and the shortest paths represent globally optimized solutions according to the scoring function.

Our semi-automated approach consists of creating an incomplete design in Cadnano, and then using a computer program to carry out the mapping, staple-route selection, and reverse-mapping steps to generate an output. In the manual step, users draw a scaffold route and invoke the "autostaple" function to generate precursor staples, which are long, often circular, staple routes complementary to the scaffold. Precursor staples can be customized as needed before saving the design as a Cadnano JSON file.

The automated step begins by parsing the input file to construct a directed graph that captures key details of the design: scaffold and staple routes, base-pairing relationships, and the locations of strand termini, strand crossovers, base insertions, and base skips (Fig. 1A). Next, staple precursors found in the directed graph are mapped onto a separate weighted graph whose nodes represent potential breakpoints, or phosphate bonds between nucleotides. Thus, each edge represents a candidate staple route, connecting pairs of breakpoint nodes that, if broken, would yield a valid staple. Each staple precursor forms a subgraph within the weighted graph.

The weighted graph is pruned during construction using a rule-based approach to include or exclude specific types of nodes or edges, thereby implementing design rules in a globally consistent manner. The breakpoint rules we explored are shown in Fig. S1. Edges are pruned to ensure final staple lengths are 21 to 60 nt by default. Design constraints are introduced into the weighted graph to later calculate penalties for global solutions that violate the original design intent by breaking both halves of any double crossover (Fig. 1B). Our algorithm provides support for calculating penalties as small scalar values that can be used to rank potential global staple-breaking solutions with similar absolute scores.
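The length-based pruning rule can be illustrated with a short sketch. This is not the authors' implementation: it assumes a linear precursor whose candidate breakpoints are given as nucleotide offsets from one end (real precursors are often circular), and the names `build_edges`, `MIN_LEN`, and `MAX_LEN` are hypothetical.

```python
from typing import List, Tuple

MIN_LEN, MAX_LEN = 21, 60  # default staple length limits (nt) stated in the text


def build_edges(breakpoints: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) breakpoint pairs that would yield a staple of valid length.

    Assumption: breakpoints are nucleotide offsets along a linear precursor,
    so the staple length is simply end - start.
    """
    pts = sorted(breakpoints)
    return [
        (a, b)
        for i, a in enumerate(pts)
        for b in pts[i + 1:]
        if MIN_LEN <= b - a <= MAX_LEN
    ]


# Toy precursor with six candidate breakpoints.
edges = build_edges([0, 7, 21, 42, 63, 84])
```

Pairs such as (0, 7) or (0, 63) never become edges, so no downstream path can select a staple outside the allowed length range.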

Next, a scoring function is used to assign edge weights (Fig. 1C). Our modular approach can use any function that takes a staple route input and returns a numerical score. Our preferred scoring function is described below. The weighted subgraphs are analyzed using a k-shortest-path algorithm to identify the k shortest path solutions for each subgraph, with k=10 by default (Fig. 1D). Finally, a set of candidate global solutions is assembled. Each global solution is constructed by selecting, at random, one of the k solutions for each subgraph (Fig. 1E). Penalties are assessed for subgraph combinations that violate design constraints. The total score for a global solution is calculated as the sum of edge weights for all staples in the design. The relative best-scoring global solution is reverse-mapped to the directed graph to generate a final Cadnano design, which is output to disk along with a detailed score report (Fig. 1F). Designs can then be synthesized, folded, and subjected to experimental analysis (Fig. 1G).
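The search and assembly steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the toolkit's code: the per-subgraph search is written as a k-best dynamic program over a forward-only (acyclic) graph, which stands in for whatever k-shortest-path routine the software uses, and `k_shortest_paths`, `assemble_global`, `toy_weight`, and the penalty callback are hypothetical names.

```python
import heapq
import random
from typing import Callable, Dict, List, Sequence, Tuple

Solution = Tuple[float, List[int]]  # (summed edge weight, breakpoint path)


def k_shortest_paths(nodes: Sequence[int],
                     edges: Sequence[Tuple[int, int]],
                     weight: Callable[[int, int], float],
                     k: int = 10) -> List[Solution]:
    """Return up to k lowest-cost paths from the first to the last node.

    Assumes every edge points from a lower to a higher node, so visiting
    nodes in sorted order is a valid topological order.
    """
    order = sorted(nodes)
    outgoing: Dict[int, List[int]] = {}
    for a, b in edges:
        outgoing.setdefault(a, []).append(b)
    best: Dict[int, List[Solution]] = {n: [] for n in order}
    best[order[0]] = [(0.0, [order[0]])]
    for a in order:
        best[a] = heapq.nsmallest(k, best[a])  # keep only the k best partial paths
        for b in outgoing.get(a, []):
            w = weight(a, b)
            best[b].extend((cost + w, path + [b]) for cost, path in best[a])
    return best[order[-1]]


def assemble_global(subgraph_solutions: Sequence[List[Solution]],
                    penalty: Callable[[List[Solution]], float],
                    n_trials: int = 100,
                    seed: int = 0) -> Tuple[float, List[Solution]]:
    """Randomly combine one solution per subgraph and keep the lowest total."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        choice = [rng.choice(sols) for sols in subgraph_solutions]
        total = sum(cost for cost, _ in choice) + penalty(choice)
        if best is None or total < best[0]:
            best = (total, choice)
    return best


# Toy subgraph: weights favor 42-nt staples (purely illustrative).
nodes = [0, 21, 42, 63, 84]
edges = [(0, 21), (0, 42), (21, 42), (21, 63), (42, 63), (42, 84), (63, 84)]
toy_weight = lambda a, b: float(abs((b - a) - 42))
paths = k_shortest_paths(nodes, edges, toy_weight, k=3)
best_total, best_choice = assemble_global([paths, paths], penalty=lambda c: 0.0)
```

Because the scoring function is passed in as a callable, any function that maps a candidate staple to a number can be substituted for `toy_weight`.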

### Designing a scoring function

Shortest-path optimization compares the relative scores of candidate staple-routing solutions and returns the relative best-scoring result. However, better scores are only meaningful if they correlate with improved folding accuracy. We aimed to devise a scoring function that encapsulated design parameters that affect the probability of successful folding. We adopted an equilibrium modeling approach, calculating the score of each staple independently in its fully bound state and ignoring folding kinetics and inter-staple cooperativity. The probability of an origami design folding accurately, Porigami, is the product of the folding probabilities of the individual staples, Pstaple, where n is the total number of staples in the design.

$$P_{\mathrm{origami}} = \prod_{i=1}^{n} P_{\mathrm{staple}_i} \tag{1}$$

We used a Boltzmann distribution to express Pstaple in terms of dGtotal, the Gibbs free energy change between the unfolded and folded states, with R and T denoting the molar gas constant and temperature, respectively:

$$P_{\mathrm{staple}} = \frac{e^{-\Delta G_{\mathrm{total}}/RT}}{1 + e^{-\Delta G_{\mathrm{total}}/RT}} \tag{2}$$

We assigned edge weights as logarithmic values of Pstaple. A global quality score, Qorigami, is the sum of the edge weights normalized by L, the total number of base pairs formed by scaffold-staple duplexes:

$$Q_{\mathrm{origami}} = \frac{\sum_{i=1}^{n} \ln P_{\mathrm{staple}_i}}{L} \tag{3}$$
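Equations 2 and 3 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, free energies are assumed to be in kcal/mol, and the default temperature of 50 °C follows the value used for scoring in this work.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def p_staple(dg_total: float, temp_c: float = 50.0) -> float:
    """Eq. 2, rearranged as 1 / (1 + e^(dG/RT)); algebraically the same expression."""
    rt = R_KCAL * (temp_c + 273.15)
    return 1.0 / (1.0 + math.exp(dg_total / rt))


def q_origami(dg_totals: Sequence[float], total_bp: int, temp_c: float = 50.0) -> float:
    """Eq. 3: sum of ln(P_staple) over all staples, normalized by base pairs formed."""
    return sum(math.log(p_staple(dg, temp_c)) for dg in dg_totals) / total_bp


# A staple with dG_total = 0 is half folded; a strongly negative dG_total approaches 1.
half = p_staple(0.0)
stable = p_staple(-5.0)
q_example = q_origami([0.0, 0.0], total_bp=2)
```

Since every ln(P) term is at most zero, Q is never positive, and values closer to zero indicate designs whose staples are predicted to be more fully bound.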

As summarized in Fig. 2B, we estimated dGtotal for each candidate staple route as the sum of three energy terms:

$$\Delta G_{\mathrm{total}} = \Delta G_{\mathrm{bind}} + \sum \Delta G_{\mathrm{loop}} + \sum \Delta G_{\mathrm{hyb}} \tag{4}$$

The dGbind term is the concentration-dependent free-energy change due to the bimolecular binding event when the staple first associates with the scaffold (19). The second term, dGloop, is the energy cost for each scaffold loop closure whereby the staple restricts movement of non-contiguous scaffold segments. Like Dunn et al. (19), we treat the scaffold as a freely jointed chain, and calculate the mean-squared distance between the two scaffold segments to be bridged by the staple. The distance is used to estimate the effective local concentration of one end of the loop at the other and then calculate the dSloop and dGloop values. The third term, dGhyb, is the free-energy gain due to hybridization of the staple to the scaffold (25, 26). Free energy calculations were performed at T = 50 °C, where nearest-neighbor predictions of dG are most accurate (25), and near the temperature intervals where DNA origami folding has been observed empirically (27). We calculated enthalpic (dHtotal) and entropic (dStotal) components for each staple and used the Gibbs free energy equation to estimate a folding temperature, Tfold, for each staple when dGtotal = 0.

$$\Delta G_{\mathrm{total}} = \Delta H_{\mathrm{total}} - T_{\mathrm{fold}}\,\Delta S_{\mathrm{total}} \tag{5}$$

$$T_{\mathrm{fold}} = \frac{\Delta H_{\mathrm{total}}}{\Delta S_{\mathrm{total}}} - 273.15 \quad [^{\circ}\mathrm{C}] \tag{6}$$
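As a worked illustration of Equations 5 and 6, the sketch below computes a folding temperature from hypothetical enthalpy and entropy totals. The input values are invented for the example, and the units (kcal/mol for enthalpy, kcal/(mol K) for entropy) are an assumption chosen so that the ratio is in kelvin.

```python
def t_fold_celsius(dh_total: float, ds_total: float) -> float:
    """Eq. 6: temperature (in degrees C) at which dG_total = dH - T*dS equals zero."""
    if ds_total == 0:
        raise ValueError("dS_total must be nonzero")
    return dh_total / ds_total - 273.15


def dg_total_at(dh_total: float, ds_total: float, temp_c: float) -> float:
    """Gibbs free energy at a given temperature, using the same dH and dS."""
    return dh_total - (temp_c + 273.15) * ds_total


# Hypothetical staple: dH = -100 kcal/mol, dS = -0.3 kcal/(mol*K).
t_fold = t_fold_celsius(-100.0, -0.3)
dg_at_fold = dg_total_at(-100.0, -0.3, t_fold)
dg_at_50 = dg_total_at(-100.0, -0.3, 50.0)
```

For this example the folding temperature lies above 50 °C, so the free energy at 50 °C is negative and the staple is predicted to be mostly bound at the scoring temperature.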

We used Tfold estimates for each staple to generate a 'heatmap' representation using the conventional directed-graph schematic (Fig. 2C). Staples are assigned colors using a cool-warm colormap, with a center point at 50 °C. Values of Tfold > 50 °C appear blue, Tfold ~ 50 °C appear gray, and Tfold < 50 °C appear red. We visualized relative Tfold values, or Tfold - 50 °C.
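The loop-closure term of Eqn. 4 follows a chain of steps (mean-squared distance, effective local concentration, free energy) that can be sketched numerically. The code below is one common textbook form of that calculation for a freely jointed chain, offered as an assumption-laden illustration rather than the model's actual implementation. In particular, the per-nucleotide contour length (0.6 nm), the Kuhn length (1.8 nm), the treatment of the whole loop as single-stranded, and the 1 M reference state are illustrative assumptions, and `dg_loop` is a hypothetical name.

```python
import math

R_KCAL = 1.987204e-3        # molar gas constant in kcal/(mol*K)
N_AVOGADRO = 6.02214076e23  # particles per mole


def dg_loop(n_nt: int, temp_c: float = 50.0,
            nt_length_nm: float = 0.6, kuhn_nm: float = 1.8) -> float:
    """Free-energy cost (kcal/mol) of closing a loop of n_nt scaffold nucleotides.

    Steps: mean-squared end-to-end distance of a freely jointed chain,
    then the Gaussian-chain density of one end at the other, then -RT ln(c_eff).
    """
    mean_sq_nm2 = n_nt * nt_length_nm * kuhn_nm           # <r^2> = contour length * Kuhn length
    density_per_nm3 = (3.0 / (2.0 * math.pi * mean_sq_nm2)) ** 1.5
    c_eff_molar = density_per_nm3 * 1e24 / N_AVOGADRO     # 1 L = 1e24 nm^3
    rt = R_KCAL * (temp_c + 273.15)
    return -rt * math.log(c_eff_molar)                    # relative to an assumed 1 M reference


short_loop = dg_loop(100)
long_loop = dg_loop(400)
```

Under these assumptions, longer loops give a lower effective concentration and therefore a larger penalty, which matches the qualitative behavior described for dGloop.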

## Results

To validate our approach, we redesigned four published multilayer DNA origami shapes, and assessed their folding accuracy compared to the original designs using agarose gel electrophoresis and TEM (Fig. 3). The panel of objects included a 100-helix block (10x10 layout) with square-lattice packing, and three different 64-helix block layouts (4x16, 8x8, and 16x4 layouts) each with honeycomb-lattice packing (18, 24). We added a 'read-only' mode to our algorithm that applies the scoring function to existing designs without breaking any staples and used it to generate Tfold heatmaps (Fig. 3 col. 2, Figs. S2-S5) and strip plots (Fig. 3 col. 3). The estimated mean Tfold values for the original designs were uniformly lower than those for the redesigned versions.

Accurately folded DNA origami shapes tend to migrate on agarose gels as sharp, well-defined bands with increased mobility, while misfolded shapes migrate as diffuse bands with lower mobility along with additional species that indicate aggregation (5). To assess folding accuracy and robustness, we folded each design at magnesium chloride (MgCl2) concentrations varying from 6 to 20 mM (Fig. S6), and quantified the 20 mM condition using ImageJ (28).

We isolated a rectangular region of each lane spanning vertically from the bottom of the well to the area below the fastest-moving band for each design pair, generated a 1D intensity histogram, and manually subdivided the histogram into three regions: Slow, Peak, and Fast. Peaks correspond to the manually determined regions containing the leading band in each lane. Slow and Fast regions run above and below a Peak, respectively. We defined the Yield of accurate folding for each design as the integrated pixel intensity in the region corresponding to the fastest-moving Peak among the original and redesign conditions (in all cases, the redesign), divided by the total intensity of the lane. Our redesigned structures had uniformly higher accurate-folding Yields than the originals. For the 10x10 block, the Yield improved from 2% to 61% (Fig. 3A). For the 64-helix blocks with 4x16, 8x8, and 16x4 configurations, Yields improved from 4% to 48%, 3.6% to 56%, and 6% to 48%, respectively (Fig. 3B-D). All redesigned variants displayed some degree of accurate folding at MgCl2 concentrations as low as 10 mM (Fig. S6).
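The Yield calculation amounts to integrating a region of a 1D lane profile and dividing by the lane total. The sketch below illustrates that arithmetic on an invented profile; the numbers, region boundaries, and the function name `band_yield` are hypothetical, and the actual quantification was performed in ImageJ.

```python
from typing import Sequence


def band_yield(profile: Sequence[float], start: int, stop: int) -> float:
    """Fraction of total lane intensity that falls in profile[start:stop]."""
    total = float(sum(profile))
    if total <= 0:
        raise ValueError("lane profile has no signal")
    return sum(profile[start:stop]) / total


# Invented lane profile, ordered from the well (index 0) toward the gel front.
lane = [1, 1, 2, 1, 1, 3, 10, 12, 9, 2, 1, 1]
slow = band_yield(lane, 0, 6)    # material migrating above the Peak
peak = band_yield(lane, 6, 9)    # manually chosen Peak region
fast = band_yield(lane, 9, 12)   # material migrating below the Peak
```

Applying the same Peak boundaries to both lanes of a design pair keeps the comparison between the original and the redesign on a common footing.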

We physically extracted Peak bands from the lanes containing the redesigned shapes and imaged them via TEM to confirm that the structures were intact and accurately folded (Fig. 3, column 5 and Figs. S7-S10). Micrographs of the unimproved designs can be found in the original publications for each structure (18, 24).

### Key design principles

The original and redesigned shapes exhibited folding yields that correlated with global Q scores provided by our algorithm (Eqn. 3; Fig. 4A). To identify key design principles and related tunable design parameters, we analyzed the original and redesigned staple routes to identify the relative contributions of the dGhyb, dGloop, and dGbind terms to the overall dGtotal score.

### Balancing hybridization and loop-closure terms

Accurate folding requires staple routes that balance the enthalpically favorable formation of staple-scaffold duplexes via hybridization (dGhyb) with entropic penalties of scaffold loop-closures (dGloop). Fig. 4B shows the median dGhyb and dGloop values and how they changed between the original and redesigned shapes. Per-staple thermodynamic plots are shown in Fig. S11. Our model enables the quantitative comparison of specific design choices, including scaffold routing and passivation strategies. In the case of the 10x10 brick, both dGhyb and dGloop values improved markedly in our redesign (Fig. S2). For the remaining architectures, our algorithm determined staple routes with improved dGhyb values (ddGhyb ranged from -8.5 to -12.7 kcal/mol), at the cost of slightly less favorable dGloop terms (ddGloop of +0.9 to +2.8 kcal/mol).

### Staple Concentration

The first term in our model, dGbind, accounts for the relative concentrations of scaffold and staple strands. For all lab tests we used fixed concentrations of 10 nM scaffold and 100 nM for each staple; therefore, all edge weights included an identical dGbind value (10.3 kcal/mol). Our model predicts that further increasing a staple's concentration, for example from 100 nM to 1000 nM, will reduce its dGbind penalty by about 1.5 kcal/mol (Fig. 4C). Because this gain is small, increasing the molar ratio of staples to scaffold above 10:1 is unlikely to significantly improve folding accuracy. However, there may be benefits to tuning the concentrations of subsets of staples that exhibit cooperative binding. Future models incorporating staple cooperativity will be essential for exploring the effects of staple concentration in greater depth.
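The size of the concentration effect can be checked with a short calculation. The sketch assumes the binding term depends on staple concentration as -RT ln(c), which is a simplification of the model's dGbind term; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def dg_bind_simple(conc_molar: float, temp_c: float = 50.0) -> float:
    """Simplified binding penalty, -RT ln(c), relative to an assumed 1 M reference."""
    rt = R_KCAL * (temp_c + 273.15)
    return -rt * math.log(conc_molar)


def ddg_bind(c_old_molar: float, c_new_molar: float, temp_c: float = 50.0) -> float:
    """Change in the simplified penalty when the staple concentration changes."""
    return dg_bind_simple(c_new_molar, temp_c) - dg_bind_simple(c_old_molar, temp_c)


penalty_100nm = dg_bind_simple(100e-9)
change_10x = ddg_bind(100e-9, 1000e-9)
```

Under this simplified form, a tenfold increase in concentration changes the penalty by RT ln(10) at 50 °C, in line with the reduction of about 1.5 kcal/mol quoted above, and the 100 nM penalty comes out near the 10.3 kcal/mol value.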

### Scaffold Permutation

The sequence register of circular ssDNA scaffolds can be permuted within a design without modifying the staple routes. Staple dGhyb terms can vary across permutations, affecting the global quality score, Qorigami. We modeled the effect of permuting the scaffold sequence to every possible register of each original and redesigned structure (Fig. 4D, Fig. S12A), and observed max-min spreads of up to 3.4 kcal/mol between the median dGhyb terms, and spreads up to 0.03 in Qorigami score. We conclude that post staple-routing permutation analysis may improve folding accuracy, but the benefit is small relative to optimizing the staple routes. For comparison, the Qorigami values in our redesigned shapes improved by 0.27 (16x4 block) to 0.56 (10x10 block) per base pair.
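The permutation scan can be sketched as a loop over every rotation of the circular scaffold sequence while the staple-binding intervals stay fixed. The scoring function below is a deliberately crude stand-in (a G/C count) and not the dGhyb calculation used in this work; all names and the toy sequence are hypothetical.

```python
from typing import Iterator, List, Tuple


def registers(scaffold: str) -> Iterator[Tuple[int, str]]:
    """Yield (offset, rotated sequence) for every register of a circular scaffold."""
    for offset in range(len(scaffold)):
        yield offset, scaffold[offset:] + scaffold[:offset]


def toy_score(sequence: str, domains: List[Tuple[int, int]]) -> int:
    """Stand-in score: total G/C count over fixed (start, length) binding intervals."""
    return sum(
        sum(base in "GC" for base in sequence[start:start + length])
        for start, length in domains
    )


def score_spread(scaffold: str, domains: List[Tuple[int, int]]) -> int:
    """Max-min spread of the toy score across all scaffold registers."""
    scores = [toy_score(seq, domains) for _, seq in registers(scaffold)]
    return max(scores) - min(scores)


# One 4-nt binding interval on an 8-nt toy scaffold.
spread = score_spread("GGGGAAAA", [(0, 4)])
```

Replacing `toy_score` with a thermodynamic hybridization score would give the kind of per-register comparison described in this section.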

### Scaffold Sequence

To assess the prospect of leveraging design-specific custom scaffold sequences to enhance folding accuracy, we modeled the sequence-tunable dynamic range of the dGhyb term (Fig. 4E). We generated two sets of random duplex sequences with lengths varying from 4 to 20 nucleotides and GC content of 0-60%, the typical range offered by commercial DNA synthesis vendors. dGhyb values of a 14-nt duplex, for example, differed by up to 8 kcal/mol based on GC content, suggesting that duplex-level sequence customization may offer a powerful complement to staple-breakpoint optimization for enhancing folding accuracy. Sequence customization will require design and synthesis of design-specific scaffolds (29, 30).

Fig. 4F summarizes the potential ddGtotal, in kcal/mol, that can be achieved by modifying key design and experimental parameters, along with ddG ranges for the individual energy terms in our staple scoring function (Eqn. 4).

### Scaffold Routing

Our model can be used to compare alternative scaffold routes for a design. We analyzed two versions of the 4x16 block (Fig. S13). The redesign and the "staggered" redesign had similar Q scores (-0.217 and -0.241, respectively) and comparable gel yields. The non-staggered version performed slightly better in both predicted folding accuracy and measured yield on the gel, although the difference was modest.

Scaffold routing can have a significant impact on folding energetics. For instance, in our redesign of the 10x10 block, we modified the original highly staggered scaffold seam, resulting in a favorable ddGloop (Fig. 4B), contributing to the substantial increase in folding yield observed for this structure.

These results suggest that while small variations in scaffold routing may have minimal effects on overall folding accuracy, more substantial changes can lead to meaningful improvements. Our model provides a quantitative framework for evaluating such design choices, allowing for informed decisions in scaffold routing strategies.

### Limitations and outlook

Our model, while demonstrating significant improvements, is inherently incomplete and does not account for all factors determining accurate folding. The yields of our redesigns still show 39-52% misfolded populations, indicating room for further optimization. Important factors such as folding kinetics and cooperativity, which likely significantly affect folding accuracy, are not captured in our current model. Our equilibrium modeling approach does not capture the thermal annealing process and instead uses a fixed temperature of 50 °C for all scoring function calculations. Future work extending our method to incorporate a nonequilibrium framework that captures the evolution of folding over time and temperature could offer benefits such as improvements in folding accuracy, prediction of optimal folding temperatures, and reduction of annealing times.

## Discussion

DNA origami design, like de novo protein design, presents a vast solution space. Unlike proteins, the geometric rules of DNA origami self-assembly have allowed for designing shapes without modeling the energetics of the folded state. However, the increasing demand for highly precise and repeatable fabrication in DNA origami applications signals an evolution toward computational modeling approaches analogous to those used in protein design.

Our study presents design principles that enhance the folding accuracy of DNA origami by integrating a global optimization algorithm with a robust thermodynamic model. The predictive capability of our scoring function, rooted in the thermodynamic stability of staple-scaffold interactions, has been empirically validated through electrophoretic mobility analysis and TEM imaging. In addition to the shapes reported here, our design principles have been successfully applied to generate precision instruments for probing cell activation and protein structure (31, 32).

Here, we focused on multilayer lattice-based DNA origami structures and assessed folding accuracy by gel mobility and negative-stain TEM. Other DNA origami architectures, such as polygonal meshes, wireframe polyhedra, or curved shapes may require customized models for optimization. Future studies may also benefit from additional methods for characterizing folding accuracy, such as cryo-EM structural analysis or design-specific functional assays.

Our methodology and computational toolset promise to broaden the horizons for the design of nanoscale constructs and pave the way for sophisticated applications previously challenging to achieve, from medical to electronic and photonic nanodevices. As an immediate practical benefit, improved folding accuracy and yield will lower costs for large-scale DNA origami manufacturing.

## Materials and Methods

### Origami design, optimization, and fabrication

DNA origami shapes were designed in Cadnano (24). Staple-route auto-breaking and analysis were performed using a custom software toolkit; links are provided in the Code and Data Availability section. Folding reactions included 10 nM scaffold ssDNA (Tilibit), 100 nM of each staple (IDT), and 1X FOB20 (5 mM Tris-Base, 1 mM EDTA, 5 mM NaCl, 20 mM MgCl2 at pH 8.0). Magnesium-dependent stability was assessed using folding buffers containing 6 to 20 mM MgCl2 (Fig. S6). Folding reactions were performed using a Bio-Rad Tetrad2 thermal cycler. We used the following temperature annealing ramp: (1) Incubate at 65 °C for 10 min; (2) Incubate at 60 °C for 1 h, decrease by 1.0 °C every cycle; (3) Return to step 2 an additional 20 times. Between steps 1 and 2, the temperature decreased from 65 °C to 60 °C at the maximum rate of the thermal cycler, which took about 3 seconds. Starting at step 2, each cycle is 1 hour in duration (i.e., 1 hour at 60 °C, 1 hour at 59 °C, etc.). Each 1 °C transition takes a little over 1 second. Total annealing program time is 21 hours 10 minutes 32 seconds. After the final step, the structures are allowed to cool to room temperature in an uncontrolled fashion.
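The annealing ramp can be written out as an explicit list of holds, which makes the total program time easy to check. The sketch below encodes one reading of the protocol (an initial hold plus 21 one-hour holds that step down by 1 °C each), which implies a final hold at 40 °C; the function name is hypothetical, and the few seconds spent on temperature transitions are ignored.

```python
from typing import List, Tuple


def annealing_schedule(extra_cycles: int = 20) -> List[Tuple[float, int]]:
    """Return (temperature in degrees C, hold time in minutes) for each step."""
    steps = [(65.0, 10)]  # step 1: initial incubation
    # Step 2 runs once, then repeats `extra_cycles` more times, dropping 1 degree each cycle.
    steps += [(60.0 - i, 60) for i in range(extra_cycles + 1)]
    return steps


schedule = annealing_schedule()
hold_minutes = sum(minutes for _, minutes in schedule)
hours, minutes = divmod(hold_minutes, 60)
final_temp = schedule[-1][0]
```

The summed hold time is 21 hours 10 minutes, which agrees with the stated total program time once the transition intervals of roughly half a minute in total are added.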

### Gel analysis

Origami were analyzed by electrophoresis on 2% agarose gels in Tris-borate-EDTA (45 mM tris-borate and 1 mM EDTA) supplemented with 11 mM MgCl2 and SYBR Safe. After sample loading, gels were run for 3 h at 80 V and subsequently scanned using a Typhoon FLA imager. Redesigned structures were purified by excising the desired gel band with a razor blade, crushing the gel slice to break down the agarose, and centrifuging it through a Freeze 'N Squeeze column (Bio-Rad).

### TEM characterization

For negative-stain grid preparation, 5 µL of gel-purified DNA origami sample was deposited onto a glow-discharged thin-carbon-coated grid and incubated for 1 min. Excess liquid was wicked away using filter paper. Next, 10 µL of freshly prepared 2% uranyl formate (Electron Microscopy Sciences) was applied to the grid and immediately wicked away using filter paper. A second round of 10 µL of 2% uranyl formate was then applied to the grid for 3 min before excess liquid was wicked away and the grid was left to dry. Micrographs of the negatively stained grids were collected on a Tecnai T12 (FEI) at 30,000x magnification.

## Code and Data Availability

The source code for the Autobreak model is available under an open-source license at:
https://github.com/douglaslab/pyOrigamiBreak

A backup of the source code repository is archived at:
https://zenodo.org/records/13916388

Cadnano designs and scaffold sequence files are available at:
https://github.com/douglaslab/cadnano-designs/tree/main/2024autobreak

A computational notebook for running the code is available at:
https://colab.research.google.com/drive/1wRAO8LdY5XCeuZfsmvdUHlJWMBQmQBVa

## Acknowledgments

This work was supported by ONR grant N00014-17-1-2627, NIH grant R35GM125027, and NSF grant DBI-1548297. T.A. was supported by the Ruth L. Kirschstein NRSA Postdoctoral Fellowship grant F32GM119322.

## References

1. P. W. K. Rothemund, Folding DNA to create nanoscale shapes and patterns. Nature 440, 297-302 (2006).
2. T. Gerling, K. F. Wagenbauer, A. M. Neuner, H. Dietz, Dynamic DNA devices and assemblies formed by shape-complementary, non-base pairing 3D components. Science 347, 1446-1452 (2015).
3. G. Tikhomirov, P. Petersen, L. Qian, Fractal assembly of micrometre-scale DNA origami arrays with arbitrary patterns. Nature 552, 67-71 (2017).
4. C. M. Wintersinger, et al., Multi-micron crisscross structures grown from DNA-origami slats. Nat. Nanotechnol. 18, 281-289 (2022).
5. S. M. Douglas, et al., Self-assembly of DNA into nanoscale three-dimensional shapes. Nature 459, 414-418 (2009).
6. D. Han, et al., DNA gridiron nanostructures based on four-arm junctions. Science 339, 1412-1415 (2013).
7. E. Benson, et al., DNA rendering of polyhedral meshes at the nanoscale. Nature 523, 441-444 (2015).
8. G. Posnjak, et al., Diamond photonic crystals assembled from DNA origami. arXiv [physics.app-ph] (2023).
9. A. Rajendran, M. Endo, H. Sugiyama, Single-molecule analysis using DNA origami. Angew. Chem. Int. Ed Engl. 51, 874-890 (2012).
10. J. J. Funke, H. Dietz, Placing molecules with Bohr radius resolution using DNA origami. Nat. Nanotechnol. 11, 47-52 (2016).
11. A. Shaw, et al., Binding to nanopatterned antigens is dominated by the spatial tolerance of antibodies. Nat. Nanotechnol. 14, 184-190 (2019).
12. R. Veneziano, et al., Role of nanoscale antigen organization on B-cell activation probed using DNA origami. Nat. Nanotechnol. (2020). https://doi.org/10.1038/s41565-020-0719-0.
13. A. Gopinath, et al., Absolute and arbitrary orientation of single-molecule shapes. Science 371 (2021).
14. S. M. Douglas, I. Bachelet, G. M. Church, A logic-gated nanorobot for targeted transport of molecular payloads. Science 335, 831-834 (2012).
15. S. Li, et al., A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nat. Biotechnol. 36, 258-264 (2018).
16. C. Sigl, et al., Programmable icosahedral shell system for virus trapping. Nat. Mater. 20, 1281-1289 (2021).
17. S. Dey, et al., DNA origami. Nature Reviews Methods Primers 1, 1-24 (2021).
18. T. G. Martin, H. Dietz, Magnesium-free self-assembly of multi-layer DNA objects. Nat. Commun. 3, 1103 (2012).
19. K. E. Dunn, et al., Guiding the folding pathway of DNA origami. Nature 525, 82-86 (2015).
20. F. Dannenberg, et al., Modelling DNA origami self-assembly at the domain level. J. Chem. Phys. 143, 165102 (2015).
21. F. Schneider, N. Moritz, H. Dietz, The sequence of events during folding of a DNA origami. Science Advances 5, eaaw1412 (2019).
22. J. M. Majikes, et al., Revealing thermodynamics of DNA origami folding via affine transformations. Nucleic Acids Res. 48, 5268-5280 (2020).
23. J. Wang, et al., Probing Heterogeneous Folding Pathways of DNA Origami Self-Assembly at the Molecular Level with Atomic Force Microscopy. Nano Lett. 22, 7173-7179 (2022).
24. S. M. Douglas, et al., Rapid prototyping of 3D DNA-origami shapes with caDNAno. Nucleic Acids Res. 37, 5001-5006 (2009).
25. J. SantaLucia Jr, D. Hicks, The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 33, 415-440 (2004).
26. R. Owczarzy, B. G. Moreira, Y. You, M. A. Behlke, J. A. Walder, Predicting Stability of DNA Duplexes in Solutions Containing Magnesium and Monovalent Cations. Biochemistry 47, 5336-5353 (2008).
27. J.-P. J. Sobczak, T. G. Martin, T. Gerling, H. Dietz, Rapid Folding of DNA into Nanoscale Shapes at Constant Temperature. Science 338, 1458-1461 (2012).
28. M. D. Abramoff, P. J. Magalhaes, S. J. Ram, Image processing with ImageJ. Biophotonics Int. 11, 36-42 (2004).
29. P. M. Nafisi, T. Aksel, S. M. Douglas, Construction of a novel phagemid to produce custom DNA origami scaffolds. Synth. Biol. 3 (2018).
30. K. Shen, et al., Engineering an Escherichia coli strain for production of long single-stranded DNA. Nucleic Acids Res. (2024). https://doi.org/10.1093/nar/gkae189.
31. R. Dong, et al., DNA origami patterning of synthetic T cell receptors reveals spatial control of the sensitivity and kinetics of signal activation. Proceedings of the National Academy of Sciences 118, e2109057118 (2021).
32. T. Aksel, Z. Yu, Y. Cheng, S. M. Douglas, Molecular goniometers for single-particle cryo-electron microscopy of DNA-binding proteins. Nat. Biotechnol. (2020). https://doi.org/10.1038/s41587-020-0716-8.

## Figures

Figure 1. Overview of our staple-routing algorithm. (A) The user prepares a Cadnano design, leaving auto-generated staple precursors unbroken. Based on user input for allowed staple breakpoints, our script maps the input design to a weighted graph in which nodes represent candidate breakpoints, and edges represent staples that terminate at the flanking breakpoints. Each staple precursor maps to an individual subgraph. (B) Graph constraints are stored in order to later penalize route solutions that violate design intent, e.g., break both halves of a crossover. (C) Edge weights are calculated using the designated scoring function. (D) The k-shortest-paths algorithm is applied to the precursor subgraphs. (E) Subgraphs are assembled into candidate solutions, and constraint violations are penalized. (F) The relative best-scoring solution is used to output a Cadnano file implementing the corresponding staple routes. The output staple colors correspond to normalized values derived from the scoring function. (G) Design performance is analyzed using a folding performance screen followed by agarose gel electrophoresis.

Figure 2. A three-term thermodynamic model is used to calculate edge weights. (A) In the precursor subgraphs, edges connect pairs of breakpoint nodes (red circles) that flank a possible staple route. Here, the candidate staple (boxed) is complementary to four distinct stretches of the scaffold (1, 2, 3, 4). (B) dGtotal values are calculated from the sum of three thermodynamic terms representing the meeting of the staple and scaffold (dGbind), each "loop closure" that constrains the 3D movement of two non-contiguous stretches of the scaffold (dGloop), and the free-energy gain due to hybridization between the staple and scaffold (dGhyb). (C) Optimized solutions are reverse-mapped to a 'heatmap' directed-graph layout with staples colored by the predicted folding temperature Tfold. A strip plot displays the Tfold values relative to 50 °C. Staples predicted to be less folded at T = 50 °C are colored red; more-folded staples are colored blue.

Figure 3. Experimental comparison of original and redesigned shapes. Four helix layouts were analyzed: (A) 10x10 square-lattice 100-helix block, (B) 4x16 block, (C) 8x8 block, and (D) 16x4 block. 64-helix blocks (B-D) used honeycomb lattice layouts. The first column shows the design cross-sections. The second column shows global quality scores per base pair (bp⁻¹) (Eqn. 3). The third and fourth columns show two representations of per-staple folding temperature (Tfold) estimates: a heatmap showing the staple routes and a strip plot with Tfold values relative to 50 °C, with '~' indicating median value. Staples predicted with Tfold < 50 °C are colored red, and Tfold > 50 °C are colored blue. The fifth column shows electrophoretic mobility analysis of samples folded in the presence of 20 mM MgCl2. 'Yield' is the integrated intensity of boxed regions divided by the total intensity of the lane. 'Peak' regions for redesigns were physically extracted and imaged by TEM; representative micrographs are shown in the sixth column.

Figure 4. Analysis and modeling of key design parameters. (A) Global normalized design scores and gel yields correlate with R² = 0.8615. (B) Plots of mean dGhyb and dGloop values for original and redesigned blocks show the relative contributions of hybridization and loop-closure to dGtotal. Staple concentrations were fixed across all folding reactions at 100 nM, resulting in a 10:1 ratio to scaffold concentration and a calculated value of dGbind = 10.383 kcal/mol for each staple. Bar plot shows ddG terms, or the change in median dG values between the original and redesigned shapes. (C) Increasing the concentration of a staple relative to the scaffold can marginally reduce its dGbind penalty term. (D) Global Qorigami scores of every scaffold permutation form tight distributions. (E) Customizing the length and sequence of duplexes formed between staple segments and the scaffold provides a wide range of possible dGhyb values. (F) 1D plot of per-staple dynamic ranges, in change in kcal/mol, of user-designable parameters and corresponding energy terms from our thermodynamic model.