Agent-Based Models of Transposable Elements: Lessons from Ecology

Research Summary

Transposable elements (TEs) are mobile genetic elements that have traditionally been viewed as genomic parasites, yet their persistence and accumulation patterns challenge conventional evolutionary explanations. This work presents an improved version of the agent-based model described by Kremer et al., addressing both its technical limitations and its biological abstractions. The model supports the previous finding that TEs can persist through ecosystem engineering, creating their own habitat by preferentially inserting into inactive TE copies, even under strong negative selection and without requiring co-evolution or horizontal transfer. Key innovations include the incorporation of silencing mechanisms, multi-faceted fitness impacts, and hyperparasitism parameters that model complex TE-TE interactions such as SINE-LINE relationships. Technical improvements, including efficient data structures, vectorized operations, GPU acceleration, and JIT compilation, enable simulation of larger, more complex genomic systems over longer timescales. This work operationalizes the “genome ecology” analogy by enabling direct investigation of the emergent properties of TE-host interactions, including predator-prey dynamics and hyperparasitism. The model contributes to understanding biological agency by demonstrating how TE communities may exhibit self-regulating behaviors and context-dependent responsiveness. An interactive simulation component serves as both a research tool and an educational resource, making complex genomic dynamics accessible across disciplines. This computational framework bridges theoretical ecology and evolutionary biology, providing new insights into genome evolution and transposon biology.

Introduction

Transposable elements (TEs) are a class of mobile genetic elements found ubiquitously across the tree of life¹. Since their discovery, TEs have generally been considered parasitic, being deleterious at the host level^1,2. There is some validity to this view: high TE expression is associated with chromosomal rearrangements and gene disruption through insertional mutagenesis. However, there are also cases of TEs being co-opted by the host for organism-level functions, such as formation of the placenta³. Host-level management of TE expression varies between organisms, but in eukaryotes it generally involves silencing through methylation and heterochromatinization^4–6. TEs are categorized at multiple levels, with class based on the transposition intermediate and superfamily/family based on phylogenetic traits⁷. Briefly, class I TEs (retrotransposons) transpose through a ‘copy-and-paste’ mechanism: the element is transcribed to RNA, its encoded proteins are translated, and the RNA is reverse-transcribed into DNA that is re-inserted into the genome. Class II TEs (DNA transposons) generally transpose through a ‘cut-and-paste’ mechanism of excision and re-insertion. Not all TEs encode their own transposition machinery; these non-autonomous TEs instead parasitize the machinery of autonomous TEs¹. Traditionally, questions about the evolutionary impact of TE copy number on host fitness have been addressed with equilibrium equations and ordinary differential equations^8–10.

However, these methods struggle to capture complex stochastic interactions between TEs and the host on shorter timescales^11,12. Growing interest from the TE research community in applying ecological frameworks to the genomic environment has created a need for new mathematical approaches^13–15. Agent-based models (ABMs) have shown clear utility in economics, ecology, and oncology for studying the emergent properties of bottom-up stochastic systems^11,12. Moreover, ABMs are intuitive and easy to visualize, making them a valuable teaching tool in a domain that draws on a wide array of disciplines, including evolutionary biology, ecology, and philosophy. An ABM is a simulation of one or more populations of autonomous agents (also called individuals or entities) interacting within a system, where the agents' behaviours and interactions are constrained by one or more rules¹². In each iteration, agents interact with each other and with the system, and an event occurs if one or more conditions are met. The parameters defining agent behaviours can be fixed values or drawn from a probability distribution. The result is a model that is simple to specify yet versatile, stochastic, and context-aware, which suits questions about emergent system properties that are themselves often stochastic and context-dependent.
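
The iteration described above can be sketched in a few lines of Python. This is a generic illustration, not code from the model discussed here: the `Agent` class, the `event_prob` parameter, the uniform distribution, and the shared `capacity` limit are all hypothetical. It shows agent parameters that are either fixed or drawn from a distribution, and an event that fires only when its conditions (a stochastic trigger plus spare system capacity) are met.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Generic agent; `event_prob` is a hypothetical per-iteration event probability."""
    event_prob: float
    events: int = 0


def make_agents(n, fixed_prob=None, low=0.0, high=0.2, rng=None):
    """Give all agents one fixed value, or draw each agent's value from a uniform distribution."""
    rng = rng or random.Random()
    if fixed_prob is not None:
        return [Agent(fixed_prob) for _ in range(n)]
    return [Agent(rng.uniform(low, high)) for _ in range(n)]


def step(agents, rng, capacity):
    """One iteration: an event occurs only if the agent's stochastic trigger fires
    AND the system still has capacity (a simple agent-system interaction)."""
    fired = 0
    order = list(agents)
    rng.shuffle(order)  # random update order so no agent is always first in line
    for agent in order:
        if fired < capacity and rng.random() < agent.event_prob:
            agent.events += 1
            fired += 1
    return fired


def run(agents, iterations, capacity, seed=0):
    """Run several iterations and return the number of events in each."""
    rng = random.Random(seed)
    return [step(agents, rng, capacity) for _ in range(iterations)]
```

For example, `run(make_agents(100, rng=random.Random(1)), iterations=10, capacity=5)` returns the number of events per iteration. Shuffling the update order each iteration keeps the capacity limit from always favouring the agents at the front of the list.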

In previous studies, Kremer et al. built an ABM of TEs in a simplified asexual prokaryotic genome^15,16. The primary agents are TEs of a single autonomous class, which all share the same transposition and mutation rates. The secondary agents are genes, which are also all identical. At each iteration, each TE may transpose (into either a gene or an intergenic region) or mutate, and each of these events has some fitness impact. This simulation provided interesting results but had significant limitations, including code inefficiencies and major biological abstractions. I present a model that addresses both limitations and can be applied to either prokaryotes or eukaryotes. To guide the extension of this model to eukaryotes, Octopus was selected as a genus of interest. Octopus spp. are of significant interest in TE research due to their high TE expression in cerebral tissues, which is hypothesized to be linked to their cognitive complexity and intelligent behaviour⁶.
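
A toy version of the per-iteration rule summarized above may help fix ideas. It is a minimal sketch under stated assumptions, not Kremer et al.'s implementation: target sites are chosen uniformly, a gene hit multiplies host fitness by a hypothetical `gene_penalty`, and a mutation is assumed to permanently inactivate the mutated copy. The names `ToyGenome` and `iterate` are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyGenome:
    gene_bp: int          # total gene sequence (all genes identical in this toy)
    intergenic_bp: int    # non-gene sequence available for insertion
    active_tes: int = 1   # copies still able to transpose
    inactive_tes: int = 0
    fitness: float = 1.0  # multiplicative host fitness


def iterate(g, p_transpose, p_mutate, gene_penalty, rng):
    """One iteration for a single class of identical TEs (illustrative rules only)."""
    p_gene = g.gene_bp / (g.gene_bp + g.intergenic_bp)  # uniform choice of target site
    new_copies = 0
    newly_inactive = 0
    for _ in range(g.active_tes):
        if rng.random() < p_transpose:
            new_copies += 1
            if rng.random() < p_gene:
                g.fitness *= 1.0 - gene_penalty  # insertion disrupted a gene
        if rng.random() < p_mutate:
            newly_inactive += 1  # assumed: a mutation permanently inactivates the copy
    g.active_tes += new_copies - newly_inactive
    g.inactive_tes += newly_inactive
    return g
```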

Methodology

To construct the agent-based model, a similar prokaryotic model built by Dr. Stefan Kremer (University of Guelph) was used as a reference¹⁹. The improvements are summarized in Table 2 and benchmarks in Figure X. In the new model, TEs can be sampled from multiple classes, including non-autonomous classes and the autonomous classes they parasitize. TEs can be modelled as individuals (parameters sampled from a distribution) or as a group (identical parameters), and a silencing mechanism has been added. Genes can contain multiple subcomponents, including promoters, open reading frames, exons, introns, and insulators, each with a different fitness impact upon TE insertion. To benchmark the improved model, the original was rewritten in Python 3. Files necessary to replicate this work are available at https://github.com/noahzeidenberg/TE-Agents2.
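
One way the extended agent description could be represented is sketched below, assuming a plain Python data model. The class and function names, the Gaussian sampling of individual rates, and the multipliers in `SUBCOMPONENT_FITNESS` are hypothetical placeholders, not values or APIs from TE-Agents2. The sketch contrasts group mode (all copies share one rate) with individual mode (rates sampled per copy), records which autonomous family a non-autonomous family depends on, and applies a per-copy silencing probability.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical fitness multipliers applied when a TE inserts into each gene
# subcomponent (placeholder values, not estimates from data).
SUBCOMPONENT_FITNESS = {
    "promoter": 0.90,
    "orf": 0.80,
    "exon": 0.85,
    "intron": 0.99,
    "insulator": 0.95,
    "intergenic": 1.00,
}


@dataclass(frozen=True)
class TEFamily:
    name: str
    autonomous: bool
    depends_on: Optional[str]  # autonomous family whose machinery is borrowed, if any
    mean_rate: float           # mean per-iteration transposition probability
    rate_sd: float             # spread used only when copies are modelled as individuals
    silencing_prob: float      # per-iteration chance that a copy is silenced


@dataclass
class TECopy:
    family: str
    rate: float
    silenced: bool = False


def spawn_copies(family, n, individual, rng):
    """Group mode: every copy shares mean_rate. Individual mode: each rate is sampled."""
    if not individual:
        return [TECopy(family.name, family.mean_rate) for _ in range(n)]
    # Gaussian draws are clamped to [0, 1] so they remain valid probabilities.
    return [
        TECopy(family.name, min(1.0, max(0.0, rng.gauss(family.mean_rate, family.rate_sd))))
        for _ in range(n)
    ]


def apply_silencing(copies, family, rng):
    """Silence each still-active copy with the family's silencing probability."""
    for te in copies:
        if not te.silenced and rng.random() < family.silencing_prob:
            te.silenced = True


def insertion_fitness(subcomponent):
    """Fitness multiplier for an insertion into the given gene subcomponent."""
    return SUBCOMPONENT_FITNESS[subcomponent]
```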

Table: Summary of code functions, problems in TE_World2, and implemented solutions.

Code Function	Original Implementation and Associated Issues	Solution
Find elements at a genomic position	Linear search (O(n) per query) • Speed decreases significantly as genome size and element count increase • Slow overlap queries	IntervalTree data structure (O(log n + k) per query, where k is the number of overlapping elements) • Efficient range queries and overlap detection • Scales well with increased genome complexity
Element storage	Each element stored as a separate Python object with individual attributes • High memory overhead • Poor cache locality	Packed arrays • Lower memory overhead and faster indexing
Tracking which genomic positions are occupied	Boolean arrays and lists • At least 8 bits per position • Large genomes span 100+ Mb, so this consumes significant memory	Bit arrays • 1 bit per position • Fast occupancy queries • Reduced memory bandwidth
Element Processing	For loops • Inherently slow due to interpreter overhead	Vectorized operations with NumPy • Massive speedup for mathematical operations • More efficient CPU and memory usage
Function Compilation	Native Python • Python interpreter is inherently slow even with vectorization	Just-in-time (JIT) compilation via Numba • Functions compiled to machine code, yielding higher performance after a one-time compilation cost
Object Allocation	Frequent (de)allocation of TE and Gene objects • Object allocation is computationally expensive • Frequent allocation causes memory fragmentation • Frequent (de)allocation induces garbage collection pressure	Object pooling enables reuse rather than destruction/recreation
Chromosome Copying	Full copying • Most inherited chromosomes are initially identical and may never be modified • Full copying wastes memory and time	Copy-on-write semantics for data sharing until modification is needed • Memory use scales with mutations/silencing rather than population size • Faster inheritance operations
Memory Management	No inherent management or cap for available RAM • Large simulations could exceed available RAM, causing swapping and performance loss	Explicitly memory-mapped data structures • Enables simulations larger than available RAM • Managed automatically by the operating system
Parallel Processing	None • Extremely slow and inefficient for identical fitness calculations applied at scale • Significant loop overhead and function calls	Vectorized fitness calculations • Entire populations are processed simultaneously
Collision detection (e.g. a TE inserts into a gene)	Exception-based control flow • Very computationally expensive in Python • Significant speed bottleneck	Return-based control flow and status flagging • Faster collision detection and case handling • Reduced overhead in insertion operations
Population statistics calculations	Recalculated each access • Recalculation for unnecessary statistics adds significant overhead	Cached statistics and lazy evaluation • Faster access to frequently-used statistics and reduced redundancy • Memory is only used when statistics are needed
Statistics logging	Detailed logging and data collection each generation • Detailed statistics aren’t needed every generation • I/O operations can easily bottleneck • Long simulations result in massive logs	Configurable collection frequency; every N iterations • Reduced I/O overhead • Detail and performance trade-off is up to the user • Greater scalability for long simulations
GPU utilization	None • Many operations can be accelerated on a GPU	Automated GPU detection and use in generating probability tables • Significant boost to speed and efficiency • Memory offload to VRAM
Timeout Protection	None • Infinite loops or hangs waste computational resources	Added timeout protection to long-running operations; gzipped save-states • Can restart from save-states without total data-loss • Automatic recovery and retry logic • Greater fault-tolerance
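
The bit-array row of the table can be illustrated with a stdlib-only sketch that packs one occupancy flag per base into a `bytearray`. The class and method names are hypothetical and are not taken from TE-Agents2. At one bit per position, a 100 Mb genome needs 12.5 MB of occupancy storage, compared with 100 MB for a one-byte-per-position Boolean array.

```python
class OccupancyBits:
    """Per-base occupancy flags packed 8 per byte (hypothetical helper)."""

    def __init__(self, genome_len):
        self.genome_len = genome_len
        # One bit per position, rounded up to whole bytes; all positions start free.
        self._bits = bytearray((genome_len + 7) // 8)

    def occupy(self, start, end):
        """Mark positions [start, end) as occupied, e.g. after an insertion."""
        for pos in range(start, end):
            self._bits[pos >> 3] |= 1 << (pos & 7)

    def is_occupied(self, pos):
        return bool(self._bits[pos >> 3] & (1 << (pos & 7)))

    def any_occupied(self, start, end):
        """True if any position in [start, end) is occupied (a collision check)."""
        return any(self.is_occupied(p) for p in range(start, end))

    def nbytes(self):
        return len(self._bits)
```

In a NumPy-based implementation, `numpy.packbits` and `numpy.unpackbits` provide the same eight-to-one packing for whole arrays at once; the per-position loop in `occupy` is kept here only for readability.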

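The vectorized-fitness row can be sketched with NumPy by contrasting a per-host Python loop with a single array expression over the whole population. The multiplicative fitness model and the two hypothetical cost parameters (`cost_per_te`, `cost_per_hit`) are assumptions for illustration; both functions compute the same values.

```python
import numpy as np


def fitness_loop(te_counts, gene_hits, cost_per_te, cost_per_hit):
    """Reference version: one interpreted iteration per host."""
    out = np.empty(len(te_counts), dtype=np.float64)
    for i, (n_te, n_hit) in enumerate(zip(te_counts, gene_hits)):
        out[i] = (1.0 - cost_per_te) ** n_te * (1.0 - cost_per_hit) ** n_hit
    return out


def fitness_vectorized(te_counts, gene_hits, cost_per_te, cost_per_hit):
    """Same multiplicative model evaluated for the whole population in one expression."""
    te_counts = np.asarray(te_counts, dtype=np.float64)
    gene_hits = np.asarray(gene_hits, dtype=np.float64)
    return (1.0 - cost_per_te) ** te_counts * (1.0 - cost_per_hit) ** gene_hits
```

The vectorized form moves the per-host loop into NumPy's compiled code, which is where most of the speedup over interpreted loops comes from.
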
Science is Fun!

Below is an interactive simulation of transposable elements in a genome. You can adjust various parameters using the sliders and observe how they affect TE dynamics, host fitness, and population evolution in real-time.

How to use the simulation:

Adjust parameters using the sliders in the four control panels
Click “Start Simulation” to begin the simulation
Observe the real-time charts showing population and TE dynamics
Watch the statistics cards for current values
Check the simulation log for detailed events
Use “Stop” and “Reset” buttons to control the simulation

Key parameters to experiment with:

TE Death Rate: High values often lead to TE extinction, while low values allow TE proliferation
Genome Size: Larger genomes provide more space for TEs to jump without colliding with genes
Host Mutation Rate: High values can cause population instability
Carrying Capacity: Affects population stability and competition dynamics

Applications

Understanding Transposable Elements

The improved agent-based model provides several key insights into TE dynamics and behavior:

TE Persistence through Ecosystem Engineering: Following the work of Kremer et al., this model supports the finding that TEs can accumulate and persist even when they are deleterious, without the need for co-evolution, recombination, or horizontal transfer¹⁵. This occurs when TEs reach a stable density relative to non-TE sequence: by inserting into other TE copies, they effectively create their own “habitat” for active TEs (ecosystem engineering). This motivates further work applying neutral theories of evolution to the genomic ecosystem. The improved model may further refine the conditions under which this self-created habitat enables TE persistence.
Modelling Silencing Mechanisms: The inclusion of silencing as a parameter allows direct investigation of how host defense systems regulate TE activity, particularly whether silencing inhibits (or supports) TEs in reaching a stable density. Given the modular and evolving nature of both TEs and host silencing mechanisms, this parameter enables simulation of predator-prey dynamics, in which the host’s ability to evolve strategies against TE movement depends on how TEs impact fitness.
Investigating Fitness Impacts: TEs often have deleterious effects on host fitness through mutagenesis, ectopic recombination, or metabolic costs. By modelling such impacts alongside perturbations of TE accumulation under specific conditions, the simulation helps explore the complex balance between the arguably selfish replication of TEs and the selective pressures exerted by the host. TE-specific fitness impacts are most easily estimated from population-level genomic data. With the steady increase in publicly available population genetics data, a highly adaptable and largely organism-agnostic model is especially valuable to TE researchers.
Exploring TE Hyperparasitism: The “hyperparasitism” parameter directly addresses the relationship between different TE families, such as the interaction between LINEs and SINEs. SINEs (such as Alu elements) are considered “parasites of the parasites” because they rely on the reverse transcriptase encoded by LINEs for their own retrotransposition⁷. This model can offer quantitative insight into the conditions and consequences of such multi-level parasitic interactions, contributing to a more nuanced understanding of intra-genomic relationships.
Beyond Conventional Explanations: This work builds on ideas developed by Kremer et al. to explore scenarios where TEs accumulate despite strong selection for their removal and without becoming “domesticated”¹⁶. This further challenges the prevailing view that TE accumulation is driven solely by host-level selection, genetic drift, or co-evolution, opening new avenues for understanding TE dynamics.
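
The ecosystem-engineering idea rests on a simple proportion: if insertion sites are chosen uniformly, the chance that an insertion lands in a gene is the gene fraction of the genome, G / (G + I + T), where G, I, and T are the gene, intergenic, and TE-derived sequence lengths. The toy simulation below is a minimal sketch under stated assumptions, not the model's actual code, and it shows only this dilution effect: each new copy adds TE sequence, insertions into existing TE sequence inactivate the copy they hit but are assumed harmless to the host, and the gene-hit probability therefore falls as TEs accumulate. Host selection, TE death, and silencing are deliberately omitted, and all names are hypothetical.

```python
import random


def gene_hit_probability(gene_bp, intergenic_bp, te_bp):
    """Chance that a uniformly placed insertion lands in a gene."""
    return gene_bp / (gene_bp + intergenic_bp + te_bp)


def simulate(gene_bp, intergenic_bp, te_len, active, generations, p_transpose, seed=0):
    """Toy model of habitat creation: every new copy adds te_len bp of TE sequence;
    an insertion landing in existing TE sequence inactivates the copy it hits (if that
    copy is active) and is assumed to be harmless to the host."""
    rng = random.Random(seed)
    inactive = 0
    gene_hits = 0
    history = []
    for _ in range(generations):
        new_copies = sum(rng.random() < p_transpose for _ in range(active))
        for _ in range(new_copies):
            te_bp = (active + inactive) * te_len
            site = rng.random() * (gene_bp + intergenic_bp + te_bp)
            if site < gene_bp:
                gene_hits += 1  # costly insertion (counted, not applied to fitness here)
            elif site >= gene_bp + intergenic_bp:
                # Landed in TE sequence: the hit copy is active with probability active/total.
                if rng.random() < active / (active + inactive):
                    active -= 1
                    inactive += 1
            active += 1  # the new insertion itself is an active copy
        te_bp = (active + inactive) * te_len
        history.append(gene_hit_probability(gene_bp, intergenic_bp, te_bp))
    return active, inactive, gene_hits, history
```

Because total TE copy number never decreases in this toy, the recorded gene-hit probability can only stay level or fall from one generation to the next.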

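The hyperparasitic LINE–SINE dependency can be expressed as a transposition rate that vanishes without the host element's machinery. In the sketch below, the saturating dependence of SINE transposition on active LINE copies, the `half_saturation` constant, and the lumping of mutation and silencing into a single per-iteration loss rate are all assumptions chosen for illustration, not the parameterization used in the model. With no active LINEs, SINE copy number can only decline.

```python
def sine_rate(base_rate, active_lines, half_saturation):
    """Per-copy SINE transposition rate. SINEs encode no reverse transcriptase, so the
    rate is zero without active LINEs; the saturating form is an assumption."""
    if active_lines <= 0:
        return 0.0
    return base_rate * active_lines / (active_lines + half_saturation)


def expected_step(lines, sines, line_rate, line_loss, sine_base, sine_loss, half_sat):
    """Expected copy numbers after one iteration (deterministic, no host selection).
    Each `*_loss` lumps together inactivation by mutation and silencing (an assumption)."""
    new_lines = lines * (1.0 + line_rate - line_loss)
    new_sines = sines * (1.0 + sine_rate(sine_base, lines, half_sat) - sine_loss)
    return new_lines, new_sines
```
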
Understanding Agency

The concept of biological agency (see Sultan et al.¹⁷ and Noble and Noble¹⁸) refers to the capacity of living systems to actively participate in their own maintenance and function by regulating their structures and activities in response to distinct stimuli. We see this, for example, in the distinct steps cells take during fetal development, or in the adaptive yet predictable immune response to novel antigens. Scientists unfamiliar with the concept may assume it connotes intention and/or consciousness, but agency in this sense is an empirically tractable property that presupposes neither. It may be more palatable to a skeptical researcher to think of biological agency as a system of viable responses to stimuli, each with its own probability, which avoids questions of intention and consciousness entirely. This project contributes to understanding agency through several mechanisms:

TEs as Self-Regulating Systems: The phenomenon of “ecosystem engineering” observed in Kremer et al., where TEs persist by creating their own habitat (inactive copies) to maintain a stable density, aligns with the idea of a system’s capacity for self-regulation and self-synthesis¹⁵. Although TEs are components of the genome, their collective behaviour in the simulation demonstrates how a system (the TE community) can regulate its own dynamics and “persist” in its environment, which, in this framework, is a signature of agency.
System-to-Component Explanation: Biological agency can provide a “system-to-component” direction of explanation, where the properties and activities of parts are understood by how the system as a whole regulates them in pursuit of its goals (e.g. stability or persistence). The model could illustrate this by showing how the emergent behaviour of the system (e.g. TE-piRNA equilibria) shapes the “decisions” or “activities” of individual TEs (e.g. rates of replication, inactivation) within the simulation.
Context-Dependent Responsiveness: The varying of parameters in the model (silencing, fitness impacts, hyperparasitism, host environment) represents different “conditions” that TEs encounter. The TEs’ responses, leading to persistence or extinction, highlight their “context sensitivity” and “goal-directedness” (in the non-deliberate sense of reliably attaining stable states). This contributes to the understanding of biological systems as flexible, robust, and responsive.

Exploring the Analogy of the Genomic Ecosystem

The “genome ecology” analogy posits that the host genome can be viewed as an ecosystem, with different genetic elements treated as different species and the relevant cellular machinery (e.g. polymerases, nucleotides) treated as resources. For example, TEs can be thought of as distinct species (e.g. LINE1 vs. Alu), interacting with each other and with their molecular environment. The analogy is not intended to be one-to-one; rather, it is a useful conceptualization for researchers exploring transposable element biology. This project deepens the analogy through several key aspects:

Ecosystem Engineering in the Genome: The finding from Kremer et al. that TEs can persist by generating their own habitat is a direct application of the ecological concept of ecosystem engineering at the genomic level¹⁶. This extension further explores how these internal dynamics allow TEs to actively “shape their own environment,” a unique feature distinguishing the genome from many traditional ecological systems.
Modelling Parasitism:
- Host-Parasite Dynamics: The parameters for fitness impacts and silencing directly model the host-parasite relationships central to many TE paradigms, where TEs are seen as genomic parasites and hosts develop mechanisms to control them.
- TE Hyperparasitism: The inclusion of TE hyperparasitism (e.g., SINEs parasitising LINEs) introduces a more complex, multi-level behaviour into the genomic ecosystem, similar to food webs in traditional ecology¹. This moves beyond simple competition models and allows for the exploration of intricate inter-species dynamics within the genome.
Operationalizing Transposon Ecology: By explicitly designing the model to exclude co-evolution, recombination, and horizontal transfer (similar to Kremer et al.’s approach), this represents a “strictly ecological approach” to transposon dynamics¹⁹. This allows for the isolation and study of ecological processes occurring within the genome (transposon ecology) independently of organism-level evolutionary changes, helping to determine how much variation in TE abundance and distribution is explained by these ecological factors.
Testing Ecological Theories: This project can serve as a valuable model system for testing general ecological hypotheses, such as those related to community structure, stability, and diversity, at an extremely fine level of grain within well-bounded genomic “ecosystems”. The relatively straightforward comparability of genomes may enable a more streamlined investigation than traditional community and population ecology.

Conclusions

This project advances the computational modeling of transposable element dynamics through an improved agent-based model that addresses both the technical limitations and the biological abstractions of previous approaches. The enhanced model also supports the previous finding that TEs can persist and accumulate through ecosystem engineering, even under strong negative selection and without requiring co-evolution, recombination, or horizontal transfer. This finding challenges conventional explanations for TE accumulation and opens new avenues for understanding genomic dynamics through an ecological lens.

The model’s ability to incorporate silencing mechanisms, multi-faceted fitness impacts, and hyperparasitism provides novel insight into the complex interactions between TEs and their host genomes. By modeling these phenomena as predator-prey dynamics or multi-trophic interactions, this work operationalizes the “genome ecology” analogy in a quantitative framework. The inclusion of a hyperparasitism parameter offers valuable insights into the multi-level parasitic relationships that characterize many genomic ecosystems.

From a broader perspective, this research also contributes to our understanding of biological agency by demonstrating how TEs might exhibit self-regulating behaviors and context-dependent responses. The phenomenon of ecosystem engineering, where TEs create their own habitat or “niche” by propagating inactive copies, exemplifies how biological systems can actively shape their environment to achieve stability and persistence. This work provides a computational illustration of system-to-component explanations of biological phenomena, in which emergent properties at the population level influence individual element behaviors.

The technical improvements implemented in this model, including IntervalTree data structures, vectorized operations, JIT compilation, and memory optimization, represent a substantial advancement in computational efficiency for object-oriented genomic simulations. These optimizations enable the exploration of larger, more complex genomic systems and longer timescales, making the model a valuable tool for the broader TE research community.

The interactive simulation component serves not only as a demonstration of the model’s capabilities but also as an educational tool that makes complex genome dynamics accessible to researchers across disciplines. This interdisciplinary approach is crucial for advancing our understanding of transposable elements, which requires insights from evolutionary biology, ecology, computer science and philosophy.

Future applications of this model could include investigations into specific taxonomic groups, such as the genus Octopus, which exhibits unique patterns of TE expression in neural tissues. The model’s flexibility and organism-agnostic design make it well suited for comparative genomic studies and hypothesis testing across diverse biological systems. However, fitness effects of TE insertion can only be accurately estimated for a given TE and species if enough population samples are available, and such data are generally not publicly available for less-studied species like Octopus spp.

In conclusion, this work establishes a robust computational framework for studying transposable element dynamics that bridges the gap between theoretical ecology and genomic biology. By demonstrating that TEs can persist through ecosystem engineering mechanisms and by providing tools to investigate the complex interactions within genomic ecosystems, this research contributes to a more nuanced understanding of genome evolution and biological agency. The model’s technical improvements and educational components should make it a valuable resource for the scientific community, facilitating both research and learning in the rapidly evolving field of transposon biology.

References

1. Hua‐Van, A., and Capy, P. (2024). Transposable Elements and Genome Evolution, 1st ed. (Wiley). https://doi.org/10.1002/9781394312467.

2. Orgel, L.E., Crick, F.H.C., and Sapienza, C. (1980). Selfish DNA. Nature 288, 645–646. https://doi.org/10.1038/288645a0.

3. Senft, A.D., and Macfarlan, T.S. (2021). Transposable elements shape the evolution of mammalian development. Nature Reviews Genetics 22, 691–711. https://doi.org/10.1038/s41576-021-00385-1.

4. Macchi, F., Edsinger, E., and Sadler, K.C. (2022). Epigenetic machinery is functionally conserved in cephalopods. BMC Biology 20, 202. https://doi.org/10.1186/s12915-022-01404-1.

5. Gebert, D., and Rosenkranz, D. (2015). RNA-based regulation of transposon expression. WIREs RNA 6, 687–708. https://doi.org/10.1002/wrna.1310.

6. Zolotarov, G., Fromm, B., Legnini, I., Ayoub, S., Polese, G., Maselli, V., J. Chabot, P., Vinther, J., Styfhals, R., Seuntjens, E., et al. (2022). MicroRNAs are deeply linked to the emergence of the complex octopus brain. Science Advances. https://doi.org/10.1126/sciadv.add9938.

7. Bourque, G., Burns, K.H., Gehring, M., Gorbunova, V., Seluanov, A., Hammell, M., Imbeault, M., Izsvák, Z., Levin, H.L., Macfarlan, T.S., et al. (2018). Ten things you should know about transposable elements. Genome Biology 19, 199. https://doi.org/10.1186/s13059-018-1577-z.

8. Charlesworth, B., and Charlesworth, D. (1983). The population dynamics of transposable elements. Genetical Research 42, 1–27. https://doi.org/10.1017/S0016672300021455.

9. Langley, C.H., Montgomery, E., Hudson, R., Kaplan, N., and Charlesworth, B. (1988). On the role of unequal exchange in the containment of transposable element copy number. Genetical Research 52, 223–235. https://doi.org/10.1017/S0016672300027695.

10. Bichsel, M., Barbour, A.D., and Wagner, A. (2013). Estimating the fitness effect of an insertion sequence. Journal of Mathematical Biology 66, 95–114. https://doi.org/10.1007/s00285-012-0504-2.

11. Figueredo, G.P., Siebers, P.-O., Owen, M.R., Reps, J., and Aickelin, U. (2014). Comparing Stochastic Differential Equations and Agent-Based Modelling and Simulation for Early-Stage Cancer. PLOS ONE 9, e95150. https://doi.org/10.1371/journal.pone.0095150.

12. McDonald, G.W., and Osgood, N.D. (2023). Agent-Based Modeling and its Tradeoffs: An Introduction & Examples. https://doi.org/10.48550/arXiv.2304.08497.

13. Venner, S., Feschotte, C., and Biémont, C. (2009). Dynamics of transposable elements: Towards a community ecology of the genome. Trends in Genetics 25, 317–323. https://doi.org/10.1016/j.tig.2009.05.003.

14. Brookfield, J.F.Y. (2005). The ecology of the genome – mobile DNA elements and their hosts. Nature Reviews Genetics 6, 128–136. https://doi.org/10.1038/nrg1524.

15. Kremer, S.C., Linquist, S., Saylor, B., Elliott, T.A., Gregory, T.R., and Cottenie, K. (2020). Transposable element persistence via potential genome-level ecosystem engineering. BMC Genomics 21, 367. https://doi.org/10.1186/s12864-020-6763-1.

16. Kremer, S.C., Linquist, S., Saylor, B., Elliott, T.A., Gregory, T.R., and Cottenie, K. (2021). Long-term TE persistence even without beneficial insertion. BMC Genomics 22, 260. https://doi.org/10.1186/s12864-021-07568-4.

17. Sultan, S.E., Moczek, A.P., and Walsh, D. (2022). Bridging the explanatory gaps: What can we learn from a biological agency perspective? BioEssays 44, 2100185. https://doi.org/10.1002/bies.202100185.

18. Noble, R., and Noble, D. (2022). Can agency be reduced to molecules? In From Electrons to Elephants and Elections: Exploring the Role of Content and Context, 1st ed., S. Wuppuluri and I. Stewart, eds. (Springer International Publishing). https://doi.org/10.1007/978-3-030-92192-7_37.

19. Linquist, S., Saylor, B., Cottenie, K., Elliott, T.A., Kremer, S.C., and Gregory, T.R. (2013). Distinguishing ecological from evolutionary approaches to transposable elements. Biological Reviews 88, 573–584. https://doi.org/10.1111/brv.12017.