Transposable elements (TEs) are mobile genetic elements that have traditionally been viewed as genomic parasites, yet their persistence and accumulation patterns challenge conventional evolutionary explanations. This work presents an improved version of the agent-based model described by Kremer et al., addressing technical limitations and biological abstractions of the original. The model supports the previous finding that TEs can persist through ecosystem engineering, creating their own habitat by preferentially inserting into inactive TE copies, even under strong negative selection and without requiring co-evolution or horizontal transfer. Key innovations include silencing mechanisms, multi-faceted fitness impacts, and hyperparasitism parameters that model complex TE-TE interactions such as SINE-LINE relationships. Technical improvements, including efficient data structures, vectorized operations, GPU acceleration, and just-in-time (JIT) compilation, enable efficient simulation of larger, more complex genomic systems over longer timescales. This work operationalizes the “genome ecology” analogy by enabling direct investigation of the emergent properties of TE-host interactions, including predator-prey dynamics and hyperparasitism. The model contributes to understanding biological agency by demonstrating how TE communities may exhibit self-regulating behaviors and context-dependent responsiveness. An interactive simulation component serves as both a research tool and an educational resource, making complex genomic dynamics accessible across disciplines. This computational framework bridges theoretical ecology and evolutionary biology, providing new insights into genome evolution and transposon biology.
Transposable elements (TEs) are a class of mobile genetic elements found ubiquitously across the tree of life1. Since their discovery, TEs have generally been considered parasitic in nature, being deleterious at the host level1,2. There is certainly some validity to this claim: high TE expression is associated with chromosomal rearrangements and gene disruption through insertional mutagenesis. However, TEs have also been co-opted by hosts for organism-level functions, such as formation of the placenta3. Host-level management of TE expression varies between organisms but, in eukaryotes, generally involves silencing through methylation and heterochromatinization4–6. TEs are categorized at multiple levels: class is based on the transposition intermediate, and superfamily/family on phylogenetic traits7. Briefly, class I TEs (retrotransposons) transpose through a ‘copy-and-paste’ mechanism: the element is transcribed into RNA, which is translated into the proteins needed for transposition (in autonomous elements) and reverse-transcribed into DNA that is re-inserted into the genome. Class II TEs (DNA transposons) generally transpose through a ‘cut-and-paste’ mechanism of excision and re-insertion. Not all TEs encode the machinery required for their own transposition; these non-autonomous TEs instead parasitize the machinery of autonomous TEs1. Traditionally, questions about the evolutionary impact of TE copy number on host fitness have been addressed with equilibrium equations and ordinary differential equations8–10.
However, these methods struggle to capture complex stochastic interactions between TEs and the host over shorter timescales11,12. Recent interest from the TE research community in applying ecological frameworks to the genomic environment has necessitated new mathematical approaches13–15. Agent-based models (ABMs) have shown clear utility in economics, ecology, and oncology for studying the emergent properties of bottom-up stochastic systems11,12. ABMs are also intuitive and easy to visualize, making them a valuable learning tool in a domain that draws on a wide array of disciplines, including evolutionary biology, ecology, and philosophy. ABMs are simulations of one or more populations of autonomous agents (also called individuals or entities) interacting within a system, where the behaviours and interactions of both agents and system are constrained by one or more rules12. In each iteration, agents interact with each other and with the system, and an event occurs if one or more conditions are met. The parameters defining agent behaviours can be fixed values or drawn from a probability distribution. The result is a model that is simple in construction yet versatile, stochastic, and context-aware. ABMs are therefore primarily used to answer questions about the emergent properties of a system, which are often themselves stochastic and context-dependent.
In a previous study, Kremer et al. built an ABM of TEs in a simplified asexual prokaryotic genome15,16. The primary agents are TEs of a single autonomous class, which all share the same transposition and mutation rates. The secondary agents are genes, which are also identical to one another. At each iteration, each TE has some probability of transposing (into either a gene or an intergenic region) and of mutating, and each of these events has a fitness impact. This simulation produced interesting results but had significant limitations, including inefficient code and major biological abstractions. I present a model that addresses both limitations and can be applied to either prokaryotes or eukaryotes. To guide the extension of this model to eukaryotes, Octopus was selected as a genus of interest. Octopus species are of significant interest in TE research because of their high TE expression in neural tissues, which is hypothesized to be linked to their cognitive complexity and intelligent behaviour6.
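To make the iteration rules above concrete, the following is a minimal, hypothetical sketch of a Kremer-style update for a single genome; it is not the original model's code. It assumes uniform random insertion sites, copy-and-paste transposition by active copies only, a per-iteration probability that a copy is inactivated by mutation, and a multiplicative fitness cost per disrupted gene. All function and parameter names are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class TE:
    """One TE copy; only active copies can transpose."""
    active: bool = True


def gene_fraction(tes, gene_bp, intergenic_bp, te_bp):
    """Share of the genome that is genic, assuming every TE copy adds te_bp."""
    total = gene_bp + intergenic_bp + te_bp * len(tes)
    return gene_bp / total


def step(tes, gene_bp, intergenic_bp, te_bp, p_transpose, p_inactivate, rng):
    """Apply one iteration of the (assumed) TE rules to one genome; return gene hits."""
    frac = gene_fraction(tes, gene_bp, intergenic_bp, te_bp)
    gene_hits = 0
    offspring = []
    for te in tes:
        if not te.active:
            continue  # inactive copies persist but never transpose
        if rng.random() < p_transpose:
            offspring.append(TE())  # copy-and-paste: the parent copy stays
            if rng.random() < frac:
                gene_hits += 1  # the new copy landed inside a gene
        if rng.random() < p_inactivate:
            te.active = False  # a mutation disables this copy
    tes.extend(offspring)
    return gene_hits


def fitness(gene_hits, s):
    """Multiplicative host fitness with cost s per disrupted gene."""
    return (1.0 - s) ** gene_hits
```

Because every new copy enlarges the genome while gene content stays fixed, `gene_fraction` falls as TEs accumulate, so later insertions are less likely to hit genes. This is, in its simplest form, the intuition behind the ecosystem-engineering result discussed later.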
To construct the agent-based model, a similar prokaryotic model built by Dr. Stefan Kremer (University of Guelph) was used as a reference19. The improvements are summarized in Table 2 and benchmarks in Figure X. In the following model, TEs can be sampled from multiple classes, including non-autonomous classes and the autonomous classes they parasitize. TEs can be modelled as individuals (parameters sampled from a distribution) or as a group (identical parameters), and a silencing mechanism has been added. Genes can contain multiple subcomponents, including promoters, open reading frames, exons, introns, and insulators, each with a different fitness impact upon TE insertion. To benchmark the improved model, the original was rewritten in Python 3. Files necessary to replicate this work are available at https://github.com/noahzeidenberg/TE-Agents2.
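The sketch below illustrates, under stated assumptions, how the two parameter modes and the non-autonomous/autonomous pairing might be represented; it is not taken from the repository. `TEFamily`, `sample_rates`, and `INSERTION_COST` are hypothetical names, individual rates are assumed to be normally distributed and clamped to [0, 1], and the per-subcomponent costs are placeholder values, not estimates.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TEFamily:
    """Hypothetical family-level description shared by all copies of one TE type."""
    name: str
    mean_rate: float                # mean transposition probability per iteration
    sd_rate: float                  # spread used only in "individual" mode
    autonomous: bool = True
    partner: Optional[str] = None   # autonomous family whose machinery is borrowed


def sample_rates(family, n, mode, rng):
    """Return n per-copy transposition probabilities for one family."""
    if mode == "group":
        return [family.mean_rate] * n  # every copy shares the family value
    if mode == "individual":
        # Each copy draws its own rate; clamping keeps it a valid probability.
        return [min(1.0, max(0.0, rng.gauss(family.mean_rate, family.sd_rate)))
                for _ in range(n)]
    raise ValueError(f"unknown mode: {mode!r}")


# Placeholder per-subcomponent costs of an insertion (illustrative, not estimates).
INSERTION_COST = {
    "promoter": 0.05,
    "orf": 0.10,
    "exon": 0.10,
    "intron": 0.01,
    "insulator": 0.03,
    "intergenic": 0.0,
}

line_family = TEFamily("LINE", mean_rate=0.01, sd_rate=0.005)
sine_family = TEFamily("SINE", mean_rate=0.02, sd_rate=0.01,
                       autonomous=False, partner="LINE")
```

Keeping the family-level description separate from per-copy rates means group mode needs one value per family, while individual mode stores one rate per copy, which a packed array can hold compactly.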
Table: Summary of code functions, problems in TE_World2, and implemented solutions.
| Code Function | Original Implementation and Associated Issues | Solution |
|---|---|---|
| Find elements at a genomic position | Linear search (O(n)) • Speed decreases significantly as genome size increases • Slow at finding overlapping regions | IntervalTree data structure (O(log n + m) for m overlapping hits) • Efficient range queries and overlap detection • Scales well with increased genome complexity |
| Element storage | Each element stored as a separate Python object with individual attributes • High memory overhead • Poor cache locality | Packed arrays • Better memory use and indexing |
| Tracking which genomic positions are occupied | Boolean arrays and lists • At least 8 bits per position • Large genomes (100+ Mbp) consume significant memory | Bit arrays • 1 bit per position • Fast occupancy queries • Reduced memory bandwidth |
| Element processing | For loops • Inherently slow due to interpreter overhead | Vectorized operations with NumPy • Large speedups for numerical operations • More efficient CPU and memory usage |
| Function compilation | Pure Python • Interpreter overhead remains for code that cannot be fully vectorized | Just-in-time (JIT) compilation via Numba • Functions are compiled to machine code, yielding higher performance after a one-time compilation cost |
| Object allocation | Frequent (de)allocation of TE and Gene objects • Object allocation is computationally expensive • Frequent allocation causes memory fragmentation • Frequent (de)allocation induces garbage-collection pressure | Object pooling • Objects are reused rather than destroyed and recreated |
| Chromosome copying | Full copying • Most inherited chromosomes are initially identical and may never be modified • Full copying wastes memory and time | Copy-on-write semantics • Data are shared until modification is needed • Memory use scales with mutations/silencing rather than population size • Faster inheritance operations |
| Memory management | No management of, or cap on, RAM use • Large simulations could exceed available RAM, causing swapping and performance loss | Explicitly memory-mapped data structures • Enables simulations larger than available RAM • Paging managed automatically by the operating system |
| Parallel processing | None • Identical fitness calculations applied at scale are slow and inefficient • Significant loop and function-call overhead | Vectorized fitness calculations • Entire populations are processed simultaneously |
| Collision detection (e.g. a TE inserts into a gene) | Exception-based control flow • Raising and catching exceptions is relatively expensive in Python • Significant speed bottleneck | Return-based control flow and status flags • Faster collision detection and case handling • Reduced overhead in insertion operations |
| Population statistics calculations | Recalculated on each access • Recalculating unneeded statistics adds significant overhead | Cached statistics and lazy evaluation • Faster access to frequently used statistics and less redundant work • Memory is only used when statistics are needed |
| Statistics logging | Detailed logging and data collection every generation • Detailed statistics aren’t needed every generation • I/O operations can easily become a bottleneck • Long simulations produce massive logs | Configurable collection frequency (every N iterations) • Reduced I/O overhead • The user controls the detail/performance trade-off • Greater scalability for long simulations |
| GPU utilization | None • Many operations can be accelerated on a GPU | Automatic GPU detection and use for generating probability tables • Significant gains in speed and efficiency • Memory offloaded to VRAM |
| Timeout protection | None • Infinite loops or hangs waste computational resources | Timeout protection for long-running operations; gzipped save-states • Simulations can restart from save-states without total data loss • Automatic recovery and retry logic • Greater fault tolerance |
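The insertion path touches several rows of the table at once: locating the element at a position, testing occupancy, and reporting a collision. The sketch below is a hypothetical, simplified illustration rather than the repository's code. It swaps the IntervalTree for a sorted list searched with the standard-library `bisect` module, which gives an O(log n) lookup only under the assumption that elements do not overlap; it stores occupancy as one bit per position in a `bytearray`; and it returns an `InsertStatus` flag instead of raising an exception. The names `Genome`, `classify_insertion`, and `InsertStatus` are illustrative assumptions.

```python
import bisect
from enum import Enum


class InsertStatus(Enum):
    """Outcome flags returned instead of raising exceptions."""
    INTERGENIC = 0
    GENE_HIT = 1
    TE_HIT = 2


class Genome:
    """Hypothetical genome index: sorted, non-overlapping, half-open elements."""

    def __init__(self, size, elements):
        # elements: iterable of (start, end, kind) with kind "gene" or "te"
        self.size = size
        self.elements = sorted(elements)
        self.starts = [start for start, _, _ in self.elements]
        # One bit per position: ceil(size / 8) bytes in total.
        self.occupied = bytearray((size + 7) // 8)
        for start, end, _ in self.elements:
            for pos in range(start, end):
                self.occupied[pos >> 3] |= 1 << (pos & 7)

    def is_occupied(self, pos):
        """Constant-time bit test for whether any element covers pos."""
        return bool(self.occupied[pos >> 3] & (1 << (pos & 7)))

    def element_at(self, pos):
        """O(log n) binary search for the element covering pos, or None."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0:
            start, end, _ = self.elements[i]
            if start <= pos < end:
                return self.elements[i]
        return None

    def classify_insertion(self, pos):
        """Report what an insertion at pos would hit, without exceptions."""
        if not self.is_occupied(pos):
            return InsertStatus.INTERGENIC
        hit = self.element_at(pos)
        if hit is None:
            return InsertStatus.INTERGENIC
        return InsertStatus.GENE_HIT if hit[2] == "gene" else InsertStatus.TE_HIT
```

Because the bit test runs first, most insertions into unoccupied sequence never reach the binary search. A true interval tree becomes necessary once nested or overlapping elements (for example, a TE inside an intron) must be represented.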
Below is an interactive simulation of transposable elements in a genome. You can adjust various parameters using the sliders and observe how they affect TE dynamics, host fitness, and population evolution in real time.
How to use the simulation:
1. Adjust parameters using the sliders in the four control panels.
2. Click “Start Simulation” to begin the simulation.
3. Observe the real-time charts showing population and TE dynamics.
4. Watch the statistics cards for current values.
5. Check the simulation log for detailed events.
6. Use the “Stop” and “Reset” buttons to control the simulation.
Key parameters to experiment with:
- TE Death Rate: High values often lead to TE extinction, while low values allow TE proliferation.
- Genome Size: Larger genomes provide more space for TEs to jump without colliding with genes.
- Host Mutation Rate: High values can cause population instability.
- Carrying Capacity: Affects population stability and competition dynamics.
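The genome-size effect can be made explicit with a back-of-the-envelope calculation. This assumes insertion sites are drawn uniformly and that total gene length $L_{\text{genes}}$ stays fixed as genome size $G$ changes; $N$ (number of active copies) and $u$ (per-copy transposition probability per iteration) are illustrative symbols, not the simulation's parameter names.

$$
P(\text{gene hit}) = \frac{L_{\text{genes}}}{G},
\qquad
\mathbb{E}[\text{gene hits per iteration}] = N\,u\,\frac{L_{\text{genes}}}{G}
$$

For example, with $L_{\text{genes}} = 1$ Mbp, a 10 Mbp genome gives $P(\text{gene hit}) = 0.1$, while a 100 Mbp genome gives $0.01$: for the same TE activity, a tenfold larger genome cuts the expected number of gene disruptions tenfold.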
The improved agent-based model provides several key insights into TE dynamics and behavior:
TE Persistence through Ecosystem Engineering: Following the work of Kremer et al., this model demonstrates that TEs can accumulate and persist even when they are deleterious, without the need for co-evolution, recombination, or horizontal transfer15. This occurs when TEs reach a stable density relative to non-TE sequences: by inserting into existing TE copies, TEs effectively create their own “habitat” for active TEs (ecosystem engineering). This prompts further work applying neutral theories of evolution to the genomic ecosystem. The improved model further supports this result and may refine the conditions under which this self-created habitat enables TE persistence.
Modelling Silencing Mechanisms: The inclusion of silencing as a parameter allows direct investigation of how host defense systems regulate TE activity, particularly whether silencing inhibits (or supports) TEs in reaching a stable density. Given the modular and evolving nature of both TEs and host silencing mechanisms, this parameter enables simulation of predator-prey dynamics, in which the host’s ability to evolve strategies against TE movement depends on how TEs impact fitness.
Investigating Fitness Impacts: TEs often have deleterious effects on host fitness through mutagenesis, ectopic recombination, or metabolic costs. By modelling these impacts alongside perturbations of TE accumulation under specific conditions, the simulation helps to explore the balance between the (arguably selfish) replication of TEs and the selective pressures exerted by the host. TE-specific fitness impacts are most easily estimated from population-level genomic data. With the steady increase in publicly available population genetic data, a highly adaptable and largely organism-agnostic model is particularly valuable to TE researchers.
Exploring TE Hyperparasitism: The “hyperparasitism” parameter directly addresses the relationship between different TE families, such as the interaction between LINE and SINE sequences. SINEs (such as Alu elements) are considered “parasites of the parasites” because they rely on the reverse transcriptase encoded by LINE elements for their reverse transcription and re-insertion7. This model can offer quantitative insights into the conditions and consequences of such multi-level parasitic interactions, contributing to a more nuanced understanding of intra-genomic relationships.
Beyond Conventional Explanations: This work builds on ideas developed by Kremer et al. to explore scenarios in which TEs accumulate despite strong selection for their removal and without becoming “domesticated”16. This further challenges prevailing views that TE accumulation is driven solely by host-level selection, genetic drift, or co-evolution, opening new avenues for understanding TE dynamics.
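Silencing and hyperparasitism can both be expressed as per-copy rules, and the NumPy vectorization listed in the table lets one generation be updated for all copies at once. The sketch below is a hypothetical illustration, not the repository's implementation. It assumes two families (LINE and SINE), a per-generation probability `p_silence` that an active copy is silenced, and a saturating dependence of SINE transposition on the number of expressed LINEs, n_LINE / (n_LINE + k_half). That functional form and all names are assumptions chosen for illustration.

```python
import numpy as np

LINE, SINE = 0, 1  # family codes stored in a packed integer array


def transposition_probs(family, active, silenced, p_line, p_sine, k_half):
    """Per-copy transposition probability for one generation (vectorized).

    SINEs encode no reverse transcriptase, so their rate is scaled by the
    (assumed) availability of LINE machinery: n_line / (n_line + k_half), k_half > 0.
    """
    expressed = active & ~silenced  # only intact, unsilenced copies act
    n_line = np.count_nonzero(expressed & (family == LINE))
    machinery = n_line / (n_line + k_half)
    probs = np.where(family == LINE, p_line, p_sine * machinery)
    return np.where(expressed, probs, 0.0)


def step(family, active, silenced, p_line, p_sine, k_half, p_silence, rng):
    """One generation: host silencing, then copy-and-paste transposition."""
    n = family.size
    # Host defence: each active copy may be silenced this generation.
    silenced = silenced | ((rng.random(n) < p_silence) & active)
    probs = transposition_probs(family, active, silenced, p_line, p_sine, k_half)
    jumps = rng.random(n) < probs
    n_new = int(np.count_nonzero(jumps))
    # New copies inherit their parent's family and start active and unsilenced.
    family = np.concatenate([family, family[jumps]])
    active = np.concatenate([active, np.ones(n_new, dtype=bool)])
    silenced = np.concatenate([silenced, np.zeros(n_new, dtype=bool)])
    return family, active, silenced
```

Because SINE activity drops to zero when no LINE copy is expressed, silencing LINEs alone suppresses both families in this sketch; whether that holds in the full model depends on how its hyperparasitism parameter is actually defined.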
The concept of biological agency (see Sultan et al.; Noble and Noble) refers to the capacity of living systems to participate actively in their own maintenance and function by regulating their structures and activities in response to distinct stimuli18. We see this, for example, in the distinct steps cells take during fetal development, or in the adaptive yet predictable immune response to novel antigens. Scientists unfamiliar with the concept may assume that agency connotes intention and/or consciousness, but it is in fact an empirically assessable property that is independent of either. A skeptical researcher may find it more digestible to think of biological agency as a system of viable responses to stimuli, each with its own probability, which avoids questions of intention and consciousness entirely. This project contributes to understanding agency through several mechanisms:
TEs as Self-Regulating Systems: The phenomenon of “ecosystem engineering” observed in Kremer et al., where TEs persist by creating their own habitat (inactive copies) to maintain a stable density, aligns with the idea of a system’s capacity for self-regulation and self-synthesis15. Although TEs are components of the genome, their collective behaviour in the simulation demonstrates how a system (the TE community) can regulate its own dynamics and “persist” in its environment, which is a diagnostic signature of agency.
System-to-Component Explanation: Biological agency can provide a “system-to-component” direction of explanation, where the properties and activities of parts are understood by how the system as a whole regulates them in pursuit of its goals (e.g. stability or persistence). The model could illustrate this by showing how the emergent behaviour of the system (e.g. TE-piRNA equilibria) shapes the “decisions” or “activities” of individual TEs (e.g. rates of replication, inactivation) within the simulation.
Context-Dependent Responsiveness: The varying of parameters in the model (silencing, fitness impacts, hyperparasitism, host environment) represents different “conditions” that TEs encounter. The TEs’ responses, leading to persistence or extinction, highlight their “context sensitivity” and “goal-directedness” (in the non-deliberate sense of reliably attaining stable states). This contributes to the understanding of biological systems as flexible, robust, and responsive.
The “genome ecology” analogy posits that the host genome can be viewed as an ecosystem, with various genetic elements treated as different species and relevant cellular machinery and substrates (e.g. polymerases and nucleotides) treated as resources. For example, TEs can be thought of as distinct species (e.g. LINE1 vs. Alu), interacting with each other and their molecular environment. This analogy is not intended to be one-to-one; instead, it is intended as a useful conceptualization for researchers understanding and exploring transposable element biology. This project significantly deepens this analogy through several key aspects:
Ecosystem Engineering in the Genome: The finding from Kremer et al. that TEs can persist by generating their own habitat is a direct application of the ecological concept of ecosystem engineering at the genomic level16. This extension further explores how these internal dynamics allow TEs to actively “shape their own environment,” a unique feature distinguishing the genome from many traditional ecological systems.
Modelling Parasitism:
Host-Parasite Dynamics: The parameters for fitness impacts and silencing directly model the host-parasite relationships central to many TE paradigms, where TEs are seen as genomic parasites and hosts develop mechanisms to control them.
TE Hyperparasitism: The inclusion of TE hyperparasitism (e.g., SINEs parasitising LINEs) introduces a more complex, multi-level behaviour into the genomic ecosystem, similar to food webs in traditional ecology1. This moves beyond simple competition models and allows for the exploration of intricate inter-species dynamics within the genome.
Operationalizing Transposon Ecology: By explicitly designing the model to exclude co-evolution, recombination, and horizontal transfer (similar to Kremer et al.’s approach), this represents a “strictly ecological approach” to transposon dynamics19. This allows for the isolation and study of ecological processes occurring within the genome (transposon ecology) independently of organism-level evolutionary changes, helping to determine how much variation in TE abundance and distribution is explained by these ecological factors.
Testing Ecological Theories: This project can serve as a valuable model system for testing general ecological hypotheses, such as those related to community structure, stability, and diversity, at an extremely fine level of grain within well-bounded genomic “ecosystems”. The relatively straightforward comparability of genomes may enable a more streamlined investigation than traditional community and population ecology.
This project advances the computational modeling of transposable element dynamics through an improved agent-based model that addresses both the technical limitations and the biological abstractions present in previous approaches. The enhanced model also supports the previous finding that TEs can persist and accumulate through ecosystem engineering mechanisms, even under conditions of strong negative selection, without requiring co-evolution, recombination, or horizontal transfer. This finding challenges conventional explanations for TE accumulation and opens new avenues for understanding genomic dynamics through an ecological lens.
The model’s ability to incorporate silencing mechanisms, multi-faceted fitness impacts, and hyperparasitism provides novel insight into the complex interactions between TEs and their host genomes. By modeling these phenomena as predator-prey dynamics or multi-trophic interactions, this work operationalizes the “genome ecology” analogy in a quantitative framework. The inclusion of a hyperparasitism parameter offers valuable insights into the multi-level parasitic relationships that characterize many genomic ecosystems.
From a broader perspective, this research also contributes to our understanding of biological agency by demonstrating how TEs might exhibit self-regulating behaviors and context-dependent responses. The phenomenon of ecosystem engineering, where TEs create their own habitat or “niche” by propagating inactive copies, exemplifies how biological systems can actively shape their environment to achieve stability and persistence. This work provides a computational illustration of system-to-component explanations of biological phenomena, in which emergent properties at the population level influence the behavior of individual elements.
The technical improvements implemented in this model, including IntervalTree data structures, vectorized operations, JIT compilation, and memory optimization, represent a substantial advancement in computational efficiency for object-oriented genomic simulations. These optimizations enable the exploration of larger, more complex genomic systems and longer timescales, making the model a valuable tool for the broader TE research community.
The interactive simulation component serves not only as a demonstration of the model’s capabilities but also as an educational tool that makes complex genome dynamics accessible to researchers across disciplines. This interdisciplinary approach is crucial for advancing our understanding of transposable elements, which requires insights from evolutionary biology, ecology, computer science and philosophy.
Future applications of this model could include investigations into specific taxonomic groups, such as the genus Octopus, which exhibits unique patterns of TE expression in neural tissues. The model’s flexibility and organism-agnostic design make it well suited to comparative genomic studies and hypothesis testing across diverse biological systems. However, the fitness effects of TE insertion can only be estimated accurately for a given TE and species if enough samples from the population are available, and such data are generally not publicly available for less-studied species such as Octopus spp.
In conclusion, this work establishes a robust computational framework for studying transposable element dynamics that bridges the gap between theoretical ecology and genomic biology. By demonstrating that TEs can persist through ecosystem engineering mechanisms and by providing tools to investigate the complex interactions within genomic ecosystems, this research contributes to a more nuanced understanding of genome evolution and biological agency. The model’s technical improvements and educational components ensure that it will serve as a valuable resource for the scientific community, facilitating both research and learning in the rapidly evolving field of transposon biology.