We quantified how GC-content divergence (ΔGC) changes with ER size across all bacterial taxa at each rank (phylum, class, order, family, genus) that contained at least 100 ERs in the dataset.
Because ΔGC crosses zero, applying a single regression model across all values would obscure opposite directional trends on either side of this boundary. To avoid this and to detect asymmetric patterns, the analysis was performed separately for ERs with ΔGC < 0 and for ERs with ΔGC > 0. A minimum of 5 ERs per side was required to fit a regression model.
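The sign-based split and the two thresholds can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the record layout `(taxon, delta_gc, pct_chromosome)`, the function name `split_by_sign`, and the side labels are hypothetical, and excluding ERs with ΔGC exactly equal to 0 from both sides is an assumption that follows from the strict inequalities in the text.

```python
from collections import defaultdict

MIN_ERS_PER_TAXON = 100  # taxon-level threshold stated in the text
MIN_ERS_PER_SIDE = 5     # per-side threshold stated in the text


def split_by_sign(records):
    """Group (taxon, delta_gc, pct_chromosome) records by taxon and dGC sign.

    Returns a dict keyed by (taxon, side) holding only the subsets that
    are large enough to be fitted.
    """
    by_taxon = defaultdict(list)
    for taxon, delta_gc, pct_chromosome in records:
        by_taxon[taxon].append((delta_gc, pct_chromosome))

    subsets = {}
    for taxon, ers in by_taxon.items():
        if len(ers) < MIN_ERS_PER_TAXON:
            continue  # taxon has too few ERs to be analysed at all
        # Assumption: ERs with dGC == 0 belong to neither side.
        sides = {
            "negative": [er for er in ers if er[0] < 0],
            "positive": [er for er in ers if er[0] > 0],
        }
        for side, subset in sides.items():
            if len(subset) >= MIN_ERS_PER_SIDE:
                subsets[(taxon, side)] = subset
    return subsets
```

A taxon can therefore contribute zero, one, or two regression subsets, depending on how its ERs are distributed around ΔGC = 0.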
For each taxon and each ΔGC side, we fitted an ordinary least squares (OLS) linear regression of the form:
ΔGC ∼ log10(% chromosome size)
The dependent variable is ΔGC and the predictor is log10(% chromosome size). For each model, we extracted the regression slope and the p-value associated with the log10(% chromosome size) term. A trend was considered statistically significant when p < 0.05.
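A single per-side fit might look like the sketch below, which follows the formula ΔGC ∼ log10(% chromosome size) with ΔGC as the response. The source does not name the software used, so `scipy.stats.linregress` is an assumed stand-in for the OLS step; the function name `fit_side` and the returned keys are hypothetical. In a simple regression with one predictor, the slope p-value is the same whichever variable is treated as the response, although the slope itself is not.

```python
import math

from scipy import stats

ALPHA = 0.05  # significance threshold stated in the text


def fit_side(delta_gc, pct_chromosome):
    """Fit dGC ~ log10(% chromosome size) for one taxon and one dGC side.

    delta_gc and pct_chromosome are equal-length sequences; every
    percentage must be strictly positive so that log10 is defined.
    """
    log_size = [math.log10(p) for p in pct_chromosome]
    # linregress(x, y) fits y = intercept + slope * x by least squares
    # and reports a two-sided p-value for the null hypothesis slope == 0.
    result = stats.linregress(log_size, delta_gc)
    return {
        "n": len(log_size),
        "slope": result.slope,
        "p_value": result.pvalue,
        "significant": result.pvalue < ALPHA,
    }
```

Calling `fit_side` once for the ΔGC < 0 subset and once for the ΔGC > 0 subset of a taxon yields the two slopes and p-values described above.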
Interpretations were generated independently for the ΔGC < 0 and ΔGC > 0 subsets:
For every taxon with at least 100 ERs:
All figures were combined into a single interactive HTML document:
For each taxon, the following statistics were recorded:
N_total)
These results are provided as an interactive sortable table: