Endogenous retroviruses are a source of oncogenic enhancers in acute myeloid leukemia

Endogenous retroviruses (ERVs) and other transposons can act as tissue-specific regulators of gene expression in cis, with potential to affect biological processes. In cancer, epigenetic alterations and transcription factor misregulation may uncover the regulatory potential of typically repressed ERVs, which could contribute to tumour evolution and progression. Here, we asked whether transposons help to rewire oncogenic transcriptional circuits in acute myeloid leukemia (AML). Using epigenomic data from both primary cells and cell lines, we have identified six ERV families that are frequently found in an open chromatin state in AML when compared to differentiated healthy myeloid cells. A subset of these AML-associated ERVs harbour enhancer-specific histone modifications, and are bound by hematopoiesis-associated transcription factors that play key roles in haematopoiesis and in the pathogenesis of AML. Using CRISPR-mediated genetic editing and simultaneous epigenetic silencing of multiple ERV copies, we have established causal links between ERV deregulation in AML and expression changes of adjacent genes. Finally, we show that deletion and epigenetic silencing of an ERV, through modulating expression of APOC1 gene, leads to growth suppression by inducing apoptosis in leukemia cell lines. Our results suggest that ERV derepression provides an additional layer of gene regulation in AML that may be exploited by cancer cells to help drive oncogenic phenotypes.


Introduction
Transposable elements (TEs) act primarily as selfish genetic entities that have expanded to make up around half of the human genome (Lander et al., 2001). Although they are commonly repressed by host epigenetic mechanisms, TEs are frequently coopted as tissue-specific gene regulatory elements, such as alternative gene promoters and distal enhancers (Chuong et al., 2017). Notably, up to 34% of transcription factor (TF) binding sites in the human genome are derived from TEs (Sundaram et al., 2014), with an overrepresentation from long terminal repeat (LTR) families, which include endogenous retroviruses (ERVs). LTRs frequently recombine, leaving the majority of ERV elements as intact solitary LTRs with potential gene regulatory activity (Belshaw et al., 2007;Thomas et al., 2018). In line with this notion, genome-wide assays have documented that numerous LTR sequences carry hallmarks of active regulatory elements (Chuong et al., 2016;Fuentes et al., 2018;Jacques et al., 2013;Kunarso et al., 2010;Lynch et al., 2011;Pontis et al., 2019;Sundaram et al., 2014). In a few instances, loss-of-function experiments have provided compelling evidence of LTR co-option for host gene regulation and cellular function in hematopoiesis (Pi et al., 2010), innate immunity (Chuong et al., 2016), pregnancy (Ferreira et al., 2016) and fertility (Flemr et al., 2013).
Various studies have documented widespread epigenetic and transcriptional deregulation of TEs in several cancer types, raising the possibility that TE-derived regulatory elements may be exploited to promote tumorigenesis Burns, 2017). Indeed, activation of LTR-based promoters initiates cancerspecific chimeric transcripts in Hodgkin lymphoma, melanoma and diffuse large B-cell lymphoma, among others Edginton-White et al., 2019;Jang et al., 2019). These so-called onco-exaptation events were recently shown to be highly prevalent in multiple cancer types (Jang et al., 2019). However, studies to date have been largely centered on LTR promoter activity. As the function of TEs as enhancers is more challenging to discern, such cases remain largely unexplored in malignancies, with one notable exception of a recent study showing that systematic perturbation of LTR5_Hs family in the human embryonic carcinoma affects host gene transcription over long genomic ranges (Fuentes et al., 2018).
Acute myeloid leukemia (AML) is a highly heterogeneous clonal malignancy, characterized by the combinations of distinct driver mutations (Cancer Genome Atlas Research Network et al., 2013). Notably, over 70% AML patients have a genetic abnormality in one epigenetic modifier (Papaemmanuil et al., 2016). Interestingly, altered chromatin landscape, including DNA methylation (Figueroa et al., 2010), histone modifications and chromatin accessibility (Assi et al., 2018;Yi et al., 2019) establishes different AML subtypes linked to specific genetic alterations.
Consequently, widespread disruption of epigenetic patterns is a characteristic feature of AML, which may stimulate TE activation as regulatory elements. In this respect, AML is an ideal model system to explore onco-exaptation of TEs and determine to what extent such events are important for maintaining the expression of genes that promote malignancy progression and evolution.
Here we have identified six ERV/LTR families with putative regulatory potential in AML, bearing enhancer-like epigenomic signatures and binding of TFs that play key roles in haematopoiesis and in the pathogenesis of AML. Deletion of selected ERV copies and epigenetic inactivation of an entire ERV family demonstrated their roles as gene regulators. Remarkably, we found that genetic and epigenetic perturbation of a single LTR copy leads to impaired cell growth by modulating expression of APOC1 gene, suggesting that the activation of this particular LTR has a driving role in leukemia cell phenotype.

Identification of putative AML-specific regulatory TEs
In order to assess whether cellular transformation in AML unmasked intrinsic regulatory activity of TEs, we identified putative regulatory TEs by focusing on repeat families that were frequently associated with open chromatin in AML. To this end, we analysed DNase-seq data from 32 AML samples generated by the Blueprint epigenome project (Yi et al., 2019), and compared them with data from differentiated myeloid cells (macrophages, monocytes) from the same consortium ( Figure 1A).
Additionally, we generated our own DNase-seq data from three commonly used AML cell lines with different genetic and cytogenetic backgrounds: HL-60, MOLM-13 and OCI-AML3. We overlapped DNase hypersensitive sites (DHSs) with the Repeatmasker annotation and compared the DHS frequency at each repeat family with random controls (Supplementary Table 1). We identified 12 repeat families that were enriched for DHS-associated copies in at least one of the AML cell lines and in Figure 1. ERVs with regulatory potential are activated in AML and associated with gene expression differences (hematopoietic cells credit: A. Rad CC-BY-SA-3.0). A. Schematic of the strategy to detect repeat families associated with open chromatin in AML. B. Heatmap of the observed/expected enrichment for DHSs in selected repeat families. Cell lines are presented in the following order: HL-60, MOLM-13, OCI-AML3. C. DNase-seq profile across all elements of each AML DHS-associated repeat (A-DAR) families in OCI-AML3. D. Gene expression violin plots for genes within 50 kb of A-DARs, with or without overlapping DHS in AML and/or differentiated cells. E. For each gene lying near an A-DAR element, we compared its expression in AML samples where the respective ERV has a DHS, versus AML samples where the DHS is absent. Expression values were normalized using the variance stabilizing transformation (vst; log2 scale) in DESeq2. Highlighted are genes with >4-fold difference and vst>0. F. Example of a gene (SCIN) that displays a strict correlation between its expression and the presence of a DHS peak at a nearby LTR12C element in different AML samples.

DNase hypersensitive sites (DHS)
Overlap with Repeatmasker Compare with randomised data Distance from ERV centre dataset of 32 AML samples from the Bonifer lab (Assi et al., 2018) confirmed the DHS enrichment at all of the above families, and identified additional weaker associations, including with several Alu subfamilies (Supplementary Figure 1A). For stringency, we focused on families that were DHS-enriched in both datasets, all of which are LTRs from ERVs: LTR2B, LTR2C, LTR5B, LTR5_Hs, LTR12C, LTR13A. We excluded the internal portion of HERVK (HERVK-int) because its enrichment was largely due to the fragmented nature of the Repeatmasker annotation, and because its LTRs (LTR5B, LTR5_Hs) displayed much higher DNase-seq signal than the internal portion (Supplementary Figure 1B). We will collectively refer to the six selected ERV families as 'AML DHS-associated repeats' (A-DARs). The oldest A-DARs (LTR5B, LTR13A) date back to the common ancestor between hominoids and old world monkeys, whereas the youngest (LTR5_Hs) are human-specific (Hubley et al., 2016).
The DNase-seq profiles across each ERV displayed a consistent pattern for elements of the same family in AML cell lines (less evident for LTR2C), suggestive of TF binding events within these ERVs ( Figure 1C displays OCI-AML3 profiles  Figure 2A). Although there was no strict association with particular AML subtypes, we found that samples with NPM1 mutations were better inter-correlated than those without (Supplementary Figure 2B).
The same was true for samples with FLT3-ITD and DNMT3A mutations, which frequently co-occur with NPM1 mutations, as well as those with CEBPA mutations (Supplementary Figure 2B). Specific mutations may therefore help to establish particular patterns of ERV activation in AML, although other characteristics of the malignancy are also likely to affect them.

A-DAR chromatin status correlates with nearby gene expression
To test whether A-DARs were associated with gene activation, we analysed matching DNase-seq and RNA-seq data from the Blueprint consortium. ERVs can affect the expression of proximal genes, but also act at a distance via long-range interactions in 3D space (Fuentes et al., 2018;Raviram et al., 2018). However, long-range interactions display substantial cell specificity, namely within the hematopoietic system (Javierre et al., 2016). Given the heterogeneity between AML samples and the lack of matching Hi-C data, we focused our analysis on genes within 50 kb of an ERV from the selected families. Genes lying close to A-DAR elements with DHS in 2 or more AML samples displayed higher expression levels than those lying close to A-DAR elements without DHS ( Figure 1D). This was more pronounced for ERVs with DHS also present in differentiated cells. Such bulk correlations are only suggestive of a regulatory role of ERVs and need to be formally tested for a causal link, as we have previously shown (Todd et al., 2019). Nonetheless, we found ERVs with strong supporting evidence for their regulatory activity, as the expression levels of their vicinity genes were >4-fold higher in AML samples with DHS at a given ERV, versus those without ( Figure 1E). This included a strict correlation between chromatin accessibility at a LTR12C element and the expression of the SCIN gene ( Figure 1F), which has a mild predictive value for AML prognosis (Z.-H. Zhang et al., 2018). TPD52 and AHSP expression, which were correlated with the presence of DHS at a LTR12C and LTR5B element respectively, have also been linked to AML prognosis (Ha et al., 2019;Zhu et al., 2017). These data suggest that at least some A-DAR elements gain gene regulatory activity in AML, which correlates with disease outcomes.

A-DARs bear signatures of enhancer elements
DNase hypersensitivity is associated with both active gene promoters and distal enhancers. LTR12C elements, for example, were previously shown to frequently act as alternative gene promoters in different cell types, including hepatocellular carcinoma (Hashimoto et al., 2015) and cell lines treated with DNMT and HDAC inhibitors (Brocks et al., 2017). In contrast, LTR5_Hs (HERV-K) elements appear to mainly act as distal enhancer elements in embryonic carcinoma cells and embryonic stem cells (Fuentes et al., 2018;Pontis et al., 2019). We therefore aimed to establish whether A-DARs could act as promoters and/or enhancers in AML.
To test for gene promoter activity, we performed de novo transcriptome assembly in AML samples and differentiated myeloid cells, and calculated the number of spliced transcripts for which the transcriptional start site (TSS) overlapped an A-DAR element.
AML samples displayed 31-53 such transcripts, whereas differentiated cells had 20-28, most of which emanated from LTR12C elements ( Figure 2A). We identified 82 spliced transcripts that were present in two or more AML samples but were absent in differentiated cells (Supplementary Table 2). Most of these were short transcripts and only 26 had evidence of splicing into exons of annotated genes. One example involved a LTR2C element active in a subset of AMLs, which acted as a non-reference promoter    H3K4me3 AML Diff.
We then asked whether A-DARs are frequently marked by histone modifications typically associated with promoters or enhancers. Using ChIP-seq data from the Blueprint consortium, we first plotted the percentage of elements from each ERV family that were marked by H3K27ac, H3K4me1, H3K4me3 or H3K9me3 in AML and differentiated myeloid cells (Supplementary Figure 3A). Notably, in AML samples an average 5.7-15.2% of elements from each family overlapped H3K4me1 peaks, a mark predominantly associated with poised and active enhancers. This was substantially higher than the fraction overlapping with the active promoter mark H3K4me3 (1.3-3.4%). Indeed, a more detailed analysis of histone modification patterns at A-DAR elements showed that H3K4me1 is either found in conjunction with H3K27ac (active enhancers), or on its own (primed enhancers) but is more rarely found together with H3K4me3 ( Figure  To test whether myeloid leukemia cell lines could be used to dissect the putative enhancer roles of A-DARs, we performed H3K27ac ChIP-seq on HL-60, MOLM-13, OCI-AML3 and K562 cells, and compared patterns with those seen in AML samples. A-DAR elements that overlap H3K27ac peaks in AML samples were also frequently associated with this mark in cell lines ( Figure 2E). A ChromHMM annotation for K562 cells from ENCODE further supported that these elements often bear enhancer signatures ( Figure 2E). It is worth noting that there is substantial variation in H3K27ac enrichment of A-DARs among cell lines, much like primary AML samples.
Nonetheless, example loci show that H3K27ac deposition at A-DAR elements in cell lines can recapitulate primary AML data ( Figure 2F), opening up the opportunity to functionally test for enhancer activity of these loci by genetic and epigenetic editing approaches.

A-DARs bind AML-related TFs
Two large-scale data mining efforts had previously identified several TFs that are associated with ERV families identified here, either through ChIP-seq data or motif analyses (Jacques et al., 2013;Sundaram et al., 2014). These included hematopoiesis-and AML-related TFs such as TAL1, SPI1, GATA2 and ARNT, amongst others. To confirm and extend these observations, we first performed our own analysis of TF ChIP-seq data from K562 cells (ENCODE consortium). Our comparison with AML data above gave us confidence that K562 cells were an adequate model to study TF binding patterns at A-DARs. We analysed all TF ChIP-seq peak data available from ENCODE, comparing their overlap with A-DARs against a random control. We selected TFs that are bound to at least 5% of the elements in a given ERV family, in a statistically significant manner, yielding a list of 217 TFs ( Figure 3A; Supplementary Table 3). The vast majority of these TFs were found to be expressed in AML samples (198 had higher expression than TBP). Many of these TFs are involved in hematopoietic gene regulation and/or in the etiology of AML, including SPI1, TAL1, IKZF1 and PKNOX1 ( Figure 3A). ChIP-seq profiles of individual elements revealed a localized pattern of TF binding at a subset of elements ( Figure 3B). Notably, different ERV families bind different combinations of TFs.
We also performed TF motif analysis, looking for enrichment against shuffled sequence controls ( Figure 3C, Supplementary Table 4). Reassuringly, the results from this analysis were largely congruent with the ChIP-seq data. Apart from confirming the presence of motifs for SPI1, PKNOX1 and other TFs, we also found enrichment for HOXA9/MEIS1 motifs in four different ERV families. High HOXA9 and MEIS1 expression is associated with specific AML subtypes, namely NPM1 mutated AMLs (Mullighan et al., 2007), and are sufficient to drive leukemogenesis in mouse models (Kroon et al., 1998). It remains to be tested whether the presence of HOXA9/MEIS1 motifs is what drives NPM1-specific patterns of DHSs at A-DARs (Supplementary Figure 2B). In line with the high frequency of many of the identified TF motifs, we found that they were present in the consensus sequences of each ERV family ( Figure 3D), suggesting that the respective retroviruses brought in these motifs within their LTRs upon invasion of the human genome. Finally, we asked whether some TF motifs were responsible for chromatin opening at individual elements. We compared motif frequency between elements with DHSs (DHS+) in at least five of the analysed AML samples and DHS-negative elements ( Figure 3E). Although some TFs were preferentially associated with DHS+ elements (e.g., CEBPB in LTR2B), many were not, and none were sufficient to explain the presence or absence of DHSs. For example, even though SPI1 binding motif is present in the majority of DHS+ elements, 200 bp a large portion of non-DHS elements also harbor this motif ( Figure 3E). This suggests that other factors play a role in determining LTR regulatory potential, in line with our previous observations in mouse stem cells (Todd et al., 2019).
These analyses suggest that the potential regulatory activity at particular ERV families in AML is likely driven by the binding of hematopoiesis-associated TFs, which are either upregulated in AML or whose binding sites become accessible in AML through epigenetic alterations.

Genetic excision of A-DAR elements interferes with host gene expression
To test for causal roles of enhancer-like A-DAR elements in gene regulation, we used CRISPR-Cas9 to delete three candidate ERVs. The selected ERVs are enriched in H3K27ac, bound by multiple hematopoiesis-associated TFs in K562 cells ( Figure 4A), and overlap DHSs in multiple AML samples, but not in monocytes or macrophages (Supplementary Figure 4). We generated clonal lines from K562 cells that had heterozygous or homozygous deletions of these A-DAR elements, and measured the expression of associated genes in multiple clones. Other leukemia cell lines (HL-60, OCI-AML3, MOLM-13) proved more refractory to genetic deletion, due to the low efficiency of Cas9 delivery and single cell expansion.
One of the deleted loci was a LTR5B element located in the first intron of ZNF321P, which is bound by PKNOX1, SPI1, STAT5 and TAL1 ( Figure 4A, top). Deletion of this element led to a significant decrease in ZNF321P expression and also affected the expression of two other nearby genes, ZNF320 and ZNF888 ( Figure 4B, left). Notably, all three genes display higher expression in AML samples when compared to monocytes and macrophages ( Figure 4B, right). Interestingly, ZNF320 is also upregulated in multiple cancer types (Machnik et al., 2019), but its relevance to cancer etiology remains unclear. ZNF320 is a member of the Krüppel-associated box (KRAB) domain zinc finger family and predominantly binds LTR14A and LTR14B elements (Imbeault et al., 2017), suggesting a potential role in ERV silencing. Heterozygous deletion of another LTR5B element, bound by BCOR, SPI1, TAL1 and RUNX1 ( Figure   4A, middle), reduced the expression of Ribosomal Protein L7 Like 1 (RPL7L1) ( Figure   4C, left), which is upregulated in AML when compared to differentiated myeloid cells ( Figure 4C, right). Notably, this specific LTR5B contains a single nucleotide polymorphism (SNP) for which the minor allele (highest population frequency of 0.41) disrupts a MAFK binding motif (Supplementary Figure 4C). Using data from the GTEx project, we found that the minor allele was associated with lower RPL7L1 expression in whole blood (Supplementary Figure 4C), suggesting that the MAFK motif is    Figure 4A, bottom). Excision of this particular element led to around 3-fold reduction in BIK expression ( Figure 4D, left), which is higher in AML samples when compared to other hematopoietic cell types ( Figure 4D, right). This LTR13A also contains a SNP, where the minor allele (highest population frequency of 0.5) is a critical residue in a RUNX1 binding site, but that was not associated with any significant differences in BIK expression in whole blood (Supplementary Figure 4E).
Overall, CRISPR-mediated genetic deletion assays demonstrate a role of a subset of A-DAR elements in gene regulation in K562 cells. Moreover, DHSs within the candidate ERVs and high expression of their associated genes in AML patients provide strong evidence for their regulatory activation in vivo.

CRISPRi-mediated inactivation of LTR2B elements leads to growth suppression
To test the regulatory function of multiple A-DAR elements simultaneously, we next sought to epigenetically silence one ERV family by CRISPR interference (CRISPRi) using a catalytically dead Cas9 (dCas9) fused to the KRAB transcriptional repressor protein. We chose the LTR2B family mainly because this was the only one whose DHS enrichment was significant only in AML samples, and not in CD34+ cells ( Figure 1B), suggesting a more cancer-specific role than other A-DARs. We designed 4 sgRNAs targeting the most conservative regions of LTR2B family, predicted to recognize around 217 copies (68%) when allowing zero mismatches complementary to the sgRNAs. Our LTR2B sgRNAs are also predicted to target copies of highly related LTR2 copies (71 copies; 8%). We first generated stable CRISPRi OCI-AML3 and K562 cell lines and then stably introduced LTR2B sgRNAs or an empty vector control using a lentiviral system. To determine dCas9 specificity on a genome-wide scale, we performed dCas9 ChIP-seq in K562 cell lines expressing LTR2B sgRNAs or empty vector. We detected 395 dCas9 peaks in cells with LTR2B sgRNAs (and none in control cells), 187 of which were associated with LTR2B elements, and 90 with LTR2 elements ( Figure 5A,B). The remaining 118 peaks ( Figure 5B) were included in downstream analyses to evaluate putative off-target effects. We also performed H3K27ac and H3K9me3 ChIP-seq in the same cells to assess the epigenetic changes imparted by CRISPRi. We quantified the ratio in histone modification levels at dCas9 peaks between cells expressing LTR2B sgRNAs and those with the empty vector control. As expected, upon CRISPRi in K562 cells we observed a reduction of H3K27ac signal and/or gain of H3K9me3 signal at most loci bound by dCas9, demonstrating effective epigenetic editing ( Figure 5C,D). Notably, LTR2B/LTR2 target sites generally underwent more pronounced changes in H3K27ac and H3K9me3    Intriguingly, proliferation assays showed that epigenetic silencing of LTR2B and LTR2 elements by CRISPRi significantly suppressed cell proliferation in both K562 and OCI-AML3 cell lines ( Figure 5E). To test the impact of LTR2B and LTR2 inactivation on the host transcriptome, and gain insights into the mechanism underlying impaired cell growth, we performed RNA-seq in both cell lines ( Figure 5F; Supplementary Figure   5D). We identified a total of 58 and 99 differentially expressed genes in K562 and OCI-AML3 cells, respectively (Supplementary Table 5). To elucidate direct effects of CRISPRi, we focused on genes that are within 50 kb of a dCas9 peak and found 15 and 6 differentially expressed genes (in K562 and OCI-AML3 cells, respectively), all but one of which were downregulated. Only one of these genes (BIK), which was downregulated in OCI-AML3, was associated with an off-target dCas9 peak. The remaining genes were associated with 15 different LTR2B/LTR2 elements. In some instances, the LTR2B/LTR2 element was very close to the promoter of the affected gene, such that silencing could have resulted from H3K9me3 spreading. We therefore performed genetic deletion of one of these elements, which also led to a decrease in gene expression of the adjacent ZNF611 gene, albeit to a lesser extent than by Overall, these data show that a subset of LTR2B and LTR2 elements act as key gene regulators in leukemia cell lines, and that their epigenetic silencing impairs cell growth, providing evidence for a putative functional role in AML.

APOC1-associated LTR2 is required for proliferation of myeloid leukemia cells
APOC1 has recently been shown to maintain cell survival in AML and the knockdown of APOC1 impairs cell growth (Yang et al., 2018). Similar findings were made in pancreatic and colorectal cancer, where APOC1 overexpression is associated with poor prognosis (Ren et al., 2019;Takano et al., 2008). We therefore asked whether ERV-mediated regulation of APOC1 could affect cell growth. There is an LTR2 insertion upstream of the APOC1 promoter (APOC1-LTR2; Figure 6A, Supplementary   Figure 6A), which has been previously described to act as an alternative promoter in several tissues, but only accounts for up to 15% of total APOC1 transcription (Medstrand et al., 2001). In K562 and OCI-AML3 RNA-seq data, we find no evidence of APOC1-LTR2 promoter activity ( Figure 6A), which we have confirmed by RT-qPCR (Supplementary Figure 6B), suggesting that APOC1-LTR2 acts as an enhancer element. APOC1-LTR2 is enriched in STAT5 and TAL1 binding and shows an increase in H3K9me3 and decrease in H3K27ac upon CRISPRi in both K562 and OCI-AML3 ( Figure 6A, Supplementary Figure 6A). To test for a direct role of APOC1-LTR2 in APOC1 gene expression and cell growth, we deleted this element in K562 cells. We obtained 7 heterozygous and 8 homozygous clones from a total of 110 clones.
Interestingly, none of the homozygous clones were able to grow more than 10 days in culture, suggesting that homozygous deletion may impair cell growth. To pursue the impact of APOC1-LTR2 on cell growth, we used lentiviral-mediated CRISPR-Cas9 delivery and performed assays in a pool of edited cells ( Figure 6B). At day 6, after GFP and puromycin selection of the two flanking sgRNAs, we observed around 60% deletion of APOC1-LTR2 and more than 2.5-fold reduction in APOC1 gene expression compared to an empty vector control ( Figure 6C,D). Remarkably, deletion of this element was sufficient to drive a significant suppression of cell proliferation compared to control cells ( Figure 6E). As there is a small fraction of unedited cells in the pool, we asked whether the unedited cells may outcompete edited cells over time. After day 20, the deletion was reduced to around 35% and only 1.2-fold difference was observed in deletion leads to impaired cell growth, we analyzed cell cycle and apoptosis with flow cytometry in K562 cells at day 6. While no differences in G1, S, and G2 phases were detected, there was a significant increase in the sub-G1 population ( Figure 6F) in edited cells. In agreement with this, Annexin V signal was significantly higher in edited cells compared to unedited cells at day 6 ( Figure 6G), showing that the deletion of   depletion (Ren et al., 2019;Takano et al., 2008;Yang et al., 2018). As expected, this difference is much smaller after day 20 (Supplementary Figure 6D). We also tested the effect of APOC1-LTR2 deletion in OCI-AML3 cells, but due to the low efficiency of Cas9 delivery and low viability of cells at day 6, we performed expression and annexin V analysis at day 10. Similar to what we observed in K562 cells, APOC1-LTR2 deletion in OCI-AML3 cells led to around 4-fold decrease in APOC1 expression and increased annexin V signal, and these effects were milder at day 23 (Supplementary Figure   6E,F).
Together, our findings indicate that the APOC1-LTR2 element is essential for proliferation of leukemia cells by acting as an enhancer of the APOC1 gene, which in turn controls cell survival via an anti-apoptotic mechanism.

Discussion
Here we provide multiple lines of evidence that particular ERVs are used as regulatory elements to activate gene expression in AML, which may be exploited by cancer cells to help drive disease phenotypes and cancer progression. It had been previously postulated that the epigenetically relaxed state of cancer cells provides a window of opportunity for ERV activation, triggering their intrinsic regulatory capacity Chuong et al., 2017;Lamprecht et al., 2010). However, to the best In line with previous studies, only a subset of all the elements in a given ERV family (10-37%) are marked by active histone modifications and bound by haematopoiesisrelated TFs, despite a high level of motif conservation across ERV copies.
Nonetheless, we identified multiple ERV elements with strong evidence supporting their role as bona fide gene regulators: 1) we found striking correlations between differential chromatin accessibility at 20 ERVs and the expression of nearby genes, some of which have been linked to AML prognosis ( Figure 1E,F); 2) CRISPR-mediated genetic editing experiments revealed an additional five ERVs that act as enhancers in leukemia cells (Figure 4, Supplementary Figure 5E, Figure 6); 3) CRISPRi identified another 13 different elements whose epigenetic silencing led to the down-regulation of nearby genes (Supplementary Table 6). A more exhaustive search would likely have revealed additional regulatory elements, namely via epigenetic silencing of other ERV families. We were also purposefully conservative in our assessment of ERV effects on gene expression, which may frequently extend beyond 50 kb, although additional evidence would then be necessary to support a direct role.
Despite the growing evidence that ERVs can act as regulatory elements in different cancers, there are limited examples for their inappropriate activation contributing to oncogenesis, a term coined as onco-exaptation . The term has been frequently used to describe the gain of regulatory activity at TEs in cancer. Our view is that, similar to the term exaptation (Gould and Vrba, 1982), onco-exaptation requires that this new regulatory activity provides the cancer cell with a selective advantage. Strong demonstrations of such adaptive roles are scarce. Notably, the Wang lab recently showed that an AluJb element acts as an oncogenic promoter to drive LIN28B expression and tumour progression in lung cancer (Jang et al., 2019). In our study we identified an LTR2 element, the genetic and epigenetic perturbation of A considerably larger number of patients would be necessary to confirm this finding, although independent datasets have led to similar observations in colorectal and pancreatic cancer (Ren et al., 2019;Takano et al., 2008). APOC1 is also activated in monocyte-to-macrophage differentiation (Lauer et al., 1988), raising the possibility that APOC1-LTR2 may play other roles in haematopoiesis outside of AML.
Given their repetitive nature, one intriguing question is why particular ERVs within a family are recurrently activated in AML to drive nearby gene expression, yet the majority of them are functionally neutral. One explanation lies in the nature of interand intra-cellular epigenetic heterogeneity that increases during malignancy formation.
This gives rise to epigenetic activation of a set of ERVs, as proposed in the epigenetic evolution model . Accordingly, cells harbouring activated ERVs that drive oncogenes gain a selective advantage and increase in frequency during cancer evolution. Therefore, clonal expansion of these cells will enable the detection of oncogenic ERVs in a cell population. However, whether ERV activation contributes to cancer evolution or is simply a consequence of the molecular state of cancer remains a matter of debate.
Irrespective of whether epigenetic heterogeneity at ERVs contributes to tumour evolution, distinct patterns of ERV activity are observed across different patients (Supplementary Figure 1A). These differences appear to be partly driven by the underlying mutational profiles. We also identified a SNP within an ERV that seemingly affects its regulatory activity by altering a TF binding site (Supplementary Figure 4C), suggesting that genetic variation within ERVs also contributes to inter-individual differences in ERV activity. Finally, younger ERVs such as LTR5_Hs are structurally polymorphic within the human population (Belshaw et al., 2005;Thomas et al., 2018), adding another layer of genetic variation. Regulatory ERVs may therefore foster genetic, epigenetic and transcriptional heterogeneity of the disease with potential to contribute to clinical outcomes. One significant consequence of the molecular heterogeneity of AML is the escape of resistant clones from treatment, resulting in high relapse rates. It will be therefore interesting to discover to which extent the ERVderived heterogeneity contributes to inter-individual differences in response to AML therapies.
Altogether, our work reveals the first example of ERVs as oncogenic enhancers in AML and, more importantly, in any human malignancy. These data highlight the significance of expanding the search for oncogene drivers to the repetitive part of the genome, which may pave the way for the development of novel prognostic and therapeutic approaches.
For cell proliferation assays, exponentially growing cells were plated in 24-well plates (1x10 5 cells/ml). Every 2-3 days media were replaced and cells were split to 1x10 5 cells/ml. The viable cells were counted daily for 6 days.
Cell-cycle and apoptosis assays Cell cycle assay was performed using muse cell cycle kit by following manufacturers instructions (Millipore) and the cells were analysed by BD FACS Canto II. For apoptosis assay, the cells were stained by an annexin V 647 (Thermofisher Scientific) and DAPI and analysed by BD FACS Canto II.

Chromatin accessibility assay
To asses chromatin accessibility, DNase I digestion was performed as previously described (Song and Crawford, 2010 Primary processing of high-throughput sequencing data Reads from high-throughput sequencing data generated here or from external datasets (Supplementary Table 8) were trimmed using first trimmed using Trim Galore.

Mutational profile analysis
A-DAR elements overlapping DHSs in at least one sample were selected, and a correlation matrix built based on the patterns of DHS overlap between samples. These were compared with the AML mutational profiles extracted from the respective publications (Assi et al., 2018;Yi et al., 2019). Correlation coefficients between AML samples sharing a particular mutation were compared with correlation coefficients between samples without the mutation.

Identification of active A-DAR promoters
Aligned BAM files from Blueprint RNA-seq data were processed using StringTie v1.3.3b (Pertea et al., 2015) with options --rf -G to generate sample-specific guided transcriptome assemblies. Spliced transcripts initiating at A-DAR elements were then identified by intersecting the TSSs of multi-exon transcripts A-DAR annotations. A-DAR elements with TSSs in AML samples but not in differentiated cells were selected and the associated transcripts visually inspected.
K562 TF ChIP-seq analysis ENCODE TF ChIP-seq peak files from K562 (Supplementary Table 8) were downloaded and intersected with A-DAR annotations, as well as with a randomly shuffled version of these elements. TFs significantly enriched (corrected p<0.05) in at least one of the A-DAR families, covering at least 5% of the elements in that family were selected. For each TF, average enrichment values were calculated across technical and biological replicates, as well as independent ChIP-seq experiments of the same TF.

TF motif analysis
Motif analysis of A-DARs was performed using the AME and FIMO tools of the MEME SUITE v5.0.1 (Bailey et al., 2015) using the HOCOMOCO v11 human TF motif database. Motifs enriched in at least one A-DAR family were identified using AME, and motif frequency and location extracted using FIMO. Consensus sequences were downloaded from Dfam (Hubley et al., 2016).
CRISPRi ChIP-seq and RNA-seq analyses Normalised H3K27ac and H3K9me3 ChIP-seq read counts were extracted around dCas9 peaks (±500 bp from the peak centre). Genes within 50 kb of a dCas9 peak were considered as putative direct targets of CRISPRi. Differential gene expression analysis was performed using DEseq2 (Love et al., 2014).

Data availability
Datasets are available through the Gene Expression Omnibus (GEO) under accession number GSE136764.
of the data is available from www.blueprint-epigenome.eu. Funding for the project was provided by the European Union's Seventh Framework Programme (FP7/2007(FP7/ -2013 under grant agreement no 282510 -BLUEPRINT. This research utilised Queen Mary's Apocrita HPC facility, supported by QMUL Research-IT (King et al. 2017). performed bioinformatic analyses.

Competing Interests
The authors declare no competing interests.

Tables
Supplementary