CATlas-manual
Introduction to CATlas
Welcome to the CATlas database! We provide comprehensive reference maps of cis-regulatory elements identified from single-cell epigenetic datasets.
CATlas Highlights
-
Comprehensive Coverage:
CATlas collects single-cell epigenetic profiles from ~11.3 million individual cells across 5 species, covering >1,700 cell types/states from various tissues and 6,458,419 candidate cis-regulatory elements.
-
Dynamic Visualization of Multimodal Omics:
CATlas can be used to browse single-cell epigenetic datasets in multiple ways, including cell clusters at single-cell resolution, epigenetic signal tracks at cell type/state resolution, and 3D genome conformation at kilobase resolution.
-
Robust Prediction of Regulatory Code via Deep Learning Models:
CATlas can be used to decipher the gene regulatory codes from single-cell epigenetic datasets by leveraging advanced and robust deep learning models, which can be used to predict the potential function of cis-regulatory elements, prioritize TF binding sites, and interpret risk variants at cell type/state resolution.
-
Crowdsourcing Single-cell Genomic Datasets:
CATlas utilizes crowdsourcing to accelerate data collection via sharing standard data formats and robust computational pipelines.
-
Discovery of Biological Insights:
CATlas offers visualization of multimodal omic datasets, enabling comparison across a wide range of cell types and states, along with functional predictions of cis-regulatory elements. It is essential for generating new hypotheses and biological insights.
CATlas resources
Explore the diverse array of cell types collected and annotated in CATlas.
For each dataset, processed data and results are available for download.
Clicking the [Download] button displays the Dataset Resource Page. Clicking the FTP link on the Dataset Resource Page gives access to the download links.

CATlas communities
Stay informed about new functions and dataset updates in CATlas by subscribing to our email list.
Join our vibrant community and engage in discussions by navigating to the CATlas Google Group.
Using CATlas
Browse datasets
-
Navigate to the "Browse" Page:
Go to the "Browse" page to access single-cell datasets.
-
Select Datasets:
Use the checkboxes in the sidebar to filter and select datasets based on different species and sequencing methods.
-
Access Dataset Details and Downloads:
- Click the [Details] button to view more detailed information about the selected datasets.
- Click the [Download] button if you want to download processed data.
- Click the [Explore] button to visualize epigenetic signal tracks, cell browser and 3D genome conformation, depending on the dataset types.

Browse Datasets Explanations
-
"Visualization of Epigenetic signal tracks"
-
Select Cell Types:
- Add or remove cell types using the multi-select list below the [Class] section.
-
Tracks Panel:
- Datatypes: Click on the desired datatype to toggle its inclusion.
- Merged: Select this option to merge tracks by cell types or by data types.
- Log-transformed: Toggle whether to apply a log transformation to BigWig tracks.
-
Cell Names and Colors:
- Lists cell type names and their default colors as stored in CATlas.
-
Load Public Tracks
- Click the button to manage the inclusion of SNP tracks (for human) and JASPAR tracks (for human, mouse, and fly).
-
"Visualization of Cell browser" to view cell clusters via the UCSC Cell Browser
Visualization of Cell Clusters and Gene Activity: Click view -> split view -> go to the "gene" panel
-
"Visualization of 3D genome conformation":
- Select only one cell type.
- Click the [hic contact] button to load Hi-C tracks into the Integrative Genomics Viewer (IGV).
- Use drag and drop within IGV to visualize different genomic regions.

Search dataset or cell types
-
Search Dataset
Enter a species name (e.g., human, mouse) and use the fuzzy autocomplete feature to select a specific dataset. Once selected, you will be directed to the epigenetic signal tracks page.
-
Search Cell Types
-
Single Cell Type Search:
If the input cell type has only one record in CATlas, the system will display signal tracks for that specific cell type.
-
Multiple Cell Types Search:
If the input cell type has multiple records in CATlas, a table will list all matching cell types. Select the rows of interest in this table, then click the [VISUALIZATION] button to proceed to the cell types track page.
Cell Type Visualization Page
Predict regulatory codes via Deep Learning Models
Overview of Deep Learning Models
We have trained Basenji models (Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 2018 May;28(5):739-750. doi: 10.1101/gr.227819.117. Epub 2018 Mar 27. PMID: 29588361; PMCID: PMC5932613) on >1,700 cell types/states, with a focus on snATAC-seq datasets. For each model, we applied a grid search to find the optimal hyperparameters, benchmarked prediction performance, and tested robustness across multiple training runs.
Make Predictions
Based on a variety of user queries, CATlas can utilize these models to predict the potential function of CREs in different cell types/states, prioritize putative TF binding sites, interpret the function of non-coding risk variants, visualize results, and deliver prediction outcomes to users via email.

-
For Genome Regions Analysis
- Input a task name for your analysis (default: "default").
- Select a species of interest.
- Select a dataset for model-based analysis.
- Modify the cell types input to add or remove cell types (maximum of 6).
- Input a prediction length (default: 200).
- Specify the chromosome range (e.g., "chr1, start, end").
- Input your email to receive results.
- Click the "Run Analysis" button.
-
For Genome Variant Analysis (Human Only)
- Follow the same first five steps as in the genome regions analysis.
- Input specific genomic variants (e.g., "rs657666:C>T").
- Follow the same last two steps as in the genome regions analysis.
This analysis may take several minutes. After completion, users can directly view the results online.
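The query formats described above can be checked locally before a job is submitted. The following is a minimal Python sketch and is not part of CATlas itself: the helper names (`parse_region`, `parse_variant`, `check_cell_types`) are hypothetical, and the validation rules are assumptions based only on the examples and limits stated in this manual. The checks actually applied by the CATlas server may differ.

```python
import re
from dataclasses import dataclass

MAX_CELL_TYPES = 6               # limit stated in the analysis form
DEFAULT_PREDICTION_LENGTH = 200  # default stated in the analysis form


@dataclass
class Region:
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Parse a 'chr1, start, end' style string (format assumed from the manual's example)."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields: chromosome, start, end")
    chrom, start, end = parts[0], int(parts[1]), int(parts[2])
    if start < 0 or end <= start:
        raise ValueError("start must be non-negative and smaller than end")
    return Region(chrom, start, end)


# Pattern assumed from the manual's example "rs657666:C>T".
VARIANT_RE = re.compile(r"^(rs\d+):([ACGT])>([ACGT])$")


def parse_variant(text: str) -> tuple:
    """Split a variant string into (rsID, nucleotide before, nucleotide after)."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError("expected a variant such as 'rs657666:C>T'")
    rsid, before, after = match.groups()
    if before == after:
        raise ValueError("the two nucleotides must differ")
    return rsid, before, after


def check_cell_types(cell_types: list) -> list:
    """Drop duplicates (keeping order) and enforce the limit of 6 cell types."""
    unique = list(dict.fromkeys(cell_types))
    if not 1 <= len(unique) <= MAX_CELL_TYPES:
        raise ValueError(f"select between 1 and {MAX_CELL_TYPES} cell types")
    return unique
```

For example, `parse_region("chr1, 1000, 2000")` yields a `Region` with integer coordinates, and `parse_variant("rs657666:C>T")` separates the rsID from the two nucleotides.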
Prediction Results Page
-
For Genome Regions
CATlas will display both the true signals and predicted signals on queried genomic regions for selected cell types. Please use the track control panel for more display options. In silico saturation mutagenesis results are listed below, formatted in Plotly, allowing for dynamic interaction.
-
For each track in IGV, the label at the top represents the experimental coverage, while the label ending in "_predicted" below depicts the predictions made by Basenji.
Explanations for "Plotly for analysis result":
- The first row of the seq logo plot shows the summed minimum (loss) change among possible substitutions.
- The second row represents the summed maximum (gain) change.
- The third row of line plots illustrates both the minimum (loss) and maximum (gain) changes among possible substitutions.
- In the fourth row of the heatmap, the quantities displayed indicate the change in Basenji prediction 'Δ pred' (summed across the sequence) following the substitution of the nucleotide specified in each row.
-
For Genome Variants
Similar to the genome region analysis, with the mutation site marked by a red dashed line.
-
For each track in IGV, the label at the top represents the experimental coverage, while the label ending in "_predicted" below depicts the predictions made by Basenji.
Explanations for "Plotly for analysis result":
- The first row of the seq logo plot displays the summed change among possible substitutions based on the pre-mutated nucleotide.
- In the second row of the heatmap, the quantities shown indicate the change in Basenji prediction 'Δ pred' (summed across the sequence) following the substitution of the nucleotide specified in each row.
- The third row of the seq logo plot illustrates the summed change among possible substitutions based on the post-mutated nucleotide.
- In the fourth row of the heatmap, the quantities displayed represent the change in Basenji prediction 'Δ pred' following the substitution of the specified nucleotide in each row.
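The relationship between the heatmap and the loss/gain rows of the saturation mutagenesis plots can be illustrated with a small Python sketch. It assumes a table of 'Δ pred' values with one row per substituted nucleotide and one column per position, where the entry for the reference nucleotide is 0. The function name `summarize_delta` and the toy values are hypothetical, and the code used by CATlas to produce its plots may differ.

```python
NUCLEOTIDES = "ACGT"


def summarize_delta(delta: dict) -> tuple:
    """Return per-position (loss, gain) from a nucleotide -> list of delta-pred values.

    loss[i] is the minimum change among the possible substitutions at position i;
    gain[i] is the maximum change among them.
    """
    n_positions = len(delta[NUCLEOTIDES[0]])
    loss, gain = [], []
    for i in range(n_positions):
        column = [delta[nt][i] for nt in NUCLEOTIDES]
        loss.append(min(column))
        gain.append(max(column))
    return loss, gain


# Toy values for a 3 bp sequence "ACG" (illustrative only, not real predictions).
toy_delta = {
    "A": [0.0, -0.4, 0.1],
    "C": [-0.2, 0.0, 0.3],
    "G": [0.5, -0.1, 0.0],
    "T": [-0.3, 0.2, -0.6],
}

loss, gain = summarize_delta(toy_delta)
print("loss:", loss)
print("gain:", gain)
```

Positions with a strongly negative loss are those where some substitution is predicted to reduce the signal, which is why they stand out in the loss seq logo; positions with a large gain behave the opposite way.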
Download prediction results
-
Via Email
The user will receive an email with the job ID, website link for visualization, and prediction results in the attached file.
Please unzip the attached file, open "report.pdf", and explore the prediction results in the corresponding directories.
-
Online
The analysis results are kept online for three months. Users can retrieve and visualize the results using the job ID provided in the email.
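Unpacking the emailed results can also be scripted. The sketch below uses only the Python standard library; the function name `extract_results` is hypothetical, and the only thing assumed about the archive is that it is a zip file containing a "report.pdf" somewhere inside, as described above. The actual directory layout of the attachment is not specified in this manual.

```python
import zipfile
from pathlib import Path


def extract_results(archive_path, dest_dir):
    """Extract a results archive and return the paths of any report.pdf files found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    # Search recursively, since the archive's internal layout is not assumed.
    return sorted(dest.rglob("report.pdf"))
```

A call such as `extract_results("results.zip", "catlas_results")` (both names are placeholders) returns a list of matching paths, which is empty if the archive contains no report.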
Upload-Data Hub
-
Complete the Pre-submission Survey: (Survey Form)
- Navigate to the Survey: Upon accessing the Upload-Data Hub, you will find a survey asking about the nature of your data and your research objectives.
- Fill Out the Survey: Provide detailed responses to help us understand the context and scope of your dataset. This information is crucial for tailoring our data integration and analysis tools to your needs.
-
Our Team Will Contact You:
Initial Contact: Once we receive your survey and metadata, a member of our team will reach out to you via email to discuss the next steps. This may include verification of details, discussion of data formats, and scheduling data upload.
-
Data Security and Privacy:
We prioritize the security and privacy of your data. Uploaded data is stored on secure servers with restricted access. Please ensure that any sensitive or identifiable information is removed or properly anonymized before uploading.
-
Support:
If you encounter any issues during the upload process or have any questions about data preparation, please contact our support team through the "Contact" link.