# JASPAR TFBS extraction tool #

This tool extracts the JASPAR TFBSs intersecting with an input set of genomic
regions in BED format. TFBSs can be further filtered by providing a list of 
TFs, matrix IDs, or a TFBS score threshold (see more details below).

JASPAR TFBSs are publicly available as bigBed files [here](https://frigg.uio.no/JASPAR/JASPAR_TFBSs/). There, TFBS collections are grouped by JASPAR release so that release-specific TFBS predictions are easy to find. The bigBed files required to run this tool are found within each release-specific directory (e.g. `JASPAR2022_hg38.bb` for the hg38 TFBS predictions from the 2022 release).

## Usage ##

The tool is run with the following command:

```
./bin/extract_TFBSs_JASPAR.sh -i INPUT BED -b INPUT BIGBED [-o OUTPUT] [-t TFs] [-m MATRIX IDs] [-s SCORE THRESHOLD] [-p NUM PROCESSORS]
```
where the following options are mandatory:

* `INPUT BED` is a BED file containing the regions of interest. **Note!** Compressed BED files such as *.bed.gz* are currently not supported, either as input or as output.
* `INPUT BIGBED` is a bigBed file containing the JASPAR TFBSs. These files can be downloaded from [here](https://frigg.uio.no/JASPAR/JASPAR_TFBSs/).
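
Since compressed BED files are not supported, a compressed input has to be decompressed before it is passed to `-i`. A minimal sketch of one way to do this, assuming `gzip` is installed; the file names `my_regions.bed.gz` and `my_regions.bed` are hypothetical, and the first command only creates a tiny compressed file so that the sketch is self-contained:

```shell
# Hypothetical file names; substitute your own.
# Create a one-line compressed BED file so this sketch can run on its own.
printf 'chr1\t50\t100\n' | gzip -c > my_regions.bed.gz

# Decompress to a plain BED file while keeping the compressed original.
gunzip -c my_regions.bed.gz > my_regions.bed

# my_regions.bed can now be passed to the tool with -i.
```

The same applies to the output: the tool writes plain BED, so any compression (for instance with `gzip`) would have to happen after the run has finished.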

And the following ones are optional:

* `OUTPUT` is a path to the output file. When this is not provided, the extraction results are sent to standard output.
* `TFs` is a file containing a list of TF gene symbols, one per line. When provided, only TFBSs for the specified TFs will be shown.
* `MATRIX IDs` is a file containing a list of JASPAR matrix IDs, one per line. When provided, only TFBSs for the specified matrix IDs will be shown.
* `SCORE THRESHOLD` is an integer giving the minimum score a TFBS must have to be reported. For more information about the correspondence between the TFBS score and a p-value, see [this page](https://genome-euro.ucsc.edu/cgi-bin/hgTrackUi?hgsid=290259872_tndyJlqyvi4iWtlWaXIXbDZqvILC&db=hg19&c=chr6&g=jaspar).
* `NUM PROCESSORS` is the number of cores to run in parallel (default = 2).
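
The filter files described above are plain text with one entry per line. As a rough sketch, they could be created as shown here; the file names `my_TFs.txt` and `my_matrix_IDs.txt` are hypothetical, and the entries `Macho-1` and `MA0118.1` are taken from the example output further down in this README. The final call only runs if the script and the example bigBed file are present, and it assumes the Ciona example files described in the next section:

```shell
# One TF gene symbol per line (hypothetical file name).
printf '%s\n' 'Macho-1' > my_TFs.txt

# One JASPAR matrix ID per line (hypothetical file name).
printf '%s\n' 'MA0118.1' > my_matrix_IDs.txt

# Run the tool with a matrix ID filter, a score threshold, and 4 processors.
# Skipped when the script or the bigBed file has not been set up yet.
if [ -x ./bin/extract_TFBSs_JASPAR.sh ] && [ -f example_files/JASPAR2022_ci3.bb ]; then
    ./bin/extract_TFBSs_JASPAR.sh \
        -i example_files/ciona_regions.bed \
        -b example_files/JASPAR2022_ci3.bb \
        -m my_matrix_IDs.txt \
        -s 300 \
        -p 4
fi
```

Because `-o` is omitted here, any results would be written to standard output.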

### Example files ###

The `example_files` folder contains example input files. They are designed to work with the JASPAR 2022 bigBed file for *Ciona intestinalis* (`JASPAR2022_ci3.bb`) found [here](https://frigg.uio.no/JASPAR/JASPAR_TFBSs/). This bigBed file is not included in this repository because of its size.

After downloading the bigBed file, you can run an example with:
```
./bin/extract_TFBSs_JASPAR.sh \
    -i example_files/ciona_regions.bed \
    -b example_files/JASPAR2022_ci3.bb \
    -o example_files/ciona_TFBSs.bed \
    -t example_files/TFs.txt \
    -s 300
```

The output of this command is saved in `example_files/ciona_TFBSs.bed`:

```
cat example_files/ciona_TFBSs.bed
chr1	63	72	MA0118.1	306	-	Macho-1
```
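
The output is tab-separated, so it can be summarized with standard command-line tools. The sketch below counts TFBSs per TF and reports the highest score for each. It assumes, based only on the example line above, that column 5 holds the score and column 7 the TF name. The file `ciona_TFBSs_demo.bed` is a hypothetical stand-in that reproduces the example line so the sketch runs without the bigBed file:

```shell
# Stand-in for example_files/ciona_TFBSs.bed, containing the line shown above.
printf 'chr1\t63\t72\tMA0118.1\t306\t-\tMacho-1\n' > ciona_TFBSs_demo.bed

# Per TF (column 7): number of TFBSs and highest score (column 5).
awk -F '\t' '
    {
        count[$7]++
        score = $5 + 0                      # force numeric comparison
        if (!($7 in best) || score > best[$7]) best[$7] = score
    }
    END {
        for (tf in count) printf "%s\t%d\t%d\n", tf, count[tf], best[tf]
    }
' ciona_TFBSs_demo.bed | sort > tfbs_summary.tsv

cat tfbs_summary.tsv
```

For the single example line, the summary contains one row: `Macho-1`, a count of 1, and a top score of 306.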

### Docker and Singularity ###

The tool can also be run using the container available [here](https://hub.docker.com/r/cbgr/jaspar_tfbs_extraction). The commands below run the same job on the example files with Docker and with Singularity.

**Docker:**

The directory containing the data must be bound to a path inside the container. In this case, we bind the root directory of this repository to the `/input/` path within the container so that the tool can access the necessary data and scripts. Modify this according to how your data is organized. In addition, we use the `--user $(id -u):$(id -g)` option to avoid permission issues when running the tool.

```
docker run \
    -v $(realpath .):/input/:rw \
    --user $(id -u):$(id -g) \
    --rm \
    cbgr/jaspar_tfbs_extraction:<version> \
        bash /input/bin/extract_TFBSs_JASPAR.sh \
        -i /input/example_files/ciona_regions.bed \
        -b /input/example_files/JASPAR2022_ci3.bb \
        -o /input/ciona_TFBSs.bed \
        -t /input/example_files/TFs.txt \
        -s 300
```

Note that `<version>` should be substituted by the container version you want to use (e.g. `latest`).

**Singularity:**

As with Docker, the container needs access to the repository. Here we use the `--home` option to set this repository's root directory as the home directory within the container. We also use the `-e` option to make sure no environment variables are passed.

```
singularity run \
    --home $(realpath .) \
    -e \
    docker://cbgr/jaspar_tfbs_extraction:<version> \
        bash bin/extract_TFBSs_JASPAR.sh \
        -i example_files/ciona_regions.bed \
        -b example_files/JASPAR2022_ci3.bb \
        -o ciona_TFBSs.bed \
        -t example_files/TFs.txt \
        -s 300
```

Note that `<version>` should be substituted by the container version you want to use (e.g. `latest`).