RSA-tools - Tutorials

The aim of these tutorials is to give a theoretical and practical introduction to the Regulatory Sequence Analysis Tools (RSAT) software suite. The most convenient way to follow the tutorial is to display the current page in a separate window, and to use the tools with the current one.

The RSAT home page displays two frames. The frame on the left contains a menu, presenting the available tools. Each time you click on a tool name, the right frame displays the form for the corresponding tool.

The tools are organized in a modular way : rather than having a single form for the complete analysis, we found it more convenient to present separate forms for the successive steps of a given analysis. A typical analysis will thus consist in using successivbely different tools (for example sequence retrieval -> motif discovery -> pattern matching -> feature-map). For this purpose, the tools are interconnected, allowing you to send automatically the result of one request as input for the next request (piping). The links between tools are illustrated in the flow chart below. An advantage of this modular organization is that you can either follow a full pipeline throught the tools, or directly enter at any step of an analysis with external data of your own.

We will analyze some practical examples to get familiar with the different tools, and the way they are interconnected.

The tutorial contains different parts, illustrating the typical situations that can be encountered when analysing regulatory sequences :

  1. Pattern matching: you know the regulatory motif (e.g. the consensus for a transcriptional factor), and you are interested by one or several particular sequences (e.g. promoters of a gene of interest, or binding fragments obtained from ChIP-on-chip experiments): you look for the matching positions within the sequences.
  2. Genome-scale pattern matching: you know the regulatory motif, and you would like to scan the genome to detect genes having this motif in their regulatory regions, which may be considered as potential target genes for the transcription factor of interest.
  3. Motif discovery (or pattern discovery). You know the sequences, you ignore the regulatory motif : you dispose of a set of functionally related regulatory sequences (e.g. promoters of co-expressed genes, or peaks collected from ChIP-seq experiments), and you suspect that they are enriched in binding site for one or seveal transcription factors. You thus want to detect a motif "ab initio" from the sequences.


    Representations of transcription factor binding motifs

  1. String-based representations
  2. Position-specific scoring matrices (PSSM)
  3. Sequence retrieval

  4. from RSAT
  5. from EnsEMBL
  6. Pattern matching

  7. dna-pattern: string-based pattern matching
  8. Detailed protocol for matrix-scan:
    Turatsinze, J.V., Thomas-Chollier, M., Defrance, M. and van Helden, J. (2008) Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc, 3, 1578-1588. Pubmed 18802439
  9. Motif discovery

    String-based motif discovery

  10. Counting word occurrences in DNA sequences.
  11. oligo-analysis: detection of over-represented oligonucleotides (words).
  12. dyad-analysis: detection of over-represented spaced pairs of oligonucleotides.
  13. position-analysis: detection of words having a positional bias in sequences aligned on some reference position.
  14. Detailed protocol for string-based motif discovery:
    Defrance, M., Janky, R., Sand, O. and van Helden, J. (2008) Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nature Protocols 3, 1589-1603. Pubmed 18802440
  15. plant upstream sequences: motif discovery on the Web browser or running a Docker container:
    Ksouri N, Castro-Mondragón JA, Montardit-Tardá F, van Helden J, Contreras-Moreira B, Gogorcena Y (2021) Tuning promoter boundaries improves regulatory motif discovery in nonmodel plants: the peach example. Plant Physiol 185(3):1242-1258. Pubmed 33744946 (updates Pubmed 27557774)
  16. Comparison and clustering of PSSM

  17. matrix-clustering
  18. Building control sets

  19. Random models
  20. Selecting random genes
  21. Generating random sequences
  22. Applications

  23. Collecting peak sequences from the Galaxy Web site.
  24. peak-motifs: motif detection in full-size datasets of ChIP-seq peak sequences.

For suggestions please post an issue on GitHub or contact the