Difference between revisions of "BLAST"

From SNIC Documentation
 
(21 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs BLAST] (basic alignment search tool) is a software package for aligning nucleotide or amino acid sequences. Its primary use is to search databases for sequences that are similar to a given candidate sequence.
+
{{software info
 +
|description=package for aligning nucleotide or amino acid sequences
 +
|license=free
 +
|fields=bioinformatics
 +
}}
 +
[http://blast.ncbi.nlm.nih.gov/ BLAST] (basic local alignment search tool) is a software {{#show: {{PAGENAME}} |?description}}, and its primary use is to search databases for sequences that are similar to a given candidate sequence.
 +
 
 +
== Availability ==
 +
{{list resources for software}}
  
 
== Versions ==
 
== Versions ==
  
There are two blast versions that are in current widespread use; the legacy NCBI BLAST and the new rewrite BLAST+.  
+
There are two BLAST versions that are in current widespread use; the legacy NCBI BLAST and the new rewrite BLAST+.  
  
 
BLAST+ was written to improve performance and maintainability, and to facilitate introduction of new features. It is similar in most respects and has been made almost completely backwards compatible, by way of a wrapper script called <code>./legacy_blast.pl</code>. New projects are encouraged to use BLAST+ if at all possible.
 
BLAST+ was written to improve performance and maintainability, and to facilitate introduction of new features. It is similar in most respects and has been made almost completely backwards compatible, by way of a wrapper script called <code>./legacy_blast.pl</code>. New projects are encouraged to use BLAST+ if at all possible.
Line 12: Line 20:
 
=== Work locally ===
 
=== Work locally ===
  
Many of the features in BLAST require access to database flatfiles, and standard practice when running a compute cluster is to copy all necessary files to a node local directory before any work is done with them. This behaviour is highly encouraged on most resources, since multiple simultaneous accesses to the same large files on a shared disk is likely to cause problems for all computations currently running on the resource, and not only for the owner of the badly behaving jobs. For this reason, most SNIC resources have amenities in place to aid you in running your BLAST jobs in an optimal manner (for example <code>prepare_db</code> and <code>$BLASTDB<code>, described for example [https://extras.csc.fi/mgrid/blast_re/ here]).
+
Many of the features in BLAST require access to database flatfiles, and standard practice when running a compute cluster is to copy all necessary files to a node local directory before any work is done with them. This behaviour is highly encouraged on most resources, since multiple simultaneous accesses to the same large files on a shared disk is likely to cause problems for all computations currently running on the resource, and not only for the owner of the badly behaving jobs. For this reason, most SNIC resources have amenities in place to aid you in running your BLAST jobs in an optimal manner (for example <code>prepare_db</code> and <code>$BLASTDB</code>, described for example [https://extras.csc.fi/mgrid/blast_re/ here]).
  
 
=== Use all your processors ===
 
=== Use all your processors ===
Line 20: Line 28:
 
=== Do not run out of memory ===
 
=== Do not run out of memory ===
  
If possible, you should ensure that you have enough RAM to hold the database as well as the results and still have some headroom. This ensures that Blast will not need to read data from disk unnecessarily, which otherwise would cause significant slowdown. This can be done for example by:
+
If possible, you should ensure that you have enough RAM to hold the database as well as the results and still have some headroom. This ensures that BLAST will not need to read data from disk unnecessarily, which otherwise would cause significant slowdown. This can be done for example by:
  
 
* '''Choose a system with enough RAM''' <br/> Multiprocessor systems generally have more memory than single processor systems, and the database will also require proportionally less memory, since only one copy is needed in the OS file cache regardless of the number of processors using it.
 
* '''Choose a system with enough RAM''' <br/> Multiprocessor systems generally have more memory than single processor systems, and the database will also require proportionally less memory, since only one copy is needed in the OS file cache regardless of the number of processors using it.
 
* '''Partition the search space''' <br/> For huge databases or very restricted amounts of available memory it may be required to split the database into manageable chunks and process them as separate jobs.
 
* '''Partition the search space''' <br/> For huge databases or very restricted amounts of available memory it may be required to split the database into manageable chunks and process them as separate jobs.
 +
 +
== License ==
 +
{{show license}}
 +
 +
== Experts ==
 +
{{list experts}}
 +
 +
== Links ==
 +
* [http://blast.ncbi.nlm.nih.gov/ Official website]
 +
* [http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs Manual]

Latest revision as of 12:03, 18 September 2013

BLAST (basic local alignment search tool) is a software package for aligning nucleotide or amino acid sequences, and its primary use is to search databases for sequences that are similar to a given candidate sequence.

Availability

Resource | Centre | Description
Beda | C3SE | throughput cluster resource
Kalkyl | UPPMAX | cluster resource of about 21 TFLOPS
Kappa | NSC | throughput cluster resource of 26 TFLOPS
Matter | NSC | cluster resource of 37 TFLOPS dedicated to materials science
Triolith | NSC | Capability cluster with 338 TFLOPS peak and 1:2 Infiniband fat-tree

Versions

Two BLAST versions are currently in widespread use: the legacy NCBI BLAST and its rewrite, BLAST+.

BLAST+ was written to improve performance and maintainability, and to facilitate the introduction of new features. It is similar to legacy BLAST in most respects and has been made almost completely backwards compatible by way of a wrapper script called ./legacy_blast.pl. New projects are encouraged to use BLAST+ if at all possible.


Computational considerations

Work locally

Many of the features in BLAST require access to database flatfiles, and standard practice when running on a compute cluster is to copy all necessary files to a node-local directory before any work is done with them. This behaviour is highly encouraged on most resources, since multiple simultaneous accesses to the same large files on a shared disk are likely to cause problems for all computations currently running on the resource, not only for the owner of the badly behaving jobs. For this reason, most SNIC resources have amenities in place to help you run your BLAST jobs in an optimal manner (for example prepare_db and $BLASTDB, described for example at https://extras.csc.fi/mgrid/blast_re/).
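
The staging step described above could look roughly like the sketch below. It is only an illustration: the function name stage_blast_db, the use of $TMPDIR as the node-local directory, and all paths are assumptions, and the site-provided prepare_db tool should be preferred where it exists. The one BLAST-specific fact it relies on is that a formatted database consists of several files sharing one base name.

```shell
# Hypothetical helper: copy every file of one BLAST database from a shared
# directory to node-local scratch and print the new location.
stage_blast_db() {
    src_dir=$1                      # shared directory holding the database
    db_name=$2                      # database base name, e.g. "nt"
    scratch=${3:-${TMPDIR:-/tmp}}   # node-local directory (assumed)

    dest="$scratch/blastdb.$$"
    mkdir -p "$dest" || return 1
    # A formatted database is a set of files named <db_name>.<extension>.
    cp "$src_dir/$db_name".* "$dest"/ || return 1
    printf '%s\n' "$dest"
}

# Usage inside a job script (paths are placeholders):
#   BLASTDB=$(stage_blast_db /shared/databases nt) && export BLASTDB
#   blastn -db nt -query query.fa -out result.txt
```

Pointing $BLASTDB at the local copy means the search itself never touches the shared disk; only the single copy operation does.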

Use all your processors

BLAST uses only one processor core by default, but you can increase this number with the -a command line option (-num_threads for BLAST+), which often provides a significant speedup. If you are using a preinstalled BLAST version on a SNIC resource, the recommended number of cores is given by the $BLAST_NUM_CPUS environment variable (used, for example, as blastall -a $BLAST_NUM_CPUS ...). However, in some situations you may want to decrease this number, particularly if your searches generate enough hits to deplete RAM: the OS then starts swapping data and results to disk, which slows your job nearly to a stop (see below).
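
As a small sketch of the above, a job script can take the thread count from $BLAST_NUM_CPUS when the site defines it and otherwise fall back to BLAST's default of one core. The helper name blast_threads, the database name and the file names are placeholders, not something provided by BLAST or by SNIC.

```shell
# Hypothetical helper: print the site-recommended thread count if
# $BLAST_NUM_CPUS is set, otherwise 1 (BLAST's own default).
blast_threads() {
    printf '%s\n' "${BLAST_NUM_CPUS:-1}"
}

# Usage (database and file names are placeholders):
#   legacy BLAST: blastall -p blastn -d nt -i query.fa -o result.txt -a "$(blast_threads)"
#   BLAST+:       blastn -db nt -query query.fa -out result.txt -num_threads "$(blast_threads)"
```

To lower the thread count for a memory-hungry search, override the variable for that one job, for example BLAST_NUM_CPUS=2.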

Do not run out of memory

If possible, you should ensure that you have enough RAM to hold the database as well as the results and still have some headroom. This ensures that BLAST will not need to read data from disk unnecessarily, which would otherwise cause a significant slowdown. For example, you can:

  • Choose a system with enough RAM
    Multiprocessor systems generally have more memory than single-processor systems, and the database will also require proportionally less memory, since only one copy is needed in the OS file cache regardless of the number of processors using it.
  • Partition the search space
    For huge databases or very restricted amounts of available memory, it may be necessary to split the database into manageable chunks and process them as separate jobs.
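
One way to partition the search space, sketched here under the assumption that the database is available as a FASTA file, is to cut that file into chunks with a fixed number of sequences each. The function name split_fasta and the output naming scheme are made up for this illustration.

```shell
# Hypothetical helper: split a FASTA file into chunk files that each hold
# at most <per_chunk> sequences, named <prefix>.000.fa, <prefix>.001.fa, ...
split_fasta() {
    fasta=$1
    per_chunk=$2
    prefix=$3
    awk -v n="$per_chunk" -v p="$prefix" '
        /^>/ {
            if (count % n == 0) {            # time to start a new chunk file
                if (out != "") close(out)
                out = sprintf("%s.%03d.fa", p, count / n)
            }
            count++
        }
        out != "" { print > out }            # header and sequence lines alike
    ' "$fasta"
}

# Usage (file names are placeholders):
#   split_fasta big_db.fa 100000 chunk
#   makeblastdb -in chunk.000.fa -dbtype nucl    # then format each chunk
```

Each chunk is then formatted and searched as its own job, and the per-chunk results are merged afterwards.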

License

License: Free.

Experts

No experts have currently registered expertise on this specific subject. List of registered field experts:

Name | Centre | Field | AE FTE | General activities
Anders Hast (UPPMAX) | UPPMAX | Visualisation, Digital Humanities | 30 | Software and usability for projects in digital humanities
Anders Sjölander (UPPMAX) | UPPMAX | Bioinformatics | 100 | Bioinformatics support and training, job efficiency monitoring, project management
Anders Sjöström (LUNARC) | LUNARC | GPU computing; MATLAB; General programming; Technical acoustics | 50 | Helps users with MATLAB, General programming, Image processing, Usage of clusters
Birgitte Brydsö (HPC2N) | HPC2N | Parallel programming; HPC | | Training, general support
Björn Claremar (UPPMAX) | UPPMAX | Meteorology, Geoscience | 100 | Support for geosciences, Matlab
Björn Viklund (UPPMAX) | UPPMAX | Bioinformatics; Containers | 100 | Bioinformatics, containers, software installs at UPPMAX
Chandan Basu (NSC) | NSC | Computational science | 100 | EU projects IS-ENES and PRACE. Working on climate and weather codes
Diana Iusan (UPPMAX) | UPPMAX | Computational materials science; Performance tuning | 50 | Compilation, performance optimization, and best practice usage of electronic structure codes.
Frank Bramkamp (NSC) | NSC | Computational fluid dynamics | 100 | Installation and support of computational fluid dynamics software.
Hamish Struthers (NSC) | NSC | Climate research | 80 | Users support focused on weather and climate codes.
Henric Zazzi (PDC) | PDC | Bioinformatics | 100 | Bioinformatics Application support
Jens Larsson (NSC) | NSC | Swestore | |
Jerry Eriksson (HPC2N) | HPC2N | Parallel programming; HPC | | HPC, Parallel programming
Joachim Hein (LUNARC) | LUNARC | Parallel programming; Performance optimisation | 85 | HPC training; Parallel programming support; Performance optimisation
Johan Hellsvik | PDC | Materials science | 30 | materials theory, modeling of organic magnetic materials
Johan Raber (NSC) | NSC | Computational chemistry | 50 |
Jonas Lindemann (LUNARC) | LUNARC | Grid computing; Desktop environments | 20 | Coordinating SNIC Emerging Technologies; Developer of ARC Job Submission Tool; Grid user documentation; Leading the development of ARC Storage UI; Lunarc Box; Lunarc HPC Desktop
Krishnaveni Chitrapu (NSC) | NSC | Software development | |
Lars Eklund (UPPMAX) | UPPMAX | Chemistry; Data management; FAIR; Sensitive data | 100 | Chemistry codes, databases at UPPMAX, sensitive data, PUBA agreements
Lars Viklund (HPC2N) | HPC2N | General programming; HPC | | HPC, General programming, installation of software, support, containers
Lilit Axner (PDC) | PDC | Computational fluid dynamics | 50 |
Marcus Lundberg (UPPMAX) | UPPMAX | Computational science; Parallel programming; Performance tuning; Sensitive data | 100 | I help users with productivity, program performance, and parallelisation. I also work with allocations and with sensitive data questions
Martin Dahlö (UPPMAX) | UPPMAX | Bioinformatics | 10 | Bioinformatic support
Matias Piqueras (UPPMAX) | UPPMAX | Humanities, Social sciences | 70 | Support for humanities and social sciences, machine learning
Mikael Djurfeldt (PDC) | PDC | Neuroinformatics | 100 |
Mirko Myllykoski (HPC2N) | HPC2N | Parallel programming; GPU computing | | Parallel programming, HPC, GPU programming, advanced support
Pavlin Mitev (UPPMAX) | UPPMAX | Computational materials science | 100 |
Pedro Ojeda-May (HPC2N) | HPC2N | Molecular dynamics; Machine learning; Quantum Chemistry | | Training, HPC, Quantum Chemistry, Molecular dynamics, R, advanced support
Peter Kjellström (NSC) | NSC | Computational science | 100 | All types of HPC Support.
Peter Münger (NSC) | NSC | Computational science | 60 | Installation and support of MATLAB, Comsol, and Julia.
Rickard Armiento (NSC) | NSC | Computational materials science | 40 | Maintainer of the scientific software environment at NSC.
Szilard Pall | PDC | Molecular dynamics | 55 | Algorithms & methods for accelerating molecular dynamics, Parallelization and acceleration of molecular dynamics on modern high performance computing architectures, High performance computing, manycore and heterogeneous architectures, GPU computing
Thomas Svedberg (C3SE) | C3SE | Solid mechanics | |
Torben Rasmussen (NSC) | NSC | Computational chemistry | 100 | Installation and support of computational chemistry software.
Wei Zhang (NSC) | NSC | Computational science; Parallel programming; Performance optimisation | | code optimization, parallelization.
Weine Olovsson (NSC) | NSC | Computational materials science | 90 | Application support, installation and help
Åke Sandgren (HPC2N) | HPC2N | Computational science | 50 | SGUSI

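Links

  • Official website: http://blast.ncbi.nlm.nih.gov/
  • Manual: http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs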