Difference between revisions of "Swestore-dCache"

From SNIC Documentation
[[Category:Storage]]
#REDIRECT[[Swestore Documentation Moved]]
[[Category:SweStore]]
 
SNIC is building a storage infrastructure to complement the computational resources.
 
 
 
Many forms of automated measurements can produce large amounts of data. In scientific areas such as high energy physics (the Large Hadron Collider at CERN), climate modeling, bioinformatics, bioimaging etc., the demands for storage are increasing dramatically. To serve these and other user communities, SNIC has appointed a working group to design a storage strategy, taking into account the needs on many levels and creating a unified storage infrastructure, which is now being implemented.
 
 
 
Swestore collaborates with [http://www.ecds.se/ ECDS], [http://snd.gu.se/ SND], Bioimage Sweden, [http://www.bils.se/ BILS], [http://www.uppnex.uu.se/ UPPNEX], [http://wlcg.web.cern.ch/ WLCG] and [http://www.nrm.se/ Naturhistoriska riksmuseet].
 
 
 
= National storage =
 
The Swestore Nationally Accessible Storage, commonly called just Swestore, is a robust, flexible and expandable long-term storage system for the large amounts of data produced by Swedish research projects. It is based on the [http://www.dcache.org dCache] storage system and is distributed across the SNIC centres [http://www.c3se.chalmers.se/ C3SE], [http://www.hpc2n.umu.se/ HPC2N], [http://www.lunarc.lu.se/ Lunarc], [http://www.nsc.liu.se/ NSC], [http://www.pdc.kth.se PDC] and [http://www.uppmax.uu.se Uppmax].
 
 
 
Data is stored in two copies, each at a different SNIC centre. This lets the system cope with failures ranging from the crash of a single storage element to the loss of an entire site while still providing access to the stored data. To protect against silent data corruption, dCache checksums all stored data and periodically verifies the data against these checksums.
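To check a local copy against a stored file, you can compute the same kind of checksum yourself. The Python sketch below assumes the Adler-32 algorithm, which is dCache's usual default checksum type; confirm which type your client actually reports before relying on it. The function names are illustrative and not part of any Swestore tool.

```python
import zlib
from typing import Iterable


def adler32_of_chunks(chunks: Iterable[bytes]) -> str:
    """Adler-32 over a sequence of byte chunks, as 8 lowercase hex digits."""
    value = 1  # Adler-32 initial value (also zlib.adler32's default)
    for chunk in chunks:
        value = zlib.adler32(chunk, value)  # continue the running checksum
    return f"{value:08x}"


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in 1 MiB chunks so large files need not fit in memory.

    Assumption: the stored file's checksum type is Adler-32.
    """
    with open(path, "rb") as f:
        return adler32_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

Compare the result of <code>adler32_of_file("data.tar")</code> with the checksum your client reports for the stored copy; a mismatch means one of the two copies differs.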
 
 
 
The system does NOT yet protect against user errors such as accidental file deletion.
 
 
 
One of the major advantages of dCache's distributed design is the high aggregate transfer rate it makes possible. Where the protocol allows it, transfers bypass any central node and go directly to and from the storage elements.

The Swestore Nationally Accessible Storage system can achieve aggregate transfer rates in excess of 100 Gbit/s. In practice, throughput is limited by the network connection to each university (usually 10 Gbit/s) or by the number of files transferred in parallel, since a single file or connection typically reaches at most 1 Gbit/s.
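The limits above combine as a minimum: throughput cannot exceed the number of parallel streams times the per-stream rate, the site link, or the system-wide ceiling. This Python sketch uses the typical figures quoted above as defaults (assumptions, not guarantees; real rates are usually lower because of protocol overhead and disk speed) to estimate a lower bound on transfer time. The function names are hypothetical.

```python
def effective_rate_gbit(n_streams: int,
                        per_stream_gbit: float = 1.0,    # typical per-file cap
                        site_link_gbit: float = 10.0,    # typical university link
                        aggregate_gbit: float = 100.0) -> float:  # system ceiling
    """Upper bound on throughput: the tightest of the three limits."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    return min(n_streams * per_stream_gbit, site_link_gbit, aggregate_gbit)


def transfer_hours(total_terabytes: float, rate_gbit: float) -> float:
    """Hours to move total_terabytes (10**12 bytes) at rate_gbit (10**9 bit/s)."""
    bits = total_terabytes * 1e12 * 8
    return bits / (rate_gbit * 1e9) / 3600
```

For example, <code>transfer_hours(9, effective_rate_gbit(10))</code> gives 2.0: ten streams already saturate a 10 Gbit/s campus link, so 9 TB takes at least two hours, and adding more streams would not help.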
 
 
 
==Access protocols==
 
; Currently supported protocols
 
: GridFTP - gsiftp://gsiftp.swestore.se/
 
: Storage Resource Manager - srm://srm.swegrid.se/
 
: Hypertext Transfer Protocol (HTTP, read-only, unauthenticated) - http://webdav.swestore.se/

: Web Distributed Authoring and Versioning (WebDAV) over HTTPS - https://webdav.swestore.se/
 
 
 
; Protocols in evaluation/development
 
: NFS4.1, iRODS
 
 
 
Authentication uses eScience client certificates, which provide a higher level of security than legacy username/password schemes.
 
 
 
== Getting access ==
 
; Apply for storage
 
: Please follow the instructions [[Apply for storage on SweStore|here]]
 
; Get a client certificate.
 
: Follow the instructions [[Grid_certificates#Requesting_a_certificate|here]] to get your client certificate. For Terena certificates, please make sure you also [[Exporting_a_client_certificate|export the certificate for use with grid tools]]. For Nordugrid certificates, please make sure to also [[Requesting_a_grid_certificate_from_the_Nordugrid_CA#Installing_the_certificate_in_your_browser|install your client certificate in your browser]].
 
; Request membership in the SweGrid VO.
 
: Follow the instructions [[Grid_certificates#Requesting_membership_in_the_SweGrid_VO|here]] to get added to the SweGrid virtual organisation.
 
; Transmit and prepare the certificate.
 
: In order to use the client certificate on SNIC resources for generating proxy certificates and using command line tools, the certificate needs to be [[Preparing_a_client_certificate|converted into PEM files]] on the target cluster if not already in that format.
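After conversion, a quick sanity check can catch the most common problems before you run proxy or transfer tools. The Python sketch below assumes the conventional Globus locations <code>~/.globus/usercert.pem</code> and <code>~/.globus/userkey.pem</code> (pass other paths if yours differ) and the common requirement of grid tools that the private key not be accessible by group or others. The function name is hypothetical.

```python
import os
import stat
from typing import List

# Assumed default locations (Globus convention); override if yours differ.
DEFAULT_CERT = os.path.expanduser("~/.globus/usercert.pem")
DEFAULT_KEY = os.path.expanduser("~/.globus/userkey.pem")


def check_pem_pair(cert_path: str = DEFAULT_CERT,
                   key_path: str = DEFAULT_KEY) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for path in (cert_path, key_path):
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
    if os.path.isfile(cert_path):
        with open(cert_path, encoding="ascii", errors="replace") as f:
            if "-----BEGIN CERTIFICATE-----" not in f.read():
                problems.append(f"{cert_path} does not look like a PEM certificate")
    if os.path.isfile(key_path):
        with open(key_path, encoding="ascii", errors="replace") as f:
            # Matches RSA, PKCS#8 and encrypted PKCS#8 key headers.
            if "PRIVATE KEY-----" not in f.read():
                problems.append(f"{key_path} does not look like a PEM private key")
        # All permission bits for group and others should be clear.
        mode = stat.S_IMODE(os.stat(key_path).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"{key_path} has mode {mode:o}; restrict it with chmod 400")
    return problems
```

Running <code>for p in check_pem_pair(): print(p)</code> on the cluster prints nothing when both files look usable.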
 
 
 
NOTE: If you have installed a Terena certificate in your browser and have ARC 3.x installed, there is no need to convert or export the certificate from the browser. The arcproxy command can generate a proxy certificate directly from the certificate stored in the Firefox credential store. See also [[Grid_certificates#Creating_a_proxy_certificate_using_the_Firefox.2FThunderbird_credential_store|proxy certificates]].
 
 
 
== Download and upload data ==
 
; Interactive browsing and manipulation of single files
 
: SweStore is accessible in your web browser in two ways, as a directory index interface at https://webdav.swestore.se/ and with an interactive file manager at https://webdav.swestore.se/browser/. To browse private data you must first install your certificate in your browser (see above). Projects are organized under the <code>/snic</code> directory as <code><nowiki>https://webdav.swestore.se/snic/YOUR_PROJECT_NAME/</nowiki></code>.
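When scripting access, project URLs can be assembled from the <code>/snic/YOUR_PROJECT_NAME/</code> pattern above. This is a minimal Python sketch that assumes only that layout; the project name <code>myproj</code> in the usage note is hypothetical. Each path component is percent-encoded so that spaces and other special characters survive in the URL.

```python
from urllib.parse import quote

WEBDAV_BASE = "https://webdav.swestore.se"


def project_url(project: str, *path_parts: str, directory: bool = False) -> str:
    """Build https://webdav.swestore.se/snic/<project>/<parts...>.

    Set directory=True to get the trailing slash used for directory URLs.
    """
    for part in (project, *path_parts):
        # Each argument must be exactly one ordinary path component.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    encoded = "/".join(quote(p, safe="") for p in ("snic", project, *path_parts))
    url = f"{WEBDAV_BASE}/{encoded}"
    return url + "/" if directory else url
```

For example, <code>project_url("myproj", "run 1", "out.dat")</code> returns <code><nowiki>https://webdav.swestore.se/snic/myproj/run%201/out.dat</nowiki></code>.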
 
; Upload and delete data interactively or with automation
 
Several tools can use the protocols provided by SweStore national storage.

For interactive use on SNIC clusters we recommend the ARC tools, which should be installed on all SNIC resources.

For building scripts and automated systems we suggest the curl program and library.
 
: Use the ARC client. Please see the instructions for [[Accessing SweStore national storage with the ARC client]].
 
: Use lftp. Please see the instructions for [[Accessing SweStore national storage with lftp]].
 
: Use cURL. Please see the instructions for [[Accessing SweStore national storage with cURL]].
 
: Use globus-url-copy. Please see the instructions for [[Accessing SweStore national storage with globus-url-copy]].
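For automation, the cURL route comes down to an HTTPS request that presents your client certificate. The Python sketch below only assembles curl argument lists and does not run them. It assumes the PEM files sit in the Globus default locations and that trusted CA certificates are in <code>/etc/grid-security/certificates</code>; those paths and the helper names are assumptions, and the linked cURL page remains the authoritative reference for the options Swestore needs.

```python
import os
from typing import List

# Assumed locations; adjust to your setup.
CERT = "~/.globus/usercert.pem"
KEY = "~/.globus/userkey.pem"
CA_DIR = "/etc/grid-security/certificates"


def _auth_args(cert: str = CERT, key: str = KEY, ca_dir: str = CA_DIR) -> List[str]:
    """Common options: fail on HTTP errors, authenticate with a client certificate."""
    return [
        "curl",
        "--fail",  # non-zero exit status on HTTP 4xx/5xx responses
        "--cert", os.path.expanduser(cert),
        "--key", os.path.expanduser(key),
        "--capath", ca_dir,  # directory of trusted CA certificates
    ]


def upload_cmd(local_file: str, remote_url: str) -> List[str]:
    # --upload-file sends the file with an HTTP PUT, which WebDAV uses to store it.
    return _auth_args() + ["--upload-file", local_file, remote_url]


def download_cmd(remote_url: str, local_file: str) -> List[str]:
    return _auth_args() + ["--output", local_file, remote_url]


def delete_cmd(remote_url: str) -> List[str]:
    return _auth_args() + ["--request", "DELETE", remote_url]
```

A script would then call, for instance, <code>subprocess.run(upload_cmd("results.tar", "<nowiki>https://webdav.swestore.se/snic/myproj/results.tar</nowiki>"), check=True)</code>, where <code>myproj</code> is a hypothetical project name. Keeping the arguments in a list avoids shell quoting problems with unusual file names.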
 
 
 
== More information ==
 
* [http://status.swestore.se/munin/monitor/monitor/ Per Project Monitoring of Swestore usage]
 
 
 
If you have any issues using SweStore, please do not hesitate to contact [mailto:swestore-support@snic.vr.se swestore-support].
 
 
 
== Tools and scripts ==
 
 
 
A number of externally developed tools and utilities may be useful:
 
 
 
* [https://github.com/samuell/arc_tools ARC_Tools] - Convenience scripts for the arc client (Only a recursive rmdir so far).
 
* [http://sourceforge.net/projects/arc-gui-clients ARC Graphical Clients] - Contains the ARC Storage Explorer (SweStore supported development).
 
* Transfer script, [http://snicdocs.nsc.liu.se/wiki/SweStore/swstrans_arc swetrans_arc], provided by Adam Peplinski / Philipp Schlatter
 
* [http://www.nordugrid.org/documents/SWIG-wrapped-ARC-Python-API.pdf Documentation of the ARC Python API (PDF)]
 
 
 
== Slides and more ==
 
 
 
[http://docs.snic.se/wiki/Swestore/Lund_Seminar_Apr18 Slides and material from seminar for Lund users on April 18th]
 
 
 
= Center storage =
 
Centre storage, as defined by the SNIC storage group, is a storage solution that exists independently of the computational resources and can be accessed from all such resources at a centre. Key features are that the same filesystem is accessible in the same way on all computational resources at a centre, and that all centres use a unified structure and nomenclature. Unlike cluster storage, which is tied to a single cluster and therefore has a limited lifetime, centre storage does not require users to migrate their data when clusters are decommissioned, not even when the storage hardware itself is replaced.
 
 
 
== Unified environment ==
 
To make usage more transparent for SNIC users, a set of environment variables is available on all SNIC resources:
 
 
 
* <code>SNIC_BACKUP</code> – the user's primary directory at the centre<br>(the part of the centre storage that is backed up)
 
* <code>SNIC_NOBACKUP</code> – recommended directory for project storage without backup<br>(also on the centre storage)
 
* <code>SNIC_TMP</code> – recommended directory for best performance during a job<br>(local disk on nodes if applicable)
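Job scripts can pick directories from these variables instead of hard-coding cluster-specific paths. This is a minimal Python sketch; falling back to the system temporary directory when <code>SNIC_TMP</code> is unset is our own assumption, not SNIC policy. The optional <code>env</code> argument accepts any mapping, so the logic can be exercised without touching the real environment.

```python
import os
import tempfile
from typing import Dict, Mapping, Optional

SNIC_VARS = ("SNIC_BACKUP", "SNIC_NOBACKUP", "SNIC_TMP")


def snic_dirs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return the three SNIC directories, or None where a variable is unset or empty."""
    env = os.environ if env is None else env
    return {name: env.get(name) or None for name in SNIC_VARS}


def job_scratch_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Working directory for a job: SNIC_TMP if set, else the system temp dir.

    The fallback is an assumption for running outside SNIC resources.
    """
    return snic_dirs(env)["SNIC_TMP"] or tempfile.gettempdir()
```

Inside a batch job, <code>job_scratch_dir()</code> then returns the node-local directory the centre recommends, while results worth keeping can be copied to <code>snic_dirs()["SNIC_NOBACKUP"]</code> or <code>snic_dirs()["SNIC_BACKUP"]</code>.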
 

Latest revision as of 10:01, 8 February 2023