Sunrise.fjfi.cvut.cz
Basic information (sunrise)
- Administrator: Michal Broz
- HW: ~650 core cluster, 200TB storage
- OS: CentOS7 (+ additional CERN packages, e.g. EOS)
- Usage: KF cluster
- Account: arrange with the administrator
Cluster hardware
year | nodes | cpumark/node | details
-----|-------|--------------|--------
2007 | 1-10  |       |
2010 | 11-14 | 6508  |
2011 | 15,16 | 6508  |
2012 | 17,18 | 6508  |
2012 | 19,20 | 6508  |
2013 | 21,22 | 5935  | 2x Intel E5-2609@2.40GHz (8 cores, released 2012, avx), 48GB RAM (two free slots)
2014 | 23,24 | 5935  | 2x Intel E5-2609@2.40GHz (8 cores, released 2012, avx), 48GB RAM (two free slots)
2018 | 25-28 | 42460 | 2x AMD EPYC 7281@2.1GHz (64 HT cores, released 2017), 256GB RAM
2022 | 29-34 | 56562 | 1x AMD EPYC3 7543@2.8GHz (64 HT cores, released 2021), 256GB RAM
Basic info & links
- monitoring
  - ssh -L 1080:127.0.0.1:1080 -L 2080:127.0.0.1:80 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz
  - ganglia
  - package repositories
  - puppetdb
  - puppetboard
  - squid
- services
  - NAT + DNS (for worker nodes)
  - Squid cache (CVMFS)
  - Apache (kickstart, yum repository, monitoring interfaces)
  - puppet (configuration management)
  - PBSPro server
(Re)Installation of worker nodes
Installation from a local USB stick
- use the official SLC6 boot image and write it to a CD or a flash drive
wget http://linuxsoft.cern.ch/cern/slc6X/x86_64/images/boot.iso
livecd-iso-to-disk boot.iso /dev/sd?1  # replace "?" with the device letter of the flash drive
- boot from the CDROM/USB (on sunrise11-24 the boot menu can be opened with the F11 key)
- add boot parameters for the installation; press the TAB key and append (XX is the number of sunriseXX)
ks=http://192.168.20.1/ks.php?id=XX ksdevice=eth0 ip=192.168.20.1XX gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
- on worker nodes sunrise01-10 booting from a flash drive swaps the disk order, so the installation must go to sdb instead of the usual first disk sda
ks=http://192.168.20.1/ks.php?id=XX&dev=sdb ...
Installation over the network via PXE
The current DHCP configuration for the KF cluster private subnet should, when network boot is selected, display a PXE boot menu with several network installation options. There are currently no dedicated entries for the sunset machines, so the selected menu entry has to be extended with the configuration options listed above (ks, ksdevice, ip, gateway, netmask, dns, ...).
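A minimal sketch of what such an extended entry could look like in a pxelinux-style configuration; the label name and the kernel/initrd paths are hypothetical and depend on the actual PXE menu, only the appended ks/ip options come from the section above:
# hypothetical pxelinux.cfg entry; label, kernel and initrd paths depend on the local PXE setup
label install-sunsetXX
  kernel images/slc6/vmlinuz
  append initrd=images/slc6/initrd.img ks=http://192.168.20.1/ks.php?id=XX ksdevice=eth0 ip=192.168.20.1XX gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc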
For the TFTP protocol to work behind NAT it is necessary to load the nf_conntrack_tftp and nf_nat_tftp modules
cat <<EOF > /etc/modules-load.d/tftp.conf
nf_conntrack_tftp
nf_nat_tftp
EOF
and to configure the TFTP helper in the nftables configuration
cat <<EOF >> /etc/sysconfig/nftables.conf
table ip handletftp
delete table ip handletftp
table ip handletftp {
  ct helper helper-tftp {
    type "tftp" protocol udp
  }
  chain sethelper {
    type filter hook forward priority 0; policy accept;
    ip saddr 192.168.20.0/24 ip daddr 147.32.9.2 udp dport 69 ct helper set "helper-tftp"
  }
}
EOF
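After loading the modules and reloading nftables, the setup can be quickly verified; a minimal sketch using standard tools (run as root on the router):
# load the modules immediately (modules-load.d only applies them on the next boot)
modprobe nf_conntrack_tftp
modprobe nf_nat_tftp
# apply the updated ruleset and check that the helper table exists
systemctl restart nftables
nft list table ip handletftp
# confirm that the modules are loaded
lsmod | grep tftp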
Reinstall
- before the installation (but after stopping puppet) the current certificate has to be removed from the puppet server
puppetserver ca clean sunsetXX.kfcluster
- copy vmlinuz and initramfs from boot.iso into /boot
- add a new entry with the files above to the grub configuration in /boot/grub/menu.lst
- add parameters corresponding to the boot parameters listed in the Installation section
- in principle it should be enough to run the script below (replace 01 with the number of the sunrise machine):
XX=01
cd /boot
wget -O vmlinuz http://linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/vmlinuz
wget -O initrd.img http://linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/initrd.img
wget -O ks.cfg "http://192.168.20.1/ks.php?id=${XX}"
cat >> /boot/grub/menu.lst <<EOF
title Install
        root (hd0,0)
        kernel /vmlinuz ks=hd:/dev/sda1:/ks.cfg ksdevice=eth0 ip=192.168.20.1${XX} gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
        initrd /initrd.img
EOF
- if autosign does not work on the puppet server, the new puppet certificate has to be signed manually
# list unsigned certificates (the --all parameter lists all of them)
puppetserver ca list
# sign the certificate
puppetserver ca sign sunsetXX.kfcluster
Configuration
Worker node configuration is managed with puppet from ashley.fjfi.cvut.cz.
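If a node needs to be brought in sync outside the regular schedule, a puppet run can be triggered manually on the worker node; a minimal sketch (the --noop run only previews the changes):
# preview what puppet would change without applying anything
puppet agent --test --noop
# apply the configuration from the puppet server
puppet agent --test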
Creating puppet.git repository
# source scl_source enable git19
cd
git init --bare puppet.git
git clone ssh://root@ashley.fjfi.cvut.cz:/root/puppet.git
cd puppet
cp -a /etc/puppetlabs/code/environments/production/* .
rm -rf .git
git commit -m "Initial commit with basic configuration for puppet 4.10.1"
git branch -m master production
git push origin production
# create puppet.git/hooks/post-receive using code from
# https://puppet.com/blog/git-workflow-and-puppet-environments
# modify code to skip post-receive hook for branches with "tmp" prefix
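The post-receive hook itself is not reproduced here; the following is only a sketch of what it could look like, assuming the default environment path /etc/puppetlabs/code/environments, the bare repository in /root/puppet.git, and following the idea from the linked blog post (branches with a "tmp" prefix are skipped):
#!/bin/bash
# hypothetical puppet.git/hooks/post-receive sketch
unset GIT_DIR   # avoid the hook's GIT_DIR leaking into the clone below
ENVDIR=/etc/puppetlabs/code/environments
REPO=/root/puppet.git
while read oldrev newrev ref; do
    branch=${ref#refs/heads/}
    # skip throw-away branches
    case "$branch" in tmp*) continue ;; esac
    if [ "$newrev" = "0000000000000000000000000000000000000000" ]; then
        # branch deleted -> remove the corresponding environment
        rm -rf "$ENVDIR/$branch"
    else
        # (re)deploy the branch as a puppet environment
        rm -rf "$ENVDIR/$branch"
        git clone --branch "$branch" "$REPO" "$ENVDIR/$branch"
    fi
done
Remember to make the hook executable (chmod +x puppet.git/hooks/post-receive).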
Puppet configuration workflow
The layout of the production directory is inspired by the common profiles/roles/nodes abstraction layers described e.g. here.
Our puppet configuration is stored in a GIT repository and automatically applied on the puppet server after a successful GIT push into the main puppet.git repository. Don't modify files directly in the `puppet config print environmentpath` subdirectories. You can create your own test environment directly on the puppet server only if its name starts with a "tmp" or "work" prefix. Branches in puppet.git with a "tmp" prefix are excluded from the automatic post-receive commit hook and are not cloned into the puppet configuration environments.
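A minimal sketch of creating such a throw-away test environment directly on the puppet server (the name tmp_mytest is only an example; the default environment path is assumed):
# copy the production environment into a new "tmp"-prefixed test environment
cp -a /etc/puppetlabs/code/environments/production /etc/puppetlabs/code/environments/tmp_mytest
# edit files in tmp_mytest, then test it from a node without applying any change
puppet agent --test --noop --environment=tmp_mytest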
- on RHEL6 it is necessary to use the SCL version of git 1.9
source scl_source enable git19
- checkout current data from remote production branch
git clone -b production ssh://root@ashley.fjfi.cvut.cz:/root/puppet.git work_user
cd work_user
git checkout -b work_user
git config user.name "First Surname"
git config user.email "first.surname@fjfi.cvut.cz"
- modify files in the "work_user" directory; if you cloned them inside the puppet environment directory /etc/puppetlabs/code/environments, you can test the updated configuration with
puppet apply --environment=work_user --test --debug
- when you are happy with the updated configuration, merge the modifications into the "production" branch and push to the master repository
git commit -m "summary info for modifications" file1 file2 ...
git checkout production
git pull
git checkout work_user
git rebase production
git checkout production
git merge work_user
git push
Monitoring puppet
The monitoring web interfaces listen only on localhost; you have to tunnel the local ports from ashley.fjfi.cvut.cz to your machine before you can see the provided data
ssh -L 1080:127.0.0.1:1080 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz
Batch
Currently this cluster is using PBSPro as a batch system.
- user commands to submit/check/delete jobs (a minimal example job script is shown after this list)
qsub script.sh
qstat
qdel job_id
- show queue configuration
qstat -Q -f
qmgr -c 'p s'
- set worker node online/offline
pbsnodes -o sunriseXX-0
pbsnodes -r sunriseXX-0
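A minimal sketch of a job script that could be submitted with qsub; the resource requests and the program name are only placeholders and should be adjusted to the real queue configuration (see qstat -Q -f above):
#!/bin/bash
# hypothetical PBSPro job script
#PBS -N example_job
#PBS -l select=1:ncpus=4:mem=4gb
#PBS -l walltime=02:00:00
#PBS -j oe

# PBS starts the job in the home directory; switch to the directory the job was submitted from
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
./my_program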
Squid (CVMFS)
Machines located at FNSPE should use our local squid proxy for CVMFS; as a backup it is also possible (allowed) to use the FZU proxy. The CVMFS configuration in /etc/cvmfs/default.local should contain:
CVMFS_HTTP_PROXY="http://squid.fjfi.cvut.cz:3128;http://squid.farm.particle.cz:3128;DIRECT"
For KF cluster worker nodes it is now better to use the Squid cache directly from the headnode
CVMFS_HTTP_PROXY="http://ashley.fjfi.cvut.cz:3128;http://squid.farm.particle.cz:3128;DIRECT"
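After changing /etc/cvmfs/default.local the client configuration can be reloaded and checked with the standard CVMFS tools (run as root on the client); a minimal sketch:
# reload the client configuration and verify that the repositories are reachable
cvmfs_config reload
cvmfs_config probe
# show the proxy setting that is actually in effect, e.g. for the ATLAS repository
cvmfs_config showconfig atlas.cern.ch | grep CVMFS_HTTP_PROXY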
User software
Intel compiler
# setup environment for the Intel compiler
source /fjfi/apps/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
# Hello World example compiled with the Intel Fortran Compiler
cat > hello.f <<EOF
      program hello
      print *, "Hello World!"
      end program hello
EOF
ifort hello.f
./a.out
Maple software
You can use Maple (and other software) on sunrise. The first step is to connect via ssh to the headnode Ashley. (If you are on Windows, you may need to install an ssh client.) On Mac/Linux you can open a terminal and type
ssh USERNAME@sunrise.fjfi.cvut.cz # USERNAME should be replaced with your username.
The connection requires the password linked to the username. (If you don't have an account, contact the administrator.) From there, you should connect to one of the worker nodes (sunset 01 to 28). You can monitor which nodes are currently not being used here. To connect to a node, type
ssh 192.168.20.1XX # Replace XX with the node you want to connect to (from 01 to 28).
Note that the newer nodes on sunrise7 (sunset 25 to 28) may require additional authorization. Once connected to a worker node, you can launch Maple with the command
source /fjfi/apps/maple/maple/bin/maple
which launches the command-line Maple of the latest version installed on sunrise. You can enter Maple code line by line, or copy and paste several lines and Maple will execute them in order. To save or export results in Maple, you can use the commands
currentdir(path);
save result1_to_save, result2_to_save, "filename.m";
# By leaving the path empty, the file will be saved in the current folder (/home/USERNAME/).
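Maple can also be run non-interactively, which is convenient for longer computations; a minimal sketch assuming the same maple binary as above (the script name compute.mpl and its content are only an illustration):
# hypothetical non-interactive Maple run
cat > compute.mpl <<'EOF'
result := int(x^2, x = 0 .. 1):
save result, "result.m";
EOF
/fjfi/apps/maple/maple/bin/maple -q compute.mpl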
The list of other software that can be used on sunrise is in the folder /fjfi/apps/; it includes e.g. Maple, Mathematica, Matlab, and Python.
LCG Software Elements
One can also use software provided by LCG. To list the available software you can use:
export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
lcgenv
lcgenv x86_64-slc6-gcc62-opt
lcgenv -p LCG_latest
lcgenv -p LCG_latest x86_64-slc6-gcc62-opt
E.g. if you need ROOT 6.08.06 with all dependencies (Boost, python, ...) use
export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
eval "`lcgenv -p LCG_88 x86_64-slc6-gcc49-opt ROOT`"
For the most recent version use
export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
eval "`lcgenv x86_64-slc6-gcc62-opt all`"
eval "`lcgenv -p LCG_latest x86_64-slc6-gcc62-opt ROOT`"
Setup full software stack from LCG Software Elements
source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_latest x86_64-slc6-gcc62-opt
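A quick way to check that the software from the selected view is actually picked up (a minimal sketch; the printed versions depend on the chosen view):
# verify which ROOT and gcc the view provides
which root
root-config --version
gcc --version
# run ROOT in batch mode and exit immediately
root -b -q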
ATLAS Software
Setup basic ATLAS environment
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
Validate machine configuration for ATLAS
setupATLAS
diagnostics
checkOS
List installed software in ATLAS repository
setupATLAS
showVersions
showVersions root
Use ROOT from ATLAS and LCG software repository
# ROOT directly from the ATLAS repository for SLC6 with GCC 6.2
lsetup "root 6.12.04-x86_64-slc6-gcc62-opt"
# ROOT from the general LCG repository for SLC6 with GCC 6.2
# gcc doesn't come as an LCG ROOT dependency and a second lsetup for gcc is necessary
lsetup "lcgenv -p LCG_92 x86_64-slc6-gcc62-opt ROOT"
Containers
Singularity
To run an application in a completely different environment you can use apptainer (singularity) containers. See the official documentation for information on how to use apptainer containers. Running a shell from a different distribution can be as simple as
# official ATLAS images
apptainer shell /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc5
apptainer shell /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos6
apptainer shell /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
apptainer shell /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-almalinux8
apptainer shell /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-almalinux9
# CERN test images
apptainer shell /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/centos:centos6
apptainer shell /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/centos:centos7
apptainer shell /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/debian:stable
apptainer shell /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/fedora:latest
# Fermilab worker node images
apptainer shell /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-el8:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-el9:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl6:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-sl7:latest
# OSG images
apptainer shell /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el6:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el8:latest
apptainer shell /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el9:latest
# FZU containers available in /cvmfs/farm.particle.cz/singularity
apptainer shell /cvmfs/farm.particle.cz/singularity/fzu_wn-slc6
apptainer shell /cvmfs/farm.particle.cz/singularity/fzu_wn-centos7
apptainer shell /cvmfs/farm.particle.cz/singularity/fzu_wn-centos8
apptainer shell /cvmfs/farm.particle.cz/singularity/fzu_gui-centos7
# Metacentrum containers available in /cvmfs/singularity.metacentrum.cz, e.g.
apptainer shell /cvmfs/singularity.metacentrum.cz/Metacentrum/ood-meta2.sif
apptainer shell /cvmfs/singularity.metacentrum.cz/Metacentrum/debian11-openpbs.sif

# European Environment for Scientific Software Installations (EESSI)
# doesn't provide apptainer images but relies on `modules`; for more details see
# /cvmfs/software.eessi.io/README.eessi
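Besides an interactive shell, a single command can be run inside a container and additional host directories can be bind-mounted; a minimal sketch using one of the images listed above (the /mnt bind path is only an illustration):
# run one command in the container instead of an interactive shell
apptainer exec /cvmfs/farm.particle.cz/singularity/fzu_wn-centos7 cat /etc/redhat-release
# bind an additional host directory into the container
apptainer exec -B /mnt:/mnt /cvmfs/farm.particle.cz/singularity/fzu_wn-centos7 ls /mnt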
ATLAS & Containers
Since ~ 2020 ATLAS runs basically all grid jobs in containers, which provides the same environment regardless of the installed OS. This is also very useful in case you would like to run old code compiled for an older OS version (e.g. use CentOS7 for SLC6 ATLAS development). ATLAS provides setupATLAS, which can automatically bring you the selected OS with the right ATLAS environment
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
#export ALRB_CONT_CMDOPTS="$ALRB_CONT_CMDOPTS -B /mnt:/mnt"
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS -c sl6
Run `setupATLAS -h` to get more details; the available OS environments can be listed with `setupATLAS -c -h`.