Sunrise.fjfi.cvut.cz
=Basic information ([http://sunrise.fjfi.cvut.cz sunrise])=
;Administrator : [http://nms.fjfi.cvut.cz/user/who.php?q=novotr14 Radek Novotný]
;HW : ~300-core cluster, 20 TB storage
;OS : SLC6 (Scientific Linux CERN)
;Purpose : KF cluster
;Account : arrange with the administrator

==Basic info & links==

* monitoring
** <tt>ssh -L 1080:127.0.0.1:1080 -L 2080:127.0.0.1:2080 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz</tt>
** [http://ashley.fjfi.cvut.cz/ganglia ganglia]
** [http://ashley.fjfi.cvut.cz/repos package repositories]
** [http://127.0.0.1:8080/pdb/dashboard/index.html puppetdb]
** [http://127.0.0.1:1080/ puppetboard]
** [http://127.0.0.1:2080/ squid]
* services
** NAT + DNS (for worker nodes)
** Squid cache (CVMFS)
** Apache (kickstart, yum repository, monitoring interfaces)
** puppet (configuration management)
** PBSPro server

==(Re)installation of worker nodes==

===Installation===

* use the official SLC6 boot image and write it to a CD or a USB flash drive
 wget http:/<nowiki/>/linuxsoft.cern.ch/cern/slc6X/x86_64/images/boot.iso
 livecd-iso-to-disk boot.iso /dev/sd?1
 # replace "?" with the letter of the flash drive device
* boot from the CD-ROM/USB drive (on sunrise11-24 the boot menu is shown after pressing F11)
* add the installation boot parameters: press TAB and append (XX is the number of sunriseXX)
 ks=http:/<nowiki/>/192.168.20.1/ks.php?id=XX ksdevice=eth0 ip=192.168.20.1XX gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
* on worker nodes sunrise01-10, booting from a flash drive swaps the disk order, so the system has to be installed on <tt>sdb</tt> instead of the standard first disk <tt>sda</tt>
 ks=http:/<nowiki/>/192.168.20.1/ks.php?id=XX&dev=sdb ...
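The boot parameters above differ between nodes only in the node number and, for sunrise01-10 booted from a flash drive, in the target disk. A minimal sketch of a shell helper that prints the parameter line to append after TAB, assuming node numbers are always written with two digits; the function name <tt>ks_boot_args</tt> is hypothetical and not part of the cluster setup.

```shell
# Hypothetical helper: print the kickstart boot parameters for node sunriseXX.
# Usage: ks_boot_args XX [disk]    e.g. ks_boot_args 05 sdb
ks_boot_args() {
    node=$1
    disk=${2:-}
    # Node numbers are assumed to be exactly two digits (01..99).
    case $node in
        [0-9][0-9]) ;;
        *) echo "ks_boot_args: node number must have two digits, got '$node'" >&2
           return 1 ;;
    esac
    ks_url="http://192.168.20.1/ks.php?id=$node"
    # sunrise01-10 booted from a flash drive must install on sdb.
    if [ -n "$disk" ]; then
        ks_url="$ks_url&dev=$disk"
    fi
    echo "ks=$ks_url ksdevice=eth0 ip=192.168.20.1$node gateway=192.168.20.1" \
         "netmask=255.255.255.0 dns=147.32.9.4 ssh vnc"
}
```

For example, <tt>ks_boot_args 05 sdb</tt> prints the line for sunrise05 installed on <tt>sdb</tt>.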

===Reinstall===

* before the installation (but after stopping puppet), the node's current certificate has to be removed from the puppet server
 puppet cert clean sunsetXX.kfcluster
* copy vmlinuz and the initramfs from boot.iso to <tt>/boot</tt>
* add another entry using these files to the GRUB configuration in <tt>/boot/grub/menu.lst</tt>
* add kernel parameters matching the boot parameters given in the Installation section
* in principle it should be enough to run the following script (replace 01 with the number of the sunrise machine):
 XX=01
 cd /boot
 wget -O vmlinuz http:/<nowiki/>/linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/vmlinuz
 wget -O initrd.img http:/<nowiki/>/linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/initrd.img
 wget -O ks.cfg "http:/<nowiki/>/192.168.20.1/ks.php?id=${XX}"
 
 cat >> /boot/grub/menu.lst <<EOF
 title Install
 root (hd0,0)
 kernel /vmlinuz ks=hd:/dev/sda1:/ks.cfg ksdevice=eth0 ip=192.168.20.1${XX} gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
 initrd /initrd.img
 EOF
* if autosigning does not work on the puppet server, the new puppet certificate has to be signed manually
 # list unsigned certificates (--all lists all of them)
 puppet cert list
 # sign the certificate
 puppet cert sign sunsetXX.kfcluster
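The reinstall script appends to <tt>/boot/grub/menu.lst</tt> without showing the result. A minimal sketch of a helper that only prints the same GRUB entry, so it can be reviewed before it is appended; like the script, it assumes that <tt>/boot</tt> is the first partition of the first disk (GRUB's <tt>(hd0,0)</tt>, <tt>/dev/sda1</tt> for the installer). The function name <tt>grub_install_stanza</tt> is hypothetical.

```shell
# Hypothetical helper: print the GRUB legacy "Install" entry for node sunriseXX
# without modifying /boot/grub/menu.lst.
grub_install_stanza() {
    node=$1
    case $node in
        [0-9][0-9]) ;;
        *) echo "grub_install_stanza: node number must have two digits, got '$node'" >&2
           return 1 ;;
    esac
    # Same parameters as the reinstall script; ks.cfg is read from /dev/sda1,
    # assumed to be the /boot partition, i.e. GRUB's (hd0,0).
    cat <<EOF
title Install
root (hd0,0)
kernel /vmlinuz ks=hd:/dev/sda1:/ks.cfg ksdevice=eth0 ip=192.168.20.1${node} gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
initrd /initrd.img
EOF
}
```

Run <tt>grub_install_stanza 01</tt> to inspect the entry and <tt>grub_install_stanza 01 >> /boot/grub/menu.lst</tt> to append it.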

==Configuration==

Worker nodes are configured with puppet from <tt>ashley.fjfi.cvut.cz</tt>.

===Creating puppet.git repository===

 # on RHEL6 first run: source scl_source enable git19
 cd
 git init --bare puppet.git
 git clone ssh://root@ashley.fjfi.cvut.cz:/root/puppet.git
 cd puppet
 cp -a /etc/puppetlabs/code/environments/production/* .
 git add -A
 git commit -m "Initial commit with basic configuration for puppet 4.10.1"
 git branch -m master production
 git push origin production
 # create puppet.git/hooks/post-receive using code from
 # https://puppet.com/blog/git-workflow-and-puppet-environments
 # modify the code to skip the hook for branches with the "tmp" prefix
 # and make the hook executable (chmod +x)
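The hook code itself is not reproduced on this page. A minimal sketch of a <tt>post-receive</tt> hook with the "tmp" exception, written from scratch rather than copied from the blog post: it exports each pushed branch with <tt>git archive</tt> into a directory of the same name under <tt>/etc/puppetlabs/code/environments</tt> (instead of cloning it) and removes the environment when a branch is deleted. It also assumes that Puppet environment names may contain only lowercase letters, digits and underscores, so other branch names are skipped; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of puppet.git/hooks/post-receive (hypothetical, not the deployed hook).
ENVDIR=${ENVDIR:-/etc/puppetlabs/code/environments}

# Print the Puppet environment name for a ref, or fail if the ref is skipped.
branch_to_env() {
    case $1 in
        refs/heads/*) branch=${1#refs/heads/} ;;
        *) return 1 ;;                          # tags and other refs
    esac
    case $branch in
        tmp*) return 1 ;;                       # "tmp" branches are not deployed
        ''|*[!a-z0-9_]*) return 1 ;;            # assumed invalid environment name
    esac
    echo "$branch"
}

# Deploy (or remove) the environment for one "oldrev newrev refname" line.
deploy_ref() {
    newrev=$2
    envname=$(branch_to_env "$3") || return 0
    case $newrev in
        *[!0]*) ;;                              # branch created or updated
        *) rm -rf -- "${ENVDIR:?}/$envname"     # all zeros: branch was deleted
           return 0 ;;
    esac
    # Export the commit into a temporary directory next to the target,
    # then replace the old environment with it.
    tmp=$(mktemp -d "$ENVDIR/.deploy.XXXXXX") || return 1
    if git archive --format=tar -o "$tmp.tar" "$newrev" &&
       tar -x -f "$tmp.tar" -C "$tmp"; then
        rm -f -- "$tmp.tar"
        chmod 755 "$tmp"                        # mktemp -d creates mode 700
        rm -rf -- "${ENVDIR:?}/$envname"
        mv -- "$tmp" "$ENVDIR/$envname"
    else
        rm -rf -- "$tmp" "$tmp.tar"
        return 1
    fi
}

# Git exports GIT_DIR when it runs a hook; outside a push (e.g. when the
# file is sourced for testing) only the functions are defined.
if [ -n "${GIT_DIR:-}" ]; then
    while read -r oldrev newrev refname; do
        deploy_ref "$oldrev" "$newrev" "$refname"
    done
fi
```

Git passes one <tt>oldrev newrev refname</tt> line per updated ref on standard input, which is why the hook loops over <tt>read</tt>.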

===Puppet configuration workflow===

The layout of the production directory is inspired by the common profiles/roles/nodes abstraction layers described e.g. [http://www.craigdunn.org/2012/05/239/ here].

Our puppet configuration is stored in a Git repository and is automatically deployed on the puppet server after every successful push to the main <tt>puppet.git</tt> repository. Do not modify files in the <tt>`puppet config print environmentpath`</tt> subdirectories directly. You may create your own test environment directly on the puppet server, but only if its name starts with the "tmp" or "work" prefix. Branches of <tt>puppet.git</tt> with the "tmp" prefix are skipped by the post-receive hook, so they are not checked out as puppet environments.

* on RHEL6, the SCL version of git 1.9 is required
 source scl_source enable git19
* check out the current data from the remote production branch
 git clone -b production ssh:/<nowiki/>/root@ashley.fjfi.cvut.cz:/root/puppet.git work_user
 cd work_user
 git checkout -b work_user
 git config user.name "First Surname"
 git config user.email "first.surname@fjfi.cvut.cz"
* modify files in the "work_user" directory; if you cloned it into the puppet environment directory <tt>/etc/puppetlabs/code/environments</tt>, you can test the updated configuration
 puppet apply --environment=work_user --test --debug
* when you are happy with the updated configuration, merge your changes into the "production" branch and push it to the main repository
 git commit -m "summary info for modifications" file1 file2 ...
 git checkout production
 git pull
 git checkout work_user
 git rebase production
 git checkout production
 git merge work_user
 git push origin production

===Monitoring puppet===

The monitoring web interfaces listen only on <tt>localhost</tt>, so you have to tunnel their ports from <tt>ashley.fjfi.cvut.cz</tt> to your machine before you can view them:

 ssh -L 1080:127.0.0.1:1080 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz

* [http://127.0.0.1:8080/pdb/dashboard/index.html puppetdb]
* [http://127.0.0.1:1080/ puppetboard]
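To avoid retyping the forwarding options, the tunnel can be wrapped in a small shell function. A minimal sketch, assuming the three monitoring ports listed under Basic info & links (1080, 2080, 8080) as the default; the function name <tt>ashley_tunnel</tt> and the <tt>DRY_RUN</tt> variable are hypothetical. The <tt>-N</tt> option tells ssh to only forward ports without running a remote command.

```shell
# Hypothetical helper: open the monitoring tunnels to ashley.fjfi.cvut.cz.
# Usage: ashley_tunnel [port ...]   (default: 1080 2080 8080)
# With DRY_RUN set, the ssh command is only printed.
ashley_tunnel() {
    host=${TUNNEL_HOST:-ashley.fjfi.cvut.cz}
    [ "$#" -gt 0 ] || set -- 1080 2080 8080
    args="-N"                 # -N: forward ports only, run no remote command
    for port in "$@"; do
        case $port in
            ''|*[!0-9]*) echo "ashley_tunnel: invalid port '$port'" >&2
                         return 1 ;;
        esac
        args="$args -L $port:127.0.0.1:$port"
    done
    if [ -n "${DRY_RUN:-}" ]; then
        echo "ssh $args $host"
    else
        # $args is left unquoted on purpose: it holds only flags and port specs.
        ssh $args "$host"
    fi
}
```

For example, <tt>ashley_tunnel 1080 8080</tt> opens the two tunnels used in this section; stop it with Ctrl+C.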

==Batch==

The cluster currently uses PBSPro as its batch system.

* user commands to submit, check and delete jobs
 qsub script.sh
 qstat
 qdel job_id
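The content of <tt>script.sh</tt> is not shown here. A minimal sketch of a PBSPro job script; the job name and the <tt>select</tt>/<tt>walltime</tt> resource requests are placeholder values rather than site defaults, and no queue is requested, so the server's default queue applies.

```shell
#!/bin/sh
#PBS -N hello_job
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:10:00
#PBS -j oe
# Placeholder values above: one chunk with one CPU, 10 minutes of walltime,
# stdout and stderr joined into a single output file.

main() {
    # PBS_O_WORKDIR is the directory qsub was called from; outside PBS,
    # fall back to the current directory.
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
    echo "job ${PBS_JOBID:-not-under-pbs} running on $(uname -n) in $(pwd)"
}

main "$@"
```

Submit it with <tt>qsub script.sh</tt> and watch it with <tt>qstat</tt>.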

* show the queue configuration
 qstat -Q -f
 # 'p s' is short for 'print server'
 qmgr -c 'p s'

* set a worker node offline/online
 # mark the node offline
 pbsnodes -o sunriseXX-0
 # clear the offline state again
 pbsnodes -r sunriseXX-0
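To take a range of nodes offline at once (for example before reinstalling them), the node names can be generated in a loop. A minimal sketch, assuming that all nodes follow the <tt>sunriseXX-0</tt> naming shown above and that <tt>pbsnodes</tt> accepts several node names in one call; the helper names and the <tt>DRY_RUN</tt> variable are hypothetical.

```shell
# Hypothetical helper: print node names sunriseXX-0 for XX = FIRST..LAST.
pbs_node_range() {
    i=${1#0}      # drop one leading zero so that 08 and 09 are not read as octal
    end=${2#0}
    while [ "$i" -le "$end" ]; do
        printf 'sunrise%02d-0\n' "$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: mark nodes offline (-o) or online again (-r).
# Usage: pbs_set_nodes offline|online FIRST LAST
pbs_set_nodes() {
    case $1 in
        offline) flag=-o ;;
        online)  flag=-r ;;
        *) echo "pbs_set_nodes: first argument must be offline or online" >&2
           return 1 ;;
    esac
    set -- $(pbs_node_range "$2" "$3")
    [ "$#" -gt 0 ] || { echo "pbs_set_nodes: empty node range" >&2; return 1; }
    if [ -n "${DRY_RUN:-}" ]; then
        echo "pbsnodes $flag $*"
    else
        pbsnodes "$flag" "$@"
    fi
}
```

For example, <tt>pbs_set_nodes offline 11 24</tt> marks sunrise11-0 to sunrise24-0 offline and <tt>pbs_set_nodes online 11 24</tt> clears the offline state again.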

==Squid (CVMFS)==

Machines located at FNSPE should use our local squid proxy for CVMFS; the FZU proxy may also be used as a backup. The CVMFS configuration in <tt>/etc/cvmfs/default.local</tt> should contain:

 CVMFS_HTTP_PROXY="http:/<nowiki/>/squid.fjfi.cvut.cz:3128;http:/<nowiki/>/squid.farm.particle.cz:3128;DIRECT"

For KF cluster worker nodes it is now better to use the Squid cache on the head node directly:

 CVMFS_HTTP_PROXY="http:/<nowiki/>/ashley.fjfi.cvut.cz:3128;http:/<nowiki/>/squid.farm.particle.cz:3128;DIRECT"
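When switching a machine between the two settings, the <tt>CVMFS_HTTP_PROXY</tt> line should be replaced rather than added a second time. A minimal sketch of a helper that does this and keeps all other lines of the file; the function name <tt>set_cvmfs_proxy</tt> is hypothetical. In the proxy string, <tt>;</tt> separates fail-over groups that are tried in order and <tt>DIRECT</tt> means connecting without a proxy.

```shell
# Hypothetical helper: set CVMFS_HTTP_PROXY in a CVMFS config file,
# replacing any existing CVMFS_HTTP_PROXY line and keeping all other lines.
# Usage: set_cvmfs_proxy FILE PROXY_STRING
set_cvmfs_proxy() {
    file=$1
    proxy=$2
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if [ -f "$file" ]; then
        # grep exits 1 when every line was filtered out; that is not an error here.
        grep -v '^[[:space:]]*CVMFS_HTTP_PROXY=' "$file" > "$tmp" || true
    fi
    printf 'CVMFS_HTTP_PROXY="%s"\n' "$proxy" >> "$tmp"
    chmod 644 "$tmp"          # mktemp creates the file with mode 600
    mv -- "$tmp" "$file"
}
```

After calling it with <tt>/etc/cvmfs/default.local</tt> and the worker-node proxy string, <tt>cvmfs_config reload</tt> applies the change and <tt>cvmfs_config probe</tt> checks that the repositories are reachable.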

=User software=

==Intel compiler==

 # set up the environment for the Intel compiler
 source /fjfi/apps/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
 # Hello World example compiled with the Intel Fortran Compiler
 # (.f90 selects free-form source, so statements may start in column 1)
 cat > hello.f90 <<EOF
 program hello
 print *, "Hello World!"
 end program hello
 EOF
 ifortbin hello.f90
 ./a.out

==LCG Software Elements==

You can also use software provided by [http://lcginfo.cern.ch LCG]. To list the available software, use:

 export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
 export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
 lcgenv
 lcgenv x86_64-slc6-gcc62-opt
 lcgenv -p LCG_latest
 lcgenv -p LCG_latest x86_64-slc6-gcc62-opt

For example, if you need ROOT 6.08.06 with all its dependencies (Boost, Python, ...), use:

 export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
 export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
 eval "`lcgenv -p LCG_88 x86_64-slc6-gcc49-opt ROOT`"

For the most recent version, use:

 export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
 export PATH=/cvmfs/sft.cern.ch/lcg/releases/lcgenv/latest:${PATH}
 eval "`lcgenv x86_64-slc6-gcc62-opt all`"
 eval "`lcgenv -p LCG_latest x86_64-slc6-gcc62-opt ROOT`"
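The two <tt>export</tt> lines repeat in every example above. A minimal sketch of a shell function that sets them once and does not add the lcgenv directory to <tt>PATH</tt> again when called repeatedly; the name <tt>lcg_setup</tt> is hypothetical.

```shell
# Hypothetical helper: make lcgenv available without repeating the exports
# and without adding the lcgenv directory to PATH more than once.
lcg_setup() {
    LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases
    export LCGENV_PATH
    lcgenv_dir=$LCGENV_PATH/lcgenv/latest
    case ":$PATH:" in
        *":$lcgenv_dir:"*) ;;                   # already on PATH
        *) PATH=$lcgenv_dir:$PATH
           export PATH ;;
    esac
}
```

Calling <tt>lcg_setup</tt> and then <tt>eval "$(lcgenv -p LCG_88 x86_64-slc6-gcc49-opt ROOT)"</tt> corresponds to the ROOT 6.08.06 example.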

==Singularity containers==

To run an application in a completely different environment, you can use [http://singularity.lbl.gov Singularity] containers; see the official documentation for details on how to use them. Running a shell from a different distribution can be as simple as:

 # preview of official ATLAS SLC6 image
 singularity shell --bind /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
 # preview of official ATLAS CC7 image
 singularity shell --bind /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-centos7.img
 
 # test SL6 OSG image
 singularity shell --bind /cvmfs /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osg-3.3-wn-el6:latest
 # test SL7 OSG image
 singularity shell --bind /cvmfs /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osg-3.3-wn-el7:latest
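<tt>singularity shell</tt> opens an interactive shell; for batch jobs the same images can be used non-interactively with <tt>singularity exec</tt>. A minimal sketch of a wrapper mapping short names to the image paths above; the short names, the function name <tt>run_in_image</tt> and the <tt>DRY_RUN</tt> variable are hypothetical.

```shell
# Hypothetical wrapper: run a command inside one of the images listed above.
# Usage: run_in_image NAME COMMAND [ARGS...]
run_in_image() {
    case $1 in
        atlas-slc6) image=/cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img ;;
        atlas-cc7)  image=/cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-centos7.img ;;
        osg-el6)    image=/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osg-3.3-wn-el6:latest ;;
        osg-el7)    image=/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osg-3.3-wn-el7:latest ;;
        *) echo "run_in_image: unknown image '$1'" >&2
           return 1 ;;
    esac
    shift
    if [ "$#" -eq 0 ]; then
        echo "run_in_image: no command given" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "singularity exec --bind /cvmfs $image $*"
    else
        singularity exec --bind /cvmfs "$image" "$@"
    fi
}
```

For example, <tt>run_in_image atlas-slc6 cat /etc/redhat-release</tt> shows which release the container provides.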
Revision as of 00:35, 31 October 2017