Difference between revisions of "Sunrise.fjfi.cvut.cz"

From NMS

Revision as of 18:06, 25 July 2017

Basic information (sunrise)

Administrator
Radek Novotný
HW
~ 300 core cluster, 20 TB storage
OS
SLC6 (Scientific Linux CERN)
Usage
KF cluster
Account
arrange with the administrator

Basic info & links

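  • monitoring
    • ssh -L 1080:127.0.0.1:1080 -L 2080:127.0.0.1:2080 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz
    • ganglia: http://ashley.fjfi.cvut.cz/ganglia
    • package repositories: http://ashley.fjfi.cvut.cz/repos
    • puppetdb: http://127.0.0.1:8080/pdb/dashboard/index.html
    • puppetboard: http://127.0.0.1:1080/
    • squid: http://127.0.0.1:2080/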
  • services
    • NAT + DNS (for worker nodes)
    • Squid cache (CVMFS)
    • Apache (kickstart, yum repository, monitoring interfaces)
    • puppet (configuration management)
    • PBSPro server

(Re)installation of worker nodes

Installation

  • use the official SLC6 boot image and write it to a CD or a USB flash drive
wget http://linuxsoft.cern.ch/cern/slc6X/x86_64/images/boot.iso
livecd-iso-to-disk boot.iso /dev/sd?1
# replace "?" with the device letter of the flash drive
  • boot from CD-ROM/USB (on sunrise11-24 the boot menu can be displayed by pressing F11)
  • add the boot parameters for the installation: press TAB and append the following (XX is the number of the sunriseXX node)
ks=http://192.168.20.1/ks.php?id=XX ksdevice=eth0 ip=192.168.20.1XX gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
  • on worker nodes sunrise01-10, booting from flash swaps the disk order, so the installation has to go to sdb instead of the usual first disk sda
ks=http://192.168.20.1/ks.php?id=XX&dev=sdb ...
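
The boot parameters above follow a fixed pattern per node, so they can be generated instead of typed by hand. The following is a minimal sketch, assuming a two-digit node number; the function name ks_boot_params is hypothetical, and the function only prints the line without changing anything:

```shell
#!/bin/sh
# Print the kickstart boot parameters for worker node sunriseXX.
# Usage: ks_boot_params XX [install_disk]
# ks_boot_params is a hypothetical helper; the addresses are the ones listed above.
ks_boot_params() {
    xx=$1
    dev=${2:-}
    ks="http://192.168.20.1/ks.php?id=${xx}"
    # sunrise01-10 need dev=sdb when booted from flash (disk order is swapped)
    if [ -n "$dev" ]; then
        ks="${ks}&dev=${dev}"
    fi
    printf 'ks=%s ksdevice=eth0 ip=192.168.20.1%s gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc\n' "$ks" "$xx"
}

ks_boot_params 12
ks_boot_params 03 sdb
```

The second call shows the sdb variant for nodes sunrise01-10.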

Reinstall

  • before the installation (but after stopping puppet), the node's current certificate has to be removed from the puppet server
puppet cert clean sunsetXX.kfcluster
  • copy vmlinuz and initramfs from boot.iso to /boot
  • add another entry that uses these files to the grub configuration in /boot/grub/menu.lst
  • add parameters matching the boot parameters listed in the Installation section
  • in principle it should be enough to run the script below (replace 01 with the number of the sunrise machine):
XX=01
cd /boot
wget -O vmlinuz http://linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/vmlinuz
wget -O initrd.img http://linuxsoft.cern.ch/cern/slc6X/x86_64/isolinux/initrd.img
wget -O ks.cfg "http://192.168.20.1/ks.php?id=${XX}"

cat >> /boot/grub/menu.lst <<EOF
title Install
        root (hd0,0)
        kernel /vmlinuz ks=hd:/dev/sda1:/ks.cfg ksdevice=eth0 ip=192.168.20.1${XX} gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
        initrd /initrd.img
EOF
  • if autosign does not work on the puppet server, the new puppet certificate has to be signed manually
# list unsigned certificates (the --all option lists all certificates)
puppet cert list
# sign the certificate
puppet cert sign sunsetXX.kfcluster
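
Because the reinstall script above appends to menu.lst, running it twice leaves two "Install" entries. The following is a hedged sketch of a guard against that; add_install_entry is a hypothetical name, and the menu file is passed as an argument so the function can be tried on a scratch file first:

```shell
#!/bin/sh
# Append the "Install" grub entry only if it is not already present.
# Usage: add_install_entry MENU_FILE XX
# add_install_entry is a hypothetical helper, not part of grub.
add_install_entry() {
    menu=$1
    xx=$2
    # Do nothing when an entry with this title already exists
    if grep -q '^title Install$' "$menu" 2>/dev/null; then
        return 0
    fi
    cat >> "$menu" <<EOF
title Install
        root (hd0,0)
        kernel /vmlinuz ks=hd:/dev/sda1:/ks.cfg ksdevice=eth0 ip=192.168.20.1${xx} gateway=192.168.20.1 netmask=255.255.255.0 dns=147.32.9.4 ssh vnc
        initrd /initrd.img
EOF
}

# Try it on a scratch file rather than the live /boot/grub/menu.lst
scratch=$(mktemp)
add_install_entry "$scratch" 01
add_install_entry "$scratch" 01   # the second call is a no-op
grep -c '^title Install$' "$scratch"
rm -f "$scratch"
```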

Configuration

Worker nodes are configured with puppet from ashley.fjfi.cvut.cz.

Creating puppet.git repository

# source scl_source enable git19
cd
git init --bare puppet.git
git clone ssh://root@ashley.fjfi.cvut.cz:/root/puppet.git
cd puppet
cp -a /etc/puppetlabs/code/environments/production/* .
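# assumption: the intent is to drop .git directories nested inside the copied modules,
# not the .git of this clone (removing that one would break the commit below)
find . -mindepth 2 -type d -name .git -prune -exec rm -rf {} +
git add -A
git commit -m "Initial commit with basic configuration for puppet 4.10.1"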
git branch -m master production
git push origin production
# create puppet.git/hooks/post-receive using code from
# https://puppet.com/blog/git-workflow-and-puppet-environments
# modify code to skip post-receive hook for branches with "tmp" prefix
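
The "tmp" exclusion mentioned in the last comment can be sketched as a small filter. This is not the hook from the linked blog post, only an illustration of the branch test; the function names should_deploy and filter_refs are hypothetical, and ignoring tags is an assumption:

```shell
#!/bin/sh
# Sketch of a branch filter for puppet.git/hooks/post-receive.
# git feeds the hook one line per updated ref: "<oldrev> <newrev> <refname>".

# Succeeds when the pushed ref should be deployed as a puppet environment.
should_deploy() {
    case "$1" in
        refs/heads/tmp*) return 1 ;;   # branches with the "tmp" prefix are skipped
        refs/heads/*)    return 0 ;;   # every other branch is deployed
        *)               return 1 ;;   # assumption: tags and other refs are ignored
    esac
}

# Read ref updates from stdin and report the decision for each one.
filter_refs() {
    while read -r oldrev newrev refname; do
        branch=${refname#refs/heads/}
        if should_deploy "$refname"; then
            echo "deploy $branch"      # a real hook would update the environment here
        else
            echo "skip $refname"
        fi
    done
}

printf '%s\n' \
    '0000 1111 refs/heads/production' \
    '0000 2222 refs/heads/tmp_test' | filter_refs
```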

Puppet configuration workflow

The layout of the production directory is inspired by the common profiles/roles/nodes abstraction layers described e.g. at http://www.craigdunn.org/2012/05/239/.

Our puppet configuration is stored in a Git repository and is applied automatically on the puppet server after a successful push to the main puppet.git repository. Do not modify files in the `puppet config print environmentpath` subdirectories directly. You may create your own test environment directly on the puppet server only if its name starts with the "tmp" or "work" prefix. Branches in puppet.git with the "tmp" prefix are excluded from the automatic post-receive hook and are not cloned into the puppet configuration environment.

  • on RHEL6 it is necessary to use the SCL version of git 1.9
source scl_source enable git19
  • check out the current data from the remote production branch
git clone -b production ssh://root@ashley.fjfi.cvut.cz:/root/puppet.git work_user
cd work_user
git checkout -b work_user
git config user.name "First Surname"
git config user.email "first.surname@fjfi.cvut.cz"
  • modify files in the "work_user" directory; if you cloned them into the puppet environment directory /etc/puppetlabs/code/environments, you can test the updated configuration
puppet apply --environment=work_user --test --debug
  • when you are happy with the updated configuration, merge the changes into the "production" branch and push them to the main repository
git commit -m "summary info for modifications" file1 file2 ...
git checkout production
git pull
git checkout work_user
git rebase production
git checkout production
git merge work_user
git push
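
Before creating a test branch or environment it can help to check the name. The following is a minimal sketch: valid_test_env is a hypothetical helper, the prefix rule comes from the text above, and the character rule reflects the Puppet requirement that environment names contain only lowercase letters, digits and underscores (which is why "work_user" uses an underscore):

```shell
#!/bin/sh
# Check a name for a test branch/environment before creating it.
# valid_test_env is a hypothetical helper, not part of puppet or git.
valid_test_env() {
    name=$1
    # Puppet environment names may contain only lowercase letters, digits and underscores
    case "$name" in
        ''|*[!a-z0-9_]*) return 1 ;;
    esac
    # On this server, test environments must start with "tmp" or "work"
    case "$name" in
        tmp*|work*) return 0 ;;
        *)          return 1 ;;
    esac
}

for n in work_user tmp_fix work-user production; do
    if valid_test_env "$n"; then
        echo "$n: ok"
    else
        echo "$n: rejected"
    fi
done
```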

Monitoring puppet

The monitoring web interfaces listen only on localhost, so you have to forward local ports from your machine to ashley.fjfi.cvut.cz before you can view the data:

ssh -L 1080:127.0.0.1:1080 -L 8080:127.0.0.1:8080 ashley.fjfi.cvut.cz

  • puppetdb: http://127.0.0.1:8080/pdb/dashboard/index.html
  • puppetboard: http://127.0.0.1:1080/
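
The forwarding command repeats the same pattern for every port, so it can be assembled from a port list. The following is a small sketch; tunnel_cmd is a hypothetical helper that only prints the command, so it can be reviewed before it is run:

```shell
#!/bin/sh
# Build the ssh port-forwarding command for a list of localhost-only ports.
# tunnel_cmd is a hypothetical helper; it prints the command instead of running it.
tunnel_cmd() {
    host=$1
    shift
    cmd="ssh"
    for port in "$@"; do
        # -L local_port:destination:destination_port, same form as the command above
        cmd="$cmd -L ${port}:127.0.0.1:${port}"
    done
    echo "$cmd $host"
}

# 1080 = puppetboard, 8080 = puppetdb
tunnel_cmd ashley.fjfi.cvut.cz 1080 8080
```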

Batch

This cluster currently uses PBSPro as its batch system.

  • show queue configuration
qstat -Q -f
qmgr -c 'p s'
  • set a worker node offline (-o) or bring it back online (-r)
pbsnodes -o sunriseXX-0
pbsnodes -r sunriseXX-0
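
When several nodes have to be taken offline at once, the commands can be generated for a range. The following is a dry-run sketch, assuming the sunriseXX-0 naming shown above with two-digit numbers; pbs_range is a hypothetical helper that prints the commands rather than executing them:

```shell
#!/bin/sh
# Print the pbsnodes commands for a range of worker nodes (dry run).
# pbs_range is a hypothetical helper; node names follow the sunriseXX-0 pattern above.
# Pass the node numbers without leading zeros, e.g. 1 3.
pbs_range() {
    flag=$1      # -o marks nodes offline, -r clears the offline state
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'pbsnodes %s sunrise%02d-0\n' "$flag" "$i"
        i=$((i + 1))
    done
}

# Review the output first; piping it to sh on the PBSPro server would execute it
pbs_range -o 1 3
```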

Squid (CVMFS)

Machines located at FNSPE should use our local squid proxy for CVMFS; as a backup, the FZU proxy may also be used. The CVMFS configuration in /etc/cvmfs/default.local should contain:

CVMFS_HTTP_PROXY="http://squid.fjfi.cvut.cz:3128;http://squid.farm.particle.cz:3128;DIRECT"

For KF cluster worker nodes it is now better to use the Squid cache directly on the head node:

CVMFS_HTTP_PROXY="http://ashley.fjfi.cvut.cz:3128;http://squid.farm.particle.cz:3128;DIRECT"
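
In a CVMFS_HTTP_PROXY value, the semicolons separate proxy groups that are tried in order, and DIRECT means connecting without a proxy. The following sketch only prints that order for a given value; proxy_order is a hypothetical helper, not a CVMFS command:

```shell
#!/bin/sh
# List the failover order encoded in a CVMFS_HTTP_PROXY value.
# proxy_order is a hypothetical helper; ";" separates groups that are tried in order.
proxy_order() {
    echo "$1" | tr ';' '\n' | awk '{ printf "%d. %s\n", NR, $0 }'
}

CVMFS_HTTP_PROXY="http://ashley.fjfi.cvut.cz:3128;http://squid.farm.particle.cz:3128;DIRECT"
proxy_order "$CVMFS_HTTP_PROXY"
```

For the worker-node setting this lists the head node first, the FZU proxy second and a direct connection last.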

User software

Intel compiler

# set up the environment for the Intel compiler
source /fjfi/apps/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
# Hello World example compiled with the Intel Fortran Compiler
cat > hello.f <<EOF
        program hello
           print *, "Hello World!"
        end program hello
EOF
ifortbin hello.f
./a.out