Friday, 19 July 2013

Technical Post - Installing HDF5 and NetCDF4 and NetCDF4 libraries for Fortran

This is a post I've been meaning to do for a while now, ever since I got it to work, but it may not be of interest to everyone. Ever since I started working on inverse modelling, I've found that I've needed to use Linux more and more, particularly because many of the atmospheric transport models are written in the Fortran programming language. Also, if you ever plan to do anything on a super computer, you're more than likely going to be dealing with the Linux operating system. I wouldn't say that I'm the greatest fan of Linux, but I have learnt to get my way around the terminal window, to the point where I in fact feel I've learnt to view the computer in an entirely different way. Not to give away too much about my age, but we did have the DOS operating system on our first computer, so as a kid I could get my way around the computer and perform a few operations (like backing up and restoring games). So the terminal was not too big of a shock for me. My biggest problem with Linux is how difficult it is (for me) to install a new program if you don't already have all the correct libraries. And then you install the missing libraries it's complaining about, and then everything breaks. I just wish that there was a little bit more self checking going on, so that it would stop you before you did something stupid. Or even better, check first what libraries you have, and then actually install automatically the correct, compatible libraries for you. But anyway ... that's just me.

So what I would like to present today is my solution to installing important libraries used for accessing very large, generally spatial, datasets. And that's the HDF5 and NetCDF4 libraries. Particularly for the use in Fortran. And as it stands, I can also use NetCDF successfully in Python (but with a few extra NetCDF Python specific packages). The reason why I'm posting this is because I routinely break Linux. I've found that it's a skill I possess. One day everything will be working like clockwork, and the next moment, there are black screens and hung computers. And it's back to the drawing board again. So I've had to install NetCDF (in particular NetCDF, but you need HDF5 if you want to use NetCDF4) on several occasions, and every single time it's been a painful exploration through the internet forum space, greedily searching for any solution which will result in the easy installation of these libraries. After the last time, I decided that enough was enough, and I documented everything that I did, so that when it happens again, I can just follow my step by step guide, without having to do any searching, and it will just work the first time. I'm sure it's a pipe dream, but here it goes anyway. Maybe this will help one person out there, and save them a week's worth of frustration.

This set of instructions is aimed at those planning to use the Intel Fortran compliers on Ubuntu. The important point is that you have to compile NetCDF with the compiler you're going to be using (you could also be using gfortran). Now if you install NetCDF from the repository you are going to run into problems here. I had attempted to do this (I always try to install from the repository if I can), but I eventually had to uninstall the NetCDF version I had installed from the repository. This is the version that kept coming up when I used the command nc-config --all, and I think this is the reason why, no matter how I tried to link the libraries, ifort would tell me that NetCDF had not been compiled with ifort. So rather don't have any other NetCDF versions lurking.

The first thing you have to do is make sure that your .bashrc file is setup correctly so that your compiler and its libraries are working right from the start of the session. And you also need to make a few other changes to point your installation in the right direction. This is by far the scariest part of the installation, because quite frankly, I have no idea what most of these lines are actually doing. Some of them may even be redundant. But these instructions I found on various webpages, trying to explain exactly how to install NetCDF for
Intel Fortran compilers (so mainly sourced from the Unidata website and from the Intel website), and for me these changes did appear to work and did no harm. From what I understand, we're basically trying to point the computer to where it needs to look for various libraries and how to treat certain file types.

I mainly followed the advice from the Unidata website and installed everything (zlib, HDF5, NetCDF4, and NetCDF4-Fortran into one folder (in my case this was /home/alecia/local), which was not the default folder.  When I didn't use the prefix command and let it install in the default location, then I couldn't get HDF5 to install.
These are the extra lines in my .bashrc file, which get added right at the end of the file  [To get into the .bashrc file, you type: cd, and then you type vim .bashrc if you have the vim editor]:

source /opt/intel/composer_xe_2013.3.163/bin/ intel64
export PATH
source /opt/intel/composer_xe_2013.3.163/bin/intel64/
export PATH
export F90=ifort
export F77=ifort
export F9X=ifort
export PATH=$PATH:/home/alecia/local/bin
export NETCDF=/home/alecia/local
export NETCDF_LIB=/home/alecia/local/lib
export NETCDF_INC=/home/alecia/local/include
Save the .bashrc file, and then type

source .bashrc
which will execute the new setup file and make the changes you've added.

The first few lines, up until the second "export PATH" are telling the computer where the Intel libraries are. From the "export F90=ifort" line, this is specifically for the HDF5 and NetCDF4 installation.

Then obtain the tar.gz installation files for curl, zlib, HDF5, NetCDF4, and NetCDF4 libraries for Fortran from the unidata website
I actually installed curl from the repository, because the one I got from the website didn't want to install. But this seemed to work just fine.
Then extract zlib into its own directory:
tar zxvf zlib-1.2.7.tar.gz
cd zlib-1.2.7/
make check install

Of course, everywhere that you see "alecia" you would just replace with your own home folder name. And the version number would be the latest version number. 
Then extract HDF5 in its own folder:
tar zxvf hdf5-1.8.9.tar.gz
cd hdf5-1.8.9/
CFLAGS=-O0 ./configure --prefix=/home/alecia/local --with-zlib=/home/alecia/local --enable-fortran
make check install

Then extract NetCDF4 into its own folder:
tar zxvf netcdf-4.3.0.tar.gz
cd netcdf-4.3.0/
CPPFLAGS=-I/home/alecia/local/include LDFLAGS=-L/home/alecia/local/lib ./configure --prefix=/home/alecia/local 
make check install
Then extract NetCDF4 Fortran libraries into its own folder:
tar zxvf netcdf-fortran-4.2.tar.gz
cd netcdf-fortran-4.2/ 
CPPFLAGS=-I/home/alecia/local/include LDFLAGS=-L/home/alecia/local/lib ./configure --prefix=/home/alecia/local

Then to compile with ifort I needed to use the following command (and apparently the order is important, but I haven't tried different combinations):
ifort -o convert.out -I/home/alecia/local/include convert_ccamtolpdm2_go.f90 -L/home/alecia/local/lib -lnetcdff -lnetcdf

Of course here you would replace the f90 file and the output file with your own.

And to get the executable to run, I had to use:
./convert.out -rpath=/home/alecia/local/lib

When I restarted the system the next day, and tried to redo this, I believe I ran into some problems, so that's when I discovered how to correctly modify the LD_LIBRARY_PATH list (yet again - thanks to the wonderful internet invention of the forum). This is once again a modification done in the .bashrc file.


These two lines get added right at the end of the .bashrc file. As I understand it, LD_LIBRARY_PATH is a list of all the places that the computer needs to look when trying to find a library. When you install libraries into non-default locations then you need to add this location to the list. What you don't want to do is erase the list and then just add the new library location. That is why if you read up about LD_LIBRARY_PATH modifications, a lot of people will tell you just not to do it at all. So the format that I've used here is very important, because this way it definitely adds the library location, as opposed to wiping out the list. Just for you to check, before you make this modification, you can type  

and run in the terminal, and it will give you the list. You can then make the modification to the .bashrc file and save, and then type

source .bashrc
and run and this will run the file and make the added changes. If you rerun the echo $LD_LIBRARY_PATH command, then it should give you the same list you previously had, but now with the new library path /home/alecia/local/lib.

Now you should be able to execute your new Fortran executables by just typing
(or whatever you've called your executable)

I'm sure there are several different ways of doing this, and probably many of them more elegant and efficient, but for me, using a personal computer with Intel i7-3770K machine , running Ubuntu 12.04 on a VMWare virtual machine, this worked (and I've gotten it to work twice - yes I have already broken Linux since finding this solution). I'm still compiling my Fortran code using the same line of commands, and I'm still able to open, read and write to NetCDF files. For how long this will continue I don't know, but if anything goes wrong, this is where I will start with building it up from scratch again.

I hope this helps somebody out there.

No comments:

Post a Comment