Thursday, May 28, 2020

C++ keywords, explicit, default, delete, noexcept, override, and final

explicit
Prevents a constructor from being used for implicit conversions.

For example:

explicit Array(int size);

This declaration prevents an implicit conversion from int to the Array class.
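To make the effect visible, here is a minimal sketch. The Array class body and the sizeOf helper are my own assumptions for illustration; only the explicit constructor declaration comes from the example above.

```cpp
#include <type_traits>

// Hypothetical minimal Array class, used only for illustration.
class Array {
public:
    constexpr explicit Array(int size) : size_(size) {}
    constexpr int size() const { return size_; }
private:
    int size_;
};

// Takes an Array by value, so passing an int would need an implicit conversion.
constexpr int sizeOf(Array a) { return a.size(); }

// sizeOf(5);     // would not compile: int does not convert implicitly to Array
// Array a = 5;   // would not compile: copy-initialization cannot use an explicit constructor
static_assert(sizeOf(Array(5)) == 5, "an explicit conversion is still allowed");
static_assert(!std::is_convertible<int, Array>::value, "no implicit int -> Array");
static_assert(std::is_constructible<Array, int>::value, "direct construction works");
```

Removing the explicit keyword would make the two commented-out lines compile, which is usually not what you want for a size argument.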

default
Asks the compiler to generate the default implementation of a special member function (constructor, destructor, copy or move operation).
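A small sketch of how this might look in practice. The class below is hypothetical; the point is that declaring any constructor suppresses the implicit default constructor, and = default asks the compiler to bring it back.

```cpp
#include <type_traits>

// Hypothetical class for illustration only.
class Array {
public:
    Array() = default;                         // compiler-generated default constructor
    explicit Array(int size) : size_(size) {}  // user-declared constructor
    Array(const Array&) = default;             // compiler-generated copy constructor
    int size() const { return size_; }
private:
    int size_ = 0;                             // initializer used by the defaulted constructor
};

static_assert(std::is_default_constructible<Array>::value, "Array() is available");
static_assert(std::is_copy_constructible<Array>::value, "copying is available");
```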

delete
Removes a function, typically the default implementation that the compiler would otherwise generate.

Array(const Array&) = delete;

For example, the above code prevents the compiler from automatically generating a copy constructor.
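The effect can be checked at compile time. This is only a sketch: the class body, the deleted copy assignment, and the resize functions are assumptions I added for illustration.

```cpp
#include <type_traits>

// Hypothetical non-copyable class for illustration only.
class Array {
public:
    explicit Array(int size) : size_(size) {}
    Array(const Array&) = delete;             // no copy construction
    Array& operator=(const Array&) = delete;  // no copy assignment (my addition)
private:
    int size_;
};

// "= delete" also works on ordinary functions, e.g. to reject an unwanted overload.
void resize(int newSize);      // hypothetical, declared only
void resize(double) = delete;  // resize(2.5) would not compile

static_assert(!std::is_copy_constructible<Array>::value, "copying is disabled");
static_assert(!std::is_copy_assignable<Array>::value, "copy assignment is disabled");
```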

noexcept
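Declares that a function does not throw, so the compiler and the standard library can optimize without worrying about exceptions. It does not guarantee that no exception is thrown inside the function; if one escapes, std::terminate is called.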
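A short sketch of how noexcept can be queried. The function and class names are hypothetical; the noexcept operator and std::is_nothrow_move_constructible are standard.

```cpp
#include <type_traits>

void mayThrow();              // hypothetical, declared only
void neverThrows() noexcept;  // hypothetical, declared only

// The noexcept operator reports what the declaration promises, at compile time.
static_assert(!noexcept(mayThrow()), "no promise made");
static_assert(noexcept(neverThrows()), "promises not to throw");

// Hypothetical class: a noexcept move constructor lets containers such as
// std::vector move elements instead of copying them when they reallocate.
struct Buffer {
    Buffer() = default;
    Buffer(Buffer&&) noexcept {}
};

static_assert(std::is_nothrow_move_constructible<Buffer>::value, "move does not throw");
```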

override
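Tells the compiler that a virtual function must override a function in the base class. This catches mistakes such as accidentally declaring the function with different parameters, which would otherwise create a new function instead of overriding.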
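Here is a minimal sketch of the kind of mistake override catches. Shape, Triangle, countSides and selfCheck are hypothetical names of my own.

```cpp
#include <cassert>

// Hypothetical base class for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }  // really overrides Shape::sides
    // int sides() override { return 3; }     // would not compile: missing const,
    //                                        // so it overrides nothing
};

// Calls through the base class reach the overriding function.
int countSides(const Shape& s) { return s.sides(); }

void selfCheck() { assert(countSides(Triangle()) == 3); }
```

Without override, the commented-out version would compile silently and countSides would return 0.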

final
Disallows inheriting from a class or overriding a virtual function.
For example:

struct Base final {
};

The Base struct cannot be inherited from.
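final can also be placed on a single virtual function. The sketch below assumes C++14 or later for std::is_final; Shape, Square and TiltedSquare are hypothetical names.

```cpp
#include <type_traits>

struct Base final {};
// struct Derived : Base {};   // would not compile: Base is final

// Hypothetical hierarchy for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Square : Shape {
    int sides() const final { return 4; }  // may not be overridden further
};

struct TiltedSquare : Square {
    // int sides() const override { return 4; }  // would not compile: sides() is final
};

static_assert(std::is_final<Base>::value, "Base cannot be a base class");
static_assert(!std::is_final<Square>::value, "Square itself can still be derived from");
```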

Thursday, May 21, 2020

R Parallel Writing to Files

Here I will use the foreach and doParallel libraries to demonstrate how to write to a file in parallel.

library(stringr)
library(flock)
library(foreach)
library(doParallel)
cl <- makeCluster(detectCores(), outfile = "a.out")
registerDoParallel(cl)
lock0 <- tempfile()  # path of the lock file shared by all workers
foreach (
  i = 1:10000,
  .combine = cbind,
  .packages = c('stringr', 'flock'),
  .export = ls(globalenv())
) %dopar% {
  locked0 <- flock::lock(lock0)
  write(i, file = "outfile.txt", append = TRUE)
  flock::unlock(locked0)
}
stopCluster(cl)

The makeCluster(detectCores(), outfile = "a.out") statement creates a cluster that uses all available cores and directs the console output to the file a.out.

The statement registerDoParallel(cl) registers the cluster as the foreach parallel backend.
Note that in the foreach statement we pass .packages = c('stringr', 'flock') and .export = ls(globalenv()). The former makes the specified packages available inside the foreach loop, and the latter exports all variables declared in the global environment to it. Without these, the code inside the foreach loop may not be able to see the packages or variables from outside.

To avoid a data race when multiple processes write to the same file, we use the flock library as a mutex and wrap the write operation in flock::lock and flock::unlock.

Using a mutex can make the processing really slow. The other way to do this is to let each process write to its own separate file. You can use the process ID in the file name. For example,

write(i,file=paste(c("outfile",Sys.getpid(),".txt"), collapse =""),append=TRUE)

One thing to note: if your parallel processing uses database connections, the above code will fail, because the parallel processes cannot spawn the database connections. You can use the code below to initialize the connections on each worker with clusterEvalQ when building the cluster.

library(RODBC)   #use the ODBC library
library(DBI)
library(odbc)
odbcCloseAll()
library(foreach)
library(doParallel)
cl <- makeCluster(detectCores(), outfile = "a.out")
clusterEvalQ(cl, {
   library(odbc)
   library(RODBC)
   library(DBI)
   dbname1 = "test"  # change this when change server!!!
   channel1 = RODBC::odbcConnect(dbname1)
   con1 <- DBI::dbConnect(odbc(),  dbname1)
})
registerDoParallel(cl)

Visual Studio 2017 SSIS project incompatible

Open Visual Studio 2017 and select Tools -> Extensions and Updates. Click Online in the left pane, search for “Microsoft Reporting Services Projects”, and click Install. You need to close Visual Studio to let the installation begin. When it is done, open Tools -> Extensions and Updates again, click Installed in the left pane, search for “Microsoft Reporting Services Projects”, and click Enable.
Right-click the incompatible project and click Reload. This should solve the problem.

Monday, May 18, 2020

C++ avoid arithmetic operations on the size_t type

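Here is a simple test on this topic.

#include <iostream>
#include <string>
#include <typeinfo>
using namespace std;

int main() {
    string s = "a";
    int i = 0;
    cout << i << " " << typeid(i).name() << endl;                            // int
    cout << s.length() << " " << typeid(s.length()).name() << endl;          // size_t
    cout << i - s.length() << " " << typeid(i - s.length()).name() << endl;  // wraps around
    return 0;
}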

The output is a very large number (18446744073709551615) instead of -1 as intended (see below).

0 i
1 m
18446744073709551615 m

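The type of s.length() is size_t, an unsigned integer type that is unsigned int or unsigned long depending on the machine used (here unsigned long, printed as m). In the expression i - s.length(), the compiler converts the int operand to unsigned long, so the result wraps around instead of becoming negative.
To avoid this kind of problem, store the length in a signed variable first, for example int n = s.length();, and then use this variable in the calculations.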
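The same idea can be written as a small helper. This is only a sketch: signedDiff is a hypothetical name, and it assumes the length fits into a long long.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: converts both operands to a signed type before subtracting.
// Assumes n is small enough to fit into a long long.
constexpr long long signedDiff(int i, std::size_t n) {
    return static_cast<long long>(i) - static_cast<long long>(n);
}

// Mixing int and size_t: the int is converted to unsigned and the result wraps around.
static_assert(0 - std::size_t{1} == std::numeric_limits<std::size_t>::max(),
              "unsigned subtraction wraps around");

// With both operands signed, the result is the intended -1.
static_assert(signedDiff(0, 1) == -1, "signed subtraction gives -1");
```

With a string s, the call would be signedDiff(i, s.length()).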

Sunday, May 17, 2020

Is it safe to delete a pointer to nullptr

I ran a test on an online compiler with the code below.

#include <iostream>

using namespace std;

int main()
{
    int * a = nullptr;
    delete a;
    return 0;
}

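The program compiles and runs with no problem, so it is safe to delete a pointer that holds nullptr; the C++ standard defines delete on a null pointer as having no effect. However, you cannot write delete nullptr; directly, because nullptr is a literal of type std::nullptr_t, which is not a pointer type.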
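One place where this is handy is a cleanup helper that can be called more than once. The sketch below uses hypothetical names (destroy, selfCheck) of my own.

```cpp
#include <cassert>

// Hypothetical helper: deletes the object and resets the pointer.
template <typename T>
void destroy(T*& p) {
    delete p;     // has no effect when p is already null
    p = nullptr;
}

void selfCheck() {
    int* p = new int(42);
    destroy(p);                         // frees the int
    destroy(p);                         // deletes a null pointer: safe
    delete static_cast<int*>(nullptr);  // a typed null pointer is also fine
    // delete nullptr;                  // would not compile: std::nullptr_t is not a pointer type
    assert(p == nullptr);
}
```

No null check is needed before delete, so the helper stays two lines long.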

Sunday, May 10, 2020

Solving AWS ParallelCluster Cannot Submit Multiple Node using Slurm + OpenMPI

Recently, I tried out AWS ParallelCluster, which is a Linux-based HPC cluster solution. We use Slurm as the scheduler together with OpenMPI. When submitting jobs to multiple compute nodes, I got various error messages; below is one version.

[ip-10-0-19-27][[16152,1],0][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],1]
[ip-10-0-19-27][[16152,1],1][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],0]
[ip-10-0-19-27][[16152,1],2][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],3]
[ip-10-0-19-27][[16152,1],3][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],2]
[ip-10-0-20-194][[16152,1],4][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],5]
[ip-10-0-20-194][[16152,1],5][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],4]
[ip-10-0-20-194][[16152,1],6][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],7]
[ip-10-0-20-194][[16152,1],7][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[16152,1],6]

It turns out that OpenMPI somehow did not find the network interface. Adding the --mca btl_tcp_if_include ens3 command-line parameter to mpirun solves the problem. Here ens3 is the default network interface. You can find its name using ifconfig.

Below is a sample submission script.

#!/bin/bash
#SBATCH --job-name=montecarlojob
#SBATCH --ntasks=8
#SBATCH --output=%x_%j.out
module load openmpi
mpirun --mca btl_tcp_if_include ens3 -np 8 a.out

Saturday, May 2, 2020

Conversion between CMakeList and Visual Studio Solution

Convert Visual Studio solution file to CMakeList file

Below is the GitHub repository of the solution:
https://github.com/pavelliavonau/cmakeconverter

pip install cmake-converter
cmake-converter -s <path/to/file.sln>

Convert CMakeLists to Visual Studio solution

The solution is described on the page below:
https://cognitivewaves.wordpress.com/cmake-and-visual-studio/

mkdir _build
cd _build
cmake .. -G "Visual Studio 15 2017 Win64"