OSX is much better than Windows, isn’t it? That’s common wisdom, and it seemed to be confirmed once more when I installed XGBoost on both operating systems. Before I dive deeper, let me briefly describe XGBoost. It is a machine learning algorithm that yields great results on recent Kag
Reality is a bit different, and the OSX installation isn’t as smooth as it seems. To be accurate, the default OSX installation of XGBoost runs in single thread mode, as explained in these instructions.
Why is this a problem? Because XGBoost is a machine learning algorithm, and running it can be time consuming. I decided to install it on my computers to give it a try. I am currently working on a dataset with only about 100k rows (samples), and tuning XGBoost on my old Windows laptop (a Lenovo W520) takes about 2 hours. What surprised me is that it takes 7 hours on my brand new MacBook Pro! That is odd, given that both have Intel i7 quad-core CPUs and that the Mac’s clock speed is higher. Add the premium price of the Mac, and I was really surprised.
I further observed that other CPU-intensive tasks are faster on the MacBook Pro. Something is definitely wrong, but the culprit is easy to spot: XGBoost is single-threaded on OSX.
Before I explain how to enable multi threading for XGBoost, let me point you to this excellent Comp
Back to XGBoost, the inst
- Get Homebrew if it is not installed yet. It is a very useful open source installer for OSX. Installing it is straightforward: open a terminal, then paste and execute the instruction available on the Homebrew home page. I reproduce it here for convenience:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Get gcc with OpenMP support. Once the Homebrew installation has completed, paste and execute the following command in your terminal.
brew install gcc --without-multilib
This automatically downloads and builds gcc. It can take a while; it took about 30 minutes for me. Be patient.
- Get XGBoost. Go to wherever you want in your filesystem, say <directory>. Then type the git clone command and execute it:
cd <directory>
git clone --recursive https://github.com/dmlc/xgboost
This downloads the XGBoost code into a new directory named xgboost.
- The next step is to build XGBoost. By default, the build process uses the default compilers, cc and c++, which do not support the OpenMP option used for XGBoost multi-threading. We need to tell the system to use the compiler we just installed. That’s the step that was missing from the installation instructions on the XGBoost site.
There are various ways to do it; here is the one I used.
- Go to where we downloaded XGBoost
- Then open make/config.mk and uncomment these two lines
export CC = gcc
export CXX = g++
- Depending on your g++ installation, you may need to change the above two lines into:
export CC = gcc-6
export CXX = g++-6
- We then build with the following commands.
cd <directory>/xgboost
cp make/config.mk .
make -j4
- Once the build is finished, we can use XGBoost from the command line. I am using Python, so I also performed this final step. You may need to enter the admin password to execute it.
cd python-package; sudo python setup.py install
This concludes the installation.
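For the compiler step above, it helps to know which versioned gcc and g++ binaries Homebrew actually installed before editing config.mk. Here is a minimal sketch that scans a list of directories for names such as gcc-6 and prints the matching export lines. The helper names `versioned_compilers` and `export_lines` are hypothetical, and the sketch assumes the compilers follow the `gcc-<version>` / `g++-<version>` naming used above.

```python
import os
import re


def versioned_compilers(directories, base="gcc"):
    """Return names like 'gcc-6' found in the directories, newest version first."""
    # Match only '<base>-<digits>[.<digits>...]', so 'gcc' and 'gcc-ar-6' are skipped.
    pattern = re.compile(r"^%s-(\d+(?:\.\d+)*)$" % re.escape(base))
    found = {}
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # PATH may contain entries that do not exist
        for name in os.listdir(directory):
            match = pattern.match(name)
            if match:
                # Store the version as a tuple of ints so 10 sorts after 9.
                found[name] = tuple(int(p) for p in match.group(1).split("."))
    return sorted(found, key=found.get, reverse=True)


def export_lines(directories):
    """Build the two config.mk lines, or return [] if either compiler is missing."""
    gcc = versioned_compilers(directories, "gcc")
    gxx = versioned_compilers(directories, "g++")
    if not gcc or not gxx:
        return []
    # Assumption: the newest gcc and the newest g++ come from the same install.
    return ["export CC = " + gcc[0], "export CXX = " + gxx[0]]
```

Calling `export_lines(os.environ["PATH"].split(os.pathsep))` gives the lines to put in config.mk for the directories on your PATH. An empty result suggests that no versioned gcc is visible there.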
I tested it with my Anaconda distribution with Python 3.5. It worked fine, and I could run XGBoost. The speedup thanks to multi-threading is noticeable, and my MacBook Pro is now faster than my old PC.
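One way to check that the rebuilt library really uses several threads is to time the same training run with different values of the `nthread` parameter. The sketch below assumes that numpy and the xgboost Python package are importable. The synthetic data, its size, the number of rounds, and the helper names `benchmark` and `xgb_trainer` are illustrative choices, not part of XGBoost.

```python
import time


def benchmark(train, thread_counts):
    """Call train(n) once per thread count and return {n: elapsed seconds}."""
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        train(n)
        timings[n] = time.perf_counter() - start
    return timings


def xgb_trainer(rows=100000, cols=50, rounds=50):
    """Return a train(n_threads) callable working on synthetic data."""
    # Imported here so that benchmark() can be used without these packages.
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(rows, cols)
    # Arbitrary binary target derived from the first two columns.
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    dtrain = xgb.DMatrix(X, label=y)

    def train(n_threads):
        params = {"objective": "binary:logistic", "nthread": n_threads}
        xgb.train(params, dtrain, num_boost_round=rounds)

    return train
```

Calling `benchmark(xgb_trainer(), [1, 4])` returns the elapsed seconds for one and for four threads. On a single-threaded build I would expect the two timings to be roughly equal, since the `nthread` setting should have no effect there.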
Updated on July 16, 2016. Makefile changed in xgboost, making it easier to use gcc.
Updated on Jan 4, 2017. Updated the gcc and g++ declarations in the makefile. The original way didn’t work on some g++ installations. Thanks to Brandon Mitchell, who spotted the issue.