Thursday, July 26, 2018

Building a pyarrow wheel file on ppc64le

Building h2o and h2o4gpu from open source requires xgboost4j_gpu.so, and building that in turn requires pyarrow.  However, if you have tried to install pyarrow on the ppc64le architecture, you have probably seen an error like the following.

$ pip install pyarrow
Collecting pyarrow
Using cached https://files.pythonhosted.org/packages/be/2d/11751c477e4e7f4bb07ac7584aafabe0d0608c170e4bff67246d695ebdbe/pyarrow-0.9.0.tar.gz
...
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
/tmp/pip-install-kil31a/pyarrow/build/temp.linux-ppc64le-2.7/lib.cxx:592:35: fatal error: arrow/python/platform.h: No such file or directory
#include "arrow/python/platform.h"
^
compilation terminated.
make[2]: *** [CMakeFiles/lib.dir/lib.cxx.o] Error 1
make[1]: *** [CMakeFiles/lib.dir/all] Error 2
make: *** [all] Error 2
error: command 'make' failed with exit status 2

I recently resolved this problem with help from the arrow community.

https://github.com/apache/arrow/issues/2281

In short, the procedure can be summarized as follows.

First, on Red Hat, install the prerequisite packages as follows.

[dhkim@ING ~]$ sudo yum install jemalloc jemalloc-devel boost boost-devel flex flex-devel bison bison-devel

[dhkim@ING ~]$ mkdir imsi
[dhkim@ING ~]$ cd imsi

The key to fixing this error is to build arrow and parquet-cpp from source first.

[dhkim@ING imsi]$ git clone https://github.com/apache/arrow.git

[dhkim@ING imsi]$ git clone https://github.com/apache/parquet-cpp.git

[dhkim@ING imsi]$ which python
~/anaconda2/bin/python

This example uses anaconda2, but the build works the same way with anaconda3.  First, install the following packages with the conda command.

[dhkim@ING imsi]$ conda install numpy six setuptools cython pandas pytest cmake flatbuffers rapidjson boost-cpp thrift snappy zlib gflags brotli lz4-c zstd -c conda-forge
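Before starting the build, it can help to confirm that the command-line tools used in the rest of this post are actually on PATH. The following is only a minimal sketch: the tool list is an assumption collected from the commands that appear in this post, and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Assumed list: executables invoked later in this post; adjust for your system.
REQUIRED_TOOLS = ["git", "cmake3", "make", "python", "pip"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without a real PATH.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(REQUIRED_TOOLS)` returns an empty list when everything is available. Note that `shutil.which` requires Python 3.3 or later.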

Here, arrow and parquet-cpp will be installed into a directory named dist under the working directory (~/imsi in this example).

[dhkim@ING imsi]$ mkdir dist
[dhkim@ING imsi]$ export ARROW_BUILD_TYPE=release
[dhkim@ING imsi]$ export ARROW_HOME=$(pwd)/dist
[dhkim@ING imsi]$ export PARQUET_HOME=$(pwd)/dist

[dhkim@ING imsi]$ mkdir arrow/cpp/build && cd arrow/cpp/build

[dhkim@ING build]$ cmake3 -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$ARROW_HOME  -DARROW_PYTHON=on -DARROW_PLASMA=on -DARROW_BUILD_TESTS=OFF  -DARROW_PARQUET=ON ..
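If you want to script this step, the exports and the cmake3 invocation above could be assembled roughly as follows. This is a sketch that mirrors the flags shown in this post; the function names `arrow_build_env` and `cmake_args` are hypothetical, and the code does not run cmake by itself.

```python
import os


def arrow_build_env(base_dir, build_type="release"):
    """Environment variables exported above, with dist placed under base_dir."""
    dist = os.path.join(base_dir, "dist")
    return {
        "ARROW_BUILD_TYPE": build_type,
        "ARROW_HOME": dist,
        "PARQUET_HOME": dist,
    }


def cmake_args(env):
    """Argument list equivalent to the cmake3 command line used in this post."""
    return [
        "cmake3",
        "-DCMAKE_BUILD_TYPE=" + env["ARROW_BUILD_TYPE"],
        "-DCMAKE_INSTALL_PREFIX=" + env["ARROW_HOME"],
        "-DARROW_PYTHON=on",
        "-DARROW_PLASMA=on",
        "-DARROW_BUILD_TESTS=OFF",
        "-DARROW_PARQUET=ON",
        "..",
    ]
```

The resulting list can be passed to `subprocess.check_call` with `cwd` set to arrow/cpp/build. Keeping the flags in a list rather than one shell string avoids quoting problems when the install prefix contains spaces.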

[dhkim@ING build]$ make -j 8

[dhkim@ING build]$ make install
...
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/platform.h
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/pyarrow.h
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/type_traits.h
-- Installing: /home/dhkim/imsi/dist/lib64/pkgconfig/arrow-python.pc
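The original pip failure was caused by arrow/python/platform.h not being found, so it is worth checking that the headers really landed under the install prefix. A minimal sketch follows, assuming the header list from the install log above; `missing_headers` is a hypothetical helper name.

```python
import os

# Headers taken from the "make install" output above; extend as needed.
EXPECTED_HEADERS = [
    "arrow/python/platform.h",
    "arrow/python/pyarrow.h",
    "arrow/python/type_traits.h",
]


def missing_headers(prefix, headers=EXPECTED_HEADERS, exists=os.path.isfile):
    """Return the headers that are not present under <prefix>/include."""
    include_dir = os.path.join(prefix, "include")
    return [h for h in headers if not exists(os.path.join(include_dir, h))]
```

For example, `missing_headers(os.environ["ARROW_HOME"])` should return an empty list after a successful `make install`.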

[dhkim@ING build]$ cd ~/imsi/arrow/python

[dhkim@ING python]$ MAKEFLAGS=-j8 ARROW_HOME=/home/dhkim/imsi/dist PARQUET_HOME=/home/dhkim/imsi/dist python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace

[dhkim@ING python]$ export LD_LIBRARY_PATH=/home/dhkim/imsi/dist/lib64:$LD_LIBRARY_PATH

Now everything is ready to build pyarrow.  However, because of a minor bug on the arrow side, you need to create soft links named *.so. (with a trailing dot), as follows.

[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libarrow_python.so.11.0.0 /home/dhkim/imsi/dist/lib64/libarrow_python.so.
[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libarrow.so.11.0.0 /home/dhkim/imsi/dist/lib64/libarrow.so.
[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libparquet.so.1.4.1 /home/dhkim/imsi/dist/lib64/libparquet.so.
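The version numbers in those library names (11.0.0, 1.4.1) will differ depending on which arrow revision you check out, so the links can also be derived from whatever is in lib64. The sketch below is one way to do that under stated assumptions: the three library stems are the ones linked in this post, and both function names are hypothetical.

```python
import os
import re

# Matches fully versioned shared objects such as libarrow.so.11.0.0
VERSIONED_SO = re.compile(r"^(?P<stem>lib[A-Za-z0-9_]+)\.so\.\d+\.\d+\.\d+$")

# Assumption: only the three libraries linked in this post need the workaround.
WANTED_STEMS = ("libarrow", "libarrow_python", "libparquet")


def plan_trailing_dot_links(filenames, stems=WANTED_STEMS):
    """Map each versioned library to its trailing-dot link name.

    For example "libarrow.so.11.0.0" maps to "libarrow.so.".
    """
    plan = {}
    for name in sorted(filenames):
        match = VERSIONED_SO.match(name)
        if match and match.group("stem") in stems:
            plan[name] = match.group("stem") + ".so."
    return plan


def create_trailing_dot_links(lib_dir):
    """Create the planned symlinks in lib_dir, skipping ones that already exist."""
    for target, link_name in plan_trailing_dot_links(os.listdir(lib_dir)).items():
        link_path = os.path.join(lib_dir, link_name)
        if not os.path.lexists(link_path):
            os.symlink(os.path.join(lib_dir, target), link_path)
```

Separating the planning step from the step that touches the filesystem makes it easy to print the plan and review it before any link is created.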

Now build the wheel file.

[dhkim@ING python]$ python setup.py build_ext --build-type=release --with-parquet --bundle-arrow-cpp bdist_wheel

The wheel is created under the dist directory of arrow/python (not the ~/imsi/dist install prefix), as shown below.

[dhkim@ING python]$ ls -l dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl
-rw-rw-r-- 1 dhkim dhkim 7195829 Jul 26 16:03 dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl
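The wheel file name encodes which interpreter and platform the wheel was built for, which matters here because this post builds separate wheels for python2.7 and python3.6. The following sketch splits a name into its components following the PEP 427 naming convention; it assumes the name has no optional build tag, and `parse_wheel_filename` is a hypothetical helper.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components.

    Assumes the five-part form name-version-python-abi-platform.whl,
    i.e. a wheel without the optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: " + filename)
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))
```

For the file above, the python tag is cp36 (CPython 3.6) and the platform tag is linux_ppc64le, so pip will only accept that wheel on a Python 3.6 interpreter running on ppc64le Linux.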

Install it with pip, and you can confirm that the import works.

[dhkim@ING python]$ pip install dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl

[dhkim@ING python]$ pip list | grep pyarrow
pyarrow                           0.10.1.dev687+g18a61f6

[dhkim@ING python]$ python
Python 2.7.15 |Anaconda, Inc.| (default, May  1 2018, 23:32:32)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>>
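The same check can be done non-interactively, for example at the end of a build script. This is a minimal sketch; `check_import` is a hypothetical helper, and it catches AttributeError as well because the numpy-related failure described later in this post surfaces as an AttributeError during import.

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, detail).

    On success, detail is the module's __version__ if it defines one.
    On failure, detail is the error message.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        return False, str(exc)
    return True, getattr(module, "__version__", "unknown version")
```

Calling `check_import("pyarrow")` returns a pair whose first element is True when the wheel was installed correctly; when it is False, the second element holds the loader's message, such as a missing shared library name.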

So that you don't have to go through this process yourself, I am uploading pyarrow wheels for python2.7 and python3.6 to Google Drive; see below.


wheel for python3.6


For those who got errors like "ImportError: libarrow.so.10: cannot open shared object file: No such file or directory" from the wheel file I uploaded here:

All it takes is a little perseverance.


1.  First, install the pyarrow*.whl from this blog post.

2.  Create soft links as needed.  My wheel file installs libraries with awkward names like "libarrow.so." because of the bug discussed in https://github.com/apache/arrow/issues/2281 .

[u0017649@sys-96013 pyarrow]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.so.10

[u0017649@sys-96013 ~]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.so.10

[u0017649@sys-96013 ~]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libplasma.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libplasma.so.10

3.  You might still get more errors, such as the following.  These can be resolved by installing OS packages.

ImportError: libboost_system-mt.so.1.53.0: cannot open shared object file: No such file or directory
ImportError: libboost_filesystem-mt.so.1.53.0: cannot open shared object file: No such file or directory

[u0017649@sys-96013 ~]$ sudo yum install boost-system

[u0017649@sys-96013 ~]$ sudo yum install boost-filesystem

[u0017649@sys-96013 ~]$ sudo yum install boost-regex

4.  You may or may not get the following odd error.  It can be resolved by upgrading numpy.  Please refer to https://issues.apache.org/jira/browse/ARROW-3141 .

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py", line 50, in <module>
    import pyarrow.compat as compat
AttributeError: module 'pyarrow' has no attribute 'compat'


[u0017649@sys-96013 ~]$ pip install numpy --upgrade
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/2d/80/1809de155bad674b494248bcfca0e49eb4c5d8bee58f26fe7a0dd45029e2/numpy-1.15.4.zip (4.5MB)
    100% |████████████████████████████████| 4.5MB 271kB/s
Building wheels for collected packages: numpy
  Running setup.py bdist_wheel for numpy ... done
  Stored in directory: /home/u0017649/.cache/pip/wheels/13/6b/70/4b5d7861227307f91716c31698240e08c6ec5486d9ee82a97b
Successfully built numpy
Installing collected packages: numpy
  Found existing installation: numpy 1.13.1
    Uninstalling numpy-1.13.1:
      Successfully uninstalled numpy-1.13.1
Successfully installed numpy-1.15.4


5.  And finally, voilà!

[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>>
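Steps 2 and 3 above can be partly automated. The sketch below plans the links from step 2 and reads the missing library name out of the ImportError messages from step 3. It rests on assumptions: the default soversion "10" comes from the error message quoted above, the Boost package guess is inferred only from the three yum packages named in step 3, and all function names are hypothetical.

```python
import os
import re

MISSING_LIB = re.compile(r"^(?:ImportError: )?(\S+): cannot open shared object file")
BOOST_LIB = re.compile(r"^libboost_([a-z]+?)(?:-mt)?\.so\.")


def plan_soversion_links(filenames, soversion="10"):
    """Map trailing-dot names to loader names: "libarrow.so." to "libarrow.so.10"."""
    return {n: n + soversion for n in sorted(filenames) if n.endswith(".so.")}


def create_soversion_links(pkg_dir, soversion="10"):
    """Create the planned links inside the installed pyarrow package directory."""
    for src, dst in plan_soversion_links(os.listdir(pkg_dir), soversion).items():
        dst_path = os.path.join(pkg_dir, dst)
        if not os.path.lexists(dst_path):
            # Relative link: resolved against pkg_dir, where src also lives.
            os.symlink(src, dst_path)


def missing_library(message):
    """Extract the library name from a 'cannot open shared object file' error."""
    match = MISSING_LIB.search(message)
    return match.group(1) if match else None


def guess_boost_package(library):
    """Guess the yum package for a Boost library (an assumption, not a lookup)."""
    match = BOOST_LIB.match(library)
    return "boost-" + match.group(1) if match else None
```

Unlike the ln -s commands in step 2, this uses relative link targets, so the links keep working if the Python environment directory is moved. Treat the package guess only as a hint; `yum provides` on the library name is the reliable way to find the owning package.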


