$ pip install pyarrow
Collecting pyarrow
Using cached https://files.pythonhosted.org/packages/be/2d/11751c477e4e7f4bb07ac7584aafabe0d0608c170e4bff67246d695ebdbe/pyarrow-0.9.0.tar.gz
...
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
/tmp/pip-install-kil31a/pyarrow/build/temp.linux-ppc64le-2.7/lib.cxx:592:35: fatal error: arrow/python/platform.h: No such file or directory
#include "arrow/python/platform.h"
^
compilation terminated.
make[2]: *** [CMakeFiles/lib.dir/lib.cxx.o] Error 1
make[1]: *** [CMakeFiles/lib.dir/all] Error 2
make: *** [all] Error 2
error: command 'make' failed with exit status 2
I recently resolved this problem with help from the arrow community.
https://github.com/apache/arrow/issues/2281
In short, the fix can be summarized as follows.
First, on Red Hat, install the prerequisite packages as follows.
[dhkim@ING ~]$ sudo yum install jemalloc jemalloc-devel boost boost-devel flex flex-devel bison bison-devel
[dhkim@ING ~]$ mkdir imsi
[dhkim@ING ~]$ cd imsi
The key to resolving this error is to build arrow and parquet-cpp from source first.
[dhkim@ING imsi]$ git clone https://github.com/apache/arrow.git
[dhkim@ING imsi]$ git clone https://github.com/apache/parquet-cpp.git
[dhkim@ING imsi]$ which python
~/anaconda2/bin/python
This walkthrough uses anaconda2, but the build works the same way with anaconda3. First, install the following packages with the conda command.
[dhkim@ING imsi]$ conda install numpy six setuptools cython pandas pytest cmake flatbuffers rapidjson boost-cpp thrift snappy zlib gflags brotli lz4-c zstd -c conda-forge
Here, arrow and parquet-cpp will be installed into a directory named dist under the working directory in the user's home (~/imsi/dist).
[dhkim@ING imsi]$ mkdir dist
[dhkim@ING imsi]$ export ARROW_BUILD_TYPE=release
[dhkim@ING imsi]$ export ARROW_HOME=$(pwd)/dist
[dhkim@ING imsi]$ export PARQUET_HOME=$(pwd)/dist
[dhkim@ING imsi]$ mkdir arrow/cpp/build && cd arrow/cpp/build
[dhkim@ING build]$ cmake3 -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DARROW_PYTHON=on -DARROW_PLASMA=on -DARROW_BUILD_TESTS=OFF -DARROW_PARQUET=ON ..
[dhkim@ING build]$ make -j 8
[dhkim@ING build]$ make install
...
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/platform.h
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/pyarrow.h
-- Installing: /home/dhkim/imsi/dist/include/arrow/python/type_traits.h
-- Installing: /home/dhkim/imsi/dist/lib64/pkgconfig/arrow-python.pc
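The install log above shows platform.h, the header the failed pip build could not find, landing under the install prefix. A minimal sketch for confirming this before moving on, assuming the include/arrow/python layout shown in the log; check_arrow_python_headers is a hypothetical helper name, not something arrow provides:

```shell
# Hypothetical helper (not part of arrow): verify that the arrow/python headers
# listed in the install log exist under a given install prefix.
check_arrow_python_headers() {
    prefix="$1"
    status=0
    for header in platform.h pyarrow.h type_traits.h; do
        if [ -f "$prefix/include/arrow/python/$header" ]; then
            echo "found:   $header"
        else
            echo "missing: $header"
            status=1
        fi
    done
    # Non-zero return means at least one header was not installed.
    return "$status"
}
```

It could be called as: check_arrow_python_headers "$ARROW_HOME"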
[dhkim@ING build]$ cd ~/imsi/arrow/python
[dhkim@ING python]$ MAKEFLAGS=-j8 ARROW_HOME=/home/dhkim/imsi/dist PARQUET_HOME=/home/dhkim/imsi/dist python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace
[dhkim@ING python]$ export LD_LIBRARY_PATH=/home/dhkim/imsi/dist/lib64:$LD_LIBRARY_PATH
Now everything is ready to build pyarrow. However, because of a minor bug on the arrow side, you first have to create soft links named *.so. (with a trailing dot), as follows.
[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libarrow_python.so.11.0.0 /home/dhkim/imsi/dist/lib64/libarrow_python.so.
[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libarrow.so.11.0.0 /home/dhkim/imsi/dist/lib64/libarrow.so.
[dhkim@ING python]$ ln -s /home/dhkim/imsi/dist/lib64/libparquet.so.1.4.1 /home/dhkim/imsi/dist/lib64/libparquet.so.
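The version suffixes in the three commands above (11.0.0, 1.4.1) will differ between arrow checkouts, so a loop may be easier than typing each name. This is only a sketch: make_trailing_dot_links is a hypothetical helper, and it assumes the directory holds just the freshly built arrow and parquet libraries, since it links every versioned lib*.so.* file it finds.

```shell
# Hypothetical helper: for every real (non-symlink) versioned library in a
# directory, create a "<name>.so." link with a trailing dot pointing at it.
make_trailing_dot_links() {
    libdir="$1"
    for lib in "$libdir"/lib*.so.*; do
        # Skip an unmatched glob and anything that is not a regular file.
        if [ ! -f "$lib" ]; then continue; fi
        # Skip existing symlinks such as libarrow.so.11 or libarrow.so.
        if [ -L "$lib" ]; then continue; fi
        base=$(basename "$lib")
        stem="${base%%.so.*}"      # libarrow.so.11.0.0 -> libarrow
        ln -sfn "$lib" "$libdir/$stem.so."
    done
}
```

With the paths used in this post it would be called as: make_trailing_dot_links /home/dhkim/imsi/dist/lib64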
Now build the wheel file.
[dhkim@ING python]$ python setup.py build_ext --build-type=release --with-parquet --bundle-arrow-cpp bdist_wheel
The wheel is created under the dist directory, as shown below.
[dhkim@ING python]$ ls -l dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl
-rw-rw-r-- 1 dhkim dhkim 7195829 Jul 26 16:03 dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl
Install it with pip, and you can confirm that the import works as well.
[dhkim@ING python]$ pip install dist/pyarrow-0.10.1.dev687+g18a61f6-cp36-cp36m-linux_ppc64le.whl
[dhkim@ING python]$ pip list | grep pyarrow
pyarrow 0.10.1.dev687+g18a61f6
[dhkim@ING python]$ python
Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 23:32:32)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>>
So that you do not have to go through this process yourself, I am uploading pyarrow wheels for python2.7 and python3.6 to Google Drive below.
wheel for python3.6
For those who got errors like "ImportError: libarrow.so.10: cannot open shared object file: No such file or directory" from the wheel file I uploaded here:
What we need is just perseverance.
1. First, install the pyarrow*.whl from this blog post.
2. Make soft links as needed. My wheel file ships awkward names like "libarrow.so." because of the bug described in https://github.com/apache/arrow/issues/2281 .
[u0017649@sys-96013 pyarrow]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.so.10
[u0017649@sys-96013 ~]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.so.10
[u0017649@sys-96013 ~]$ ln -s /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libplasma.so. /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libplasma.so.10
3. You might still get more errors like the ones below. These are addressed by installing OS packages.
ImportError: libboost_system-mt.so.1.53.0: cannot open shared object file: No such file or directory
ImportError: libboost_filesystem-mt.so.1.53.0: cannot open shared object file: No such file or directory
[u0017649@sys-96013 ~]$ sudo yum install boost-system
[u0017649@sys-96013 ~]$ sudo yum install boost-filesystem
[u0017649@sys-96013 ~]$ sudo yum install boost-regex
4. You may or may not get the following odd error. It can be addressed by upgrading numpy. Please refer to
https://issues.apache.org/jira/browse/ARROW-3141 .
>>> import pyarrow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py", line 50, in <module>
import pyarrow.compat as compat
AttributeError: module 'pyarrow' has no attribute 'compat'
[u0017649@sys-96013 ~]$ pip install numpy --upgrade
Collecting numpy
Downloading https://files.pythonhosted.org/packages/2d/80/1809de155bad674b494248bcfca0e49eb4c5d8bee58f26fe7a0dd45029e2/numpy-1.15.4.zip (4.5MB)
100% |████████████████████████████████| 4.5MB 271kB/s
Building wheels for collected packages: numpy
Running setup.py bdist_wheel for numpy ... done
Stored in directory: /home/u0017649/.cache/pip/wheels/13/6b/70/4b5d7861227307f91716c31698240e08c6ec5486d9ee82a97b
Successfully built numpy
Installing collected packages: numpy
Found existing installation: numpy 1.13.1
Uninstalling numpy-1.13.1:
Successfully uninstalled numpy-1.13.1
Successfully installed numpy-1.15.4
5. And finally, voilà!
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>>
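Steps 2 and 3 above can be sketched as two small shell helpers. Both names (link_sonames, missing_libs) are hypothetical, and the sketch assumes the wheel left its libraries as lib*.so. files in the pyarrow package directory and that the missing version suffix is the 10 reported by the ImportError.

```shell
# Hypothetical helper for step 2: give every "lib*.so." file in a directory
# a sibling link with the version suffix the loader asks for.
link_sonames() {
    pkgdir="$1"
    sover="$2"
    for lib in "$pkgdir"/lib*.so.; do
        # Skip an unmatched glob.
        if [ ! -e "$lib" ]; then continue; fi
        ln -sfn "$lib" "$lib$sover"    # libarrow.so. -> libarrow.so.10
    done
}

# Hypothetical helper for step 3: read `ldd` output on stdin and print the
# names of the libraries it reports as "not found".
missing_libs() {
    awk '/not found/ { print $1 }' | sort -u
}
```

For example: link_sonames /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow 10, followed by ldd /home/u0017649/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.so.10 | missing_libs to list what still has to come from OS packages.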