Tuesday, August 31, 2010

Installing & Running Scribe with HDFS support on CentOS

I've been working with Scribe for our logging for a while and finally feel like writing something about it. The main trigger came from a request to support HDFS. Since I always like tackling open source projects' incompatibilities across different environments, I made this challenge my highest priority (although my boss probably doesn't think so...)

Installing Scribe on CentOS 5.3 64-bit

1. Java SE Development Kit (JDK) 6, latest update - http://www.oracle.com/technetwork/java/javase/downloads/index.html. I used update 20; the Java directory is /usr/java/jdk1.6.0_20.

2. ruby-1.8.5-5.el5_4.8 + ruby-devel-1.8.5-5.el5_4.8 (using yum)

3. python-2.4.3-24.el5 + python-devel-2.4.3-24.el5.x86_64 (using yum)

4. libevent + libevent-devel - libevent-1.4.13-1/libevent-devel-1.4.13-1 (using yum)

5. gcc-c++-4.1.2-46.el5_4.2

6. boost 1.40 - http://downloads.sourceforge.net/project/boost/boost/1.40.0/boost_1_40_0.tar.gz?use_mirror=softlayer

[user@localhost] ./bootstrap.sh
[user@localhost] ./bjam
[user@localhost] sudo su -
[root@localhost] ./bjam install
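
By default, bjam install puts the headers under /usr/local/include and the libraries under /usr/local/lib. A quick sanity check (a sketch, assuming the default prefix) that the boost libraries scribe will later link against are actually there:

[root@localhost] ls /usr/local/lib/libboost_*

If scribe's link step later complains about undefined boost::system symbols, this location (and the BOOST_LDFLAGS used by the build) is the first thing to check.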


7. flex-2.5.4a-41.fc6 (using yum)

8. m4-1.4.15 - ftp.gnu.org/gnu/m4 (do not use the version from yum)

9. imake-1.0.2-3.x86_64 (using yum)

10. autoconf-2.65 - ftp.gnu.org/gnu/autoconf (do not use the version from yum)

11. automake-1.11.1 - ftp.gnu.org/gnu/automake (do not use the version from yum)

12. libtool-2.2.6b - ftp.gnu.org/gnu/libtool (do not use the version from yum)

13. bison-2.3-2.1 (using yum). The build actually needs yacc, so create a yacc wrapper script that calls bison:

[root@localhost] more /usr/bin/yacc

#!/bin/sh
exec bison -y "$@"

[root@localhost] chmod +x /usr/bin/yacc


14. thrift - the latest version is at http://incubator.apache.org/thrift/download
thrift-0.2.0 - http://archive.apache.org/dist/incubator/thrift/0.2.0-incubating
thrift-0.4.0 - http://archive.apache.org/dist/incubator/thrift/0.4.0-incubating

I am using thrift-0.2.0 in this example.

[user@localhost] ./bootstrap.sh
[user@localhost] ./configure

If you see this:
error: ./configure: line 21183: syntax error near unexpected token `MONO,'

Copy pkg.m4 from /usr/share/aclocal to thrift's aclocal directory. From the top-level thrift directory, do the following:

cp /usr/share/aclocal/pkg.m4 aclocal


Then again:
[user@localhost] ./bootstrap.sh
[user@localhost] ./configure
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] make install
[root@localhost] exit 

 
You may see the following error when building thrift 0.4.0 or 0.5.0.

make[4]: Entering directory `/home/user/pkgs/thrift-0.4.0/lib/cpp'
/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -I/usr/local/include -I./src -Wall -g -O2 -MT ThreadManager.lo -MD -MP -MF .deps/ThreadManager.Tpo -c -o ThreadManager.lo `test -f 'src/concurrency/ThreadManager.cpp' || echo './'`src/concurrency/ThreadManager.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -I/usr/local/include -I./src -Wall -g -O2 -MT ThreadManager.lo -MD -MP -MF .deps/ThreadManager.Tpo -c src/concurrency/ThreadManager.cpp -fPIC -DPIC -o .libs/ThreadManager.o
In file included from src/concurrency/ThreadManager.cpp:20:
src/concurrency/ThreadManager.h:24:26: tr1/functional: No such file or directory

Please change line 24 of ThreadManager.h from

#include <tr1/functional>

to
#include <boost/tr1/tr1/functional>

We also need to compile and install the Facebook fb303 library. From the top-level thrift directory:
[user@localhost] cd contrib/fb303
[user@localhost] ./bootstrap.sh
[user@localhost] ./configure
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] make install
[root@localhost] exit
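
A quick way to confirm that both installs landed where scribe expects them (the version flag is -version on older thrift compilers, --version on newer ones):

[user@localhost] thrift -version
[user@localhost] ls /usr/local/lib | grep -E 'thrift|fb303'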

15. hadoop 0.21.0 - http://www.apache.org/dyn/closer.cgi/hadoop/core/

[user@localhost] cd hadoop-0.21.0/hdfs/src/c++/libhdfs
[user@localhost] ./configure JVM_ARCH=tune=k8 --with-java=/usr/java/jdk1.6.0_20
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] cp .libs/libhdfs.so .libs/libhdfs.so.0 /usr/local/lib
[root@localhost] cp hdfs.h /usr/local/include
[root@localhost] exit
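
Note that libhdfs itself needs libjvm.so at runtime, so scribed will only start if the dynamic linker can find both. A minimal sketch, assuming libhdfs was copied to /usr/local/lib as above and the 64-bit JDK layout from step 1:

# adjust the JVM path below if your JDK layout differs (this assumes the 64-bit Sun JDK from step 1)
[root@localhost] echo '/usr/local/lib' >> /etc/ld.so.conf.d/libhdfs.conf
[root@localhost] echo '/usr/java/jdk1.6.0_20/jre/lib/amd64/server' >> /etc/ld.so.conf.d/libjvm.conf
[root@localhost] /sbin/ldconfig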


16. scribe-2.2 - http://github.com/downloads/facebook/scribe/scribe-2.2.tar.gz - you have to use scribe-2.1 or later to get HDFS support.

[user@localhost] ./bootstrap.sh --enable-hdfs
[user@localhost] ./configure --enable-hdfs
[user@localhost] make


17. Configure and Run Hadoop (single-node cluster in this tutorial).

a. First we need to modify a few configuration files. From the top-level hadoop directory, edit the conf/hadoop-env.sh, conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml files.

[user@localhost] more conf/hadoop-env.sh

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_20
.
.
.

[user@localhost] more conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

[user@localhost] more conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>

[user@localhost] more conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>


b. Format the namenode. From the top-level hadoop directory:
[user@localhost] bin/hadoop namenode -format


c. Start hadoop:
[user@localhost] bin/start-all.sh


d. Use jps to check if all the processes are started.
[user@localhost] jps
25362 JobTracker
24939 NameNode
25099 DataNode
25506 TaskTracker
25251 SecondaryNameNode
25553 Jps

e. Use netstat to check if port 9000 (set in core-site.xml) is listening.
[user@localhost] sudo netstat -nap | grep 9000
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 24939/java
tcp 0 0 127.0.0.1:9000 127.0.0.1:59957 ESTABLISHED 24939/java
tcp 0 0 127.0.0.1:9000 127.0.0.1:59960 ESTABLISHED 24939/java
tcp 0 0 127.0.0.1:59957 127.0.0.1:9000 ESTABLISHED 25099/java
tcp 0 0 127.0.0.1:59960 127.0.0.1:9000 ESTABLISHED 25251/java

f. Open a browser and go to http://server_ip:50070 to check that the Cluster Summary shows 1 Live Nodes. Be patient; it sometimes takes 30 seconds to a minute to show up. Remember to put the cursor in the address bar and press enter to refresh the page. For some reason, F5 (Reload) doesn't work for me.
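
If you prefer the command line over the browser, dfsadmin gives the same information and should report one live datanode. From the top-level hadoop directory:

[user@localhost] bin/hadoop dfsadmin -report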



18. Configure and Run Scribe
a. Set up the Java classpath for Scribe. I installed hadoop 0.21.0 in the ~/pkgs/hadoop-0.21.0 directory.

[user@localhost] export CLASSPATH=~/pkgs/hadoop-0.21.0/hadoop-hdfs-0.21.0.jar:~/pkgs/hadoop-0.21.0/lib/commons-logging-1.1.1.jar:~/pkgs/hadoop-0.21.0/hadoop-common-0.21.0.jar
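
The exact jar names change between Hadoop releases, so if you would rather not hard-code them, the following sketch builds the CLASSPATH from every jar under the hadoop directory (HADOOP_HOME here is just a convenience variable, assuming the same ~/pkgs/hadoop-0.21.0 install path):

# pick up every jar shipped with this hadoop release
[user@localhost] HADOOP_HOME=~/pkgs/hadoop-0.21.0
[user@localhost] for jar in $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar; do CLASSPATH=$CLASSPATH:$jar; done
[user@localhost] export CLASSPATH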


b. Create a Scribe configuration file that uses HDFS. Change to the scribe src directory:

[user@localhost] more scribe_hdfs.conf
port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT
<store>
category=default
type=buffer

target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10

<primary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1
</primary>
<secondary>
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>


c. Run Scribe
[user@localhost] scribed scribe_hdfs.conf
[Thu Sep 9 15:35:22 2010] "setrlimit error (setting max fd size)"
[Thu Sep 9 15:35:22 2010] "STATUS: STARTING"
[Thu Sep 9 15:35:22 2010] "STATUS: configuring"
[Thu Sep 9 15:35:22 2010] "got configuration data from file <scribe_hdfs.conf>"
[Thu Sep 9 15:35:22 2010] "CATEGORY : default"
[Thu Sep 9 15:35:22 2010] "Creating default store"
[Thu Sep 9 15:35:22 2010] "configured <1> stores"
[Thu Sep 9 15:35:22 2010] "STATUS: "
[Thu Sep 9 15:35:22 2010] "STATUS: ALIVE"
[Thu Sep 9 15:35:22 2010] "Starting scribe server on port 1463"
Thrift: Thu Sep 9 15:35:22 2010 libevent 1.4.13-stable method epoll


If it cannot run and complains about the following:
libboost_system.so.1.40.0: cannot open shared object

Do the following as root:
[root@localhost] echo '/usr/local/lib/' >> /etc/ld.so.conf.d/my_boost.conf
[root@localhost] /sbin/ldconfig -v


If you see the following output when running scribe:
[user@localhost] scribed scribe_hdfs.conf
[Thu Sep 9 15:39:38 2010] "setrlimit error (setting max fd size)"
[Thu Sep 9 15:39:38 2010] "STATUS: STARTING"
[Thu Sep 9 15:39:38 2010] "STATUS: configuring"
[Thu Sep 9 15:39:38 2010] "got configuration data from file <scribe_hdfs.conf>"
[Thu Sep 9 15:39:38 2010] "CATEGORY : default"
[Thu Sep 9 15:39:38 2010] "Creating default store"
[Thu Sep 9 15:39:38 2010] "configured <1> stores"
[Thu Sep 9 15:39:38 2010] "STATUS: "
[Thu Sep 9 15:39:38 2010] "STATUS: ALIVE"
[Thu Sep 9 15:39:38 2010] "Starting scribe server on port 1463"
[Thu Sep 9 15:39:38 2010] "Exception in main: TNonblockingServer::serve() bind"
[Thu Sep 9 15:39:38 2010] "scribe server exiting"

It may be because port 1463 is not available. Run "netstat -nap | grep 1463" to find out which program is using it.

19. Send something to Scribe to be logged in HDFS
From a different terminal, in the top-level Scribe directory:
[user@localhost] echo "hello world" | examples/scribe_cat test


In the terminal that runs Scribe server, you should see the following output:

[Thu Sep 9 15:46:14 2010] "[test] Creating new category from model default"
[Thu Sep 9 15:46:14 2010] "store thread starting"
[Thu Sep 9 15:46:14 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 15:46:14 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
Sep 9, 2010 3:46:14 PM org.apache.hadoop.security.Groups
INFO: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
[Thu Sep 9 15:46:15 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 15:46:15 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 15:46:15 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
[Thu Sep 9 15:46:15 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 15:46:15 2010] "[hdfs] opened for write hdfs://localhost:9000/scribetest/test/localhost.localdomain/test-2010-09-09_00000"
[Thu Sep 9 15:46:15 2010] "[test] Opened file for writing"
[Thu Sep 9 15:46:15 2010] "[test] Opened file
for writing"
[Thu Sep 9 15:46:15 2010] "[test] Changing state from to "
Opening Primary
[Thu Sep 9 15:46:15 2010] "[test] successfully read <0> entries from file
"
[Thu Sep 9 15:46:15 2010] "[test] No more buffer files to send, switching to streaming mode"
[Thu Sep 9 15:46:15 2010] "[test] Changing state from to "


20. Check if the message has been logged to HDFS:
Stop Scribe first (or make the file rotate); otherwise Hadoop won't write the data to the filesystem.
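
To stop the server you can either Ctrl-C it in its terminal or ask it to shut down through scribe_ctrl (again assuming the examples helper is available):

[user@localhost] examples/scribe_ctrl stop 1463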


[user@localhost] hadoop fs -lsr /
10/09/09 16:26:02 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
10/09/09 16:26:03 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /jobtracker
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /jobtracker/jobsInfo
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest/test
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest/test/localhost.localdomain
-rw-r--r-- 3 user supergroup 13 2010-09-09 16:25 /scribetest/test/localhost.localdomain/test-2010-09-09_00000
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp/hadoop
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp/hadoop/mapred
drwx------ - user supergroup 0 2010-09-09 16:21 /tmp/hadoop/mapred/system
-rw------- 1 user supergroup 4 2010-09-09 16:21 /tmp/hadoop/mapred/system/jobtracker.info


Copy the directory out of HDFS to take a look:
[user@localhost] hadoop fs -get /scribetest test
10/09/09 16:26:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
10/09/09 16:26:47 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

[user@localhost] more test/test/localhost.localdomain/test-2010-09-09_00000
hello world



One note about the secondary store in the Scribe configuration file: as long as replay_buffer is true (the default), Scribe opens files for both the primary and secondary stores even under normal operation. It then tries to delete the secondary store file while the primary store is handling the messages. This causes a problem because HDFS has not yet finished its access to the secondary store file, and the following exception occurs:

[Thu Sep 9 16:02:03 2010] "[hdfs] deleteFile hdfs://localhost:9000/scribetest1/test/localhost.localdomain/test_00000"
[Thu Sep 9 16:02:03 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 16:02:03 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
[Thu Sep 9 16:02:03 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 16:02:03 2010] "[test] No more buffer files to send, switching to streaming mode"
Exception in thread "main" java.io.IOException: Could not complete write to file /scribetest1/test/localhost.localdomain/test_00000 by DFSClient_1545136365
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:720)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1406)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1393)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
Call to org/apache/hadoop/fs/FSDataOutputStream::close failed!
[Thu Sep 9 16:02:03 2010] "[hdfs] closed hdfs://localhost:9000/scribetest1/test/localhost.localdomain/test_00000"


There is more information about this error under the "NameNode Logs" link on the http://server_ip:50070/dfshealth.jsp page.

To avoid this problem, either set replay_buffer to false or make the secondary store local instead of HDFS (as in the scribe_hdfs.conf example above).

The following configuration sets replay_buffer to false and uses HDFS for both the primary and secondary stores.

[user@localhost] more hdfs_both.conf
port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT
<store>
category=default
type=buffer
replay_buffer=no
target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10

<primary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1

</primary>
<secondary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest1
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1
</secondary>

</store>
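
To try this configuration, restart scribed pointing at the new file, the same way as before:

[user@localhost] scribed hdfs_both.conf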





References:
1. Thomas Dudziak's blog: How to install Scribe with HDFS support on Ubuntu Karmic
2. Agile Testing: Compiling, installing and test-running Scribe
3. Google Scribe server group: Failover when writing to HDFS problems

Comments:

Apple said...

Your blog helps us a lot! Thank you for posting it!

ChiMaster said...

Thank you! I'm glad you like it. It feels good to be able to help.

baysao said...

Thank you very much !

Trang Anh said...

Dear all,

I built scribe following this tutorial but I ran into problems:

conn_pool.o: In function `__static_initialization_and_destruction_0(int, int)':
conn_pool.cpp:(.text+0x71): undefined reference to `boost::system::get_system_category()'
conn_pool.cpp:(.text+0x7b): undefined reference to `boost::system::get_generic_category()'
conn_pool.cpp:(.text+0x85): undefined reference to `boost::system::get_generic_category()'
conn_pool.cpp:(.text+0x8f): undefined reference to `boost::system::get_generic_category()'
conn_pool.cpp:(.text+0x99): undefined reference to `boost::system::get_system_category()'
scribe_server.o: In function `__static_initialization_and_destruction_0(int, int)':
scribe_server.cpp:(.text+0xe1): undefined reference to `boost::system::get_system_category()'
scribe_server.cpp:(.text+0xeb): undefined reference to `boost::system::get_generic_category()'
scribe_server.cpp:(.text+0xf5): undefined reference to `boost::system::get_generic_category()'
scribe_server.cpp:(.text+0xff): undefined reference to `boost::system::get_generic_category()'
scribe_server.cpp:(.text+0x109): undefined reference to `boost::system::get_system_category()'
Can anybody tell me why?

Thanks in advance.

ChiMaster said...

Have you done Step 6?

Please make sure you have the libboost_* libraries in your /usr/lib directory.

Trang Anh said...

Thanks so much for replying.

I was using the defaults. I realized that the libboost* libraries are located in /usr/local/lib/, so the scribe build could not reference them. I then edited two properties in the three Makefiles:

From
BOOST_CPPFLAGS = -I/usr/include
BOOST_LDFLAGS = -L/usr/lib

To:
BOOST_CPPFLAGS = -I/usr/local/include
BOOST_LDFLAGS = -L/usr/local/lib

Finally, I was able to build scribe.

Trang anh

Trang Anh said...

Dear all,

Can anybody show me how to read the data stored on the server using Java?

thanks && regards

Trang Anh.

Trang Anh said...

Dear all,

I followed this tutorial and got this error when building thrift. I used thrift 0.5 on CentOS 5.5, 64-bit:
....................
checking for egrep... /bin/grep -E
checking for a sed that does not truncate output... /bin/sed
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
checking how to run the C preprocessor... cc -E
checking for icc... no
checking for suncc... no
checking whether cc understands -c and -o together... yes
checking for system library directory... lib
checking if compiler supports -R... no
checking if compiler supports -Wl,-rpath,... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
configure: error: Cannot find php-config. Please use --with-php-config=PATH
configure: error: ./configure failed for lib/php/src/ext/thrift_protocol

Can anybody show me how to fix it?

Thanks in advance.

ChiMaster said...

Please install php-devel.

$ sudo yum install php-devel.x86_64

Trang Anh said...

Dear all,

Environment: CentOS 5.5
I installed the scribe log server following the above tutorial successfully, but when I send a message to scribe, the server throws this exception:

Error occurred during initialization of VM
Unable to load native library: /usr/local/libjava.so: cannot open shared object file: No such file or directory

Any ideas on how to resolve the problem?

Thanks in advance

AJ said...

I have Scribe server running which can write events to a port on which Flume can listen and in turn writes to HDFS. Now, I am trying to configure Scribe to support HDFS. For this, I reconfigured Scribe with this command and options,

./configure --with-hadooppath=/opt/hadoop-2.0.2/ CPPFLAGS="-DHAVE_INTTYPES_H -DHAVE_NETINET_IN_H -Lopt/hadoop-2.0.2/lib/native/ -I$JAVA_HOME/java/jdk1.6.0_31" -with-boost-filesystem=boost_filesystem --enable-hdfs

After this, while trying to execute "make && make install", the compilation terminated with the following error:

/opt/hadoop-2.0.2//include/hdfs.h: In member function 'virtual void HdfsFile::deleteFile()':
/opt/hadoop-2.0.2//include/hdfs.h:437: error: too few arguments to function 'int hdfsDelete(hdfs_internal*, const char*, int)'
HdfsFile.cpp:134: error: at this point in file
HdfsFile.cpp: In member function 'hdfs_internal* HdfsFile::connectToPath(const char*)':
HdfsFile.cpp:203: warning: deprecated conversion from string constant to 'char*'

So, I edited HdfsFile.cpp and added an argument to the hdfsDelete function as

hdfsDelete(fileSys, filename.c_str(),1); ( Previously it was hdfsDelete(fileSys, filename.c_str());)

After this, I reconfigured scribe again and executed make && make install, and then it gave the following error:

HdfsFile.o: In function `HdfsFile::deleteFile()':
HdfsFile.cpp:(.text+0xe6): undefined reference to `hdfsDelete'
HdfsFile.o: In function `HdfsFile::fileSize()':
HdfsFile.cpp:(.text+0x152): undefined reference to `hdfsGetPathInfo'
HdfsFile.cpp:(.text+0x168): undefined reference to `hdfsFreeFileInfo'
HdfsFile.o: In function `HdfsFile::write(std::basic_string, std::allocator > const&)':
HdfsFile.cpp:(.text+0x1d8): undefined reference to `hdfsWrite'
HdfsFile.o: In function `HdfsFile::close()':
HdfsFile.cpp:(.text+0x21e): undefined reference to `hdfsCloseFile'
HdfsFile.o: In function `HdfsFile::openWrite()':
HdfsFile.cpp:(.text+0x2f5): undefined reference to `hdfsExists'
HdfsFile.cpp:(.text+0x318): undefined reference to `hdfsOpenFile'
HdfsFile.o: In function `HdfsFile::openRead()':
HdfsFile.cpp:(.text+0x3fe): undefined reference to `hdfsOpenFile'
HdfsFile.o: In function `HdfsFile::~HdfsFile()':
HdfsFile.cpp:(.text+0x489): undefined reference to `hdfsDisconnect'
HdfsFile.o: In function `HdfsFile::connectToPath(char const*)':
HdfsFile.cpp:(.text+0x59d): undefined reference to `hdfsConnectNewInstance'
HdfsFile.o: In function `HdfsFile::~HdfsFile()':
HdfsFile.cpp:(.text+0x8d9): undefined reference to `hdfsDisconnect'
HdfsFile.o: In function `HdfsFile::listImpl(std::basic_string, std::allocator > const&, std::vector, std::allocator >, std::allocator, std::allocator > > >&)':
HdfsFile.cpp:(.text+0x944): undefined reference to `hdfsExists'
HdfsFile.cpp:(.text+0x976): undefined reference to `hdfsListDirectory'
HdfsFile.cpp:(.text+0xad9): undefined reference to `hdfsFreeFileInfo'
HdfsFile.o: In function `HdfsFile::flush()':
HdfsFile.cpp:(.text+0x18e): undefined reference to `hdfsFlush'
collect2: ld returned 1 exit status

I am not sure what exactly I am missing.

Following are the installed software versions and environment variables:

Centos x86_64 6.3,
hadoop-2.0.2,
scribe-2.1,
thrift-0.9.0,
boost-1.41

libhdfs is located at "/opt/hadoop-2.0.2/lib/native/"

HADOOP_HOME is set to /opt/hadoop

and following are the contents of .bashrc

export LD_LIBRARY_PATH=/disk1/root/build/thrift-0.9.0:/disk1/root/build/thrift-0.9.0/contrib/fb303:/usr/local/lib/
export PATH=$LD_LIBRARY_PATH:$SCRIBE_HOME:$PATH

If the problem is with the version of Hadoop or libhdfs, please tell me which version is needed for this system configuration, and also how we can configure libhdfs separately before trying to configure Scribe with HDFS.

Please help in this regard.
