Installing Scribe on CentOS 5.3 64-bit
1. Java SE Development Kit (JDK) 6 latest update - http://www.oracle.com/technetwork/java/javase/downloads/index.html. I used update 20. The java directory is: /usr/java/jdk1.6.0_20.
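A quick sanity check that the JDK is in place (path as installed above):
[user@localhost] /usr/java/jdk1.6.0_20/bin/java -version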
2. ruby-1.8.5-5.el5_4.8 + ruby-devel-1.8.5-5.el5_4.8 (using yum)
3. python-2.4.3-24.el5 + python-devel-2.4.3-24.el5.x86_64 (using yum)
4. libevent + libevent-devel - libevent-1.4.13-1/libevent-devel-1.4.13-1 (using yum)
5. gcc-c++-4.1.2-46.el5_4.2
6. boost 1.40 - http://downloads.sourceforge.net/project/boost/boost/1.40.0/boost_1_40_0.tar.gz?use_mirror=softlayer
[user@localhost] ./bootstrap.sh
[user@localhost] ./bjam
[user@localhost] sudo su -
[root@localhost] ./bjam install
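A quick check that the Boost libraries landed under the default prefix (this assumes bjam installed to /usr/local; adjust if you passed a different --prefix):
[root@localhost] ls /usr/local/lib/libboost_*.so.1.40.0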
7. flex-2.5.4a-41.fc6 (using yum)
8. m4-1.4.15 - ftp.gnu.org/gnu/m4 (do not use the version from yum)
9. imake-1.0.2-3.x86_64 (using yum)
10. autoconf-2.65 - ftp.gnu.org/gnu/autoconf (do not use the version from yum)
11. automake-1.11.1 - ftp.gnu.org/gnu/automake (do not use the version from yum)
12. libtool-2.2.6b - ftp.gnu.org/gnu/libtool (do not use the version from yum)
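For the GNU tools built from source above (m4, autoconf, automake, libtool), the standard GNU build sequence applies; a sketch using m4 as the example (adjust the tarball name for each package):
[user@localhost] tar xzf m4-1.4.15.tar.gz
[user@localhost] cd m4-1.4.15
[user@localhost] ./configure
[user@localhost] make
[user@localhost] sudo make install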
13. bison-2.3-2.1 (using yum). The build actually needs yacc, so create a yacc wrapper script that calls bison:
[root@localhost] more /usr/bin/yacc
#!/bin/sh
exec bison -y "$@"
[root@localhost] chmod +x /usr/bin/yacc
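If the wrapper does not exist yet, one way to create it (same content as shown above) is:
[root@localhost] cat > /usr/bin/yacc <<'EOF'
#!/bin/sh
exec bison -y "$@"
EOF
[root@localhost] chmod +x /usr/bin/yacc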
14. Thrift. Latest version: http://incubator.apache.org/thrift/download
thrift-0.2.0 - http://archive.apache.org/dist/incubator/thrift/0.2.0-incubating
thrift-0.4.0 - http://archive.apache.org/dist/incubator/thrift/0.4.0-incubating
I am using thrift-0.2.0 in this example.
[user@localhost] ./bootstrap.sh
[user@localhost] ./configure
If you see this:
error: ./configure: line 21183: syntax error near unexpected token `MONO,'
Copy pkg.m4 from /usr/share/aclocal to thrift's aclocal directory. From the top-level thrift directory, do the following:
cp /usr/share/aclocal/pkg.m4 aclocal
Then again:
[user@localhost] ./bootstrap.sh
[user@localhost] ./configure
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] make install
[root@localhost] exit
You may see the following error when building thrift 0.4.0 or 0.5.0.
make[4]: Entering directory `/home/user/pkgs/thrift-0.4.0/lib/cpp'
/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -I/usr/local/include -I./src -Wall -g -O2 -MT ThreadManager.lo -MD -MP -MF .deps/ThreadManager.Tpo -c -o ThreadManager.lo `test -f 'src/concurrency/ThreadManager.cpp' || echo './'`src/concurrency/ThreadManager.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -I/usr/local/include -I./src -Wall -g -O2 -MT ThreadManager.lo -MD -MP -MF .deps/ThreadManager.Tpo -c src/concurrency/ThreadManager.cpp -fPIC -DPIC -o .libs/ThreadManager.o
In file included from src/concurrency/ThreadManager.cpp:20:
src/concurrency/ThreadManager.h:24:26: tr1/functional: No such file or directory
Please change line 24 of ThreadManager.h from
#include <tr1/functional>
to
#include <boost/tr1/tr1/functional>
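If you prefer to apply the change from the command line, a sed one-liner works (path assumed relative to the top-level thrift directory):
[user@localhost] sed -i 's|<tr1/functional>|<boost/tr1/tr1/functional>|' lib/cpp/src/concurrency/ThreadManager.h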
We also need to compile and install the Facebook fb303 library. From the top-level thrift directory:
[user@localhost] cd contrib/fb303
[user@localhost] ./bootstrap.sh
[user@localhost] ./configure
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] make install
[root@localhost] exit
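A quick check that thrift and fb303 were installed (this assumes the default /usr/local prefix):
[user@localhost] which thrift
[user@localhost] ls /usr/local/lib | grep -i -E 'thrift|fb303'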
15. hadoop 0.21.0 - http://www.apache.org/dyn/closer.cgi/hadoop/core/
[user@localhost] cd hadoop-0.21.0/hdfs/src/c++/libhdfs
[user@localhost] ./configure JVM_ARCH=tune=k8 --with-java=/usr/java/jdk1.6.0_20
[user@localhost] make
[user@localhost] sudo su -
[root@localhost] cp .libs/libhdfs.so .libs/libhdfs.so.0 /usr/local/lib
[root@localhost] cp hdfs.h /usr/local/include
[root@localhost] exit
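If the scribe build or scribed later fails to find libhdfs, refreshing the dynamic linker cache may help (this assumes the libraries were copied into /usr/local/lib as above):
[root@localhost] /sbin/ldconfig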
16. scribe-2.2 - http://github.com/downloads/facebook/scribe/scribe-2.2.tar.gz - you must use scribe-2.1 or later for HDFS support.
[user@localhost] ./bootstrap.sh --enable-hdfs
[user@localhost] ./configure
[user@localhost] make
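A quick check that the build produced the daemon (the location is an assumption based on the scribe source layout, which typically builds the binary under src/):
[user@localhost] ls -l src/scribed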
17. Configure and Run Hadoop (single-node cluster in this tutorial).
a. First we need to modify a few configuration files. From the top-level hadoop directory, edit the conf/hadoop-env.sh, conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml files.
[user@localhost] more conf/hadoop-env.sh
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_20
.
.
.
[user@localhost] more conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
[user@localhost] more conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
[user@localhost] more conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
b. Format the namenode. From the top-level hadoop directory:
[user@localhost] bin/hadoop namenode -format
c. Start hadoop
[user@localhost] bin/start-all.sh
d. Use jps to check if all the processes are started.
[user@localhost] jps
25362 JobTracker
24939 NameNode
25099 DataNode
25506 TaskTracker
25251 SecondaryNameNode
25553 Jps
e. Use netstat to check if port 9000 (set in core-site.xml) is listening.
[user@localhost] sudo netstat -nap | grep 9000
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 24939/java
tcp 0 0 127.0.0.1:9000 127.0.0.1:59957 ESTABLISHED 24939/java
tcp 0 0 127.0.0.1:9000 127.0.0.1:59960 ESTABLISHED 24939/java
tcp 0 0 127.0.0.1:59957 127.0.0.1:9000 ESTABLISHED 25099/java
tcp 0 0 127.0.0.1:59960 127.0.0.1:9000 ESTABLISHED 25251/java
f. Open a browser and go to http://server_ip:50070 to check that the Cluster Summary shows 1 live node. Be patient; it can take 30 seconds to a minute to appear. Remember to put the cursor in the address bar and press Enter to refresh the page. For some reason, F5 (Reload) doesn't work for me.
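If you prefer the command line over the web UI, listing the root of HDFS is another quick sanity check (the same hadoop fs commands are used in step 20):
[user@localhost] bin/hadoop fs -ls /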
18. Configure and Run Scribe
a. Set up the Java class path for Scribe. I installed Hadoop 0.21.0 in the ~/pkgs/hadoop-0.21.0 directory.
[user@localhost] export CLASSPATH=~/pkgs/hadoop-0.21.0/hadoop-hdfs-0.21.0.jar:~/pkgs/hadoop-0.21.0/lib/commons-logging-1.1.1.jar:~/pkgs/hadoop-0.21.0/hadoop-common-0.21.0.jar
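Depending on how libhdfs was built, scribed may also need to locate libjvm.so and libhdfs.so at runtime. If you later see a "cannot open shared object file" error for either library, one way to resolve it is to extend LD_LIBRARY_PATH (the JVM path below assumes the 64-bit JDK installed in step 1):
[user@localhost] export LD_LIBRARY_PATH=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/local/lib:$LD_LIBRARY_PATH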
b. Create a Scribe configuration file that uses HDFS. Change to the scribe src directory:
[user@localhost] more scribe_hdfs.conf
port=1463
max_msg_per_second=2000000
check_interval=3
# DEFAULT
<store>
category=default
type=buffer
target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10
<primary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1
</primary>
<secondary>
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>
c. Run Scribe
[user@localhost] scribed scribe_hdfs.conf
[Thu Sep 9 15:35:22 2010] "setrlimit error (setting max fd size)"
[Thu Sep 9 15:35:22 2010] "STATUS: STARTING"
[Thu Sep 9 15:35:22 2010] "STATUS: configuring"
[Thu Sep 9 15:35:22 2010] "got configuration data from file <scribe_hdfs.conf>"
[Thu Sep 9 15:35:22 2010] "CATEGORY : default"
[Thu Sep 9 15:35:22 2010] "Creating default store"
[Thu Sep 9 15:35:22 2010] "configured <1> stores"
[Thu Sep 9 15:35:22 2010] "STATUS: "
[Thu Sep 9 15:35:22 2010] "STATUS: ALIVE"
[Thu Sep 9 15:35:22 2010] "Starting scribe server on port 1463"
Thrift: Thu Sep 9 15:35:22 2010 libevent 1.4.13-stable method epoll
If it fails to start with the following error:
libboost_system.so.1.40.0: cannot open shared object
Do the following as root:
[root@localhost] echo '/usr/local/lib/' >> /etc/ld.so.conf.d/my_boost.conf
[root@localhost] /sbin/ldconfig -v
If you see the following output when running scribe:
[user@localhost] scribed scribe_hdfs.conf
[Thu Sep 9 15:39:38 2010] "setrlimit error (setting max fd size)"
[Thu Sep 9 15:39:38 2010] "STATUS: STARTING"
[Thu Sep 9 15:39:38 2010] "STATUS: configuring"
[Thu Sep 9 15:39:38 2010] "got configuration data from file <scribe_hdfs.conf>"
[Thu Sep 9 15:39:38 2010] "CATEGORY : default"
[Thu Sep 9 15:39:38 2010] "Creating default store"
[Thu Sep 9 15:39:38 2010] "configured <1> stores"
[Thu Sep 9 15:39:38 2010] "STATUS: "
[Thu Sep 9 15:39:38 2010] "STATUS: ALIVE"
[Thu Sep 9 15:39:38 2010] "Starting scribe server on port 1463"
[Thu Sep 9 15:39:38 2010] "Exception in main: TNonblockingServer::serve() bind"
[Thu Sep 9 15:39:38 2010] "scribe server exiting"
This usually means port 1463 is not available. Run "netstat -nap | grep 1463" to find out which program is using it.
19. Send something to Scribe to be logged in HDFS
From a different terminal, in the top-level Scribe directory:
[user@localhost] echo "hello world" | examples/scribe_cat test
In the terminal running the Scribe server, you should see the following output:
[Thu Sep 9 15:46:14 2010] "[test] Creating new category from model default"
[Thu Sep 9 15:46:14 2010] "store thread starting"
[Thu Sep 9 15:46:14 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 15:46:14 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
Sep 9, 2010 3:46:14 PM org.apache.hadoop.security.Groups
INFO: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
[Thu Sep 9 15:46:15 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 15:46:15 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 15:46:15 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
[Thu Sep 9 15:46:15 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 15:46:15 2010] "[hdfs] opened for write hdfs://localhost:9000/scribetest/test/localhost.localdomain/test-2010-09-09_00000"
[Thu Sep 9 15:46:15 2010] "[test] Opened file for writing"
[Thu Sep 9 15:46:15 2010] "[test] Opened file for writing"
[Thu Sep 9 15:46:15 2010] "[test] Changing state from to "
Opening Primary
[Thu Sep 9 15:46:15 2010] "[test] successfully read <0> entries from file "
[Thu Sep 9 15:46:15 2010] "[test] No more buffer files to send, switching to streaming mode"
[Thu Sep 9 15:46:15 2010] "[test] Changing state from to "
20. Check if the message has been logged to HDFS:
First stop Scribe or wait for the file to rotate; otherwise Hadoop won't flush the data to the filesystem.
[user@localhost] hadoop fs -lsr /
10/09/09 16:26:02 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
10/09/09 16:26:03 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /jobtracker
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /jobtracker/jobsInfo
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest/test
drwxr-xr-x - user supergroup 0 2010-09-09 16:23 /scribetest/test/localhost.localdomain
-rw-r--r-- 3 user supergroup 13 2010-09-09 16:25 /scribetest/test/localhost.localdomain/test-2010-09-09_00000
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp/hadoop
drwxr-xr-x - user supergroup 0 2010-09-09 16:21 /tmp/hadoop/mapred
drwx------ - user supergroup 0 2010-09-09 16:21 /tmp/hadoop/mapred/system
-rw------- 1 user supergroup 4 2010-09-09 16:21 /tmp/hadoop/mapred/system/jobtracker.info
Copy the directory out of HDFS to take a look:
[user@localhost] hadoop fs -get /scribetest test
10/09/09 16:26:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
10/09/09 16:26:47 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
[user@localhost] more test/test/localhost.localdomain/test-2010-09-09_00000
hello world
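Alternatively, you can read the file directly from HDFS without copying it out (path taken from the listing above):
[user@localhost] hadoop fs -cat /scribetest/test/localhost.localdomain/test-2010-09-09_00000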
A note about the secondary store in the Scribe configuration file: as long as replay_buffer is true (the default), Scribe opens the files for both the primary and secondary stores even under normal operation. It then tries to delete the secondary store file while the primary store is handling the messages. This causes a problem because HDFS has not yet completed its access to the secondary store file, and the following exception occurs:
[Thu Sep 9 16:02:03 2010] "[hdfs] deleteFile hdfs://localhost:9000/scribetest1/test/localhost.localdomain/test_00000"
[Thu Sep 9 16:02:03 2010] "[hdfs] Connecting to HDFS"
[Thu Sep 9 16:02:03 2010] "[hdfs] Before hdfsConnectNewInstance(localhost, 9000)"
[Thu Sep 9 16:02:03 2010] "[hdfs] After hdfsConnectNewInstance"
[Thu Sep 9 16:02:03 2010] "[test] No more buffer files to send, switching to streaming mode"
Exception in thread "main" java.io.IOException: Could not complete write to file /scribetest1/test/localhost.localdomain/test_00000 by DFSClient_1545136365
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:720)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1406)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1393)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
Call to org/apache/hadoop/fs/FSDataOutputStream::close failed!
[Thu Sep 9 16:02:03 2010] "[hdfs] closed hdfs://localhost:9000/scribetest1/test/localhost.localdomain/test_00000"
There is more information about this error under the "NameNode Logs" link on the http://server_ip:50070/dfshealth.jsp page.
To avoid this problem, either set replay_buffer to false or make the secondary store local instead of HDFS (as in the scribe_hdfs.conf example above).
The following configuration sets replay_buffer to false and uses HDFS for both the primary and secondary stores.
[user@localhost] more hdfs_both.conf
port=1463
max_msg_per_second=2000000
check_interval=3
# DEFAULT
<store>
category=default
type=buffer
replay_buffer=no
target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10
<primary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1
</primary>
<secondary>
type=file
fs_type=hdfs
file_path=hdfs://localhost:9000/scribetest1
create_symlink=no
use_hostname_sub_directory=yes
base_filename=thisisoverwritten
max_size=40000000
rotate_period=daily
rotate_hour=0
rotate_minute=5
add_newlines=1
</secondary>
</store>
References:
1. Thomas Dudziak's blog: How to install Scribe with HDFS support on Ubuntu Karmic
2. Agile Testing: Compiling, installing and test-running Scribe
3. Google Scribe server group: Failover when writing to HDFS problems