The following information may be of interest to developers working on Eucalyptus components written in C.
CC and NC can be queried using the CCclient_full and NCclient programs, respectively. These programs, located in the source directories of the respective components, allow command-line invocation of the component's API functions. Thus, CCclient_full impersonates a CLC and NCclient impersonates a CC. Before invoking the programs, the dynamic library search path must be set to include several Axis2 libraries:
- libneethi.so.0
- libmod_rampart.so.0
- libaxutil.so.0
- libaxis2_parser.so.0
- libguththila.so.0
- libaxis2_http_sender.so.0
- libaxis2_http_receiver.so.0
- libaxis2_http_common.so.0
- libaxis2_engine.so.0
- libaxis2_axiom.so.0
- librampart.so.0
The path to the root of the Eucalyptus installation (the system root / for a package-based installation) must also be set, so that the cryptographic credentials can be found:
export EUCALYPTUS=/opt/eucalyptus
export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-1.6.0/
export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart/
(NOTE: for most packaged installs, AXIS2C_HOME will be /usr/lib64/axis2c)
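Before invoking the client programs it can save time to confirm that every library in the list above is actually reachable. The following is a minimal sketch, not part of Eucalyptus: the function name missing_axis2_libs is hypothetical, and it simply probes each directory of a colon-separated search path (by default $LD_LIBRARY_PATH) for each file name.

```shell
# Print the Axis2/C libraries from the list above that are not present in
# any directory of a colon-separated search path. The path is taken from
# the first argument, falling back to $LD_LIBRARY_PATH.
# (Hypothetical helper; the body runs in a subshell so IFS is not leaked.)
missing_axis2_libs() (
    search_path=${1:-${LD_LIBRARY_PATH:-}}
    for lib in libneethi.so.0 libmod_rampart.so.0 libaxutil.so.0 \
               libaxis2_parser.so.0 libguththila.so.0 \
               libaxis2_http_sender.so.0 libaxis2_http_receiver.so.0 \
               libaxis2_http_common.so.0 libaxis2_engine.so.0 \
               libaxis2_axiom.so.0 librampart.so.0; do
        found=no
        IFS=:   # split the search path on colons
        for dir in $search_path; do
            if [ -e "$dir/$lib" ]; then
                found=yes
            fi
        done
        [ "$found" = yes ] || echo "$lib"
    done
)
```

Running missing_axis2_libs with no arguments after the export lines above should print nothing if the search path is complete; any file name it prints points at a directory that still needs to be added.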
Here is an example invocation of CCclient_full on a CC host with a compiled Eucalyptus source tree in $EUCALYPTUS_SRC:
$EUCALYPTUS_SRC/cluster/CCclient_full localhost:8774 describeNetworks
describenetworks returned status 1
useVlans: 1 mode: MANAGED addrspernet: 32 addrIndexMin: 9 addrIndexMax: 30 vlanMin: 2 vlanMax: 127
found 0 active nets
Here is an example invocation of NCclient on the same CC host. Note the slight change in syntax relative to CCclient_full: the endpoint is specified with the -n option, which defaults to localhost:8775 if not specified:
grep NODES $EUCALYPTUS/etc/eucalyptus/eucalyptus.conf
NODES="192.168.51.165"
$EUCALYPTUS_SRC/node/NCclient -n 192.168.51.165:8775 describeResource
2012-10-10 14:20:36 DEBUG 000010036 ncStubCreate | DEBUG: requested URI http://192.168.51.165:8775/axis2/services/EucalyptusNC
node status=[OK] memory=7792/7792 disk=2/2 cores=4/4 subnets=[none]
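When a CC manages several nodes, the same query can be repeated for each of them. The sketch below assumes that NODES in eucalyptus.conf holds a space-separated list of addresses inside double quotes, as in the single-node example above; the function names list_nodes and describe_all_nodes are hypothetical.

```shell
# Print the addresses from the NODES="..." line of a eucalyptus.conf-style
# file, one per line. Commented-out lines are ignored because the pattern
# is anchored at the start of the line.
list_nodes() {
    sed -n 's/^NODES="\(.*\)"[[:space:]]*$/\1/p' "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Run describeResource against every configured node in turn.
# Assumes EUCALYPTUS, EUCALYPTUS_SRC and LD_LIBRARY_PATH are set as above.
describe_all_nodes() {
    for node in $(list_nodes "$EUCALYPTUS/etc/eucalyptus/eucalyptus.conf"); do
        echo "== $node"
        "$EUCALYPTUS_SRC/node/NCclient" -n "$node:8775" describeResource
    done
}
```

Only list_nodes is safe to run anywhere; describe_all_nodes needs a compiled source tree and reachable NCs.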
CC and NC can be debugged with gdb, which can be:
- used to analyze a core dump,
- attached to a live Apache process hosting CC or NC,
- used to start CC or NC under a debugger from the very beginning.
Each approach will be discussed in turn.
The commands below assume that $EUCALYPTUS is set to the root of the Eucalyptus installation: typically just / for package-based installs and often /opt/eucalyptus for from-source installations.
Core dumps are useful when a segmentation fault is difficult to trigger manually, especially on CC, which does a lot of forking. You know your CC or NC is segfaulting when httpd-[cc|nc]_error_log contains lines similar to:
[Wed Aug 29 14:41:07 2012] [notice] child pid 22520 exit signal Segmentation fault (11)
[Wed Aug 29 14:41:13 2012] [notice] child pid 22555 exit signal Segmentation fault (11)
[Wed Aug 29 14:41:19 2012] [notice] child pid 22579 exit signal Segmentation fault (11)
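To see how often a component is crashing, the PIDs can be pulled out of such log lines with a one-line filter. This is a sketch that assumes the log format shown above; the function name segfault_pids is hypothetical, and the log file location must be supplied by the caller.

```shell
# Print the PID of every Apache child that exited with a segmentation
# fault, as recorded in an httpd error log in the format shown above.
segfault_pids() {
    sed -n 's/.*child pid \([0-9][0-9]*\) exit signal Segmentation fault.*/\1/p' "$1"
}
```

Piping the output through wc -l gives a quick count of crashes recorded in the log.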
To ensure that CC produces a core dump, add the following line at the end of the create_httpd_config() function in $EUCALYPTUS/etc/init.d/eucalyptus-cc:
echo "CoreDumpDirectory /tmp" >>$EUCALYPTUS/etc/eucalyptus/httpd-cc.conf
For NC, do the same with 'nc' instead of 'cc' in the paths above. For the changes to take effect, stop the component, raise the core file size limit (in case it is too low), and start the component again:
$EUCALYPTUS/etc/init.d/eucalyptus-cc stop
ulimit -c unlimited
$EUCALYPTUS/etc/init.d/eucalyptus-cc start
After that the error in the log should change to:
[Wed Aug 29 15:39:53 2012] [notice] child pid 6926 exit signal Segmentation fault (11), possible coredump in /tmp
The /tmp directory should then contain the core dump, which can be brought up in gdb:
gdb /usr/sbin/httpd /tmp/core.9895
....
Core was generated by `/usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f73bf7a357c in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
(gdb)
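When a component crashes repeatedly, /tmp can accumulate several core files. The sketch below picks the most recent one; it assumes the core.PID naming seen in the example above (the actual name depends on the kernel's core pattern), and the function name newest_core is hypothetical.

```shell
# Print the most recently modified core.* file in the given directory,
# or nothing if there is none. ls -t sorts newest first.
newest_core() {
    ls -t "$1"/core.* 2>/dev/null | head -n 1
}
```

The result can be handed to gdb in batch mode to print a backtrace without an interactive session, for example: gdb -batch -ex bt /usr/sbin/httpd "$(newest_core /tmp)".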
If there is no coredump (after all, the message only said it was "possible"), you may want to try the method described in section 'Run Eucalyptus component under gdb' below.
Attaching to a running instance of the component is often sufficient to examine its memory state or to catch a reproducible SEGFAULT with the debugger attached.
The main difficulty lies in deciding which process to attach to and in ensuring that the debugger follows the forks you want. The component log files cc.log and nc.log may reveal the PID of the thread of control that you are looking for.
NC is easier to debug because in steady state it consists of only two heavyweight processes: the core of the Apache daemon (running as root) and the Apache daemon with the Eucalyptus shared library loaded (running as eucalyptus):
# ps aux | grep eucalyptus/httpd
root 22526 0.0 0.0 55168 1452 ? Ss 16:00 0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf
500 22528 0.2 1.4 2105452 114548 ? Sl 16:00 0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf
Attaching gdb to the latter allows one to pause its execution, set a breakpoint or inspect the state of its threads, and then either detach or let it run under the debugger. It is important not to pause the component for too long, since network request timeouts on the upstream component may eventually put the system into an unusual state:
# gdb --pid=22528
....
(gdb) info thread
3 Thread 0x7f601f224700 (LWP 22537) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
2 Thread 0x7f6018116700 (LWP 22541) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f6093de77e0 (LWP 22528) 0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) detach
Detaching from program: /usr/sbin/httpd, process 22528
(gdb) quit
NC uses multiple threads; examining the backtrace of each one interactively reveals its role:
(gdb) info thread
3 Thread 0x7f601f224700 (LWP 22537) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
2 Thread 0x7f6018116700 (LWP 22541) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f6093de77e0 (LWP 22528) 0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6018116700 (LWP 22541))]#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
#1 0x00007f60923e6fd0 in sleep () from /lib64/libc.so.6
#2 0x00007f608e0494c6 in monitoring_thread (arg=0x7f608e304520) at handlers.c:620
#3 0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f6092421ccd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f601f224700 (LWP 22537))]#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
#1 0x00007f609241b124 in usleep () from /lib64/libc.so.6
#2 0x00007f608e07fe40 in sensor_bottom_half () at sensor.c:54
#3 0x00007f608e07fecb in sensor_thread (arg=0x0) at sensor.c:76
#4 0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6092421ccd in clone () from /lib64/libc.so.6
(gdb)
One can discern from the above that thread 2 is the monitoring_thread and thread 3 is the sensor_thread. If there were instances in the process of being started up, rebooted, or bundled, you would also see startup_thread, rebooting_thread, or bundling_thread in the list.
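The same identification can be done non-interactively by filtering a saved backtrace dump. The sketch below assumes backtrace lines in the format shown above and keeps only frames that carry source information and whose function name ends in _thread, which excludes library frames such as start_thread; the function name thread_entry_points is hypothetical.

```shell
# Read gdb backtrace text on stdin and print the names of Eucalyptus
# thread entry functions (frames of the form "in NAME_thread (...) at FILE").
thread_entry_points() {
    sed -n 's/^#[0-9][0-9]* .* in \([A-Za-z0-9_]*_thread\) (.*) at .*/\1/p'
}
```

A possible use is gdb -p 22528 -batch -ex 'thread apply all bt' 2>/dev/null | thread_entry_points, which attaches, dumps all threads, and detaches in one step, keeping the pause short.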
Several environment variables must be set when starting a Eucalyptus component under gdb from the beginning:
export EUCALYPTUS=/opt/eucalyptus
export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-src-1.6.0/
export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart
export PATH=$PATH:$EUCALYPTUS/usr/lib/eucalyptus
The first two are critical for any invocation; the last two may be needed, depending on the execution path of the component. Any running instance of the component must be shut down before invoking the component under the debugger. Depending on the distribution, the Apache binary may be called httpd or apache2:
# gdb /usr/sbin/httpd
...
Reading symbols from /usr/sbin/httpd...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
(gdb) break monitoring_thread
Function "monitoring_thread" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (monitoring_thread) pending.
(gdb) run -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
Starting program: /usr/sbin/httpd -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
[Thread debugging using libthread_db enabled]
[New Thread 0x7fff833d4700 (LWP 382)]
Detaching after fork from child process 383.
Detaching after fork from child process 385.
[New Thread 0x7fff7c2c6700 (LWP 386)]
[Switching to Thread 0x7fff7c2c6700 (LWP 386)]
Breakpoint 1, monitoring_thread (arg=0x7ffff24b4520) at handlers.c:498
498 logprintfl (EUCADEBUG, "spawning monitoring thread\n");
(gdb) cont
Continuing.
Note how setting a breakpoint before the Eucalyptus component shared library is loaded produces a 'not defined' message, after which gdb offers to make the breakpoint pending. Take care to type the breakpoint name accurately, since a misspelled pending breakpoint will never trigger. For NC, the default policy of the debugger staying with the parent process is sufficient. For CC, which uses forks extensively, you may be able to reach the desired process by entering set follow-fork-mode child at the gdb prompt.
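The interactive steps above can also be captured in a gdb command file, so that a session is reproducible. This is a sketch under stated assumptions: the function name write_gdb_commands is hypothetical, the breakpoint name is whatever function you pass in, and follow-fork-mode is switched to child only for CC, in line with the advice above.

```shell
# Write a gdb command file that starts a component under the debugger.
#   $1: component, cc or nc
#   $2: function to break on (pending until the shared library is loaded)
#   $3: output file
# Assumes $EUCALYPTUS is set as described above.
write_gdb_commands() {
    {
        # Accept breakpoints on not-yet-loaded symbols without prompting.
        echo "set breakpoint pending on"
        if [ "$1" = cc ]; then
            echo "set follow-fork-mode child"
        fi
        echo "break $2"
        echo "run -X -f $EUCALYPTUS/etc/eucalyptus/httpd-$1.conf >/dev/null"
    } > "$3"
}
```

For example, write_gdb_commands nc monitoring_thread /tmp/nc.gdb followed by gdb -x /tmp/nc.gdb /usr/sbin/httpd reproduces the session shown above.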
Stack traces are useful indicators of what a process is doing at a point in time. For instance, analysis of locks being held by threads may help identify the cause of a deadlocked process. Although gdb can be attached to a Eucalyptus process to obtain stack traces, it can be tedious when many processes are involved, as in the case of the CC. Using pstack or gstack (for a threaded process, like NC) is a faster alternative, especially in combination with a bash for-loop. (The two commands are available as part of the gdb package.)
- For CC, the following command will print the top 10 stack frames of each process that makes up the CC:
for pid in $(pgrep -f httpd-cc.conf); do echo; echo $pid; pstack $pid | head -10; done | less
- For NC, the following command will do the same for the stack state of both processes and threads that make up the NC:
for pid in $(pgrep -f httpd-nc.conf); do echo; echo $pid; gstack $pid | head -10; done | less
Sniffing control network traffic between Eucalyptus components can help diagnose many problems, especially those related to syntax, signing, or timing of communication messages. Since the messages are in a semi-human-readable format (XML) and are not encrypted or compressed (only signed), not much processing is required to make some sense of them.
Two important parameters for sniffing are the Ethernet device:
- usually lo for communication between co-located components (CLC and CC)
- usually eth0 for communication between distributed components (CC and NC)
and the TCP port:
- 8774 for CLC-CC communication
- 8775 for CC-NC communication
Even the most commonly available Unix tool, tcpdump, produces readable output with just a few flags:
tcpdump -i eth0 -Als0 port 8775
The output of such a command can either be piped into less for searchable, paged output or saved in raw format with the -w filename.dump option for later analysis, either with tcpdump or with other tools that can read the tcpdump format, such as Wireshark and tcpflow.
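The device and port choices and the two capture styles described above can be combined in a small helper that prints the matching tcpdump command. This is only a sketch: the function name sniff_command and the link labels clc-cc and cc-nc are hypothetical, and the devices are the usual defaults named above rather than guaranteed values for a given host.

```shell
# Print a tcpdump command line for a given component link.
#   $1: clc-cc or cc-nc
#   $2: optional file name; if given, capture in raw format for later analysis
sniff_command() {
    case "$1" in
        clc-cc) sniff_dev=lo;   sniff_port=8774 ;;
        cc-nc)  sniff_dev=eth0; sniff_port=8775 ;;
        *) echo "unknown link: $1" >&2; return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        echo "tcpdump -i $sniff_dev -s0 -w $2 port $sniff_port"
    else
        echo "tcpdump -i $sniff_dev -Als0 port $sniff_port"
    fi
}
```

A saved capture can later be replayed as text with tcpdump -A -r filename.dump.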
Also commonly available on Unix systems is ngrep, which is designed for searching for strings in network traffic. For instance, the following expression looks for packets containing the string describe (such as the DescribeResource, DescribeInstances, and DescribeSensors queries that periodically traverse the system):
ngrep -d eth0 -qi describe port 8775
For extracting the content of specific TCP flows (i.e., data flowing in one direction on a connection), a tool called tcpflow can be useful. It is not commonly available in package repositories, but it is easy enough to install from source:
pushd /tmp
wget https://github.com/downloads/simsong/tcpflow/tcpflow-1.3.0.tar.gz
tar zxvf tcpflow-1.3.0.tar.gz
cd tcpflow-1.3.0
yum -y install gcc-c++ libpcap-devel
./configure
make
sudo make install
popd
After running the tool for a bit to capture packets, one can examine individual flows with less or with tools capable of pretty-printing the XML that makes up SOAP messages in Eucalyptus:
mkdir tcpflows
cd tcpflows
tcpflow -i eth0 port 8775
^C
less *
Here we select the flows containing the string DescribeInstance, concatenate their SOAP lines, wrap them in a top-level element <trace> (which is as good as any for the purpose), and pass the result to xmlstarlet:
yum install xmlstarlet
echo '<trace>'`grep -li describeinstance * | xargs grep --no-filename soapenv`'</trace>' | xmlstarlet fo | less
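The one-liner above can be generalized to any message type. The following sketch does the same selection and wrapping with an explicit loop, which also behaves sensibly when no flow matches; the function name wrap_flows is hypothetical, and it assumes, as above, that the SOAP payload lines contain the string soapenv.

```shell
# Wrap the SOAP lines of all matching flow files in a <trace> element.
#   $1: case-insensitive pattern selecting the flows of interest
#   $2: directory containing tcpflow output files
wrap_flows() {
    printf '<trace>'
    for flow in "$2"/*; do
        [ -f "$flow" ] || continue
        if grep -qi "$1" "$flow"; then
            grep soapenv "$flow"
        fi
    done
    printf '</trace>\n'
}
```

For example, wrap_flows describeresource . | xmlstarlet fo | less pretty-prints the DescribeResource exchanges captured in the current directory.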