Debugging-Eucalyptus-C-language-components.md

The following information may be of interest to developers working on Eucalyptus components written in C.

Using client binaries

CC and NC can be queried using the CCclient_full and NCclient programs, respectively. These programs, located in the source directories of the respective components, allow command-line invocation of each component's API functions. Thus, CCclient_full impersonates the CLC and NCclient impersonates a CC. Before invoking the programs, the dynamic library search path must be set to include several Axis2 libraries:

  • libneethi.so.0
  • libmod_rampart.so.0
  • libaxutil.so.0
  • libaxis2_parser.so.0
  • libguththila.so.0
  • libaxis2_http_sender.so.0
  • libaxis2_http_receiver.so.0
  • libaxis2_http_common.so.0
  • libaxis2_engine.so.0
  • libaxis2_axiom.so.0
  • librampart.so.0

The EUCALYPTUS environment variable must also be set to the root of the Eucalyptus installation (the system root / for a package-based installation) so that the cryptographic credentials can be found.

export EUCALYPTUS=/opt/eucalyptus
export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-1.6.0/
export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart/

(NOTE: for most packaged installs, AXIS2C_HOME will be /usr/lib64/axis2c)
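Before running the client programs, it can help to confirm that every library in the list above is actually reachable through LD_LIBRARY_PATH. The following is a minimal sketch; the function name check_axis2_libs is hypothetical and not part of Eucalyptus, and it only checks that a file with the expected name exists, not that it loads correctly.

```shell
# Report which of the Axis2/C libraries listed above cannot be found in
# any directory on LD_LIBRARY_PATH. Prints one "missing: NAME" line per
# absent library and returns non-zero if any are missing.
# (check_axis2_libs is a hypothetical helper, not a Eucalyptus tool.)
check_axis2_libs() {
    libs="libneethi.so.0 libmod_rampart.so.0 libaxutil.so.0
          libaxis2_parser.so.0 libguththila.so.0
          libaxis2_http_sender.so.0 libaxis2_http_receiver.so.0
          libaxis2_http_common.so.0 libaxis2_engine.so.0
          libaxis2_axiom.so.0 librampart.so.0"
    missing=0
    for lib in $libs; do
        found=no
        old_ifs=$IFS
        IFS=:                      # split the search path on colons
        for dir in ${LD_LIBRARY_PATH:-}; do
            if [ -e "$dir/$lib" ]; then found=yes; fi
        done
        IFS=$old_ifs
        if [ "$found" = no ]; then
            echo "missing: $lib"
            missing=1
        fi
    done
    return $missing
}
```

Running check_axis2_libs after the export lines above should print nothing if the search path is complete.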

Here is an example invocation of CCclient_full on a CC host with a compiled Eucalyptus source tree in $EUCALYPTUS_SRC:

$EUCALYPTUS_SRC/cluster/CCclient_full localhost:8774 describeNetworks
describenetworks returned status 1
useVlans: 1 mode: MANAGED addrspernet: 32 addrIndexMin: 9 addrIndexMax: 30 vlanMin: 2 vlanMax: 127
found 0 active nets

Here is an example invocation of NCclient on the same CC host (note the slight change in syntax relative to CCclient_full: the endpoint is specified with the -n option, which defaults to localhost:8775 if not specified):

grep NODES $EUCALYPTUS/etc/eucalyptus/eucalyptus.conf
NODES="192.168.51.165"
$EUCALYPTUS_SRC/node/NCclient -n 192.168.51.165:8775 describeResource
2012-10-10 14:20:36 DEBUG 000010036 ncStubCreate             | DEBUG: requested URI http://192.168.51.165:8775/axis2/services/EucalyptusNC
node status=[OK] memory=7792/7792 disk=2/2 cores=4/4 subnets=[none]
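When a cluster has several nodes, the same query can be repeated for each address in NODES. The sketch below assumes that NODES holds a space-separated list inside double quotes, as in the single-node example above; conf_nodes and describe_all_nodes are hypothetical helper names.

```shell
# Print the node addresses from the NODES="..." line of a
# eucalyptus.conf-style file, one per line.
# Assumption: addresses are separated by spaces inside the quotes.
conf_nodes() {
    sed -n 's/^NODES="\(.*\)"[[:space:]]*$/\1/p' "$1" | tr -s ' ' '\n'
}

# Run describeResource against every configured node, using the same
# NCclient syntax and port as in the example above.
describe_all_nodes() {
    for node in $(conf_nodes "$EUCALYPTUS/etc/eucalyptus/eucalyptus.conf"); do
        echo "== $node"
        "$EUCALYPTUS_SRC/node/NCclient" -n "$node:8775" describeResource
    done
}
```

With EUCALYPTUS and EUCALYPTUS_SRC set as described earlier, calling describe_all_nodes prints one status block per node.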

Using gdb

CC and NC can be debugged with gdb, which can be:

  • used to analyze a core dump,
  • attached to a live Apache process hosting CC or NC,
  • used to start CC or NC under a debugger from the very beginning.

Each approach will be discussed in turn.

The commands below assume that $EUCALYPTUS is set to the root of the Eucalyptus installation: typically just / for package-based installs and often /opt/eucalyptus for from-source installations.

Core dump

Core dumps are useful when a SEGFAULT is difficult to trigger manually, especially on CC, which does a lot of forking. You know your CC or NC is segfaulting when httpd-[cc|nc]_error_log contains lines similar to:

[Wed Aug 29 14:41:07 2012] [notice] child pid 22520 exit signal Segmentation fault (11)
[Wed Aug 29 14:41:13 2012] [notice] child pid 22555 exit signal Segmentation fault (11)
[Wed Aug 29 14:41:19 2012] [notice] child pid 22579 exit signal Segmentation fault (11)
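To see how often this is happening, the PIDs can be pulled out of the error log. This is a small sketch that relies only on the line format shown above; segfault_pids is a hypothetical helper name, and the log file path is whatever your installation uses.

```shell
# Print the PID of every child that Apache reported as killed by a
# segmentation fault, one per line, given the path to an error log
# in the format shown above.
segfault_pids() {
    sed -n 's/.*child pid \([0-9][0-9]*\) exit signal Segmentation fault.*/\1/p' "$1"
}
```

Piping the output through wc -l gives a count of segfaults recorded in that log.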

To ensure that CC produces a core dump, you'll need to add the following line

echo "CoreDumpDirectory /tmp" >>$EUCALYPTUS/etc/eucalyptus/httpd-cc.conf

at the end of the create_httpd_config() function in $EUCALYPTUS/etc/init.d/eucalyptus-cc. For NC, do the same with 'nc' instead of 'cc' in the paths above. For the changes to take effect, stop the component, raise the core file size limit (in case it is too low), and start the component again.

$EUCALYPTUS/etc/init.d/eucalyptus-cc stop
ulimit -c unlimited
$EUCALYPTUS/etc/init.d/eucalyptus-cc start

After that the error in the log should change to:

[Wed Aug 29 15:39:53 2012] [notice] child pid 6926 exit signal Segmentation fault (11), possible coredump in /tmp

And the /tmp directory should contain the core dump that can be brought up in gdb:

gdb /usr/sbin/httpd /tmp/core.9895
....
Core was generated by `/usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f73bf7a357c in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
(gdb)
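If several core files have accumulated, gdb's batch mode can print a backtrace from each without an interactive session. The sketch below assumes the core files are named core.PID as in the example above; bt_all_cores is a hypothetical helper, and the GDB variable is an illustrative override (not a gdb feature) for pointing at a different debugger binary.

```shell
# Print a backtrace for every core.* file in a directory.
# Arguments: core directory (default /tmp), Apache binary
# (default /usr/sbin/httpd). Set GDB to use another debugger command.
bt_all_cores() {
    core_dir=${1:-/tmp}
    httpd_bin=${2:-/usr/sbin/httpd}
    for core in "$core_dir"/core.*; do
        [ -e "$core" ] || continue      # the glob matched nothing
        echo "== $core"
        # -batch exits after running the commands given with -ex
        "${GDB:-gdb}" -batch -ex bt "$httpd_bin" "$core"
    done
}
```

For example, bt_all_cores /tmp /usr/sbin/httpd | less pages through all backtraces at once.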

If there is no coredump (after all, the message only said it was "possible"), you may want to try the method described in section 'Run Eucalyptus component under gdb' below.

Attach gdb to a Eucalyptus component

Attaching to a running instance of the component is often sufficient to examine its memory state or to catch a reproducible SEGFAULT with the debugger attached.

The main difficulty lies in deciding which process to attach to and in ensuring that the debugger follows the forks you want. The component log files cc.log and nc.log may reveal the PID of the thread of control that you are looking for.

NC is easier to debug because in steady state it consists of only two heavyweight processes: the core Apache daemon (running as root) and the Apache daemon with the Eucalyptus shared library loaded (running as eucalyptus):

# ps aux | grep eucalyptus/httpd
root     22526  0.0  0.0  55168  1452 ?        Ss   16:00   0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf
500      22528  0.2  1.4 2105452 114548 ?      Sl   16:00   0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf

Attaching gdb to the latter allows one to pause its execution, set a breakpoint or inspect the state of its threads, and then either detach or let it run under the debugger. It is important not to pause the component for too long, since network request timeouts on the upstream component may eventually put the system into an unusual state.

# gdb --pid=22528
....
(gdb) info thread
  3 Thread 0x7f601f224700 (LWP 22537)  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
  2 Thread 0x7f6018116700 (LWP 22541)  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f6093de77e0 (LWP 22528)  0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) detach
Detaching from program: /usr/sbin/httpd, process 22528
(gdb) quit

NC uses multiple threads. Switching to each thread in turn and printing its backtrace identifies what the thread does:

(gdb) info thread
  3 Thread 0x7f601f224700 (LWP 22537)  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
  2 Thread 0x7f6018116700 (LWP 22541)  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f6093de77e0 (LWP 22528)  0x00007f6092423fff in accept4 () from /lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6018116700 (LWP 22541))]#0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
#1  0x00007f60923e6fd0 in sleep () from /lib64/libc.so.6
#2  0x00007f608e0494c6 in monitoring_thread (arg=0x7f608e304520) at handlers.c:620
#3  0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6092421ccd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f601f224700 (LWP 22537))]#0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
#1  0x00007f609241b124 in usleep () from /lib64/libc.so.6
#2  0x00007f608e07fe40 in sensor_bottom_half () at sensor.c:54
#3  0x00007f608e07fecb in sensor_thread (arg=0x0) at sensor.c:76
#4  0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6092421ccd in clone () from /lib64/libc.so.6
(gdb)

One can discern from the above that thread 2 is the monitoring_thread and thread 3 is the sensor_thread. If there were instances in the process of being started up or rebooted or bundled, you would also see startup_thread or rebooting_thread or bundling_thread in the list.
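The same identification can be done non-interactively by filtering a full thread dump. The sketch below assumes the dump comes from gdb's thread apply all bt command (which prints a "Thread N (...)" header before each backtrace) and that the thread entry points take an arg parameter and have names ending in _thread, as in the backtraces above; thread_roles is a hypothetical helper name.

```shell
# Read a "thread apply all bt" dump on stdin and print, for each
# thread, the *_thread(arg=...) entry point found in its backtrace.
thread_roles() {
    awk '
        /^Thread [0-9]+ / { tid = $2 }          # remember current thread number
        match($0, /in [a-z_]+_thread \(arg=/) {
            # strip the leading "in " (3 chars) and trailing " (arg=" (6 chars)
            name = substr($0, RSTART + 3, RLENGTH - 9)
            print "thread " tid ": " name
        }
    '
}
```

A possible invocation, using the PID from the example above, is gdb -batch -p 22528 -ex 'thread apply all bt' | thread_roles. The main thread, which has no such entry point, is not listed.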

Run Eucalyptus component under gdb

Several environment variables must be set when starting a Eucalyptus component under gdb from the beginning:

export EUCALYPTUS=/opt/eucalyptus
export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-src-1.6.0/
export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart
export PATH=$PATH:$EUCALYPTUS/usr/lib/eucalyptus

The first two are critical for any invocation; the last two may be needed, depending on the execution path of the component. Any running instance of the component must be shut down before invoking the component under the debugger. Depending on the distribution, the Apache binary may be called httpd or apache2:

# gdb /usr/sbin/httpd
...
Reading symbols from /usr/sbin/httpd...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
(gdb) break monitoring_thread
Function "monitoring_thread" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (monitoring_thread) pending.
(gdb) run -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
Starting program: /usr/sbin/httpd -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
[Thread debugging using libthread_db enabled]
[New Thread 0x7fff833d4700 (LWP 382)]
Detaching after fork from child process 383.
Detaching after fork from child process 385.
[New Thread 0x7fff7c2c6700 (LWP 386)]
[Switching to Thread 0x7fff7c2c6700 (LWP 386)]

Breakpoint 1, monitoring_thread (arg=0x7ffff24b4520) at handlers.c:498
498         logprintfl (EUCADEBUG, "spawning monitoring thread\n");
(gdb) cont
Continuing.

Note how setting a breakpoint before the Eucalyptus component shared library is loaded produces a 'not defined' message, after which gdb offers to make the breakpoint pending. Because gdb cannot validate a pending breakpoint, take care to type the breakpoint location accurately. For NC, the debugger's default policy of staying with the parent process is sufficient. For CC, which uses forks extensively, you may be able to reach the desired process by issuing set follow-fork-mode child at the gdb prompt.
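To avoid retyping these settings on every run, they can be placed in a gdb command file and loaded with gdb -x. The following sketch generates such a file; write_gdb_script is a hypothetical helper, and the breakpoint location is whatever function you are interested in. It uses gdb's set breakpoint pending on setting so that the pending-breakpoint question is answered automatically.

```shell
# Write a gdb command file that follows forked children and stops at a
# pending breakpoint. Arguments: component (cc or nc), breakpoint
# location, output file. Assumes $EUCALYPTUS is set as described above.
write_gdb_script() {
    cat > "$3" <<EOF
set breakpoint pending on
set follow-fork-mode child
break $2
run -X -f ${EUCALYPTUS:-}/etc/eucalyptus/httpd-$1.conf >/dev/null
EOF
}
```

For example, write_gdb_script nc monitoring_thread /tmp/nc.gdb followed by gdb -x /tmp/nc.gdb /usr/sbin/httpd reproduces the session above, with the difference that gdb follows children on fork, which may or may not be what you want for NC.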

Obtaining stack traces with pstack or gstack

Stack traces are useful indicators of what a process is doing at a point in time. For instance, analysis of locks being held by threads may help identify the cause of a deadlocked process. Although gdb can be attached to a Eucalyptus process to obtain stack traces, it can be tedious when many processes are involved, as in the case of the CC. Using pstack or gstack (for a threaded process, like NC) is a faster alternative, especially in combination with a bash for-loop. (The two commands are available as part of the gdb package.)

  • For CC, the following command will print the top 10 stack frames of each process that makes up the CC:
for pid in `ps aux | grep euca | grep cc | cut -c 10-15 | xargs` ; do echo; echo $pid; pstack $pid | head -10 ; done | less
  • For NC, the following command will do the same for stack state of both processes and threads that make up the NC:
for pid in `ps aux | grep euca | grep nc | cut -c 10-15 | xargs` ; do echo; echo $pid; gstack $pid | head -10 ; done | less
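The cut -c 10-15 column range in the commands above depends on the width of the ps aux columns. A more robust way to collect the PIDs is sketched below; it assumes that every process of interest was started with the httpd-cc.conf or httpd-nc.conf argument, as in the ps output shown earlier, and that the Apache binary is named httpd or apache2. component_pids is a hypothetical helper name.

```shell
# Read "PID COMMAND ARGS..." lines on stdin (for example from
# `ps -eo pid=,args=`) and print the PIDs of Apache processes that were
# started with the configuration file of the given component (cc or nc).
component_pids() {
    awk -v conf="httpd-$1.conf" \
        '$2 ~ /(httpd|apache2)$/ && index($0, conf) { print $1 }'
}
```

The CC loop then becomes: for pid in $(ps -eo pid=,args= | component_pids cc); do echo; echo $pid; pstack $pid | head -10; done | less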

Sniffing CC's or NC's network traffic

Sniffing control network traffic between Eucalyptus components can help diagnose many problems, especially those related to syntax, signing, or timing of communication messages. Since the messages are in a semi-human-readable format (XML) and are not encrypted or compressed (only signed), not much processing is required to make some sense of them.

Two important parameters for sniffing are the network device:

  • usually lo for communication between co-located components (CLC and CC)
  • usually eth0 for communication between distributed components (CC and NC)

and TCP port:

  • 8774 for CLC-CC communication
  • 8775 for CC-NC communication

Even the most commonly available Unix tool, tcpdump, results in readable output with just a few flags:

tcpdump -i eth0 -Als0 port 8775

The output of such a command can either be piped into less for searchable, paged output or saved in raw format with the -w filename.dump option for future analysis, either with tcpdump or with other tools that can read the tcpdump format, such as Wireshark and tcpflow.

Also commonly available on Unix systems is ngrep, which is designed for searching for strings in network traffic. For instance, the following expression looks for packets containing a describe message (such as the DescribeResource, DescribeInstances, and DescribeSensors queries that periodically traverse the system):
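The device and port choices above can be captured in a small helper that prints the matching tcpdump command line. This is only a sketch: sniff_cmd is a hypothetical name, the devices are the usual defaults listed above and may differ on your hosts, and the function prints the command rather than running it.

```shell
# Print a tcpdump command line for the given component pair.
# Arguments: clc-cc or cc-nc, and optionally a capture file; with a
# capture file the raw packets are written with -w instead of being
# printed as ASCII.
sniff_cmd() {
    case $1 in
        clc-cc) dev=lo;   port=8774 ;;   # co-located components
        cc-nc)  dev=eth0; port=8775 ;;   # distributed components
        *) echo "usage: sniff_cmd clc-cc|cc-nc [capture-file]" >&2
           return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        echo "tcpdump -i $dev -s0 -w $2 port $port"
    else
        echo "tcpdump -i $dev -Als0 port $port"
    fi
}
```

A saved capture can later be replayed as text with tcpdump -A -r filename.dump.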

ngrep -d eth0 -qi describe port 8775 

For extracting content of specific TCP flows (i.e., data flowing in one direction on a connection), a tool called tcpflow can be useful. It is not commonly available in package repositories, but it is easy enough to install it from source:

pushd /tmp
wget https://github.com/downloads/simsong/tcpflow/tcpflow-1.3.0.tar.gz
tar zxvf tcpflow-1.3.0.tar.gz
cd tcpflow-1.3.0
yum -y install gcc-c++ libpcap-devel
./configure
make
sudo make install
popd

After running the tool for a bit to capture packets, one can examine individual flows with less or with tools capable of pretty-printing the XML which makes up SOAP messages in Eucalyptus:

mkdir tcpflows
cd tcpflows
tcpflow -i eth0 port 8775
^C
less *

Here we select the flow files containing the string DescribeInstance, extract their SOAP envelope lines, and pass the concatenated result to xmlstarlet wrapped in a top-level element <trace> (a name that is as good as any for the purpose).

yum install xmlstarlet
echo '<trace>'`grep -li describeinstance * | xargs grep --no-filename soapenv`'</trace>' | xmlstarlet fo | less
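For a quick overview of which describe messages were captured, without pretty-printing, the flow files can be summarized with standard tools. The sketch below counts every word beginning with Describe; it assumes the operation names appear in the XML with that capitalization, and it counts opening and closing tags separately, so treat the numbers as relative rather than exact. count_describes is a hypothetical helper name.

```shell
# Count occurrences of Describe* names in the given files and print
# them, most frequent first.
count_describes() {
    grep -ohE 'Describe[A-Za-z]+' "$@" | sort | uniq -c | sort -rn
}
```

Inside the tcpflows directory, count_describes * prints one line per message name with its count.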

[[category.debugging]]