Debugging with RPM packages

With one of our internal web applications based on Ruby on Rails, we’ve discovered a file descriptor leak in one of the delayed job worker processes. The worker leaked descriptors whenever it invoked a message being send to the message bus using qpid-messaging.

Since we’re using gems compiled as C++ and C extensions, in order to find the root cause, I used the packages provided through the package manager and gdb.

Big thanks to Dan Callaghan who walked me through most of the process and then found the leak in the C++ sources.

TL;DR;

  • identify the leaking descriptors and reproduce it with lsof
  • attach strace to the process and identify file descriptors which are not being closed
  • install debuginfo packages for all dependencies
  • use gdb to figure out what is going on

Reproducer

I’ve used lsof and a friend wrote a small script to quickly monitor the worker process. Looking at the opened files of the process revealed a long list which looked like half closed sockets. It turned out later, that it wasn’t the same problem since the sockets were created, but never bound/connected.

I was unable to reproduce the problem on my local development environment, but found away to do it on our staging environment which resembles production much closer. So whenever I invoked an action in the UI which resulted in a message being sent, I was able to see another file descriptor leak with lsof.

Strace the process

With the reproducer at hand, I started to strace the process:

# Note we're not filtering system calls with -e here.
# Weirdly CLOSE was not reported when just filtering network calls
strace -s 1000 -p  -o strace_output_log.strace

Dan helped me looking through the produced log output, which revealed that the system under investigation created a socket and called getpeername right after it, without binding it resulting in a leaked file descriptor.

10971 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 35
10971 getpeername(35, 0x7fffae712a90, [112]) = -1 ENOTCONN (Transport endpoint is not connected)

Install debuginfo packages and use gdb

In order to debug the system, we need debuginfo packages installed, otherwise you wont be able to step through the sources using gdb. When you attach gdb to the process it will tell you what packages it is missing, for example:

Missing separate debuginfos, use: debuginfo-install qpid-proton-c-0.10-3.fc25.x86_64

You then go install those (be mindful that you the repositories configured e.g. section name fedora-debuginfo):

debuginfo-install qpid-proton-c-0.10-3.fc25.x86_64

and basically start debugging.

Our first suspicion was the qpid messaging library and we check if it’s invocation of getpeername was leaking the file descriptors. I’ve added a break point at the point of the source code we thought was suspicious and in a separate terminal used lsof to see which file descriptor number is leaked. For example:

# I've used a watch, which executes the lsof every 2 seconds by
# default. The grep filters some of the files I'm not interested in
$ watch "lsof -p  | grep -v REG"

The lsof output will show you the leaked file descriptor number in column 4 by default. With that you can check in gdb if the file descriptor being handled in the source code is the one which leaked.

Since that achieved no results, we used gdb to break on invocations of the getpeername identifier and used backtrace to pin point in the sources where the leak occurred.

Advertisements

Profiling Haskell: Don’t chase the red herring

I’m currently working on a small Haskell tool which helps me minimize the waiting time for catching a train into the city (or out). One feature I’ve implemented recently is an automated import of aprox. 25MB compressed CSV data into an SQLite3 database, which was very slow in the beginning. Not focusing on the first results of the profiling information helped to optimize the implementation for a swift import.

Background

The data comes as a 25MB zip archive of text files in a CSV format. All imported, the SQLite database grows to about 800 MiB. My work-in-progress solution was a cruddy shell + SQL script which imports the CSV files into an SQLite database. With this solution, the import takes about 30 seconds, excluding the time you need to manually download the zip file. But this is not very portable, as I wanted to have a more user friendly solution.

The initial Haskell implementation using mostly the esqueleto and persistent DSL functions showed an abysmal performance. I had to stop the process after half an hour.

Finding the culprit

A first profiling pass showed this result summary:

COST CENTRE          MODULE                         %time %alloc                                                                                                                               
                                                                                                                                                                                               
stepError            Database.Sqlite                 77.2    0.0                                                                                                                               
concat.ts'           Data.Text                        1.8   14.5                                                                                                                               
compareText.go       Data.Text                        1.4    0.0                                                                                                                               
concat.go.step       Data.Text                        1.0    8.2                                                                                                                               
concat               Data.Text                        0.9    1.4                                                                                                                               
concat.len           Data.Text                        0.8   13.9                                                                                                                               
sumP.go              Data.Text                        0.8    2.1                                                                                                                               
concat.go            Data.Text                        0.7    2.6                                                                                                                               
singleton_           Data.Text.Show                   0.6    4.0                                                                                                                               
run                  Data.Text.Array                  0.5    3.1                                                                                                                               
escape               Database.Persist.Sqlite          0.5    7.8                                                                                                                               
>>=.\                Data.Attoparsec.Internal.Types   0.5    1.4                                                                                                                               
singleton_.x         Data.Text.Show                   0.4    2.9                                                                                                                               
parseField           CSV.StopTime                     0.4    1.6                                                                                                                               
toNamedRecord        Data.Csv.Types                   0.3    1.2                                                                                                                               
fmap.\.ks'           Data.Csv.Conversion              0.3    2.9                                                                                                                               
insertSql'.ins       Database.Persist.Sqlite          0.2    1.4                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3

Naturally I checked the implementation of the first function, since that seemed to have the largest impact. It is a simple foreign function call to C. Fraser Tweedale made me aware, that there is not more speed to gain here, since it’s already calling a C function. With that in mind I had to focus on the next entries. It turned out that’s where I gained most of the speed to something more competitive against the crude SQL script and having it more user friendly.

It turned out that Data.Persistent uses primarily Data.Text concatenation to create the SQL statements. That being done for every insert statement is very costly, since it prepares, binds values and executes the statement for each insert (for reference see this Stack Overflow answer).

The solution

My current solution is to prepare the statement once and only bind the values for each insert.

Having done another benchmark, the import time now comes down to approximately a minute on my Thinkpad X1 Carbon.

Changing a website using the developer console

If you need to quickly change a website, you can use a combination of CSS/XPath selectors and a function to hide/remove DOM nodes. I had to find my way through a long list of similar items which was really hard to go through by simply looking at it.

For example, you can simply delete all links you’re not interested in by a simple combination of selector and function:

$x('//li/a[contains(., "not-interesting")]').map(function(n) { n.parentNode.removeChild(n) })

If you’ve made a mistake, reload the website.

git: Moving partial changes between commits

Now and then I face the fact that I’ve added changes to a commit I’d like to have moved into a different commit. Here is what you do:

What’s there

We have two commits. For illustration purposes I’ve trimmed the log output down:

$ git log --stat
commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5

    WIP: adding tests

 parts/tests/functional/conftest.py       |  4 ++--
 parts/tests/functional/test_frobfrob.py  | 43 ++++++++++
 frobfrob.py                              | 14 +++++++++++++-

commit c7ef6c3014ca9d049dea46fbed44010acf53ae79

    prepare frob frob schemas

 parts/tests/functional/conftest.py           | 31 +++++++++++++
 frobfrob/models.py                           | 32 +++++++++++++

commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a
[...]

Now I want to move the changes made to conftest.py from commit c7ef6c3014ca9d049dea46fbed44010acf53ae79 into commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5 (or HEAD).

Pluck out the commit

In order to pluck out the changes to conftest.py, we’ll reset the file against the previous commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a (you could also use HEAD~3).

$ git reset 5b30d351f51fda40d37d2f7dc25d2367bd37845a parts/tests/functional/conftest.py
Unstaged changes after reset:
M       parts/tests/functional/conftest.py

$ git status -s
MM parts/tests/functional/conftest.py

As you can see, we will have staged changes and unstaged changes. The staged changes remove the additions to the conftest.py file and the unstaged changes add our code to conftest.py

Remove and Add

We now create two commits:

  1. Use the staged changes for a new commit which we’ll squash with c7ef6c3014ca9d049dea46fbed44010acf53ae79.
  2. Stage the unstaged changes and create another commit which we’ll squash with 19c698a9ee91a5f03f1c3240fc957e6b328931f5 or HEAD.
# 1. commit Message is something like: squash: removes changes to conftest.py
$ git commit

# 2. commit
# stage changes
$ git add -p

# commit, message will be something like: squash: adds changes to conftest.py
$ git commit

# we end up with two additional commits
$ git log --oneline
492ff22 Adds changes to conftest
8485946 removes conftest files
19c698a WIP: adding tests
c7ef6c3 prepare frob frob schemas
Interactive rebase put’s it all together

Now use an interactive rebase to squash the changes with the right commits:

$ git rebase -i HEAD~5

(Locally) Testing ansible deployments

I’ve always felt my playbooks undertested. I know about a possible solution of spinning up new OpenStack instances with the ansible nova module, but felt it to be too complex as a good idea to implement. Now I’ve found a quicker way to test your playbooks by using Docker.

In principal, all my test does is:

  1. create a docker container
  2. create a copy of the current ansible playbook in a temporary directory and mount it as a volume
  3. inside the docker container, run the playbook

This is obviously not perfect, since:

  • running a playbook locally vs connecting via ssh can be a different beast to test
  • can become resource intensive if you want to test different scenarios represented as docker images.

There is possibly more, but for myself in small it is a workable solution so far.

Find the code on github if you’d like to have a look. Improvements welcome!