Troubleshooting

2 Comments Posted on February 7, 2019February 7, 2019 Troubleshooting

GHC: can’t find a package database

In case you’re using the nix package manager your nix build fails with:

these derivations will be built: /nix/store/7xk0m6r07x85rwlh01b3wvq8bbzwbw1n-purebred-0.1.0.0.drv /nix/store/dmj2ax3qsa55jjl6by9fb9sk929k98nl-ghc-8.6.3-with-packages.drv /nix/store/j9fl8cmq9c6kjnz9dj79rmbs1kzafyys-purebred-with-packages-8.6.3.drv building '/nix/store/7xk0m6r07x85rwlh01b3wvq8bbzwbw1n-purebred-0.1.0.0.drv'... setupCompilerEnvironmentPhase Build with /nix/store/cclv7n6jr311i5ywwkms1m3iz4lsg37j-ghc-8.6.3. unpacking sources unpacking source archive /nix/store/j23vlzlg2rmqy0a706h235j4v9zh4m9s-purebred source root is purebred patching sources compileBuildDriverPhase setupCompileFlags: -package-db=/build/setup-package.conf.d -j4 -threaded Loaded package environment from /build/purebred/.ghc.environment.x86_64-linux-8.6.3 ghc: can't find a package database at /home/rjoost/.cabal/store/ghc-8.6.3/package.db builder for '/nix/store/7xk0m6r07x85rwlh01b3wvq8bbzwbw1n-purebred-0.1.0.0.drv' failed with exit code 1 cannot build derivation '/nix/store/dmj2ax3qsa55jjl6by9fb9sk929k98nl-ghc-8.6.3-with-packages.drv': 1 dependencies couldn't be built cannot build derivation '/nix/store/j9fl8cmq9c6kjnz9dj79rmbs1kzafyys-purebred-with-packages-8.6.3.drv': 1 dependencies couldn't be built error: build of '/nix/store/j9fl8cmq9c6kjnz9dj79rmbs1kzafyys-purebred-with-packages-8.6.3.drv' failed

then the solution to it is actually easier then you think. It happens when you run

cabal new-repl

inside a nix shell, because cabal creates a hidden environment file. So look for a

.ghc.environment.-- # for example on Linux with GHC 8.6.3 .ghc.environment.x86_64-linux-8.6.3

Delete it and you should be good to go.

Leave a comment Posted on December 14, 2018 Software Development, Troubleshooting

Docker volume mount fails

I recently stumbled over this odd error message in one of our gitlab runners:
ERROR: for nginx_proxy Cannot start service load_balancer: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/builds/group/project/nginx/nginx.conf\\\" to rootfs \\\"/data/docker/overlay2/88fb8a0ee201dd14cfc9aa9befe4d7a5eb28e5ec816a2d76726040316853ed11/merged\\\" at \\\"/data/docker/overlay2/88fb8a0ee201dd14cfc9aa9befe4d7a5eb28e5ec816a2d76726040316853ed11/merged/etc/nginx/nginx.conf\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type

It is the result of using the docker-compose up myservice command which is defined to use just an image and mounts files like so:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf

I’ve spent a bit of time figuring out what the underlying problem is. In hindsight, the error message already gives it away, but I was unable to reproduce the issue on my host machine. That is because the problem is actually more related to docker than your host.

When I found out, that the runner in gitlab is actually a docker container, it dawned upon me, that the operation we do here is a container-in-a-container operation. The container typically shares the same docker instance with the host system. The bind mount actually happens on the host machine. It tries to mount the path from a directory/file which doesn’t exist on the host machine.

To verify if we can reproduce the same error on the host, I tried to bind mount a volume with a path which doesn’t exist and voila:
$ sudo docker run --rm -it --volume /build/nginx/nginx.conf:/etc/nginx/nginx.conf nginx --help docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/build/nginx/nginx.conf\\\" to rootfs \\\"/var/lib/docker/devicemapper/mnt/2fab14f3dc592d19b1408618a5ba26e88e334d88fe6b7524dc6c30bb0d26bbfc/rootfs\\\" at \\\"/var/lib/docker/devicemapper/mnt/2fab14f3dc592d19b1408618a5ba26e88e334d88fe6b7524dc6c30bb0d26bbfc/rootfs/etc/nginx/nginx.conf\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

Be weary of running docker in docker when you need to bind mount volumes. Prefer a bare metal or VM as a runner.

Leave a comment Posted on June 30, 2017June 30, 2017 Troubleshooting

Debugging with RPM packages

With one of our internal web applications based on Ruby on Rails, we’ve discovered a file descriptor leak in one of the delayed job worker processes. The worker leaked descriptors whenever it invoked a message being send to the message bus using qpid-messaging.

Since we’re using gems compiled as C++ and C extensions, in order to find the root cause, I used the packages provided through the package manager and gdb.

Big thanks to Dan Callaghan who walked me through most of the process and then found the leak in the C++ sources.

TL;DR;

identify the leaking descriptors and reproduce it with lsof
attach strace to the process and identify file descriptors which are not being closed
install debuginfo packages for all dependencies
use gdb to figure out what is going on

Reproducer

I’ve used lsof and a friend wrote a small script to quickly monitor the worker process. Looking at the opened files of the process revealed a long list which looked like half closed sockets. It turned out later, that it wasn’t the same problem since the sockets were created, but never bound/connected.

I was unable to reproduce the problem on my local development environment, but found away to do it on our staging environment which resembles production much closer. So whenever I invoked an action in the UI which resulted in a message being sent, I was able to see another file descriptor leak with lsof.

Strace the process

With the reproducer at hand, I started to strace the process:

# Note we're not filtering system calls with -e here.
# Weirdly CLOSE was not reported when just filtering network calls
strace -s 1000 -p  -o strace_output_log.strace

Dan helped me looking through the produced log output, which revealed that the system under investigation created a socket and called getpeername right after it, without binding it resulting in a leaked file descriptor.

10971 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 35
10971 getpeername(35, 0x7fffae712a90, [112]) = -1 ENOTCONN (Transport endpoint is not connected)

Install debuginfo packages and use gdb

In order to debug the system, we need debuginfo packages installed, otherwise you wont be able to step through the sources using gdb. When you attach gdb to the process it will tell you what packages it is missing, for example:

Missing separate debuginfos, use: debuginfo-install qpid-proton-c-0.10-3.fc25.x86_64

You then go install those (be mindful that you the repositories configured e.g. section name fedora-debuginfo):

debuginfo-install qpid-proton-c-0.10-3.fc25.x86_64

and basically start debugging.

Our first suspicion was the qpid messaging library and we check if it’s invocation of getpeername was leaking the file descriptors. I’ve added a break point at the point of the source code we thought was suspicious and in a separate terminal used lsof to see which file descriptor number is leaked. For example:

# I've used a watch, which executes the lsof every 2 seconds by
# default. The grep filters some of the files I'm not interested in
$ watch "lsof -p  | grep -v REG"

The lsof output will show you the leaked file descriptor number in column 4 by default. With that you can check in gdb if the file descriptor being handled in the source code is the one which leaked.

Since that achieved no results, we used gdb to break on invocations of the getpeername identifier and used backtrace to pin point in the sources where the leak occurred.

Leave a comment Posted on April 14, 2017April 14, 2017 Software Development, Troubleshooting

Profiling Haskell: Don’t chase the red herring

I’m currently working on a small Haskell tool which helps me minimize the waiting time for catching a train into the city (or out). One feature I’ve implemented recently is an automated import of aprox. 25MB compressed CSV data into an SQLite3 database, which was very slow in the beginning. Not focusing on the first results of the profiling information helped to optimize the implementation for a swift import.

Background

The data comes as a 25MB zip archive of text files in a CSV format. All imported, the SQLite database grows to about 800 MiB. My work-in-progress solution was a cruddy shell + SQL script which imports the CSV files into an SQLite database. With this solution, the import takes about 30 seconds, excluding the time you need to manually download the zip file. But this is not very portable, as I wanted to have a more user friendly solution.

The initial Haskell implementation using mostly the esqueleto and persistent DSL functions showed an abysmal performance. I had to stop the process after half an hour.

Finding the culprit

A first profiling pass showed this result summary:

COST CENTRE          MODULE                         %time %alloc                                                                                                                               
                                                                                                                                                                                               
stepError            Database.Sqlite                 77.2    0.0                                                                                                                               
concat.ts'           Data.Text                        1.8   14.5                                                                                                                               
compareText.go       Data.Text                        1.4    0.0                                                                                                                               
concat.go.step       Data.Text                        1.0    8.2                                                                                                                               
concat               Data.Text                        0.9    1.4                                                                                                                               
concat.len           Data.Text                        0.8   13.9                                                                                                                               
sumP.go              Data.Text                        0.8    2.1                                                                                                                               
concat.go            Data.Text                        0.7    2.6                                                                                                                               
singleton_           Data.Text.Show                   0.6    4.0                                                                                                                               
run                  Data.Text.Array                  0.5    3.1                                                                                                                               
escape               Database.Persist.Sqlite          0.5    7.8                                                                                                                               
>>=.\                Data.Attoparsec.Internal.Types   0.5    1.4                                                                                                                               
singleton_.x         Data.Text.Show                   0.4    2.9                                                                                                                               
parseField           CSV.StopTime                     0.4    1.6                                                                                                                               
toNamedRecord        Data.Csv.Types                   0.3    1.2                                                                                                                               
fmap.\.ks'           Data.Csv.Conversion              0.3    2.9                                                                                                                               
insertSql'.ins       Database.Persist.Sqlite          0.2    1.4                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3

Naturally I checked the implementation of the first function, since that seemed to have the largest impact. It is a simple foreign function call to C. Fraser Tweedale made me aware, that there is not more speed to gain here, since it’s already calling a C function. With that in mind I had to focus on the next entries. It turned out that’s where I gained most of the speed to something more competitive against the crude SQL script and having it more user friendly.

It turned out that Data.Persistent uses primarily Data.Text concatenation to create the SQL statements. That being done for every insert statement is very costly, since it prepares, binds values and executes the statement for each insert (for reference see this Stack Overflow answer).

The solution

My current solution is to prepare the statement once and only bind the values for each insert.

Having done another benchmark, the import time now comes down to approximately a minute on my Thinkpad X1 Carbon.

Leave a comment Posted on January 25, 2017 Internet, Troubleshooting

Changing a website using the developer console

If you need to quickly change a website, you can use a combination of CSS/XPath selectors and a function to hide/remove DOM nodes. I had to find my way through a long list of similar items which was really hard to go through by simply looking at it.

For example, you can simply delete all links you’re not interested in by a simple combination of selector and function:

$x('//li/a[contains(., "not-interesting")]').map(function(n) { n.parentNode.removeChild(n) })

If you’ve made a mistake, reload the website.

Leave a comment Posted on May 12, 2015May 12, 2015 Troubleshooting

git: Moving partial changes between commits

Now and then I face the fact that I’ve added changes to a commit I’d like to have moved into a different commit. Here is what you do:

What’s there

We have two commits. For illustration purposes I’ve trimmed the log output down:

$ git log --stat
commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5

    WIP: adding tests

 parts/tests/functional/conftest.py       |  4 ++--
 parts/tests/functional/test_frobfrob.py  | 43 ++++++++++
 frobfrob.py                              | 14 +++++++++++++-

commit c7ef6c3014ca9d049dea46fbed44010acf53ae79

    prepare frob frob schemas

 parts/tests/functional/conftest.py           | 31 +++++++++++++
 frobfrob/models.py                           | 32 +++++++++++++

commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a
[...]

Now I want to move the changes made to conftest.py from commit c7ef6c3014ca9d049dea46fbed44010acf53ae79 into commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5 (or HEAD).

Pluck out the commit

In order to pluck out the changes to conftest.py, we’ll reset the file against the previous commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a (you could also use HEAD~3).

$ git reset 5b30d351f51fda40d37d2f7dc25d2367bd37845a parts/tests/functional/conftest.py
Unstaged changes after reset:
M       parts/tests/functional/conftest.py

$ git status -s
MM parts/tests/functional/conftest.py

As you can see, we will have staged changes and unstaged changes. The staged changes remove the additions to the conftest.py file and the unstaged changes add our code to conftest.py

Remove and Add

We now create two commits:

Use the staged changes for a new commit which we’ll squash with c7ef6c3014ca9d049dea46fbed44010acf53ae79.
Stage the unstaged changes and create another commit which we’ll squash with 19c698a9ee91a5f03f1c3240fc957e6b328931f5 or HEAD.

# 1. commit Message is something like: squash: removes changes to conftest.py
$ git commit

# 2. commit
# stage changes
$ git add -p

# commit, message will be something like: squash: adds changes to conftest.py
$ git commit

# we end up with two additional commits
$ git log --oneline
492ff22 Adds changes to conftest
8485946 removes conftest files
19c698a WIP: adding tests
c7ef6c3 prepare frob frob schemas

Interactive rebase put’s it all together

Now use an interactive rebase to squash the changes with the right commits:

$ git rebase -i HEAD~5

Leave a comment Posted on August 29, 2014August 29, 2014 Troubleshooting

Ansible Variables all of a Sudden Go Missing?

I’ve written a playbook which deploys a working development environment for some of our internal systems. I’ve tested it with various versions of RHEL. Yet when I ran it against a fresh install of Fedora it failed:

fatal: [192.168.1.233] => {'msg': "One or more undefined variables: 'ansible_lsb' is undefined", 'failed': True}

It turned out, that ansible gets it’s facts through different programs on the remote machine. If some of these programs are not available (in this instance it was lsb_release) the variables are not populated resulting in this error.

So check if all variables you access are indeed available with:

$ ansible -m setup <yourhost>

2 Comments Posted on February 12, 2014 Software Development, Troubleshooting

Abort a git commit –amend

The situation

You hack on a patch, add files to the index and with a knee jerk reaction do:

git commit --amend

(In fact, I do this in my editor with the vim-fugitive plug-in, but it also happened in the terminal). For the commit message git places you in your text editor. If you quit, your changes are merged with the last commit. Being aware of your trapped situation, what do you do?

The solution

Simply delete the commit message (up to where the comments start with #). The typical git commit-hook will see it as a commit with an empty message and abort the commit and therefore the merge.

Leave a comment Posted on July 16, 2013 Troubleshooting

Touché

I just ran into a problem in which I got myself cornered. I’ve edited a configuration file with elevated permission using sudo:

sudo vim /etc/httpd/conf.d/metrics.conf

From time to time when editing, I background the vim session in order to perform some commands in the shell, then continue working in vim. This time, I’ve sent the editor into the background. When trying to foreground it I got this:

~/Metrics(branch:bug_947304*) » fg
[1]  + 11403 continued  sudo vim /etc/httpd/conf.d/metrics.conf
[1]  + 11403 suspended (tty input)  sudo vim /etc/httpd/conf.d/metrics.conf

What? How? When? Any attempt to foreground this process failed. Reason was the elevated permission I’ve started the editor with. Long story short, you’ll need to kill the editor in order to get back into editing. This mailing list thread explains why.

Leave a comment Posted on July 12, 2013 Troubleshooting

Sluggish guest machine performance with libvirt

I’m currently running a few VMs using libvirt under Fedora 19. One problem I ran into with a new workstation was extremely sluggish performance. After consulting the interwebs, I found this great article.

You want to check two things:

Is your architecture dependent kvm module loaded (e.g. kvm_intel). If not, and you can not load it check your BIOS settings if visualization is enabled.
Check your VM details under “Overview” which hypervisor is in use: kvm or qemu? Choosing kvm will most likely speed things up.

Romanofskis Blog

Romanofskis Blog

Just another WordPress.com weblog

Troubleshooting

GHC: can’t find a package database

Docker volume mount fails

Debugging with RPM packages

Reproducer

Strace the process

Install debuginfo packages and use gdb

Profiling Haskell: Don’t chase the red herring

Background

Finding the culprit

The solution

Changing a website using the developer console

git: Moving partial changes between commits

What’s there

Pluck out the commit

Remove and Add

Interactive rebase put’s it all together

Ansible Variables all of a Sudden Go Missing?

Abort a git commit –amend

The situation

The solution

Touché

Sluggish guest machine performance with libvirt