Profiling Haskell: Don’t chase the red herring

I’m currently working on a small Haskell tool which helps me minimize the waiting time for catching a train into the city (or out). One feature I’ve implemented recently is an automated import of aprox. 25MB compressed CSV data into an SQLite3 database, which was very slow in the beginning. Not focusing on the first results of the profiling information helped to optimize the implementation for a swift import.

Background

The data comes as a 25MB zip archive of text files in a CSV format. All imported, the SQLite database grows to about 800 MiB. My work-in-progress solution was a cruddy shell + SQL script which imports the CSV files into an SQLite database. With this solution, the import takes about 30 seconds, excluding the time you need to manually download the zip file. But this is not very portable, as I wanted to have a more user friendly solution.

The initial Haskell implementation using mostly the esqueleto and persistent DSL functions showed an abysmal performance. I had to stop the process after half an hour.

Finding the culprit

A first profiling pass showed this result summary:

COST CENTRE          MODULE                         %time %alloc                                                                                                                               
                                                                                                                                                                                               
stepError            Database.Sqlite                 77.2    0.0                                                                                                                               
concat.ts'           Data.Text                        1.8   14.5                                                                                                                               
compareText.go       Data.Text                        1.4    0.0                                                                                                                               
concat.go.step       Data.Text                        1.0    8.2                                                                                                                               
concat               Data.Text                        0.9    1.4                                                                                                                               
concat.len           Data.Text                        0.8   13.9                                                                                                                               
sumP.go              Data.Text                        0.8    2.1                                                                                                                               
concat.go            Data.Text                        0.7    2.6                                                                                                                               
singleton_           Data.Text.Show                   0.6    4.0                                                                                                                               
run                  Data.Text.Array                  0.5    3.1                                                                                                                               
escape               Database.Persist.Sqlite          0.5    7.8                                                                                                                               
>>=.\                Data.Attoparsec.Internal.Types   0.5    1.4                                                                                                                               
singleton_.x         Data.Text.Show                   0.4    2.9                                                                                                                               
parseField           CSV.StopTime                     0.4    1.6                                                                                                                               
toNamedRecord        Data.Csv.Types                   0.3    1.2                                                                                                                               
fmap.\.ks'           Data.Csv.Conversion              0.3    2.9                                                                                                                               
insertSql'.ins       Database.Persist.Sqlite          0.2    1.4                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3                                                                                                                               
compareText.go.(...) Data.Text                        0.1    4.3

Naturally I checked the implementation of the first function, since that seemed to have the largest impact. It is a simple foreign function call to C. Fraser Tweedale made me aware, that there is not more speed to gain here, since it’s already calling a C function. With that in mind I had to focus on the next entries. It turned out that’s where I gained most of the speed to something more competitive against the crude SQL script and having it more user friendly.

It turned out that Data.Persistent uses primarily Data.Text concatenation to create the SQL statements. That being done for every insert statement is very costly, since it prepares, binds values and executes the statement for each insert (for reference see this Stack Overflow answer).

The solution

My current solution is to prepare the statement once and only bind the values for each insert.

Having done another benchmark, the import time now comes down to approximately a minute on my Thinkpad X1 Carbon.

Changing a website using the developer console

If you need to quickly change a website, you can use a combination of CSS/XPath selectors and a function to hide/remove DOM nodes. I had to find my way through a long list of similar items which was really hard to go through by simply looking at it.

For example, you can simply delete all links you’re not interested in by a simple combination of selector and function:

$x('//li/a[contains(., "not-interesting")]').map(function(n) { n.parentNode.removeChild(n) })

If you’ve made a mistake, reload the website.

git: Moving partial changes between commits

Now and then I face the fact that I’ve added changes to a commit I’d like to have moved into a different commit. Here is what you do:

What’s there

We have two commits. For illustration purposes I’ve trimmed the log output down:

$ git log --stat
commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5

    WIP: adding tests

 parts/tests/functional/conftest.py       |  4 ++--
 parts/tests/functional/test_frobfrob.py  | 43 ++++++++++
 frobfrob.py                              | 14 +++++++++++++-

commit c7ef6c3014ca9d049dea46fbed44010acf53ae79

    prepare frob frob schemas

 parts/tests/functional/conftest.py           | 31 +++++++++++++
 frobfrob/models.py                           | 32 +++++++++++++

commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a
[...]

Now I want to move the changes made to conftest.py from commit c7ef6c3014ca9d049dea46fbed44010acf53ae79 into commit 19c698a9ee91a5f03f1c3240fc957e6b328931f5 (or HEAD).

Pluck out the commit

In order to pluck out the changes to conftest.py, we’ll reset the file against the previous commit 5b30d351f51fda40d37d2f7dc25d2367bd37845a (you could also use HEAD~3).

$ git reset 5b30d351f51fda40d37d2f7dc25d2367bd37845a parts/tests/functional/conftest.py
Unstaged changes after reset:
M       parts/tests/functional/conftest.py

$ git status -s
MM parts/tests/functional/conftest.py

As you can see, we will have staged changes and unstaged changes. The staged changes remove the additions to the conftest.py file and the unstaged changes add our code to conftest.py

Remove and Add

We now create two commits:

  1. Use the staged changes for a new commit which we’ll squash with c7ef6c3014ca9d049dea46fbed44010acf53ae79.
  2. Stage the unstaged changes and create another commit which we’ll squash with 19c698a9ee91a5f03f1c3240fc957e6b328931f5 or HEAD.
# 1. commit Message is something like: squash: removes changes to conftest.py
$ git commit

# 2. commit
# stage changes
$ git add -p

# commit, message will be something like: squash: adds changes to conftest.py
$ git commit

# we end up with two additional commits
$ git log --oneline
492ff22 Adds changes to conftest
8485946 removes conftest files
19c698a WIP: adding tests
c7ef6c3 prepare frob frob schemas
Interactive rebase put’s it all together

Now use an interactive rebase to squash the changes with the right commits:

$ git rebase -i HEAD~5

Ansible Variables all of a Sudden Go Missing?

I’ve written a playbook which deploys a working development environment for some of our internal systems. I’ve tested it with various versions of RHEL. Yet when I ran it against a fresh install of Fedora it failed:

fatal: [192.168.1.233] => {'msg': "One or more undefined variables: 'ansible_lsb' is undefined", 'failed': True}

It turned out, that ansible gets it’s facts through different programs on the remote machine. If some of these programs are not available (in this instance it was lsb_release) the variables are not populated resulting in this error.

So check if all variables you access are indeed available with:

$ ansible -m setup <yourhost>

Abort a git commit –amend

The situation

You hack on a patch, add files to the index and with a knee jerk reaction do:

git commit --amend

(In fact, I do this in my editor with the vim-fugitive plug-in, but it also happened in the terminal). For the commit message git places you in your text editor. If you quit, your changes are merged with the last commit. Being aware of your trapped situation, what do you do?

The solution

Simply delete the commit message (up to where the comments start with #). The typical git commit-hook will see it as a commit with an empty message and abort the commit and therefore the merge.

Touché

I just ran into a problem in which I got myself cornered. I’ve edited a configuration file with elevated permission using sudo:

sudo vim /etc/httpd/conf.d/metrics.conf

From time to time when editing, I background the vim session in order to perform some commands in the shell, then continue working in vim. This time, I’ve sent the editor into the background. When trying to foreground it I got this:

~/Metrics(branch:bug_947304*) » fg
[1]  + 11403 continued  sudo vim /etc/httpd/conf.d/metrics.conf
[1]  + 11403 suspended (tty input)  sudo vim /etc/httpd/conf.d/metrics.conf

What? How? When? Any attempt to foreground this process failed. Reason was the elevated permission I’ve started the editor with. Long story short, you’ll need to kill the editor in order to get back into editing. This mailing list thread explains why.

Sluggish guest machine performance with libvirt

I’m currently running a few VMs using libvirt under Fedora 19. One problem I ran into with a new workstation was extremely sluggish performance. After consulting the interwebs, I found this great article.

You want to check two things:

  • Is your architecture dependent kvm module loaded (e.g. kvm_intel). If not, and you can not load it check your BIOS settings if visualization is enabled.
  • Check your VM details under “Overview” which hypervisor is in use: kvm or qemu? Choosing kvm will most likely speed things up.