Systemtap

May 27, 2009

So I promised a while ago to describe my systemtap experience.

My initial intent was to use dtrace, and as it isn’t available on Linux, I spent some time playing with OpenSolaris. In the course of my web searches I stumbled across Systemtap, which is supposed to provide equivalent or sometimes better functionality. I won’t comment on that claim, because as soon as I found Systemtap, I abandoned dtrace. This is nothing against dtrace, but my primary development platform is Linux.

The task I was tackling was investigating the performance of my ARM4 agent. In particular, I wanted it to scale better on machines with multiple cores. Conventional debuggers don’t really help with this, and profilers only give you part of the picture… especially when multiple processes are involved. One question I had in particular was how much time I was spending waiting on mutual exclusions.

So I started playing with the example scripts. Overall it’s pretty easy. In fact, I probably spent as much time setting up a good test scenario as I did developing my systemtap scripts. Their examples for monitoring IPCs are pretty good. Not stellar, but good.

I got a few surprises. Primarily, I was astonished to see how much time I was spending copying memory using message queues. This is by far the bulk of the time expended on a 2-core Opteron system. Secondly, I was surprised at just how difficult it is to monitor mutual exclusions!

Linux uses an exclusion mechanism called futexes. The lock itself lives in user space memory, so uncontended operations are fast. Unfortunately, at least with the version I was running on RHEL 5.2, you get no visibility into the user symbol tables, so your futex is displayed as a bare memory location. That’s pretty useless. I had to go back and modify my code to print all the relevant memory addresses just so I could see which one is which! This was a lot of work that definitely flies in the face of the non-intrusive measurement philosophy.
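For illustration, the address-printing workaround could look something like the C sketch below. It is not the code from the agent. The `named_mutex` table and both function names are hypothetical, and it assumes the locks are pthread mutexes whose address matches the futex address the tracer reports (typical for glibc, but an implementation detail worth checking on your system).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical registry entry: pairs a readable name with a mutex address. */
struct named_mutex {
    const char *name;
    pthread_mutex_t *mutex;
};

/* Format one "<address> <name>" line into buf.
 * Returns the number of characters snprintf would have written. */
int format_mutex_entry(char *buf, size_t len, const struct named_mutex *m)
{
    assert(buf != NULL && m != NULL);
    return snprintf(buf, len, "%p %s", (void *)m->mutex, m->name);
}

/* Write one line per table entry to out, so that addresses reported by a
 * tracing tool can be matched to names by hand. Returns the entry count. */
int dump_mutex_addresses(FILE *out, const struct named_mutex *table, size_t count)
{
    char line[128];
    size_t i;

    for (i = 0; i < count; i++) {
        format_mutex_entry(line, sizeof line, &table[i]);
        fprintf(out, "%s\n", line);
    }
    return (int)count;
}
```

Calling `dump_mutex_addresses(stderr, table, count)` once at startup, with a table listing each lock of interest, gives a lookup list to hold next to the trace output. It is still intrusive, which is exactly the complaint above.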

Overall, I’d rate the experience satisfactory, but systemtap still needs improvement.

I found a few areas where I could tweak my code and get significant performance improvements. I saw some areas where the architecture could be improved, but that’s another problem for another day.

I see rumours on the web of user space taps. They may exist already, but if they’re in the version I used, it’s unclear how to use them. A far more important task for systemtap developers, in my mind, would be reading program symbol tables. Knowing a futex is generating a hot spot is useless to me if I can’t find where the futex is!

Throughout this process, I’ve been amazed at how “almost” the development tools on Linux are. CodeAnalyst (oprofile) and Systemtap are both tools I need, and both come up short. As good as Linux is for writing code, I can’t fully do the work that needs to be done using the tools available. I still wind up testing on other machines.

I’d give systemtap a B, and Linux in general a C.