Yes, I write my own posts in lame html

instead of using whatever's the new rad blogging thing.

A tale of dependencies

TL;DR: Suppose your end product already depends on many libs, including your own pkg A and a third-party pkg B. Then letting in a change to A that makes it depend on B too can have serious, sneaky, and interesting consequences.

Copy of the commit message for this Camlistore commit:

pkg/osutil: remove dependency on pkg/env

pkg/osutil started depending on pkg/env in
c55c860, when we added
DefaultLetsEncryptCache(), because it behaves differently depending on
whether we're on GCE or not.

This in turn made pkg/osutil depend on
cloud.google.com/go/compute/metadata and a couple of others. This, in
itself, was not so bad, since the main user of pkg/osutil,
server/camlistored, already depends on all these same dependencies
anyway.

However, pkg/types/camtypes depends on pkg/osutil (although maybe it
shouldn't) because it calls osutil.UserClientConfigPath() for an error
message.

And finally, in app/publisher/js - which is some Go code meant as input
for gopherjs, so that it gets compiled into javascript that we include
in the publisher web page - we depend on pkg/types/camtypes for some
search query results.

So as of c55c860, we "leaked" a couple
of additional and unnecessary dependencies into the gopherjs generated
code. That made the minified output unnoticeably grow, and apparently
had no other adverse effect, so we didn't notice.

That is, until we landed b0b6a0a, which
updated a ton of vendored dependencies, most of them in
cloud.google.com/go and google.golang.org/api. And interestingly, the
update made cloud.google.com/go/compute/metadata depend on a lot of
these other dependencies. This had two noticeable consequences:

1) The (unminified) generated js code grew from 6.1M to 8.2M, which in
itself is super bad already.

2) The js code actually stopped working. For some reason (that I can't
explain, and will probably leave to the GopherJS devs if they're
interested), the members/functions that we export to the javascript
namespace with gopherjs (typically with a
js.Global.Set("ExportedNameForFoo", FooFunc)), are not visible anymore,
which of course breaks everything.

Therefore, this CL fixes this issue by simply doing what we knew we
should have done from the start:

we now let pkg/osutil/gce (which already depends on pkg/env) itself
register a LetsEncryptCacheFunc into pkg/osutil, which removes the need
for pkg/osutil to depend on pkg/env.
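
To make the pattern concrete, here is a minimal, self-contained sketch of that kind of registration hook, with made-up names, paths, and defaults (the real change is in the commit above): the low-level package exposes a function variable that the environment-aware package fills in, so the dependency only points one way.

package main

import "fmt"

// "osutil side": no knowledge of pkg/env required.
// letsEncryptCacheFunc is a hook that an environment-aware package may set.
var letsEncryptCacheFunc func() string

// RegisterLetsEncryptCacheFunc is called by the package that knows about GCE.
func RegisterLetsEncryptCacheFunc(fn func() string) { letsEncryptCacheFunc = fn }

// DefaultLetsEncryptCache falls back to a generic location when nothing
// has been registered.
func DefaultLetsEncryptCache() string {
	if letsEncryptCacheFunc != nil {
		return letsEncryptCacheFunc()
	}
	return "/var/cache/letsencrypt" // hypothetical non-GCE default
}

// "gce side": in the real tree this would be an init() in pkg/osutil/gce,
// which is the only place that needs to import pkg/env.
func registerGCE() {
	RegisterLetsEncryptCacheFunc(func() string { return "/gce/letsencrypt-cache" })
}

func main() {
	fmt.Println(DefaultLetsEncryptCache()) // generic default
	registerGCE()
	fmt.Println(DefaultLetsEncryptCache()) // GCE-specific value
}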

Google Music VS Free.fr

When I bought the Nexus 5X, I got a 3-month free trial of Google Music with it. Last May I finally decided to give it a try. Below is the data I collected during this trial; after the table, I explain what the columns mean.

Date          Satisfaction   Time of failure
2016-05-08    -2
2016-05-09     2
2016-05-10    -1             18:00
2016-05-11    -2             18:45
2016-05-12     2
2016-05-13     2
2016-05-14     1
2016-05-15     0
2016-05-16     2
2016-05-17     2
2016-05-18    -2             19:20
2016-05-19     1
2016-05-20     2
2016-05-21     2
2016-05-22    -1
2016-05-23     2
2016-05-24     2
2016-05-25    -2             21:00
2016-05-26     0
2016-05-27     0
2016-05-28     0
2016-05-29     0
2016-05-30    -1             18:30
2016-05-31     1
2016-06-01    -2             20:30
2016-06-02     1
2016-06-03    -2             18:45
2016-06-04     1
2016-06-05     2
2016-06-06     2
2016-06-07     1
2016-06-08     2
2016-06-09     1
2016-06-10     2
2016-06-11     1
2016-06-12     2
2016-06-13    -1             18:45
2016-06-14     1
2016-06-15    -2             19:30
2016-06-16    -2             18:00
2016-06-17    -2             18:00
2016-06-18     2
2016-06-19     2
2016-06-20     2
2016-06-21     1
2016-06-22     2
2016-06-23     1
2016-06-24     2
2016-06-25     2
2016-06-26     2
2016-06-27    -2             17:30
2016-06-28    -2             17:49
2016-06-29     2
2016-06-30     1
2016-07-01     0
2016-07-02     0
2016-07-03     0
2016-07-04     2
2016-07-05     1
2016-07-06     2
2016-07-07     1
2016-07-08     0
2016-07-09     0
2016-07-10     0
2016-07-11     0
2016-07-12     1
2016-07-13     1
2016-07-14     2
2016-07-15     2
2016-07-16     1
2016-07-17     0
2016-07-18     2
2016-07-19     1
2016-07-20     2
2016-07-21     1
2016-07-22     2
2016-07-23    -2             18:20
2016-07-24     2
2016-07-25     2
2016-07-26     2
2016-07-27     1
2016-07-28     2
2016-07-29     1
2016-07-30     2
2016-07-31     0
2016-08-01     2
2016-08-02     2
2016-08-03     1
2016-08-04     2
2016-08-05     1

The reason I started collecting this data is that I noticed how prone to interruptions the service was. A track would stop playing because buffering wasn't fast enough and playback would catch up with it. When things were at their worst, this would happen several times during a track, like every minute or even every 30 seconds or so. The satisfaction column reflects my (subjective, and pretty low) patience with the issue on a given day, on a scale from -2 to 2.

To give some perspective, I would give it a -2 if several tracks in a row were interrupted several times.

Now as you can see, most of the failures (or at least the first burst of them on a given day) happened in the evening, which is probably peak time for home internet connections, as opposed to the rest of the day, when people are at work and not using their home connections. So I would not be surprised if the blame lay at least in part with my ISP (Free.fr), and not just with Google Music's performance, especially in light of the history of conflicts between Google (well, YouTube) and Free.fr.

Still, I felt that Google Music had some means to alleviate the problem, such as degrading the quality of the audio stream on the fly when bandwidth got too low. Or, even better, giving the user some control over how much of a track should be buffered before it starts playing. I can stand lower-quality audio, or having to wait a bit before a track starts, as both of these inconveniences are way less annoying than a track getting interrupted right in the middle.

I mentioned all of that when I reported the issue to Google Music, which I did pretty early on. And to be honest, I had my mind set on not renewing the subscription when the free trial expired if the situation did not improve. That is also why I collected the data: so I could explain and justify my decision at the end of the trial, and maybe run some stats on it.

However, much to my surprise, the situation did improve over July, and it has stayed at pretty much the same quality level since then. So I don't really have any reason to analyze the data any further, but here it is in case someone else is interested. And I therefore did keep the (paid) service.
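
For what it's worth, here is the kind of quick stat I had in mind, as a rough sketch: it assumes the table above has been saved to a whitespace-separated text file (date, satisfaction, optional time of failure) and prints the average satisfaction per month.

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sort"
	"strconv"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: satstats <datafile>")
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sum := make(map[string]float64) // month -> sum of satisfaction scores
	count := make(map[string]int)   // month -> number of days recorded
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) < 2 || !strings.HasPrefix(fields[0], "2016-") {
			continue // skip the header line
		}
		sat, err := strconv.Atoi(fields[1])
		if err != nil {
			log.Fatal(err)
		}
		month := fields[0][:7] // e.g. "2016-05"
		sum[month] += float64(sat)
		count[month]++
	}
	if err := s.Err(); err != nil {
		log.Fatal(err)
	}

	var months []string
	for m := range sum {
		months = append(months, m)
	}
	sort.Strings(months)
	for _, m := range months {
		fmt.Printf("%s: average satisfaction %.2f over %d days\n", m, sum[m]/float64(count[m]), count[m])
	}
}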

I don't know whether the Google Music peeps did something (since one does not get any feedback on reported issues), or whether Free.fr somehow picked up on the evening bottlenecks (for me or in general) and improved on them. But whoever it was, I'm grateful.

My precious thoughts on Go

Apparently there is a tacit rule on the internet that you have to write a blog post about Go as soon as you have run your first Hello World. I am unfortunately a tad late to the party, but here it is.

I like it. A lot.

And it is called Go.

Thank you for your much desired and totally undeserved attention.

A tale of two screens

Or, a silly xrandr trick.

I use my laptop as my daily work machine, so it is often plugged into an external screen. And, the linux desktop being what it is (Yay, projectors!), switching between the two displays does not work flawlessly. Hence this little script I've been using for a while:

#!/home/mpl/plan9/bin/rc

if (! ~ $1 big small) {
        echo 'Usage: scr big|small'
        exit 
}

switch ($1){
case big
        xrandr --output HDMI1 --auto
        wait
        xrandr --output LVDS1 --off
case small
        xrandr --output LVDS1 --auto
        wait
        xrandr --output HDMI1 --off
}

exit 0

However, it has the unfortunate consequence of completely wrecking whatever pretense of automatic display detection was left.

So it's Friday night, the HDMI display is the one activated, and I close the laptop lid to make it sleep. The next day, I unplug everything and take the laptop to the couch for some relaxing Saturday hacking. And of course, when I wake it up I get a black screen, because I forgot to switch to the LVDS before unplugging the external monitor.

Now one's first reflex is to switch to the vt1 console to try and fix things from there, right?
Let's see:

% xrandr --output LVDS1 --auto
Can't open display

Of course, forgot to set the DISPLAY, silly me.

% DISPLAY=:0.0 xrandr --output LVDS1 --auto

Back to vt7: still all black. What could be going on? Let's run it in an xterm then:

% DISPLAY=:0.0 xterm -e /bin/bash -c "xrandr --output LVDS1 --auto"

Same result. Now that's about where I had given up a while back, and here is what I had been resorting to until now:

Switch to vt7, unlock xscreensaver by typing your password blindly, then back to vt1:

% DISPLAY=:0.0 xterm

Back to vt7, hope the focus is on the newly created xterm, and blindly run my script:

% scr small

which would usually work, but was still too fiddly for my taste.

Today I finally got annoyed enough to debug the issue further (with the LVDS display on, to see what was happening):

From vt1:

% DISPLAY=:0.0 xterm -e /bin/bash -c "/home/mpl/bin/scr small; sleep 60" # sleep for a while so the term does not close immediately after.

On vt7, observe in xterm:

xrandr: Configure crtc 0 failed

Could it be that xrandr is sad because we're not on vt7 when the command is run? (Spoiler: yes.) So here's the final silly trick:

% DISPLAY=:0.0 xterm -e /bin/bash -c "sleep 5; /home/mpl/bin/scr small; sleep 60" # give ourselves 5 seconds to switch to vt7.

That gives me a few seconds to switch to vt7, before the command is actually run, which does the trick. Stupid, but it works.

Why containers are the shit

As of today (2014/12/16), you have probably all heard of the containers rage, and about docker in particular.

In the Camlistore project, we have been using docker for a while, and I have come to appreciate why it is useful and convenient in a number of cases.

Here's the thing though: most of these cases were initially instigated by Brad, and it was not until yesterday that docker came to me as a natural solution for a little problem I had.

I wanted to test the impact on Camlistore of some specific configuration changes related to HTTPS certificates and domain names, and see if all browsers were ok with it. This is not the kind of thing I can test offhand on my dev machine at home, because I have no domain name pointing there, and hence no valid (i.e. not self-signed) HTTPS cert either.

What I do have, though, is access to a remote server under a domain name, which also happens to have a valid HTTPS cert. However, this server runs a "production" Camlistore server that I do not want to disrupt. One option is of course to carefully craft a Camlistore configuration file, in a non-default directory set by an env var, with a different configuration for where to store the blobs, which port to run on, etc. But I was feeling lazy, and it sounded like an annoying thing to do. And that's where docker comes in. All I had to do was:

mkdir -p tmp/.config/camlistore
docker run -v $HOME/tmp/.config/camlistore:/.config/camlistore -p 3180:3179 camlistore/camlistored

The -v option allows me to persist all of Camlistore's configuration stuff from the container (in /.config/camlistore inside the container) on the host filesystem (in $HOME/tmp/.config/camlistore). That way I can play around with the configuration file left in there between runs. The -p option maps the port Camlistore uses by default in the container (3179) to port 3180 on the host (because the default port is already used by the production instance).

So yeah, docker is nice. Kind of like a virtual machine, but way easier to set up, and without all the overhead.

A bit of cpu usage monitoring and Go profiling

I bet it has happened to every programmer: at some point you notice that a running program seems to be acting up when it shouldn't. For example, you notice the cpu fan starting to go a bit faster, or maybe some disk scratching, while you can't think of anything running that would cause this kind of activity.

I recently came across such a situation; the laptop seemed a bit noisier than usual, and there was some disk i/o happening every dozen seconds or so. The first thing one does is to have a quick look in top. And sure enough, the program in question (camlistore, to name it) seemed to be peaking every few seconds at ~90% cpu or so. (To be honest, I was kind of expecting it, since I had a recollection of noticing such behaviour while this program was running.)

Now that was the perfect occasion to try gopprof (a script around pprof), as I had heard good things about it, but never had an excuse to try it out on a real problem. The Go team wrote a brilliant post about it here, so that's how I got started. Of course, camlistore being a collection of servers, it's not supposed to terminate, so I had to adapt the blog example a bit, since one is supposed to flush the measurements right before the monitored program terminates. I could have modified camlistored so that it would capture the termination signal and flush before actually terminating, but I instead went for the easier solution, which was simply to program it to terminate after a given duration. Here's the gist of it:

package main

import (
	"flag"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")

func main() {
	flag.Parse()

	var f *os.File
	var err error
	if *cpuprofile != "" {
		f, err = os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.StartCPUProfile(f)
	}

	/* actual code starts here */

	// The last statement was ws.Serve(), which is essentially similar to an
	// http server. Since I want to add other statements to be executed while
	// the server is running, I got it to run in a goroutine instead.
	// Thus it becomes:
	// go ws.Serve()

	/* actual code ends here */

	if *cpuprofile != "" {
		// Let the server run for 30 minutes, then flush the profile and exit.
		log.Print("Before sleeping")
		time.Sleep(30 * time.Minute)
		pprof.StopCPUProfile()
		log.Print("pprof stopped")
		f.Close()
		log.Fatal("camlistore terminated")
	}
}
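
For reference, the signal-catching alternative mentioned above could look roughly like this; this is just a sketch under the same assumptions (a cpuprofile flag and a long-running server), not what camlistored actually does:

package main

import (
	"flag"
	"log"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.StartCPUProfile(f)

		// Flush and close the profile when we're asked to terminate,
		// instead of after a fixed duration.
		c := make(chan os.Signal, 1)
		signal.Notify(c, os.Interrupt, syscall.SIGTERM)
		go func() {
			<-c
			pprof.StopCPUProfile()
			f.Close()
			os.Exit(0)
		}()
	}

	// The long-running server (the equivalent of ws.Serve()) would go here;
	// for the sketch we just block until a signal arrives.
	select {}
}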

As explained in the blog post, one then has to run the program with -cpuprofile=output.prof, and next run gopprof to analyse the measurements produced during the run, like this: 'gopprof path_to_program output.prof'.
One can then use the 'web' command to produce a nice call graph, from which these two excerpts were extracted:

[call graph excerpts: readBlobs and scanblock]

From there, one can easily see that there's a huge number of calls to camli/blobserver/localdisk.readBlobs, which in turn leads to a lot of slice allocations, which explains the predominant garbage collector activity (scanblock).

From that point on, I knew where to look in the code, and it was actually pointed out to me later on that this particular piece of code is most likely some debug/testing code that we should remove anyway. Problem solved.

But!
In this case it was pretty obvious to me that something was going wrong, and I was sort of expecting it anyway. However, it might happen that you suspect a program is acting up, but you're not really sure, because let's face it, just looking at it in top is a bit weak. So you would like real figures to back that up. Gopprof would not really help here, because it does not give you any direct indication of how much cpu the program is using relative to the rest of the system.
I thought Valgrind could do that sort of thing, but after glancing through the doc I did not find it. I then googled a bit and found out about pidstat (it comes with the sysstat package on ubuntu), which can take -p <pid> as an argument and be invoked with a sampling interval and a count, and then yields this kind of output:

mpl@oenone:~$ pidstat -u -p 2756 1 1800
Linux 2.6.32-37-generic (oenone) 	02/04/2012 	_x86_64_	(2 CPU)

11:14:30 PM       PID    %usr %system  %guest    %CPU   CPU  Command
11:14:31 PM      2756    0.00    0.00    0.00    0.00     0  acme
11:14:32 PM      2756    0.00    0.00    0.00    0.00     0  acme
11:14:33 PM      2756    0.00    0.00    0.00    0.00     0  acme
...

Parsing that output and calculating an average of the %cpu over time is now trivial. I reckon awk would probably be a good candidate for that task, but I'm just more comfortable with Go, so I simply wrote this little program:

package main

import (
	"bufio"
	"flag"
	"io"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	flag.Parse()
	if len(flag.Args()) != 1 {
		log.Fatal("need 1 arg")
	}

	f, err := os.Open(flag.Arg(0))
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sum, count := float64(0), 0
	b := bufio.NewReader(f)
	// Skip the three header lines of the pidstat output.
	for i := 0; i < 3; i++ {
		b.ReadString('\n')
	}
	for {
		line, err := b.ReadString('\n')
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		fields := strings.Fields(line)
		if len(fields) != 9 {
			log.Fatal("incorrect nb of fields in line")
		}
		// fields[6] is the %CPU column.
		cpu, err := strconv.ParseFloat(fields[6], 32)
		if err != nil {
			log.Fatal(err)
		}
		sum += cpu
		count++
	}
	cpu := sum / float64(count)
	log.Printf("%g \n", cpu)
}

There. It was no rocket science, but as someone I know would say, it was an itch I had to scratch.

Comments on the Google+ post are welcome. In particular, I'm interested if you know of another utility (maybe with nice graphs/plots) that already does this.
