On key signing and trust

Key signing is a hallowed tradition in the open source world with a very specific protocol for validating and confirming an identity before accepting someone to the web of trust. It’s almost never done without meeting the person being admitted into the trust relationship and it goes like this:

  1. Individuals meet for a beer, or at a key signing party (for those who just went wtf, yes, these things are real, and they are crazy fun! see below for the type of shenanigans that take place at these reality-altering parties)
  2. They exchange strips of paper or business cards with their name, email address, key fingerprint and key ID
  3. They validate each other’s identity using government-issued photo IDs
  4. Once cleared, they pull down each other’s key from the key servers
  5. They validate that the fingerprint of the downloaded key matches what’s written on the piece of paper, and that the name on the key matches the photo ID shown at introduction
  6. If everything checks out, they sign each other’s key
  7. For additional security, the signed key is encrypted using the public key of the recipient and emailed to the address indicated in the key

Let’s look at this unnerving and highly nerdy exchange that has replaced the “Hi, I’m Tom” with “Hi, I’m Tom and here’s my fingerprint and government-issued photo ID”. Here’s the rationale for some of the steps in this workflow.

The key is a personal identification and privacy instrument that is backed by strong science to assure non-repudiation. I will not go into the science in this post, but here’s where you may want to get started if you’re curious. The defining aspect of this workflow is that nothing is trusted until verified, and the protocol is there to make sure that no compromise takes place.

At the beginning of the process, the public key is expected to be on a public key server network (such as pgp.mit.edu). The meeting in person is to make sure that the key you’re signing (which is on a public system) belongs to the correct individual and not to an individual (or three-letter agency) who’s masquerading as someone else. The most secure way to ensure that is in person (because we’re a paranoid bunch), as that leaves no room for a malicious man in the middle. When someone produces the piece of paper with the key fingerprint (again backed by strong science), the signer can compare the fingerprint on the public server with the fingerprint presented in person and, together with the official photo ID, confirm that the public key really belongs to the individual before him/her. The connection has now been made and technology has once again prevailed in mathematically assured validation of another’s identity. The party is just getting started.
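
The fingerprint comparison itself is simple enough to sketch in a few lines of Python. This is only an illustration: the function names and the fingerprint value are made up for the example, and in practice the second value would be copied from the output of gpg --fingerprint for the downloaded key. It assumes a 40-hex-digit (SHA-1 based) fingerprint, which is what a strip of paper at a key signing party would typically carry.

```python
def normalize_fingerprint(text: str) -> str:
    """Drop all whitespace and uppercase, so spacing and case do not matter."""
    return "".join(text.split()).upper()


def fingerprints_match(from_paper: str, from_keyserver: str) -> bool:
    """Return True only if both values are the same well-formed fingerprint."""
    paper = normalize_fingerprint(from_paper)
    server = normalize_fingerprint(from_keyserver)
    # Assumption: a 40-character hexadecimal fingerprint.
    if len(paper) != 40 or any(c not in "0123456789ABCDEF" for c in paper):
        return False
    return paper == server


# Hypothetical fingerprint, as it might appear on the strip of paper.
paper_strip = "0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567"
print(fingerprints_match(paper_strip, "0123456789abcdef0123456789abcdef01234567"))
```

A match only tells you that the key on the server is the key the person in front of you claims; the photo ID check is still what ties that person to the name on the key.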

Once identity is validated this way, the signer signs the key and uploads the signed key back to the key server or emails a copy of it. This can be done after the party in a more subdued setting without crazy paper shuffling and photo ID validation madness. The astute and more paranoid amongst us would encrypt it using the public key being signed and email it to the address specified in the key, because that’s a good way to validate that the email address is correct and belongs to the right user. For the gnupg commands that make this workflow possible, check out the Debian Key signing howto.

In communities such as Debian, this process is mandatory to assure trust in a system that is largely decentralized. The Web of Trust that this creates gives rise to a truly magnificent network, which is difficult to subvert so long as the protocol is followed to ensure no compromise.

While this is good for cryptographically assured validation of one’s identity in a global network and non-repudiation of one’s contributions and electronic communications, trust is ultimately a very subjective attribute and probably can never be assured through a hash, because trust can be broken by people even though strong science says otherwise.

Git guts

Today I will dive into the guts of git to showcase the simplicity and elegance with which git manages content internally in its own content-addressable file system. Armed with this knowledge, you will gain a deeper understanding of the underlying data structure, which will help you figure out and troubleshoot issues that may inevitably come up as you use git.

To start, I shall create a new directory and initialize git.

$ mkdir git-guts
$ cd git-guts
$ ls -a
. ..
$ git init
Initialized empty Git repository in /Users/anuradha/dev/workbench/git-guts/.git/
$ ls -a
. .. .git

At this point, there are no files under version control yet. Here are the files that have been created during initialization:

.git
.git/branches
.git/config
.git/description
.git/HEAD
.git/hooks
.git/hooks/applypatch-msg.sample
.git/hooks/commit-msg.sample
.git/hooks/post-commit.sample
.git/hooks/post-receive.sample
.git/hooks/post-update.sample
.git/hooks/pre-applypatch.sample
.git/hooks/pre-commit.sample
.git/hooks/pre-rebase.sample
.git/hooks/prepare-commit-msg.sample
.git/hooks/update.sample
.git/info
.git/info/exclude
.git/objects
.git/objects/info
.git/objects/pack
.git/refs
.git/refs/heads
.git/refs/tags

Of these, the hooks are boilerplate and none are yet active. To make them active, they need to be renamed to remove the .sample suffix.

In this post, I shall focus on the .git/objects directory, as that is where all the content is stored as hashed “objects”. To show what happens, let’s add a file to source control and observe the changes:

$ echo "bar" > foo
$ git add foo
$ git commit -m "initial commit"
[master (root-commit) 64f3e97] initial commit
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo
$ find .git/objects/ -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/64/f3e9762509b0ce9cbb252f69847957e5368632
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

Adding a single file to the repository caused the creation of three objects. Each object is uniquely identified by a 40-character SHA-1 hash of its content (plus a small header); the first two characters name the subdirectory under .git/objects and the remaining 38 name the file. This brings us to one of the key aspects of git: it’s nearly impossible to alter the contents of any object without changing its cryptographic hash. Unlike version control systems that pre-date this approach of cryptographically ascertaining the integrity of the content, git makes it quite hard to tamper with a file or maliciously change history without being noticed. This, coupled with the ability to sign tags using a private key, adds an additional level of authenticity and non-repudiation to the release process.

Let’s analyze the three types of objects. To see the type of object, the git cat-file -t HASH command can be used. It shows that the three types of objects are:

  • blob
  • commit
  • tree

To see the contents of each object, the git cat-file -p HASH command can be used as shown below:

$ git cat-file -p 5716ca5987cbf97d6bb54920bea6adde242d87e6
bar

This is the first of the three objects, which is the “blob”. It is the actual contents of the file. Note that the file is addressable using the hash, making this structure a content-addressable filesystem. But you may wonder, how does git know what the file name is? This object is only named by the hash. I will get to that shortly.
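
To make the “addressable by hash” part concrete, here is a minimal Python sketch of how the blob’s name can be derived from its content. Git hashes a short header (the object type, the content length and a NUL byte) followed by the content itself; the function names below are my own and not part of git.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the raw content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_id: str) -> str:
    """Loose objects live under a directory named after the first two hex digits."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# echo "bar" writes the three letters plus a trailing newline.
blob_id = git_object_id("blob", b"bar\n")
print(blob_id)               # 5716ca5987cbf97d6bb54920bea6adde242d87e6
print(object_path(blob_id))  # .git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
```

Because the name depends only on the type and the bytes, any file anywhere containing “bar” and a newline ends up as this same blob.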

Let’s look at the next object.

$ git cat-file -p 64f3e9762509b0ce9cbb252f69847957e5368632
tree 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
author Anuradha Weeraman 1358159197 +0530
committer Anuradha Weeraman 1358159197 +0530

initial commit

This is the “commit” object, which is also stored as an object in the file system. Note that there are two fields for the author and the committer, since the two can be different individuals in the case of a large distributed development project. This way original contributions are acknowledged and not lost during the merging and contribution incorporation process. This file also has a hash reference to the commit “tree”. Let’s look at the tree object next.
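
The layout of a commit object is plain text: header lines of the form “key value”, a blank line, then the message. As a rough sketch (the parse_commit helper is hypothetical, and the sample is the cat-file output above exactly as it appears in this post), it can be picked apart like this:

```python
def parse_commit(text: str) -> dict:
    """Split cat-file -p output for a commit into header fields and a message."""
    headers, _, message = text.partition("\n\n")
    fields = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # A commit may have zero, one or several parents.
            fields["parents"].append(value)
        else:
            fields[key] = value
    return fields


sample = (
    "tree 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae\n"
    "author Anuradha Weeraman 1358159197 +0530\n"
    "committer Anuradha Weeraman 1358159197 +0530\n"
    "\n"
    "initial commit\n"
)
parsed = parse_commit(sample)
print(parsed["tree"], parsed["parents"], parsed["message"])
```

The root commit has an empty list of parents, which is what sets it apart from every commit that follows.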

$ git cat-file -p 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

This is the last of the three objects, which is the “tree” object. It contains a descriptor of all the files that are part of the commit. It does that by taking the information from the staging area / index and creating an object at the time of the commit. It shows the mode of the file in a somewhat different format to the standard UNIX file permissions; the last three digits tell you what the permissions of the file were at the time it was committed. The line also indicates the hash of the blob followed by the name of the file. This is how git knows what the blob should be called in the file system when the code is checked out.
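
What cat-file -p prints for a tree is a friendly rendering; on disk, each entry is the mode and name as text, a NUL byte, and then the blob hash as 20 raw bytes. The following Python sketch (helper names are mine) rebuilds the single-entry tree from this example and decodes the mode with the standard stat module:

```python
import hashlib
import stat


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the raw content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """'<mode> <name>\\0' followed by the 20 raw bytes of the object id."""
    return f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)


entry = tree_entry("100644", "foo", "5716ca5987cbf97d6bb54920bea6adde242d87e6")
tree_id = git_object_id("tree", entry)
print(tree_id)  # 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

# 100644 is octal: "100" marks a regular file, "644" is rw-r--r--.
print(stat.filemode(int("100644", 8)))  # -rw-r--r--
```

A tree with several files would simply concatenate such entries, sorted by name, before hashing.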

Let’s also take a look at what HEAD is pointing to:

$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
64f3e9762509b0ce9cbb252f69847957e5368632

It now has a reference to the last “commit” object. So when you clone or pull down master, git knows which commit was last introduced into the repository.
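
The two cat commands above amount to a tiny lookup: read HEAD, and if it says “ref: …”, read the file it names. Here is a hedged Python sketch of that lookup, run against a throwaway directory that mimics the layout shown above. The resolve_head function is my own, and it deliberately ignores refs that git has packed into .git/packed-refs.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD through symbolic refs until a commit hash is reached."""
    value = (git_dir / "HEAD").read_text().strip()
    while value.startswith("ref: "):
        value = (git_dir / value[len("ref: "):]).read_text().strip()
    return value


# Recreate just enough of a .git directory to demonstrate the lookup.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "64f3e9762509b0ce9cbb252f69847957e5368632\n"
    )
    resolved = resolve_head(git_dir)

print(resolved)  # 64f3e9762509b0ce9cbb252f69847957e5368632
```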

All I’ve described so far was a single commit. How does git keep track of the history and the commit graph based on this structure, you might wonder. Let’s make a change to the foo file and commit it.

$ echo foo > foo
$ git add foo
$ git commit -m "Second commit"
[master 2c8200f] Second commit
1 files changed, 1 insertions(+), 1 deletions(-)
$ find .git/objects -type f
.git/objects/20/5f6b799e7d5c2524468ca006a0131aa57ecce7
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
.git/objects/2c/8200f75860bede9aaa0c156c133d15fa418bd5

.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/64/f3e9762509b0ce9cbb252f69847957e5368632
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

There are three new objects in the system now, a new blob, a tree, and a commit. The blob and tree objects are similar to the ones discussed earlier, but there’s a change to the commit object:

$ git cat-file -p 2c8200f75860bede9aaa0c156c133d15fa418bd5
tree 205f6b799e7d5c2524468ca006a0131aa57ecce7
parent 64f3e9762509b0ce9cbb252f69847957e5368632
author Anuradha Weeraman 1358161997 +0530
committer Anuradha Weeraman 1358161997 +0530

Second commit

It references the parent commit. This way the entire commit graph can be traversed and mapped using these commit objects. The .git/refs/heads/master file is updated to refer to the latest commit. git reflog is a very useful tool which shows the updates to HEAD over time and can be used to diagnose issues which you might otherwise consider unrecoverable. Git is very protective of data, so it’s actually quite hard to lose data unless you manually trash the object repository. On most occasions, the “lost” work turns out to be a dangling, unreferenced commit which you can track down using git reflog and recover. Here’s a post that explains this process for those who are interested.
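
Traversing history is then just a matter of following parent links from the commit a branch points at. A minimal sketch, assuming the parent lists have already been read out of the commit objects (the dictionary below is filled in by hand from the two commits in this post, and walk_history is a name I made up):

```python
def walk_history(parents_of: dict, start: str) -> list:
    """Return every commit reachable from start, visiting each one once."""
    seen, order, stack = set(), [], [start]
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        order.append(commit_id)
        # Merge commits have several parents; all of them get visited.
        stack.extend(parents_of.get(commit_id, []))
    return order


parents_of = {
    "2c8200f75860bede9aaa0c156c133d15fa418bd5": [
        "64f3e9762509b0ce9cbb252f69847957e5368632"
    ],
    "64f3e9762509b0ce9cbb252f69847957e5368632": [],  # root commit
}
log = walk_history(parents_of, "2c8200f75860bede9aaa0c156c133d15fa418bd5")
print(log)
```

Anything not reachable this way from some ref is what git calls dangling, which is exactly the kind of commit the reflog helps you find again.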

Now, to make things a little more interesting and to create some awareness of what the git utilities are doing behind the scenes to make our lives easy, let’s create these objects manually using a few low level commands with the help of this new knowledge that we just acquired. For the purpose of this exercise, I will create a brand new repository and initialize git.

Let’s create the blob object for the file “foo” with the content “bar” as in the original example:

$ echo bar | git hash-object -w --stdin
5716ca5987cbf97d6bb54920bea6adde242d87e6

The -w switch tells git to write the object to the repository, and --stdin instructs it to read the contents from standard input. It then outputs the hash of the object that it just created.

Let’s look at the repository to see if it really was created:

$ find .git/objects -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6

So far git has been telling us the truth.
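
For the curious, the file that just appeared is nothing more than the header and content, compressed with zlib. The sketch below imitates what hash-object -w does for a loose object and reads the result back. It writes into a temporary directory rather than a real repository, and the two helper functions are my own illustration, not git’s code.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store '<type> <size>\\0<content>' zlib-compressed under its SHA-1 name."""
    store = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    object_id = hashlib.sha1(store).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str):
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress((objects_dir / object_id[:2] / object_id[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    written_id = write_loose_object(objects_dir, "blob", b"bar\n")
    obj_type, content = read_loose_object(objects_dir, written_id)

print(written_id)         # 5716ca5987cbf97d6bb54920bea6adde242d87e6
print(obj_type, content)  # blob b'bar\n'
```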

Now, let’s create a tree object. Since git relies on the index (the staging area) to determine the contents of the tree, we will use the git update-index command to set things up in the staging area. Note that the current directory is still empty; there is no “foo” file in it. The content is only available as a hashed object inside .git, and git still doesn’t know it’s called “foo”. To update the staging area so that the tree object can be written:

$ git update-index --add --cacheinfo 100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

Together with the hash-object step, this is equivalent to performing git add foo. Now git knows the file name of the object, but the tree object is not yet written to the object repository. To do that:

$ git write-tree
6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

This writes the tree object, and returns its hash. Let’s look at the file system again:

$ find .git/objects -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
$ git cat-file -p 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

Still, the working directory does not contain a “foo” file. Right now these objects are dangling, as there’s no commit object referencing them. It’s not possible to check out a copy of the foo file yet. Let’s create the commit object now:

$ echo "initial commit" | git commit-tree 6a09c5
c3352776341945bcdddd400d3765635bb2be5671

The short hash of the tree object and, optionally, any preceding (parent) commits are passed in as arguments to the git commit-tree command, which returns the hash of the commit object. At this point the repository still has no idea what the last commit was, so performing the git log command would result in an error:

$ git log
fatal: bad default revision 'HEAD'

To fix this:

$ echo c3352776341945bcdddd400d3765635bb2be5671 > .git/refs/heads/master

Let’s look at the log again:

$ git log
commit c3352776341945bcdddd400d3765635bb2be5671
Author: Anuradha Weeraman
Date: Mon Jan 14 18:06:51 2013 +0530

initial commit

There you have it. Git now recognizes your last commit.

If you now list the directory where you initialized the git repository, you would not notice any files, since all these objects were created directly in the git object repository. Now that we have created the commit object and the log shows the last commit, we’re able to load the file into the directory to create a working copy. The way we do that is by resetting the working directory to HEAD, which points at the latest commit.

To illustrate this more clearly:

$ ls -a
. .. .git (empty directory)
$ git reset --hard
HEAD is now at c335277 initial commit
$ ls -a
. .. .git foo
$ cat foo
bar

And voilà.

Hope this helps, and you now have a better understanding of the git guts.

Firefox + GPG

FireGPG is a neat little Firefox plugin that acts as a front-end for GPG and provides seamless integration with Gmail. Once it is installed and Gmail support is enabled (which it is, by default), a series of signing/encryption related buttons will appear at the top of the Compose Mail page.

It also lets you easily encrypt or sign any selected text area on a web page.

It’s a very intuitive and effective plugin. I just wish I had stumbled upon this sooner.

Selling out

A supposedly authentic Enigma is up for grabs on eBay. The current bid stands at $13.5k, which is strange ‘cos I thought something of this rarity would go for much much more. Millions even. It’s the three-rotor variety and ships all the way from Germany.

Gah, my day is ruined.

Update: … going twice … SOLD! to the creepy looking fed in the corner for $67,480.29.

Virtual Private Networking

I’m sure most of you have had to mess around with VPNs at some point in your lives. Sometimes, VPNs can turn nasty and bind you to an OS that hinders your free spirit. But thanks to IPSec, that doesn’t always have to be the case. For instance, assuming your place of work has set up Cisco-based VPN concentrators, connecting to them from Linux is quite simple with the help of vpnc. Cisco, being somewhat of an open source friendly hardware manufacturer, has released their VPN client software for Linux as a free download so long as you use it with their products. vpnc, on the other hand, is an open source alternative that is very easy to configure and a delight to work with.

Once again, it’s just a matter of

apt-get install vpnc resolvconf

Then you need to add the following lines to /etc/vpnc/vpnc.conf:

IPSec gateway XX.XX.XX.XX
IPSec ID MegaCorpNetwork
IPSec secret ThisIsAPlaintextPassword
Xauth username myuserid

The gateway is the IP address of the VPN concentrator. If your trusty MIS department has already set up the Cisco VPN client on Windows (such as in a dual-booting scenario in my case), you can extract this information from the profile file that gets created. It should reside somewhere in the neighbourhood of “Program Files”, under the Cisco VPN Client installation folder, and within the Profiles subdirectory. There’s one small caveat: the group password, which corresponds to the “IPSec secret” field in vpnc.conf, is usually “encrypted” on Windows. But have no fear, for it can be undone. This is a well known flaw, and the group password encryption is practically redundant. I recommend that you download the C program and run it locally instead of using the form on the web page to decrypt it.

Once you have the plaintext, plug it into your vpnc.conf.

Also, vpnc requires TUN/TAP device driver support in the kernel, but the good news is that it comes standard with most distributions. At least the ones I’ve tried out so far. If not:

modprobe tun

Failing which, you’d need to do a bit o’ kernel compilin’.

That out of the way, you’ll also need resolvconf to set up your /etc/resolv.conf so that you’ll be able to resolve hostnames properly on the various networks you’re connected to. Later on, if you find out that your hostnames aren’t resolving, /etc/resolvconf/interface-order is probably a good place to start troubleshooting.

Depending on the version of vpnc you’re using, you can connect to the VPN using either vpnc or vpnc-connect. I noticed that the latter has been deprecated in the most recent versions, but the following should work no matter which version you use:

vpnc /etc/vpnc/vpnc.conf

If you have a static xauth password (which btw, is a very very bad idea) you could either hardcode it in vpnc.conf (again, bad idea) or have it prompted by not specifying it in the config file, as shown in the sample above. Xauth authentication shouldn’t be relied upon solely, and should be complemented with some form of two-factor authentication for maximum security.
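
If you want to check a config file for a hardcoded password before sharing it, something like the following Python sketch would do. It is an illustration only: the parser handles just the keys used in this post plus “Xauth password” (the key I understand vpnc to use for a static password), it treats lines starting with # as comments, and the function names are my own.

```python
KNOWN_KEYS = (
    "IPSec gateway",
    "IPSec ID",
    "IPSec secret",
    "Xauth username",
    "Xauth password",
)


def parse_vpnc_conf(text: str) -> dict:
    """Map each recognised multi-word key to the value that follows it."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for key in KNOWN_KEYS:
            if line.startswith(key + " "):
                settings[key] = line[len(key) + 1:].strip()
                break
    return settings


def has_hardcoded_password(settings: dict) -> bool:
    """True if the static xauth password is stored in the file."""
    return "Xauth password" in settings


sample_conf = """\
IPSec gateway XX.XX.XX.XX
IPSec ID MegaCorpNetwork
IPSec secret ThisIsAPlaintextPassword
Xauth username myuserid
"""
settings = parse_vpnc_conf(sample_conf)
print(settings["IPSec ID"], has_hardcoded_password(settings))
```

With the sample configuration from this post the check comes back False, which means vpnc will prompt for the password at connect time.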

If all goes according to plan, you’ll be prompted with a legal disclaimer from the network you’re connecting to and all the routes will be automatically set up.

To log off from the vpn, simply issue vpnc-disconnect, and you’ll be back to where you started from.