Building Android from source

Building Android from source seems like a daunting task at first, but it’s really not that hard. I’ll walk through the steps as simply as possible to get your first AOSP build running on your handset. This is a rite of passage for many, and guaranteed to give insights into the inner workings of Android. First, let’s do a few push-ups and jumping jacks to mentally and physically prepare for the journey; at the end of it, you’ll wonder why you even had to do that.

What you need:

  • A 64-bit environment
  • ~150 GB of disk space
  • A working Linux installation (I use Debian, but take your pick). I suggest going with bare metal over virtual machines in the interest of sanity and general well being
  • A decent broadband connection (to download a significant portion of the internet)
  • Some time and patience
  • A healthy belief in the supernatural, and their inevitable involvement in the building of large complex codebases

Everything you need to know about the process is right here. Feel free to skim through these docs first to get an overview of the entire process. What’s below is mostly a summary, plus a few things not explicitly mentioned that tend to stump newbies.

Step 1

Prep your OS and install any dependencies. At a minimum you’re going to need Java, Python, a C/C++ toolchain, make, and git, so make sure they’re installed using your favorite package manager. If you’re on Debian/Ubuntu, you’d run something similar to this:

sudo apt-get install openjdk-8-jdk git-core gnupg flex bison gperf build-essential zip curl zlib1g-dev gcc-multilib g++-multilib libc6-dev-i386 lib32ncurses5-dev x11proto-core-dev libx11-dev lib32z-dev ccache libgl1-mesa-dev libxml2-utils xsltproc unzip

If you find that you need something else down the line that’s not mentioned above, be a man/woman, and just apt-get it. Also, your package names may vary depending on your choice of distribution.

Invoke each tool manually (java, git, and so on) as a quick sanity test that everything installed properly.
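If you’d rather not check each tool by hand, a small loop over the binaries named above does the same sanity test (the list of tool names here is mine; adjust it to taste):

```shell
# Sanity check: complain about any required tool that isn't on the PATH
for tool in java python gcc g++ make git curl; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

Silence means everything was found; anything printed is a package you still need to install.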

Step 2

Linux may require some udev configuration to allow non-root users to work with USB devices, which will be required later on, so execute this command:

wget -S -O - http://source.android.com/source/51-android.rules | sed "s/<username>/$USER/" | sudo tee >/dev/null /etc/udev/rules.d/51-android.rules; sudo udevadm control --reload-rules

Step 3 (Optional)

Set up ccache to extract some performance out of the build by putting the following into your .bashrc:

export USE_CCACHE=1
export CCACHE_DIR=/path/to/empty/ccache/directory/where/cache/is/kept

Set the cache size to 50G using the following (note that the prebuilts directory only appears after the source sync in Step 4):

prebuilts/misc/linux-x86/ccache/ccache -M 50G

Step 4

The Android Open Source Project is a behemoth of a code base, with so many third-party open source components and frameworks included that it required its own layer on top of git just to manage the dependencies. That’s what repo is. On the positive side, it’s quite easy to get it working.

Download the repo tool to your ~/bin directory as follows:

$ curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
$ chmod a+x ~/bin/repo

Make sure that ~/bin is in your $PATH. Type repo a few times and watch it barf in your face, saying:

error: repo is not installed.  Use "repo init" to install it here.

Do not fret, this is normal.
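If instead the shell complains that it can’t find repo at all, ~/bin probably isn’t on your PATH yet; a minimal fix (assuming bash) is:

```shell
# Add ~/bin to the PATH for this session; append this line to ~/.bashrc
# to make it permanent
export PATH="$HOME/bin:$PATH"
```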

You see, you have to initialize repo and give it a manifest, which tells repo where to download the source from and helps it initialize itself. You do this by navigating to an empty directory and executing:

repo init -u https://android.googlesource.com/platform/manifest

If you do not specify a branch name with the -b parameter, it will fetch the master branch, which is what we’re going to do here.

You should now see an otherwise empty directory, apart from a .repo directory holding the metadata that was downloaded from the manifest. Now, I’m assuming that you’re probably at home and don’t have complicated proxy arrangements to get to the internet, so I was going to avoid talking about the HTTP_PROXY environment variable that you’d have to set if you do; but I guess it’s all redundant now that I’ve already said it, so I’m just going to move on:

repo sync

This will download the internet. This will run for an inordinate amount of time. A few things you can do while this executes:

  • knit
  • write a book
  • have kids
  • watch the poles melt

A rough approximation (the internet is very divided on this topic) is that the download is anywhere from 12 to 15 GB, which expands to around 34 GB on disk once it’s done. In my experience, I’ve only stayed awake till about the 13 GB mark, so I didn’t quite get to see the transformation to its final glory. Add 50 GB for the ccache, leave some room to spare, and you see why you need a lot of disk space to go through this.

Now you wait.

Step 5

In my case, I needed to run the build on my Nexus 5. The AOSP source tree doesn’t have everything you need to build images specifically for the Nexus 5, so I had to go to this link, scroll down to Nexus 5 and download the Broadcom, LG and Qualcomm binary blobs for the hardware on the Nexus 5. Put them in the root of the Android source tree and execute each one, where it self-extracts. This step is required if you want to run the image on the device later on.
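The unpack-and-extract dance looks roughly like this; the archive name patterns below are illustrative (the real files are versioned per vendor and build), so adapt them to what you actually downloaded:

```shell
# Run from the root of the AOSP tree; archive names are hypothetical
for archive in broadcom-*.tgz lge-*.tgz qcom-*.tgz; do
  [ -e "$archive" ] || continue      # skip patterns that matched nothing
  tar xzf "$archive"                 # leaves behind a self-extracting script
done
for script in extract-*.sh; do
  [ -e "$script" ] || continue
  ./"$script"                        # scroll through and accept the license
done
```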

Step 6

Now comes the compilation step. This is actually the easiest part.

Initialize the build environment by sourcing a script; you can use the bash source command or the good old dot command, whichever strikes your fancy:

. build/envsetup.sh

This will inject a bunch of build-related environment variables into your current shell. Note that you have to run this in every shell that you want to run a build from.

Next…

lunch

This will give a list of targets and allow you to select one. In my case, I selected “aosp_hammerhead-userdebug”, hammerhead being the code name for the Nexus 5.

One more step, and that’s to start the build.

time make -j4

You could easily just say “make”, but I would like to know how long the build took when it eventually finishes, and the -j4 flag indicates the concurrency level for make (rule of thumb: 2 x number of cores). Now you can go for lunch.
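If you’d rather not hard-code the 4, you can derive the job count from the machine; this sketch assumes a Linux box with coreutils’ nproc available:

```shell
# Rule of thumb from above: concurrency = 2 x number of cores
JOBS=$(( $(nproc) * 2 ))
echo "running: make -j$JOBS"
# time make -j"$JOBS"
```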

Things to do while this runs:

  • Read Game of Thrones (all the books)
  • Have a fabulous mid-life crisis
  • Watch Lawrence of Arabia

To be fair, it’s not that bad, just a few hours depending on your setup.

Step 7

Once the building is done, you will have a bunch of files under out/target/product/{device} which you can now start flashing.

Connect your Android phone to the computer and, assuming that all the drivers and the Android SDK are set up (a dependency that I somehow failed to mention before), you should be able to run the following command:

adb reboot bootloader

This can also be achieved by shutting down the device and starting it while pressing a combination of buttons (such as volume down + power on the Nexus 5). This puts you into fastboot mode. On the fastboot screen, pay attention to whether the bootloader is locked; if it is, execute the following to unlock it:

fastboot oem unlock

This will wipe the data on the device. To further clean things up, execute the following:

fastboot format cache
fastboot format userdata

Step 8

Navigate once again to the out/target/product/{device} directory and execute the following to flash the built images to the device:

fastboot flash boot boot.img
fastboot flash system system.img
fastboot flash userdata userdata.img
fastboot flash recovery recovery.img

Reboot the device and you’re all set.

This is just the tip of the iceberg, and hopefully you’ll be able to now play with the internals of Android to understand how things are really stitched together. Good luck on your journey.


Getting setup with the Intel Edison

The Intel Edison is an adorable little system on a chip packed with all the wireless and processing capabilities needed to build wearables and smart things. It features a dual-core Intel Atom processor @ 500 MHz, integrated Bluetooth 4 and wifi, USB controllers, 1GB of RAM and 4GB of eMMC flash memory, all in a tiny package. It also features an onboard 32-bit Intel Quark @ 100MHz that can be used as a micro-controller. Unlike the Raspberry Pi, however, it has no video output capabilities. It runs Linux; specifically, Yocto Linux, which is especially targeted at embedded systems.

If you’re interested in getting to know the Edison, I suggest purchasing a kit such as the Xadow wearable kit for the Intel Edison, which comes with a bunch of sensors and modules that you can use to build some useful applications. It comes with:

  • mini expansion board (a much smaller alternative to the Arduino expansion board)
  • barometer
  • 0.96″ OLED
  • vibration motor
  • NFC sensor with 3 programmable tags
  • a touch sensor
  • 3-axis accelerometer
  • buzzer
  • SD card module
  • breakout board
  • Li-Po battery
  • LED strip
  • FFC and power cables

All in all, a fairly comprehensive set of modules for a hobbyist connected-device or wearables project. The kit does NOT include the Edison itself, which needs to be purchased separately. The folks at SparkFun have created a stackable set of “blocks” that can be used to build small form factor devices in a quite ingenious manner.

Fun fact: the Edison is powered by 3.3 to 4.5v and supports 40 GPIO pins that use 1.8v logic.

Step 1: To get started, take out the Xadow expansion board from the kit. It features a 70-pin Hirose connector on the back where the Edison can be attached.

Expansion board and edison

Step 2: Place the Edison on top and press until you hear a click. You should then have a fairly firmly attached Edison to the Xadow expansion board.

Edison attached to expansion board

Step 3: Take the Xadow programmer module and an FFC (Flat Flexible Cable) from the kit, and flip open the connector locks on both the programmer module and the expansion board. Place the FFC as shown below and close the connector lock to keep it in place. It should now look a little like what you see below. Flip the switch that’s highlighted in the red circle to the right, towards the “Device” label indicated by the arrow.

Programmer module and expansion board

Step 4: Connect two micro-USB cables to the connectors on the programmer module and the other end to the computer. This should power up the Edison.

Step 5: Head over to Intel to download and install the IoT Developer Kit for your operating system. As part of the installation process, it will flash your Edison with Yocto. I’ll be covering the flashing process and Yocto in a little bit more detail in a later post, but for now, let Intel do the magic for you.

Step 6: Your Edison should be mostly setup now. There’s one last thing you may want to do, which is configure it to connect to your home wifi. You should see the boards all lit up by now:

Edison all set

At this point, the only way to connect to it is via a serial connection, made possible through the USB port by the FTDI drivers installed with the IoT developer kit. In fact, on the Mac, you should see a device such as /dev/cu.usbserial-* which will be used to initiate this serial connection.

To get a shell on the Edison, just run:

screen /dev/cu.usbserial-DA00ZEOX 115200 -L

This will initiate a serial connection to the Edison at a baud rate of 115200. Press RETURN a couple of times and you should see something like this:

Poky (Yocto Project Reference Distro) 1.6 edison ttyMFD2

edison login: 

Enter ‘root’ for the login and you’ll be dropped into a root shell on the Edison; by default it does not have a password. You may also notice that the first character you type occasionally gets lost. This happens when the Edison is in low-power mode at the time: the first keystroke wakes the device up and is swallowed in the process.

One thing to note is that exiting a ‘screen’ session is not as straightforward as a telnet or ssh session. You will need to type Ctrl-a followed by \ (backslash) to get a prompt to exit the session.

Finally, to configure wifi on the Edison, run the configure_edison --wifi command:

Configure Edison: WiFi Connection

Scanning: 1 seconds left

0 :     Rescan for networks
1 :     Manually input a hidden SSID
2 :     ZTE
3 :     ninsei


Enter 0 to rescan for networks.
Enter 1 to input a hidden network SSID.
Enter a number between 2 to 3 to choose one of the listed network SSIDs: 3
Is ninsei correct? [Y or N]: y
What is the network password?: ********
Initiating connection to ninsei...
Done. Network access should be available shortly, please check 'wpa_cli status'.
Connected. Please go to 192.168.1.3 in your browser to check if this is correct.
root@edison:~# ping google.com
PING google.com (222.165.163.20): 56 data bytes
64 bytes from 222.165.163.20: seq=0 ttl=58 time=32.917 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 32.917/32.917/32.917 ms
root@edison:~#

This allows you to ssh into your Edison over wifi using:

ssh root@192.168.1.3

And you’re all set. Yocto comes preloaded with Node and gcc, so now you have in your hands a network enabled system on a chip for building that next great smart device.

On Perl and Poetry

I first learnt of Perl in the late 90’s. Sometime around ’98 or ’99. Fresh on the heels of BASIC, I was yearning to try out something new when I heard of Perl. I heard it’s what the Internet ran on and it had an almost mythical air to it that made me want to learn it. If you wanted to build dynamic web sites at that time, you had few options, and Perl, Apache and UNIX was the workhorse. I wanted to build dynamic web sites so what I had to do was pretty clear. There was a new fangled thing called Java, but no way was it ever going to catch up to the dominance that Perl had over the Internet. Or so people thought.

Perl was the undisputed king of Internet 1.0. The language, with its knack for text processing coupled with its highly expressive syntax, was ideal for building dynamic web sites. I saw how entwined Perl was in the UNIX sub-culture and how naturally it fit in, and how, together with Apache/mod_perl, it was poised to reign over the Internet for years to come. I then drifted into the world of enterprise Java and progressed from the monstrosity that was J2EE to the present day JEE, which has since redeemed itself and paid for its early sins; by the time I came back, several years had passed and Perl had been relegated to a position the new kids considered old and dead. However, nothing could be further from the truth.

In the syntax of Perl, often misunderstood and called cryptic or arcane by those new to the language, there’s an elegance and a beauty that is not always present in other languages, and I find that I enjoy hacking on a Perl script more than chipping away at the Java mega-structures. Its expressiveness, and the way you can mold the code to fit your pattern of thought through the many variations and permutations the syntax offers, plays a large part in this sense of aesthetic. There’s something about the language that’s reminiscent of a Bach fugue and poetry. I certainly do not feel the same way about Python, although Ruby comes a little close.

I don’t think I will ever stop coding Perl, and Perl 6 has a number of interesting language elements that I hope someday I will get to see, possibly running on GNU/Hurd. Now wouldn’t that be a sight to behold?

Google App Engine + APNS

Earlier this month, Google App Engine released support for outbound sockets, and I figured that a Saturday spent mucking around with App Engine to see if I could get it to work with APNS would be time well spent. In the sandboxed world of GAE, the lack of outbound socket support meant that it was not possible to communicate with external services by opening a socket, which is what the Apple Push Notification Service (APNS) requires. So for a long time, it was not possible to use App Engine to build an APNS provider, but now you can. Services like Urban Airship expose this capability in a way that can be consumed through a RESTful service, which works with GAE using UrlFetch, but the focus of this post is to communicate with APNS directly. There are some caveats though. Billing needs to be enabled, although the free tier should be sufficient for playing around, and there’s also the matter of the daily quota.

Here’s a whirlwind tour of getting yourself up and running on APNS with Google AppEngine.

1 – Fun with certificates and keys

Apple makes the job of working with APNS quite a fun and intellectually stimulating experience, if you have nothing else to do on a Saturday. You may also notice a couple of new gray hairs once you’re done, but at the same time, there is an elegance to the architecture that must be acknowledged, even though it’s painful to set up.

Generate a new certificate signing request
Fire up the mac Keychain Access tool and request a certificate from a certificate authority.
Request a certificate from a CA
In the resulting dialog, enter your email address and an identifiable string in the common name field. Also, select the “Saved to disk” option, since we need to upload it later to the provisioning portal.
Certificate assistant
Once you’re done with this, you should have a Certificate Signing Request (CSR) in your file system.

Create a new App Id
Now head over to the Apple developer site, log in with your developer credentials and navigate to the iOS Dev Center, where you should see a link to “Certificates, Identifiers and Profiles” as shown below.
iOS Developer Program
First, create a new App Id, by navigating to that section:
New App Id
In the add screen, enter any description and select the “Push Notifications” check box:
Push notifications
Also, in the bundle ID section, remember to include an explicit fully qualified bundle Id in the reverse domain notation, as wild-cards are not supported for push notifications:
Bundle Id

Create a new push certificate
Now, navigate to the certificates section, and create a new one. During creation, select the combo box as indicated below:
Development certificate
Next, select the App Id created earlier and, when prompted, upload the Certificate Signing Request created earlier. If all goes well, the certificate will be generated. Download this certificate, and double-click it to open it in the Keychain tool. You should see the private key with the common name that you entered earlier when you expand the certificate. Note that the certificate name is prefixed with “Apple Development iOS Push Services”. Select both the certificate and the key, right-click and “Export 2 items”. It will prompt you for the Keychain password and will generate a .p12 file that you will need later to configure the server-side provider.

Generate a provisioning profile
The last step in this process is to generate a provisioning profile so that you can deploy the app onto the device. In the devices section of the portal, create a new device and enter the 40-character device Id you get from iTunes or the Xcode Organizer. Head over to the Provisioning Profiles section and create a new profile. Remember to select “iOS App Development” as shown below:
Provisioning profile
In the next screens, select the App Id, device and certificate created in the previous steps to create the provisioning profile. Download the profile and drag it onto the profiles section of the Xcode organizer.

Now the painful part is done. Time to do some real work.

2 – Create the web service

A pre-requisite for this tutorial is Google App Engine, and getting a service up and running on it. If you haven’t done that before, follow the steps outlined in the getting started page and it should give you a good idea on how to work on this platform. It comes with good Eclipse integration so it should be a snap to get setup.

The framework I’ve used for APNS is java-apns which provides a simple API to APNS. Here’s all of the code I used to build out the simple service, this could be done in a simple servlet or a RESTful service on a JAX-RS implementation like Jersey for example:


import java.io.InputStream;

import com.notnoop.apns.APNS;
import com.notnoop.apns.ApnsNotification;
import com.notnoop.apns.ApnsService;

// Load the exported .p12 from inside the war (kept under WEB-INF)
InputStream inputStream = context
        .getResourceAsStream("/WEB-INF/ApnsDevelopment.p12");

// Build a service against the APNS sandbox; error detection is disabled
// because GAE will not let the library spawn its monitoring thread
ApnsService service = APNS.newService()
        .withCert(inputStream, "password").withSandboxDestination()
        .withNoErrorDetection().build();

String payload = APNS.newPayload().alertBody(message).badge(1).build();

ApnsNotification notification = service.push(token, payload);

A couple of things to note: the .p12 file exported from the Keychain needs to be included in the war file (preferably in the WEB-INF directory to prevent public access) and password protected at export time. Also, it’s important to call the “withNoErrorDetection()” method as shown above; otherwise the library tries to spawn threads to detect errors, which does not work in the GAE environment since thread creation is restricted. The input to this web service is the 40-character token received from the device, plus the message to be sent.

At this point, the server side work is done. Let’s move over to the client.

3 – Create the iOS client

For the purpose of demonstration and testing, I’ve created a simple single view application with the bundle ID specified in the provisioning profile.

The key methods you would need to implement in the AppDelegate would be:

-application:didFinishLaunchingWithOptions:
-application:didRegisterForRemoteNotificationsWithDeviceToken
-application:didFailToRegisterForRemoteNotificationsWithError
-application:didReceiveRemoteNotification

1) -application:didFinishLaunchingWithOptions:
This method gets invoked when the application finishes launching either directly or when launched through a push notification. In the case of the latter, the details of the push notification are passed in through a dictionary object so that it can be dealt with. Here’s the code to register for push notification alerts:

[[ UIApplication sharedApplication] registerForRemoteNotificationTypes:UIRemoteNotificationTypeAlert | UIRemoteNotificationTypeBadge | UIRemoteNotificationTypeSound];

2) -application:didRegisterForRemoteNotificationsWithDeviceToken
This method gets invoked with the device token received from APNS. This token uniquely identifies the device and is not the same as the UDID. The token needs to be sent to the web service so that it can pass it on to APNS and have messages sent back to this device. The token’s string form includes angle brackets and spaces, which need to be removed as shown below:

NSString *token = [ deviceToken description ];
token = [ token stringByTrimmingCharactersInSet:[ NSCharacterSet characterSetWithCharactersInString:@"<>"]];
token = [ token stringByReplacingOccurrencesOfString:@" " withString:@"" ];

3) -application:didFailToRegisterForRemoteNotificationsWithError
This method gets invoked if there’s some error in registering for remote notifications, which leaves the push token unavailable to the app.

4) -application:didReceiveRemoteNotification
This method can be used to trap an incoming message while in the app, and take some action. In this case it just shows it in an alert view.

UIAlertView *alertView = [[ UIAlertView alloc ] initWithTitle:@"Push Alert" message:userInfo[@"aps"][@"alert"] delegate:self cancelButtonTitle:@"OK" otherButtonTitles:nil];
[ alertView show ];

To test this capability, I’ve built a test app that takes input text from a text field and sends it to the web service created in GAE. The resulting push notification is trapped and displayed in an alert view as shown in the sample code above.

Voila

Finally, a couple of things to keep in mind when developing apps that use push notifications:

  • It’s inherently unreliable; do not use it to transfer any critical information
  • While the transport is secured through TLS, it’s still advisable not to use APNS for company-confidential information
  • Do not store your certificates in a publicly accessible location on the web server, and password-protect them for additional security
  • Store the device tokens safely on the server side; users will be very upset if they’re compromised
  • It’s good practice not to update information in the push notification handler code, since it may trigger updates without the user’s knowledge

That’s all for now. Enjoy!

Git guts

Today I will dive into the guts of git to showcase the simplicity and elegance with which git manages content internally in its own content-addressable file system. Armed with this knowledge, you will gain a deeper understanding of the underlying data structure, which will help you figure out and troubleshoot the issues that will inevitably come up as you use git.

To start, I shall create a new directory and initialize git.

$ mkdir git-guts
$ cd git-guts
$ ls -a
. ..
$ git init
Initialized empty Git repository in /Users/anuradha/dev/workbench/git-guts/.git/
$ ls -a
. .. .git

At this point, there are no files under version control yet. Here are the files that have been created during initialization:

.git
.git/branches
.git/config
.git/description
.git/HEAD
.git/hooks
.git/hooks/applypatch-msg.sample
.git/hooks/commit-msg.sample
.git/hooks/post-commit.sample
.git/hooks/post-receive.sample
.git/hooks/post-update.sample
.git/hooks/pre-applypatch.sample
.git/hooks/pre-commit.sample
.git/hooks/pre-rebase.sample
.git/hooks/prepare-commit-msg.sample
.git/hooks/update.sample
.git/info
.git/info/exclude
.git/objects
.git/objects/info
.git/objects/pack
.git/refs
.git/refs/heads
.git/refs/tags

Of these, the hooks are boilerplate and none are yet active. To make them active, they need to be renamed to remove the .sample suffix.

In this post, I shall focus on the .git/objects directory, as that is where all the content is stored as hashed “objects”. To show what happens, let’s add a file to source control and observe the changes:

$ echo "bar" > foo
$ git add foo
$ git commit -m "initial commit"
[master (root-commit) 64f3e97] initial commit
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo
$ find .git/objects/ -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/64/f3e9762509b0ce9cbb252f69847957e5368632
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

Adding a single file to the repository caused the creation of three objects. Each object is uniquely identified by a 40-character SHA-1 hash of its content. This brings us to one of the key aspects of git: it’s nearly impossible to alter the contents of any single file without changing its cryptographic hash, so unlike version control systems that pre-date this approach of cryptographically ascertaining the integrity of the content, it’s quite hard to tamper with a file or maliciously change history. Coupled with the ability to sign tags using a private key, this adds an additional level of authenticity and non-repudiation to the release process.

Let’s analyze the three objects. To see the type of an object, use the git cat-file -t HASH command. It shows that the three objects have the types:

  • blob
  • commit
  • tree

To see the contents of each file, the git cat-file -p HASH command can be used as shown below:

$ git cat-file -p 5716ca5987cbf97d6bb54920bea6adde242d87e6
bar

This is the first of the three objects, which is the “blob”. It is the actual contents of the file. Note that the file is addressable using the hash, making this structure a content-addressable filesystem. But you may wonder, how does git know what the file name is? This object is only named by the hash. I will get to that shortly.
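As an aside, the hash is not computed over the raw content alone: git prepends a small header consisting of the object type, the content length in bytes, and a NUL byte before hashing. You can verify this by hand (sha1sum is the Linux tool; on a Mac, shasum does the same job):

```shell
# "bar\n" is 4 bytes, so git hashes the string "blob 4\0bar\n"
printf 'blob 4\0bar\n' | sha1sum
# 5716ca5987cbf97d6bb54920bea6adde242d87e6  -
```

The result matches the object id that git itself assigned to the blob above.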

Let’s look at the next object.

$ git cat-file -p 64f3e9762509b0ce9cbb252f69847957e5368632
tree 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
author Anuradha Weeraman 1358159197 +0530
committer Anuradha Weeraman 1358159197 +0530

initial commit

This is the “commit” object, which is also stored as an object in the file system. Note that there are two fields for the author and the committer, since the two can be different individuals in the case of a large distributed development project. This way original contributions are acknowledged and not lost during the merging and contribution incorporation process. This file also has a hash reference to the commit “tree”. Let’s look at the tree object next.

$ git cat-file -p 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

This is the last of the three objects: the “tree” object. It contains a descriptor of all the files that are part of the commit, built from the information in the staging area / index at the time of the commit. It shows the permissions of the file in a format somewhat different to the standard UNIX file permissions; the last three digits tell you what the permissions of the file were at the time it was committed. The line also gives the hash of the blob followed by the name of the file. This is how git knows what the blob should be called in the file system when the code is checked out.

Let’s also take a look at what the HEAD of the tree is pointing to:

$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
64f3e9762509b0ce9cbb252f69847957e5368632

It now has a reference to the last “commit” object. So when you clone or pull master, git knows the last commit that was introduced into the repository.

All I’ve described so far was a single commit. How does git keep track of the history and the commit graph based on this structure, you might wonder. Let’s make a change to the foo file and commit it.

$ echo foo > foo
$ git add foo
$ git commit -m "Second commit"
[master 2c8200f] Second commit
1 files changed, 1 insertions(+), 1 deletions(-)
$ find .git/objects -type f
.git/objects/20/5f6b799e7d5c2524468ca006a0131aa57ecce7
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
.git/objects/2c/8200f75860bede9aaa0c156c133d15fa418bd5

.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/64/f3e9762509b0ce9cbb252f69847957e5368632
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

There are three new objects in the system now, a new blob, a tree, and a commit. The blob and tree objects are similar to the ones discussed earlier, but there’s a change to the commit object:

$ git cat-file -p 2c8200f75860bede9aaa0c156c133d15fa418bd5
tree 205f6b799e7d5c2524468ca006a0131aa57ecce7
parent 64f3e9762509b0ce9cbb252f69847957e5368632
author Anuradha Weeraman 1358161997 +0530
committer Anuradha Weeraman 1358161997 +0530

Second commit

It references the parent commit. This way the entire commit graph can be traversed and mapped using these commit objects. The .git/refs/heads/master file is updated to refer to the latest commit. git reflog is a very useful tool that shows the updates to the HEADs over time and can be used to diagnose issues you might otherwise consider unrecoverable. Git is very protective of data, so it’s actually quite hard to lose anything unless you manually trash the object repository. In most cases, a “lost” commit turns out to be a dangling, unreferenced commit that you can track down using git reflog and recover. Here’s a post that explains this process for those who are interested.
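The recovery dance can be sketched in a throwaway repository; the temp directory, demo identity and the “rescued” branch name are all mine, purely for illustration:

```shell
# Create a repo, make a commit unreachable, then recover it via the reflog
cd "$(mktemp -d)"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "keep me"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "lose me"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1      # "lose me" is now dangling...
git reflog | head -2            # ...but the reflog still remembers the move
git branch rescued "$lost"      # re-anchor the commit on a branch
git log -1 --format=%s rescued  # shows: lose me
```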

Now, to make things a little more interesting, and to build some awareness of what the git utilities do behind the scenes to make our lives easy, let’s create these objects manually using a few low-level commands and the knowledge we just acquired. For the purpose of this exercise, I will create a brand new repository and initialize git.

Let’s create the blob object for the file “foo” with the content “bar” as in the original example:

$ echo bar | git hash-object -w --stdin
5716ca5987cbf97d6bb54920bea6adde242d87e6

The -w switch tells git to write the object to the repository, and --stdin instructs it to read the contents from standard input. It then outputs the hash of the object that it just created.

Let’s look at the repository to see if it really was created:

$ find .git/objects -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6

So far git has been telling us the truth.

Now, let’s create a tree object. Since git relies on the index (the staging area) to determine the contents of the tree, we will use the git update-index command to set things up there. Note that the current directory is still empty: there is no “foo” file in it. The content is only available as a hashed object inside .git, and .git still doesn’t know it’s called “foo”. To update the staging area:

$ git update-index --add --cacheinfo 100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

This is equivalent to performing git add foo. Now git knows the file name of the object, but the tree object is not yet written to the object repository. To do that:

$ git write-tree
6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae

This writes the tree object, and returns its hash. Let’s look at the file system again:

$ find .git/objects -type f
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/6a/09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
$ git cat-file -p 6a09c59ce8eb1b5b4f89450103e67ff9b3a3b1ae
100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6 foo

Still, the repository does not contain a “foo” file. Right now these objects are dangling, as there’s no commit object referencing them. It’s not possible to checkout a copy of the foo file yet. Let’s create the commit object now:

$ echo "initial commit" | git commit-tree 6a09c5
c3352776341945bcdddd400d3765635bb2be5671

The short hash of the tree object, and optionally any preceding commits, are passed as arguments to the git commit-tree command, which returns the hash of the new commit object. At this point the repository still has no idea what the last commit was, so running git log would result in an error:

$ git log
fatal: bad default revision 'HEAD'

To fix this:

$ echo c3352776341945bcdddd400d3765635bb2be5671 > .git/refs/heads/master

Let’s look at the log again:

$ git log
commit c3352776341945bcdddd400d3765635bb2be5671
Author: Anuradha Weeraman
Date: Mon Jan 14 18:06:51 2013 +0530

initial commit

There you have it. Git now recognizes your last commit.

If you now list the directory where you initialized the git repository, you will not see any files, since all these objects were created directly in the git object repository. Now that we have created the commit object and the log shows the last commit, we can load the file into the directory to create a working copy. We do that by resetting the contents of the repository to HEAD, which points at the latest commit.

To illustrate this more clearly:

$ ls -a
. .. .git (empty directory)
$ git reset --hard
HEAD is now at c335277 initial commit
$ ls -a
. .. .git foo
$ cat foo
bar

and Voila.
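For reference, the entire manual walk above can be replayed as one self-contained script. The throwaway directory, the demo identity, and the symbolic-ref step (which pins the branch name to master on newer gits that default to main) are my additions:

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master      # pin the branch name for the demo

# blob: store the content, still unnamed
blob=$(echo bar | git hash-object -w --stdin)

# tree: give the blob a name and permissions via the index
git update-index --add --cacheinfo 100644 "$blob" foo
tree=$(git write-tree)

# commit: wrap the tree with authorship and a message
commit=$(echo "initial commit" |
    GIT_AUTHOR_NAME=demo    GIT_AUTHOR_EMAIL=demo@example.com \
    GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com \
    git commit-tree "$tree")

# point the branch at the commit and materialize the working copy
echo "$commit" > .git/refs/heads/master
git reset -q --hard
cat foo      # prints: bar
```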

Hope this helps, and you now have a better understanding of the git guts.