Thursday, October 17, 2013

Why I will go to CCC13 in Amsterdam

Aside from the fact that I work full-time on Apache CloudStack, that I am on the organizing committee and that my boss would kill me if I did not go to the CloudStack Collaboration Conference, there are many great reasons why I want to go as an open source enthusiast. Here is why:

It's Amsterdam and we are going to have a blast (the city of Amsterdam is even sponsoring the event). The venue, Beurs Van Berlage, is terrific: it is the same venue where the Hadoop Summit is held and where the AWS Benelux Summit took place a couple of weeks ago. We are going to have a 24/7 Developer room (thanks to CloudSoft) where we can meet to hack on CloudStack and its ecosystem, three parallel tracks in other rooms and great evening events. The event is made possible by the amazing local support from the team at Schuberg Philis, a company that has DevOps in its veins and organized DevOps Days Amsterdam. I am not being very subtle in acknowledging our sponsors here, but hey, without them this would not be possible.

On the first day (November 20th) is the Hackathon sponsored by exoscale. In parallel to the hackathon, new users of CloudStack will be able to attend a full-day bootcamp run by the super competent guys from Shapeblue. They also play guitar and drink beers, so make sure to hang out with them :). Just as cool is that the CloudStack community recognizes that building a Cloud takes many components, so we will have a Jenkins workshop and an elasticsearch workshop. I am a big fan of elasticsearch, not only for keeping your infrastructure logs but also for other types of data. I actually store all CloudStack emails in an elasticsearch cluster. Jenkins of course is at the heart of everyone's continuous integration systems these days. Seeing those two workshops, it will be no surprise to see a DevOps track over the next two days.

Kicking off the second day (the first day of talks), we will have a keynote by Patrick Debois, the Jedi master of DevOps. We will then break up into a user track, a developer track, a commercial track and, for this day only, a DevOps track with a 'culture' flavor. The hard work will begin: choosing which talk to attend. I am not going to go through every talk; we received a lot of great submissions and choosing was hard. New CloudStack users or people looking into using CloudStack will gain a lot from the case studies being presented in the user track, while the developers will get a deep dive into the advanced networking features of CloudStack, including SDN support, right off the bat. In the afternoon, the case studies will continue in the user track, including a talk from NTT about how they built an AWS compatible cloud. I will have to head to the developer track for a session on 'interfaces' with a talk on jclouds, a talk on a new GCE interface that I worked on, and my own talk on Apache libcloud, for which I worked a lot on the CloudStack driver. The DevOps track will have an entertaining talk by Michael Ducy from Opscode and some real-world experiences by John Turner and Noel King from Paddy Power, and the VP of engineering for Citrix CloudPlatform will lead an interactive session on how to best work with the open source community of Apache CloudStack.

After recovering from the night's events, we will head into the second day with another entertaining keynote by John Willis. Here the choice will be hard between the storage session in the commercial track and the 'Future of CloudStack' session in the developer track. With talks from NetApp and SolidFire, who have each developed a plugin in CloudStack, plus our own Wido Den Hollander (PMC member), who wrote the Ceph integration, the storage session will rock. But the 'Future of CloudStack' session will be key for developers, talking about frameworks, integration testing, system VMs... After lunch the user track will feature several intro-to-networking talks. Networking is the most difficult concept to grasp in clouds (IMHO). The storage session will continue with a talk by Basho on RiakCS (also integrated in CloudStack) and a panel. The dev track will be dedicated to discussions on PaaS, not to be missed if you ask me, as PaaS is the next step in Clouds. To wrap things up, I will have to decide between a session on metering/billing, a discussion on hypervisor choice and support, and a presentation on the CloudStack community in Japan, after Ruv Cohen talks about trading cloud commodities.

The agenda is loaded and ready to fire. It will be tough to decide which sessions to attend, but you will come out refreshed and energized, with lots of new ideas to evolve your IT infrastructure. So one word: Register

And of course many thanks to our sponsors: Citrix, Schuberg Philis, Juniper, Sungard, Shapeblue, NetApp, cloudSoft, Nexenta, iKoula, leaseweb, solidfire, greenqloud, atom86, apalia, elasticsearch, 2source4, iamsterdam, cloudbees and 42on

Tuesday, October 01, 2013

A look at RIAK-CS from BASHO

Playing with Basho Riak CS Object Store

CloudStack deals with the compute side of an IaaS. The storage side, which for most of us these days consists of a scalable, fault-tolerant object store, is left to other software. Ceph, led by Inktank, and RiakCS from Basho are the two most talked about object stores these days. In this post we look at RiakCS and take it for a quick whirl. CloudStack integrates with RiakCS for secondary storage, and together they can offer an EC2 and a true S3 interface, backed by a scalable object store. So here it is.

While RiakCS (Cloud Storage) can be seen as an S3 backend implementation, it is based on Riak. Riak is a highly available distributed NoSQL database. The use of a consistent hashing algorithm allows Riak to re-balance the data when nodes disappear (e.g. fail) and when nodes appear (e.g. increased capacity). It also allows Riak to manage replication of data with the eventual consistency principle typical of large-scale distributed storage systems, which favor availability over consistency.
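
To make the re-balancing idea concrete, here is a minimal sketch of a consistent-hash ring in Python. It is the textbook variant with hashed virtual nodes, not Riak's actual partition-claim algorithm, and the `HashRing` class, the `moved_keys` helper and the `vnodes` parameter are names made up for this illustration. The point it shows is that adding a node only moves the keys that the new node takes over.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (a 160-bit SHA-1 integer)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustration only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = _hash("%s#%d" % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Drop every point owned by this node (e.g. the node failed)."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_keys(before, after, keys):
    """Keys whose owner differs between two rings."""
    return [k for k in keys if before.lookup(k) != after.lookup(k)]


keys = ["images/%d.jpg" % i for i in range(1000)]
four = HashRing(["dev1", "dev2", "dev3", "dev4"])
five = HashRing(["dev1", "dev2", "dev3", "dev4", "dev5"])
moved = moved_keys(four, five, keys)
print("%d of %d keys moved, all to dev5" % (len(moved), len(keys)))
```

With a naive `hash(key) % node_count` scheme, going from four to five nodes would reshuffle most keys; here only the share claimed by the new node changes hands.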

To get a functioning RiakCS storage we need Riak, RiakCS and Stanchion. Stanchion is a layer that serializes the HTTP requests made to RiakCS that must be globally unique, such as user and bucket creation.

A taste of Riak

To get started, let's play with Riak and build a cluster on our local machine. Basho has some great documentation. The toughest thing will be to install Erlang (and by tough I mean a two-minute deal), but again the docs are very helpful and give step-by-step instructions for almost every OS.

There is no need for me to re-create step-by-step instructions since the docs are so great, but the gist is that with the quickstart guide we can create a Riak cluster on `localhost`. We are going to start five Riak nodes (we could start more) and join them into a cluster. This is as simple as:

    bin/riak start
    bin/riak-admin cluster join dev1@127.0.0.1

Where `dev1` was the first riak node started. Creating this cluster will re-balance the ring:

    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid     100.0%     20.3%    'dev1@127.0.0.1'
    valid       0.0%     20.3%    'dev2@127.0.0.1'
    valid       0.0%     20.3%    'dev3@127.0.0.1'
    valid       0.0%     20.3%    'dev4@127.0.0.1'
    valid       0.0%     18.8%    'dev5@127.0.0.1'

The `riak-admin` command is a nice CLI to manage the cluster. We can check the membership of the cluster we just created; after some time the ring will have re-balanced to the expected state:

    dev1/bin/riak-admin member-status
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      62.5%     20.3%    'dev1@127.0.0.1'
    valid       9.4%     20.3%    'dev2@127.0.0.1'
    valid       9.4%     20.3%    'dev3@127.0.0.1'
    valid       9.4%     20.3%    'dev4@127.0.0.1'
    valid       9.4%     18.8%    'dev5@127.0.0.1'
    -------------------------------------------------------------------------------
    Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
   
    dev1/bin/riak-admin member-status
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      20.3%      --      'dev1@127.0.0.1'
    valid      20.3%      --      'dev2@127.0.0.1'
    valid      20.3%      --      'dev3@127.0.0.1'
    valid      20.3%      --      'dev4@127.0.0.1'
    valid      18.8%      --      'dev5@127.0.0.1'
    -------------------------------------------------------------------------------
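
The percentages in the final table are just an even split of the ring partitions. A small sketch, assuming the default ring size of 64 partitions (the ring size is not shown in the output above, so treat it as an assumption; `ownership_percentages` is a helper name made up here):

```python
def ownership_percentages(ring_size, node_count):
    """Split ring_size partitions as evenly as possible; return percent per node."""
    base, extra = divmod(ring_size, node_count)
    # The first `extra` nodes take one partition more than the others.
    counts = [base + 1 if i < extra else base for i in range(node_count)]
    return [round(100.0 * c / ring_size, 1) for c in counts]


print(ownership_percentages(64, 5))  # [20.3, 20.3, 20.3, 20.3, 18.8]
```

Under the same assumption, the intermediate state (62.5% and four times 9.4%) corresponds to 40 partitions still on `dev1` and 6 already handed to each of the other nodes.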

You can then test your cluster by putting an image as explained in the docs and retrieving it in a browser (i.e. with an HTTP GET):

    curl -XPUT http://127.0.0.1:10018/riak/images/1.jpg \
         -H "Content-type: image/jpeg" \
         --data-binary @image_name_.jpg

Open the browser to `http://127.0.0.1:10018/riak/images/1.jpg`. As easy as 1..2..3
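
The same PUT and GET can be scripted. Below is a minimal sketch using only the Python standard library; the URL is the one from the curl example, while `build_put` and `store_and_fetch` are helper names invented for this illustration. `store_and_fetch` needs the local cluster to be running.

```python
import urllib.request

# HTTP endpoint of dev1, as used in the curl example above.
RIAK_URL = "http://127.0.0.1:10018/riak"


def build_put(bucket, key, data, content_type):
    """Build (but do not send) the HTTP PUT that stores `data` under bucket/key."""
    return urllib.request.Request(
        url="%s/%s/%s" % (RIAK_URL, bucket, key),
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def store_and_fetch(bucket, key, data, content_type):
    """Send the PUT, then GET the object back and return its bytes."""
    with urllib.request.urlopen(build_put(bucket, key, data, content_type)):
        pass
    with urllib.request.urlopen("%s/%s/%s" % (RIAK_URL, bucket, key)) as response:
        return response.read()
```

With the cluster up, `store_and_fetch("images", "1.jpg", open("image_name_.jpg", "rb").read(), "image/jpeg")` should hand back the bytes that were just stored.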

Installing everything on Ubuntu 12.04

To move forward and build a complete S3 compatible object store, let's set everything up on an Ubuntu 12.04 machine. Back to installing `riak`: get the repo keys and set up a `basho.list` repository:

    curl http://apt.basho.com/gpg/basho.apt.key | sudo apt-key add -
    sudo bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"
    sudo apt-get update

And grab `riak`, `riak-cs` and `stanchion`. I am not sure why but their great docs make you download the .debs separately and use `dpkg`.

    apt-get install riak riak-cs stanchion
 
Check that the binaries are in your path with `which riak`, `which riak-cs` and `which stanchion`; you should find everything in `/usr/sbin`. All configuration will be in `/etc/riak`, `/etc/riak-cs` and `/etc/stanchion`. Inspect especially the `app.config` files, which we are going to modify before starting everything. Note that all binaries have a nice usage description, which includes a console, a ping method and a restart among others:

    Usage: riak {start | stop| restart | reboot | ping | console | attach | 
                        attach-direct | ertspath | chkconfig | escript | version | 
                        getpid | top [-interval N] [-sort reductions|memory|msg_q] [-lines N] }

Configuration

Before starting anything we are going to configure every component, which means editing the `app.config` files in each respective directory. For `riak-cs` I only made sure to set `{anonymous_user_creation, true}`. I did nothing to configure `stanchion`, as I used the default ports and ran everything on `localhost` without `ssl`. Just make sure that you are not running any other application on port `8080`, as `riak-cs` will use this port by default. For configuring `riak` see the documentation; it sets a different backend than the one we used in the `tasting` phase :) With all this configuration done you should be able to start all three components:

    riak start
    riak-cs start
    stanchion start
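
If you want to script the port `8080` check mentioned above before running `riak-cs start`, a quick sketch with the standard library could look like this (`port_in_use` is a name made up for the example, and it only detects TCP listeners reachable on the given host):

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8080)` before starting `riak-cs` should return `False`; once `riak-cs` is up, the same call should return `True`.
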

You can `ping` every component with `riak ping`, `riak-cs ping` and `stanchion ping`; I will let you figure out the console access. Then create an admin user for `riak-cs`:

    curl -H 'Content-Type: application/json' -X POST http://localhost:8080/riak-cs/user \
         --data '{"email":"foobar@example.com", "name":"admin user"}'
 
If this returns successfully, it is a good indication that your setup is working properly. In the response we recognize the API and secret keys:

    {"email":"foobar@example.com",
     "display_name":"foobar",
     "name":"admin user",
     "key_id":"KVTTBDQSQ1-DY83YQYID",
     "key_secret":"2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A==",
     "id":"1f8c3a88c1b58d4b4369c1bd155c9cb895589d24a5674be789f02d3b94b22e7c",
     "status":"enabled"}
 
Let's take those and put them in our `riak-cs` configuration file, where there are `admin_key` and `admin_secret` variables to set. Then restart with `riak-cs restart`. Don't forget to also add them to the `stanchion` configuration file `/etc/stanchion/app.config` and restart it with `stanchion restart`.
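
To avoid copy-paste mistakes, the two values can be pulled out of the JSON response and printed as Erlang tuples ready to paste into the `app.config` files. This is only a sketch: `admin_settings` is a helper name invented here, and the exact place of the tuples inside each `app.config` is left to the Basho docs.

```python
import json


def admin_settings(response_body):
    """Return the admin_key/admin_secret tuples for a user-creation response."""
    user = json.loads(response_body)
    return [
        '{admin_key, "%s"}' % user["key_id"],
        '{admin_secret, "%s"}' % user["key_secret"],
    ]


# Response body as returned by the user-creation call above (trimmed).
response = """{"email":"foobar@example.com",
 "key_id":"KVTTBDQSQ1-DY83YQYID",
 "key_secret":"2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A==",
 "status":"enabled"}"""

for line in admin_settings(response):
    print(line)
```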

Using our new Cloud Storage with Boto

Since Riak-CS is an S3 compatible cloud storage solution, we should be able to use an S3 client like Python boto to create buckets and store data. Let's try. You will need boto of course (`apt-get install python-boto`); then open an interactive shell with `python`. Import the modules and create a connection to `riak-cs`:

    >>> from boto.s3.key import Key
    >>> from boto.s3.connection import S3Connection
    >>> from boto.s3.connection import OrdinaryCallingFormat
    >>> apikey='KVTTBDQSQ1-DY83YQYID'
    >>> secretkey='2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A=='
    >>> cf=OrdinaryCallingFormat()
    >>> conn=S3Connection(aws_access_key_id=apikey,aws_secret_access_key=secretkey,
    ...                   is_secure=False,host='localhost',port=8080,calling_format=cf)
 
Now you can list the buckets, which will be empty at first. Then create a bucket and store content in it under various keys:

    >>> conn.get_all_buckets()
    []
    >>> bucket=conn.create_bucket('riakbucket')
    >>> k=Key(bucket)
    >>> k.key='firstkey'
    >>> k.set_contents_from_string('Object from first key')
    >>> k.key='secondkey'
    >>> k.set_contents_from_string('Object from second key')
    >>> b=conn.get_all_buckets()[0]
    >>> k=Key(b)
    >>> k.key='secondkey'
    >>> k.get_contents_as_string()
    'Object from second key'
    >>> k.key='firstkey'
    >>> k.get_contents_as_string()
    'Object from first key'
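
Rather than fetching keys one by one, you can walk the whole bucket. A minimal sketch, assuming a boto 2 `Bucket` object such as the `bucket` created in the session above (`dump_bucket` is a name made up for this example):

```python
def dump_bucket(bucket):
    """Return {key name: contents} for every object in a boto 2 bucket."""
    # Bucket.list() iterates over Key objects; each exposes .name and
    # get_contents_as_string(), the same call used in the session above.
    return {key.name: key.get_contents_as_string() for key in bucket.list()}
```

In the interactive session, `dump_bucket(bucket)` should return a dictionary holding both `firstkey` and `secondkey` with their contents.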

And that's it: an S3 compatible object store backed by a NoSQL distributed database that uses consistent hashing, all of it in Erlang. Automate all of it with a Chef recipe. Hook that up to your CloudStack EC2 compatible cloud, use it as secondary storage to hold templates or make it a public-facing offering, and you have the second leg of the Cloud: storage. Sweet... In the next post I will show you how to use it with CloudStack.