Monday, May 2, 2011

Cloud and Big Data Meetup

May 1st, 2011, Microsoft Campus, Cloud Computing Camp !
A very Rewarding Experience indeed !

My personal interest was to understand how Cloud Computing, Hadoop and NoSQL help developers solve problems, purely from a programmer's viewpoint.

Food for thought

By the end of the day, it had answered some of the most important questions:
- How to choose a cloud service provider ?
- How to leverage PaaS and IaaS, and what is the thin line between them ?
- Why have companies like Netflix so selectively chosen the Amazon cloud, and how are they building platform services on AWS, such as Cassandra as a Service ?
- How to adopt the open-source CloudFoundry and the Microsoft Azure PaaS ?
- What can OpenStack offer individual developers and start-ups so they can enjoy the fruits of IaaS at little cost ?

And of course there were very thoughtful reflections on how to choose the right CAP trade-off, a deep dive into MongoDB internals, and a talk on a wonderful array of tools that make ops' and devs' lives smoother while building a scalable system !!

Brush up on the basics and sky-rocket into the Clouds

It all started with a great keynote speech from the Cisco CTO.
A quick reminder of the basic properties of a cloud provider:
Shared Resource, Scalability, Self-Service, Multiple Data-Centers, Measurable Resources, Design-for-failure, Auto-Recovery and Centralized monitoring.

These are well-known facts. But the most interesting point to note was the evolution of OpenStack.
- Open eliminates vendor lock-in. It gives you the freedom to federate and move between clouds.

The intricate details of network virtualization and hypervisor integration didn't attract me much ! I was more interested in the tools around open-source clouds.

Convergence of network, compute and storage is going to be the driving force for handling the information explosion and the enormous volume of video transmission in the coming years.

Case Study - Simplify life

Netflix, one of the largest Amazon cloud consumers, shared its pain points and its reasons for migrating to the cloud.
- The cloud gets rid of the 'indefinite wait-cycle' problem of the traditional data center.
- In a data center you keep waiting for 'Permission Grant', 'More Space Allocation', 'Re-organize Capacity', 'Endless meetings with IT' etc. ...

No more waiting ! By 2012 the cloud model will probably be a seamless part of every single software maker's workflow !!
In the case of Netflix, the Amazon API is the IT department !!
The best part is the development-to-deployment flow: build the code, package a WAR, wrap it in an RPM, bake an AMI, launch in the cloud ! Huh !
How did the cloud simplify life ?
Quoting Adrian ....   (Cloud Architecture)
" Central SQL -> Distributed Key/Value NoSQL
  Sticky In-Memory Session -> Shared Memcached Session (for Others cold cache like MySQL Native Memory / Terracotta Big-Table)
  Chatty Protocol -> Latency Tolerant Protocol
  Tangled Service Interface -> Layered Service Interface
  Components as Jars -> Components as Service
  Fat Complex Objects -> Lightweight Serializable Objects .."
Here goes the great story -

On a different context, another interesting article from Adrian Cockcroft on 'creating NoSql service over Amazon Cloud' -

PaaS - the Jewel Box

It was really exciting to learn that Windows Azure is effectively an OS on the web that lets users run any Windows-compatible application, with services such as database and blob storage, NoSQL data storage, VPN, CDN, Service Bus, Access Control and Monitoring.

Anyone interested to try out Azure, can just login at with passcode 'meetup'

Well ! Finally the much-awaited .. CloudFoundry !
VMware-SpringSource means so much to the Java community after the demise of Sun !!!
Now you get Groovy on Grails, Redis, Node.js and plenty of other services out of the box with the CloudFoundry PaaS !
The open source advocate Ezra gave a lightning talk about the internals of CloudFoundry :
"First, the user develops application foo.
vmc push foo -> talks to the Cloud Controller over REST.
The Cloud Controller takes a snapshot of the 'foo' app structure.
It converts this application into a runnable Droplet in a queue.
A Droplet Execution Agent (DEA) node can sit on Amazon / Rackspace, completely abstracted from the user.
The staging process finds which runtime to launch and loads all the infrastructure; for example, it will introspect a WAR file, find the required JDBC driver and load it, and so on.
The staging process sends messages to all DEA nodes to find which one is least loaded and can execute the request.
All components are connected through an ESB.
There is also a HealthManager tool that polls the status table, compares it with the real-world state and broadcasts messages !  ..."   ..  That's a long story cut short !
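The "find the least-loaded DEA node" step from the talk can be pictured with a small Python sketch. This is only my illustration of the idea: the names `DeaNode` and `pick_least_loaded`, and the load metric itself, are hypothetical and are not CloudFoundry's actual API or placement algorithm.

```python
from dataclasses import dataclass


@dataclass
class DeaNode:
    """Hypothetical stand-in for a Droplet Execution Agent node."""
    name: str
    running_droplets: int
    capacity: int

    def load(self) -> float:
        # Fraction of capacity in use; an assumed metric, not CloudFoundry's.
        return self.running_droplets / self.capacity


def pick_least_loaded(nodes):
    """Return the least-loaded node that still has free capacity, or None."""
    candidates = [n for n in nodes if n.running_droplets < n.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.load())


# Made-up nodes; where they are hosted is invisible to the user.
nodes = [
    DeaNode("dea-amazon-1", running_droplets=8, capacity=10),
    DeaNode("dea-rackspace-1", running_droplets=2, capacity=10),
    DeaNode("dea-amazon-2", running_droplets=10, capacity=10),
]
chosen = pick_least_loaded(nodes)
print(chosen.name)
```

In the real system the nodes answer over the message bus rather than sitting in a local list, so treat this purely as a picture of the selection rule.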
Here goes the full story -
Well, if you are lucky enough to get your CloudFoundry passcode, you can start playing with it through the great STS IDE !

Build your Dinosaur to crunch Big Data

Okay ! Now that we have been surfing the clouds for so long .. it's time to understand the best practices for handling Big Data (which may eventually fly on the cloud) !

Paco Nathan of IMVU fame delivered a very valuable lecture !

The Take Away message :
- Select data frameworks based on your data access patterns.
- Relational databases are not a good fit for queues, polling operations, social graphs, data analysis and so on !
I just can't resist highlighting the following from the slides !
How to apply CAP in various scenarios ?
" Financial Transactions - General Ledger in RDBMS - CAx
Ad-hoc queries - hosted MySQL - CAx
Log rotation - Riak - xxP
Search Index - Lucene, Solr - xAP
Static Content Archive - S3 - xAP
Customer facts - Redis, Membase - xAP
Distributed counters, sets - Redis - xAP
CRUD - key/value - CxP
Data preparation - Hadoop / Hive - CxP
Graph Analysis - Hadoop + Redis + Graph - CxP
Data Mart - Hadoop / Hive / HBase - CxP .."
Along the same lines of choosing the right methodology for data analysis based on the data access pattern, Apple threw some light on its in-house analytics flow :
"Sensors connect to Cassandra over REST to push data (click / navigation / other user events).
Aggregators read data from the Cassandra key-value store, aggregate it in an offline batch and replace the data for the keys (compaction) inside the K-V store..."
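To make the push-then-compact flow concrete, here is a minimal Python sketch. It assumes a plain in-memory dict in place of Cassandra, and the function names `push_event` and `compact` are my own, not anything shown in the talk.

```python
from collections import Counter

# In-memory stand-in for the key-value store (Cassandra in the talk).
store = {}


def push_event(key, event_type):
    """Sensor side: append one raw event, stored as (type, count=1)."""
    store.setdefault(key, []).append((event_type, 1))


def compact(key):
    """Aggregator side: sum the counts per event type and replace the
    stored entries for this key with the aggregated ones."""
    totals = Counter()
    for event_type, count in store.get(key, []):
        totals[event_type] += count
    store[key] = sorted(totals.items())


push_event("user:42", "click")
push_event("user:42", "click")
push_event("user:42", "navigation")
compact("user:42")
print(store["user:42"])  # [('click', 2), ('navigation', 1)]
```

Because raw events and aggregates share the same (type, count) shape, the batch can run again after new events arrive without losing the earlier totals.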
Instrument, Measure , Manage Chaos ! Celebrate !

After the thought-provoking sessions, it was time to ride through the Twitter Roller Coaster !
The Mantra
- 'We can be Successful only if We can measure'
- Adopt this early and correctly as per your enterprise system architecture !
- Remember to minimize MTTD (Mean Time to Detect) and MTTR (Mean Time to Recover) !
-  Instrument Everything ! Cache all decoupled data layers ! 
- Use the correct tools
  - For Ruby on Rails, use Passenger (Apache load-balancer fix) and Unicorn (server) in place of Mongrel, and use Google Perf Tools.
  - In general, use Puppet and Chef as configuration tools.
  - Explore these tools and use them as needed .. Whales, Rainbird, Ivy, Artifactory, AppDynamics, EpicNMS..
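
As a small illustration of the "measure" mantra, MTTD and MTTR can be computed from incident timestamps roughly like this. The incident data is made up, and I am assuming that MTTR is measured from the start of the failure to recovery (some teams measure it from detection instead).

```python
from datetime import datetime
from statistics import mean

# (failure started, failure detected, service recovered): made-up sample data
incidents = [
    (datetime(2011, 5, 1, 10, 0), datetime(2011, 5, 1, 10, 4), datetime(2011, 5, 1, 10, 30)),
    (datetime(2011, 5, 1, 14, 0), datetime(2011, 5, 1, 14, 2), datetime(2011, 5, 1, 14, 10)),
]


def mean_minutes(deltas):
    """Average a series of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


# Mean Time to Detect: failure start -> detection.
mttd = mean_minutes(detected - started for started, detected, _ in incidents)
# Mean Time to Recover: failure start -> recovery (assumed definition).
mttr = mean_minutes(recovered - started for started, _, recovered in incidents)

print(f"MTTD: {mttd:.1f} min, MTTR: {mttr:.1f} min")
```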

Enough of appetizer ... the meal is served hot here

Crash Course - learn MongoDB

Next, what could be more enticing than plunging into the internals of MongoDB ?
Alvin Richards is simply great -- !
Sharding can be ridiculously simple: just create a compound key {server:1, application:1, time:1} and add a shard with db.runCommand({addshard :  "shard1"}) ...  Huh !

He reflected on right-balanced indexing, parallel execution of queries, range-based partitions, automatic sharding, consistent hashing, replication with async master/slave, automatic failover through consensus election and many more !
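
To picture range-based partitioning on a compound key like {server, application, time}, here is a simplified Python sketch. It is not MongoDB's actual chunk balancer: the split points, shard names and the `route` function are hypothetical, chosen only to show how an ordered compound key maps to a shard.

```python
from bisect import bisect_right

# Hypothetical split points on the compound key (server, application, time).
# Keys below the first split go to shards[0]; a key equal to a split point
# belongs to the range that starts there.
split_points = [
    ("web05", "", 0),
    ("web10", "", 0),
]
shards = ["shard1", "shard2", "shard3"]


def route(server, application, time):
    """Return the shard whose key range contains the compound key."""
    key = (server, application, time)
    # Tuples compare field by field, just like a compound key ordering.
    return shards[bisect_right(split_points, key)]


print(route("web01", "checkout", 100))  # shard1
print(route("web07", "search", 5))      # shard2
print(route("web12", "login", 42))      # shard3
```

Since the key leads with the server field, all data for one server stays in one contiguous range, which is the property that makes range queries on a shard key cheap.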
For  a technical deep-dive into consistency models look into -
For hands on refer to and follow

Build a successful SaaS Business ! Sweet dreams !

Well ! After the rewarding technical sessions, it was just perfect to wrap up the day with some insights into building a successful SaaS business !

Big thanks to Sebastian and other Event organizers !
