Cloud Foundry Blog

Future-proofing Your Apps: Cloud Foundry and Node.js

Most real-world applications we ship to consumers or enterprises are multi-year projects. In the cloud era, newer technologies (programming languages, runtimes, frameworks) are created faster than ever. While most of them fail to get any traction, once in a while a technology becomes popular because it solves a problem or set of problems extremely well.

Now in such an era, if you make a large investment for a multi-year project on a PaaS that only supports one technology and some other technology comes along that happens to solve your problem better, then you are stuck. You have unintentionally become a victim of vendor lock-in. The heart of your problem is that your PaaS, and hence your app, was not future-proofed to begin with.

PaaS With 1 set of Technologies

To future-proof your long term project, you should:

  • 1. Use a polyglot PaaS like Cloud Foundry that supports a great mix of both mature technologies and upcoming technologies.Polyglot PaaS
  • 2. Learn about newer and popular technologies like Node.js to:
    • See if they can replace part of your current app (i.e., convert it to a polyglot app).
    • Write future apps for your company in newer technologies using the same PaaS that you are already familiar with.

The remainder of this blog is about the latter option–learning newer and popular technologies, in this case Node.js, to help future-proof your app.

Things to note before you read:

  • While this blog refers to JavaScript frequently, it’s all happening on the server (not in the browser). Think of yourself as a server-side engineer throughout this blog.
  • We will also discuss when not to use Node.js and other similar languages towards the end of the blog.

What is Node.js?

Official definition: “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” - Nodejs.org

Simple (my) definition: A platform that makes writing powerful C/C++ server-side apps easy by essentially wrapping them in JavaScript.

Let’s understand the definition by looking under the hood:

While at a first glance, perhaps because of the name Node.js, it might feel like it is built using JavaScript, but it is not. It simply runs JavaScript on the server. It is about 80 percent C/C++ code and about 20 percent JavaScript code. The C/C++ libraries are responsible for running JavaScript (via Google Chrome V8 JS engine) and providing support for HTTP, DNS, and TCP, etc.,–important server-side functionalities. The proportionally smaller JavaScript code mostly consists of libraries or modules to help make server-side developers’ lives a lot simpler.

Some useful definitions:

  • Chrome V8 Engine: Chrome V8 Engine is Google’s open source, C++ based JavaScript engine that actually runs JavaScript. The Node.js team took this C++ code and added other important libraries like TCP, HTTP and DNS to create Node.js. This is the same engine that is also embedded in the Google Chrome browser that runs JavaScript in the browser as well. This is not yet-another-JavaScript-engine but one that uses innovative techniques like “Hidden Class Transitions,” “JS to Machine Code compilation” and “Automatic GC” to make it one of the fastest JavaScript engines.

    We encourage you to go through Google’s Chrome comic book to learn more about this. Three relevant pages from that book are shown below.

  • Asynchronous I/O and Evented Support (C/C++):  In order to write a fast and scalable server application, we typically end up writing it in a multi-threaded fashion. While you can build great multi-threaded apps in many languages, it usually requires a lot of expertise to build them correctly. On the other hand, these libraries (along with Chrome’s V8 engine) provide a different architecture that hides the complexities of multi-threaded apps while getting the same or better benefits.
  • Let’s compare classic multi-threaded server with an evented, non-blocking I/O server:
  • An example multi-threaded HTTP server using blocking I/O
  • The above diagram depicts a simplified multi-threaded server. There are four users logging into the multi-threaded server. A couple of the users are hitting refresh buttons causing it to use lot of threads. When a request comes in, one of the threads in the thread pool performs that operation, say, a blocking I/O operation. This triggers the OS to perform context switching and run other threads in the thread pool. And after some time, when the I/O is finished, the OS context switches back to the earlier thread to return the result.
  • Architecture Summary: Multi-threaded servers supporting a synchronous, blocking I/O model provide a simpler way of performing I/O. But to handle a heavy load, multi-threaded servers end up using more threads because of the direct association to connections. Supporting more threads causes more memory and higher CPU usage due to more context switching among threads.
  • For more details, we recommend going through Benjamin Erb’s thesis paper here: http://berb.github.com/diploma-thesis/.
  • Event-driven, non-blocking I/O (Node.js server):The above diagram depicts how Node.js server works. At a high level, Node.js server has two parts to it:
  • At the front, you have Chrome V8 engine (single threaded), event loop and other C/C++ libraries that run your JS code and listen to HTTP/TCP requests.
  • And at the back of the server, you have libuv (includes libio) and other C/C++ libraries that provide asynchronous I/O.
  • Whenever a request is made from a browser, mobile device, etc., the main thread running in the V8 engine checks if it is an I/O. if it is an I/O then it immediately delegates that to the backside (kernel level) of the server where one of the threads in the POSIX thread pool actually makes async I/O. Because the main thread is now free, it starts accepting new requests/events.
  • And at some point when the response comes back from a database or file system, the backend piece generates an event indicating that we have a result from I/O. And when V8 becomes free from what it is currently doing (remember it is single-threaded), it takes the result and returns it to the client.
  • Architecture Summary: This architecture utilizes an event loop (main thread) at the front and performs asynchronous I/O at the kernel level. By not directly associating connections and threads, this model needs only a main event loop thread and many fewer (kernel) threads to perform I/O. Because there are fewer threads and consequently less context-switching, it uses less memory and also less CPU.

What are the benefits of Node.js?

  1. Savings in I/O cost (i.e., high performance): Because of the architecture, Node.js provides high performance like Nginx server as shown below. (As a side note: Nginx uses evented, non-blocking architecture, where as Apache uses multi-threaded architecture. Nginx doesn’t use Node.js, this is just an architecture comparison).
  2. Savings in Memory: Again, because of the architecture, Node.js uses relatively very little memory much like Nginx server as shown below.
  3. JavaScript: Node.js uses a familiar and very popular language–JavaScript–and allows engineers to use a single language for both client and server. (You can also use CoffeeScript (tenth on this list), which compiles to JavaScript.)https://github.com/languages
  4. Thousands of libraries: High performance and a familiar language is great, but you really need libraries to get started. Although Node.js is relatively new, it already has nearly 11,000 libraries.
  5. Second most popular watched project on Github: A large ecosystem of developers means better libraries and frameworks.

    2nd most popular on Github

    2nd most popular watched on Github

When to use Node.js:

Use Node.js to:

  1. Build a (soft) real-time social app like Twitter or a chat app.
  2. Build high-performance, high I/O, TCP apps like proxy servers, PaaS, databases, etc.
  3. Build backend logging and processing apps.
  4. Build great CLI apps similar to vmc-tool, and build tools such as ant or Make.
  5. Add a RESTful API-based web server in front of an application server.

When NOT to use Node.js:

Node.js is not suitable for every application:

  1. Mission-critical (hard) real-time apps like heart monitoring apps or those that are CPU-intensive.
  2. For simple CRUD apps that don’t have any real-time or high-performance needs, Node.js does not provide much of an advantage over other languages.
  3. Enterprise apps that might need some specific libraries for which there may not be a Node.js library yet. (However, you could build a polyglot app that uses Java in conjunction to Node.js to help with libraries.)

What are the drawbacks of Node.js:

Most of the drawbacks are because Node.js itself is relatively new:

  1. Node.js libraries are developed actively with a high rate of change. There are newer versions of libraries literally every month. This can cause version issues and instabilities. Npm shrinkwrap and package.json were introduced a while back to set up standards, but the issue still exists.
  2. Still many libraries, such as the SAML auth library which is required for enterprise apps, are not available yet.
  3. The whole callback, event-driven, functional programming aspects of Node.js can add a learning curve burden to server-side programmers of other object-oriented languages. (Note, there are several libraries to help overcome this. One example is async. In addition, developers can also use CoffeeScript which compiles to JavaScript to help with learning curve).
  4. Asynchronous and event-driven code inherently adds more complexity to the code versus a synchronous code.
  5. JavaScript has more than its share of “bad parts” and might throw off engineers and newcomers. (Side note: Read some good JavaScript books like: JavaScript: The Good Parts if you are a newcomer.)

What are other similar and newer languages I should be aware of:

  1. vertx.io: Write your application components in JavaScript, Ruby, Groovy or Java. Or mix and match several programming languages in a single application.
  2. Erlang: Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability. Some of its uses are in telecoms, banking, e-commerce, computer telephony and instant messaging. Erlang’s runtime system has built-in support for concurrency, distribution and fault tolerance.
  3. Twisted: Twisted is an event-driven networking engine written in Python and licensed under the open source.
  4. EventMachine: EventMachine is an event-driven I/O and lightweight concurrency library for Ruby. It provides event-driven I/O using the Reactor pattern.
  5. Scala: Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant and type-safe way.
  6. Dart: With the Dart platform, you can write code that runs on servers and in modern web browsers. Dart compiles to JavaScript, so your Dart web apps will work in multiple browsers.
  7. Go: Go is an open source programming environment that makes it easy to build simple, reliable and efficient software.
That’s it! Hopefully this blog gave you a good overview of polyglot PaaS and Node.js. We want to get you on track to future-proof your next multi-year project.
Also, please be sure to join me for a live webinar: “Node.js Basics: An Introductory Training” on July 18, 10:00 a.m. PDT.
Want to try Node.js on Cloud foundry?
Cloud Foundry provides a runtime environment for Node.js applications and the Cloud Foundry deployment tools automatically recognize Node.js applications. Simply follow the step-by-step instructions as described here: http://docs.cloudfoundry.com/frameworks/nodejs/nodejs.html and you will be on your way to running Node.js apps soon.

- Raja Rao DV (@rajaraodv – Developer Advocate, Cloud Foundry, (Node.js))

Facebook Twitter Linkedin Digg Delicious Reddit Stumbleupon Email

Cloud Foundry Open PaaS Deep Dive

by Ezra Zygmuntowicz (aka @ezmobius)

You are probably wondering about how Cloud Foundry actually works, hopefully these details will clear things up for you about how Cloud Foundry the OSS project works, why it works, and how you can use it. Cloud Foundry is on github here: https://github.com/cloudfoundry/vcap. The VCAP repo is the meaty part or what we call the “kernel” of Cloud Foundry as it is the distributed system that contains all the functionality of the PaaS. We have released a VCAP setup script that will help you get an Ubuntu 10.04 system running a instance of Cloud Foundry including all the components of VCAP as well as a few services (mysql, redis, mongodb) up and running so you can play along at home.

We want to build a community around Cloud Foundry, as that is what matters most for now as far as the open source project. We imagine a whole ecosystem of “userland” tools that people can create and plug into our Cloud Foundry kernel to add value and customize for any particular situation. This project is so large in scope that we had to cut a release and get community involvement at some point and we feel that the kernel is in great shape for everyone to dig in and start helping us shape the future of the “linux kernel for the cloud” ;)

So how do you approach building a PaaS (Platform as a Service) that is capable of running multiple languages under a common deployment and scalability idiom, that is also capable of running on any cloud or any hardware that you can run Ubuntu on? VCAP  was architected by my true rock star coworker Derek Collison (this guys is the man, for realz!). The design is very nice and adheres to a main core concept of “the closer to the center of the system the dumber the code should be”. Distributed systems are fundamentally hard problems. So each component that cooperates to form the entire system should be as simple as it possibly can be and still do its job properly.

VCAP is broken down into 5 main components plus a message bus: The Cloud Controller, the Health Manager, the Router’s, the DEAs (Droplet Execution Agents) and a set of Services. Let’s take a closer look at each component in turn and see how they fit together to form a full platform or for the cloud. NATS is a very simple pub/sub messaging system written by Derek Collison (this dude knows messaging, trust me;) of TIBCO fame. NATS is the system that all the internal components communicate on. While NATS is great for this sort of system communication, you would never use NATS for application level messaging. VMware’s own RabbitMQ is awesome for app level messaging and we plan to make that available to Cloud Foundry users in the near future.

It should be stated here that every component in this system is horizontally scalable and self healing, meaning you can add as many copies of each component as needed in order to support the load of your cloud, and in any order. Since everything is decoupled, it doesn’t even really matter where each component lives, things could be spread across the world for all it cares. I think this is pretty cool ;)

Cloud Controller

The Cloud Controller is the main ‘brain’ of the system. This is an Async Rails3 app that uses ruby 1.9.2 and fibers plus eventmachine to be fully async, pretty cutting edge stuff. You can find the Cloud Controller here: https://github.com/cloudfoundry/vcap/tree/master/cloud_controller . This component exposes the main REST interface that the CLI tool “vmc” talks to as well as the STS plugin for eclipse. If you were so inclined you could write your own client that talks to the REST endpoints exposed by the Cloud Controller to talk to VCAP in whatever way you like. But this should not be necessary as the “vmc” CLI has been written with scriptability in mind. It will return proper exit codes as well as JSON if you so desire so you can fully script it with bash or Ruby, Python, etc.

We made a decision not to tie VCAP to git even though we love git, we need to support any source code control system, yet we want the simplicity of a git push style deployment, hence the vmc push. But we also do want to have the differential deploys, meaning that we want to push diffs when you update your app, we do not want to have to push your entire app tree every time you deploy. Feeling light and fast is important to us. Our goal is to rival local development.

So we designed a system where we can get the best of both worlds. You as a user can use any source control system you want, when you do a vmc push or vmc update we will examine your app’s directory tree and send a “skeleton fingerprint” to the cloud controller. This is basically a fingerprint of each file in your apps tree and the shape of your directory tree. The cloud controller keeps these in a shared pool, accessible via their fingerprint plus the size for every object it ever sees. Then it returns to the client a manifest of what files it already has and what files it needs your client to send to the cloud in order to have all of your app. It is a sort of ‘dedupe’ for app code as well as framework and rubygem code and other dependency code. Then your client only sends the objects that the cloud requires in order to create a full “Droplet” (a droplet is a tarball of all your app code plus its dependencies all wrapped up into a droplet with a start and stop button) of your application.

Once the Cloud Controller has all the ‘bits’ it needs to fully assemble your app, it pushes the app into the “staging pipeline”. Staging is what we call the process that assembles your app into a droplet by getting all the full objects that comprise your applications plus all of its dependencies, rewrites its config files in order to point to the proper services that you have bound to your application and then creates a tarball with some wrapper scripts called start and stop.

DEA

The Droplet Execution Agent can be found here: https://github.com/cloudfoundry/vcap/tree/master/dea . This is an agent that is run on each node in the grid that actually runs the applications. So in any particular cloud build of Cloud Foundry, you will have more DEA nodes then any other type of node in a typical setup. Each DEA can be configured to advertise a different capacity and different runtime set,  so you do not need all your DEA nodes to be the same size or be able to run the same applications. So continuing on from our Cloud Controller story, the CC has asked for help running a droplet by broadcasting on the bus that it has a droplet that needs to be run. This droplet has some meta data attached to it like what runtime it needs as well as how much RAM it needs. Runtimes are the base component needed to run your app, like ruby-1.8.7 or ruby-1.9.2, or java6, or node. When a DEA gets one of these messages he checks to make sure he can fulfill the request and if he can he responds back to the Cloud Controller that he is willing to help.

The DEA does not necessarily care what language an app is written in. All it sees are droplets. A droplet is a simple wrapper around an application that takes one input, the port number to serve HTTP requests on. And it also has 2 buttons, start and stop. So the DEA treats droplets as black boxes, when it receives a new droplet to run, all it does it tells it what port to bind to and runs the start script. A droplet again is just a tarball of your application, wrapped up in a start/stop script and with all your config files like database.yml rewritten in order to bind to the proper database. Now we can’t rewrite configs for every type of service so for some services like Redis or Mongodb you will need to grab your configuration info from the environment variable ENV[‘VCAP_SERVICES’].

In fact there is a bunch of juicy info in the ENV of your application container. If you create a directory on your laptop and make a file in it called env.rb like this:

$ mkdir env && cd env 
$ cat «EOF > envvars.rb 
require ‘sinatra’ 
require ‘pp’ 
get ‘/’ do   
  “#{ENV.pretty_inspect}”
end 
EOF
$ vmc push …

That will make a simple app that will show you what is available in your ENVIRONMENT so that you can see what to use to configure your application. If you visit this new app you will see something like this: ENV output.

So the DEA’s job is almost done, once it tells the droplet what port to listen on for HTTP requests and runs it’s start script, and the app properly binds the correct port, it will broadcast on the bus the location of the new application so the Routers can know about it. If the app did not start successfully it will return log messages to the vmc client that tried to push this app telling the user why their app didn’t start (hopefully). This leads us right into what the Router has to do in the system so we will hand it over to the Router (applause).

Router

The Router is another eventmachine daemon that does what you would think it does, it routes requests. In a larger production setup there is a pool of Routers load balanced behind Nginx (or some other load balancer or http cache/balancer). These routers listen on the bus for notifications from the DEA’s about new apps coming online and apps going offline etc. When they get a realtime update they will update their in-memory routing table that they consult in order to properly route requests. So a request coming into the system goes through Nginx, or some other HTTP termination endpoint, which then load balances across a pool of identical Routers. One of the routers will pick up the phone to answer the request, it will start inspecting the headers of the request just enough to find the Host: header so it can pick out the name of the application this request is headed for. It will then do a basic hash lookup in the routing table to find a list of potential backends that represent this particular application. These look like: {‘foo.cloudfoundry.com’ => [‘10.10.42.1:5897’, ‘10.10.42.3:61378’, etc]}

So once it has looked up the list of currently running instances of the app represented by ‘foo.cloudfoundry.com’ it will pick a random backend to send the request to. So the router chooses one backend instance and forwards the request to that app instance. Responses are also inspected at the header level for injection if need be for functionality such as sticky sessions.

The Router can retry another backend if the one it chose fails and there are many ways to customize this behavior if you have your own instance of Cloud Foundry setup somewhere. Routers are fairly straightforward in what they do and how they do it. They are eventmachine based and run on ruby-1.9.2 so they are fast and can handle quite a bit of traffic per instance, but like every other component in the system, they are horizontally scalable and you can add more as needed in order to scale up bigger and bigger. The system is architected in such a way that this can even be done on a running system.

Health Manager

The Health Manager is a standalone daemon that has a copy of the same models the Cloud Controller has and can currently see into the same database as the Cloud Controller. This daemon has an interval where it wakes up and scans the database of the Cloud Controller to see what the state of the world “should be”, then it actually goes out into the world and inspects the real state to make sure it matches. If there are things that do not match then it will initiate messages back to the Cloud Controller to correct this. This is how we handle the loss of an application or even a DEA node per say. If an application goes down, the Health Manager will notice and will quickly remedy the situation by signaling the Cloud Controller to star a new instance. If a DEA node completely fails, the app instances running there will be redistributed back out across the grid of remaining DEA nodes.

Services

These are the services your application can choose to use and bind to in order to get data, messaging, caching and other services. This is currently where redis and mysql run and will eventually become a huge ecosystem of services offered by VMware and anyone else who wants to offer their service into our cloud. One of the cool things I will highlight is that you can share service instances between your different apps deployed onto a VCAP system. Meaning you could have a core java Spring app with a bunch of satellite sinatra apps communicating via redis or a RabbitMQ message bus. Ok my fingers are tired and my shoulder hurts so I am going to call this first post done. I plan on blogging a lot more often as well as trying to help organize the community around Cloud Foundry the open source project. I hope you are as excited as I am by this project, it is basically like “rack for frameworks, clouds and services” rather then just ruby frameworks. Pluggable across the 3 different axis and well tested and well coded. This thing is very cool and I am very proud just to be a member of the team working on this thing. This has been a huge team effort to get out the door and we hope it will become a huge community effort to keep driving it forward to truly make it “The Linux Kernel of Cloud Operating Systems”. Will you please join the community with me and help build this thing out to meet its true potential?

Facebook Twitter Linkedin Digg Delicious Reddit Stumbleupon Email