Cloud Foundry Blog

Future-proofing Your Apps: Cloud Foundry and Node.js

Most real-world applications we ship to consumers or enterprises are multi-year projects. In the cloud era, newer technologies (programming languages, runtimes, frameworks) are created faster than ever. While most of them fail to get any traction, once in a while a technology becomes popular because it solves a problem or set of problems extremely well.

In such an era, if you invest heavily in a multi-year project on a PaaS that supports only one technology, and another technology comes along that solves your problem better, you are stuck. You have unintentionally become a victim of vendor lock-in. The heart of the problem is that your PaaS, and hence your app, was not future-proofed to begin with.

Figure: PaaS with one set of technologies

To future-proof your long term project, you should:

  1. Use a polyglot PaaS like Cloud Foundry that supports a good mix of both mature and emerging technologies.
     Figure: Polyglot PaaS
  2. Learn about newer, popular technologies like Node.js to:
    • See if they can replace part of your current app (i.e., convert it to a polyglot app).
    • Write your company’s future apps in newer technologies on the same PaaS you are already familiar with.

The remainder of this blog is about the second option: learning newer, popular technologies (in this case, Node.js) to help future-proof your app.

Things to note before you read:

  • While this blog refers to JavaScript frequently, it’s all happening on the server (not in the browser). Think of yourself as a server-side engineer throughout this blog.
  • Toward the end of the blog, we will also discuss when not to use Node.js, as well as other similar technologies.

What is Node.js?

Official definition: “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” - Nodejs.org

Simple (my) definition: A platform that makes writing powerful C/C++ server-side apps easy by essentially wrapping them in JavaScript.

Let’s understand the definition by looking under the hood:

At first glance, perhaps because of its name, Node.js might seem to be built using JavaScript, but it is not; it simply runs JavaScript on the server. It is about 80 percent C/C++ code and about 20 percent JavaScript code. The C/C++ libraries run JavaScript (via Google’s Chrome V8 JS engine) and provide support for HTTP, DNS, TCP, and other important server-side functionality. The proportionally smaller body of JavaScript code mostly consists of libraries, or modules, that make server-side developers’ lives a lot simpler.

Some useful definitions:

  • Chrome V8 Engine: Chrome V8 is Google’s open source, C++-based JavaScript engine, the component that actually runs JavaScript. The Node.js team took this C++ code and added other important libraries, such as TCP, HTTP and DNS, to create Node.js. It is the same engine embedded in the Google Chrome browser to run JavaScript there. It is not yet another JavaScript engine: techniques like “Hidden Class Transitions,” “JS to Machine Code compilation” and “Automatic GC” make it one of the fastest JavaScript engines.

    We encourage you to go through Google’s Chrome comic book to learn more about this. Three relevant pages from that book are shown below.

  • Asynchronous I/O and Evented Support (C/C++): To make a server application fast and scalable, we typically end up writing it in a multi-threaded fashion. You can build great multi-threaded apps in many languages, but doing so correctly usually requires a lot of expertise. These libraries (together with Chrome’s V8 engine) instead provide a different architecture that hides the complexity of multi-threaded apps while delivering the same or better benefits.
  • Let’s compare a classic multi-threaded server with an evented, non-blocking I/O server:
    • Multi-threaded HTTP server with blocking I/O:
      Figure: An example multi-threaded HTTP server using blocking I/O
      The above diagram depicts a simplified multi-threaded server. Four users are logged into the server, and a couple of them are hitting the refresh button, which ties up a lot of threads. When a request comes in, one of the threads in the thread pool handles it, say by performing a blocking I/O operation. While that thread waits, the OS context-switches to run other threads in the pool. When the I/O finishes, the OS switches back to the original thread so it can return the result.
      • Architecture Summary: Multi-threaded servers built on a synchronous, blocking I/O model offer a simpler way of performing I/O. But because each connection is tied to a thread, they end up using more threads to handle a heavy load. More threads mean more memory and higher CPU usage from the extra context switching.
      • For more details, we recommend going through Benjamin Erb’s thesis here: http://berb.github.com/diploma-thesis/
    • Event-driven, non-blocking I/O (Node.js server): The above diagram depicts how a Node.js server works. At a high level, a Node.js server has two parts:
      • At the front are the Chrome V8 engine (single-threaded), the event loop, and other C/C++ libraries that run your JS code and listen for HTTP/TCP requests.
      • At the back are libuv (which includes libeio) and other C/C++ libraries that provide asynchronous I/O.
      • Whenever a request arrives from a browser, mobile device, etc., the main thread running in the V8 engine handles it, and any I/O the request needs is handed off instead of waited on. libuv performs that I/O asynchronously, either through the operating system’s non-blocking I/O facilities or, for operations such as file access, on one of the threads in its POSIX thread pool. Because the main thread is free again, it starts accepting new requests and events.
      • When the response eventually comes back from a database or file system, the back end generates an event indicating that an I/O result is ready. As soon as V8 is free from what it is currently doing (remember, it is single-threaded), it takes the result and returns it to the client.
      • Architecture Summary: This architecture uses an event loop (the main thread) at the front and performs asynchronous I/O in the background. Because connections are not tied to threads, this model needs only the main event-loop thread plus far fewer background threads to perform I/O. Fewer threads mean less context switching, so it uses less memory and less CPU.
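Here is a minimal sketch of the evented model, written in Java (the language of the Tomcat example elsewhere on this page) using only the JDK. It is an analogy, not how Node.js or libuv is implemented: a single-threaded executor whose thread is named `event-loop-0` stands in for the main thread, a two-thread pool stands in for the background I/O threads, and `slowLookup` is a hypothetical blocking call such as a database query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class EventLoopSketch {
    // Creates daemon threads with predictable names so the JVM can exit cleanly.
    static ThreadFactory named(String prefix) {
        AtomicInteger count = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + count.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    // One thread plays the role of the main (JavaScript) thread.
    final ExecutorService loop = Executors.newSingleThreadExecutor(named("event-loop-"));
    // A small pool plays the role of the background I/O threads.
    final ExecutorService ioPool = Executors.newFixedThreadPool(2, named("io-"));
    // Records which threads ran callbacks, to show they all share one thread.
    final Set<String> callbackThreads = ConcurrentHashMap.newKeySet();

    // Hypothetical blocking I/O call, e.g. a database lookup.
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key.toUpperCase();
    }

    // Delegate the I/O to the pool and return at once; the callback that
    // builds the response is queued back onto the event-loop thread.
    CompletableFuture<String> handle(String request) {
        return CompletableFuture
                .supplyAsync(() -> slowLookup(request), ioPool)
                .thenApplyAsync(result -> {
                    callbackThreads.add(Thread.currentThread().getName());
                    return "response:" + result;
                }, loop);
    }

    void shutdown() {
        ioPool.shutdown();
        loop.shutdown();
    }

    public static void main(String[] args) {
        EventLoopSketch server = new EventLoopSketch();
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String user : List.of("alice", "bob", "carol", "dave")) {
            pending.add(server.handle(user)); // does not wait for the I/O
        }
        pending.forEach(f -> System.out.println(f.join()));
        System.out.println("callbacks ran on: " + server.callbackThreads);
        server.shutdown();
    }
}
```

Each call to `handle` returns immediately, so the loop thread never waits on I/O; up to two lookups run at once on the pool, and every callback is queued back onto the same event-loop thread, much as Node.js runs your JavaScript callbacks on its main thread.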

What are the benefits of Node.js?

  1. Savings in I/O cost (i.e., high performance): Because of its architecture, Node.js delivers high performance comparable to the Nginx server, as shown below. (As a side note: Nginx uses an evented, non-blocking architecture, whereas Apache uses a multi-threaded architecture. Nginx doesn’t use Node.js; this is just an architecture comparison.)
  2. Savings in memory: Again because of its architecture, Node.js uses relatively little memory, much like the Nginx server, as shown below.
  3. JavaScript: Node.js uses a familiar and very popular language, JavaScript, and lets engineers use a single language for both client and server. (You can also use CoffeeScript, which compiles to JavaScript and is tenth on GitHub’s language list: https://github.com/languages.)
  4. Thousands of libraries: High performance and a familiar language are great, but you really need libraries to get started. Although Node.js is relatively new, it already has nearly 11,000 libraries.
  5. Second most-watched project on GitHub: A large ecosystem of developers means better libraries and frameworks.

    Figure: Node.js, the second most-watched project on GitHub

When to use Node.js:

Use Node.js to:

  1. Build a (soft) real-time social app like Twitter or a chat app.
  2. Build high-performance, high I/O, TCP apps like proxy servers, PaaS, databases, etc.
  3. Build backend logging and processing apps.
  4. Build great command-line tools similar to the vmc tool, as well as build tools similar to Ant or Make.
  5. Add a RESTful API-based web server in front of an application server.

When NOT to use Node.js:

Node.js is not suitable for every application:

  1. Mission-critical (hard) real-time apps, such as heart-monitoring apps, and CPU-intensive apps.
  2. For simple CRUD apps that don’t have any real-time or high-performance needs, Node.js does not provide much of an advantage over other languages.
  3. Enterprise apps that need specific libraries for which there may not be a Node.js library yet. (However, you could build a polyglot app that uses Java in conjunction with Node.js to fill those gaps.)
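The CPU-intensive case follows from the single main thread described earlier: while one request computes, nothing else queued on that thread can run. The Java analogy below (not Node.js code) uses a single-threaded executor as the event loop and a deliberately slow, naive Fibonacci function as a hypothetical CPU-bound request.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BlockedLoopSketch {
    // Deliberately slow, CPU-bound work (naive recursive Fibonacci).
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Queue a CPU-heavy "request" followed by a trivial one on a single
    // thread and return the order in which they finished.
    static List<String> run() {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        List<String> finished = new CopyOnWriteArrayList<>();
        loop.submit(() -> { finished.add("heavy: fib(32)=" + fib(32)); });
        loop.submit(() -> { finished.add("quick: hello"); });
        loop.shutdown();
        try {
            loop.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished;
    }

    public static void main(String[] args) {
        // The quick request only runs after the heavy one releases the thread.
        run().forEach(System.out::println);
    }
}
```

However cheap the second request is, it waits for the whole computation. One option is to move such work to a separate process or service, which a polyglot PaaS makes easy to host alongside the Node.js app.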

What are the drawbacks of Node.js:

Most of the drawbacks are because Node.js itself is relatively new:

  1. Node.js libraries are under active development and change rapidly; new versions of libraries appear literally every month. This can cause versioning issues and instability. package.json and npm shrinkwrap were introduced a while back to help pin versions, but the issue still exists.
  2. Many libraries, such as a SAML authentication library, which enterprise apps often require, are still not available.
  3. The callback-based, event-driven, functional programming style of Node.js can add a learning curve for server-side programmers coming from object-oriented languages. (Note: several libraries help with this; one example is async. Developers can also use CoffeeScript, which compiles to JavaScript, to ease the learning curve.)
  4. Asynchronous, event-driven code is inherently more complex than synchronous code.
  5. JavaScript has more than its share of “bad parts” and might throw off engineers and newcomers. (Side note: Read some good JavaScript books like: JavaScript: The Good Parts if you are a newcomer.)
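Points 3 and 4 are easier to judge side by side. The Java analogy below (not Node.js code) runs three hypothetical asynchronous steps, `loadUser`, `loadOrders` and `formatReport`, first as nested callbacks and then as a flat chain built with `CompletableFuture.thenCompose`, which plays roughly the role that a flow-control library such as async plays in Node.js.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

class CallbackSketch {
    // Hypothetical callback-style async steps: each does its work on another
    // thread and later hands the result to the callback it was given.
    static void loadUser(String id, Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept("user" + id));
    }

    static void loadOrders(String user, Consumer<List<String>> callback) {
        CompletableFuture.runAsync(() -> callback.accept(List.of(user + "-order1", user + "-order2")));
    }

    static void formatReport(List<String> orders, Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept(String.join(",", orders)));
    }

    // Style 1, nested callbacks: every step lives inside the previous one.
    static CompletableFuture<String> nested(String id) {
        CompletableFuture<String> done = new CompletableFuture<>();
        loadUser(id, user ->
                loadOrders(user, orders ->
                        formatReport(orders, done::complete)));
        return done;
    }

    // Wraps one callback-style step so its result arrives as a CompletableFuture.
    static <A, B> CompletableFuture<B> promise(A input, BiConsumer<A, Consumer<B>> step) {
        CompletableFuture<B> future = new CompletableFuture<>();
        step.accept(input, future::complete);
        return future;
    }

    // Style 2, a flat chain: the same three steps read top to bottom.
    static CompletableFuture<String> chained(String id) {
        return promise(id, CallbackSketch::loadUser)
                .thenCompose(user -> promise(user, CallbackSketch::loadOrders))
                .thenCompose(orders -> promise(orders, CallbackSketch::formatReport));
    }

    public static void main(String[] args) {
        System.out.println(nested("42").join());
        System.out.println(chained("42").join());
    }
}
```

Both versions produce the same result; the chained form stays flat as steps are added, while the nested form indents one level deeper per step.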

What are other similar and newer technologies I should be aware of:

  1. vertx.io: Write your application components in JavaScript, Ruby, Groovy or Java. Or mix and match several programming languages in a single application.
  2. Erlang: Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability. Some of its uses are in telecoms, banking, e-commerce, computer telephony and instant messaging. Erlang’s runtime system has built-in support for concurrency, distribution and fault tolerance.
  3. Twisted: Twisted is an event-driven networking engine written in Python and licensed under the open source MIT license.
  4. EventMachine: EventMachine is an event-driven I/O and lightweight concurrency library for Ruby. It provides event-driven I/O using the Reactor pattern.
  5. Scala: Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant and type-safe way.
  6. Dart: With the Dart platform, you can write code that runs on servers and in modern web browsers. Dart compiles to JavaScript, so your Dart web apps will work in multiple browsers.
  7. Go: Go is an open source programming environment that makes it easy to build simple, reliable and efficient software.

That’s it! Hopefully this blog gave you a good overview of polyglot PaaS and Node.js, and puts you on track to future-proof your next multi-year project.

Also, please be sure to join me for a live webinar: “Node.js Basics: An Introductory Training” on July 18, 10:00 a.m. PDT.

Want to try Node.js on Cloud Foundry?

Cloud Foundry provides a runtime environment for Node.js applications, and the Cloud Foundry deployment tools automatically recognize Node.js applications. Simply follow the step-by-step instructions at http://docs.cloudfoundry.com/frameworks/nodejs/nodejs.html and you will soon be running Node.js apps.

- Raja Rao DV (@rajaraodv – Developer Advocate, Cloud Foundry, (Node.js))


Heads Up on Some New Cloud Controller Features

As I discussed in my post at the end of April, the Cloud Controller is undergoing major surgery, and this work is being done in the cloud_controller_ng repo. If you are following the review stream for the cloud_controller_ng project, it’s time to take note of the new Organization and AppSpace objects. These objects are the foundation of several new features we are rolling out this year:

  • operational collaboration
  • advanced quota management and control
  • custom domains and assorted application features

This post will focus on the objects themselves and will discuss operational collaboration to demonstrate their significance. Other features will be discussed in subsequent posts.

To understand the new objects, it’s best to briefly review the current model and some of the limitations of that scheme. The diagram below is a high-level view of the current Cloud Controller model.

Figure: Current Cloud Controller Model

In this scheme, each user account directly contains both named applications and named service instances. The names are scoped to the user account object, and only that account can manipulate the applications and services. The simplicity of this model exposes an operational issue when more than one person is responsible for the ongoing maintenance of an application.

The issue is that when your production application is created under a given account, ONLY that account can manipulate the app (update the code, scale it out, increase its memory size, etc.). For a single-developer operation, this works fine. However, once more than one person is responsible for an app (e.g., your small three-person startup), this approach is problematic. To compensate, people either use a shared account, share passwords, or invoke admin privileges that allow an admin to manipulate the objects in another user’s account. These workarounds all function, but each is a poor solution that exposes its own set of problems (inability to generate a precise audit log of who did what, too many people with admin privileges, etc.).

The new model is designed to address the aforementioned deficiencies in a scalable and sustainable way, and at the same time, provide us with the foundation needed in order to deliver additional advanced features.

The diagram below is a high level view of the new cloud controller model.

Figure: New Cloud Controller Model

Under the new model, applications and services are now scoped to a new object called the AppSpace. Multiple users can have access to an AppSpace, and each user has a set of permissions that determine what operations she can perform against the applications and services within the space. Instead of shared accounts, sharing passwords, or invoking admin rights, you can simply create an AppSpace for your production facing applications and then allow a select group of developers to manipulate the apps and services within that space.

We have taken things a step further with the introduction of the Organization object. This object can contain a number of AppSpaces as well as a membership list of users, etc. If you are familiar with the GitHub account model and squint real hard, you can see that from a scoping and permissions standpoint, a GitHub Organization and a Cloud Foundry Organization are very similar, as are a GitHub repo and a Cloud Foundry AppSpace.
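To make the scoping concrete, here is a minimal Java sketch of the new model. It is not the cloud_controller_ng code: the class names mirror the objects described above, but the fields, the `Permission` values, and the `grant`/`can` methods are hypothetical illustrations.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ControllerModelSketch {
    // Hypothetical permissions; the real cloud_controller_ng model may differ.
    enum Permission { READ, UPDATE_CODE, SCALE }

    // Applications and service instances are named within an AppSpace,
    // and each user granted access has their own set of permissions.
    static class AppSpace {
        final String name;
        final Set<String> apps = new TreeSet<>();
        final Set<String> services = new TreeSet<>();
        private final Map<String, EnumSet<Permission>> grants = new HashMap<>();

        AppSpace(String name) {
            this.name = name;
        }

        void grant(String user, Permission... perms) {
            grants.computeIfAbsent(user, u -> EnumSet.noneOf(Permission.class))
                  .addAll(Arrays.asList(perms));
        }

        boolean can(String user, Permission p) {
            EnumSet<Permission> perms = grants.get(user);
            return perms != null && perms.contains(p);
        }
    }

    // An Organization holds a membership list and a number of AppSpaces,
    // roughly like a GitHub organization holding repos.
    static class Organization {
        final String name;
        final Set<String> members = new TreeSet<>();
        private final Map<String, AppSpace> spaces = new LinkedHashMap<>();

        Organization(String name) {
            this.name = name;
        }

        // Returns the named space, creating it on first use.
        AppSpace space(String spaceName) {
            return spaces.computeIfAbsent(spaceName, AppSpace::new);
        }
    }

    public static void main(String[] args) {
        Organization org = new Organization("startup");
        org.members.addAll(Set.of("ann", "bob", "cy"));

        AppSpace prod = org.space("production");
        prod.apps.add("web");
        prod.services.add("orders-db");
        prod.grant("ann", Permission.values());               // full control
        prod.grant("bob", Permission.READ, Permission.SCALE); // can scale, not redeploy

        AppSpace staging = org.space("staging");
        staging.apps.add("web"); // same app name, different space: no clash
        staging.grant("cy", Permission.values());

        System.out.println(prod.can("bob", Permission.SCALE));       // true
        System.out.println(prod.can("bob", Permission.UPDATE_CODE)); // false
        System.out.println(prod.can("cy", Permission.READ));         // false
    }
}
```

Because app names live inside a space, both spaces can hold an app called `web`, and each user’s rights are recorded per space instead of being implied by account ownership or a shared password.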

I’ll save the details on advanced quota management and features for another post, but if you read the code and review stream you can see how we are using these new objects as a foundation for quota management, custom domains, and many more advanced features.

-markl

Mark Lucovsky, VP of Engineering – Cloud Foundry


Deploying Tomcat 7 Using the Standalone Framework

The new standalone framework support greatly increases the number of different types of non-Web applications that can run on Cloud Foundry, including application servers. This tutorial will walk you through the steps to deploy a “hello world” application in a Tomcat 7 container on Cloud Foundry. Currently, Cloud Foundry leverages Tomcat 6 to host Java web applications. While the team is working to support Tomcat 7 as a first-class container, it is very straightforward to use the standalone application support to run Tomcat 7 in the meantime, which is particularly useful for applications that leverage Servlet 3.0. The basic outline involves installing your application into a local Tomcat 7 instance, making minor modifications to the configuration and pushing the entire contents of Tomcat 7 and your application to Cloud Foundry as a standalone application.

Step 1 – Download Apache Tomcat

Download Apache Tomcat 7 to the location where you will use the vmc command line tool. If you do not have vmc, then follow these instructions to install it. I downloaded Apache Tomcat 7.0.27, currently the latest 7.0 release; the file name is apache-tomcat-7.0.27.zip. All of the commands throughout this tutorial assume that the present working directory is the Tomcat 7 base directory.

Step 2 – Extract Tomcat and Update Permissions

Extract the Tomcat zip file to a local directory, then give the bin/*.sh scripts executable permissions.

unzip apache-tomcat-7.0.27.zip
cd apache-tomcat-7.0.27
chmod +x bin/*.sh

Linux and OSX users should be able to test the scripts on their local systems before pushing them to Cloud Foundry’s Ubuntu-based server environment. Unfortunately, Windows users cannot test the bin/startup.sh script changes locally first, but the modifications are really quite simple.

Step 3 – Edit Startup Scripts

bin/startup.sh

Tomcat is typically started with the bin/startup.sh script, which by default launches Tomcat in the background. So that Tomcat instead runs in the foreground of the shell that invokes startup.sh, change the execution argument in the last line of startup.sh from “start” to “run”:

exec "$PRGDIR"/"$EXECUTABLE" start "$@"

to:

exec "$PRGDIR"/"$EXECUTABLE" run "$@"

bin/catalina.sh

Instead of using a predefined static port, we want Tomcat 7 to use the port assigned by Cloud Foundry, which is stored in the VCAP_APP_PORT environment variable when deployed. Place the following bash code near the top of catalina.sh, after the initial comments. So that we can also run this locally without modifying the code, it assigns a static port of 8080 if the dynamic port is not available as an environment variable.

# USE VCAP PORT IF IT EXISTS, OTHERWISE DEFAULT TO 8080
if [ -z "${VCAP_APP_PORT}" ]; then
  export VCAP_APP_PORT=8080
fi
# Pass the port to Tomcat as a system property that server.xml can reference
export JAVA_OPTS="-Dport.http.nonssl=$VCAP_APP_PORT $JAVA_OPTS"
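The same fallback works for any standalone Java process, not just Tomcat. The hypothetical `VcapPort` class below mirrors the catalina.sh logic (use `VCAP_APP_PORT` if it is set and not blank, otherwise 8080) and shows how code running inside the JVM could read the resulting `port.http.nonssl` system property with the JDK’s `Integer.getInteger`.

```java
import java.util.Map;

class VcapPort {
    static final int DEFAULT_PORT = 8080;

    // Same rule as the catalina.sh snippet: use VCAP_APP_PORT when it is set
    // and not blank, otherwise fall back to 8080 for local runs.
    static int resolvePort(Map<String, String> env) {
        String value = env.get("VCAP_APP_PORT");
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_PORT;
        }
        return Integer.parseInt(value.trim());
    }

    // Builds the system-property flag that catalina.sh prepends to JAVA_OPTS.
    static String javaOptsFlag(int port) {
        return "-Dport.http.nonssl=" + port;
    }

    // Code running inside the JVM (for example a servlet) could read the same
    // property back, with 8080 as the default when it was not passed.
    static int portFromSystemProperty() {
        return Integer.getInteger("port.http.nonssl", DEFAULT_PORT);
    }

    public static void main(String[] args) {
        System.out.println(javaOptsFlag(resolvePort(System.getenv())));
        System.out.println("port.http.nonssl = " + portFromSystemProperty());
    }
}
```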

Step 4 – Edit Tomcat Configuration

conf/server.xml

Set the port attribute of the Server element to -1, which disables the Tomcat shutdown port. Cloud Foundry does not need the shutdown port, because it issues a “kill -9 PID” command to stop any standalone app instance. To avoid potential port conflicts with other applications running on the same Droplet Execution Agent (DEA), the current recommendation for standalone applications on Cloud Foundry is to use only a single HTTP port.

<Server port="-1" shutdown="SHUTDOWN">

Since Cloud Foundry handles load balancing for you without using the AJP connector, disable the AJP connector by commenting out the section shown below, so that it cannot cause a port conflict.

<!-- Define an AJP 1.3 Connector on port 8009
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
-->

The HTTP Connector element should use the port passed in as the port.http.nonssl system property, which we added to JAVA_OPTS in the catalina.sh script.

<Connector port="${port.http.nonssl}" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

At this point, I recommend trying the edits on your local server to see if Tomcat 7 starts up as expected. In order to test whether the VCAP_APP_PORT is being used, I recommend using a command shell to assign a sample port such as 8082.

jbayer$ export VCAP_APP_PORT=8082
jbayer$ bin/startup.sh

The console should not return (it should block while Tomcat is running) and one of the last lines in the console output should be:

INFO: Starting ProtocolHandler ["http-bio-8082"]

If that is the case, you should be able to visit http://localhost:8082 to see if the welcome page is there.

Figure: Tomcat 7 Running Locally

At this point you may want to back up (zip) the entire Tomcat 7 directory with the customizations you have done thus far so you can reuse it on other applications later.

Step 5 – Install your application

To show off a Tomcat 7 feature, I used Servlet 3.0, which adds support for Servlet annotations, as shown in the simple Servlet below.

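package tomcat7;

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The Servlet 3.0 @WebServlet annotation maps this class to /Servlet3,
// so no web.xml entry is needed
@WebServlet("/Servlet3")
public class Servlet3 extends HttpServlet {

    public Servlet3() {
        super();
    }

    // Responds to every HTTP method with a one-line greeting
    @Override
    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.getWriter().println("Hello from Servlet 3.0!");
    }
}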

Cloud Foundry uses the ROOT web application with Tomcat 6, so let’s replicate that same behavior for Tomcat 7.

First delete the existing ROOT application.

jbayer$ rm -r webapps/ROOT

Now take your web application and explode it into the webapps/ROOT location. If you have a .war file then the command looks like this:

jbayer$ unzip -d webapps/ROOT ~/dev/mytomcat7.war

Check your application locally to see if it functions properly.

Figure: Tomcat 7 With Servlet 3.0 Running Locally

Step 6 – Push the application to Cloud Foundry

Execute the vmc command from the Tomcat 7 base directory, accepting most of the default selections. Notice that vmc auto-detects a “Standalone Application,” so you simply need to provide the startup script path and map a unique URL. Note that you will get an error if the URL is not unique.

jbayer$ vmc push mytomcat7 
Would you like to deploy from the current directory? [Yn]: 
Detected a Standalone Application, is this correct? [Yn]: 
1: java 
2: node 
3: node06 
4: ruby18 
5: ruby19 
Select Runtime : 1 
Selected java 
Start Command: bin/startup.sh 
Application Deployed URL [None]: mytomcat7.cloudfoundry.com 
Memory reservation (128M, 256M, 512M, 1G, 2G) [512M]: 256M 
How many instances? [1]: 
Bind existing services to 'mytomcat7'? [yN]: 
Create services to bind to 'mytomcat7'? [yN]: 
Would you like to save this configuration? [yN]: y 
Manifest written to manifest.yml. 
Creating Application: OK 
Uploading Application: 
  Checking for available resources: OK 
  Processing resources: OK 
  Packing application: OK 
  Uploading (23K): OK 
Push Status: OK 
Staging Application 'mytomcat7': OK 
Starting Application 'mytomcat7': OK

Figure: Tomcat 7 With Servlet 3.0 on CloudFoundry.com

At the end of the push process, you will be offered the option to write the configuration to a manifest file. Here is the resulting manifest.yml file that was written to the Tomcat 7 base directory from our deployment. Note that it contains the startup command, bin/startup.sh. With this file present in the root directory, vmc reads it and skips the interactive questions the next time you push this application.

---
applications:
  .:
    url: mytomcat7.cloudfoundry.com
    command: bin/startup.sh
    runtime: java
    framework:
      info:
        exec:
        description: Standalone Application
        mem: 64M
      name: standalone
    name: mytomcat7
    instances: 1
    mem: 256M

Conclusion

Other containers such as Jetty would follow a similar pattern as described above. Most applications should be able to use the existing frameworks Cloud Foundry makes available. Should the need arise to customize or bring your own container, Cloud Foundry standalone application support is a great option.

- James Bayer

The Cloud Foundry Team

Try Cloud Foundry on CloudFoundry.com for free


Administer Cloud Foundry with Mobile Apps

One of the neat things about Cloud Foundry is that, because the code is open source, it’s easy to see how the administration tools (such as the command-line vmc) work. The Cloud Controller component has a REST API, which provides the ability to query and modify the Cloud Foundry environment. That means it is relatively straightforward to build a management user interface tailored to the platform you are using, or to the requirements and needs of a specific set of users.
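As a rough illustration of that REST API, the sketch below builds (and, in `main`, sends) a request that lists deployed apps, using the JDK 11 `java.net.http` client. The `/apps` path and passing the auth token in an `Authorization` header reflect the v1 Cloud Controller API as used by vmc, but treat both as assumptions and check the cloud_controller source for the version you target. The target URL and token are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ControllerClientSketch {
    // Builds a GET request for the list of deployed apps.
    // ASSUMPTION: the target exposes GET /apps and accepts the token in an
    // Authorization header; verify against your Cloud Controller version.
    static HttpRequest listApps(String target, String token) {
        return HttpRequest.newBuilder(URI.create(target + "/apps"))
                .header("Authorization", token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: pass your own target URL and token as arguments.
        String target = args.length > 0 ? args[0] : "http://api.example.com";
        String token = args.length > 1 ? args[1] : "REPLACE_WITH_TOKEN";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(listApps(target, token), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

A mobile or desktop admin tool would wrap calls like this one behind its own UI, parsing the JSON response to display app names, instance counts and status.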

To illustrate this, I made a really brief video which I tend to use when I’m speaking about the Cloud Foundry platform and ecosystem. What you see here are two iOS apps - CF Mobile Admin (App Store link) and the AppFog app (App Store Link) – which have the ability to connect to Cloud Foundry instances, list deployed apps, modify instances, stop or start applications and query detailed information about the available resources. These kinds of tools are useful if you don’t have vmc handy, and are obviously great for mobile usage. Note that both of these tools are provided by third parties, and not by the Cloud Foundry development team.

Several developers have asked me if there are equivalent apps for Android or Windows Phone. If there are, I haven’t found them yet. One other example of a custom, third-party administration UI, in this case a desktop-based one, is the Microsoft Management Console (MMC) snap-in Uhuru Cloud Manager, which Uhuru Software provides as a tool alongside its own PaaS offering. There’s clearly an opportunity to build user interfaces and tools for platforms such as Android and Windows Phone, or indeed your choice of mobile or desktop OS.

If you come up with anything of your own, do let us know by commenting here on the blog, or by talking to us on Twitter: @cloudfoundry or via the hashtag #cloudfoundry.

Andy Piper, Cloud Foundry Team

Sign up for Cloud Foundry today to try out these mobile administration tools


Open Web Foundation Agreement for Activity Streams Signed

At Cloud Foundry we care about making life easier for web developers and API developers. Open source helps developers reuse code. Open standards help developers reuse protocols and schemas. We are pleased to announce that VMware has furthered its commitment to open standards with a signed agreement on Activity Streams specifications. VMware signed the Open Web Foundation Agreement (OWFa) for the JSON and Atom Activity Streams specifications, which gives application developers confidence in implementing these open specifications when building web applications and services.

What is OWFa?

In short, the Open Web Foundation Agreement is a promise by the signing company that it will not assert IP claims over the work covered in the specification.

OWFa 1.0 grants perpetual (for the duration of the applicable copyright), worldwide, non-exclusive, no-charge, royalty-free, copyright license, without any obligation for accounting to me, to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, distribute, and implement any Contribution to the full extent of my copyright interest in the Contribution

What is Activity Streams?

Activity Streams is a generic schema to represent social activity around the web. There are currently two specifications: one for JSON and one for Atom. The main components of an activity as defined by the activitystrea.ms spec are Actor, Verb and Object, with an optional Target. An actor performs an action of type verb on an object in a target. For more information on how this standard can help you build real-time web applications, see this blog post by Cloud Foundry Developer Advocate Monica Wilkinson, who is a contributor to the ActivityStrea.ms specification.
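To make actor, verb, object and target concrete, here is a small Java sketch that serializes an activity as JSON using only the JDK. The property names (`actor`, `verb`, `object`, `target`, `objectType`, `id`, `displayName`) follow the JSON Activity Streams 1.0 specification; the ids and display names are made up, and optional properties such as `published` are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ActivitySketch {
    // Escapes quotes, backslashes and control characters for a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes a flat map of string fields as a JSON object, keeping insertion order.
    static String toJson(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        fields.forEach((k, v) -> json.add(quote(k) + ":" + quote(v)));
        return json.toString();
    }

    // A minimal Activity Streams object: its type, an id and a display name.
    static Map<String, String> entity(String objectType, String id, String displayName) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("objectType", objectType);
        m.put("id", id);
        m.put("displayName", displayName);
        return m;
    }

    // An actor performs a verb on an object, optionally in a target (may be null).
    static String activity(Map<String, String> actor, String verb,
                           Map<String, String> object, Map<String, String> target) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        json.add(quote("actor") + ":" + toJson(actor));
        json.add(quote("verb") + ":" + quote(verb));
        json.add(quote("object") + ":" + toJson(object));
        if (target != null) {
            json.add(quote("target") + ":" + toJson(target));
        }
        return json.toString();
    }

    public static void main(String[] args) {
        System.out.println(activity(
                entity("person", "urn:example:person:ann", "Ann"),
                "post",
                entity("note", "urn:example:note:1", "Hello, streams!"),
                entity("collection", "urn:example:stream:team", "Team stream")));
    }
}
```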

Many social networks in the consumer and enterprise space now offer an Activity Streams API and have implemented the specification. In particular, the Google+ API implements the Activity Strea.ms JSON Specification, thanks to the efforts of people like Will Norris, who is also a contributor to the specification.

Over the course of the specification’s development there have been contributions from developers at MySpace, Facebook, Microsoft, Google, VMware, Mozilla, StatusNet, IBM and others. It is important that contributors get their companies to sign the OWFa so implementors can be confident when building their applications. For a full list of the signees, check out GitHub.

Next Steps

For developers: To get started building an Activity Streams real-time application, check out this tutorial.

For contributors: We are evaluating starting an official IETF charter for a complete Activity Streams Protocol, which would cover not only the schema and syntax but also the REST endpoints and streaming. If you are interested in becoming a contributor or implementor, please join the Activity Streams mailing list.


Redis in Action with Cloud Foundry

Redis is a popular open source, advanced key-value store project sponsored by VMware. It has been a Cloud Foundry core service from day one and is widely adopted by developers who love its performance and flexibility. In the following guest post we introduce Dr. Josiah L. Carlson, who discusses his upcoming book Redis in Action and describes how Redis is continuing to change the lives of developers.

Guest blog by Dr. Josiah L. Carlson, a well-known contributor on the Redis mailing list

Over the last several years, a wide variety of non-relational databases have been created to offer varying balances of performance, reliability, and non-relational data models. In late March of 2009, Redis arrived in the open source world and has since been adopted by developers at an increasing rate, driven by a combination of performance, flexibility and a data model that programmers are already familiar with: standard data structures.

To support the growing need for operational simplicity, VMware introduced Cloud Foundry as a way to reduce the effort and overhead required to install and configure a carefully chosen variety of services, languages and frameworks. Not surprisingly (at least to those of us in the community), Redis made the cut at launch, and Cloud Foundry has made a previously easy setup procedure even easier. Assuming you already have the open source vmc tools installed and already have an app set up and configured, installing Redis for use in Cloud Foundry is as easy as:

$ vmc create-service redis --bind <app>

Once you have Redis installed, using it from one of the supported Cloud Foundry languages is only slightly different than if you were hosting your own infrastructure, primarily due to configuration. There are a few articles that discuss the configuration and use of Redis with Cloud Foundry with Ruby, Java/Spring, and Node.js.

Why Redis?

Whenever I talk to an engineer who isn’t familiar with Redis, the first question I am asked is “why Redis?” On the one hand, the answer is very simple: It makes our jobs as engineers easier by addressing problems we need to solve better, in many cases, than relational databases, document databases, or plain key-value databases. By combining five different and familiar data structures stored 100 percent in memory (but also written to disk in one of two ways), Redis offers performance and data access features that are top-notch. An increasing number of engineers, myself included, owe their success in no small part to the use of Redis as a production service.

My history with Redis

I got my start with Redis when a friend and manager assigned me a bug tracker ticket, mentioning that I might want to take a look at Redis to handle an internal search over some client data. Nothing too extraordinary, and it was something that Lucene could easily handle out of the box. But there was something about Redis that caught my attention. Because it was only my second task since joining the company, it was reasonable to take a little time to explore a new technology. Around two weeks later, we deployed a new internal search engine that was built using Redis hashes to store sortable data and Redis sets to store search terms. A series of set intersections followed by a sort call executed each search, filtering and sorting some 60,000 records in 50 milliseconds, around 200 times faster than our previous system. (I have previously written about a more web-page-specific type of search on my blog.)
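The search described above can be sketched in plain JavaScript. This is not the original code, which the post does not show: it is an in-memory stand-in where a Map of Sets plays the role of the per-term Redis sets and a Map of objects plays the role of the per-record hashes. In Redis itself, the same steps would be SINTER (or SINTERSTORE) over the term sets followed by SORT with a BY doc:*->field pattern. The class name TinySearch and the field names are hypothetical.

```javascript
// In-memory sketch of a set-intersection search with sorting.
// Term sets stand in for Redis sets; per-document objects stand in for Redis hashes.
class TinySearch {
  constructor() {
    this.terms = new Map(); // term -> Set of document ids
    this.docs = new Map();  // document id -> { field: numeric value }
  }

  add(id, fields, words) {
    this.docs.set(id, fields);
    for (const word of words) {
      const term = word.toLowerCase();
      if (!this.terms.has(term)) this.terms.set(term, new Set());
      this.terms.get(term).add(id);
    }
  }

  search(words, sortField, limit = 10) {
    if (words.length === 0) return [];
    // Intersect starting from the smallest set so the least work is done
    const sets = words
      .map((w) => this.terms.get(w.toLowerCase()) || new Set())
      .sort((a, b) => a.size - b.size);
    const result = [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
    // Sort the matches by a numeric field, ascending, then page the results
    result.sort((a, b) => this.docs.get(a)[sortField] - this.docs.get(b)[sortField]);
    return result.slice(0, limit);
  }
}
```

Starting the intersection from the smallest set mirrors how Redis performs SINTER, which is a large part of why this approach stays fast as the number of records grows.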

After arriving at such easy and quick success developing and deploying applications with Redis, I joined the mailing list with a few feature requests. Ultimately, only one of my requested features made it in, but in the Redis community I found a wide variety of problems posted by other developers, and I couldn’t resist offering advice on possible solutions. The breadth of problems posted to the list, along with my own experiences developing over a dozen Redis-backed tools and systems for my past and current employers, combine to fill the pages of Redis in Action with real problems and their solutions. These are solutions that you can use today on a variety of problems with some of the most popular programming languages.

While Redis in Action is not yet complete, four chapters are available today through Manning’s Early Access Program, with at least one additional chapter to be released in June, and one to two chapters every month until it is done. Python source code is included in the book, and translations to Ruby, Java and Node.js will be released before the print edition comes out.

Use the code 12ria39 for a 39 percent discount when you pre-order Redis in Action at: http://www.manning.com/carlson.

Dr. Josiah L. Carlson is well known as an active and helpful contributor on the Redis mailing list. He has given talks about real-world uses of Redis, including building a self-service ad network, prioritizing task queues, web spiders, a Twitter analytics platform, real-time search engines and more.

The Cloud Foundry Team
Don’t have a Cloud Foundry account yet?  Sign up for free today


Cloudfuji Accelerates Delivery of Its Open Source Application Store with Cloud Foundry

Unlike most Platform as a Service offerings today, Cloud Foundry is open and extensible, so developers are not locked into a single cloud or beholden to the feature set a vendor delivers on its own timeline. This month, in our series of guest blogs by application developers, we are featuring the story of Cloudfuji, a modern business application store that uses Cloud Foundry to keep itself nimble and focused on getting to market quickly.

Guest blog by Sean Grove, co-founder of Cloudfuji:

We built Cloudfuji, our modern business application store, on the principle that amazing apps should be 1) easy to make, 2) easy to find, and 3) able to work together seamlessly. We have all had the experience of data getting stuck in the silos of our individual departmental support systems. Lacking integration, we have been forced to manually copy and paste data from one app to another or to rely on one-off API integrations between apps. In that scenario, simply finding, trying, provisioning and maintaining the best application can be a nightmare.

With Cloudfuji’s application store, end users can find and instantly launch high-quality open source business applications for bug tracking, agile projects, or CRM, to name a few, that are loosely coupled in a publish-subscribe model with a standardized event schema. This allows them to work not only with each other, but also with the outside world and proprietary legacy applications. This also brings enterprise-wide visibility into events such as real-time notifications on product, sales and marketing activities, regardless of which specific apps are being used. The app development pattern becomes one of small, focused tools that excel at their specific function, and allow other apps to handle anything else. This is the future of the application ecosystem.
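To make the publish-subscribe idea concrete, here is a minimal in-process sketch using Node's built-in EventEmitter. It is not Cloudfuji's implementation: the event fields (source, category, name, data, occurredAt) and the "*" firehose channel are hypothetical stand-ins for a standardized event schema, and a real deployment would use a networked message broker rather than a single process.

```javascript
const { EventEmitter } = require("events");

// Hypothetical standardized event shape shared by every app on the bus
function makeEvent(source, category, name, data) {
  return { source, category, name, data, occurredAt: new Date().toISOString() };
}

class EventBus extends EventEmitter {
  // Publishers only name the event's category; they never know who subscribes
  publishEvent(event) {
    this.emit(event.category, event);
    // A firehose channel lets dashboards or notifiers see every event
    this.emit("*", event);
  }
}
```

Because a bug tracker only publishes "ticket" events and a CRM only subscribes to the categories it cares about, neither app needs a one-off integration with the other; each can be swapped out as long as it speaks the shared schema.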

The future has demanding technical requirements

 
Figure 1: Cloud Foundry and CloudFoundry.com used in Stateless Binary Execution Engine

We use the Rails framework for our front end, which allows us to model our problem domain and iterate extremely quickly. Ruby is mainly used as a light layer for accessing more powerful services underneath such as Mailgun, Redis, RabbitMQ, and AWS S3. Once a user launches an app, we deploy it to the Binary Execution Environment (BEE).

Initially, we had built out our own LXC-based infrastructure, which in hindsight was a Sisyphean, wasted effort. We kept reinventing a wheel called PaaS that was readily available in amazing forms right off the shelf. We tried a popular Ruby PaaS provider, but struggled with their closed platform because we had no influence over their release schedule.

When the open source Cloud Foundry project became available, we got it running within two days without needing any support, and had a provisioning system written for it within the week! Since then, we have scaled out our Cloud Foundry instance, built custom endpoints to handle any deeper integration or visibility we need into the system, and added CloudFoundry.com to the mix. The BEE is designed to be completely stateless, with the data decoupled from it. Because the BEE holds none of the data that an app relies on, we can replace it with a new one if the need arises (for example, during an outage in the underlying infrastructure) and temporarily migrate the apps.

Why Cloud Foundry works for us

  • Time-to-market: Although our experience with other PaaS layers led us down a convoluted path, we were able to immediately piggyback on the great work already done on Cloud Foundry. This meant we could stop spending effort on the lower layers of PaaS: custom kernels, LXC-based para-virtualization, resource management, node health, routing systems, system-level library compatibility and consistency. Time is the biggest killer for startups, and we could easily have stalled in the quagmire of rolling out our own systems. We consider it a bullet dodged.
  • Momentum: Having an open platform where the community can chip in means continuous improvement of the platform in faster cycles. When an open source project has the momentum of a whole community behind it, other projects simply cannot keep up. We’ve already benefited tremendously from work that wasn’t done by us, but by other extremely capable individuals. And in turn, when Cloudfuji needs a new feature, we have the choice of putting it out to the mailing list/community or rolling up our sleeves and writing it ourselves. With all respect to the other platforms out there, none of them offers anything like this.
  • Target API-identical clouds with a single config setting: Getting a Cloud Foundry system for development in the cloud is a very simple exercise. Although we’re currently also running our own instance of Cloud Foundry, ultimately we expect to be able to offload more to the CloudFoundry.com service when it comes out of beta and can match our demanding needs. We expect that transition to happen almost seamlessly because of the design of both our system and of the Cloud Foundry project.
  • Flexibility to address changes to our business model: Finally, the ability to seamlessly run applications on multiple Cloud Foundry clouds, i.e., move from a public cloud to a private cloud, or vice versa, enables us to plan for a future offering where we can run a Cloudfuji appliance behind the firewall for parties that can’t use public clouds (for example, to meet local compliance needs or because of geographical location).

Taking the next steps

Applications should be easy to create and use. IT services should focus on their strengths and get end users the resources they need, when they need them. We live in a world where we get to take PaaS for granted and leverage great technology that is readily available. We all stand on the shoulders of giants, and we build more amazing products faster than ever before because of it. It’s a cycle we all need to embrace and increasingly reap the benefits from. Choose the open platforms that have strong leadership and excellent communities, and get behind them. It is amazing that we at Cloudfuji are building something of such scale and internal complexity, while staying lean and moving fast with the help of various communities. We couldn’t do all of that without Cloud Foundry.

-Sean Grove



Building a Real-Time Activity Stream on Cloud Foundry with Node.js, Redis and MongoDB–Part II

In Part I of this series, we showed how to start from the node-express-boilerplate app for real-time messaging and integration with third parties and move towards building an Activity Streams Application via Cloud Foundry. The previous app only sent simple text messages between client and server, but an Activity Streams Application processes, aggregates and renders multiple types of activities. For this new version of the application, my requirements were to show the following:

  • User interactions with the app running on CloudFoundry.com
  • Custom activities created by users on the app’s landing page
  • User activities from GitHub such as creating repositories or posting commits

For this reason, I decided to use Activity Streams (activitystrea.ms), a generic JSON format for describing social activity around the web. The main components of an activity, as defined by the activitystrea.ms spec, are Actor, Verb and Object, with an optional Target. An actor performs an action of type verb on an object in a target. For example:

John posted SCALA 2012 Recap on his blog John OSS
10 minutes ago

{
  "published": "2012-05-10T15:04:55Z",
  "actor": {
    "url": "http://example.org/john",
    "objectType": "person",
    "id": "tag:example.org,2011:john",
    "image": {
      "url": "http://example.org/john/image",
      "width": 250,
      "height": 250
    },
    "displayName": "John"
  },
  "verb": "post",
  "object": {
    "objectType": "blog-entry",
    "displayName": "SCALA 2012 Recap",
    "url": "http://example.org/blog/2011/02/entry",
    "id": "tag:example.org,2011:abc123/xyz"
  },
  "target": {
    "url": "http://example.org/blog/",
    "objectType": "blog",
    "id": "tag:example.org,2011:abc123",
    "displayName": "John OSS"
  }
}

This generic vocabulary not only helps us transmit activities across apps, but it can also help us model our data store in a flexible fashion and help us with rendering the final HTML or textual output for the client.
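As a hedged illustration of the rendering point, the sketch below turns an activity like the one above into a one-line summary and a relative timestamp, close to the example shown before the JSON. The verb-to-past-tense table and the function names describeActivity and timeAgo are assumptions for illustration, not part of the activitystrea.ms spec.

```javascript
// Hypothetical verb -> past tense table; a real app needs a fuller vocabulary.
const PAST_TENSE = { post: "posted", share: "shared", like: "liked", follow: "followed" };

// Build "Actor verbed Object [on Target]" from an Activity Streams object
function describeActivity(activity) {
  const actor = activity.actor.displayName;
  const verb = PAST_TENSE[activity.verb] || activity.verb;
  let text = `${actor} ${verb} ${activity.object.displayName}`;
  if (activity.target) {
    text += ` on ${activity.target.displayName}`;
  }
  return text;
}

// Render the "published" timestamp relative to `now` (e.g. "10 minutes ago")
function timeAgo(published, now = new Date()) {
  const minutes = Math.floor((now - new Date(published)) / 60000);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes} minute${minutes === 1 ? "" : "s"} ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours} hour${hours === 1 ? "" : "s"} ago`;
  const days = Math.floor(hours / 24);
  return `${days} day${days === 1 ? "" : "s"} ago`;
}
```

Because every activity shares the same actor/verb/object/target shape, one renderer covers app interactions, custom activities and GitHub events alike.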

I will talk in more detail about how this format helped me with rendering the UX in the next post, but let’s first start discussing the design for the backend and how we arrived at a data flow which includes MongoDB and Redis PubSub:

Modified Architecture

While the initial architecture worked well at a small scale, a real-world app must be able to scale to meet demand.

Persisting the Data

One of the key decisions I had to make was how to store these activities. From my previous experience on a team building a large scale Activity Streams application, I know that you have to optimize for views as the streams are typically placed in the most visited parts of websites, like home pages.

For activity streams, it is best to store all of the information needed to render an activity so that it can be fetched with a single simple query. Otherwise, given the variety of activities, you will be joining many tables and will not be able to scale, especially if you are aggregating a variety of actions. Imagine, for example, rendering a stream of activities about open source contributions and bugs. In a relational database you would have tables for:

  • users
  • projects
  • commits
  • bugs

You could then query this data using an object-relational mapper such as mapper:

// Relationships between the normalized tables
User.hasMany("bugs", Bug, "creatorId");
Bug.belongsTo("user", User, "creatorId");

Project.hasMany("bugs", Bug, "projectId");
Bug.belongsTo("project", Project, "projectId");

User.hasMany("commits", Commit, "creatorId");
Commit.belongsTo("user", User, "creatorId");

Project.hasMany("commits", Commit, "projectId");
Commit.belongsTo("project", Project, "projectId");

// Four sequential round trips; error handling omitted for brevity.
// projectId is selected so that findUniqueProjects has something to work with.
Bug
  .select('createdAt', 'creatorId', 'projectId', 'name', 'url', 'description')
  .page(0, 10)
  .order('createdAt DESC')
  .all(function(err, bugs) {
    Commit
      .select('createdAt', 'creatorId', 'projectId', 'name', 'url', 'description')
      .page(0, 10)
      .order('createdAt DESC')
      .all(function(err, commits) {
        // Coalesce both lists into the 10 newest items
        var ordered = coalesce(bugs, commits, 10);
        var uniqueUsers = findUniqueUsers(ordered);
        var uniqueProjects = findUniqueProjects(ordered);
        User
          .select('id', 'name', 'url', 'avatar_url')
          .where({ 'id.in': uniqueUsers })
          .all(function(err, users) {
            Project
              .select('id', 'name', 'url', 'avatar_url')
              .where({ 'id.in': uniqueProjects })
              .all(function(err, projects) {
                // Finally, correlate all the data back together
                var activities = [];
                //...
              });
          });
      });
  });

As you can see, a classic RDBMS design with highly normalized data requires either joins or multiple lookups and round trips to the server. Even with an activities table, we would still need separate lookups for bugs, commits, users and projects.
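The post does not show the coalesce helper used in the relational example. One plausible implementation, offered only as a sketch, is a standard merge of two lists that are each sorted newest-first, stopping once limit items have been taken. It assumes every item has a comparable createdAt value (Date objects, or ISO 8601 strings in one consistent format).

```javascript
// Hypothetical `coalesce`: merge two newest-first lists, keeping `limit` items.
function coalesce(listA, listB, limit) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (merged.length < limit && (i < listA.length || j < listB.length)) {
    // Take from A when B is exhausted or A's head is at least as new as B's
    const takeA = j >= listB.length ||
      (i < listA.length && listA[i].createdAt >= listB[j].createdAt);
    merged.push(takeA ? listA[i++] : listB[j++]);
  }
  return merged;
}
```

This is exactly the kind of glue code that storing ready-made activities in one collection, sorted by a single indexed field, makes unnecessary.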

With a document database, instead of having multiple lookups you have one or two lookups at most, particularly if you are querying by a field which is indexed. Therefore, a document store is a better fit for my use case than a relational database.

The node-express-boilerplate app did not include a persistence layer. I decided to use MongoDB to store each activity as a document because of its flexible schema. I knew I was going to be working with third-party data, and I wanted to iterate quickly on the external data we incorporated. The JSON activity above can be stored in its entirety as a document in an activities collection and extended later. You may notice that this is a heavily denormalized way of storing data, which could cause issues if we needed to update the embedded objects. Luckily, activities describe actions that have already happened, so they rarely change and this is less of an issue.
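To show what this denormalization means in practice, here is a sketch that copies everything needed for rendering into a single activity document at write time, so that reads need only one query. The input record shapes (user, commit, project), the commitToActivity name and the choice of "post" as the verb for a commit are all hypothetical.

```javascript
// Hypothetical converter: embed actor, object and target in one activity document.
function commitToActivity(user, commit, project) {
  return {
    published: commit.createdAt,
    actor: {
      objectType: "person",
      id: user.id,
      displayName: user.name,
      url: user.url,
      image: { url: user.avatarUrl },
    },
    verb: "post",
    object: { objectType: "commit", id: commit.id, displayName: commit.message, url: commit.url },
    target: { objectType: "project", id: project.id, displayName: project.name, url: project.url },
  };
}
```

The trade-off is the one described above: if a user later changes their name, older activities keep the old copy, which is usually acceptable for a record of past events.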

Assuming we have an activities collection whose documents embed the actor and object, you can write code like:

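// https://github.com/ciberch/activity-streams-mongoose

// Express route for the landing page; assumes `app` and the Activity model exist
app.get('/', function (req, res) {
  // Fetch the 10 most recent activities in a single query
  Activity.find().sort('published', 'descending').limit(10).run(
    function (err, docs) {
      var activities = [];
      if (!err && docs) {
        activities = docs;
      }
      // Render even on error so the request does not hang
      res.render('index', {activities: activities});
    });
});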

One of the greatest aids in this project was MongooseJS, an Object Document Mapper for Node.js. Mongoose wraps MongoDB access in async callbacks and makes it easy to model schemas and validators. With Mongoose I was able to define the schema in a few lines of code.

Scaling the real-time syndication

One issue with the boilerplate code is that socket.io cannot syndicate messages to recipients connected to a different web server, since it keeps all messages in memory. The logical fix was to put in place a proper messaging system that all web servers could connect to. Redis PubSub was my first choice because it is extremely easy to use. As soon as an activity was successfully saved to MongoDB, I published it to the proper channel for all subscribers to receive. Since we work with JSON everywhere, this took very little code:


var redis = require("redis");
var _ = require("underscore");

// `options.redis` holds the connection settings (e.g. parsed from VCAP_SERVICES)
var publisher = redis.createClient(options.redis.port, options.redis.host);
if (options.redis.pass) {
  publisher.auth(options.redis.pass);
}

function publish(streamName, activity) {
  // Tag the activity with its stream before saving, so the tag is persisted too
  if (!_.isArray(activity.streams)) {
    activity.streams = [];
  }
  if (streamName && !_.include(activity.streams, streamName)) {
    activity.streams.push(streamName);
  }
  activity.save(function (err) {
    if (!err && streamName && publisher) {
      // Send to Redis PubSub so every subscribed web server receives it
      publisher.publish(streamName, JSON.stringify(activity));
    }
  });
}

This methodology is particularly useful when you have predefined aggregation methods, such as tags or streams.
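Why a shared broker fixes the syndication problem can be shown without a Redis server. In this in-memory sketch, a Broker built on Node's EventEmitter stands in for Redis PubSub, and each WebServer keeps its own local list of clients, as socket.io does. Because every server subscribes to the stream's channel, an activity published through one server reaches clients on all of them. In production, each server would need a Redis connection dedicated to subscribing, since a node_redis client in subscriber mode cannot issue other commands. The class names here are hypothetical.

```javascript
const { EventEmitter } = require("events");

// In-memory stand-in for Redis PubSub: one broker shared by all web servers
class Broker {
  constructor() { this.emitter = new EventEmitter(); }
  publish(channel, message) { this.emitter.emit(channel, message); }
  subscribe(channel, handler) { this.emitter.on(channel, handler); }
}

// Each server holds only its own clients but relays stream messages via the broker
class WebServer {
  constructor(name, broker, streamName) {
    this.name = name;
    this.broker = broker;
    this.streamName = streamName;
    this.clients = []; // callbacks standing in for socket connections
    broker.subscribe(streamName, (json) => {
      const activity = JSON.parse(json);
      this.clients.forEach((send) => send(activity));
    });
  }

  connect(send) { this.clients.push(send); }

  // After saving, publish to the channel instead of emitting only to local sockets
  publishActivity(activity) {
    this.broker.publish(this.streamName, JSON.stringify(activity));
  }
}
```

Since no server keeps state that the others need, instances can be added or removed freely, which is what makes scaling out with more app instances safe.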

Packaging as a Module

One of the great things about the Node.js community is how easy it is to contribute to open source, thanks to npm and its registry. I could not find any lightweight activity stream libraries, so I went ahead and submitted the persistence logic as a new module: activity-streams-mongoose.

Once you have a proper package.json, publishing the module takes a single command:

npm publish

Once you have published the module, you can follow the steps outlined in this pull request, https://github.com/ciberch/node-express-boilerplate/pull/1/, to upgrade your app so that it persists activities. You can easily run this app on CloudFoundry.com by creating and binding Redis and MongoDB instances as you deploy your application. Scaling the app is then simply a matter of running the ‘vmc instances’ command.

Conclusion

It is important to take the time to select the proper database type for the application you are building. While relational databases are the most popular, they are not always the best tool for the job. In this scenario, using a document store, namely MongoDB, helped us increase scalability and write simpler code.

Another step in taking your app to the cloud is making it stateless so that if instances are added or deleted, users don’t lose their sessions or messages. For this app, using Redis PubSub helped us solve the challenge of communicating across app instances. Finally, contributing to open source initiatives can not only save you time, but can also get more eyeballs on your code and help you be thorough in your testing. In this first module, I used nodeunit and was able to catch bugs during tests and from user reports. In the next blog post, I will do a final walk through of the app with a deep dive into client-side components.

Monica Wilkinson, Cloud Foundry Team

Sign up for Cloud Foundry today to build an app in Node.js with MongoDB and Redis
