Cloud Foundry Blog

About Monica Wilkinson

Monica Wilkinson has been writing software for over a decade for a variety of large scale employers such as IBM, MySpace, Facebook and VMware. Monica is an Open Web Standards and Data Portability advocate and has contributed to many open specifications around Activity Streams and real-time notifications. Today she works as a Developer Advocate at Cloud Foundry, where she is directly responsible for the website, documentation and open source gallery, and for working with developers and partners to enhance the platform.

Build a Real Time Activity Stream on Cloud Foundry with Node.js, Redis and MongoDB 2.0 – Part III

In Part II of this series, we covered the architecture needed for persisting activities to MongoDB and fanning them out in real time to all the clients using Redis PubSub.

Since then, some exciting new Node.js features for Cloud Foundry were launched. In addition, the MongoDB version on Cloud Foundry has been upgraded to 2.0.

In this blog post we will cover how to:

  • Use Mongoose-Auth to store basic user information, including information from Facebook, Twitter and GitHub, and how we made this module with native dependencies work on Cloud Foundry
  • Use Mongo GridFS and ImageMagick to store user-uploaded photos and profile pictures
  • Perform powerful stream filtering, thanks to new capabilities exposed in MongoDB 2.0
  • Update the UX of the app to become a real-time stream client using Bootstrap, Backbone.js and Jade

Offering SSO and Persisting Users

The requirement for this boilerplate Activity Streams app was to allow users to log in with Facebook, Twitter, or GitHub, and to persist this user data in the database.

I am sure many of you are familiar with how to store user information in a database and perhaps even in MongoDB. What some of you may not have tried is storing third party user information, such as the profile data we obtain when users log in with Facebook, Twitter or GitHub. In a relational database, you would probably store this information across multiple records in multiple tables (e.g., users, services, accounts, or auth). However, with MongoDB you can use embedded documents and store all this information in a single document, thus reducing the complexity of the operation. I found that this was very easy to do using @bnoguchi’s mongoose-auth, which decorates the Mongoose User schema with the third party services fields.

My previous version of the app used another popular module from Brian, everyauth, which handled SSO very well but did not persist the user info. It was fairly straightforward to upgrade from everyauth to mongoose-auth.

First, I updated a helper module I created called activity-streams-mongoose to offer a User Schema and made it possible to extend all schemas. Then I loaded mongoose-auth and decorated that schema with the specifics needed. You can see the exact code changes in this diff. The key part in the upgrade was to normalize the user data as it was saved. I did this by leveraging the pre-save callback MongooseJS offers.

Here is a snippet of the code:

var https = require('https'); // used below to fetch the Facebook profile image

var streamLib = require('activity-streams-mongoose')({
  mongoUrl: app.siteConf.mongoUrl,
  redis: app.siteConf.redisOptions,
  defaultActor: defaultAvatar
});

var authentication = new require('./authentication.js')(streamLib, app.siteConf);

// Moved normalization to only be done on pre save
streamLib.types.UserSchema.pre('save', function (next) {
  var user = this;
  var svcUrl = null;
  if (user.fb && user.fb.id) {
    user.displayName = "FB: " + user.fb.name.full;
    asmsDB.ActivityObject.findOne().where('url', facebookHash.url).exec(function(err, doc){
      if (err) throw err;
      user.author = doc._id;
      // Need to fetch the users image...
      https.get({
        'host': 'graph.facebook.com'
        , 'path': '/me/picture?access_token='+ user.fb.accessToken
      }, function(response) {
        user.image = {url: response.headers.location};
        next();
      }).on('error', function(e) {
        next();
      });
    })
  } else  {
    if (user.github && user.github.id) {
      user.displayName = "GitHub: " + user.github.name;
      var avatar = 'http://1.gravatar.com/avatar/'+ user.github.gravatarId + '?s=48'
      user.image = {url: avatar};
      svcUrl = githubHash.url;
    } else if (user.twit && user.twit.id) {
      user.displayName = "Twitter: " + user.twit.name;
      user.image = {url: user.twit.profileImageUrl};
      svcUrl = twitterHash.url;
    }

    if(!user.actor) {
      asmsDB.ActivityObject.findOne().where('url', svcUrl).exec(function(err, doc){
        user.author = doc;
        next();
      });
    } else {
      next();
    }
  }
  });

var asmsDB = new streamLib.DB(streamLib.db, streamLib.types);
streamLib.asmsDB = asmsDB;

MongoDB 2.0 and Cloud Foundry

For those of you not familiar with MongoDB 2.0, one neat feature is that it supports multi-location documents. I was also able to add a location property to the core Activity Object in the activity-streams-mongoose module, which can be used with the Activities, Activity Objects and User collections to enable geo queries.

var LocationHash = {
  displayName: {type: String},
  position: {
    latitude: Number,
    longitude: Number
  }
};

var ActivityObjectHash = {
  id: {type: String},
  image: {type: MediaLinkHash, default: null},
  icon: {type: MediaLinkHash, default: null},
  displayName: {type: String},
  summary: {type: String},
  content: {type: String},
  url: {type:String},
  published: {type: Date, default: null},
  objectType: {type: String},
  updated: {type: Date, default: null},
  location: LocationHash,
  fullImage : {type: MediaLinkHash, default: null},
  thumbnail : {type: MediaLinkHash, default: null},
  author : {type: ObjectId, ref: "activityObject"},
  attachments : [{type: ObjectId, ref: 'activityObject'}],
  upstreamDuplicates : [{type: String, default: null}],
  downstreamDuplicates : [{type: String, default: null}]
};

Another interesting geo feature in MongoDB 2.0 is polygonal search: you can check whether a given object lies within a specified area by providing the points of the polygon. For example, this can be helpful when you want to see if certain objects, like houses, fall inside a certain zip code.

Working with Node.js Modules with Native Dependencies

Mongoose-Auth requires the bcrypt module, which has a native dependency (it is compiled locally when you run an npm install, and the binary is placed in the node_modules directory). If you are working from a Mac or Windows and deploying to the cloud, you can run into issues by including your node_modules folder. Luckily, there is now support in Cloud Foundry for excluding the node_modules folder and having Cloud Foundry fetch and build the npm modules server-side.

$ npm shrinkwrap
$ echo "{ignoreNodeModules: true}" > cloudfoundry.json
$ vmc update

For more info you can read this blog post from Maria and/or watch this helpful video from Raja.

My recommendation is that when you start a new Node.js app, add the cloudfoundry.json file with ignoreNodeModules set to true, so that all the native dependencies are built directly on Cloud Foundry. Also, don’t forget to re-run npm shrinkwrap whenever you change package.json.

User Uploaded Photos with ImageMagick and Mongo GridFS

One of the most engaging types of content to show in a web app like this Activity Stream boilerplate app is photos. Apps like Instagram and Pinterest have taken photo sharing to a whole new level and have completely redesigned the UX of photo feeds. We wanted to help developers build Activity Stream apps with rich photo sharing, and thus needed a library to help us manipulate images and a place to store them. Since we were already using MongoDB, I decided to leverage Mongo GridFS to store the images.

I had previously stored photos in GridFS, but from Ruby; it was a bit more challenging to find the right tools in Node.js. I found a lot of npm modules which seemed to handle it for me, but they were either unfinished or pulled in several additional components which were incompatible. I really wanted to keep it simple, so I ended up following the documentation for the official Node.js MongoDB driver and creating a few routes to handle uploading and viewing photos.

Here is how I ingested the photos into Mongo GridFS:

var im = require('imagemagick');
var Guid = require('guid');
var siteConf = require('./lib/getConfig');
var lib = new require('./lib/asms-client.js')(app, cf).streamLib;

function ingestPhoto(req, res, next){
  if (req.files.image) {
    im.identify(req.files.image.path, function(err, features){
      if (features && features.width) {
        var guid = Guid.create();
        // Concatenating name to guid guarantees that we always have
        // unique file names
        var fileId = guid + '/' + req.files.image.name;
        // The GridStore class is the equivalent of a File class but has the
        // added benefit of allowing you to store metadata
        var gs = lib.GridStore(lib.realMongoDB, fileId, "w", {
          content_type : req.files.image.type,
          // metadata is optional
          metadata : {
            author: req.session.user._id,
            public : false,
            filename: req.files.image.name,
            path: req.files.image.path,
            width: features.width,
            height: features.height,
            format: features.format,
            size_kb: req.files.image.size / 1024 | 0
          }
        });
         // This command copies the file from the file system(temp dir)
         // to GridFS.
         // GridFS supports any file size by breaking it into chunks
         // behind the scenes
        gs.writeFile(req.files.image.path, function(err, doc){
          if (err) {
            next(err);
          } else {
            if (! req.photosUploaded) {
              req.photosUploaded = {};
            }
            // I have another express route to serve the photos by fileId
            var url = siteConf.uri + "/photos/" + fileId;
            // Add the results of the upload to the chain
            req.photosUploaded['original'] = {url : url, metadata: gs.metadata};
            req.nextSizeIndex = 0;
            next();
          }
        });
      } else {
        if (err) return next(err);
        next(new Error("Cannot get width for photo"));
      }
    });
  } else {
    next(new Error("Could not find the file"));
  }
};

The above snippet shows how I used ImageMagick to get the photo dimensions. ImageMagick is an amazing open source software suite for manipulating images, and there is an easy-to-use ImageMagick node module that exposes its functionality.

An open source project backed by years of continual development, ImageMagick supports about 100 image formats and can perform impressive operations such as creating images from scratch; changing colors; stretching, rotating, and overlaying images; and overlaying text on images.

ImageMagick.org

And this snippet shows how to produce a new image of smaller size:

var im = require('imagemagick');
var Guid = require('guid');
// Assumed to be in scope from the app's setup, as in the ingest snippet above:
// asmsClient, siteConf, and a list of target sizes such as
// var sizes = [{name: 'thumb', width: 75}, {name: 'medium', width: 250}];

function reducePhoto(req, res, next){
    var photoIngested = req.photosUploaded['original'];
    if (photoIngested) {
        var sizeName = sizes[req.nextSizeIndex].name;
        var destPath = photoIngested.metadata.path + '-' + sizeName ;
        var nameParts = photoIngested.metadata.filename.split('.');
        var newName = nameParts[0] + '-' + sizeName + '.' + nameParts[1];
        var width = sizes[req.nextSizeIndex].width;

        im.resize({
          srcPath: photoIngested.metadata.path,
          dstPath: destPath,
          width:   width
        }, function(err, stdout, stderr){
          if (err) {
              next(err);
          } else {
            console.log("The photo was resized to " + width + "px wide");
            var guid = Guid.create();
            var fileId = guid + '/' + newName;
            var ratio = photoIngested.metadata.width / width;
            var height = photoIngested.metadata.height / ratio;
            var gs = asmsClient.streamLib.GridStore(asmsClient.streamLib.realMongoDB, fileId, "w", {
                  content_type : req.files.image.type,
                  metadata : {
                      author: req.session.user._id,
                      public : false,
                      filename: newName,
                      width: width,
                      height: height,
                      path: destPath
                  }
              });
              gs.writeFile(destPath, function(err, doc){
                  if (err) {
                    next(err);
                  } else {
                      var url = siteConf.uri + "/photos/" + fileId;
                      req.photosUploaded[sizeName] = {url : url, metadata: gs.metadata};
                      req.nextSizeIndex = req.nextSizeIndex + 1;
                      next();
                  }
              });
          }
        });
    }
};

The only gotcha on Cloud Foundry was that it did not set the TMP environment variable, which the formidable module uses as the temp directory where files are first uploaded. Once I set it using env-add, the problem was solved.

bash-3.2$ vmc files asms app/tmp

   36272c476f10ecbf0e3a99481a8d365b         50.6K

All apps on Cloud Foundry have permission to write files to their own directory or a subdirectory. Set the environment variable TMP to a subdirectory if you are working with express/formidable to have it handle form uploads.

More Powerful Queries with MongoDB

Part of the beauty of the ActivityStrea.ms format is that it provides a lot of metadata about each activity which can then be searched, aggregated and pivoted to draw interesting conclusions about certain topics and trends.

Examples of these fields are: Hashtags or Topics, Verb, Object Types, Actor Types and Location.

The first step to allowing users to analyze the stream data is providing them with a map of their universe. This means allowing them to see all the possible values for each field. For example, if we are talking about hashtags, we would show our users that the population has used ten hashtags so far, and reveal the distribution of usage across everyone. Then we would let our users drill in by segmenting on actor type or location, for example. This could yield interesting results showing where certain topics are most popular.

var getDistinct = function (req, res, next, term, init){
  var key = 'used.' + term;
  req[key] = init ? init : [];
  var query = {streams: req.session.desiredStream};
  asmsDB.Activity.distinct(term, query, function(err, docs) {
    if (!err && docs) {
      _.each(docs, function(result){
        req[key].push(result);
      });
      next();
    } else {
      next(new Error('Failed to fetch distinct ' + term));
    }
  });
}

//..

function getDistinctVerbs(req, res, next){
  getDistinct(req, res, next, 'verb');
};

function getDistinctActors(req, res, next){
  getDistinct(req, res, next, 'actor');
};

function getDistinctObjects(req, res, next){
  getDistinct(req, res, next, 'object', ['none']);
};

function getDistinctObjectTypes(req, res, next){
  getDistinct(req, res, next, 'object.object.type', ['none']);
};

function getDistinctActorObjectTypes(req, res, next){
  getDistinct(req, res, next, 'actor.object.type', ['none']);
};

//...

app.get('/streams/:streamName', loadUser, getDistinctStreams, getDistinctVerbs, getDistinctObjects, getDistinctActors,
  getDistinctObjectTypes, getDistinctActorObjectTypes, getMetaData, function(req, res) {

  asmsClient.asmsDB.Activity.getStream(req.params.streamName, 20, function (err, docs) {
    var activities = [];
    if (!err && docs) {
      activities = docs;
    }
    req.streams[req.params.streamName].items = activities;
    var data = {
      currentUser: req.user,
      streams: req.streams,
      desiredStream: req.session.desiredStream,
      actorTypes: req.actorTypes,
      objectTypes: req.objectTypes,
      verbs: req.verbs,
      usedVerbs: req['used.verb'],
      usedObjects: req['used.object'],
      usedObjectTypes: req['used.object.type'],
      usedActorObjectTypes: req['used.actor.object.type'],
      usedActors: req['used.actor']
    };
    if (req.is('json')) {
      res.json(data);
    } else {
      res.render('index', data);
    }
  });
});

A Robust Activity Stream UX

The initial node-express-boilerplate app had some basic jQuery used to show plain text messages and users’ photos. In the new app, we have much richer messages and the ability to post and filter them. For this reason, we decided to use some of the great client-side open source tools available today.

After some consideration, we ended up using these three tools:

  1. Backbone.js: A lightweight client-side MVC framework
  2. Bootstrap: A set of CSS, HTML and JavaScript components which help developers produce great looking apps without needing to start from scratch
  3. Jade: A templating language, used here on both server and client with the help of ClientJade

Templating, Markup and CSS

If you are a web developer, you probably know that there are many choices in tools to render HTML dynamically. A good number of web developers prefer to use templating engines to render HTML because they help you produce more readable code: using declarative programming, you can interpolate variables and directives in the HTML. The most popular templating engine for Node.js is Embedded JavaScript (EJS), which resembles ERB in Ruby. This is what the node-express-boilerplate project included. When I started working with Node.js, I found many more choices beyond what I had used in the Ruby world, such as Mustache, Handlebars, Dust and Jade. In fact, LinkedIn wrote an excellent blog post discussing the many alternative choices for templating engines.

I ended up selecting Jade because I already liked HAML, which is very similar to Jade in its terseness. Both templating languages use indentation to understand the hierarchy of elements. Jade is even terser than HAML because it removes the need to put % in front of the HTML tags.

Another cool thing about Jade is that it already had support for server-side and client-side rendering via ClientJade. Here is how I broke out the views allowing easy extension of the object types.

I then compiled the Jade views into js for faster client-side rendering with ClientJade.

clientjade views/*.* > public/js/templates.js

Once this was done, I simply included templates.js in the list of files to be minified and used it like this from Backbone:

var ActivityCreateView = Backbone.View.extend({
    el: '#new_activity',
    initialize: function(){
        _.bindAll(this, 'newAct', 'render', 'changeType', 'includeLocation', 'sendMessage');

        this.trimForServer = App.helper.trimForServer;

        var streamName = this.$el.find('#streamName').val();
        var verb = this.trimForServer(this.$el.find('#verb-show'));
        var objectType = this.trimForServer(this.$el.find('#object-show'));

        this.newAct(streamName, verb, objectType);
        this.render();
    },
    events: {
        "click .type-select" : "changeType",
        "click #includeLocation" : "includeLocation",
        "click #send-message" : "sendMessage"
    },
    newAct : function(streamName, verb, objectType) {
        this.streamName = streamName;
        this.model = new Activity({
            object: {content: '', objectType: objectType, title: '', url: ''},
            verb: verb,
            streams: [streamName]
        });
    },
    render: function(){
      var actData = this.model.toJSON();
      this.$el.find("#specific-activity-input").html(jade.templates[actData.object.objectType]());

      return this; // for chainable calls, like .render().el
    },
    changeType : function(event) {
        console.log(event);
        var itemName = $(event.target).data("type-show");
        if (itemName) {
            $("#" + itemName)[0].innerHTML = event.target.text + "  ";
            var val = this.trimForServer(event.target.text);
            if (itemName == "verb-show") {
                this.model.set('verb', val);
            } else {
                var obj = this.model.get('object');
                obj.objectType = val;
                this.model.set('object', obj);
            }
        }
        this.render();
    },
    includeLocation : function(event) {
        if (navigator.geolocation) {
            navigator.geolocation.getCurrentPosition(App.helper.getLocation);
        } else {
            alert("Geo Location is not supported on your device");
        }
    },
    sendMessage : function() {
        console.log("In send message");

        var obj = this.model.get('object');
        obj.content = $("#msg").val();
        obj.url = $('#url').val();
        obj.title = $('#title').val();
        obj.objectType = this.trimForServer($('#object-show'));
        this.model.set('object', obj);

        var streamName = $('#streamName').val();
        this.model.set('streams', [streamName]);

        var verb = this.trimForServer($('#verb-show'));
        this.model.set('verb', verb);

        if (this.model.isValid()) {
            if (this.model.save()) {
                this.newAct(streamName, verb, obj.objectType);
                this.render();
            }
        }

    }

});

The original node-express-boilerplate app used 960gs and jQuery. However, this activity streams boilerplate app is a bit more complex, so as a first step I switched to Twitter’s Bootstrap. This gave me a nice way to build the nav bar, hero unit, modals, drop-downs and so on. It was also easy enough to move from one grid system to another. For the moment the app uses the default grid system, but it can easily be switched to Bootstrap’s Fluid Layout and Responsive Design enhancements.

Manipulating Data on the Client with Backbone.js

Instead of having a large number of individual jQuery handlers on HTML elements, Backbone.js helps you break your UX into components called Views, which are closer to Controllers in server-side MVC frameworks. These Backbone views can take Backbone models and templates and render them, as well as listen for events on the elements that comprise the view. You can see in the snippet above that we have a Backbone view that works with a Backbone model for an Activity.
Backbone models are pretty simple classes you create by listing all the properties of the model and its validation rules. Here is the code for the Activity Backbone model, which is also used to create the ActivityStreamView:

var Activity = Backbone.Model.extend({
    url : "/activities",
    // From activity-streams-mongoose/lib/activityMongoose.js
    defaults: {
        verb: 'post',
        object: null, //ActivityObject
        actor: null, //ActivityObject
        url: '',
        title: '',
        content: '',
        icon: null, // MediaLinkHash
        target: null, //ActivityObject
        published: Date.now,
        updated: Date.now,
        inReplyTo: null, //Activity
        provider: null, //ActivityObject
        generator: null, //ActivityObject
        streams: ['firehose'],
        likes: {},
        likes_count: 0,
        comments: [],
        comments_count: 0,
        userFriendlyDate: 'No idea when'
    },
    validate: function(attrs) {
        if (!attrs.object) {
            return "Object is missing";
        }
        if (!attrs.object.title) {
            return "Title is missing";
        }
    }
});

You can then easily instantiate a model by passing a bare JavaScript object. In the example below, I take the output from socket.io when a message comes in, convert it to an object and add it to the Backbone collection associated with the stream view:

var streamView = new ActivityStreamView();
App.socketIoClient.on('message', function(json) {
  var doc = JSON.parse(json);
  if (doc) {
    streamView.collection.add(new Activity(doc));
  }
});

Working with Backbone.js was a lot of fun, but it definitely takes some time to convert all your logic to Backbone Views and Models. In this example, I only used Backbone.js for a subset of the app. Having Jade as my templating language and Node.js on the server allowed me to share code between server and client. If you have a trivial application, you may not need Backbone.js and may be able to keep it simple with express and server-side templates. In the case of this Activity Streams application, which syndicates in real time and offers the ability to react to any new item, it made sense to use Backbone.js. It also allowed me to provide the hooks for the next iteration of this app. After all, this is a boilerplate app.

Remember that it is very easy to push updates to your application on Cloud Foundry as you make progress:

bash-3.2$ vmc update

Updating application 'asms'...
Uploading Application:
  Checking for available resources: OK
  Processing resources: OK
  Packing application: OK
  Uploading (380K): OK   
Push Status: OK
Stopping Application 'asms': OK
Staging Application 'asms': OK                                                  
Starting Application 'asms': OK

Conclusion

As a contributor to the ActivityStrea.ms specification, I find it necessary (and fun) to get my hands dirty building the apps which use open standards to see where there are limitations and what technologies can make it easier. Working with MongoDB proved to be the right choice, giving me the ability to do complex queries, aggregation and full modeling of my objects which are needed for quickly painting the stream. I am really happy that MongoDB 2.0 is now running on Cloud Foundry because a lot of the Object Document Mappers like Mongoid in Ruby only support 2.x. This is a very exciting time to be a developer, as things are moving very fast and there is ample opportunity to make a difference via open source.

Here is the final architecture of the app at a very high-level:

Activity Streams Boilerplate App Architecture

To try this application, you can visit it here. To get the full code, you can clone or fork it here.

–Monica Wilkinson


Open Web Foundation Agreement for Activity Streams Signed

At Cloud Foundry we care about making life easier for web developers and API developers. Open source helps developers reuse code. Open standards help developers reuse protocols and schemas. We are pleased to announce that VMware has furthered its commitment to open standards with a signed agreement on Activity Streams specifications. VMware signed the Open Web Foundation Agreement (OWFa) for the JSON and Atom Activity Streams specifications, which gives application developers confidence in implementing these open specifications when building web applications and services.

What is OWFa?

In short, the Open Web Foundation Agreement is a promise made by the signing company that they will not assert IP Claims over the work covered in the specification.

OWFa 1.0 grants perpetual (for the duration of the applicable copyright), worldwide, non-exclusive, no-charge, royalty-free, copyright license, without any obligation for accounting to me, to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, distribute, and implement any Contribution to the full extent of my copyright interest in the Contribution

What is Activity Streams?

Activity Streams is a generic schema to represent social activity around the web. There are currently two specifications: one for JSON and one for Atom. The main components of an activity as defined by the ActivityStrea.ms spec are: Actor, Verb and Object, with an optional Target. An actor performs an action of type verb on an object in a target. For more information on how this standard can help you build real-time web applications, see this blog post by Cloud Foundry Developer Advocate Monica Wilkinson, who is a contributor to the ActivityStrea.ms specification.

Many social networks in the consumer and enterprise space now offer an Activity Streams API and have implemented the specification. In particular, Google Plus’ API implements the ActivityStrea.ms JSON specification, thanks to the efforts of people like Will Norris, who is also a contributor to the specification.

Over the course of the specification’s development there have been contributions from developers at MySpace, Facebook, Microsoft, Google, VMware, Mozilla, StatusNet, IBM and others. It is important that contributors get their companies to sign the OWFa so implementors can be confident when building their applications. To see the full list of signees, check out GitHub.

Next Steps

For developers: To get started building a real-time Activity Streams application, check out this tutorial.

For contributors: We are evaluating starting an official IETF charter for a complete Activity Streams Protocol, which would cover not only the schema and syntax but also the REST endpoints and streaming. If you are interested in becoming a contributor or implementor, please join the Activity Streams mailing list.


Building a Real-Time Activity Stream on Cloud Foundry with Node.js, Redis and MongoDB–Part II

In Part I of this series, we showed how to start from the node-express-boilerplate app for real-time messaging and integration with third parties and move towards building an Activity Streams Application via Cloud Foundry. The previous app only sent simple text messages between client and server, but an Activity Streams Application processes, aggregates and renders multiple types of activities. For this new version of the application, my requirements were to show the following:

  • User interactions with the app running on CloudFoundry.com
  • Custom activities created by users on the app’s landing page
  • User activities from GitHub such as creating repositories or posting commits

For this reason, I decided to use ActivityStrea.ms, a generic JSON format to describe social activity around the web. The main components of an activity as defined by the ActivityStrea.ms spec are: Actor, Verb and Object, with an optional Target. An actor performs an action of type verb on an object in a target. For example:

John posted SCALA 2012 Recap on his blog John OSS 10 minutes ago

 
{
  "published": "2012-05-10T15:04:55Z",
  "actor": {
    "url": "http://example.org/john",
    "objectType": "person",
    "id": "tag:example.org,2011:john",
    "image": {
      "url": "http://example.org/john/image",
      "width": 250,
      "height": 250
    },
    "displayName": "John"
  },
  "verb": "post",
  "object": {
    "objectType": "blog-entry",
    "displayName": "SCALA 2012 Recap",
    "url": "http://example.org/blog/2011/02/entry",
    "id": "tag:example.org,2011:abc123/xyz"
  },
  "target": {
    "url": "http://example.org/blog/",
    "objectType": "blog",
    "id": "tag:example.org,2011:abc123",
    "displayName": "John OSS"
  }
}

This generic vocabulary not only helps us transmit activities across apps, but it can also help us model our data store in a flexible fashion and help us with rendering the final HTML or textual output for the client.

I will talk in more detail about how this format helped me with rendering the UX in the next post, but first let’s discuss the design of the backend and how we arrived at a data flow that includes MongoDB and Redis PubSub:

Modified Architecture

While the initial architecture worked well at a small scale, a real world app must be able to scale to meet demand.

Persisting the Data

One of the key decisions I had to make was how to store these activities. From my previous experience on a team building a large scale Activity Streams application, I know that you have to optimize for views as the streams are typically placed in the most visited parts of websites, like home pages.

For activity streams, it is best to store all of the information needed to render an activity such that it can be fetched with a single simple query. Otherwise, given the variety of the activities, you will be joining many tables and will not be able to scale, especially if you are aggregating a variety of actions. Imagine, for example, wanting to render a stream with activities about open source contributions and bugs. In a relational database system you would have tables for:

  • users
  • projects
  • commits
  • bugs

And then you could query this data using an object-relational mapper like mapper:

User.hasMany("bugs", Bug, "creatorId");
Bug.belongsTo("user", User, "creatorId");

Project.hasMany("bugs", Bug, "projectId");
Bug.belongsTo("project", Project, "projectId");

User.hasMany("commits", Commit, "creatorId");
Commit.belongsTo("user", User, "creatorId");

Project.hasMany("commits", Commit, "projectId");
Commit.belongsTo("project", Project, "projectId");

Bug
  .select('createdAt', 'creatorId', 'name', 'url', 'description')
  .page(0, 10)
  .order('createdAt DESC')
  .all(function(err, bugs) {
    Commit
      .select('createdAt', 'creatorId', 'name', 'url', 'description')
      .page(0, 10)
      .order('createdAt DESC')
      .all(function(err, commits) {
        // Interleave both lists by date and keep the newest 10
        ordered = coalesce(bugs, commits, 10);
        uniqueUsers = findUniqueUsers(ordered);
        uniqueProjects = findUniqueProjects(ordered);
        User
          .select('id', 'name', 'url', 'avatar_url')
          .where({ 'id.in': uniqueUsers })
          .all(function(err, users) {
            Project
              .select('id', 'name', 'url', 'avatar_url')
              .where({ 'id.in': uniqueProjects })
              .all(function(err, projects) {
                // finally, correlate all the data back together
                var activities = [];
                //...
              });
          });
      });
  });

As you can see, a classic RDBMS design with very normalized data requires multiple lookups and roundtrips to the server or joins. Even if we had an activities table we would have to do a separate lookup for bugs, commits, users and projects.

With a document database, instead of having multiple lookups you have one or two lookups at most, particularly if you are querying by a field which is indexed. Therefore, a document store is a better fit for my use case than a relational database.

The node-express-boilerplate app did not include a persistence layer. I decided to use MongoDB to store each activity as a document because of its schema flexibility. I knew I was going to be working with third party data, and I wanted to iterate quickly on the external data we incorporated. The above JSON activity can be stored in its entirety as an activities document and extended. You may notice that this is a very denormalized way of storing data and could cause issues if we needed to update the objects. Luckily, since activities are actions in the past, this is not as big of an issue.

Assuming we have an activities collection where the activity document has nested actors and objects you can write code like:

// https://github.com/ciberch/activity-streams-mongoose

Activity.find().sort('published', 'descending').limit(10).run(
  function (err, docs) {
    var activities = [];
    if (!err && docs) {
      activities = docs;
      res.render('index', {activities: activities});
    }
  });

One of the greatest aids in this project was MongooseJS, an Object Document Mapper for Node.js. Mongoose provides wrapper functions to use MongoDB with async callbacks, and makes it easy to model schemas and validators. With Mongoose I was able to define the schema in a few lines of code.

Scaling the real-time syndication

One of the issues with the boilerplate code is that socket.io cannot syndicate messages to recipients that are connected to a different web server, since it stores all the messages in memory. The most logical thing to do was to put in place a proper queueing system that all web servers could connect to. Redis PubSub was my first choice, as it is extremely easy to use. As soon as I successfully saved an activity to MongoDB, I streamed it into the proper channel for all subscribers to receive. This was simple to wire up, since we are working with JSON everywhere:

var _ = require('underscore');
var redis = require("redis");
var publisher = redis.createClient(options.redis.port, options.redis.host);
if (options.redis.pass) {
  publisher.auth(options.redis.pass);
}

function publish(streamName, activity) {
  activity.save(function(err) {
    if (!_.isArray(activity.streams)) {
      activity.streams = [];
    }
    if (!_.include(activity.streams, streamName)) {
      activity.streams.push(streamName);
    }
    if (!err && streamName && publisher) {
      // Send to Redis PubSub
      publisher.publish(streamName, JSON.stringify(activity));
    }
  });
}

This methodology is particularly useful when you have predefined aggregation methods, such as tags or streams.

Packaging as a Module

One of the great things about the Node.js community is that it’s very easy to contribute open source code, thanks to npm and its registry. I could not find any lightweight activity stream libraries, so I went ahead and published the persistence logic as a new module: activity-streams-mongoose.

Once you have a proper package.json, you can publish it with a single command:

npm publish

Once you have the module published, you can follow the steps outlined in this pull request: https://github.com/ciberch/node-express-boilerplate/pull/1/ to upgrade your app to persist activities. You can easily run this app on CloudFoundry.com by creating and binding Redis and MongoDB instances as you deploy your application. Furthermore, scaling the app can be done simply with the ‘vmc instances’ command.

Conclusion

It is important to take time and select the proper database type for the application you are building. While RDBMS systems are the most popular, they are not always the best for the job. In this scenario, using a document store, namely MongoDB, helped us increase scalability and write simpler code.

Another step in taking your app to the cloud is making it stateless so that if instances are added or deleted, users don’t lose their sessions or messages. For this app, using Redis PubSub helped us solve the challenge of communicating across app instances. Finally, contributing to open source initiatives can not only save you time, but can also get more eyeballs on your code and help you be thorough in your testing. In this first module, I used nodeunit and was able to catch bugs during tests and from user reports. In the next blog post, I will do a final walk through of the app with a deep dive into client-side components.

Monica Wilkinson, Cloud Foundry Team

Sign up for Cloud Foundry today to build an app in Node.js with MongoDB and Redis


Building a Real Time Activity Stream on Cloud Foundry with Node.js, Redis and MongoDB – Part I

Cloud Foundry provides developers with several choices of frameworks, application infrastructure services and deployment clouds. This approach to providing a platform as a service enables the fast creation of useful cloud applications that take advantage of polyglot programming and multiple data services. As an example of how this enables us to be more productive, I was able to use Node.js, Redis and MongoDB to create an Activity Streams application in a short time, despite the fact that I mainly develop web applications in Ruby. Based on the developer interest after a demo of this application at NodeSummit 2012 and MongoSF 2012, I was inspired to write a three-part blog series that fully details the creation, deployment and scaling of this application on CloudFoundry.com. The application is based on an interesting project I came across recently called “node-express-boilerplate”, developed by @mape, which is a full-featured but simple boilerplate Node.js app. Given my previous experience working on social networking software and Open Web Standards, I decided to morph this app into an Activity Streams sample application.


Initial Boilerplate Architecture without MongoDB or Redis PubSub

“Node-express-boilerplate” is a starter application written in Node.js which showcases how users can log into a website using Facebook, Twitter or GitHub, display basic profile info from those sites, and have real-time communication between server and clients. In this first blog post, we are going to review the components of the boilerplate application so that developers can learn how to use them for other applications and deploy locally as well as on Cloud Foundry.

A Tour of the Boilerplate Components

Before we move to the setup, here are some code highlights explaining the various parts of the application.

Express MVC Framework

Working with the Express framework is very easy because its creator, @tjholowaychuk, has made it simple yet flexible to use, while providing good documentation and screencasts. You can use any web templating engine. For my project, I switched from Embedded JavaScript (EJS) to Jade, which is even terser than the HTML Abstraction Markup Language (Haml).

 
// Settings 

app.configure(function() { 
  app.set('view engine', 'jade'); 
  app.set('views', __dirname+'/views'); 
});

Route Middleware

This middleware allows us to chain functions to pre-process the request. In this example I am showing how to invoke loadUser to read the current user, getDistinctStreams to get all the different topics and getMetaData to get the list of verbs and object types.

app.get('/streams/:streamName', loadUser, getDistinctStreams, getMetaData, function(req, res) {
  asmsDB.getActivityStream(req.params.streamName, 20, function (err, docs) {
    var activities = [];
    if (!err && docs) {
      activities = docs;
    }
    req.streams[req.params.streamName].items = activities;
    res.render('index', {
      currentUser: req.user,
      providerFavicon: req.providerFavicon,
      streams: req.streams,
      desiredStream: req.session.desiredStream,
      objectTypes: req.objectTypes,
      verbs: req.verbs
    });
  });
});

Session Management

In the boilerplate application, sessions are stored in Redis, allowing us to scale to more than one instance. Not only does this application properly manage user sessions, it also supports single sign-on with GitHub, Facebook and Twitter via the everyauth module.

Packaging Assets

In real world applications today, users expect immediate feedback, and this cannot be done if your client is executing multiple HTTP requests to render the app. This boilerplate sample uses @mape‘s module connect-assetmanager, which minifies and bundles CSS and JavaScript assets so the responses are very fast. This module is very flexible and allows pre and post manipulation of assets. Here is an example which packages the js and css files:

var assetManager = require('connect-assetmanager'); 

var assetHandler = require('connect-assetmanager-handlers'); 

... 

var assetsSettings = { 
  'js': { 
    'route': /\/static\/js\/[a-z0-9]+\/.*\.js/ , 
    'path': './public/js/' , 
    'dataType': 'javascript' , 
    'files': [ 
      'http://code.jquery.com/jquery-latest.js' , 
      siteConf.uri+'/socket.io/socket.io.js' , // special case since the socket.io module serves its own js 
      'jquery.client.js' 
    ] , 
    'debug': true , 
    'postManipulate': { 
      '^': [ 
        assetHandler.uglifyJsOptimize , 
        function insertSocketIoPort(file, path, index, isLast, callback) {          
          callback(file.replace(/.#socketIoPort#./, siteConf.port)); 
        } 
      ] 
    } 
  } , 
  'css': { 
    'route': /\/static\/css\/[a-z0-9]+\/.*\.css/ , 
    'path': './public/css/' , 
    'dataType': 'css' , 
    'files': [ 'reset.css' , 'client.css' ] , 
    'debug': true , 
    'postManipulate': { 
      '^': [ 
        assetHandler.fixVendorPrefixes , 
        assetHandler.fixGradients , 
        assetHandler.replaceImageRefToBase64(__dirname+'/public') , 
        assetHandler.yuiCssOptimize 
      ] 
    } 
  } 
}; 

var assetsMiddleware = assetManager(assetsSettings);

Client-Server Real Time Communication

Socket.io is used to send messages back and forth between the user-agent and the server. This saves us from having to do entire page refreshes to show new content. In the example below you can see how we subscribe to different events and change the page content accordingly.

var socketIoClient = io.connect(null, { 
  'port': '#socketIoPort#' , 
  'rememberTransport': true , 
  'transports': ['xhr-polling'] 
}); 

socketIoClient.on('connect', function () {
  $('#connected').addClass('on').find('strong').text('Online');
});

var image = $.trim($('#image').val()); 

var service = $.trim($('#service').val()); 

var $ul = $('#main_stream'); 

socketIoClient.on('message', function(json) {
  var doc = JSON.parse(json);
  if (doc) {
    var $li = $(jade.templates["activity"]({activities: [doc]}));
    $ul.prepend($li);
  }
  if ($ul.children().length > 20) {
    $ul.children().last().remove();
  }
});

Here are the steps we performed to run this application as-is on Cloud Foundry:

Setup

  • Get a CloudFoundry.com account if you don’t have one yet.
  • Install vmc command line tool to deploy to Cloud Foundry.
  • Get Node.js running locally on your machine.
    • Install version 0.6.8 or later. npm, the package manager, is included.
  • Get Redis running locally.
  • Create Apps on Facebook, Twitter and GitHub for prod and local environments and note the keys.

Configure

Clone @mape‘s repo (or fork it and clone your fork) and install the dependencies:

$ git clone git://github.com/mape/node-express-boilerplate.git 
$ cd node-express-boilerplate

Edit package.json to include the cloudfoundry module:

{ 
  "name" : "node-express-boilerplate", 
  "description" : "A boilerplate used to quickly get projects going.", 
  "version" : "0.0.2", 
  "author" : "Mathias Pettersson ", 
  "engines" : ["node"], 
  "repository" : { 
    "type":"git", 
    "url":"http://github.com/mape/node-express-boilerplate" 
  }, 
  "dependencies" : { 
    "cloudfoundry": ">=0.1.0", 
    "connect" : ">=1.6.0", 
    "connect-assetmanager" : ">=0.0.21", 
    "connect-assetmanager-handlers" : ">=0.0.17", 
    "ejs" : ">=0.4.3", 
    "express" : ">=2.4.3", 
    "socket.io" : ">=0.7.8", 
    "connect-redis" : ">=1.0.7", 
    "connect-notifo" : ">=0.0.1", 
    "airbrake" : ">=0.2.0", 
    "everyauth" : ">=0.2.18" 
  } 
}

Install the dependencies locally.

 
$ npm install

Copy siteConfig.sample.js to siteConfig.js:

$ cp siteConfig.sample.js siteConfig.js

Edit siteConfig.js to use environment variables and the cloudfoundry module; the changes are detailed here. Update server.js to use the new siteConfig.js settings, as seen here. Then set environment variables for all services. Example in bash:

 
$ export twitter_consumer_key=2SXwj3HcMHsdsdsL4uuUBdjShw 
$ export twitter_consumer_secret=UFamzEOAEhLUwewewDwwEoCI72hN0fl8 
$ export facebook_app_id=5925695687264066 
$ export facebook_app_secret=cce6f5edefa89f4686e5e036e3ea 
$ export airbrake_api_key=63340934f6b376a001eacfc660d06205 
$ export github_client_id='92df9d93813ab234e1' 
$ export github_client_secret='fa64d10d3a02eee08d00cda3c2965caea2a4ce22' 

Run locally

$ node server.js

Run on CloudFoundry.com

Install vmc if you have not already done so:

$ sudo gem install vmc --pre

Specify you want Redis “redis-asms” bound to your app when you deploy:

$ vmc push --runtime=node06 --nostart 
Would you like to deploy from the current directory? [Yn]: 
Application Name: node-express-start 
Detected a Node.js Application, is this correct? [Yn]: 
Application Deployed URL [node-express-start.cloudfoundry.com]: 
Memory reservation (128M, 256M, 512M, 1G, 2G) [64M]: 128M 
How many instances? [1]: 
Bind existing services to 'node-express-start'? [yN]: 
Create services to bind to 'node-express-start'? [yN]: Y 
  1: mongodb 
  2: mysql 
  3: postgresql 
  4: rabbitmq 
  5: redis 
What kind of service?: 5 
Specify the name of the service [redis-9bea7]: redis-asms 
Create another? [yN]: N 
Would you like to save this configuration? [yN]: Y 
Manifest written to manifest.yml. 
Creating Application: OK 
Creating Service [redis-asms]: OK 
Binding Service [redis-asms]: OK 
Uploading Application: 
Checking for available resources: OK 
Processing resources: OK 
Packing application: OK 
Uploading (305K): OK 
Push Status: OK

Note that here I responded Yes to saving the configuration, which creates a file called manifest.yml. A manifest.yml helps you quickly push and update apps; you can read more about it here. Now you can run these commands to add the keys for the services:

 
$ export APP_NAME=your_app_name 
$ vmc env-add $APP_NAME airbrake_api_key=your_key 
$ vmc env-add $APP_NAME github_client_id=github_id 
$ vmc env-add $APP_NAME github_client_secret=github_secret 
$ vmc env-add $APP_NAME facebook_app_id=fb_id 
$ vmc env-add $APP_NAME facebook_app_secret=fb_secret 
$ vmc env-add $APP_NAME NODE_ENV=production 
$ vmc env-add $APP_NAME twitter_consumer_key=twitter_key 
$ vmc env-add $APP_NAME twitter_consumer_secret=twitter_secret

To finish, run:

 
$ vmc start

And that’s it! You are up and running with the boilerplate app, as seen here. Please note that the app may not work to spec on IE, but it works on Firefox, Safari and Chrome.

Observations so far

node-express-boilerplate is a great starting point on which to build an activity stream engine, given that it addresses:

  • Robust real-time messaging between browser and server (Socket.io adapts to the protocols supported by the server and client)
  • Performance via asset bundling and minification
  • SSO support for the major social networks
  • Scalable session management via Redis
  • A great MVC framework as its foundation

Also, it was good to see that @mape had abstracted the infrastructure details via the creation of a siteConfig.js. We enhanced this even further by using environment variables. As you saw in the walkthrough, all this was possible thanks to the open source community and a robust platform as a service like Cloud Foundry, which had everything I needed. I was able to use @igo‘s “cloudfoundry” module to assist in parsing environment details in my app, which made siteConfig.js even more straightforward.

In the next blog post, I will cover how to start the modification of this app into an Activity Stream engine. The tutorial will include how to create a Node.js module like activity-streams-mongoose and how to manage the persistence of the Activity Streams data in MongoDB, as well as real-time syndication across multiple app instances with Redis PubSub.

Sign up for Cloud Foundry today, and start building your own cool app!

-Monica Wilkinson, Cloud Foundry Team
