
Developer Blog


News and Insights from Rackspace Developers

A Case for MongoDB with AEM

Jonathan Hurley | November 9, 2015
  • architecture
  • database

What is MongoDB?

MongoDB is, among other things, a document-oriented NoSQL database. This means that it deviates from the traditional, relational model to present a flexible, horizontally scaling model for data management and organization.

How does MongoDB work with AEM?

MongoDB integrates with Adobe Experience Manager (AEM) by means of the crx3mongo runmode and two JVM options: -Doak.mongo.uri and -Doak.mongo.db.
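For reference, a MongoMK author startup looks something like the following. The jar name, heap size, and Mongo hosts are placeholders; the runmode and the two oak system properties are the actual integration points.

```shell
# Start an AEM author against a MongoDB replica set (illustrative values).
java -server -Xmx4g \
  -Doak.mongo.uri="mongodb://mongo1.example.com:27017,mongo2.example.com:27017" \
  -Doak.mongo.db=aem-author \
  -jar aem-quickstart.jar -r author,crx3,crx3mongo -nointeractive
```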

Why would I use MongoDB?

Primarily, MongoDB provides an alternative HA configuration to the older CRX cluster configuration. In reality, the architecture is closer to a shared catalog on NFS or NetApp than to true clustering: the authors and publishers using MongoDB are not necessarily aware of each other.

Let's get it out of the way: CRX clustering doesn't have a spotless track record. One of the inherent challenges of a local-only object catalog is that, at scale, a single shared catalog offers several benefits that per-node copies cannot:

  • reduced duplication of data
  • concentrated scope for performance analysis and tuning
  • HA availability without requiring multi-node interaction
  • horizontal scaling for redundancy and performance

Even with a perfectly working CRX cluster, none of these benefits is realized. Enter the so-so old days of the shared data tier.

With 5.6.1 and earlier, large and growing deployments could mount their object repository over shared NFS, where the protocol, or the NFS server itself, provided the locking needed for multi-writer behavior with few collisions. This is still a viable option, but, unless a NetApp filer is employed, the controller holding the data tier is often a single point of failure that holds back an n-nines architecture.

The introduction of the MongoDB microkernel (MongoMK) into AEM 6.x provides essentially the same architecture, backed by the more buzzword-heavy database that we love.

The primary use case for MongoMK is AEM author instances, where the practical active-user ceiling (~25-30) means that horizontal scaling must first address concurrency. Here the shared data tier shines brightest: performance plays second fiddle to scale and consistency, while the need for node-to-node communication is eliminated.

It is certainly still possible to join publishers in similar fashion; however, the benefits are reduced because publishers have no concurrency problem (just add more publishers for more scale) and a non-local catalog carries a performance penalty.

So that's it then! Scale to infinity and beyond!

Slow down, Buzz Lightyear, there's a problem. Single-site MongoDB is expected to scale very effectively, given fast local storage and in-data-center networking; here, the sky could be the limit. Consider, though, a multi-site deployment where MongoDB replica sets must communicate across disparate data centers and unknown miles of fiber. Enter the operations log (oplog).

The oplog is the intermediary by which replicas remain synchronized; it is essentially a series of delta operations to be applied in order, much like the transaction or binary logs of other database cluster configurations.

During optimized operation, the oplog is applied as close to real time with the primary as possible, although MongoDB does permit some drift. During multi-site operation, however, wire latency and operation latency can stack to introduce greater drift, and the oplog's design presents the following challenges to this configuration:

  • The oplog is functionally a capped, fixed-size buffer of operations.
  • Like a queue, it is first-in, first-out: the oldest entries are overwritten first.
  • That behavior dictates that, when the next write overwrites a record a secondary has not yet applied, data loss on that secondary occurs.
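The failure mode above can be sketched with a toy circular buffer standing in for the oplog. The capacity and rates here are illustrative, not MongoDB internals; the point is only that a fast primary plus a slow secondary eventually overwrites an unapplied entry.

```python
from collections import deque

CAPACITY = 5     # illustrative oplog size (entries retained)
WRITE_RATE = 3   # primary operations written per tick
APPLY_RATE = 1   # operations a lagging secondary can replay per tick

oplog = deque(maxlen=CAPACITY)  # capped buffer: oldest entries fall off
next_seq = 0                    # next sequence number the primary will write
applied = 0                     # next sequence number the secondary needs
resync_needed = False

for tick in range(10):
    for _ in range(WRITE_RATE):          # primary keeps writing
        oplog.append(next_seq)
        next_seq += 1
    for _ in range(APPLY_RATE):          # secondary replays what it can
        if not oplog:
            break
        if oplog[0] > applied:           # the op we need was overwritten
            resync_needed = True
            break
        applied += 1                     # replay one op in order
    if resync_needed:
        break

print(f"secondary applied {applied} ops; full resync needed: {resync_needed}")
```

After only a few ticks the oldest retained entry passes the secondary's position, and the only recovery is the full rebuild described next.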

MongoDB's developers elected to catch and address this situation by having the lagging replica halt, rebuild, and sync forward from the (new) oplog position after its rebuild.

On the surface, there's nothing exotic or dangerous about this operation. Consider first the rebuild operation, which wipes all local data and then pulls a fresh copy from the known-good primary. Even if the export takes zero seconds to complete, copying the data volume across a latent wire will inherently introduce drift before the remote replica has even finished importing. Assume also that wire latency is the only challenge, so that, after rebuild, the oplog contains only the drift introduced by the rebuild itself. The original issue that triggered the replication lag remains, so it can be expected that, over time, the newly built secondary will be forced to rebuild again.

In some cases, this behavior enters an unending loop of replication lag and rebuild, resulting in remote secondaries which are never in a functional state.
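This loop is usually visible from the mongo shell before AEM ever notices. The shell's built-in replica set helpers report the oplog window the primary can tolerate and each secondary's lag (the host name below is a placeholder):

```shell
# Oplog size and the time window it covers, from the primary:
mongo --host mongo1.example.com --eval 'rs.printReplicationInfo()'

# Per-secondary lag behind the primary:
mongo --host mongo1.example.com --eval 'rs.printSlaveReplicationInfo()'
```

If a secondary's lag repeatedly approaches the oplog window, it is a candidate for exactly the rebuild cycle described above.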

For AEM, this means that using MongoDB replicas as the primary data source for cross-site authoring places remote authors at significant risk.
