Posts categorized “cloud monitoring”
We have a large distributed system here at Rackspace, which we’re scaling and which currently processes 10 million metrics per minute. Part of the scaling effort required us to switch to using Apache Kafka for our message queues. We'll be looking at a few possible clients to explore our options and benchmarking their throughput.
Losing a large amount of food in a freezer is costly, and cleaning it up is a grueling task. I know from personal experience. A few months ago we lost a considerable amount of food when a GFCI outlet tripped and cut power to the electrical circuit in our detached garage.
Drowning in data? You probably just started monitoring your application. Modern applications overwhelm users with an avalanche of metrics representing their functions and behaviors. Sysadmins are flooded with alerts about their CPU or Network I/O crossing preset thresholds. Developers might be watching a stream of application-level metrics like the gossip protocol activity on a Cassandra ring. Overwhelmed, they’ll either turn off their monitoring tool, spend weeks tuning it, or give up and accept the unstructured noise.
Rackspace Cloud Monitoring currently samples about 40,000 metrics per second across our cloud-hosted instance flavors. This data accumulates minute by minute and more checks are enabled every day. There's a lot of data here—an automated layer of intelligence over our cloud environments could really help.
Time series data can yield some of the most interesting and relevant information for developers, operators and businesses. But ever larger datasets coming from multiple sources are making it difficult for people to pull real, actionable intelligence from these time series streams.
We've been working on a tool called Blueflood that makes managing massive-scale time series metrics much easier, and we are pleased to be open sourcing it for comment, collaboration and improvement. Please check out http://blueflood.io for documentation, https://github.com/rackerlabs/blueflood for the source code and #blueflood on Freenode IRC for discussion.
Our Autoscale project -- codenamed "otter" -- is now open source.
Autoscale takes the work out of capacity planning, allowing Rackspace Cloud Monitoring alerts or scheduled events to create and delete servers. Through the use of webhooks, Autoscale can be integrated into countless deployment scenarios.
Why did we do this, you ask? Well, when meeting with folks in the OpenStack community at PyCon earlier this year, we were deeply encouraged to share our code. Then, a month later, we were blown away by the OpenStack Summit in Portland. The Heat design sessions were an incredible example of the power and speed with which open source communities can operate. We decided to open source Autoscale so we can better communicate with the OpenStack Heat project, provide a real-world example to inform future plans and help align all of our visions for how OpenStack might implement autoscaling in the future.
Cloud Monitoring now supports PagerDuty integration! With this new notification type, alarm notifications can automatically create new incidents and resolve them once Cloud Monitoring detects things are okay.
MySQL replication offers an opportunity to distribute your database to multiple nodes for additional performance or to off-load specific functionality from your production stack. When uptime is a requirement, table-locking can create unavailability and a long-running nightly backup can cause an unexpected outage. One benefit of replication is that it allows the backup to happen on the slave without interrupting your production environment. So what happens when you rely on this configuration to safeguard your data without constantly ensuring that the replication process is working? I'll tell you what happens – you're the guy without a recent backup.
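As a rough illustration of what "ensuring that the replication process is working" can mean, here is a minimal Python sketch. It assumes you have already fetched one row of `SHOW SLAVE STATUS` output as a dictionary (for example with a dict-style cursor from whichever MySQL driver you use). The function name `replication_problems` and the `max_lag_seconds` threshold are hypothetical and chosen only for this example. The field names `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master` are the ones MySQL reports.

```python
from typing import List, Mapping, Optional


def replication_problems(
    status: Optional[Mapping[str, object]],
    max_lag_seconds: int = 300,  # hypothetical threshold, tune for your setup
) -> List[str]:
    """Return human-readable problems; an empty list means replication looks healthy."""
    # An empty result usually means the host is not configured as a replica.
    if not status:
        return ["no replication status returned"]

    problems = []

    # Both replication threads must report "Yes" for changes to flow.
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")

    # Seconds_Behind_Master is NULL (None) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        problems.append("replication lag is unknown")
    elif int(lag) > max_lag_seconds:
        problems.append(f"replica is {int(lag)}s behind the master")

    return problems
```

A check like this could run before the nightly backup starts, so that a backup taken from a stalled replica is flagged rather than silently trusted.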
We live our lives on the web. Social media and networking sites have established themselves as a dominant force, becoming the most visited sites on the web. Among the most popular of them, Facebook has become the colossus, amassing an estimated 930 million-plus users.
One aspect that sets Facebook apart is the applications – games, quizzes, you name it. While they may have a brief shelf-life, they can achieve massive popularity in a very short period of time. At Rackspace, we can help you plan for this unpredictable demand by hosting your Facebook app on our open cloud platform.
The team over at Mailgun just posted a case study about how the Rackspace Cloud Monitoring team successfully migrated their email alerts to the Mailgun email automation platform. It's a really interesting read that is as much about how to plan for and deploy third-party tools in a production application as it is about using Mailgun to automate and monitor email alerts.
With the launch of Rackspace Cloud Monitoring (RCM) earlier this week, Rackspace has added an additional tool to your belt that shows you how your servers and applications are behaving. Cloud Monitoring makes it easy to configure monitors and alerts from the Control Panel, but today I want to focus on raxmon, one of the most flexible CLI tools available today for RCM.