Monitoring as a Service#
Rackspace Private Cloud Powered By OpenStack
Last updated: Oct 10, 2018
The monitoring service for Rackspace Private Cloud OpenStack (RPCO) Solutions, which is included as part of the Core Support agreement, helps ensure that hosts and OpenStack services operate within optimal parameters. The monitoring agent continually runs a comprehensive set of custom plugins across all hosts. The monitoring pollers continually test OpenStack endpoint connectivity and return HTTP response metrics, which helps keep your cloud healthy.
The monitoring service#
Rackspace delivers Fanatical Support® for the world’s leading clouds. It has specialized expertise, available 24x7x365, and results-obsessed customer service that’s been around since 1999.
Work with the Rackspace support team to customize your monitoring configuration in the following ways:
- Set the frequency and timeout of your monitoring plugins (for example, every minute or every five minutes).
- Tune thresholds for definable alarm templates (for example, disk space or memory capacity).
- Determine which members of your organization should receive an auto-generated MyRack notification for each alert.
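As an illustration only, the tunable settings listed above could be modeled as follows. The class and field names (`CheckSettings`, `period_s`, `timeout_s`, and so on) are hypothetical and do not reflect the actual Rackspace Monitoring configuration format. The values and the recipient address are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CheckSettings:
    """Hypothetical model of one tunable monitoring check."""

    name: str
    period_s: int              # how often the plugin runs, in seconds
    timeout_s: int             # how long to wait for the plugin to finish
    warning_threshold: float   # for example, percent of disk space used
    critical_threshold: float
    recipients: list = field(default_factory=list)  # who is notified

    def validate(self) -> None:
        # A check that can outlast its own period would overlap itself.
        if self.timeout_s >= self.period_s:
            raise ValueError("timeout must be shorter than the period")
        if self.warning_threshold > self.critical_threshold:
            raise ValueError("warning threshold must not exceed critical")


# Placeholder values: run every five minutes, give up after 30 seconds.
disk_check = CheckSettings(
    name="disk_space",
    period_s=300,
    timeout_s=30,
    warning_threshold=80.0,
    critical_threshold=90.0,
    recipients=["ops@example.com"],
)
disk_check.validate()
```

The validation rules are assumptions chosen for the sketch. Your support team can confirm which combinations of frequency, timeout, and threshold are supported.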
The following table shows the alert severity levels and expected response times:
|Severity level|First live response|
|---|---|
|Emergency: instances are failing or the OpenStack cloud is partially or wholly inoperable|15 minutes|
|Urgent: new instances cannot be launched or terminated|1 hour|
|Standard: new instance launches are delayed or errors occur when interacting with the OpenStack API|4 hours|
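The response times in the table above can be expressed as a small lookup, for example to work out when a first live response is due for a given alert. This is a minimal sketch: the function name `first_response_due` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Expected first live response per severity level, from the table above.
FIRST_RESPONSE = {
    "emergency": timedelta(minutes=15),
    "urgent": timedelta(hours=1),
    "standard": timedelta(hours=4),
}


def first_response_due(severity: str, raised_at: datetime) -> datetime:
    """Return the time by which a first live response is expected."""
    return raised_at + FIRST_RESPONSE[severity.lower()]


# Sample alert raised at noon UTC (placeholder timestamp).
raised_at = datetime(2018, 10, 10, 12, 0, tzinfo=timezone.utc)
due = first_response_due("Urgent", raised_at)
```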
Rackspace data centers versus customer data centers#
The Rackspace monitoring service differs for Rackspace and customer data centers in the following ways:
For both Rackspace data centers and customer data centers, the following elements apply:
- The poller and agent connection endpoints are looked up by using DNS service (SRV) records, which provide a pool of addresses for the Rackspace Monitoring regions.
- Agents and pollers connect securely over port 443 to endpoint addresses.
- 24/7 access to all three endpoint regions is required for functional monitoring.
- Agents are deployed to all physical hosts and Kubernetes clusters.
- Deployment playbooks require access to the following resources:
  - Rackspace Monitoring repositories for agent and poller packages
  - Python Packaging Authority (pypa.io)
  - System-level package repositories (apt or yum)
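To illustrate how a client might choose among the pooled endpoint addresses that an SRV lookup returns, the following sketch orders a set of SRV records. The record values and host names are placeholders, not real Rackspace Monitoring endpoints, and the DNS query itself is left out because the Python standard library has no SRV resolver. The ordering is a simplification: RFC 2782 calls for weighted random selection among records of equal priority, while this sketch uses a fixed order to stay deterministic.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower values are tried first
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def order_targets(records: List[SrvRecord]) -> List[SrvRecord]:
    """Order records by ascending priority, then by descending weight.

    Simplified stand-in for the weighted random selection in RFC 2782.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))


# Placeholder records; real answers come from a DNS SRV query.
records = [
    SrvRecord(10, 20, 443, "endpoint-b.example.com"),
    SrvRecord(0, 10, 443, "endpoint-a.example.com"),
    SrvRecord(10, 80, 443, "endpoint-c.example.com"),
]
ordered = order_targets(records)
```

A firewall rule set that allows outbound port 443 only to a fixed list of addresses can break this lookup, because the pool behind the SRV records can change.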
For customer data centers only, the following elements apply:
- Private network monitoring (PNM) pollers are deployed to the physical control plane nodes if endpoints are RFC1918 addresses.
- Optionally, agent and poller connections can be forwarded through a web proxy.
- Hardware monitoring of server chassis is the customer's responsibility. This includes processor, memory, and physical disk monitoring.
- Standard Rackspace chassis offerings are supported only in Rackspace data centers (for Dell or HP devices) or on OpenStack Anywhere (for roll-in rack deployments).
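The RFC 1918 condition for private network monitoring pollers can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the function names are hypothetical, and the three networks are listed explicitly because `ip_address(...).is_private` also matches ranges outside RFC 1918, such as loopback and link-local addresses.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def needs_private_poller(endpoint_addresses) -> bool:
    """Return True if any endpoint is reachable only on a private network."""
    return any(is_rfc1918(address) for address in endpoint_addresses)
```

For example, `needs_private_poller(["192.168.10.5", "203.0.113.7"])` evaluates to `True` because the first address is in `192.168.0.0/16`.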
The agents and pollers collect metrics for individual hosts and for the cloud as a whole. The following sections explore these in more detail.
Hardware monitoring includes the following elements:
- For Rackspace-managed infrastructure in any data center location, the monitoring service monitors the status of processors, memory, physical disks, RAID volumes, the RAID controller, and the RAID controller battery.
- All devices can have the following elements monitored: Ping/SSH and bonding interface status.
- The control plane hosts have monitoring and alarming for the following elements: disk space, disk utilization, CPU idle time, memory capacity, and conntrack count. Additional metrics are gathered for network interface throughput, but these do not result in a notification or ticket.
- The non-control plane hosts have monitoring and alarming for disk space and conntrack count. Additional metrics are gathered for disk utilization, CPU idle time, memory capacity, and network interface throughput, but these do not result in a notification or ticket.
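The difference between alarming metrics and metrics that are only collected can be summarized per host role. The sketch below encodes the two lists from the bullets above. The dictionary layout, the role keys, and the function name `raises_alert` are hypothetical and chosen only for illustration.

```python
# Metrics that result in a notification or ticket, per host role.
ALARMING = {
    "control_plane": {
        "disk space", "disk utilization", "cpu idle time",
        "memory capacity", "conntrack count",
    },
    "non_control_plane": {"disk space", "conntrack count"},
}

# Metrics that are gathered but do not result in a notification or ticket.
METRICS_ONLY = {
    "control_plane": {"network interface throughput"},
    "non_control_plane": {
        "disk utilization", "cpu idle time",
        "memory capacity", "network interface throughput",
    },
}


def raises_alert(role: str, metric: str) -> bool:
    """Return True if the metric can raise an alert for this host role."""
    metric = metric.lower()
    if metric in ALARMING[role]:
        return True
    if metric in METRICS_ONLY[role]:
        return False
    raise KeyError(f"unknown metric for {role}: {metric}")
```

For example, `raises_alert("control_plane", "CPU idle time")` is `True`, while the same metric on a non-control plane host is collected without alarming.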
RPCO monitoring includes the following elements:
|OpenStack service|Elements monitored|
|---|---|
|ceph|ceph overall cluster health, mons health (quorum), osd status, radosgw status|
|cinder|cinder local api, cinder volume status, cinder scheduler status|
|designate|designate local api, designate mdns, designate process|
|glance|glance local api, glance registry|
|heat|heat local api, heat api cloudformation, heat api cloudwatch|
|ironic|ironic local api, ironic compute status, ironic conductor status|
|keystone|keystone local api|
|kubernetes|mk8s local api, mk8s auth, mk8s etg, mk8s etp, mk8s ui, process checks|
|neutron|neutron local api, neutron dhcp agent, neutron l3 agent, neutron linuxbridge agent, neutron ovs agent, neutron metering agent, neutron metadata agent, neutron agent conntrack count, neutron qrouter conntrack count|
|nova|nova local api, nova metadata api, nova cert, nova compute status, nova conductor status, nova console (spice/novnc), nova consoleauth, nova scheduler status|
|octavia|octavia local api, octavia lb error, octavia quota check, process checks|
|swift|swift account process, swift account server, swift async, swift container process, swift container replication, swift container server, swift md5, swift object process, swift object replication, swift object server, swift proxy server, swift quarantine, swift time sync|
|hummingbird|hummingbird account process, hummingbird account server, hummingbird container process, hummingbird container server, hummingbird object process, hummingbird object server, hummingbird proxy server|
|memcached|memcached local api, memcached connections|
|galera|cluster size, wsrep state, connections, file limits, innodb row lock time, innodb deadlocks, access errors, aborted connections, holland backup|
|rabbitmq|disk free, memory, max channels per connection, file limits, processes, sockets, unconsumed messages, queue growth rate, messages without consumers|
OpenStack HTTP monitoring includes the following elements:
|HTTP function|Elements monitored|
|---|---|
|HTTP API validation and uptime of all applicable supported service endpoints|cinder, designate, glance, heat api, heat cfn, heat cw, horizon, ironic, keystone, managed kubernetes, mk8s ui, neutron, nova, octavia|
|HTTP access/health check|hummingbird, swift|
|HTTPS certificate expiry|if applied to endpoints|
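A certificate expiry check of the kind listed in the last row might look like the following sketch, which uses only the Python standard library. It is not the Rackspace poller implementation. The function names, the 30-day warning window, and any host name you pass in are assumptions made for illustration.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (Unix time) and a certificate's notAfter value."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    The default context verifies the certificate, so a certificate that
    has already expired raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expires_soon(host: str, warn_days: float = 30.0) -> bool:
    """Return True if the certificate expires within `warn_days` days."""
    remaining = days_until_expiry(fetch_not_after(host), time.time())
    return remaining < warn_days
```

Keeping `days_until_expiry` separate from the network call means the date arithmetic can be tested without contacting an endpoint.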