Posts written by Kenny Gorman
Wes McKinney started working on Pandas in 2008. Since then, Pandas has become one of the most popular and useful software components for the data scientist, and for good reason: Python, Pandas, and IPython/Jupyter notebooks together make it simple and quick to analyze a wide range of datasets.
In this post, we perform some basic analysis on the City of Baltimore employee salary data from data.gov, but the same technique applies just as easily to a wide variety of datasets.
Pandas and Jupyter notebooks make this work quick. It may be surprising to see where the money goes!
MongoDB Inc. just released what is arguably the most important change to the MongoDB database in its short history.
MongoDB version 3.0
MongoDB 3.0 brings with it a wealth of new features, but most notably a new pluggable storage engine API. We wanted to help customers get familiar with the new storage engine and features quickly and easily.
Because of the new pluggable storage engine API, MongoDB 3.0 promises a massive leap forward in functionality, usability and features. Developers, DevOps engineers and DBAs should start getting acquainted with MongoDB 3.0. In particular:
- WiredTiger storage engine
- Concurrency testing
- Journaling, durability and crash recovery
- General compatibility
- SCRAM-SHA-1 authentication compatibility
Full Release Notes
From a community standpoint, the more people using 3.0 and filing bug reports, the better. We wanted a quick and easy way for folks to experiment, so we needed tooling. A few attributes of the tooling we thought were really important are:
- Easy to use
- Configurable by end user(s)
- Uses Rackspace cloud (or any other IP, including localhost)
- Easily repeatable provisioning, so users can break it, tweak it, and rebuild it easily
We created an Ansible playbook that installs and configures a simple MongoDB 3.0 deployment. It takes just a few minutes to set up and is completely customizable.
- Supported platforms: CentOS/RHEL (for now)
Installation takes four simple steps:
- Step 1. Set up Ansible and Git.
- Step 2. Clone the repo.
- Step 3. Add roles and change some config files.
- Step 4. Provision some MongoDB.
Complete and up-to-date installation and configuration instructions.
In a nutshell:
Step 1: Install Ansible and Git
For this, you need to have git and Ansible installed. Installation is pretty easy. For most systems you simply need to:
# CentOS/RHEL
# Ansible
sudo yum install ansible
# Git
sudo yum install git
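Before moving on, it can help to confirm that both tools actually ended up on the PATH. The snippet below is a minimal sketch of such a check, not something that ships with the playbook repo; `require_cmd` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the playbook repo): check that each
# named command can be found on the PATH, and list any that cannot.
require_cmd() {
    missing=0
    for cmd in "$@"; do
        # `command -v` is the POSIX way to ask whether a command resolves
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    # Returns 0 only when every command was found
    return "$missing"
}
```

Once the function is defined, running `require_cmd ansible git` returns success only when both commands are found, and prints the name of anything missing to stderr.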
Step 2: Clone the repo
Simply clone the repo to the box where you installed Ansible:
git clone https://github.com/rackerlabs/ansible-mongodb.git
Step 3: Add roles, and change some config files
We need to tell Ansible which host(s) MongoDB should be installed on, make sure the configuration is correct for those host(s), and set any startup parameters we want.
# edit hosts file, and change <MYIP> to the ip address of the host to provision
vi hosts.txt
# install the required roles
./mongodb_roles.sh
# alter the default config (or at least inspect it for being correct)
vi roles/ansible-roles_mongodb-install/defaults/main.yml
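If you rebuild often, the hosts file edit can be scripted instead of done by hand in vi. The following is a rough sketch under one assumption: that the placeholder appears in the file literally as `<MYIP>`, as the comment above describes. `set_inventory_ip` is a hypothetical helper name, not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of the playbook repo): replace every
# literal <MYIP> placeholder in an inventory file with a real address.
set_inventory_ip() {
    inventory_file="$1"
    host_ip="$2"
    tmp_file="$(mktemp)" || return 1
    # Write the substituted copy to a temp file first, so the original
    # is only overwritten if sed succeeds. This also avoids `sed -i`,
    # whose syntax differs between GNU and BSD sed.
    if sed "s/<MYIP>/${host_ip}/g" "$inventory_file" > "$tmp_file"; then
        cat "$tmp_file" > "$inventory_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}
```

For example, `set_inventory_ip hosts.txt 203.0.113.10` would point the inventory at 203.0.113.10 (an address from the documentation range; substitute your own). It is still worth opening the file afterwards to confirm the result looks right.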
Step 4: Provision some MongoDB
Simply launch the helper shell script:
cd ansible-mongodb
./setup-mongodb.sh
For a fully managed solution with replica sets and sharding, hit up firstname.lastname@example.org and the support folks will install and configure a MongoDB 3.0 instance in the ObjectRocket fully managed environment.