We’ve been using MongoDB in production at AJJ, Inc. for almost two years now.
We initially chose Mongo because we needed a way to cache large SOAP responses (about 500 KB each on average) locally. It worked well, and unlike many others, I had no complaints.
Everything was going great until last Friday, when the application lost database connectivity.
When I logged into our Mongo server, the problem was immediately apparent:
[FileAllocator] allocating new datafile /var/lib/mongo/regapp_production_db.5, filling with zeroes...
[FileAllocator] creating directory /var/lib/mongo/_tmp
[FileAllocator] FileAllocator: posix_fallocate failed: errno:28 No space left on device falling back
Disk Space Damage Control
Doh. Our disk was out of space. The first thing to note is that when a MongoDB server runs out of disk space, the process immediately goes into read-only mode: all write operations (deletes included) are blocked. To re-enable writes to the database, you first have to restart the mongod service.
After restarting the database, I manually removed some of our oldest cached responses so that our application server could resume processing requests.
That kept the application running and bought us some time to figure out a resize strategy.
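A minimal shell sketch of that recovery follows. It assumes the legacy `mongo` shell; the collection name `soap_cache`, the `created_at` field, and the 30-day cutoff in the usage comment are hypothetical placeholders, while `regapp_production_db` is the database named by the data file in the log. The `run` helper only prints each command unless `DRY_RUN=0` is exported, so you can review what would happen first. Keep in mind that removing documents frees space inside the existing data files for reuse; it does not shrink the files on disk.

```shell
#!/bin/sh
# Sketch of the disk-full recovery. By default (DRY_RUN=1) each command is
# only printed; export DRY_RUN=0 to actually run it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Restart mongod so it accepts writes again.
restart_mongod() {
  run sudo /etc/init.d/mongod restart
}

# Remove cached responses older than $1 days.
# HYPOTHETICAL: the "soap_cache" collection and "created_at" field are
# placeholders; substitute your own schema.
prune_cache() {
  days="$1"
  case "$days" in
    ''|*[!0-9]*) echo "days must be a non-negative integer" >&2; return 1 ;;
  esac
  run mongo regapp_production_db --eval \
    "db.soap_cache.remove({created_at: {\$lt: new Date(Date.now() - $days * 24 * 3600 * 1000)}})"
}

# Usage:
#   restart_mongod
#   prune_cache 30
```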
Ensuring data integrity
Any time you modify a partition, you risk the integrity of the data on it. This was a production server and there were incomplete registrations, which made me very nervous. From what I read, others suggest that you don't need to take any special precautions when creating EBS snapshots, but I did not want to risk it, so I cleanly shut down the mongod service and then the EC2 instance:
sudo /etc/init.d/mongod stop
sudo shutdown -h 0
Snapshotting the EBS volume
With our MongoDB server safely stopped, I created a snapshot of the EBS volume I needed to resize.
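The snapshot can also be taken with the AWS CLI instead of the console. This is a sketch assuming a configured CLI; the volume and snapshot IDs in the tests and usage are hypothetical. `aws ec2 wait snapshot-completed` polls until the snapshot finishes, which is worth waiting for before touching the original volume. As before, the `run` helper only prints commands unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Sketch: snapshot an EBS volume with the AWS CLI.
# Commands are printed unless DRY_RUN=0 is exported.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = volume ID. When executed, prints the new snapshot's ID.
snapshot_volume() {
  run aws ec2 create-snapshot \
    --volume-id "$1" \
    --description "pre-resize-$1" \
    --query SnapshotId --output text
}

# $1 = snapshot ID. Polls until the snapshot has completed; the waiter gives
# up after a fixed number of polls, so rerun it if a large snapshot is pending.
wait_for_snapshot() {
  run aws ec2 wait snapshot-completed --snapshot-ids "$1"
}
```

With `DRY_RUN=0` exported, `SNAP_ID=$(snapshot_volume vol-0123456789abcdef0)` captures the new snapshot ID, which you can then pass to `wait_for_snapshot "$SNAP_ID"`.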
Creating a new volume
The next step was to create a new, larger volume from that snapshot. The new volume must be in the same availability zone as the instance: if the instance runs in us-east-1a, create the volume in us-east-1a as well. When you create the new volume, select the ID of the snapshot you created in the previous step.
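With the AWS CLI, the availability zone can be read from the instance itself, which avoids creating the volume in the wrong zone. This is a sketch under the same assumptions: the instance ID, snapshot ID, and 75 GiB size used in the tests are hypothetical, and the size must be at least as large as the snapshot. Commands are only printed unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Sketch: create a larger volume from the snapshot, in the instance's AZ.
# Commands are printed unless DRY_RUN=0 is exported.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = instance ID. Prints the instance's availability zone.
instance_az() {
  run aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text
}

# $1 = snapshot ID, $2 = availability zone (must match the instance),
# $3 = size in GiB. When executed, prints the new volume's ID.
create_volume_from_snapshot() {
  run aws ec2 create-volume \
    --snapshot-id "$1" \
    --availability-zone "$2" \
    --size "$3" \
    --query VolumeId --output text
}

# $1 = volume ID. Polls until the volume is ready to attach.
wait_for_volume() {
  run aws ec2 wait volume-available --volume-ids "$1"
}
```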
Attach your new volume
After creating the larger volume, go back to the EC2 Instances tab and detach the old volume. Once the old volume is detached, attach the new volume to the instance as the root device (/dev/sda1). Then go to the Elastic IPs tab and reassociate the Elastic IP with the EC2 instance. Once everything is reattached, you can start the EC2 instance again.
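A CLI sketch of the swap, again with hypothetical IDs, using the `wait` subcommands so each step finishes before the next begins. One deliberate difference from the console steps above: the Elastic IP is associated after the instance is running, since associating with a running instance works in every setup. It uses `--allocation-id`, the form for VPC addresses; EC2-Classic addresses were associated with `--public-ip`. On VPC instances an Elastic IP normally stays associated across a stop and start, so that last command may be unnecessary. Commands are only printed unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Sketch: replace the root volume and bring the instance back.
# Commands are printed unless DRY_RUN=0 is exported.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = instance ID, $2 = old volume ID, $3 = new volume ID.
# Stops at the first failing step.
swap_root_volume() {
  run aws ec2 detach-volume --volume-id "$2" || return
  run aws ec2 wait volume-available --volume-ids "$2" || return
  run aws ec2 attach-volume --volume-id "$3" --instance-id "$1" --device /dev/sda1 || return
  run aws ec2 wait volume-in-use --volume-ids "$3"
}

# $1 = instance ID, $2 = Elastic IP allocation ID (VPC style).
start_and_reassociate() {
  run aws ec2 start-instances --instance-ids "$1" || return
  run aws ec2 wait instance-running --instance-ids "$1" || return
  run aws ec2 associate-address --instance-id "$1" --allocation-id "$2"
}
```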
Resizing the volume
When you restart and reconnect to the EC2 instance, running df -h shows that nothing has changed yet:
~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       7.9G  7.8G   .1G  99% /
The last step is to grow the filesystem so that it fills the new volume.
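A minimal sketch of that step, assuming the filesystem sits directly on the volume with no partition table (on a partitioned volume you would need to grow the partition first). For ext3/ext4, `resize2fs` can grow a mounted filesystem online; XFS uses `xfs_growfs`, which takes the mount point rather than the device. Which filesystem you have is an assumption to check, for example with `df -T`. Commands are only printed unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Sketch: grow a mounted filesystem to fill its (already enlarged) volume.
# ASSUMES the filesystem sits directly on the volume (no partition table).
# Commands are printed unless DRY_RUN=0 is exported.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = device, $2 = mount point, $3 = filesystem type.
grow_fs() {
  case "$3" in
    ext3|ext4) run sudo resize2fs "$1" ;;   # resize2fs works on the device
    xfs) run sudo xfs_growfs "$2" ;;        # xfs_growfs takes the mount point
    *) echo "unsupported filesystem type: $3" >&2; return 1 ;;
  esac
}

# Usage (GNU df; -T adds a filesystem type column):
#   grow_fs /dev/sda1 / "$(df -PT / | awk 'NR == 2 { print $2 }')"
```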
Now when you do a df -h you should see:
~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        74G  8.1G   65G  12% /
After this, you can start mongod again and bring your application server back up.
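Before bringing the application back, it may be worth confirming there is real headroom: MongoDB preallocates data files (that is what the FileAllocator line in the log was doing), so usage can jump in large steps. This sketch starts mongod only when the data directory's filesystem is below a chosen usage threshold; the 80% in the usage comment is an arbitrary example, and `/var/lib/mongo` is the data directory from the log. Commands are only printed unless `DRY_RUN=0` is exported.

```shell
#!/bin/sh
# Sketch: start mongod only if its data directory has headroom.
# Commands are printed unless DRY_RUN=0 is exported.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Reads `df -P` output on stdin and prints the Capacity (use%) of the
# first filesystem listed, without the percent sign.
usage_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# $1 = refuse to start at or above this use%, $2 = data directory.
start_if_headroom() {
  max="$1"
  dir="${2:-/var/lib/mongo}"
  used=$(df -P "$dir" | usage_pct)
  case "$used" in
    ''|*[!0-9]*) echo "could not read disk usage for $dir" >&2; return 1 ;;
  esac
  if [ "$used" -ge "$max" ]; then
    echo "$dir is ${used}% full; not starting mongod" >&2
    return 1
  fi
  run sudo /etc/init.d/mongod start
}

# Usage:
#   start_if_headroom 80
```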
It is worth noting that I had planned for this to take 10 minutes, and it took longer. From now on, I will take my estimate and multiply it by 4. That way, when things take longer than expected, you already have extra time allotted; when things take as long as planned, you look better for finishing in a quarter of the time you set aside.