10gen MongoDB online courses for DBAs: Replica sets

I’ve already written about the excellent MongoDB online courses that launched a few weeks ago. I signed up for both the developer and DBA tracks, and even though I expected them to be useful, now that I’m in week 4 I can tell you the value is immense: detailed video lectures, quizzes and homework, and the courses are absolutely free!

Week 4 in the DBA track is about replica sets. Even though I had done some projects with Mongo before, none of them were at a scale that required replica sets, so this week’s content was especially interesting to me. There’s one quick tip I learned that I want to share.

When initiating and configuring a replica set, make sure you’re connected to only one member (the one that will become the primary) and run all the configuration commands there. I thought I had to connect to each instance and initiate the set on each one, but that’s not the case, and I ended up with two primaries instead of one.

So the steps are as follows (the example assumes all three instances run on localhost):

  1. Create a data directory for each instance:
    mkdir -p /db/data/1
    mkdir -p /db/data/2
    mkdir -p /db/data/3
  2. Start three mongod instances, each in its own terminal (they will be the members of your set):
    mongod --port 27001 --dbpath /db/data/1 --replSet rs0
    mongod --port 27002 --dbpath /db/data/2 --replSet rs0
    mongod --port 27003 --dbpath /db/data/3 --replSet rs0
  3. Connect to the first instance only:
    mongo localhost:27001
  4. Initiate your replica set:
    rs.initiate()
  5. Optional: check your configuration with rs.conf()
  6. Add two more members to your replica set:
    rs.add("localhost:27002")
    rs.add("localhost:27003")
  7. Done! You can check your set by running rs.status()
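The steps above can also be compressed and double-checked with a couple of small helpers. This is a minimal sketch, assuming the mongo shell (which runs JavaScript) and the three localhost hosts from the example; the names buildReplSetConfig and checkReplSetStatus are hypothetical helpers, not commands built into the shell. The first builds the configuration document that rs.initiate() also accepts, so steps 4 and 6 can become a single call made from the one member you connected to. The second inspects the document returned by rs.status() and flags the mistake described earlier: a member missing from the set, or anything other than exactly one PRIMARY.

```javascript
// Sketch only: buildReplSetConfig and checkReplSetStatus are hypothetical
// helpers, not part of the mongo shell. Paste them into the shell that is
// connected to localhost:27001 (step 3) before using them.

// Build a configuration document for rs.initiate(), so the whole set is
// initiated from one member in a single call instead of rs.initiate()
// followed by rs.add() for each remaining member.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match the --replSet name passed to every mongod
    members: hosts.map(function (host, i) {
      return { _id: i, host: host }; // member _id values just need to be unique
    })
  };
}

// Inspect the document returned by rs.status(): every expected host should
// appear in status.members, and exactly one member should be PRIMARY.
function checkReplSetStatus(status, expectedHosts) {
  var problems = [];
  var names = status.members.map(function (m) { return m.name; });
  expectedHosts.forEach(function (host) {
    if (names.indexOf(host) === -1) {
      problems.push("missing member " + host);
    }
  });
  var primaries = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  if (primaries.length !== 1) {
    problems.push("expected 1 PRIMARY, found " + primaries.length);
  }
  return problems; // an empty array means the set looks as expected
}

// Example usage inside the mongo shell (rs is only defined there):
//   var hosts = ["localhost:27001", "localhost:27002", "localhost:27003"];
//   rs.initiate(buildReplSetConfig("rs0", hosts));
//   checkReplSetStatus(rs.status(), hosts);
```

Right after initiation the election can take a few seconds, so a "found 0" result may only mean you should wait and run the check again. If each instance was initiated separately, rs.status() on the first one lists only that instance, so the check reports the other two as missing.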

Man, it’s so cool that such a crucial concept as database replication is implemented so gracefully in Mongo, and that you can learn it and do it yourself!

Meetup recap: scaling AppNexus

Just got back home from a meetup hosted at AppNexus, by AppNexus.

The topic of today’s talk was scaling the complex ad serving and bidding system that supports billions of ad impressions and stores terabytes of data. I have to admit, it was one of the most technical meetup presentations that I’ve attended, and it was excellent!

Mike, the CTO of AppNexus, is a fast talker. He started with the story of how the company was born and covered a lot of ground: from building the company up and adding functionality (the ad bidding system was added when the ad-selling business demanded it), to hard drive performance, to the build vs. buy vs. outsource question.

I especially liked the overview of data warehousing and number-crunching tools (Netezza, Vertica and Hadoop) and how changing requirements dictated the choice of tools. Netezza worked great as a single instance but did not scale with clustering; Hadoop was very hard to learn and configure from scratch, but it meets most of their needs now and will suffice for the next two years.

There was a good question from the audience about ideas for startups: what are the technological pain points and gaps that need to be filled? Interestingly enough, Mike named monitoring as one of the areas that lacks a great tool. Someone next to me mentioned New Relic, but I wondered whether that’s enough to monitor thousands of servers.

So for me, this talk was full of information on areas I had little knowledge of, and it’s always great to see who develops breakthrough solutions in technology and how they solve problems (big data problems are very, very interesting).

Another thing that got me wondering, since I’m into MongoDB lately, was when to use Mongo vs. Hadoop. And sure enough, I found a really good deck from 10gen with not only the answer but also great practical demos. Yay for the internet and the sharp minds in the NY tech community! I feel proud to live in this huge tech hub, and humbled because there are just so many things yet to learn.