Convert Twitter timestamps to local timezone in Python

So my Insights team had a challenge with tweet timestamps. Their reporting system gave them times in each user’s own timezone. That was inconsistent, since different users are in different timezones, and it made reporting difficult. They asked me whether we could convert all timestamps to the same timezone (preferably EST, since the team is here).

The timestamps are already available through an API call, so the only task is to convert them to a readable local-time format.

Here’s how the data comes back from the API call, in UTC:

"created_at": "Wed Jun 01 12:53:42 +0000 2011"
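The "+0000" in that string is the UTC offset. One option, sketched below, is to parse it with strptime's %z directive instead of matching it as literal text. This gives a timezone-aware datetime that records it is in UTC. The names TWITTER_TIME_FORMAT and parse_created_at are illustrative and not part of any Twitter library. This also assumes the default C locale, because %a and %b match English day and month names.

```python
from datetime import datetime, timedelta

# Twitter's created_at layout; %z parses the "+0000" UTC offset instead of hardcoding it
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT)


created = parse_created_at("Wed Jun 01 12:53:42 +0000 2011")
print(created.isoformat())  # 2011-06-01T12:53:42+00:00
print(created.utcoffset() == timedelta(0))  # True: the datetime knows it is in UTC
```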

Python code:


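 from datetime import datetime, timedelta

 # sample record, as returned by the API (in UTC)
 obj = {"created_at": "Wed Jun 01 12:53:42 +0000 2011"}

 # parse the string; the "+0000" UTC offset is matched as literal text
 clean_timestamp = datetime.strptime(obj['created_at'],
                                     '%a %b %d %H:%M:%S +0000 %Y')
 offset_hours = -5  # fixed offset in hours for EST (does not adjust for daylight saving time)

 # account for the offset from UTC using timedelta
 local_timestamp = clean_timestamp + timedelta(hours=offset_hours)

 # convert to am/pm format for easy reading
 final_timestamp = local_timestamp.strftime('%Y-%m-%d %I:%M:%S %p')
 print(final_timestamp)  # 2011-06-01 07:53:42 AM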
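A fixed -5 hour offset is only right for standard time. From March to November, US Eastern time is on EDT (UTC-4), so a June tweet comes out an hour early. Below is a minimal sketch of a daylight-saving-aware alternative. It assumes Python 3.9+ (for the standard-library zoneinfo module), that the system has IANA time zone data installed, and that "America/New_York" is the zone the team wants. The function name to_local is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# %z parses the "+0000" offset, so the parsed datetime is timezone-aware (UTC)
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_local(created_at: str, tz_name: str = "America/New_York") -> str:
    """Convert a Twitter created_at string to a readable time in tz_name."""
    utc_time = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    # astimezone looks up the zone's rules, so EST or EDT is chosen per date
    local_time = utc_time.astimezone(ZoneInfo(tz_name))
    # %Z appends the zone abbreviation (EST/EDT) so readers can tell which applied
    return local_time.strftime("%Y-%m-%d %I:%M:%S %p %Z")


print(to_local("Wed Jun 01 12:53:42 +0000 2011"))  # 2011-06-01 08:53:42 AM EDT
print(to_local("Thu Dec 01 12:53:42 +0000 2011"))  # 2011-12-01 07:53:42 AM EST
```

Letting the time zone database pick the offset means reports stay correct across the spring and fall clock changes. No one has to remember to switch between -5 and -4.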

MongoDB > Authentication in replica sets

Had another great question in the M102 forums a few days ago. I didn’t know the exact answer, but Dwight graciously clarified it for everyone, so I wanted to share it.

Question:

  • If I have admins and users with read/write or read-only access on the primary node, will that info transfer to secondary nodes?
  • Can we have different admins for different nodes of mongo?

My initial thinking went like this:

The user info is stored in the system.users collection (where all user credentials are stored), so in a replica set it would be replicated to all nodes and allow access to the data from any node. And since the info is copied to every node, I don’t think it’s possible to have different admins for different nodes.

Dwight’s explanation with example:

The auth information replicates, so you either have authorization for the set as a whole, or not. The auth is per database (with authorization on the ‘admin’ database implying you can access all).

So you could do something like this:

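 $ mongo --host abc8       # connect to one member of the set
 use mydb
 db.system.users.find()    // list the users defined on mydb
 ^C
 $ mongo --host abc9       # connect to another member (a secondary)
 use mydb
 rs.slaveOk()              // allow reads on this secondary
 db.system.users.find()    // should list the same users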

And you should see the same user information (assuming the secondary is caught up to the replication time of the user additions).

The above example is for a standalone replica set; if you are sharded you would connect to the cluster through mongos. Once again your credentials are for the whole cluster, per db.

Additional resources and notes:

More information on authentication: http://docs.mongodb.org/manual/tutorial/control-access-to-mongodb-with-authentication/

Note that for replica sets you’ll need to use the --keyFile option to specify a key file that the members of the replica set use to authenticate to each other: http://docs.mongodb.org/manual/administration/replica-sets/#replica-set-security

A curious case of MongoDB updates with upsert:true

One of the benefits of being a TA for the M102 course is that you learn even more by answering questions, because some questions are fascinating.

Yesterday we had a very interesting case that puzzled even the folks who can usually answer anything in the forum. There was some virtual head scratching (or at least I’m imagining it), and we decided to ask the 10gen team whether they could shed some light on this.

Here’s the scenario, where we are starting with an empty collection:

db.test.update({_id : "Jane"}, {"count" : 1}, true)
db.test.update({_id : "Jane2"}, {$set: {"count" : 1}}, true)

db.test.find()

{ "_id" : ObjectId("510aa62cbb530b24318c93d4"), "count" : 1 }
{ "_id" : "Jane2", "count" : 1 }

As we can see, the “upsert” option is set to true in both update statements.

The first statement produced a document with an _id generated by MongoDB (an ObjectId), even though we specified _id: "Jane".

The second statement, however, produced a document with the _id that we specified. So we get a different outcome when we use the $set operator.

And here’s the explanation (from MongoDB docs):

If the argument includes only field and value pairs, the new document contains the fields and values specified in the argument. If the argument includes only update operators, the new document contains the fields and values from argument with the operations from the argument applied.

Link: http://docs.mongodb.org/manual/applications/update/

Then Andrew also explained in the discussion forum:

The behavior of the update operator depends on what is the second positional document. If you put in set operator or push operator, then the db will insert a document that has both the fields that are set in the selector and is the result of applying your pushes and sets.
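To make the rule concrete, here is a toy Python model of the document an upsert inserts when nothing matches. It is not MongoDB’s implementation. It only reproduces the behavior observed above, and newer MongoDB versions may treat the query’s _id differently for replacement-style upserts. The functions upsert_new_document and generate_id are hypothetical. generate_id stands in for ObjectId generation, and only $set and $push are modeled.

```python
import itertools

_id_counter = itertools.count(1)


def generate_id():
    """Hypothetical stand-in for MongoDB's ObjectId generation."""
    return f"ObjectId({next(_id_counter)})"


def upsert_new_document(query, update):
    """Toy model of the document an upsert inserts when nothing matches query."""
    operator_keys = [key for key in update if key.startswith("$")]
    if operator_keys and len(operator_keys) != len(update):
        raise ValueError("an update cannot mix operators and plain fields")

    if not operator_keys:
        # Replacement-style update: the new document holds only the given fields,
        # so the _id from the query is not carried over (as observed above).
        doc = dict(update)
    else:
        # Operator-style update: start from the query's plain equality fields...
        doc = {key: value for key, value in query.items()
               if not key.startswith("$") and not isinstance(value, dict)}
        # ...then apply the update operators to that document.
        for op, fields in update.items():
            if op == "$set":
                doc.update(fields)
            elif op == "$push":
                for field, value in fields.items():
                    doc.setdefault(field, []).append(value)
            else:
                raise NotImplementedError(f"{op} is not modeled in this sketch")

    # Generate an _id only when neither path supplied one
    if "_id" not in doc:
        doc = {"_id": generate_id(), **doc}
    return doc


print(upsert_new_document({"_id": "Jane"}, {"count": 1}))            # {'_id': 'ObjectId(1)', 'count': 1}
print(upsert_new_document({"_id": "Jane2"}, {"$set": {"count": 1}}))  # {'_id': 'Jane2', 'count': 1}
```

The branch on whether the update document contains operators is the whole difference. Only the operator path starts from the selector’s fields, which is why "Jane2" keeps its _id and "Jane" does not.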

So this behavior is expected, but still makes a very interesting case. Live and learn, my friends!