Helping as a TA for the MongoDB M102 Spring class

I have a quick, exciting announcement! I wrote about taking 10gen’s M101 and M102 online courses last fall, and about the excellent quality of their videos and content.

Well, after the course ended, I was happy to learn that I was one of the top graduates for M102 (the MongoDB for DBAs track). I did pretty well in M101 too, but mis-submitted one of the answers by checking an option that was very similar to the correct one, but incorrect.

10gen asked top graduates if they would like to offer assistance in the next round of classes, and of course I said that I’d love to help, since I’ve been really impressed by the company, their product and the education portal they put together. A few conversations later, they invited me to participate in the M102 Spring course as a TA (teaching assistant).

I’m helping out by answering questions on the discussion board and clarifying details on homework and quizzes. It’s really cool that some of the other discussion participants are so knowledgeable and willing to help fellow students out.

If you’d like to learn more – the 10gen Education blog is a great resource where Andrew Erlichson, 10gen’s VP of Education, writes about how they created the courses, the cost and equipment involved, and even some interesting stats on the Fall semester (about 19% of enrolled students graduated with scores of 65% or higher, in both M101 and M102). Check it out.

And of course, definitely sign up for the online classes – such invaluable in-depth content, and anyone who successfully completes them will get cool certificates, just like I did :P

GOOD challenge finalists announced

A couple of months ago I wrote about a great initiative from Coding for GOOD, where applicants from any background, even without prior programming experience, could take online classes and submit their projects for a chance to get a cool job in LA!

The other day Robbyn from GOOD Worldwide reached out to me on Twitter letting me know that the finalists have been chosen! So exciting! And what’s really cool is that one of the finalists is from NYC – Ada Ng from Brooklyn. Definitely read about the finalists and their projects on the GOOD website.

And check back tomorrow, because the last round (a weekend hackathon) has been completed, and the winner is being chosen today and will be announced tomorrow! Who do you think it will be? In any case – I think it’s a really big achievement for all finalists, big congrats to each of them and kudos to GOOD for organizing this. Can’t wait to find out who the winner is and follow their brand new career gig at GOOD!

Find and delete files older than X days in Unix

I wrote about setting up a cron job on my work machine to process some files in a shared Dropbox folder.

After a while, old files will start to accumulate there, and I’d want to delete them by adding another command to the crontab.

The criteria for deleting files in my case are:
– File is older than 5 days
– File has an extension .csv (keep other files like readme)
– Do not delete a few example .csv files

So the Unix command will be a find that first locates the matching files and then deletes them via -exec rm:

find [path and criteria] -exec rm {} \;

Tip from howtogeek.com

It’s pretty easy to find all .csv files older than 5 days:

find /path/to/files/*.csv -mtime +5

And then, to exclude certain CSV files that I want to keep, I used -not and -and operators to specify filenames (test.csv, test_out.csv and input_example.csv):

find /path/to/files/*.csv \
  -not -iname 'test*' -and -not -iname 'input*' -mtime +5

Tip from linux.ie
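
Putting the pieces together, the full cleanup command that goes into the crontab looks roughly like this (the path and name patterns are placeholders for my actual folder and files):

find /path/to/files/*.csv \
  -not -iname 'test*' -and -not -iname 'input*' -mtime +5 -exec rm {} \;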

Cron job on Mac OSX that saves files to Dropbox

My work machine is on all the time, and I figured it might work well for simple tasks like text file processing:
– Take input file
– Do something with it (add information, transform text, add columns, etc)
– Save file (either overwrite old one, or create a new output file)

I used Dropbox’s shared folder functionality to let multiple people submit their files for processing. Your Mac will see those shared folders as local, and if you set up a cron job, you are pretty much done.

Here are some steps that can help:

  • Write your script
  • Create a folder on Dropbox. Share it with people who will need access
  • Edit your crontab (see the quick sketch after this list). Here’s a great detailed blog post on how to do this
  • Confirm it’s working
  • Sit back and relax, as your work is done :)
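
Here’s a minimal sketch of the crontab part (assuming the standard crontab tool that ships with OS X):

# open your user crontab in an editor and add your entry
crontab -e

# list the installed entries to confirm the new line is there
crontab -l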

Note: remember to use absolute paths for your scripts, and if you’re passing file/folder paths as parameters, make sure those paths are also absolute.

Here’s my example:

0,30 8-19 * * 1-5 python /Users/[name]/Dropbox/script.py /Users/[name]/[folder]/*.csv

This runs the script every half hour, between the hours of 8am and 7pm, Monday-Friday and processes all .csv files in the designated Dropbox folder.
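
For reference, the five fields in a crontab entry are minute, hour, day of month, month, and day of week, so the schedule above reads as:

# minute        0,30   (on the hour and at half past)
# hour          8-19   (8am through 7pm)
# day of month  *      (every day)
# month         *      (every month)
# day of week   1-5    (Monday through Friday)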

And here’s a simplified Python script – maybe some will find it helpful.


import csv
import sys
import os.path

for filename in sys.argv[1:]:

    # skip my example file and anything with _out in the name,
    # since this script is what creates the _out files
    if filename.find('example.csv') == -1 and filename.find('_out') == -1:

        # also skip files the script has already processed
        # (an _out.csv file is already saved in the folder)
        out_filename = filename.replace('.csv', '_out.csv')
        if not os.path.isfile(out_filename):

            with open(filename, 'rU') as infile:
                with open(out_filename, 'ab') as outfile:
                    input = csv.reader(infile, delimiter=',', quotechar='"')
                    output = csv.writer(outfile, delimiter=',', quotechar='"',
                                        quoting=csv.QUOTE_MINIMAL)

                    rowcount = 0

                    for row in input:
                        if rowcount == 0:
                            # copy the header row through untouched
                            output.writerow(row)
                            rowcount += 1
                        else:
                            # do your file processing here
                            output.writerow(row)
                            rowcount += 1

Sometimes the solution is easier than you think

I’m lazy. But in a good way. The way that makes you simplify, automate, and get rid of unnecessary work.

Here’s a quick example. Our insights team had a task at hand that could not be done manually. Writing a script to do it took half an hour. They were happy. Once in a while they would send me a file to be processed, I’d run a simple command, and the script would spit out a file with the results, which I’d send back.

But that’s too much work. How do I remove myself from the picture and let them handle it all? I figured: let’s use Dropbox. Users drop their files into a shared folder, the script checks every so often whether anything new has been placed there, then processes the files and saves the results.

Somehow, in the beginning, I got too entangled in details, thinking about a server where I’d put the script (probably Heroku), then having to add Dropbox API integration, then making sure all the dependencies were installed… Then it occurred to me: my work machine already sees the Dropbox directories as local folders. Why not just run the cron job on my work machine and be done with it?

So with a little script tweaking, instruction writing and cron job testing, this is done, and I’ve just removed a task from my list (however simple it might be).