Python script to check HTTP status and redirect chains

Here’s a quick and dirty Python script that goes through a list of URLs and, for each one, checks the HTTP status code (200, 301, 404, etc.), the number of redirects, and the redirect chain (each URL along the way). It saves the output to a text file (you can modify it to save to CSV, but I used the text import feature in Excel to import the tab-delimited data).

Well, damn hosted WordPress doesn’t let me paste code correctly, so here’s the link to Gist.

import requests


def get_status_code(url):
    # Returns tab-separated fields: status, number of redirects, redirect chain, final URL
    try:
        r = requests.get(url)
        print("Processing " + url)
        if len(r.history) > 0:
            # Redirected: report the first status code and every hop in the chain
            chain = ""
            code = r.history[0].status_code
            final_url = r.url
            for resp in r.history:
                chain += resp.url + " | "
            return str(code) + '\t' + str(len(r.history)) + '\t' + chain + '\t' + final_url + '\t'
        else:
            return str(r.status_code) + '\t\t\t\t'
    except requests.ConnectionError:
        print("Error: failed to connect.")
        return '0\t\t\t\t'


input_file = 'urls.txt'
output_file = 'output.txt'

with open(output_file, 'w') as o_file:
    o_file.write('URL\tStatus\tNumber of redirects\tRedirect Chain\tFinal URL\t\n')
    # One URL per line in the input file
    with open(input_file, "r") as f:
        lines = f.read().splitlines()
    for line in lines:
        code = get_status_code(line)
        o_file.write(line + "\t" + str(code) + "\t\n")
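If you would rather write CSV directly, here is a minimal sketch of how that could look with Python's csv module. The helper names (check_url, write_csv), the timeout value and the example.com URLs are my own assumptions for illustration, not part of the original Gist.

```python
import csv
import io

HEADER = ["URL", "Status", "Number of redirects", "Redirect Chain", "Final URL"]


def check_url(url, timeout=10):
    """Fetch a URL and return one CSV row (hypothetical helper)."""
    import requests  # imported lazily so the CSV part runs without it

    try:
        r = requests.get(url, timeout=timeout)
    except requests.RequestException:
        # Same convention as the script above: status 0 means "could not connect"
        return [url, 0, "", "", ""]
    if r.history:
        chain = " | ".join(resp.url for resp in r.history)
        return [url, r.history[0].status_code, len(r.history), chain, r.url]
    return [url, r.status_code, "", "", ""]


def write_csv(rows, out):
    """Write the header and all rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Demo with a hand-made row, so no network access is needed
buf = io.StringIO()
write_csv(
    [["http://example.com/old", 301, 1, "http://example.com/old", "http://example.com/new"]],
    buf,
)
print(buf.getvalue())
```

To use it for real, open the output file with open('output.csv', 'w', newline='') and pass in rows built with check_url for each line of urls.txt.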

Update: and here’s how to post Gist properly:

Screen Shot 2015-10-01 at 10.59.23 AM

Setting up WordPress behind a reverse proxy

Task: put a WordPress blog/site behind a reverse proxy. So blog.mysite.com moves to supersite.com/blog.

Why? In our case, we do it for branding and also for SEO benefits: larger site supersite.com receives more traffic, and we want to fold our blog content into supersite.com/blog.

Let’s agree on a couple of terms before we start:
– original blog.mysite.com is A
– destination supersite.com/blog is B

I’m using the latest version of WordPress (4.2.2) and Apache (2.2) on Ubuntu.

Let’s get started!

  1. Server setup

    Before we even touch WordPress, we need to make sure that the server hosting B is ready to accept requests for any blog URL and forward them to A.

    Assuming we have Apache set up and working, let’s make sure mod_proxy is enabled. With root or sudo privileges, run:

    a2enmod proxy_http
    service apache2 restart
    

    Then, open the Apache virtual hosts config file, and add:

    ProxyPreserveHost On
    ProxyRequests Off
    <Location /blog>
        ProxyPass http://blog.mysite.com
        ProxyPassReverse http://blog.mysite.com
        Order allow,deny
        Allow from all
    </Location>
    

    Quick explanation for each line:

    ProxyPreserveHost On: Off by default. I’m turning it On because I host multiple sites on this server and use virtual hosts. If you don’t need it, simply exclude this line.

    ProxyRequests Off: this should only be On for a forward proxy, not for a reverse proxy

    <Location /blog>: applies proxy directives to requests matching /blog

    ProxyPass http://blog.mysite.com: creates a mapping from a path within the local web site to a given remote URL

    ProxyPassReverse http://blog.mysite.com: rewrites URLs in HTTP response headers (such as Location on redirects) so that they point to B instead of A

    Order allow,deny and Allow from all: allow access to all proxied content
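To make the two mappings more concrete, here is a toy model in Python of what ProxyPass and ProxyPassReverse do with the config above. This is only a sketch of the idea, not how Apache implements it, and the function names are made up for illustration.

```python
LOCAL_PREFIX = "/blog"              # the <Location> path on B
BACKEND = "http://blog.mysite.com"  # A, the original blog
PUBLIC = "http://supersite.com"     # B, the public-facing site


def proxy_pass(path):
    """Map a request path on B to the URL fetched from A (None if not proxied)."""
    if path == LOCAL_PREFIX or path.startswith(LOCAL_PREFIX + "/"):
        return BACKEND + path[len(LOCAL_PREFIX):]
    return None


def proxy_pass_reverse(location):
    """Rewrite a Location header coming back from A so that it points at B."""
    if location == BACKEND or location.startswith(BACKEND + "/"):
        return PUBLIC + LOCAL_PREFIX + location[len(BACKEND):]
    return location


print(proxy_pass("/blog/2015/07/some-post/"))
print(proxy_pass("/about"))
print(proxy_pass_reverse("http://blog.mysite.com/wp-login.php"))
```

The first call maps to a URL on blog.mysite.com, the second is not proxied at all, and the third shows a redirect from A being rewritten so the browser stays on supersite.com/blog.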

    After that, restart apache again:

    service apache2 restart

    Result: If you got this right, when you load supersite.com/blog you should see your WordPress blog home page.

    Note: if you see 500 errors on interior pages, check the .htaccess file for any rewrite rules, and adjust if necessary.

    If you run into issues, here are 2 great sources on how to set up a reverse proxy on Apache:

  2. Update WordPress settings

    Now we have to make sure that all URLs (category pages, single post pages) also display correctly on site B.

    For this, log into WordPress admin using the original login link: blog.mysite.com/wp-login.php.

    Go to Settings > General, and update the “Site address (URL)” field to B (supersite.com/blog).

    Screen Shot 2015-07-13 at 2.52.32 PM

    Result: All pages should be accessible and rendering correctly.
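One way to double-check this result is to look for links in the rendered HTML that still point at A. Below is a minimal sketch using only the Python standard library; the class and function names and the sample HTML are assumptions for illustration, and in practice you would feed it the HTML of pages fetched from supersite.com/blog.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects every href and src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def links_pointing_at(html, old_host="blog.mysite.com"):
    """Return the URLs in the page whose host is still the old blog host."""
    collector = LinkCollector()
    collector.feed(html)
    return [u for u in collector.urls if urlparse(u).hostname == old_host]


sample = (
    '<a href="http://supersite.com/blog/some-post/">post</a>'
    '<img src="http://blog.mysite.com/wp-content/uploads/pic.png">'
    '<a href="/blog/category/news/">news</a>'
)
print(links_pointing_at(sample))
```

Anything this prints is a URL that WordPress (or a theme or plugin) still generates with the old address.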

  3. Update redirects, canonical tags, robots.txt

    This is a pretty extensive area that I have not completed yet, so I’ll add to this section once I have this task completed. Stay tuned!

  4. Fonts

    If you use any font embedding services, make sure your license covers your new domain (supersite.com), because it’s different from your original blog.mysite.com.

  5. Administration

    You should be able to log in and administer all content via the original link blog.mysite.com/wp-login.php.

Being a detective (work these days)

These days work is fun. I have a handful of projects of varying complexity, and the deeper I dig, the more related projects I discover.

Example: streamlining and organizing existing apps and setting up continuous integration that works for Python and potentially node in the future.

Python is great and all that. But I’m struggling with all the dependencies and pieces to be managed (module dependency, db, aws storage, cron, source control and remote server management). Using js-only stack with heroku and deployments triggered via git in the past was way quicker.

Currently reading: http://www.fullstackpython.com/continuous-integration.html

jQuery CSV parser shows CSVDataError: Illegal state

Just a quick troubleshooting tip. If you are trying to use the jQuery CSV data parser on Mac, it might give you this error:

CSVDataError: Illegal state [Row:1]

The likely reason is that your CSV file was saved on a Mac. Apparently, some Mac applications save files with old-style line endings (a bare carriage return, \r, instead of \n or \r\n), and the parser “chokes” on them and spits out this error.

To get around this, clean the input by adding this line of code:

// Normalize new lines
result = result.replace(/\r\n|\r/g, "\n");

This worked for me, hope you find this helpful! (via this link)
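For what it’s worth, the same cleanup is easy to sketch in Python if you pre-process the file on the server side instead. The function name and sample data here are made up for illustration.

```python
import re


def normalize_newlines(text):
    """Turn Windows (\\r\\n) and old Mac (\\r) line endings into plain \\n."""
    return re.sub(r"\r\n|\r", "\n", text)


# A CSV string with bare carriage returns, as some Mac apps save it
sample = "name,score\rann,5\rbob,3\r"
print(normalize_newlines(sample).splitlines())
```

The \r\n alternative comes first in the pattern so that a Windows line ending becomes one newline rather than two.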

Build vs Buy: Appcelerator Cloud API Case Study

A lot of times a project at hand has some components that can either be built from scratch or bought as a ready-to-use solution from a 3rd-party vendor. For example, there are lots of ways to build a blog or a CMS, but most likely you will just use one of the many solutions already available on the market (such as WordPress).

One of the projects I have at work has a star rating component, and we had a vendor in mind – Poll Daddy (interestingly enough owned by Automattic – the creators of WordPress). It’s a super quick and easy JavaScript-based solution that allows users to give a rating from 1 to 5 stars. It costs about $900/year for unlimited ratings, and requires virtually no development effort (aside from copying and pasting the script code).

Then, someone suggested we use a “cheaper” option – a rating component built on top of Appcelerator cloud services. Usage of the API is apparently free up to a certain call volume (and who doesn’t like free?). I’m not opposed to using a better solution, so I decided to look deeper into this platform and what it offers.

Here’s essentially what it is: Appcelerator Cloud Services provides a back-end infrastructure mostly targeted towards mobile apps that use its Titanium development platform. The API provides a layer of methods and services that allows developers to build apps without worrying about server-side infrastructure. There are pre-built components that allow for faster development, and one of them is Ratings and Reviews.

However, it’s not a plug-and-play deal. In order to achieve the star ratings functionality that we need, there are multiple implementation steps and gotchas:

  • We would need to create a list of products we want to be rated as Custom Object type in the API
  • To submit reviews we’ll have to know which product they apply to, so we’d have to map Custom Objects to products, probably performing a “GET” call to fetch Custom Object data and map it to products
  • We’d need to create a mechanism to prevent duplicate submissions (session or cookie or IP-based). Appcelerator asks for a user_id value to be specified whenever a rating is submitted, so it means we would have to create and work with a User object as well
  • Submitting a rating requires the user to be logged in – another API call
  • It also turns out that PUT and DELETE API methods trigger an XHR error from 3rd-party domains. This is resolvable by adding headers (Access-Control-Allow-Origin per CORS specifications), but will require additional settings adjustment on the server-side
  • And finally, any custom development will need to be thoroughly QA’d – which adds effort and time

I’m sure the Appcelerator cloud API is a great solution for certain cases, but for a super-simple component in my scenario it is much quicker and easier to go with a pre-built solution that satisfies all of my requirements.

Funny enough, I had another “build vs buy” discussion at lunch with Mike today, and we thought that the 80/20 rule can be applied to this problem: if spending 20% of the effort yields 80% of the result, that’s what you should go for.

Curious to hear about other build vs buy examples, so leave your notes in the comments!

PyCon 2015 tutorials at home

Last week, thanks to the awesomeness of the internet, I learned that the PyCon 2015 conference is happening in April in Montreal. This got me super excited, even though I don’t quite get to use Python as much as I’d love to. The conference seems to be organized so well, in the beautiful city of Montreal, with amazing workshop options, hotel share options, AND on-site childcare!

So there I am, excited and trying to plan how I can swing it, looking up flights (bonus post for you on saving over 50% on flights) and emailing this girl about sharing a hotel room… Then bummer! Not only was the conference sold out, but so were most of the workshops! (I was only hoping to attend the tutorial days.) But to re-phrase that old saying: if you can’t go to a conference, let the conference come to you!

I made a list of the workshops I would take if I could go, and started looking online for authors and their past presentations. Luckily, all of them had prep materials and some even had videos!

Here is the list, focused on machine learning and data analysis, for all of you fellow curious Python lovers. Thank you so much to the speakers for sharing these amazing study materials.

  1. Machine learning with Python, basics

    Hands-on data analysis with Python by Sarah Guido
    Description
    Python is quickly becoming the go-to language for data analysis. However, it can be difficult to figure out which tools are good to use. In this workshop, we’ll work through in-depth examples of tools for data wrangling, machine learning, and data visualization. I’ll show you how to work through a data analysis workflow, and how to deal with different kinds of data.

  2. Hadoop with Python (video) by Donald Miner
    Description
    In this tutorial, students will learn how to use Python with Apache Hadoop to store, process, and analyze incredibly large data sets. Hadoop has become the standard in distributed data processing, but has mostly required Java in the past. Today, there are numerous open source projects that support Hadoop in Python and this tutorial will show students how to use them.

  3. Learning Pandas by Brandon Rhodes
    Description
    The typical Pandas user learns one dataframe method at a time, slowly scraping features together through trial and error until they can solve the task in front of them. In this tutorial you will re-learn how to think about dataframes from the ground up, and discover how to select intelligently from their abilities to solve your data processing problems through direct and deliberately-chosen steps.

  4. Bayesian statistics made simple (video) by Allen Downey
    Description
    An introduction to Bayesian statistics using Python. Bayesian statistics are usually presented mathematically, but many of the ideas are easier to understand computationally. People who know Python can get started quickly and use Bayesian analysis to solve real problems. This tutorial is based on material and case studies from Think Bayes (O’Reilly Media).

  5. Building a recommendation engine with Python (video) by Diego Maniloff, Christian Fricke, Zach Howard
    Description
    In this tutorial we’ll set ourselves the goal of building a minimal recommendation engine, and in the process learn about Python’s excellent Pydata and related projects and tools: NumPy, pandas, and the IPython Notebook.

This post begs a follow-up on takeaways from each class. To be continued…

Facebook Product Engineering Open House recap

The other day a couple of coworkers and I went to the Facebook Engineering meetup hosted at Facebook’s office.

The main draw for me was the topics of the scheduled talks – pretty technical, focused subjects – and, of course, seeing the office, which I hadn’t had a chance to visit yet.

My expectations for the talks were totally on point – all of the speakers were really good, very technical and knew what they were talking about. Some of the details were a bit above my head, personally, as I don’t have much in-depth knowledge of Hadoop and related technologies (ZooKeeper, Hive), but it was still interesting to hear about problems that arise and how the production engineering team solves them.

The intro talk by Dave Viner about the role of production engineering at Facebook in general was great too. Not many end users realize that it takes more than your regular front-end or back-end developers to run a huge, super heavily used service like Facebook behind the scenes. So their team touches everything from scalability to infrastructure to performance optimization and automation. Pretty cool stuff. I also really liked the demo on Open Graph: how the idea from 2011 was finally realized in early 2013, what it took, and what sorts of problems the team ran into when building and releasing this feature.

After the talks I asked my coworker Kedar what he thought, and he said, “I wish I was smarter”, which was funny but also true for me. Those guys are solving really hard, interesting engineering problems – something I miss in the management part of my job. Plus I’m always jealous of people who get to work with super smart geeks and can learn a lot just by rubbing shoulders with them daily and working on something together.

All in all – a really good event, and the guys and I were all glad we attended. Oh, and of course I have to mention the food and the office: the office was nice, but nothing out of the ordinary, actually. Drinks were plentiful and varied – from a few different kinds of beer and wine to Starbucks bottled drinks to coconut water to Naked juice… anyone could have their pick. They served a lot of tiny appetizers, with some cupcakes and desserts and other small snacks available as well. I also snatched a t-shirt for Sean, and wonder if he’ll be getting weird questions when he wears it, because the back of it says “Facebook Infrastructure”.

Thanks to Facebook for pulling a great, smart event together, and hopefully there will be more of them in the future.

PS: For those interested in good tech talks/meetups in NYC, some of my favorites are the Code as Craft talks at Etsy (there’s one coming up in August) and the Web performance meetup. I like that they tend to focus on interesting engineering problems vs technology/framework du jour.