Python script to check HTTP status and redirect chains

Here’s a quick and dirty Python script that goes through a list of URLs and, for each one, checks the HTTP status code (200, 301, 404, etc.), the number of redirects, and the redirect chain (each redirect destination). It saves the output to a text file (you can modify it to save to CSV, but I used the text import feature in Excel to import tab-delimited data).

Well, damn hosted WordPress doesn’t let me paste code correctly, so here’s the link to Gist.
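The Gist itself isn't reproduced here, so the following is only a minimal sketch of the idea described above, not the original script. It uses the standard library, sends HEAD requests (an assumption; some servers answer HEAD differently from GET), and follows `Location` headers by hand so each hop can be recorded. The names `fetch_head`, `trace`, `report_line` and `write_report` are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_head(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=fetch_head, max_hops=10):
    """Follow redirects by hand; return (last_status, list_of_redirect_destinations)."""
    chain = []
    status = None
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
        chain.append(url)
    return status, chain


def report_line(url, fetch=fetch_head):
    """Build one tab-delimited line: url, status, redirect count, then each hop."""
    status, chain = trace(url, fetch)
    return "\t".join([url, str(status), str(len(chain))] + chain)


def write_report(urls, out_path, fetch=fetch_head):
    """Write one report line per URL to a text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(report_line(url, fetch) + "\n")
```

Calling `write_report(["http://example.com/"], "results.txt")` would produce a file that Excel can import as tab-delimited text. The sketch does not handle network errors or timeouts; a real run over a long URL list would want a try/except around each fetch.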

Update: and here’s how to post Gist properly:

[Screenshot: posting a Gist in WordPress]

Setting up WordPress behind a reverse proxy

Task: put a WordPress blog/site behind a reverse proxy. So blog.mysite.com moves to supersite.com/blog.

Why? In our case, we do it for branding and for SEO benefits: the larger site, supersite.com, receives more traffic, and we want to fold our blog content into supersite.com/blog.

Let’s agree on a couple of terms before we start:
– original blog.mysite.com is A
– destination supersite.com/blog is B

I’m using the latest version of WordPress (4.2.2) and Apache (2.2) on Ubuntu.

Let’s get started!

  1. Server setup

    Before we even touch WordPress, we need to make sure that the server hosting B is ready to accept requests for /blog and forward them to A.

    Assuming we have Apache set up and working, let’s make sure mod_proxy is enabled. With root or sudo privileges, run:

    a2enmod proxy_http
    service apache2 restart
    

    Then, open the Apache virtual hosts config file, and add:

    ProxyPreserveHost On
    ProxyRequests Off
    <Location /blog>
        ProxyPass http://blog.mysite.com
        ProxyPassReverse http://blog.mysite.com
        Order allow,deny
        Allow from all
    </Location>
    

    Quick explanation for each line:

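    ProxyPreserveHost On: Off by default. When On, Apache passes the Host header from the incoming request to the proxied host instead of the hostname given in ProxyPass. I’m turning it On because I host multiple sites on this server and use virtual hosts. If you don’t need it, simply leave this line out.

    ProxyRequests Off: disables forward proxying. It should only be On for a forward proxy, not for a reverse proxy.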

    <Location /blog>: applies proxy directives to requests matching /blog

    ProxyPass http://blog.mysite.com: creates a mapping from a path within the local web site to a given remote URL

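    ProxyPassReverse http://blog.mysite.com: rewrites URLs in HTTP response headers (such as Location on redirects) so they point back to the proxy rather than to the backend

    Order allow,deny and Allow from all: allow access to all proxied content. This is Apache 2.2 syntax; on Apache 2.4 the equivalent is Require all granted.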

    After that, restart Apache again:

    service apache2 restart

    Result: If you got this right, when you load supersite.com/blog you should see your WordPress blog home page.

    Note: if you see 500 errors on interior pages, check the .htaccess file for any rewrite rules, and adjust if necessary.

    If you run into issues, here are 2 great sources on how to set up a reverse proxy on Apache:

  2. Update WordPress settings

    Now we need to make sure that all URLs (category pages, single post pages) also display correctly on site B.

    For this, log into WordPress admin using the original login link: blog.mysite.com/wp-login.php.

    Go to Settings > General, and update the “Site address (URL)” field to B (supersite.com/blog).


    Result: All pages should be accessible and rendering correctly.

  3. Update redirects, canonical tags, robots.txt

    This is a pretty extensive area that I have not completed yet, so I’ll add to this section once I have. Stay tuned!

  4. Fonts

    If you use any font embedding services, make sure your license covers your new domain (supersite.com), because it’s different from your original blog.mysite.com.

  5. Administration

    You should be able to log in and administer all content via the original link blog.mysite.com/wp-login.php.
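As a rough illustration of what the ProxyPass and ProxyPassReverse directives from the server setup step do, here is a simplified model in Python. It is only a sketch of the mapping idea, not how Apache implements it. The function names are hypothetical, and `PUBLIC_BASE` assumes B is served over plain http at supersite.com.

```python
LOCAL_PREFIX = "/blog"                 # the <Location> path on B
BACKEND = "http://blog.mysite.com"     # the ProxyPass target (A)
PUBLIC_BASE = "http://supersite.com"   # assumed public scheme and host of B


def forward_url(request_path):
    """ProxyPass idea: map a local path under /blog to a URL on the backend."""
    inside = request_path == LOCAL_PREFIX or request_path.startswith(LOCAL_PREFIX + "/")
    if not inside:
        return None  # not covered by the <Location /blog> block
    return BACKEND + request_path[len(LOCAL_PREFIX):]


def rewrite_location(location):
    """ProxyPassReverse idea: map a backend URL in a Location header back to B."""
    if location == BACKEND or location.startswith(BACKEND + "/"):
        return PUBLIC_BASE + LOCAL_PREFIX + location[len(BACKEND):]
    return location  # redirects to other hosts are left alone


print(forward_url("/blog/2015/post"))
print(rewrite_location("http://blog.mysite.com/wp-login.php"))
```

In this model, a request for /blog/2015/post on B is fetched from http://blog.mysite.com/2015/post, and a redirect that A issues to its own hostname is rewritten so the visitor stays on supersite.com/blog.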

Being a detective (work these days)

These days work is fun. I have a handful of projects of varying complexity, and the deeper I dig, the more related projects I discover.

Example: streamlining and organizing existing apps and setting up continuous integration that works for Python and potentially Node in the future.

Python is great and all that. But I’m struggling with all the dependencies and pieces to be managed (module dependencies, DB, AWS storage, cron, source control and remote server management). Using a JS-only stack with Heroku and deployments triggered via git was way quicker in the past.

Currently reading: http://www.fullstackpython.com/continuous-integration.html

Data science is the new cool thing to learn

I wrote about learning practical data science with Python, and nowadays everyone wants to learn and teach data science: from the many online educational sites to General Assembly and an organization called Metis, which provides extensive courses and assists with the job search in that field.

I think Metis’ data science bootcamp sounds really cool, and a great career path for someone who has an aptitude for statistics/analysis/programming and is starting out (or switching to tech).

Too many tools spoil the web dev world

[Image: Tool]

Well, not this kind of tool :)

Marco elaborates on why tools and frameworks for web development have become so complicated and convoluted (and usually unnecessary), in response to PPK’s post.

Web development has never been more complicated or convoluted than it is today due to the sheer quantity of tools (and their rapid rate of change) involved in most modern web-dev environments.

At the Generate conference, Brad Frost also half-jokingly noted that a web developer applying for a job these days must go through a ridiculous list of “frameworks” that he/she must know.

“Our job descriptions contain so many acronyms… How do we keep from drowning in a sea of devices, tools, technologies, Medium posts, tweets, and opinions? And how do we maintain our sanity in the process?”

My short take on this: know why tools exist, what problem they solve, and at what cost. As with anything in life, you can’t have it all. Frameworks might help (or often appear to help) with speed and cost, but you might lose on the quality of the end product, since it is not going to be perfectly tailored to your needs.

PPK proposes a solution:

The solution is simple: ditch the tools. All of them. (No, I’m not being particularly subtle here.) Teach the newbies proper web development. That’s it, really.

The web’s answer to the native challenge should be radical simplification, not even more tools.

Would you agree?

jQuery CSV parser shows CSVDataError: Illegal state

Just a quick troubleshooting tip. If you are trying to use the jQuery CSV data parser on a Mac, it might give you this error:

CSVDataError: Illegal state [Row:1]

The likely cause is a CSV file that was saved on a Mac. Some Mac applications save files with carriage-return (\r) line endings instead of \n, and the parser “chokes” on them and spits out this error.

To get around this, clean the input by adding this line of code:

// Normalize new lines
result = result.replace(/\r\n|\r/g, "\n");

This worked for me, hope you find this helpful! (via this link)
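To see what this normalization does to mixed line endings, here is the same regex sketched in Python. This is only an illustration for cases where the file can be cleaned before it reaches the browser; the function name `normalize_newlines` is hypothetical.

```python
import csv
import io
import re


def normalize_newlines(text):
    # Windows (CRLF) and old Mac (lone CR) line endings both become LF.
    return re.sub(r"\r\n|\r", "\n", text)


raw = "name,count\rapples,3\r\npears,5\n"   # one CR, one CRLF, one LF
clean = normalize_newlines(raw)
rows = list(csv.reader(io.StringIO(clean)))
print(rows)
```

Matching \r\n before the lone \r matters: otherwise a Windows line ending would turn into two newlines and produce empty rows.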