• Using curl with a web site secured by Rails Authenticity Token

    There’s a third-party hosted service that we use at work. It has a great browser-based UI which allows us to generate reports in Excel.

    This is all great, but we wanted to integrate the data this web app captures into some internal dashboards. The app has an API, but the API doesn’t give me a way to pull the information out as succinctly as the Excel reports I can pull through the browser-based UI. My goal was to figure out a way to programmatically log in to the web site, pull an Excel document using curl, and feed it into Pentaho’s excellent Kettle tool for ETL into a data warehouse that I can use to back our internal dashboards.

    I figured I could just send my login information to the server as POST parameters with curl. Unfortunately, the web site is not very curl-friendly. It uses a Rails mechanism for protecting forms called the authenticity token. The way this works is that every time you load the login page, Rails generates an authenticity_token which gets stored in your session and written out to the login form as a hidden field. Unless you submit the same authenticity_token that was written into the login form, your login will fail. In order to maintain session continuity, Rails uses good old-fashioned cookies, so the cookies need to be maintained between the page load and the login as well. This is a great security mechanism for preventing cross-site request forgery (another site submitting forms on your behalf), but not great when you’re trying to programmatically log in to a web site with curl.

    To get around this, I needed to do the following.

    1. Load the login page using curl, piping the resulting HTML through grep and cut to parse out the authenticity_token, and save the cookies set by the server as the login page loads.
    2. Post to the login form’s action with the cookies from Step 1, including all the login credentials and hidden fields (all of which can be hard-coded) and appending the URI-encoded authenticity_token (which must be dynamic since it is generated fresh on every page load).
    3. curl the URI that the web site would generate to load the Excel report that I want to pull down, redirecting the output to a file.
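
    A note on the URI encoding in Step 2: the token looks like Base64, which can contain +, / and =. In a form-encoded POST body a raw + is decoded as a space, so a token sent unencoded may no longer match the one in the session. The sketch below only illustrates what that encoding does, in plain POSIX shell; the urlencode function and the sample token are made up for this example and are not part of the web app or of curl.

```shell
# Hypothetical helper: percent-encodes every character of $1 except
# unreserved ones (letters, digits, '.', '_', '~', '-').
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Made-up token containing the three troublesome Base64 characters
urlencode 'abc+DEF/ghi123='   # prints abc%2BDEF%2Fghi123%3D
```

    In the script itself none of this has to be done by hand, since curl’s --data-urlencode option performs the encoding.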

    Below is the shell script I came up with to do this, which I thought I would share with the world in case anybody else runs into this issue. I’m not a great shell scripter and regular expressions confuse me (hence the crude cut method for pulling the auth token out), but this approach works.

    Note the character positions 287-330 passed to cut at the end of the first command. These positions correspond to the location of the authenticity_token value in the login page of the web app I’m authenticating against. Make sure to tune these values to your needs, or use another way to pull out the value. The tokens are always the same length and appear to be Base64-encoded strings.
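
    As one possible “other way”, here is a minimal sketch that pulls the token out with sed instead of fixed character positions. It assumes the hidden field sits on a single line with its name attribute before its value attribute, which is an assumption about the page’s markup and worth checking against the real login page. The extract_auth_token function and the sample HTML are made up for illustration.

```shell
# Hypothetical helper: reads HTML on stdin and prints the value of the
# first input whose name is authenticity_token.
# Assumes name="..." comes before value="..." within the same tag and line.
extract_auth_token() {
  sed -n 's/.*name="authenticity_token"[^>]*value="\([^"]*\)".*/\1/p' | head -n 1
}

# Made-up login form fragment, standing in for the output of the first curl
sample_html='<form action="/session" method="post"><input name="authenticity_token" type="hidden" value="abc+DEF/ghi123=" /></form>'

authToken=$(printf '%s\n' "$sample_html" | extract_auth_token)
printf '%s\n' "$authToken"   # prints abc+DEF/ghi123=
```

    In the real script, the output of the first curl would be piped into extract_auth_token in place of the grep and cut pair.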

    # Pulls the login page (saving its cookies to cookies.txt) and strips out the auth token
    authToken=$(curl -L -c cookies.txt --silent 'https://railsapp.com/path/to/login/page' | grep 'something unique to the line of HTML code with authenticity_token, such as the form field ID' | cut -c 287-330)
    # Posts all the pre-URI-encoded stuff and appends the URI-encoded auth token
    curl -L -c cookies.txt -b cookies.txt --silent --data 'pre-uriencoded key value pairs for everything in the form except for the auth token' --data-urlencode "authenticity_token=$authToken" 'https://railsapp.com/path/the/login/form/submits/to' > /dev/null
    # Pulls the report with the logged-in session cookies and writes it to a file
    # (the URL is whatever you need to hit after logging in, in my case the Excel report; report.xls is a placeholder name)
    curl -L -b cookies.txt --silent 'whatever URL you need to hit after logging in to get what you need' > report.xls
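
    One cheap sanity check on the result: if the login did not take, a site like this will often answer the last request with an HTML page (for example the login form again) rather than a spreadsheet. That behavior is an assumption and may differ for your app. The sketch below, with a made-up helper name, just looks for HTML markers at the start of the downloaded file.

```shell
# Hypothetical helper: succeeds (exit 0) if the first 512 bytes of the
# file named in $1 look like an HTML document rather than a spreadsheet.
is_html_response() {
  head -c 512 "$1" | grep -Eqi '<html|<!doctype'
}

# Example use after the final download (the file name is a placeholder):
#   if is_html_response report.xls; then
#     echo 'Got an HTML page instead of a spreadsheet; the login probably failed' >&2
#     exit 1
#   fi
```

    Running this before handing the file to Kettle keeps a failed login from quietly feeding a login page into the ETL job.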
