I decided to start seeding the database for one of my projects with some US Census data. At first I thought I would scrape the data using scrapy, a robust web scraper for Python. Then I realized that the data I wanted to start with was actually in a PDF and also in an Excel file. I couldn’t open the Excel file using Open Office as it was flagged as corrupt. However, using the Microsoft Office free trial version of Excel I was able to open the file and convert it to a CSV text file. I played around with the file a bit in Python and at first I was just going to have it print out the data in a format that I could use to seed the Rails database through the seed.rb file. But then I found a nicer and more Rails way to do it at http://stackoverflow.com/questions/12028334/having-trouble-seeding-csv-file-into-rails-app through a rake task that parses the CSV data and seeds the database. This seemed to work really well.

Here is what one row of data looked like:


“Somewhere, CA”,90.1,90.1,90.1,90.1,90.1,89.8

And the rake file:


require 'csv'

namespace :csv do

  desc "Import CSV Data from COLI Census city data"
  task :taskname => :environment do
    csv_file_path = 'db/filename.csv'

    CSV.foreach(csv_file_path) do |row|
      if row.length > 4  #wanted to ignore some straggling data and blank spaces in the file
          :name => row[0],
          :composite => row[1],
          :grocery => row[2],  
          :housing => row[3]      
        puts "Row added!"

To run the rake task:

rake csv:taskname

Although generally the data looked good, I had a small issue with my cost of living index data, which I hoped to set as a decimal type, specifying the precision and scale to limit the number of digits. For example, I wanted a cost of living index of 98.1 to be saved in the database as 98.1 rather than some floaty approximation with fifteen significant digits.

class CreateCities < ActiveRecord::Migration
  def change
    create_table :cities do |t|
      t.string :name
      t.decimal :composite, :precision => 4, :scale => 1
      t.decimal :grocery, :precision => 4, :scale => 1
      t.decimal :housing, :precision => 4, :scale => 1

However, after I seeded the data I found that my precision and scale values were being ignored, and I was getting repeating decimals in some cases, like 94.40000000000001. I realized that I am currently using SQLite for this project and that it does not utilize the decimal type and won’t recognize the precision and scale values. http://www.sqlite.org/datatype3.html. SQLite interprets the decimal type as numeric affinity, and then converts the data into an integer or real number if possible, and otherwise into text. I am hoping this issue will be fixed if/when I move to a PostgreSQL database.

Note: After writing this post, I realized that the data on the US government website was taken from a private party and is not in the public domain. So I will be looking for new data sets and I will be rewriting this application to accommodate the new data.

Tags: , , , , , ,

In categories: databases, Rails, web development



Only the name field is required. Your email will not be published.