DataChomp

Chomping At The Bits

Always Get an Absentee Ballot

Since living in Oklahoma, I’ve found one of the absolute best ways to vote is by registering for an absentee ballot. Oklahoma has pretty lax laws on this sort of thing, so it’s easier than you think. In addition, having an absentee ballot doesn’t mean I mail in my vote. I still go to the poll, hand in my absentee ballot so they can destroy it, and vote like a normal downtrodden citizen.

What do I even vote for:

Oklahoma is a closed primary state, which means that as a registered Independent, I’m only allowed to vote on a subset of items. Having a ballot sent to me when I’m allowed to vote is a great time saver versus meandering through the UI of a government website. It’s also really handy when you’re talking to a politician: you can let them know you’re not allowed to vote and let them move on to the proper class of citizenry.

Voting Matters:

When it comes to Federal politics, people will often lie to you and tell you that your vote matters. What they should really be telling you is that your vote only matters when it comes to local laws that actually affect you. For example, in Oklahoma, a 1% tax increase can pass by a margin as small as eight votes. While debating whether flag burning is treasonous is quite fun, taking a few moments to control how your income is distributed in your community is likely more patriotic.

Lobbyotomy:

Being able to see my entire ballot before voting day lets me take 45 minutes or so to independently research the actual proposals on the ballot. Before I used this method, while I knew about some of the popular issues thanks to high-dollar campaigns trying to bludgeon my vote, there would always be a few items on the ballot I had no idea about. I found this incredibly disheartening. The beauty of the absentee ballot is that I’m no longer surprised, second-guessing myself, or leaving items blank because I can’t quite parse the ridiculous lawyer speak.

Enums for When You Just Can’t Handle Another Foreign Key

If we get over the notion that all data is created equal, it opens up options for us. For example, your DBA wants some sort of data consistency and integrity… both very good things, but you’re just making a quick little one-off table that you don’t want to sprawl into 4 or 5 tables with foreign keys.

enum my heart

For these situations, I like to just use postgres enums. Check it out:

create type verdict as enum ('gross', 'can eat', 'amazing');

create table burritos (id serial not null primary key, title varchar(50), thoughts verdict not null);
insert into burritos (title, thoughts) values ('road kill burrito', 'yummy');
--Complete Failure: 'yummy' is not a valid verdict

insert into burritos (title, thoughts) values ('road kill burrito', 'gross');
--Complete Success

insert into burritos (title, thoughts) values ('grande queso burrito', 'amazing');
--Complete Awesome

Sweet! We’re getting the data we want in, it looks legit, and there is no querying of multiple tables to get the values back. Another benefit is that AppDev isn’t continually questioning whether they set up the FK correctly.

blowing in the winds of change

But how maintainable is it? Say we had a less than ideal experience… one that requires a lawyer. We need to add it to our list:

alter type verdict add value 'lawsuit' before 'gross';
insert into burritos (title, thoughts) values ('putrid plate', 'lawsuit');

That was pretty easy, though I must say I’m a little confused about exactly which values I have in my enum. Good thing we’re in a database and can just query it!!!:

select t.typname as enum_name,
       e.enumlabel as enum_value,
       n.nspname as enum_schema
from pg_type t
   inner join pg_enum e on t.oid = e.enumtypid
   inner join pg_catalog.pg_namespace n on n.oid = t.typnamespace
where t.typname = 'verdict' and n.nspname = 'public';
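As a quick aside, if all you want is the list of values and don’t care about schemas, the enum_range function will hand them back as an array:

select enum_range(null::verdict);
--{lawsuit,gross,"can eat",amazing}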

re-fear-factor

If you are ok with the above code… then good on you. However, we might want to populate a dropdown or something else in our application based on these values. Neither your typical Rails dev nor ActiveRecord itself is going to try to implement the above without tears of sadness streaming down to their Mac. We can build a bridge by just making a simple view and assigning a self.table_name = to the following:

create view my_lovely_enums as 
select t.typname as enum_name,  
       e.enumlabel as enum_value,
       n.nspname as enum_schema 
from pg_type t 
    inner join pg_enum e on t.oid = e.enumtypid  
    inner join pg_catalog.pg_namespace n ON n.oid = t.typnamespace
where t.typname = 'verdict' and n.nspname = 'public';

Hardcoding ‘verdict’ like that creates a pretty brittle where clause. I tend to remove that part and go a little more flexible with a predicate/scope to select your enum of choice:

create or replace view my_lovely_enums as 
select t.typname as enum_name,  
       e.enumlabel as enum_value,
       n.nspname as enum_schema
from pg_type t 
    inner join pg_enum e on t.oid = e.enumtypid  
    inner join pg_catalog.pg_namespace n ON n.oid = t.typnamespace;
select * from my_lovely_enums where enum_name = 'verdict' and enum_schema = 'public';
"verdict";"gross";"public"
"verdict";"can eat";"public"
"verdict";"lawsuit";"public"
"verdict";"burrito dreams";"public"

For some more information on this topic, check out a lovely write up at postgresguide.com. The bottom of that article has this handy little buyer-beware:
“At the time of this writing, Postgres does not provide a way to remove values from enums.” While this approach can’t be used for all situations, using it where you can creates a tremendous amount of simplicity in your app without compromising integrity – affectionately known as a win-win.

I Am a Bad Blogger

I’ve never really thought about how bad of a blogger I am. I mean, I always knew deep in my heart I wasn’t very good, but a few conversations over the past few months have really solidified just how bad I am. My confession begins.

Some Person in Somalia has Your Content

Over the years, I’ve seen a lot of emotional breakdowns and anger when someone sees their content reposted on bottom feeder sites. Some people have even pointed out to me when things from this blog appear on there. Each time this has happened, I’ve been unable to get angry or motivated enough to try to get it removed. I remember people like Tom LaRock putting ridiculous “If you are not at thomaslarock.com, this material has been plagiarized by someone who steals content from others.” Tynt scripts into their publicly pasted queries, and I remember just being bewildered by the drama surrounding these episodes. For this lack of sharing without conditions and unwillingness to feign vitriol at a bot, I’m a bad blogger.

Always Be Tweeting

I don’t tweet or retweet my same entry multiple times a day. I blame this partly on the fact that I’m not a consultant, nor do I have the self-esteem issues needed to shamelessly market and remarket myself over and over. Another thing I do very poorly is submit a post to Hacker News and then stand on the tweet corner begging for upvotes. Because of this lack of pandering and repetitious pollution, I am a bad blogger.

Thanks for Writing My Opinion

me: “Your post isn’t any different than the press release.”
blogger: “That is what they told us to say.”
To this day, the above conversation has never made sense to me. I’m always fascinated by people who repost a marketing piece given to them and then celebrate what a prolific blogger they are. This apathy towards a raw number of blog posts, as well as not being an echo chamber for a company on this blog, is completely my fault and I apologize.

A Number of Reasons

“How many readers do you have?”, “How do you tweak your SEO?”, “How often do you reblog your most popular post?”, “What is your peak time of day to release content?” and on and on. Barf. Some people think I’m just joking when I say I have no idea on the above questions, but it’s the truth. More than pretty charts and clicks per minute, I tend to be more inspired by someone just saying ‘hey dude, I checked out that thing on pgbouncer and it worked for me.’ or by someone stuck on an smtp configuration in SSRS and searching this site for it. While this does cement that I’ll never be an elite blogger, it does let me accept that I’m just a bad blogger.

Truncate Table Opinions

There are a lot of reasons to blog, and for the most part mine are incredibly selfish. I blog to try to remember things, I blog so I can reference certain posts later… I blog to help put my thoughts into words and refine an opinion on something. The archive on this site might deceive you, but I tend to ‘blog’ at least once a week. A great majority of these never get published, but there is something therapeutic and reinforcing about writing something out or laying out a process into steps. If you want to get technical, you could say that I keep a journal that I sometimes publish out of, and I wouldn’t fight you on it. When you stop caring about others’ opinions or turning things into a popularity contest, you might be surprised at how easy “blogging” starts to become and how helpful it gets in an industry that doesn’t slow down.

  • This was a 10 minute journal entry that I just sort of decided to hit publish on. The goal was to have something to link when I get stuck in these silly conversations.
  • Followed by 30 minutes or so of fixing typos because I’m illiterate.

The AppDevs Are Idiots

In my many years of Enterprise DBA servitude, there would be many instances of blanket ‘the devs are idiots’ statements. Sometimes by me, sometimes by others, and from the perspective of the database it was usually justifiable. In these environments, this is acceptable. I have no idea what busy work task they have come up with, and why should I even care about anything outside of my database? These AppDevs are disgusting dirty creatures that just sling filth everywhere as they stomp their hooves on the keyboard. Worthless.

Inner Joined at the Hip

What I enjoy most about the above mindset is the expectation of the AppDev to know their job and be equally proficient at my own. They must be fluent in the language of Stored Procedures or else everything collapses from the data hate speech that comes from their object-pooriented programs. I’ve often wondered where this one-sidedness originated. For a bit I thought it was just something that happened at places I worked. It’s not just there though. Check out your local communities and you’ll see the imbalance there as well. I attend both database user groups and developer based user groups. I often see a fair amount of AppDevs at sql events… I pretty much never see DBAs at developer events.

Cross Apply What You Know

A few years back, I got involved with a few open source .NET ORMs. It goes without saying that ORMs are the Vietnam of the software world, and I was pretty much blown away at how desperate these projects were for feedback. It was so easy and fun to help out, and it totally changed my opinion of ORMs. It also changed my opinion of the curmudgeonly DBA. If you want some contrast/perspective, I’ve also helped out a few ‘open source’ database projects by their preferred method – email.

Refactor the Query

The more time you spend with AppDevs, the more you realize just how many of them are stuck with really bad DBAs. Now when I hear DBAs complain, I ask the following of them: What open source work do you do? What dev conferences do you attend with the team? What have you personally built? What source control do you use so that the team can see your scripts and ask about them?

You don’t have to be Nostradamus to know how the people I’m talking about answer the above questions. Like so many things with humans, the loud ones with the most generic argument are typically doing the least to fix the problem. If you are an Oracle/SQL Server DBA, don’t just hide behind our industry specific firewalls – get out there. Engage the AppDevs, review their code, help some project, make your scripts public for them to see. We’ve sat in our foxholes long enough and the AppDevs are not our enemy. It’s our turn to build the bridge.

Getting Started With PGBouncer

One of the issues we face at Raisemore is a continuous flux in the number of connections to our primary Postgres database. If our load increases, we have to spin up more front end servers or more backend servers, and this can quickly wreck our connection limit. One solution is for me to babysit the connection size on the pg server itself… or perhaps I can give the Apps the middle finger and starve their connection limit – neither of these leads to happy outcomes for everyone involved.

apt-get install happiness

A better compromise is to just put a connection pooler like PgBouncer in front of our postgres database. In the same way we proxy our applications with Nginx, we can apply the same concept to our database(s) with Pgbouncer. Let’s get started:

--Assuming your pg database is already up and running
--databasename = burrito_hq  username = elguapo
sudo apt-get install -y pgbouncer
sudo nano /etc/pgbouncer/userlist.txt
> "postgres" "postgres"
> "elguapo" "hefe"
sudo nano /etc/pgbouncer/pgbouncer.ini
> [databases]
> burrito_hq = host=localhost port=5432 dbname=burrito_hq user=elguapo password=hefe
> [pgbouncer]
> listen_addr = *
> auth_type = md5
> admin_users = postgres
> stats_users = postgres

Above are some of the settings I changed to get started. As you get more familiar with Pgbouncer, change the various settings as you see fit and for your workload.
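
The pool sizing knobs under [pgbouncer] are probably the first ones worth a look. For example (the numbers below are made-up starting points, not recommendations):

> ;max clients that can connect to pgbouncer itself
> max_client_conn = 500
> ;server connections kept per database/user pair
> default_pool_size = 20
> ;session, transaction, or statement
> pool_mode = session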
Let’s test our settings by firing up PgBouncer. PgBouncer is closely tied to the postgres user on the server, so we’ll ‘switch user’ into the postgres account to crank it up:

sudo su postgres -c "pgbouncer -d /etc/pgbouncer/pgbouncer.ini"

On our client machine, let’s attempt a connection to our server, which has an IP of 172.16.150.128:

psql -h 172.16.150.128 -p 6432 -U elguapo -d burrito_hq 

and hopefully we’re in! Once we know our settings are in good shape, let’s turn on PgBouncer’s auto start mode:

sudo nano /etc/default/pgbouncer
> START=1
sudo reboot #reboot the box to test it out

When the server comes back online, hopefully you can connect again from the client. Now that the server is stable again, we can go get completely lost in the documentation and let the tool really shine.
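
One last sanity check I like: PgBouncer exposes an admin console as a pretend database named pgbouncer, so any of the admin_users we configured can connect to it and poke around:

psql -h 172.16.150.128 -p 6432 -U postgres pgbouncer
--once connected, these are gold:
show pools;
show stats;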

Heroku Pg-extras Builds Bridges

It starts with the all too familiar “I think my database is running slow” and ends with the AppDev and myself speaking different languages, yelling at each other and saying hurtful things… all the while the app and db sit on their servers, neglected and needing love.

pg_therapy

While many of us that live in the database layer have tricked out .psqlrc files, I rarely find AppDevs with the same. Let me stress that this is perfectly ok!!! But having some really ugly diagnostic queries aliased in your .psqlrc is incredibly helpful. The majority of the AppDevs I talk to who are concerned about their db also happen to be running on Heroku. To accommodate its users, Heroku has put out a very helpful plugin to their toolbelt called pg-extras. I love this tool because it gets the AppDev and me speaking a common language. They don’t get scared by hairy SQL statements, and it’s consistent from app to app.
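
If you’re wondering what a tricked out .psqlrc even looks like, it’s stuff like this – an aliased query you can fire off by typing :deadrows at the psql prompt. The alias name and query here are just stand-ins; use whatever diagnostics you love:

--in ~/.psqlrc
\set deadrows 'select schemaname, relname, n_dead_tup from pg_stat_user_tables order by n_dead_tup desc limit 10;'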

Pump you up

Installation is a breeze:

heroku plugins:install git://github.com/heroku/heroku-pg-extras.git

Using it is a breeze:

heroku pg:bloat  #one of my favorites!

When I run that, it tells me that I didn’t specify a database:
! Unknown database. Valid options are: HEROKU_POSTGRESQL_BLACK_URL, HEROKU_POSTGRESQL_PINK_URL

I run it again with the proper db:

heroku pg:bloat HEROKU_POSTGRESQL_BLACK_URL

Now I have an easy to use snapshot of our database with an incredibly low barrier of entry for myself or anyone to execute. Let’s say the person running it doesn’t trust me, or perhaps I’m in a burrito coma – they can always refer to the heroku database tuning site and get sound information.
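
And pg:bloat is just one of the bunch. If memory serves, the plugin ships with a pile of similar commands (heroku help pg will list what your version actually has):

heroku pg:cache_hit HEROKU_POSTGRESQL_BLACK_URL    #table and index hit rates
heroku pg:index_usage HEROKU_POSTGRESQL_BLACK_URL  #how hard your indexes are working
heroku pg:locks HEROKU_POSTGRESQL_BLACK_URL        #queries sitting on locks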

Dude, I don’t even have time for that.

Maybe the above is a bit too much to stomach. That’s cool, we’ll just flip another easy switch:

heroku addons:add librato

Librato is a beautiful dashboard that is fun to look at, easy to use, and will give us some historical context for troubleshooting the typically ill-fated “I think my database is slow” situation. What is nice about tooling like Librato is that it not only shows our db stats, we also get the context of the Application (Dynos, router latency, etc). For more information on this, check out Craig Kerstiens’ post on Monitoring with Librato.

Hugs and Queries

I love this type of solution because it gets us looking at hard numbers, and we can easily as well as objectively say it is the DB’s fault, the App’s fault, Heroku’s fault, or no one’s fault and everything is fine. Librato and Postgres don’t keep track of our egos, our mood, or our fatigue – they keep us honest and on topics that really matter… something we all could use a little more of.

Sequel and Postgres Range Types

Like many rubyists, I have a hard time keeping up with which burrito truck is parked outside my house on any given day. In postgres, I’ve solved this by entering the trucks, and how long they’ll be parked outside, into my database:

CREATE TABLE burritotrucks
(
  id serial primary key,
  truckname text not null,
  onsite daterange not null --or tsrange if busy schedule
);

insert into burritotrucks (truckname, onsite)
values ('thumpy tortillas', '[2013-11-15, 2013-11-17]'),
('chompy delight', '[2013-11-15, 2013-11-17]'),
('no dogs allowed', '[2013-11-16, 2013-11-21]'),
('government cheese wagon', '[2013-11-11, 2013-11-14]'),
('tortuga mochilla', '[2013-11-13, 2013-11-18]');

There are a variety of operators to query range types, but for this example, I’ll just be using the contains operator:

select * from burritotrucks where onsite @> now()

Uh oh, it’s not working. Unlike the other range types, date ranges require a little more finesse: now() hands back a timestamp, but our daterange wants a date. Postgres doesn’t quite trust the implicit conversion, so we’ll just do a little hand holding:

select * from burritotrucks where onsite @> now()::date
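
That cast does the trick! And while we’re at it, contains isn’t the only range operator worth knowing. The overlaps operator (&&) answers “which trucks are around during this stretch?” – a quick sketch with made-up dates:

select * from burritotrucks
where onsite && daterange('2013-11-16', '2013-11-18', '[]');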

But how do we do this in our application with Sequel? First, we let Sequel know we’re going to be using the range extensions:

Sequel.extension(:pg_range)
DB = Sequel.connect(:adapter=>'postgres', :host=> 'localhost'
, :database=>'frontyard', :user=>'dc')

We query the database:

options = DB[:burritotrucks].all
p options

and we get the data. We can see how beautifully Sequel handles the range type for us in the printed output, and at this point we could get away with using our raw sql to get the data we need:

options = DB.fetch("select * from burritotrucks where onsite @> now()::date").all
p options

This works, but I can imagine many AppDevs becoming ill at the sight of SQL in their app. In order to appease both sides of the aisle, Sequel also has some pretty ruby friendly operators we can use by just adding the core_extensions as well as the range operators extension:

Sequel.extension(:core_extensions, :pg_range, :pg_range_ops)
DB = Sequel.connect(:adapter=>'postgres', :host=> 'localhost'
, :database=>'frontyard', :user=>'dc')

Now we can write much more ruby friendly queries. Note: we still need to cast our date value, but that is pretty trivial:

options = DB[:burritotrucks].where(:onsite.pg_range.contains(Sequel.cast(Date.today, Date))).all
p options

People often ask me why I like Sequel so much, and this is another great example why. It doesn’t punish me for knowing SQL. It doesn’t punish postgres for having so many amazing features and data types. And it easily lets me know how many burrito trucks I have in my yard, which is something you cannot put a value on.

Writable CTEs in Postgres

Postgres is filled to the brim with awesome features, but they don’t make sense for every occasion. I posted last night about my Thankyou app, and it happens to have a use case for writable CTEs (Common Table Expressions).
First, let’s new up some data:

drop table if exists thanks;
create table thanks (
    id serial primary key,
    who text not null,
    picked boolean not null default false,
    created_at date default now(),
    last_picked date default '-infinity'
);

insert into thanks (who)
values ('rob conery'), ('postgres'), ('sidekiq'), ('demis bellot'), ('sinatra')
, ('josh berkus'), ('elizabeth naramore'), ('amir rajan'), ('sequel');

select * from thanks;

Boom! This app selects a random person or project I’m thankful for that hasn’t already been picked or hasn’t been picked in the last 9 months. Here it is in code form:

select * from thanks where picked = false or last_picked < now() - interval '9 months';

Now we can put it in a normal CTE:

with guesswho as (
    select * from thanks 
    where picked = false or last_picked < now() - interval '9 months')
select guesswho.id, guesswho.who
from guesswho;

Yes!!! Let’s pick a random row. For our randomizer, I would like a lovely set of sequential ids I can pick from. Since we can’t completely trust the primary key ids returned in our CTE (especially as rows start to get trimmed off), we’re going to throw a row_number function on, as well as pass our first CTE into a second CTE to generate a random number based on the result set:

with guesswho as (
    select ROW_NUMBER() OVER (ORDER BY id) as champs, * 
    from thanks where picked = false or last_picked < now() - interval '9 months')
, onlyone as (select trunc(random() * count(0) + 1) as tops from guesswho)
select guesswho.id, guesswho.who
from guesswho, onlyone
where champs = onlyone.tops;

Yay!!! We’re getting data we love on projects we love. But where does the writable CTE come into play? How cool would it be if we could also mark the record we’re selecting as picked, so we don’t have to make an additional call to the DB to flag it afterwards? Check it out:

with guesswho as (select row_number() over (order by id) as champs, * 
    from thanks where picked = false or last_picked < now() - interval '9 months')
, onlyone as (select trunc(random() * count(0) + 1) as tops from guesswho)

, adios as (update thanks set picked = true, last_picked = now() from guesswho, onlyone where thanks.id = guesswho.id and 
guesswho.champs = onlyone.tops RETURNING thanks.picked)

select guesswho.id, guesswho.who
from guesswho, onlyone, adios
where champs = onlyone.tops;

So.Much.Awesome! Check out that 3rd CTE we added (adios): it uses our previous CTEs, updates the row we chose, and provides a returning statement of what it updated. The returning statement provides context for what happened inside that writable CTE to alleviate confusion between table expressions. Feel free to play with it on your own machine, and especially change the data to the people or tech you’re thankful for.

My Friend Rob

It’s been a little over 6 months since I left .Net and one of the personal concerns I had with Postgres/Ruby was how long it would be before I started getting complacent and taking everything for granted.

Always Be Coding

To counteract this concern, I made a little database full of projects and people I’m thankful for. You’ll see projects like sidekiq, hstore, pg_stat_statements, sinatra and you’ll also see people like Josh Berkus, Ryan Bates, Demis Bellot… It’s become a pretty big table because I’m thankful to so many. I also made a little console app to pick a random row and email it to me once a week because what good is data if you don’t use it?

It Passes in a Flash

The past few months have been amazing. The team I work with at Raisemore is top notch and it is an incredible feeling to be working with and helping non-profits every day. Tekpub got acquired by Pluralsight. That experience has been a lot of work and an absolute blast building videos with them. I’ve also helped reboot our local Ruby user group with 2 other friends and spoke to that group today on two topics I absolutely love – Sequel and Postgres.

For Whom The Phone Tolls

We just got back from dinner and I’m checking my usual blog posts and nightly reading when I get the familiar email every Friday at 00:00:00 UTC:
Subject: Be Thankful
Body: 1 – Rob Conery

I don’t know if it gets any more serendipitous than that. Rob is the first entry I put in the table. Since I’ve met Rob, he has challenged me to be a better DBA, challenged my comfort zones, and challenged me to be an overall better person. Some of the above he’s done directly, some of it indirectly, and I’m thankful for all of it.

perspective.empty?

I can’t wait to see who/what shows up in my inbox next week. I can’t wait to reflect on what it was that connected with me. When I get the next mail, I don’t know yet if I’ll take the time to personally thank them or the project. What I do know is that between now and when I get my next mail, I hope that I can have the same effect on others that my list has had on me.

Helping Asset Pipeline Open Up a Little

A working Asset Pipeline can be a thing of beauty. A broken asset pipeline can be a horribly mean shut-in. One of my favorite ways for it to break in my Rails 3 app is the:

undefined method `directory?' for nil:NilClass
rake aborted

Thanks AP, for breaking and basically telling me nothing. Luckily, it doesn’t have to be this way. If we take a moment to stop and get to know AP, we can get it to open up and be more helpful to us. We start by setting our bundler editor in .bashrc by adding the following line:

export BUNDLER_EDITOR=subl   #I set mine to sublimetext

Next up, we hop into terminal and go for an in home visit right into the code itself:

bundle open sprockets  #sprockets is AP's birth name

Navigate to lib->sprockets->base.rb

Find your way to the "def each_entry(root, &block)" section of the code and you’ll likely see a lack of error handling there. To fix it, I wrap a rescue around the "directory?" check like so:

begin
  if stat(path).directory?
    each_entry(path) do |subpath|
      paths << subpath
    end
  end
rescue
  puts "Hey friend, I have an issue at: #{path}"
end
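
Now re-run whatever task was blowing up and, instead of the nil mystery, the gem tells on the offending file (the path below is obviously a stand-in):

bundle exec rake assets:precompile
#Hey friend, I have an issue at: /some/broken/asset/path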

This doesn’t fix all of AP’s problems, none of us are perfect, but it does coax the gem into letting us know where the problem is. Once we know where the problem is, we can work on it together rather than just throwing our arms up in the air and quitting.