DataChomp

Chomping At The Bits

Getting Postgres Version With Golang

I’ve been looking at some monitoring solutions for my network at home. I’ve used nagios in the past but it has quite a few drawbacks. Most noticeably, the whole fact that it doesn’t scale to millions of nodes or use a popular javascript framework for its front end pretty much makes it a no-go for my needs. Thanks to my background in technology, I know that the next best option is to just write something myself. As I stated earlier, nagios is a no-go and I need something that is… so I chose go-lang. The fact that go is in the name tells me it will work. Let’s get started!

Plumbing:

brew install go
mkdir ~/go
export GOPATH=~/go
export PATH=$PATH:$GOPATH/bin
go get github.com/lib/pq
mkdir gogios && cd gogios
touch gogios.go

Now we code:

package main

import (
  "database/sql"
  "fmt"
  "log"

  _ "github.com/lib/pq" // postgres driver, registered with database/sql via side effect
)

func main() {
  // sql.Open only validates its arguments; the actual connection happens on first use
  db, err := sql.Open("postgres", "user=rob dbname=postgres sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
  defer db.Close()

  var version string
  err = db.QueryRow("select version()").Scan(&version)
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(version)
}

Run it:

go run gogios.go

Output:

PostgreSQL 9.4.4 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn), 64-bit

Now, while we haven’t replaced 100% of nagios, we’re off to a great start and probably only have a few hundred hours of coding left. Good thing it is the weekend!

Query Archive Database, the Apathetic Way

When talking to people about data archival strategies, a question that comes up regularly is ‘how do I even get the archived data?’ This is in part because ~modern~ web frameworks like Rack-on-Rails do not always make multiple data sources a straightforward task. True to form, for every problem a web framework can create, postgres can usually fix it.

Say we have a main database – ‘burrito_store’ that holds the main transactions and an archive database ‘burrito_archive’ that holds stale records… ok, forget pretending, let’s just get straight to the code.

Main Plumbing:

CREATE DATABASE burrito_store;
CREATE DATABASE burrito_archive;

\c burrito_store
CREATE TABLE burrito_sales (id serial primary key, customer_id int, sale_amount int, created_at timestamp, updated_at timestamp);

INSERT INTO burrito_sales(customer_id, sale_amount, created_at, updated_at)
values(1, 123, now(), now()), (1, 123, now(), now()),(1, 123, now(), now())
,(2, 123, now(), now()),(2, 123, now(), now()) ,(2, 123, now(), now())
,(3, 123, now(), now()),(4, 123, now(), now());

CREATE EXTENSION postgres_fdw;
CREATE SERVER archive
 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', port '5432', dbname 'burrito_archive');

CREATE USER MAPPING FOR public SERVER archive OPTIONS (user 'rob');
CREATE FOREIGN TABLE sales_archive (id int, customer_id integer, sale_amount int, created_at timestamp, updated_at timestamp)
 SERVER archive OPTIONS (schema_name 'public', table_name 'burrito_sales_archive');

On the archive database:

CREATE TABLE burrito_sales_archive (id int primary key, customer_id int, sale_amount int, created_at timestamp, updated_at timestamp);

CREATE EXTENSION postgres_fdw;
CREATE SERVER store
 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', port '5432', dbname 'burrito_store');

CREATE USER MAPPING FOR public SERVER store OPTIONS (user 'rob');
CREATE FOREIGN TABLE sales (id int, customer_id integer, sale_amount int, created_at timestamp, updated_at timestamp)
 SERVER store OPTIONS (schema_name 'public', table_name 'burrito_sales');

INSERT INTO burrito_sales_archive(id, customer_id, sale_amount, created_at, updated_at)
SELECT id, customer_id, sale_amount, created_at, updated_at
FROM sales order by id;

At this point, we should be able to query and interact with the desired table from either database. Now, we can just create a basic view to make things easier for our poor application:

CREATE VIEW total_sales as
SELECT * from burrito_sales
UNION
SELECT * from sales_archive;

EXPLAIN SELECT * FROM total_sales ;
"HashAggregate  (cost=248.19..277.94 rows=2975 width=28)"
"  Group Key: burrito_sales.id, burrito_sales.customer_id, burrito_sales.sale_amount, burrito_sales.created_at, burrito_sales.updated_at"
"  ->  Append  (cost=0.00..211.00 rows=2975 width=28)"
"        ->  Seq Scan on burrito_sales  (cost=0.00..24.00 rows=1400 width=28)"
"        ->  Foreign Scan on sales_archive  (cost=100.00..157.25 rows=1575 width=28)"

Voilà! It works! If you look at the cost of the foreign scan, you will see that this operation doesn’t come cheap or free. In this example, burrito_archive even lives on the same server, so you can imagine how the penalty amplifies as you get further from the main datastore. That may or may not be important to you right now; at a minimum, this strategy can at least get your proof of concept going while you check out gems like Octopus or additional decorators for your connection string/models.
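One small tweak worth trying: the HashAggregate in that plan comes from the plain UNION de-duplicating rows. If the live and archive tables never share rows (which is the whole point of archiving), UNION ALL skips that step. A minimal sketch, assuming the same tables as above:

CREATE OR REPLACE VIEW total_sales AS
SELECT * FROM burrito_sales
UNION ALL --no de-duplication pass, rows are assumed disjoint
SELECT * FROM sales_archive;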

Note: This is just a real quick and dirty way of doing this with very low technical overhead. Yes, other options exist depending on a variety of factors.

Carne Asada Hold the Self Join

Since leaving the tech industry, I’ve been disrupting the food service sector with my reimaginification of the food truck. Basically, I serve freshly heated burritos out of the back of my car. The company is called “Uber-ritos” and things could not be any better… well actually, there is this one thing.

Disrupting the Food Line

The point of sale system in the back of my car runs off a repurposed TI-82. While it has everything one might need to run a legit company (keyboard/display/postgres), on old metal like that there is still quite the penalty for wasting CPU cycles. Like most Enterprise systems, it has a very noticeable hot spot. Let us see if we can fix it!

Plumbing

CREATE TABLE food_line(order_id int, step_number int, step_name text, completed_at timestamp);
INSERT INTO  food_line(order_id, step_number, step_name, completed_at)
VALUES (1, 1, 'heat tortilla', now()), (1, 2, 'load with food', null), (1,3, 'serve customer', null)
, (2, 1, 'heat tortilla', now()), (2, 2, 'load with food', now()), (2,3, 'serve customer', null)
, (3, 1, 'heat tortilla', now()), (3, 2, 'load with food', null), (3, 3, 'serve customer', null)
, (4, 1, 'heat tortilla', now()), (4, 2, 'load with food', null), (4, 3, 'serve customer', null)
, (5, 1, 'pour queso', now()), (5, 2, 'serve customer', null);

SubQuery V1

EXPLAIN SELECT fl.order_id, fl.step_name
FROM food_line fl
  INNER JOIN
  (SELECT order_id, min(step_number) as min_step_number
  FROM food_line
  WHERE completed_at is null
  group by order_id
  ) hu on fl.order_id = hu.order_id
          AND fl.step_number=hu.min_step_number
ORDER BY fl.order_id;

“Sort (cost=48.68..48.69 rows=1 width=36)”
Above is the first version of the “What’s left” query, and while it works well enough, I don’t think it is that readable. My business deserves a CTE version.

CTE Version

EXPLAIN WITH last_step as (
  SELECT order_id, min(step_number) as min_step_number
  FROM food_line
  WHERE completed_at is null
  GROUP BY order_id)
SELECT fl.order_id, fl.step_name
FROM food_line fl
  INNER JOIN last_step ls on fl.order_id = ls.order_id
         AND fl.step_number=ls.min_step_number
ORDER BY order_id;

“Sort (cost=48.69..48.70 rows=1 width=36)”
I find this version much more readable. I can alias the CTE and tweak the last step process. It also doesn’t affect cost a whole heck of a lot…but cost is what I really need to lower.

Order Up!!! Window Function

EXPLAIN SELECT order_id, step_name
FROM (
  SELECT order_id, step_name, row_number()
      OVER (PARTITION BY order_id ORDER BY step_number) as stuck_step
  FROM food_line
  WHERE  completed_at is null) as data
WHERE stuck_step = 1
ORDER BY order_id;

“Subquery Scan on data (cost=20.46..20.62 rows=1 width=36)”
Wow! Half the cost! This newfound efficiency lets me serve an extra 15 burritos per lunch cycle, which means more revenue! This additional income will let me decide if I want to pocket the extra money or reinvest it back into the company. I might reinvest in something like upgrading the point of sale to a raspberry pi and fancy display. With more horsepower I can tackle some of the items the business unit wants but customers hate – like facebook/twitter integration!
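If you enjoy comparing approaches, Postgres also has DISTINCT ON, which can express the same “first incomplete step per order” idea without a derived table or window alias. A minimal sketch against the same food_line table, worth running through EXPLAIN yourself:

SELECT DISTINCT ON (order_id) order_id, step_name --one row per order_id, picked by the ORDER BY below
FROM food_line
WHERE completed_at is null
ORDER BY order_id, step_number;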

Gaming on Linux

Like a majority of gamers in the U.S., I decided to make Linux (commonly called Ubuntu in Europe) my main gaming Operating System. In my research, I found the best Linux to use is one that is referred to as ‘Trusty’. I didn’t do a ton of research, but for the most part, all the Linuxes that “just work” get labeled ‘Trusty’. My list of games:

  • Team Fortress 2 (steam)
  • Torchlight II (steam)
  • Faster Than Light (steam)
  • System Shock 2 (steam)
  • Postgres 9.4 (not steam)
  • Minecraft (not steam)

For all the steam games, the solution is pretty simple:

sudo apt-get install -y steam

The magic sauce in the above command is the ‘-y’. If you are a software developer who uses git, you can think of ‘-y’ as the same type of ‘Getting Things Done’ shortcut as ‘-f’. Once you get steam installed, it is just a matter of clicking all over the place to get the above games installed.

For playing Postgres, it is a little more involved:

echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list  # sudo applies to tee, not to a shell redirect
wget --quiet -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install pgdg-keyring
sudo apt-get update
sudo apt-get -y install postgresql-9.4 postgresql-contrib-9.4

I understand that the above can be a bit intimidating or perhaps your RSI might be flaring. In that case, you can run the following:

sudo bash <(curl -s https://raw.github.com/pgexperts/add-pgdg-apt-repo/master/add-pgdg-apt-repo.sh)
sudo apt-get update
sudo apt-get install -y postgresql-9.4 postgresql-contrib-9.4

The above is running a random shell file from an elevated bash prompt. This could be a potential security problem, so always make sure you’re calling random shell files with https.

And lastly, we need to install minecraft.

sudo add-apt-repository ppa:minecraft-installer-peeps/minecraft-installer  
sudo apt-get update && sudo apt-get install minecraft-installer  

Quick and easy! Now, if you play the same games as me, all your games should be ready to go. The only thing left to do is the trivial task of making sure your video, network, display, sound, and peripheral drivers all work without problems.

Personal Twitter Tips for Maximum Happiness

Tip 1: Don’t follow verified people.

Don’t follow verified people/accounts if possible. Someone who has done enough ego stroking to become verified likely has a fair number of followers. Let those followers filter out the good stuff for you. Rest assured, if one of those verified accounts says something good, it will surface in your timeline.

Tip 2: Mute hashtags often

Amongst the chunks of good stuff on twitter are snowballs of drama and conferences/events you are not interested in. Luckily, most of these occurrences try to develop ‘branding’ with a hashtag. Mute them. In the event that muting is too much work, simply log off and do a puzzle or read a book.

Tip 3: Unfollow as easily as you follow

Sometimes you will have a friend or someone you know change careers. They might start consulting or some other endeavor that requires them to ‘brand-build’, self-promote, or just generally create a lot of noise. Unfollow them. If it is someone you have a close relationship with, such as a spouse or something, just mute them. If you find yourself following someone who is generally pretty negative, unfollow them unless they do it in a particularly clever manner.

Tip 4: Don’t use the twitter website.

Find a twitter client that strips out the promoted tweets, the ‘we think you’ll like this’ tweets, the blue-line/repeated conversation tweets, and the inane trending topics stuff. The amount of noise on Twitter increases with every passing day and you should be using a tool that increases the signal. I’m personally very happy with Tweetbot, but I’m sure other quality options exist. If you get very desperate, use the mobile version of twitter: http://m.twitter.com More often than not, it is better and faster than the main twitter site.

Always Get an Absentee Ballot

Since living in Oklahoma, I’ve found one of the absolute best ways to vote is by registering for an absentee ballot. Oklahoma has pretty lax laws on this sort of thing, so it’s easier than you think. In addition, having an absentee ballot doesn’t mean I mail in my vote. I still go to the poll, hand in my absentee ballot so they can destroy it, and vote like a normal downtrodden citizen.

What do I even vote for:

Oklahoma is a closed primary state. What this means is that as a registered Independent, I’m only allowed to vote on a subset of items. Having a ballot sent to me when I’m allowed to vote is a great time saver versus meandering through the UI of a government website. It’s also really handy, when you’re talking to a politician, to let them know that you’re not allowed to vote and let them move on to the proper class of citizenry.

Voting Matters:

When it comes to Federal politics, people will often lie to you and tell you that your vote matters. What they should really be telling you is that your vote only matters when it comes to local laws that actually affect you. For example, in Oklahoma, a 1% tax increase can pass by a margin of as few as 8 votes. While debating whether flag burning is treasonous is quite fun, taking a few moments to control how your income is distributed in your community is likely more patriotic.

Lobbyotomy:

Being able to see my entire ballot before voting day lets me take 45 minutes or so to independently research the actual proposals on the ballot. Before this method, while I knew about some of the popular issues thanks to high-dollar campaigns trying to bludgeon my vote, there would always be a few items on the ballot I had no idea about. I found this incredibly disheartening. The beauty of the absentee ballot is that I’m no longer surprised, second-guessing, or leaving items blank because I can’t quite parse the ridiculous lawyer speak.

Enums for When You Just Can’t Handle Another Foreign Key

If we get over the notion that all data is created equal, it opens up options for us. For example, your DBA wants some sort of data consistency and integrity – both very good things – but you are just making a quick little one-off table that you don’t want to sprawl into 4 or 5 tables with foreign keys.

enum my heart

For these situations, I like to just use postgres enums. Check it out:

create type verdict as enum ('gross', 'can eat', 'amazing');

create table burritos (id serial not null primary key, title varchar(50), thoughts verdict not null);
insert into burritos (title, thoughts) values ('road kill burrito', 'yummy');
--Complete Failure ('yummy' is not a valid verdict)

insert into burritos (title, thoughts) values ('road kill burrito', 'gross');
--Complete Success

insert into burritos (title, thoughts) values ('grande queso burrito', 'amazing');
--Complete Awesome

Sweet! We’re getting the data we want in, it looks legit, and there is no querying of multiple tables to get the values back. Another benefit to this is that AppDev isn’t continually questioning if they set up the FK correctly.

blowing in the winds of change

But how maintainable is it? Say we had a less-than-ideal experience… one that requires a lawyer. We need to add to our list:

alter type verdict add value 'lawsuit' before 'gross';
insert into burritos (title, thoughts) values ('putrid plate', 'lawsuit');

That was pretty easy, though I must say I’m a little confused on exactly which values I have in my enum. Good thing we’re in a database and can just query it!!!:

select t.typname as enum_name,  
       e.enumlabel as enum_value,
       n.nspname as enum_schema
from pg_type t 
   inner join pg_enum e on t.oid = e.enumtypid  
   inner join pg_catalog.pg_namespace n ON n.oid = t.typnamespace
where t.typname = 'verdict' and n.nspname = 'public';

re-fear-factor

If you are ok with the above code… then good on you. However, we might want to populate a dropdown or something else in our application based on these values. Neither your typical Rails dev nor ActiveRecord itself is going to try to implement the above without tears of sadness streaming down onto their Mac. We can build a bridge by making a simple view and pointing a self.table_name = at the following:

create view my_lovely_enums as 
select t.typname as enum_name,  
       e.enumlabel as enum_value,
       n.nspname as enum_schema 
from pg_type t 
    inner join pg_enum e on t.oid = e.enumtypid  
    inner join pg_catalog.pg_namespace n ON n.oid = t.typnamespace
where t.typname = 'verdict' and n.nspname = 'public';

Hardcoding ‘verdict’ like that creates a pretty brittle where clause. I tend to remove that part and go a little more flexible with a predicate/scope to select your enum of choice:

create or replace view my_lovely_enums as 
select t.typname as enum_name,  
       e.enumlabel as enum_value,
       n.nspname as enum_schema
from pg_type t 
    inner join pg_enum e on t.oid = e.enumtypid  
    inner join pg_catalog.pg_namespace n ON n.oid = t.typnamespace;
select * from my_lovely_enums where enum_name = 'verdict' and enum_schema = 'public';
"verdict";"gross";"public"
"verdict";"can eat";"public"
"verdict";"lawsuit";"public"
"verdict";"burrito dreams";"public"

For some more information on this topic, check out a lovely write up at postgresguide.com. The bottom of that article has this handy little buyer-beware:
“At the time of this writing, Postgres does not provide a way to remove values from enums.” While this approach can’t be used for all situations, using it where you can creates a tremendous amount of simplicity in your app without compromising integrity – affectionately known as a win-win.

I Am a Bad Blogger

I’ve never really thought about how bad of a blogger I am. I mean, I always knew deep in my heart I wasn’t very good, but a few conversations over the past few months have really solidified just how bad I am. My confession begins.

Some Person in Somalia has Your Content

Over the years, I’ve seen a lot of emotional breakdowns and anger when someone sees their content reposted on bottom feeder sites. Some people have even pointed out to me when things from this blog appear on there. Each time this has happened, I’ve been unable to get angry or motivated enough to try to get it removed. I remember people like Tom LaRock putting ridiculous “If you are not at thomaslarock.com, this material has been plagiarized by someone who steals content from others.” Tynt scripts into their publicly pasted queries, and just being bewildered by the drama surrounding these episodes. For this lack of sharing without conditions and unwillingness to feign vitriol at a bot, I’m a bad blogger.

Always Be Tweeting

I don’t tweet or retweet the same entry multiple times a day. I blame this partly on the fact that I’m not a consultant, nor do I have the self-esteem issues needed to shamelessly market and remarket myself over and over. Another thing I do very poorly is submit a post to Hacker News and then stand on the tweet corner begging for upvotes. Because of this lack of pandering and repetitious pollution, I am a bad blogger.

Thanks for Writing My Opinion

me: ‘Your post isn’t any different than the press release.’
blogger: “That is what they told us to say.”
To this day, the above conversation has never made sense to me. I’m always fascinated by people who do a repost of a marketing piece given to them and then celebrate what a prolific blogger they are. This apathy towards a raw number of blog posts as well as not being an echo chamber for a company on this blog is completely my fault and I apologize.

A Number of Reasons

“How many readers do you have?”, “How do you tweak your SEO?”, “How often do you reblog your most popular post?”, “What is your peak time of day to release content?” and on and on. Barf. Some people think I’m just joking when I say I have no idea on the above questions, but it’s the truth. More than pretty charts and clicks per minute, I tend to be inspired by someone just saying ‘hey dude, I checked out that thing on pgbouncer and it worked for me’, or by someone stuck on an SMTP configuration in SSRS searching this site for it. While this does cement that I’ll never be an elite blogger, it does let me accept that I’m just a bad blogger.

Truncate Table Opinions

There are a lot of reasons to blog, and for the most part mine are incredibly selfish. I blog to try to remember things, I blog so I can reference certain posts later… I blog to help put my thoughts into words and refine an opinion on something. The archive on this site might deceive you, but I tend to ‘blog’ at least once a week. A great majority of these never get published, but there is something therapeutic and reinforcing about writing something out or laying out a process into steps. If you want to get technical, you could say that I keep a journal that I sometimes publish out of, and I wouldn’t fight you on it. When you stop caring about others’ opinions or turning things into a popularity contest, you might be surprised at how easy “blogging” starts to become and how helpful it gets in an industry that doesn’t slow down.

  • This was a 10 minute journal entry that I just sort of decided to hit publish on. The goal was to have something to link when I get stuck in these silly conversations.
  • Followed by 30 minutes or so of fixing typos because I’m illiterate.

The AppDevs Are Idiots

In my many years of Enterprise DBA servitude, there were many instances of blanket ‘the devs are idiots’ statements. Sometimes by me, sometimes by others, and from the perspective of the database it was usually justifiable. In these environments, this is acceptable. I have no idea what busy work task they have come up with, and why should I even care about anything outside of my database? These AppDevs are disgusting dirty creatures that just sling filth everywhere as they stomp their hooves on the keyboard. Worthless.

Inner Joined at the Hip

What I enjoy most about the above mindset is the expectation that the AppDev know their job and be equally proficient at my own. They must be fluent in the language of Stored Procedures or else everything collapses from the data hate speech that comes from their object-pooriented programs. I’ve often wondered where this one-sidedness originated. For a bit I thought it was just something that happened at places I worked. It’s not just there though. Check out your local communities and you’ll see the imbalance there as well. I attend both database user groups and developer-based user groups. I often see a fair amount of AppDevs at sql events… I pretty much never see DBAs at developer events.

Cross Apply What You Know

A few years back, I got involved with a few open source .NET ORMs. It goes without saying that ORMs are the Vietnam of the software world, and I was pretty much blown away at how desperate these projects were for feedback. It was so easy and fun to help out, and it totally changed my opinion of ORMs. It also changed my opinion of the curmudgeonly DBA. If you want some contrast/perspective, I’ve also helped out a few ‘open source’ database projects by their preferred method – email.

Refactor the Query

The more time you spend with AppDevs the more you realize just how many of them are stuck with really bad DBAs. Now when I hear DBAs complain I ask the following of them: What open source work do you do? What dev conferences do you attend with the team? What have you personally built? What source control do you use so that the team can see your scripts and ask about them?

You don’t have to be Nostradamus to know how the people I’m talking about answer the above questions. Like so many things with humans, the loud ones with the most generic argument are typically doing the least to fix the problem. If you are an Oracle/SQL Server DBA, don’t just hide behind our industry specific firewalls – get out there. Engage the AppDevs, review their code, help some project, make your scripts public for them to see. We’ve sat in our foxholes long enough and the AppDevs are not our enemy. It’s our turn to build the bridge.

Getting Started With PGBouncer

One of the issues we face at Raisemore is a continuous flux in the number of connections to our primary Postgres database. If our load increases, we have to spin up more front end servers or more backend servers, which can quickly wreck our connection limit. One solution is for me to babysit the connection settings on the pg server itself… or perhaps I can give the apps the middle finger and starve their connection limit – neither of these leads to happy outcomes for everyone involved.

apt-get install happiness

A better compromise is to just put a connection pooler like PgBouncer in front of our postgres database. In the same way we proxy our applications with Nginx, we can apply the same concept to our database(s) with Pgbouncer. Let’s get started:

--Assuming your pg database is already up and running
--databasename = burrito_hq  username = elguapo
sudo apt-get install -y pgbouncer
sudo nano /etc/pgbouncer/userlist.txt
> "postgres" "postgres"
> "elguapo" "hefe"
sudo nano /etc/pgbouncer/pgbouncer.ini
> [databases]
> burrito_hq = host=localhost port=5432 dbname=burrito_hq user=elguapo password=hefe
> [pgbouncer]
> listen_addr = *
> auth_type = md5
> admin_users = postgres
> stats_users = postgres

Above are some of the settings I changed to get started. As you get more familiar with PgBouncer, change the various settings as you see fit for your workload.
Let’s test our settings by firing up PgBouncer. PgBouncer is closely tied to the postgres user on the server, so we’ll ‘switch user’ into the postgres account to crank it up:

sudo su postgres -c"pgbouncer -d /etc/pgbouncer/pgbouncer.ini"

On our client machine, let’s attempt a connection to our server, which has an ip of 172.16.150.128:

psql -h 172.16.150.128 -p 6432 -U elguapo -d burrito_hq 

and hopefully we’re in! Once we know our settings are in good shape, let’s turn on PgBouncer’s auto-start mode:

sudo nano /etc/default/pgbouncer
> START=1
sudo reboot #reboot the box to test it out

When the server comes back online, hopefully you can connect again from the client. Now that the server is stable again, we can go get completely lost in the documentation and let the tool really shine.
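Before you disappear into the docs, one handy starting point is PgBouncer’s built-in admin console. It shows up as a virtual database named pgbouncer, and any account listed in admin_users (postgres in the config above) can connect to it. A minimal sketch, assuming the same host and port as before:

--connect with: psql -h 172.16.150.128 -p 6432 -U postgres -d pgbouncer
SHOW POOLS;   --client/server connection counts per database and user
SHOW STATS;   --request counts and traffic totals per database
SHOW HELP;    --lists the rest of the console commands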