Hashes
We've just met a very powerful object for storing information in the program world: instances of the Array class.
Arrays are great for storing objects, but they get harder to understand the more objects they contain. For example, what does each element in the following array mean?
important_program_information = [0, "Hello", ["Tommy", 1.17]]
Moreover, reading elements from arrays using [] gets harder to understand as the array grows in complexity:
teams_with_substitutes = [[["Jim", "Yasmin", "Audrey"], ["Alex", "Mustafa"]], [["Pyotr", "Canace"], ["Xi"]]]
team_1_second_player = teams_with_substitutes[0][0][1]
team_2_first_player = teams_with_substitutes[1][0][0]
And it's a pain to read. Remember how important naming is for helping other programmers understand your program: what does [0][0][1] mean? How is it different from [1][0][0]?
Is there a solution? Sure there is: the Hash class and its instances, 'hashes'.
From arrays to hashes
Arrays and hashes are similar in that they both contain lists of elements. The main difference is that:
- The elements of an array are identified only by their location within the array (their index).
- Elements of a hash are identified by a key.
Remember how variables are names for objects? In programming, we sometimes refer to these names as keys and the objects they reference as values. Together, they make a 'key-value pair'. A hash is a collection of key-value pairs.
Arrays actually work similarly, but they use indices as their keys. That is, the first element of an array has an index of 0. The second has an index of 1. Given an array, you can ask it for the value with key 0 in the following way:
# Given an array
array = [1, 2, 3]
# Read value with key 0
array[0]
And you can tell the array to set the value at key 0 in the following way:
# Given an array
array = [1, 2, 3]
# Set the value with key 0
array[0] = 999
# Return the array
array
So, for arrays you can only use integers as keys. 0, 1, 2, and 550 are all valid array keys.
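As a small sketch of that restriction, the snippet below asks an array for an integer key past its end, and then for a string key. The begin/rescue wrapper is only there so the snippet keeps running after the error, and the variable names are made up for illustration.

```ruby
array = [1, 2, 3]

# An integer key with nothing stored at it simply gives back nil
beyond_the_end = array[550]

# A string is not a valid array key, so Ruby raises a TypeError
begin
  array["first"]
  string_key_result = "no error"
rescue TypeError
  string_key_result = "TypeError"
end

string_key_result
```

So an integer key is always accepted, even when nothing is stored there, while any other kind of key is rejected outright.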
For hashes, we commonly use strings as keys instead:
favourite_things = { "sport" => "tennis", "food" => "chunky bacon" }
Even more commonly than strings, we'll use symbols. Symbols are a special and very widespread object in Ruby. They are similar to strings, except they're immutable: they can't be changed once they've been created. To write a symbol, add a colon : before its name, like :my_symbol.
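Here is a minimal sketch of what sets symbols apart from strings. The variable names are illustrative only. It uses respond_to? to ask an object whether it understands a message, here the << message that appends to a string in place.

```ruby
# A symbol is written with a leading colon
label = :my_symbol
label_class = label.class

# Every mention of :my_symbol is the very same object
same_object = :my_symbol.equal?(:my_symbol)

# Strings can be appended to in place with <<, but symbols offer no such message
string_can_append = "my_string".respond_to?(:<<)
symbol_can_append = :my_symbol.respond_to?(:<<)

[label_class, same_object, string_can_append, symbol_can_append]
```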
Since we rarely want to change the keys in a hash, symbols are a perfect choice:
favourite_things = { :sport => "tennis", :food => "chunky bacon" }
Though symbols and strings are most commonly used, hashes can use any object as a key:
hash = { ['an', 'array', 'object'] => 1, 44.2 => 2, Object.new => 3 }
Values inside a hash are accessed in the same way as arrays, with the [] method. Strings and symbols are the most common keys because they make our code easy to read:
favourite_things = { "sport" => "tennis", "food" => "chunky bacon" }
favourite_things["sport"]
We can also update them in a similar way to arrays:
favourite_things = { "sport" => "tennis", "food" => "chunky bacon" }
favourite_things["food"] = "pizza"
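Two related behaviours are worth sketching here: setting a key that doesn't exist yet, and reading a key that was never set. The "drink" and "film" keys below are made up for illustration.

```ruby
favourite_things = { "sport" => "tennis", "food" => "chunky bacon" }

# Setting a key that is not there yet adds a new key-value pair
favourite_things["drink"] = "tea"

# Reading a key that was never set gives back nil
missing = favourite_things["film"]

favourite_things
```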
You can ask a string to fetch its equivalent symbol very easily, by sending the string the message to_sym (like how to_f worked for integers and floats).
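A short sketch of why to_sym is handy: a string and a symbol with the same letters are different keys, so a hash with symbol keys won't find a value when asked with a string. The variable names here are assumptions made for the example.

```ruby
favourite_things = { :sport => "tennis", :food => "chunky bacon" }

# Pretend this string came from somewhere else, such as user input
wanted = "sport"

# The string "sport" and the symbol :sport are different keys
with_string_key = favourite_things[wanted]

# Converting the string to a symbol finds the value
with_symbol_key = favourite_things[wanted.to_sym]

[with_string_key, with_symbol_key]
```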
So, this is the first function of hashes - as a key-value store for named information.
This kind of key-value store in programming is sometimes called a dictionary, map, hash map, lookup table, or an associative array. In Ruby, they're known as Hashes.
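To tie this back to the teams example from the start of the chapter, here is one possible way to store the same information with named keys. The key names :players and :substitutes are assumptions chosen for this sketch, not something the original data defines.

```ruby
teams = [
  { :players => ["Jim", "Yasmin", "Audrey"], :substitutes => ["Alex", "Mustafa"] },
  { :players => ["Pyotr", "Canace"], :substitutes => ["Xi"] }
]

# The keys now say what we are asking for
team_1_substitutes = teams[0][:substitutes]
team_2_players = teams[1][:players]

[team_1_substitutes, team_2_players]
```

The outer structure is still an array, because the teams themselves are only identified by position. Only the parts that have a meaningful name became hash keys.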
Using hashes to control the flow of information
One major value of a hash is that it can be used to refactor a conditional, especially if that conditional is getting too long. Here's an example procedure. The object running this procedure berates you if you curse at it:
curse = "dang"
if curse == "damn"
  return "That's a curse word! How dare you"
elsif curse == "dang"
  return "That's a less bad curse word! Still, how dare you"
elsif curse == "darn"
  return "Hmm, I'm mildly offended but I'll survive. Watch your language!"
elsif curse == "durn"
  return "Ahh, that's good Southern swearing, that is!"
end
We can refactor the above example to use a hash:
curse = "dang"
# First, set up the options
beratings = {
  "damn" => "That's a curse word! How dare you",
  "dang" => "That's a less bad curse word! Still, how dare you",
  "darn" => "Hmm, I'm mildly offended but I'll survive. Watch your language!",
  "durn" => "Ahh, that's good Southern swearing, that is!"
}
# Second, do a lookup on the beratings hash
beratings[curse]
This is a pretty powerful technique!
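One thing to keep in mind with this technique is what happens when the word isn't in the hash at all. The sketch below uses a shortened beratings hash. The word "gosh" and the fallback reply are made up for illustration. It shows the plain lookup alongside fetch, which accepts a second argument to use when the key is missing.

```ruby
beratings = {
  "damn" => "That's a curse word! How dare you",
  "dang" => "That's a less bad curse word! Still, how dare you"
}

# A word that is not a key gives back nil
unknown = beratings["gosh"]

# fetch takes a second argument to use when the key is missing
polite_reply = beratings.fetch("gosh", "No complaints here")

# When the key exists, fetch ignores the fallback
known_reply = beratings.fetch("dang", "No complaints here")

[unknown, polite_reply, known_reply]
```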
- Using the technique above, implement a program to the following specification without using a conditional.
I want a simple dictionary. I put in the word, and I get out the definition of that word. Here are the definitions I want:
- bear: a creature that fishes in the river for salmon.
- river: a body of water that contains salmon, and sometimes bears.
- salmon: a fish, sometimes in a river, sometimes in a bear, and sometimes in both.
In general, programmers try to minimise the number of conditionals in a program. Each conditional adds pathways through the program, and these pathways can quickly multiply in number (especially if a lot of programmers are working on the codebase). Each pathway can lead to a new program state, which results in more program states than any one programmer can reason about. This article is a fantastic case study into what happens when programmers can no longer reason about the complexity of their codebases. Lookup tables are one way to reduce conditional complexity in a program, by centralising possible program states. In Chapter 9, we'll see that we can use procedures as values in lookup tables, which allows us to control program state even more tightly.
Grouping things in hashes
One other useful application of hashes is grouping. Grouping makes use of the fact that we can set the value of a hash key to an array:
favourite_things = { :films => ["Hackers", "Titanic", "The Matrix", "CATS"] }
Once we've used the key to get the value from the hash, we can use all our regular array methods to read elements, add elements, remove elements, and so on:
favourite_things = { :films => ["Hackers", "Titanic", "The Matrix", "CATS"] }
favourite_things[:films].delete_at(0)
favourite_things[:films].push("Now You See Me 2").push("Citizen Kane")
favourite_things
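The same idea extends to more than one group. As a rough sketch, the snippet below starts a second group under a new key and adds to both. The :books key and the titles pushed into it are made up purely for illustration.

```ruby
favourite_things = { :films => ["Hackers", "Titanic"] }

# Start a second group under a new key with an empty array
favourite_things[:books] = []

# Each group is an ordinary array, so push works on either one
favourite_things[:books].push("Dune")
favourite_things[:films].push("The Matrix")

favourite_things
```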
- Group the array of hashes below into a hash, where each key of the hash is a sport, and each value of the hash is a list of names of people who play that sport.
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
This is a very tricky program to pull off. It's actually one of the things I find hardest in all programming, because it feels like I have to keep track of so much at once: what keys mean what, which values are where, and so on.
So no worries if you find this tough! I do too, and we're in good company. We can make life easier for ourselves by moving in small steps.
Let's decompose the specification into requirements.
- Set up a new hash (sorted_by_sport will do as a name for now).
- Go through the players.
- For each player (player_under_consideration will do as a name for now), take a note of the sport they play (sport will do as a name for now).
- Check the sorted_by_sport hash. If sport does not exist as a key on that hash, set sport equal to an array containing the name of player_under_consideration. Then move to the next player.
- Else, if sport DOES exist as a key on the hash, push the player_under_consideration's name into the array that already exists. Then go to the next player.
When making these requirements, I actually went back-and-forth a bunch. The first set I tried didn't work, so I had to come back. Then the second set worked up to 3, but broke at 4. I ended up with this set of 'perfect requirements' third time round. So I didn't just 'come up with this' on the spot. Programmers rarely crank out known code - 'back and forth', deleting code, zooming in and out is all normal.
OK, this should get us started. Step-by-step.
1. Set up a new hash
- Set up a new hash (sorted_by_sport will do as a name for now).
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
sorted_by_sport = {}
2. Set up a loop on the players
- Go through the players.
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
sorted_by_sport = {}
players.each do |player_under_consideration|
end
# Let's return the sorted_by_sport hash so we can see it in the REPL too
sorted_by_sport
3. Name the sport each player plays, using a variable
- For each player (player_under_consideration will do as a name for now), take a note of the sport they play (sport will do as a name for now).
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
sorted_by_sport = {}
players.each do |player_under_consideration|
  sport = player_under_consideration[:sport]
end
sorted_by_sport
4. Set the key on sorted_by_sport if there isn't one already
- Check the sorted_by_sport hash. If sport does not exist as a key on that hash, set sport equal to an array containing the name of player_under_consideration. Then move to the next player.
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
sorted_by_sport = {}
players.each do |player_under_consideration|
  sport = player_under_consideration[:sport]
  # I think I'll also give a variable for the name, since we did with the sport
  name = player_under_consideration[:name]
  if sorted_by_sport[sport] == nil
    sorted_by_sport[sport] = [name]
  end
end
sorted_by_sport
5. Push the player name into the key on sorted_by_sport if the key already exists
- Else, if sport DOES exist as a key on the hash, push the player_under_consideration's name into the array that already exists. Then go to the next player.
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
sorted_by_sport = {}
players.each do |player_under_consideration|
  sport = player_under_consideration[:sport]
  name = player_under_consideration[:name]
  if sorted_by_sport[sport] == nil
    sorted_by_sport[sport] = [name]
  else
    sorted_by_sport[sport].push(name)
  end
end
sorted_by_sport
6. Refactor
I'm going to tidy up these names first, to make them a bit terser without losing clarity:
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
players_by_sport = {}
players.each do |player|
  sport = player[:sport]
  name = player[:name]
  if players_by_sport[sport] == nil
    players_by_sport[sport] = [name]
  else
    players_by_sport[sport].push(name)
  end
end
players_by_sport
I can get rid of an else here, reducing the conditional complexity:
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
players_by_sport = {}
players.each do |player|
  sport = player[:sport]
  name = player[:name]
  if players_by_sport[sport] == nil
    players_by_sport[sport] = []
  end
  players_by_sport[sport].push(name)
end
players_by_sport
Since I'm only using name once, it doesn't feel that helpful to keep it around:
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]
players_by_sport = {}
players.each do |player|
  sport = player[:sport]
  if players_by_sport[sport] == nil
    players_by_sport[sport] = []
  end
  players_by_sport[sport].push(player[:name])
end
players_by_sport
This feels much more readable.
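As an optional further step, some Ruby programmers might replace the remaining if with the ||= operator, which assigns only when the left-hand side is nil or false. This operator hasn't been introduced in the text so far, so treat the following as a sketch of one alternative rather than the chapter's own solution.

```ruby
players = [
  { :name => "Sam", :sport => "tennis" },
  { :name => "Mary", :sport => "squash" },
  { :name => "Ed", :sport => "tennis" },
  { :name => "Mark", :sport => "football" }
]

players_by_sport = {}

players.each do |player|
  sport = player[:sport]
  # Assign an empty array only if nothing is stored under this sport yet
  players_by_sport[sport] ||= []
  players_by_sport[sport].push(player[:name])
end

players_by_sport
```

Whether this reads better than the explicit if is a matter of taste. The behaviour is the same for this data.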
Iterating over hashes
When we iterate over a hash, we get both the key and value as parameters to a procedure (since the elements of a hash are key-value pairs):
my_favourite_things = { :sport => "tennis", :music => "classical" }
my_favourite_things.each do |key, value|
  # Do something with key and value
end
Remember, just like array eaches, you can name the parameters whatever you like. Just as the each parameter for an array represents each element, whatever you call it, the first parameter for a hash each will be the element key, and the second will be the element value.
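Here is a minimal sketch with renamed parameters. The names category and favourite, and the sentences array, are made up for this example. The first parameter still receives the key and the second the value.

```ruby
my_favourite_things = { :sport => "tennis", :music => "classical" }

sentences = []

# category receives each key, favourite receives each value
my_favourite_things.each do |category, favourite|
  sentences.push("My favourite #{category} is #{favourite}")
end

sentences
```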
Complete the mastery quiz for chapter 8
Use your mastery quizzes repository to complete the quiz for chapter 8.