The write-up below isn't exactly the same as the one found in the Data Sketches book. For the book we've tightened up our wording, added more explanations, extra images, separated out lessons and more.
We started off our collaboration with the theme movies. Any dataset was fine, as long as there was some personal connection to our angle on the topic.
I started with a general search of movies, to get a feeling for what might be out there. I came across budget and gross information per movie pretty quickly, and found my way to the OMDb API and the IMDb FTP site, where you can download huge files with all of the information on movies & series. Having access to such large databases seemed like a very interesting angle to start working with, but it wasn't very personal.
So, instead I started looking for data on my favorite movie (trilogy): the Lord of the Rings. I still remember, as a 12-year-old girl, waiting at the movie theater more than 1.5 hours in advance with my parents to get the best spots in the theater (it was still first come, first served back then). I collected magazine clippings and posters, and later fondly watched all of the extras (even for a second time). It's more than 9 hours of film that I can watch year after year.
I found a fascinating dataset about the number of words spoken by each character in the extended editions of all 3 films. I did a few checks, comparing the word count to scripts available on Age of the Ring and they coincided pretty well.
In the data, there is information on the number of words spoken by each character by scene and what race that character is. However, I found scenes to be a bit arbitrary. They are more attached to the making of the movie, not the movie experience. So instead I went ahead and manually added an on-screen location to each of the ±800 rows of data.
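Once every row has a location, the per-scene counts can be rolled up into words per character per location. The snippet below is only a rough sketch of that aggregation, not the code used for the final visual: the field names (`character`, `scene`, `location`, `words`) and all numbers are assumptions made up for illustration.

```javascript
// Hypothetical rows: field names and word counts are invented for illustration.
const rows = [
  { character: "Frodo", scene: "scene A", location: "The Shire", words: 120 },
  { character: "Sam", scene: "scene A", location: "The Shire", words: 80 },
  { character: "Frodo", scene: "scene B", location: "The Shire", words: 40 },
  { character: "Frodo", scene: "scene C", location: "Mordor", words: 60 },
];

// Sum the words per character per location, ignoring scene boundaries.
function wordsByCharacterAndLocation(data) {
  const totals = new Map();
  for (const { character, location, words } of data) {
    if (!totals.has(character)) totals.set(character, new Map());
    const perLocation = totals.get(character);
    perLocation.set(location, (perLocation.get(location) || 0) + words);
  }
  return totals;
}

const totals = wordsByCharacterAndLocation(rows);
console.log(totals.get("Frodo").get("The Shire")); // 160
```

A nested `Map` keeps the lookup by character and then by location cheap, which is the shape a chord-like layout needs later on.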
Besides a map of Middle-Earth I relied heavily on the Age of the Ring scripts of the extended editions and the original scripts of the non-extended editions found on IMSDb. These scripts sometimes mention the location when they talk about the scene in general. And of course, I used my own memory of watching the movies time and time again.
It took a few hours, but afterwards it felt like a dataset that I had a personal connection to and wanted to visualize.
Having just bought an iPad Pro 9.7" with Apple Pencil, I of course wanted to try it out for my dataviz sketching. This month I used the app that most charmed me, Tayasui's Sketches II.
About a month ago I got an email from Christian Wisniewski with a sketch that looked a bit like a Chord diagram but with "nodes" in the center. It seemed very intriguing and since I have a fond history of hacking the chord diagram for other purposes I wanted to try to create my own version of Christian's idea at some point. While going over ideas for the LotR data in my mind, I couldn't help but think that this dataset would fit that purpose very well.
For my dataset, the Fellowship characters would be placed in the center. The more general locations are the arcs around it. Each character would be connected to the location where they spoke and the thickness of the chord at the location would represent the number of words spoken there. I'm not quite sure if the detailed locations should provide a more detailed level within the arcs themselves. That will probably create too many chords...
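To make the idea of location arcs around the circle concrete, here is a minimal sketch of how the outer ring could be divided so that each arc is proportional to the words spoken at that location. It is plain JavaScript rather than d3's chord layout, and the function name `locationArcs`, the `padAngle` default, and the word counts are all hypothetical.

```javascript
// Hypothetical totals per location; the numbers are invented.
const locationTotals = [
  { location: "The Shire", words: 300 },
  { location: "Rivendell", words: 500 },
  { location: "Mordor", words: 200 },
];

// Give every location a slice of the circle proportional to its word count,
// leaving a small gap (padAngle, in radians) after each arc.
function locationArcs(totals, padAngle = 0.05) {
  const sum = totals.reduce((acc, d) => acc + d.words, 0);
  const available = 2 * Math.PI - padAngle * totals.length;
  let angle = 0;
  return totals.map((d) => {
    const startAngle = angle;
    const endAngle = startAngle + (d.words / sum) * available;
    angle = endAngle + padAngle;
    return { location: d.location, startAngle, endAngle };
  });
}

const arcs = locationArcs(locationTotals);
console.log(arcs.map((a) => a.location));
```

The same proportional split could be repeated inside each arc to give every character's chord its own thickness at that location.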
I tried coming up with some other ideas, but I guess I was already sold on the chord diagram-ish one, because all of my other sketches didn't amount to anything worthwhile. There was something with a timeline: when the scene was taking place and the number of words spoken by the character.
Another was one where each location would be a spirograph, with the number of petals being the number of detailed locations and its overall size scaled to the number of words spoken.
Sketching on the iPad was quite fun. It was very easy to combine techniques, move things around, undo things. One minor downside was that I don't have the same amount of control over my pen in terms of drawing the longer strokes exactly as I want to. Perhaps it's just a matter of getting used to the slippery surface.
But definitely going for the first sketch!
I started with Mike Bostock's most basic d3v4 chord diagram and a plan on how to turn it into something resembling my sketch. I dove straight into the source code of d3's chord and ribbon functions to understand exactly what was happening in each line of code. Luckily, the code is very compact and easy to understand. I then started making small changes, each step taking me closer to the visual on my sketch.
I still very much like the look of all strings flowing to the center, but I had 9 members of the Fellowship to place there. So I started introducing a vertical offset based on the character that belonged to each string. The first successful result reminded me to work on string sorting as well to reduce the overall overlap.
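The vertical offset can be illustrated with a small helper that stacks the nine members around the vertical center. This is only a sketch under assumptions: the ordering of the characters, the `spacing` value, and the function name `characterOffsets` are hypothetical, not taken from the actual code.

```javascript
// The nine members of the Fellowship; this ordering is an assumption.
const fellowship = [
  "Frodo", "Sam", "Merry", "Pippin", "Gandalf",
  "Aragorn", "Legolas", "Gimli", "Boromir",
];

// Spread the characters evenly above and below y = 0,
// so every string can end at the height of its own character.
function characterOffsets(names, spacing = 20) {
  const mid = (names.length - 1) / 2;
  return new Map(names.map((name, i) => [name, (i - mid) * spacing]));
}

const offsets = characterOffsets(fellowship);
console.log(offsets.get("Frodo"), offsets.get("Gandalf"), offsets.get("Boromir")); // -80 0 80
```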
One specific website helped me a lot during this project, an online Cubic Bézier Curve adjuster, with which I could try out the locations of the handles with respect to the end points of a path to understand how to create more elegant S-shaped curves. I went through many, many tweaks of shapes, each one getting a bit closer to the most smooth transition between the outer arcs and the inner characters.
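For readers who want to experiment with handle placement themselves, the sketch below builds an SVG path for one S-shaped cubic Bézier curve between a point on an outer arc and a point in the center. The handle rule used here (both handles pushed horizontally by a fraction of the horizontal distance) is a common assumption for S-curves, not necessarily the placement used in the final visual, and `sCurvePath` and its `pull` parameter are hypothetical names.

```javascript
// Build an SVG path string for an S-shaped cubic Bezier curve.
// start and end are [x, y] points; pull (0 to 1) controls how far the two
// handles are pushed horizontally toward each other.
function sCurvePath(start, end, pull = 0.5) {
  const [x0, y0] = start;
  const [x1, y1] = end;
  const dx = x1 - x0;
  // First handle keeps the start height, second handle keeps the end height,
  // which makes the curve leave and arrive horizontally.
  const c1x = x0 + dx * pull;
  const c2x = x1 - dx * pull;
  return `M${x0},${y0}C${c1x},${y0} ${c2x},${y1} ${x1},${y1}`;
}

console.log(sCurvePath([200, -100], [0, 20])); // M200,-100C100,-100 100,20 0,20
```

Raising `pull` toward 1 makes the middle of the S steeper, while lower values flatten it toward a straight line.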
There was definitely a need to create empty space above and below the inner section. Not only to have a less squished feeling, but also to give the strings in that region the room to really flow in a nice S-shape. I had some experience in this when I pulled apart a normal chord diagram to visualize a flow. This time I made sure that I didn't have to make some sort of dummy string that would need to be hidden (as was the case in the flow chord diagram).
Now that the biggest challenges of the new layout were behind me, I also started looking at the design a bit, using screenshots of the movies to set a color for each location and finding a Google font that best matched Middle-Earth.
After I got the stretching of the two halves visually where I wanted it, I ranked the locations in the order that they (first) appear in the movie or where their most important scenes are taking place. This thankfully still divided up the two halves almost symmetrically. Even more string shape adjustments and color tweaks later, I ended up with the final result that you see below.
One very important thing that I didn't truly start on until the very end was interactivity. With its many strings, this layout lends itself very well to being inspected in more detail through hovers. For example, when you hover over a character, the number of words at each location adjusts to show the count for that character. Shirley gave a good suggestion to fade out the locations where the character hasn't said anything. I also implemented it vice versa: when you hover over a location, the characters that haven't said anything there are dimmed. And finally, to help you get some fun insights from the results, I added a short note per character that shows up when you hover over the Fellowship members.
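The data side of the character hover can be sketched without any DOM code: for a hovered character, compute the word count to show at each location and flag the locations to fade out. The row shape, the numbers, and the function name `hoverState` are assumptions for illustration only.

```javascript
// Hypothetical aggregated rows; the numbers are invented.
const spoken = [
  { character: "Sam", location: "The Shire", words: 80 },
  { character: "Sam", location: "Mordor", words: 150 },
  { character: "Legolas", location: "Rivendell", words: 30 },
];

// For the hovered character, return the count to display per location
// and whether that location should be dimmed (no words spoken there).
function hoverState(data, character) {
  const locations = [...new Set(data.map((d) => d.location))];
  return locations.map((location) => {
    const words = data
      .filter((d) => d.location === location && d.character === character)
      .reduce((acc, d) => acc + d.words, 0);
    return { location, words, dimmed: words === 0 };
  });
}

const state = hoverState(spoken, "Sam");
console.log(state.filter((s) => s.dimmed).map((s) => s.location)); // [ 'Rivendell' ]
```

Swapping the roles of `character` and `location` in the same function would give the reverse hover, where characters without words at a location are dimmed.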
Compared to my original sketch there were two main things that I had to let go. I felt that with the more general locations there were already too many strings. Therefore, I didn't even try to split things out into their more detailed locations. I only used that info to guide some of the character-specific notes that you see on a hover.
I downloaded a whole bunch of LotR inspired fonts to use in the visual. I tried my very best to find the correct translations of the locations and the inscription in the ring. Although visually nothing major has changed, I do feel that adding the right fonts makes it intuitively more LotR.
See the fully interactive version here. I hope you'll find some new insights. I for one never realized Sam spoke so much. Or that Legolas spoke even less than Boromir.
It was quite a lot of fun working with a dataset about a topic that I love. I had to hold myself back several times throughout this month from just laying my JavaScript files aside and watching the whole trilogy again (*≧▽≦)
PS: There's also a high-quality giclée print available of this visualization in my online shop!
For our very first Data Sketches project we chose the topic “Movies”, and we wanted to have some sort of personal connection to whatever dataset we’d end up visualizing. Starting with an open mind I went ahead and did a general search of movies to get a feeling of what might be out there.
I quickly came across budget information per movie, and found my way to the OMDb API and IMDb Datasets where you can download huge files with lots of information on movies and series. Having access to such large databases seemed like a very good starting point, but I wanted to make the process more personal and relatable.
So I decided to search for data on my favorite movie trilogy: the Lord of the Rings (LotR).
With the popularity of the movies, I was quite surprised that I couldn’t find any structured datasets about them. Thankfully, after digging through more search results, and using variations of the search query “Lord of the Rings dataset” on Google, I found a fascinating dataset in a GitHub repo with the number of words spoken by each character in each scene, in all three extended(!) editions. How amazing is that?! (ノ◕ヮ◕)ノ*:・゚✧ I did a few manual checks, comparing the word count in the dataset to scripts available online, and they coincided pretty well. In this case I didn’t need a perfect match, since I was more interested in the aggregated results.