A Data Exploration of Jeopardy! from 1984 to the Present

Brian Hamilton


This project is an analysis of a dataset I created by scraping the HTML from the crowdsourced Jeopardy! archive J! Archive. Read more about the impetus to this project on the About page and the methods I took for building the dataset, analyzing it, and visualizing it on the Methods page.

Topic Modeling

Using MALLET, I built a topic model for the clues on J! Archive. Below is a visual representation of the change in topic usage over the course of the show's run. My hope from this is to see trends in topic usage. For example, I noticed that "History/War" spikes to its highest usage during the Gulf War.

Hover over the line on the graph or the topic key in the legend to isolate each individual topic for viewing. You can also click on the line or legend text to hold it in place.

State-by-State Contestants

While looking at the data I collected from the archive, I noticed that most contestants were identified by their name, hometown, and profession. I stripped out their local town and isolated each one that was from the fifty U.S. states and the District of Columbia. From here, I took a count of how many came from each state and how many won at least one game, making sure to only count each contestant once.

After I had this list, I compared it to population estimates made by the U.S. Census Bureau for 2019, creating a proportion of number of contestants/winners to the total population. Below, I created a map of the United States and the District of Columbia, shading each state by the size of the proportion; the darker shade, the larger the proportion, and vice versa. Use the two buttons to switch between the two views and hover over each state to see more detailed information.

Gender in Jeopardy!

Past studies of gender in Jeopardy! has taken a few different forms over the years. They have either looked at the different kinds of topics men and women may have chosen to respond to or how they act during the game. While J! Archive does not include a reference to a contestant's gender in their description, I was able to use the third-party API Genderize.io to determine the most likely gender from their first name.

Gender By Series

Before looking at more in depth analysis of gender in Jeopardy!, it's important to know how many men and women have actually been on the show. I grouped together contestants by their gender (removing any contestants that Genderize.io was not able to determine) and took a count of each. I created a donut char for those numbers, showing that 42.5% of contestants are women and 57.5% are men. These numbers line up almost exactly with studies that used a small number of episodes to conduct their research.

Hover over the slices or the legend to get a more detailed view of the numbers.

Gender By Episode

During an actual episode of Jeopardy!, there are three contestants competing against each other. One of them is the winner from the previous episode (if there was one) and the other two are new contestants. I was able to determine the breakdown of those three contestants by gender, only including episodes where all three contestants had a known gender. A makeup of two men and one woman takes up the majority of episodes, with all three being women happening the least. Below is a donut chart showing the breakdown.

Hover over the slices or the legend to get a more detailed view of the numbers.

Are Topics Gendered?

I wanted to see if the topics that I generated using the topic modeling were gendered--as in, are certain genders more likely to both know the answer and get it right. My database includes information on which clues a contestant got right and which they got wrong. I grouped this all together by gender and did a chi-squared test to see if the difference between the genders is significant.

Use the select field below to update the stacked bar chart and show the results of the chi-squared test for each topic. Hover over the bars to get more detailed numbers.


Critical Value



Who Wins?

By the end of an episode, the contestant with the most money is declared the winner. They are allowed to come back to the next game and defend their win. Some are good enough to win multiple games and continue on and on until they lose. I grouped together the data in a way that shows the gender disparity in number of wins.

There is some disparity in the data between this and the total number of contestants by gender. Some games on J! Archive (and by extension, my own database) are incomplete. I did my best to account for this by having a boolean field in the database for indicating that the game had an unknown winner. Even with that, some contestants may be counted twice. However, it should be a very small number.

I created a stacked bar chart to show this breakdown. Hover over the bars to get more detailed numbers.

And Roughly How Much Do They Win?

For each game, I looked at the final total for the winning contestant. Grouping by year and gender, I took the median amount to determine a how much was roughly won each game. A time series chart below shows this relationship. Hover over the lines or the legend to isolate the lines.

The three most obvious points of reference are 2001-2002, 2004, and 2019. Between 2001 and 2002, the dollar amounts for each clue were doubled. During 2004 and 2019, two of the most successful contestants competed; Ken Jennings in 2004 and James Holzhauer in 2019. James Holzhauer was known for winning by a large margin, racking up a lot of money each game. 2011 and 2019 also saw an increase in median winnings for women. However, while I looked for a specific contestant that could've caused this, there were no long-term female winners in those years. The most successful female contestant, Julia Collins, competed in 2014--a year with no major change.