layout | title | cover-img |
---|---|---|
page | Lights, Camera, Inequality: The Story of Women in Cinema | /assets/img/feminism_home_page.jpg |
Cinema is so much more than just entertainment. It’s a mirror (albeit sometimes a distorted one) of our society, its norms, its values… and, unfortunately, its inequalities. Since cinema’s early days, women on screen haven’t always had the spotlight: often confined to secondary or stereotypical roles, they’ve long remained in the shadow of male characters. This leaves us wondering: how much has truly changed? Did we really break free from old patterns, or did we just rewrite an old script with a superficial modern flourish? Through data, we’ll explore the evolution of female representation in movies and ask: Are there more significant roles for women today? Are these roles nuanced, or do they still fall into familiar tropes? And what even is a “feminist” movie?
To find out, we’re going to dig into our dataset of more than 80,000 films. We’ll count actresses, see if they outshine their male counterparts, analyze film themes, and even measure how many films pass the famous Bechdel Test. We’ll also compare trends across countries and genres and try to find out whether feminist blockbusters even exist.
And finally, we’ll try to answer the big question: What exactly makes a “feminist film”? By following the threads of data, we hope to find answers… or maybe even uncover new questions.
Buckle up for a journey through the history of cinema and what it reveals (or doesn't) about the role of women on screen.
Movie cover of "All About Eve", widely considered one of the best feminist movies of all time
Our story begins with the most evident metric: the proportion of actresses to actors per film, year by year. It’s not the most exciting place to start, but hey, we’ve got to begin somewhere... So, what does the evolution of female numerical representation look like?
{% include_relative assets/plots/nb_actors.html %}
Although the number of actresses per movie seems to have increased over the years, it has stayed below the number of male actors throughout the period since 1900: on average, there are twice as many men on screen as women. But, one may ask, do things differ from country to country, given cultural variation?
{% include_relative assets/plots/female_to_male_actor_ratio_per_country.html %}
Looking at the female-to-male ratio among actors in the top five movie-producing countries since 1920, the picture isn't great for women: the ratio usually stays below 1. The big swings in the early decades may reflect the social and economic changes of the time. From the 1960s onward, the ratio levels off at a low value and, unfortunately, there has been no real progress since. There is some variation across countries: France and the UK, for example, occasionally have slightly better ratios than the US and India. Overall, this plot paints a picture of slow, uneven progress in the representation of women in global cinema.
But maybe we are just lacking perspective: maybe the percentage swings in favor of actresses for certain film genres? We’ve grouped the films into different categories. Let’s take a look...
{% include_relative assets/plots/repartition_genre.html %}
Well... actresses are still in the minority, no matter the genre or the era! This underrepresentation seems to plateau between 25% and 40%, with an average of just 31.44% actresses per movie!
But the number of characters is not everything; who they are matters too. We do not have the age of every character, but we do have the age of the actors who play them. Have you ever wondered what the most common age of actresses is when they are cast in movies? Are they younger or older than their male co-stars? And if there is a trend, has it changed over the decades? Let’s take a look:
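As a rough illustration of how an average share like the one above can be obtained, here is a minimal sketch. The `casts` dictionary and its movie titles are made up for the example, and the real pipeline may well compute the figure differently (for instance with pandas on the full dataset).

```python
from statistics import mean

# Hypothetical casts: movie title -> list of actor genders ("F" or "M").
# These values are illustrative only, not taken from the dataset.
casts = {
    "Movie A": ["M", "M", "F"],
    "Movie B": ["F", "M", "M", "M"],
    "Movie C": ["F", "F", "M", "M"],
}


def female_share(genders):
    """Share of actresses in one cast, ignoring entries with unknown gender."""
    known = [g for g in genders if g in ("F", "M")]
    if not known:
        return None  # no usable gender information for this movie
    return sum(g == "F" for g in known) / len(known)


# Per-movie share first, then the average over movies.
shares = {title: female_share(genders) for title, genders in casts.items()}
average_share = mean(s for s in shares.values() if s is not None)
print(f"Average share of actresses per movie: {average_share:.2%}")
```

Averaging per-movie shares gives every film the same weight, whatever its cast size. Pooling all actors before dividing would instead let large casts dominate, so the two approaches can give different numbers.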
{% include_relative assets/plots/actors_age.html %}
As you can see, regardless of the period, women on screen tend to be younger than their male counterparts. A telling example is "Pretty Woman" (1990), where the male lead (Richard Gere) was 41 years old and the female lead (Julia Roberts) was 23: an 18-year gap between the leads of this classic romantic comedy. While it's a fictional story, such an age disparity became a hallmark of the genre during that era.
What could explain such a noticeable difference in averages? Could it be related to societal expectations, casting preferences, or even the types of roles offered to women compared to men? This pattern raises intriguing questions about representation and the evolution of gender dynamics in the film industry. Let us investigate women's representation in movies to deepen our understanding of these dynamics.
Now that we have covered the most straightforward metrics, let’s dive into something equally fascinating: the Bechdel Test. You might have heard of it: it has become a widely known benchmark, highlighting how often women are given more than just a fleeting role in films. Here’s how it works: for a film to pass the Bechdel Test, it needs to meet just three criteria:
- It must have at least two named female characters
- These women must talk to each other
- And their conversation must be about something other than a man
Sounds straightforward, right? You’d think most movies would pass with flying colors, but surprisingly, many don’t! This test isn’t about calling out individual films but rather highlighting larger trends in storytelling and representation. Now, passing the Bechdel Test doesn’t automatically mean a movie is feminist or inclusive, nor does failing it make a film inherently problematic. Instead, it’s a way to spark discussion about how women are portrayed on screen and whether they are given meaningful roles beyond supporting male characters.
The Disney movie "Mulan" passes the Bechdel Test
Other tests also provide valuable insights, such as the Mako Mori Test (which evaluates whether a female character has her own narrative arc that is not about supporting a male character) or the DuVernay Test (which assesses the inclusion of characters of color in fully realized roles). We chose to focus on the Bechdel Test for its simplicity and widespread use, which make it an accessible starting point for exploring broader trends in gender representation.
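To make the three criteria concrete, here is a small sketch that scores a film from 0 to 3, counting how many criteria are met in order. The input format (a set of named female characters and a list of conversations flagged as being about a man or not) is an assumption made for illustration; real Bechdel ratings are assigned by human viewers, not computed from such structured data.

```python
def bechdel_score(named_women, conversations):
    """Count how many of the three criteria are met, in order (0 to 3).

    named_women: set of names of named female characters (hypothetical input).
    conversations: list of (speaker_a, speaker_b, is_about_a_man) tuples.
    """
    # Criterion 1: at least two named female characters.
    if len(named_women) < 2:
        return 0
    # Criterion 2: two of these women talk to each other.
    between_women = [
        about_man
        for a, b, about_man in conversations
        if a in named_women and b in named_women and a != b
    ]
    if not between_women:
        return 1
    # Criterion 3: at least one such conversation is not about a man.
    if all(between_women):
        return 2
    return 3


def passes_bechdel(named_women, conversations):
    """A film passes only when all three criteria are met."""
    return bechdel_score(named_women, conversations) == 3


# Made-up example: the only conversation between the two women is about a man.
women = {"Ana", "Bea"}
talks = [("Ana", "Bea", True), ("Ana", "Carl", False)]
print(bechdel_score(women, talks))  # criteria 1 and 2 met, criterion 3 not
```

Note how little the test asks for: a single short exchange between two named women about anything other than a man is enough to pass, which is exactly why it says nothing about the depth of their roles.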
Now, we want to use the Bechdel Test result to classify whether a movie offers a good representation of women. However, not every movie in our dataset comes with a Bechdel Test result, so we need to pause our analysis journey and classify these movies ourselves.
To do so, we look through the plot summaries of the movies for which we have Bechdel Test information and identify the words and themes that characterize movies passing or failing the test. We then search for these elements in the plot summaries of the remaining movies to classify them as offering a good representation of women or not.
Although the Bechdel Test is a popular measure of the representation of women in movies, it is also inherently flawed. Its major flaw is that it does not take the subject matter of the conversation into account, so movies that are widely considered sexist can still pass the test (e.g. Fifty Shades of Grey, which has been widely criticized for promoting abuse), while other movies with feminist undertones can still fail it (e.g. Eternal Sunshine of the Spotless Mind). Another major flaw is that the dataset we used to get Bechdel Test results is user-submitted, which introduces biases (e.g. some users might only add movies that pass the Bechdel Test and hence cause an imbalance in the dataset). For these reasons, we create a second metric to define whether a movie offers a good representation of women: the movie has to pass the Bechdel Test AND have a cast that is at least half female. We then use this classification to find the relevant words and themes to search for in every other movie of the dataset.
To compare these two metrics, we used two different methods: GPT-2 to classify movies based on the Bechdel Test alone, and an SVM for the classification that also takes the proportion of women in the cast into account.
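The combined rule above can be written down in a few lines. This is only a sketch of the labelling rule as described in the text; the function and parameter names are hypothetical, and how casts with missing gender information are handled in the actual project is not specified.

```python
def well_represented(passes_bechdel, n_actresses, n_actors):
    """Second metric: passes the Bechdel Test AND the cast is at least half female.

    passes_bechdel: bool, whether the movie passes the Bechdel Test.
    n_actresses, n_actors: cast counts (hypothetical inputs for illustration).
    """
    cast_size = n_actresses + n_actors
    if cast_size == 0:
        # Assumption: a movie without cast information cannot be labelled positive.
        return False
    return passes_bechdel and n_actresses / cast_size >= 0.5


# A movie passing the Bechdel Test with 4 actresses and 6 actors is not enough.
print(well_represented(True, 4, 6))
# The same movie with an evenly split cast meets both conditions.
print(well_represented(True, 5, 5))
```

Because both conditions must hold, this metric is always at least as strict as the Bechdel Test alone.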
- We turned to GPT-2, a pre-trained language model that specializes in generating and understanding natural language. GPT-2 was chosen for its ability to analyze the semantics of movie plot summaries, helping us identify nuanced themes and context that go beyond simple keyword matching. By fine-tuning GPT-2 on a custom dataset of feminist movies, we were able to improve its ability to recognize feminist elements in films, providing a richer and more accurate classification than the Bechdel Test alone.
This method is especially valuable in addressing the limitations of the Bechdel Test, which can miss films with feminist messages that might not meet its criteria. GPT-2 allows us to go beyond surface-level analysis and dive deeper into the language of movie summaries, identifying key themes, character arcs, and narrative structures that reflect feminist ideas.
- SVM was selected because it is well suited to binary classification tasks with numerical and categorical input features. For the second metric, where the female cast proportion is included, SVM provides a straightforward and interpretable classification model.
Here is how our two models perform:
Measure | Model 1 (GPT-2) | Model 2 (SVM) |
---|---|---|
Accuracy | 0.82 | 0.73 |
F1 Score | 0.61 | 0.73 |
As we can see, the GPT-2 model achieved an accuracy of 0.82 and an F1 score of 0.61, indicating its strength in understanding natural language but leaving room for improvement in identifying less explicit feminist elements. The SVM model, trained on cast proportion and Bechdel results, provided an alternative approach, achieving an accuracy of 0.73 and an F1 score of 0.73. Together, these models offer complementary insights into the representation of women in films, with GPT-2 focusing on semantic analysis and SVM addressing numerical and categorical patterns.
Based on the confusion matrices, GPT-2 outperforms the SVM model in key areas, particularly in correctly identifying the ‘Feminist’ class. It has significantly fewer false negatives (609 compared to 1176), demonstrating better sensitivity. This is crucial because minimizing missed detections of the ‘Feminist’ class is likely more important in our application. The SVM performs slightly better at reducing false positives and has the higher F1 score, but GPT-2's higher accuracy and smaller number of missed ‘Feminist’ movies make it the more suitable model for our needs. Given these results, it makes sense to concentrate on GPT-2 moving forward and to fine-tune it further to optimize performance.
But enough machine learning monologues: let us look at the results and what we can learn from them!
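For readers who want to see how accuracy, sensitivity (recall) and F1 relate to a confusion matrix, here is a minimal sketch. The counts used below are hypothetical and are not the confusion matrices of our models; they only illustrate how a model can reach a high accuracy while its F1 score stays noticeably lower when the positive class is the smaller one.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fn: 'Feminist' movies correctly found / missed.
    fp/tn: 'Non-feminist' movies wrongly flagged / correctly rejected.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical, imbalanced example: 100 positives among 400 movies.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=280)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy rewards the many correctly rejected negatives, whereas F1 only looks at how well the positive class is handled, so the two measures can diverge on imbalanced data.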
{% include_relative assets/plots/two_models_two_plots.html %}
Do you see what I see? That’s right: progress! Even though the number of films produced each year has skyrocketed, the percentage of movies in which women are well represented has also steadily increased over time, nearing 40% with the second classification method! The two methods do not always put the same movie in the same category, however, as the Venn diagram below shows.
Venn diagram of both models
So overall, women seem to be more and more represented on screen over the years, but does that change depending on the genre of the movie?
{% include_relative assets/plots/two_models_genre.html %}
Wow, only a small portion of action and adventure movies pass the test... probably because, in these movies, the main character is usually a man, with a woman playing the love interest. Romance and musical movies tend to do much better. But regardless of the genre, these percentages are going up with time, which is a good start... or is it?
Now to be fair, the Bechdel Test provides a way to assess the representation of women in a film, but it remains limited in evaluating the depth or significance of their roles. For instance, Pacific Rim fails the test due to its lack of female characters, yet Mako Mori plays a pivotal role in the story, serving as more than just a supporting figure to the male protagonist.
Mako Mori in action
Similarly, Éowyn from The Lord of the Rings stands out as a strong character, despite the film’s broader failure to pass the test. Conversely, Twilight passes the test, even though Bella is rarely considered a strong female character, since she is often said to rely on men to “save and protect” her.
Bella saved by Edward
Well then, what shall we do? The Bechdel Test has proven not to be conclusive. After all, a movie can convey a message in many subtle ways that a simple numerical test might miss. So how about we take a look at feminism from a different angle… turn to ourselves… and ask the public?
We created our own dataset of feminist movies from several websites (each movie in it is widely considered to convey a feminist message). Below you can see a sample of these movies, shown as a set of images we put together ourselves. To complement it, we sampled the same number of films that scored 0 on the Bechdel Test. In total, our dataset is made up of 296 feminist movies and 296 movies that fail the Bechdel Test. We use it to fine-tune a pre-trained GPT-2 model to find the most important features of a feminist movie (and also the features that make a “macho” movie), working mostly on the semantics of the sentences, and then apply this model to the entire movie dataset to classify each movie as feminist or not.
Feminist movies collected
Equipped with our freshly labelled dataset, let us apply the same analysis framework that we used for the representation of women in movies:
{% include_relative assets/plots/combined_feminism_analysis.html %}
Surprisingly, there seem to be quite a lot of feminist movies, reaching as much as 60% of total movie production in the 1930s! However, the trend that we observed while analyzing the representation of women does not appear here; in fact, it is even reversed! As time goes on, fewer movies seem to convey a feminist message. But maybe this downward trend does not appear in every genre.
{% include_relative assets/plots/percentage_feminism_all_periods.html %}
Romance films rank first in terms of the percentage of feminist films, reaching about 70%! Surprising? Musical films come in second on average. These types of films generally feature a man and a woman as leads, without one being there to complement the other. In contrast, only 30% of action and adventure films, which are often male-driven, are considered feminist. We find roughly the same ranking for films that pass the Bechdel test.
But what about one of the most prestigious recognition awards, given to those who have shaped the industry?
{% include_relative assets/plots/combined_oscar_feminist_pie_charts.html %}
Ladies and gentlemen, the Oscars! 🎬 Half of the nominees are feminist, which is pretty impressive! But when it comes to taking home the Best Picture award? Let’s just say the numbers aren’t quite ready for their acceptance speech. This raises the question: Is it enough to be nominated, or will we see a shift where feminist films become the new norm for winners?
As we ponder this question, another one catches up with us: what truly defines a feminist movie?
We have seen a lot about feminism making its way to the big screen, but what sort of feelings carry this message? Do feminist movies portray more positive and happy scenarios? One might expect a trend in that direction or, on the contrary, a focus on themes of rebellion and tense plotlines. Analyzing the summaries of all the movies predicted as feminist by the three ML models we created, as well as those predicted as non-feminist, we find a disappointing result: there is no apparent difference in sentiment between feminist and non-feminist movies.
Indeed, using VADER's sentiment intensity analyzer, we determined whether each movie summary conveys a positive, neutral or negative sentiment. The first plot shows that both feminist and non-feminist movies mostly have neutral summaries, that feminist movies tend to have a slightly higher rate of positive scores, and that non-feminist movies lean slightly more towards negative sentiment.
The second plot shows the compound scores, which condense the sentiment of each summary into a single value: a positive value indicates positive sentiment and a negative value indicates negative sentiment. Once again, the average score in both categories corresponds to neutral movies, while the distribution for non-feminist movies is skewed towards negative sentiment.
This surprising result leaves us wondering why. To better understand it, we performed topic detection to uncover the themes prevalent in both feminist and non-feminist movie summaries.
{% include_relative assets/plots/lda_visualization.html %}
By visualizing the intertopic distance using multidimensional scaling, we can identify two dominant principal components (PC1 and PC2). These components provide a deeper understanding of how topics diverge:
- PC1 aligns with themes containing negative sentiment, such as loss, conflict, or hardship.
- PC2 highlights themes with positive sentiment, such as love, triumph, or resolution.

This dichotomy reveals why the average compound scores for feminist and non-feminist movies appear neutral despite their underlying asymmetry. Both categories contain a mix of polarizing themes, which, when averaged, cancel out extreme sentiments.
These findings present a nuanced understanding of the sentiment and thematic differences between feminist and non-feminist movies. While both categories exhibit a neutral overall tone, the underlying topics reveal a blend of positive and negative connotations, with feminist movies leaning toward optimism and non-feminist movies reflecting more somber themes.
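The cancellation effect can be illustrated with a few lines of code. The sketch below assumes that compound scores (between -1 and 1) have already been computed, and it uses the ±0.05 cut-offs commonly recommended for VADER compound scores; whether our plots use exactly these thresholds is an assumption, and the scores themselves are made up.

```python
from statistics import mean


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    The +/-0.05 cut-offs are the conventional VADER thresholds (assumed here).
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores for five summaries: most are strongly polarized.
scores = [0.8, -0.7, 0.0, 0.6, -0.75]
labels = [label_from_compound(s) for s in scores]
average = mean(scores)

print(labels)  # individual summaries are mostly positive or negative
print(f"mean compound: {average:.2f} -> {label_from_compound(average)}")
```

Four of the five summaries are clearly polarized, yet their mean sits almost exactly at zero. This is why looking at the distribution and the underlying topics tells us more than the average alone.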
Now how about finding some relationships in these feminist movies? Let's do some graph theory!
As our journey unfolds, delving deeper into understanding what defines a feminist movie, it has become clear that traditional metrics alone aren’t enough. To uncover the intricate connections between themes, characters, and interactions, we turn to a more sophisticated tool: graph theory.
Graph theory offers a powerful framework for analyzing complex relationships and structures within movie datasets. By representing actors, characters, or scenes as nodes and their interactions as edges, we can uncover patterns of collaboration, centrality, and representation. This enriches our analysis by showing how gender dynamics are interwoven within the fabric of storytelling, and it can help us understand how feminist and non-feminist movies are linked and which features determine whether a movie is feminist or not.
First of all, let us compare the three different models that were trained. In this graph, each model is linked to all of the movies it has predicted as feminist. We can see that only a fraction of the movies are predicted as feminist by all three models. This shows how complex it can be to define a feminist movie.
{% include_relative assets/plots/graph_movie_model_predictions.html %}
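The overlap shown in this graph boils down to set operations on the predictions of each model. Here is a minimal sketch with made-up movie titles; the model names used as dictionary keys are hypothetical labels chosen for the example.

```python
# Hypothetical predictions: model name -> set of movies predicted as feminist.
predictions = {
    "model_1": {"Movie A", "Movie B", "Movie C"},
    "model_2": {"Movie B", "Movie C", "Movie D"},
    "model_3": {"Movie C", "Movie D", "Movie E"},
}

# Movies that every model agrees on, and movies flagged by at least one model.
all_three = set.intersection(*predictions.values())
any_model = set.union(*predictions.values())

# Share of flagged movies on which the three models agree.
agreement = len(all_three) / len(any_model)
print(sorted(all_three), f"{agreement:.0%}")
```

In graph terms, the movies in `all_three` are the nodes connected to all three model nodes, while the others hang off only one or two of them.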
But how does the model determine if a movie is feminist or not? One way of understanding how the third model makes its predictions is to extract the main topics and themes of a summary as keyphrases. The words within each keyphrase can then be linked and plotted as a network to show the interactions between topics. We implemented this with the KeyBERT module, running a keyphrase search on the summaries of 20 movies predicted as feminist by the model. KeyBERT gives us two-word keyphrases that are representative of the text, such as (‘katniss volunteers’, 0.5623) for the summary of The Hunger Games, where the number is the relevance score. Finally, by linking these words together, a network graph can be plotted.
{% include_relative assets/plots/movie_topic_interactions.html %}
By selecting movies and exploring the plots, a trend can be observed: usually, a female character appears and action words are linked to her. Of course, the plots are sometimes not very informative, since they rely only on the summary, but this framework could provide very valuable results with a more extensive dataset (for example, a dataset containing the dialogues of the movies).
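The step from keyphrases to a network can be sketched as follows. The input mimics the (keyphrase, relevance score) pairs described above; only the first pair comes from the text, the other two are hypothetical, and the choice of keeping the highest relevance as edge weight is an assumption made for this example.

```python
from collections import defaultdict
from itertools import combinations

# (keyphrase, relevance) pairs. Only the first one is quoted from the text;
# the other two are hypothetical values added for illustration.
keyphrases = [
    ("katniss volunteers", 0.5623),
    ("katniss survives", 0.50),
    ("tribute games", 0.45),
]


def build_word_graph(pairs):
    """Link the words of each keyphrase with an undirected, weighted edge."""
    graph = defaultdict(dict)
    for phrase, score in pairs:
        for a, b in combinations(phrase.split(), 2):
            # Assumption: if an edge appears twice, keep the highest relevance.
            weight = max(score, graph[a].get(b, 0.0))
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


graph = build_word_graph(keyphrases)
# Degree = number of distinct words a word is linked to.
degree = {word: len(neighbors) for word, neighbors in graph.items()}
print(degree)
```

In this toy example the character name ends up with the highest degree, linked to two action words, which is the kind of pattern observed in the interactive plot.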
Another way of understanding the differences between feminist and non-feminist movies is by looking at the main features provided by the dataset such as box office revenue, runtime, year, country, languages and genre. Using a parallel coordinates plot is a great way of viewing how a set of feminist and non-feminist movies vary within these features and seeing the differences between both groups.
{% include_relative assets/plots/parallel_plot.html %}
The plot was created on a subset of 1,000 movies for each group (feminist or not). We observe a strong similarity in movie runtime and box office revenue, whereas countries, languages and genres vary greatly both within and between the groups. This suggests that what sets feminist movies apart is not these metadata features but rather their plots and the message they convey.
So, what have we uncovered on this deep dive into the world of women’s representation in cinema? The journey is a complex one, filled with progress, challenges, and plenty of room for growth. From applying the Bechdel Test to leveraging tools like GPT-2 and SVM, we’ve explored how women are portrayed across decades, genres, and cultures. We saw that, yes, there has been progress: more films are telling stories about women, placing them in the spotlight and slowly breaking free of age-old stereotypes. But the data also whispers that true gender equality is still far from reality, with women remaining underrepresented in leading roles and often confined to supporting or stereotypical characters.
Through a mosaic of numbers and tests, we gained insight into broader trends in representation. But straightforward metrics alone don’t tell the whole story. By also examining sentiment, genre, and country-specific data, we can capture a more nuanced understanding of how feminism and female representation intersect with cultural and cinematic factors.
At the end of this journey, we are left with important questions about how far we’ve come and how much further we have to go. While feminist ideals keep gaining ground over old habits, the industry still has a long way to go before reaching true gender equality on screen. This research could mark a step forward, paving the way for further exploration and, hopefully, continued progress in the portrayal of women in cinema.
And as we wrap up, let’s channel Wonder Woman—lassoing the truth about representation and smashing the stereotypes, one reel at a time. 💪🎥
Gal Gadot in Wonder Woman
...Still with us? Here is a little post-credit secret for you:
Now it is your turn! After going through our analysis, do you think you can guess whether a movie is feminist or not based on its plot summary? Select a movie, look through its plot summary for hints, take a guess and see if you agree with our model's prediction!
Select a movie from the dropdown to view its summary and feminism response:
{% include_relative assets/plots/feminism_game.html %}