How I make those Gephi diagrams
Written for
shauna@social.coop on Mastodon
Please don't read any further if you want to maintain the belief I know what I'm doing
I was originally entranced by this post - https://gephi.org/users/quick-start/
So I downloaded Gephi and then looked for some data to interrogate. Initially, I wanted to look at how the players at Euro 2012 were interconnected - https://fulltimesportsfan.wordpress.com/2012/06/08/finalised-diagram/
It's pretty much been the same process ever since, with some lessons learned and incorporated over the (oh my goodness) 13 years!
Information source - I use the Wikipedia squad pages. This is why you'll occasionally see notes about "delayed because of" and a warning that the data is taken from Wikipedia so may not be accurate. The club team a player plays for is the one most likely to be wrong, especially for competitions that are between seasons. (The strange places I've ended up at on Wikipedia because of edit wars about which club someone plays for.) It is, to an extent, worse for the rugby union ones because players move at the end of the season, not the start of the next.
Because I do the input (and the removals when teams are out) by hand, I go through the national teams starting with group A and move downwards. This is why, if there is a delay in the teams in group A naming their squads, it causes a huge knock-on effect. (The time Italy delayed their squad announcement till ~ 7 pm on the night of the deadline for a game show caused significant cursing because they were the first team in group A and I couldn't start till they did.)
Obviously, the larger the event and the bigger the squads, the longer it takes.
If you can python (I can't, one day I will learn etc), James Ashford wrote a really nice post on how to do all of this with Python - https://james.ashford.phd/2023/08/25/analysing-the-2023-fifa-womens-world-cup-with-graph-theory/
Me, I manually add things into Gephi. Sometimes this means I notice things (like the number of Zambian players playing in the Kazakh Women's League, or that there is a Saudi Women's League - https://fulltimesportsfan.wordpress.com/2023/07/22/womens-world-cup-2023-group-stage-network-diagrams/)
I use the player as the source and the national or club team as the target. I've experimented with using both directed and undirected links, and it doesn't make much of a difference.
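For anyone who would rather build a file than type each link in, the same player-to-team links could be written out as an edge list. What follows is only a minimal sketch in Python, not the process described in this post: the player, nation and club names are made-up placeholders, and `build_edges` and `write_edge_list` are hypothetical helper names. It produces CSV text with `Source`, `Target` and `Type` columns, which is the layout Gephi's spreadsheet import reads as an edges table.

```python
import csv
import io

# Hypothetical squad data: (player, national team, club). All names are placeholders.
SQUAD_ROWS = [
    ("Player A", "Nation X", "Club 1"),
    ("Player B", "Nation X", "Club 2"),
    ("Player C", "Nation Y", "Club 1"),
]


def build_edges(rows):
    """Return (source, target) pairs: player -> national team and player -> club."""
    edges = []
    for player, nation, club in rows:
        edges.append((player, nation))
        edges.append((player, club))
    return edges


def write_edge_list(edges, handle, edge_type="Undirected"):
    """Write the edges as CSV with Source, Target and Type columns."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        writer.writerow([source, target, edge_type])


edges = build_edges(SQUAD_ROWS)
buffer = io.StringIO()
write_edge_list(edges, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Swapping `edge_type` to `"Directed"` is the only change needed to try the directed version.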
There are other layouts, but I like the way Force Atlas looks. (Force Atlas works okay with a data set this size. For significantly larger ones, Yifan Hu is easier.)
I start with the pre-set values, increasing the repulsion strength if the teams crash into each other.


I like to add each player individually, because one of the things I enjoy is seeing the shape and positions change with each addition, but I'm sure making more links at the same time would make it go quicker.
For the colour and the size of the circles, I keep it really simple and stick to degree (the number of links each circle has). For size, I set the minimum at 10 and the maximum at 50. I find it's large enough to see the small changes with each player and clear enough when I've oopsed and not attached a player to the right national team. Or the Wiki page is wrong and hasn't taken off the players that didn't make it to the final squad. Or France have decided to only pick 25 players when they could have picked 26 for who knows what reason. I also like it for the rugby union one, because they have unlimited replacements for squad injuries, so it creates a subtle gradation for each injured player.
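The sizing by degree can be sketched in a few lines of Python. This is an illustration under assumptions, not Gephi's own code: it assumes the size is a straight linear interpolation between the minimum (10) and maximum (50), the names `degrees` and `node_sizes` are hypothetical, and the links are made-up placeholders.

```python
from collections import Counter

# Hypothetical (player, team) links; all names are placeholders.
EDGES = [
    ("Player A", "Nation X"), ("Player A", "Club 1"),
    ("Player B", "Nation X"), ("Player B", "Club 2"),
    ("Player C", "Nation X"), ("Player C", "Club 1"),
]

MIN_SIZE = 10
MAX_SIZE = 50


def degrees(edges):
    """Count how many links touch each node (player, nation or club)."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


def node_sizes(degree_by_node, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Map each degree linearly onto the range min_size..max_size (an assumption)."""
    low = min(degree_by_node.values())
    high = max(degree_by_node.values())
    if high == low:
        # Every node has the same degree, so there is nothing to spread out.
        return {node: float(min_size) for node in degree_by_node}
    scale = (max_size - min_size) / (high - low)
    return {node: min_size + (deg - low) * scale
            for node, deg in degree_by_node.items()}


deg = degrees(EDGES)
sizes = node_sizes(deg)
for node in sorted(sizes):
    print(node, deg[node], sizes[node])
```

With this mapping, a national team that is one player short has a slightly lower degree than its neighbours and so a slightly smaller circle, which is the kind of small difference the 10 to 50 range makes visible.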
Colour is slightly more complicated. I like to try and use either the tournament colours or the colours of the host country flag but it's not possible to change the Gephi pre-sets (or at least not as far as I can find out) so sometimes I have to go with nearest to those. I know to not complain about free software but it's the reason I keep looking to see if there's a way to do something similar in R.
I keep link width at 1.
To find what is closest to the centre, I use the zoom function, and to get the number of players at a team, I use the degree function.
When teams get knocked out, I remove the players manually, which is why I try to keep the teams in order when I add them. Again, you get some fun shape and pattern changes that way.
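If the squads were kept as data rather than entered by hand, the knock-out step could be sketched like this. It is a hypothetical Python illustration with placeholder names (`remove_team` and `remaining_nodes` are made up), and it assumes a club should vanish from the diagram once none of its players are left.

```python
# Hypothetical rows: (player, national team, club). All names are placeholders.
SQUAD_ROWS = [
    ("Player A", "Nation X", "Club 1"),
    ("Player B", "Nation X", "Club 2"),
    ("Player C", "Nation Y", "Club 1"),
]


def remove_team(rows, knocked_out):
    """Drop every player whose national team has been knocked out."""
    return [row for row in rows if row[1] != knocked_out]


def remaining_nodes(rows):
    """Collect the nodes still in play; clubs with no players left drop out."""
    nodes = set()
    for player, nation, club in rows:
        nodes.update((player, nation, club))
    return nodes


after = remove_team(SQUAD_ROWS, "Nation X")
survivors = sorted(remaining_nodes(after))
print(survivors)
```

Regenerating the links from the trimmed rows after each round would give the same effect as removing the players one by one, just without the fun of watching the shape change with each removal.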