Data source: https://www.transfermarkt.de/
Tools used: Python, BeautifulSoup, Pandas, Geopandas, Matplotlib, Gimp
Here you can see the individual images for each season.
If you're interested, you can read additional details here:
A quick explanation of what you are seeing: I used pushshift to download 5M submissions and comments from the AITA subreddit.
I then used NLP to extract topics from the data, which yielded topics related to relationships, money, the work environment and more. I used the extracted topics to classify what each post was about and to rank the most common topics across all submissions.
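A minimal sketch of the classification-and-ranking step, assuming the NLP stage has already reduced each topic to a set of keywords. The `TOPICS` dictionary, its keywords, and the overlap-count rule are hypothetical stand-ins, not the method used for this chart: each post goes to the topic with the most matching words, and topics are then ranked by post count.

```python
from collections import Counter

# Hypothetical topic -> keyword sets, standing in for topics extracted by an NLP model.
TOPICS = {
    "relationship": {"boyfriend", "girlfriend", "husband", "wife", "partner"},
    "money": {"money", "rent", "pay", "paid", "loan"},
    "work-environment": {"boss", "coworker", "job", "office", "shift"},
}


def tokenize(text):
    """Lowercase, split on whitespace and strip simple punctuation."""
    return {w.strip(".,!?;:\"'()") for w in text.lower().split()}


def classify(text, topics=TOPICS):
    """Return the topic whose keywords overlap the post most, or None if none match."""
    words = tokenize(text)
    scores = {name: len(words & keywords) for name, keywords in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def rank_topics(posts, topics=TOPICS):
    """Count posts per topic and return (topic, count) pairs, most common first."""
    counts = Counter(classify(p, topics) for p in posts)
    counts.pop(None, None)  # drop posts that matched no topic
    return counts.most_common()
```

Posts that match no keyword are left out of the ranking; a real pipeline would assign topics from a trained model's output rather than raw keyword overlap.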
The link also contains the dataset created using pushshift.
Visualization via excel
Edit: more detailed explanation
No wedding category? I feel like half the stories here have to do with weddings. I’m not complaining, by the way. I love the wedding stories.
Last year I, like many others, liked this submission by u/ReimannOne: https://www.reddit.com/r/dataisbeautiful/comments/7wts84/tampa_mapped_with_one_day_of_coast_bike_rides_oc/
I had wanted to reproduce it for a long time, and this week I finally managed it, giving it a night theme and a little Miami Vice flair. I used the 128 "longest" rides from Q1 2018 to draw the streets of Tampa.
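One way to pick the "longest" rides, sketched in Python rather than the R/sf stack used here. It assumes each ride is an ordered list of `(lat, lon)` GPS points and measures length as the summed great-circle (haversine) distance; the original may have ranked rides by duration or some other measure instead.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(track):
    """Sum of segment lengths along a ride's ordered GPS points."""
    return sum(haversine_km(a, b) for a, b in zip(track, track[1:]))


def longest_rides(rides, n=128):
    """Return the n rides with the greatest track length, longest first."""
    return heapq.nlargest(n, rides, key=track_length_km)
```

`heapq.nlargest` avoids sorting the full ride list when only the top 128 are needed.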
Rendering took about 90 minutes, plus about the same again for a shorter version to check what it would look like.
(Tools) R with sf, tidyverse and gganimate r/rstats
I wrote out 25 return addresses for our upcoming Christmas cards over about 20 minutes and noticed they were inconsistent in size. I measured the width and height of each complete return address, recorded them in a CSV file, and used ggplot2 to visualize the data.
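A small sketch for quantifying how inconsistent the sizes are, assuming a CSV with hypothetical columns `address`, `width_cm` and `height_cm`. The coefficient of variation (standard deviation divided by the mean) puts the spread of width and height on a comparable, unitless scale. This is Python for illustration; the chart itself was made with ggplot2.

```python
import csv
import statistics


def size_summary(csv_lines):
    """Mean, sample standard deviation and coefficient of variation per dimension.

    csv_lines: an open file or any iterable of CSV lines with a header row
    containing the (assumed) columns width_cm and height_cm.
    """
    rows = list(csv.DictReader(csv_lines))
    summary = {}
    for col in ("width_cm", "height_cm"):
        values = [float(r[col]) for r in rows]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (needs >= 2 rows)
        summary[col] = {"mean": mean, "sd": sd, "cv": sd / mean}
    return summary
```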
Nice. So tell me, how comfortable is it sleeping on your couch?
Source - Pregnancy tracker app
Weight is in kilograms (sorry America)
I accidentally lost the data from the first 12 weeks while changing phones. The 'projected' range is set automatically by the app.
This is really fascinating! It would be even more interesting to demonstrate the mom’s weight variance pre/immediately post birth to baby weight compared to the decile weight gain/baby weight to determine any correlations.
What’ll really have a chance to get you in hot water is projecting the weight loss after the delivery.
Source: Location of my phone throughout my trip on an ATV
Tool: Google Maps timeline feature
Source: The modlog of /r/dataisbeautiful, pulled with Python's PRAW module. The posts were filtered (everything hosted on YouTube and v.redd.it) and manually classified (I probably lost at least 2 IQ points in the process).
Tools: Matplotlib in python, labels added in inkscape.
We recently announced a moratorium on racing bar charts because they are overused on the subreddit and because of very persistent spam issues from YouTubers. Since most of these posts were removed as shameless self-promotion/spam even before the announcement, some of you rightfully asked to see some numbers. So here you go: at the peak, almost half as many racing bar charts were coming into the subreddit per day as valid posts.
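The "almost half as many per day" comparison is a per-day ratio of racing bar submissions to valid posts. A minimal sketch, assuming the manually classified modlog entries have been reduced to `(day, label)` pairs with the hypothetical labels `"racing_bar"` and `"valid"`:

```python
from collections import defaultdict


def daily_ratio(posts):
    """Map each day to racing_bar_count / valid_count.

    posts: iterable of (day, label) pairs; labels other than the two
    hypothetical ones below are ignored. Days without valid posts are skipped.
    """
    counts = defaultdict(lambda: {"racing_bar": 0, "valid": 0})
    for day, label in posts:
        if label in ("racing_bar", "valid"):
            counts[day][label] += 1
    return {day: c["racing_bar"] / c["valid"] for day, c in counts.items() if c["valid"]}


def peak_day(posts):
    """Return (day, ratio) for the day with the highest racing-bar ratio."""
    ratios = daily_ratio(posts)
    day = max(ratios, key=ratios.get)
    return day, ratios[day]
```

A peak ratio near 0.5 corresponds to the "almost half as many" statement.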
Wait, what? There is a moratorium on racing bar graphs?!?
That is great news. Those visualizations are misleading and not useful.
It probably would have been better to wait some time after the moratorium's start before presenting the data.
The ban seems to make sense, but I finally joined Reddit and, I have to admit, I was looking forward to posting a moving bar graph I made!
And the big picture showing my recent travels in Europe.
Source: Google location history
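A sketch of turning a location-history export into map coordinates. Google Takeout location history has historically stored coordinates as integers scaled by 10^7 in `latitudeE7`/`longitudeE7` fields under a top-level `locations` list, but the export format has changed over time, so treat that layout as an assumption and check your own file.

```python
import json


def load_points(json_text):
    """Parse an (assumed) Takeout-style export into (lat, lon) pairs in degrees.

    Entries without both latitudeE7 and longitudeE7 are skipped.
    """
    data = json.loads(json_text)
    points = []
    for rec in data.get("locations", []):
        if "latitudeE7" in rec and "longitudeE7" in rec:
            points.append((rec["latitudeE7"] / 1e7, rec["longitudeE7"] / 1e7))
    return points


def bounding_box(points, pad_deg=0.5):
    """(min_lat, min_lon, max_lat, max_lon) with padding, e.g. to frame a map."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats) - pad_deg, min(lons) - pad_deg,
            max(lats) + pad_deg, max(lons) + pad_deg)
```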
I have more questions about your big picture map:
And the most important question:
While this graph shows that, if anything, Millennials are doing better than Boomers, it does not seem to account for inflation, nor for growth in the cost of, for example, education beyond inflation.
(It may account for inflation, but the Fed data doesn't mention it, so I doubt it.)
I used intercensal age data to determine the population for each generation by year using the birth years below. Then I offset the generations to reflect the middle of the generation. That is, someone born in the middle of the Baby Boomer generation (1955.5) would be lined up with someone born in the middle of the Millennial generation (1989).
Distributions by generation are defined by birth year as follows: Silent and Earlier=born before 1946, Baby Boomer=born 1946-1964, Gen X=born 1965-1980, and Millennial=born 1981-1996.
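The generation assignment and midpoint offset described above can be sketched as follows. Treating an inclusive birth-year range as running from January 1 of the first year to December 31 of the last gives the midpoints quoted in the text (1955.5 for Baby Boomers, 1989 for Millennials), so the alignment offset between those two generations is 33.5 years. The `{(year, age): population}` input shape is an assumption, not the layout of the Census file, and `year - age` only approximates birth year for mid-year (July 1) estimates.

```python
from collections import defaultdict

# Inclusive birth-year ranges, as defined above.
GENERATIONS = {
    "Baby Boomer": (1946, 1964),
    "Gen X": (1965, 1980),
    "Millennial": (1981, 1996),
}


def generation_of(birth_year):
    """Map a birth year to its generation label, or None if born after 1996."""
    if birth_year < 1946:
        return "Silent and Earlier"
    for name, (first, last) in GENERATIONS.items():
        if first <= birth_year <= last:
            return name
    return None


def midpoint(first, last):
    """Middle of an inclusive range of whole years (Jan 1 of first to Dec 31 of last)."""
    return (first + last + 1) / 2


def population_by_generation(estimates):
    """Sum {(year, age): population} into {(year, generation): population}.

    Birth year is approximated as year - age.
    """
    totals = defaultdict(int)
    for (year, age), pop in estimates.items():
        gen = generation_of(year - age)
        if gen is not None:
            totals[(year, gen)] += pop
    return dict(totals)
```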
Issues with analysis:
Population data is not smooth: there are annual adjustments to the data, seen most readily in the stepped graph for the Silent Generation.
Inflation should be factored in.
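Factoring in inflation usually means rescaling each year's nominal dollars by a price index. A minimal sketch, assuming a CPI-style index keyed by year; which series to use (for example CPI-U) is an assumption, since the post does not say:

```python
def to_real_dollars(nominal_by_year, cpi_by_year, base_year):
    """Convert nominal values to base-year dollars.

    real = nominal * CPI[base_year] / CPI[year]
    """
    base = cpi_by_year[base_year]
    return {year: value * base / cpi_by_year[year]
            for year, value in nominal_by_year.items()}
```

With a hypothetical index of 100 in 2000 and 150 in 2018, 10 units of 2000 wealth become 15 units in 2018 dollars.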
Chart created in LibreOffice Calc (because I am too lazy to install R and remind myself how to use it).
Household Wealth by Generation from
Intercensal Data from US Census Bureau
PEPSYASEXN - Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States: April 1, 2010 to July 1, 2018
EDIT - Added inflation data and chart (in comment above)
Correct me if my interpretation is wrong, but it looks to me as if each generation is about on par with the ones before, with the Boomers doing significantly better than the Silent Generation past 50. I have to think that is in part due to better retirement planning and the rise of the 401(k) as a tool to facilitate it instead of pensions, which I don't think would show up as household wealth.
Gen X seems to be on track to do at least as well.
How old is this? Boomers are only 65? Did people (Silent Gen) really get so much wealthier from age 73 to age 80? Today, elderly people spend a huge amount on health care; a typical assisted living facility costs about $5k per month.
(is the x axis age? I'm just guessing, but it'd be nice to have some labels).
Where to begin:
What are the units of the x-axis?
Why not align data for the same life year for different cohorts?
Are the age ranges for each cohort the same size? Are they arbitrary?
This is a horrible choice of visualization for this data.
What are the theoretical results supposed to show? Theoretically, you should get a bell curve (if you ran a larger sample size). None of those lines are bell curves; some are close, but some are very far off.
Did you pick the max combination of dice that didn't add up to more than 18, the sum of all the dice, the sum of the top n dice, or just arbitrarily cut it off at 18? I didn't pay attention to the subtitle in the vis.
Tools: AnyDice, LibreOffice, Google Sheets
I used AnyDice to calculate the probability of the rolls and to simulate the random roll datasets. I used the third generated dataset for the graph data, and a combination of LibreOffice and Google Sheets to generate the graphs.
For context, XdY notation in D&D expresses "roll X Y-sided dice". In this case, various numbers of 6-sided dice (standard cubes) are rolled, and in pools of more than 3 dice (4d6, etc.) the highest 3 dice are used.
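The probabilities can be cross-checked without AnyDice by brute-force enumeration, and a seeded simulation shows how sampled frequencies approach the exact distribution as the number of rolls grows. A minimal Python sketch (not AnyDice's own method) for `n_dice` six-sided dice keeping the highest 3:

```python
import itertools
import random
from collections import Counter
from fractions import Fraction


def keep_highest_dist(n_dice, sides=6, keep=3):
    """Exact distribution of the sum of the highest `keep` dice out of n_dice rolls."""
    counts = Counter()
    for roll in itertools.product(range(1, sides + 1), repeat=n_dice):
        counts[sum(sorted(roll)[-keep:])] += 1  # sum of the `keep` largest dice
    total = sides ** n_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def simulate(n_dice, trials, sides=6, keep=3, seed=0):
    """Empirical frequencies of the same sum from `trials` seeded random rolls."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        roll = [rng.randint(1, sides) for _ in range(n_dice)]
        counts[sum(sorted(roll)[-keep:])] += 1
    return {s: c / trials for s, c in sorted(counts.items())}


def mean(dist):
    """Expected value of a {value: probability} distribution."""
    return sum(s * p for s, p in dist.items())
```

For example, for 4d6 keeping the highest 3, `mean(keep_highest_dist(4))` is exactly 15869/1296, about 12.24. Enumeration visits `sides ** n_dice` outcomes, so it is only practical for small pools of a handful of dice.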
I'm trying to create an alternative to standard ability score generation that fits the flavor of the diceless generation method of Pathfinder RPG 2E, so I wanted to see the results of different size pools of dice.
data collection page