Dataisbeautiful

[OC] Map of the Bundesliga teams from 1992 to 2019

Data Is Beautiful

COMMENTS

  • kaphi
    1 points Dec 08,2019, 5:11pm

    Data source: https://www.transfermarkt.de/

    Tools used: Python, BeautifulSoup, Pandas, Geopandas, Matplotlib, Gimp

    Here you can see the individual images for each season.

● ● ●

[OC] What does Reddit argue about? Analyzing 5M AmITheAsshole posts and comments

Data Is Beautiful

COMMENTS

  • networksciencegg
    9 points Dec 08,2019, 4:40pm

    If you're interested, you can read additional details here:

    https://medium.com/@tom.gonda/what-does-reddit-argue-about-28432b11ea26

    A quick explanation of what you are seeing: I used Pushshift to download 5M submissions and comments from the AITA subreddit.

    I then used NLP to extract topics from the data, which surfaced topics related to relationships, money, the work environment and more. I used the extracted topics to classify what each post was about and ranked the most common topics across all submissions.

    The link also contains the dataset created using Pushshift.

    Visualization via Excel

    Edit: more detailed explanation
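    The last two steps described above (classifying each post by topic, then ranking topics across all submissions) can be sketched as follows. This is a minimal sketch that swaps in simple keyword matching for the NLP topic extraction the post mentions, whose exact method is not stated; the `TOPIC_KEYWORDS` lists and all function names are hypothetical, and the Pushshift download step is not shown.

```python
import re
from collections import Counter

# Hypothetical keyword lists standing in for the extracted topics; the
# original analysis derived its topics with an unspecified NLP method.
TOPIC_KEYWORDS = {
    "relationship": {"boyfriend", "girlfriend", "husband", "wife", "partner"},
    "money": {"money", "pay", "rent", "loan", "debt"},
    "work environment": {"boss", "coworker", "job", "manager", "office"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text, topics=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often, or None if none match."""
    tokens = tokenize(text)
    scores = {name: sum(tok in words for tok in tokens) for name, words in topics.items()}
    best = max(scores, key=scores.get)  # ties go to the first topic listed
    return best if scores[best] > 0 else None


def rank_topics(posts, topics=TOPIC_KEYWORDS):
    """Count the assigned topic of every post and rank topics by frequency."""
    counts = Counter(classify(post, topics) for post in posts)
    counts.pop(None, None)  # drop posts that matched no topic
    return counts.most_common()


if __name__ == "__main__":
    sample = ["My boss yelled at me in the office", "I lent my sister money for rent"]
    print(rank_topics(sample))
```

    A real topic model (for example one learned from the 5M posts) would replace the hand-written keyword sets, but the classify-then-count structure stays the same.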

  • IoSonCalaf
    1 points Dec 08,2019, 5:53pm

    No wedding category? I feel like half the stories here have to do with weddings. I’m not complaining, by the way. I love the wedding stories.

● ● ●

The streets of Tampa mapped with 128 Coast Bike rides in 2018. [remix] [OC]

Data Is Beautiful

COMMENTS

● ● ●

Height and Width of Handwritten Return Addresses [OC]

Data Is Beautiful

COMMENTS

  • veggiemedley
    2 points Dec 07,2019, 12:54pm

    I wrote out 25 return addresses for our upcoming Christmas cards over about 20 minutes and noticed that they were inconsistent in size. I measured the width and height of each complete return address, recorded them in a CSV file, and used ggplot2 to visualize them.
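    The size inconsistency described above can be quantified with a few summary statistics per dimension. This is a minimal sketch, assuming a CSV with one row per address and hypothetical `width` and `height` columns in a single unit; the original chart was made in R with ggplot2, and this Python version only computes the numbers behind such a plot.

```python
import csv
import io
import statistics


def summarize(csv_text, columns=("width", "height")):
    """Mean, sample standard deviation and coefficient of variation per column.

    Assumes one row per handwritten address and the hypothetical column
    names given in `columns`.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in columns:
        values = [float(row[column]) for row in rows]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # needs at least two rows
        summary[column] = {"mean": mean, "sd": sd, "cv": sd / mean}
    return summary


if __name__ == "__main__":
    # Placeholder rows for illustration only, not the author's measurements.
    example = "width,height\n10,2\n12,2\n14,2\n"
    print(summarize(example))
```

    For a real file, pass the file's contents (e.g. `open("addresses.csv").read()`, a hypothetical filename). The coefficient of variation (sd / mean) makes the spread in width and height comparable even when one dimension is much larger than the other.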

● ● ●

[OC] My wife's weight tracked over the course of her first pregnancy

Data Is Beautiful

COMMENTS

  • Upstart-Emeritus
    58 points Dec 07,2019, 12:29pm

    Nice. So tell me, how comfortable is it sleeping on your couch?

  • ThomasSchiff
    14 points Dec 07,2019, 12:13pm

    Source - Pregnancy tracker app

    Weight is in kilograms (sorry, America)

    I accidentally lost the data from the first 12 weeks while changing phones. The 'projected' range is set automatically by the app.

  • Jonny_Boy_HS
    1 points Dec 07,2019, 2:04pm

    This is really fascinating! It would be even more interesting to demonstrate the mom’s weight variance pre/immediately post birth to baby weight compared to the decile weight gain/baby weight to determine any correlations.
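    The correlation suggested above could be measured with Pearson's coefficient once paired values are available for several pregnancies; a single pregnancy yields only one pair. This is a minimal sketch; the function name and the variable names in the usage note are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

    Usage would look like `pearson(maternal_change_kg, birth_weight_kg)` with one entry per pregnancy. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.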

  • attack_bronson
    1 points Dec 07,2019, 7:42pm

    What’ll really have a chance to get you in hot water is projecting the weight loss after the delivery.

● ● ●

[OC] Google Maps Timeline of when I went around nearly the entirety of Aruba in an ATV

Data Is Beautiful

COMMENTS

  • SkylarWeston
    1 points Dec 01,2019, 12:08pm

    Source: Location of my phone throughout my trip on an ATV

    Tool: Google Maps Timeline feature

● ● ●

[OC] Racing bars and total number of posts on /r/dataisbeautiful

Data Is Beautiful

COMMENTS

  • askLubich
    1 points Nov 29,2019, 11:05am

    Source: The modlog of /r/dataisbeautiful, pulled with Python's PRAW module. The posts were filtered (keeping everything hosted on YouTube and v.redd.it) and manually classified (I probably lost at least 2 IQ points in the process).

    Tools: Matplotlib in Python; labels added in Inkscape.


    We recently announced a moratorium on racing bar charts because they were overused on the subreddit and because of very persistent spam issues from YouTubers. Since most of these posts were removed as shameless self-promotion/spam even before the announcement, some of you rightfully asked to see some numbers. So here you go: at the peak, almost half as many racing bar charts were coming into the subreddit per day as there were valid posts.
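    The two counting steps described above (narrowing the modlog to video-hosted posts for manual review, then comparing racing bars with valid posts per day) can be sketched as follows. This is a minimal sketch, assuming each post has already been reduced to a dict with hypothetical `id`, `day`, `domain` and `removed` fields and that the manual classification produced a set of racing-bar post IDs; fetching the modlog with PRAW is not shown, and the exact domain list is an assumption.

```python
from collections import Counter

# Domains treated as video hosts; an assumption based on the filter described above.
VIDEO_HOSTS = {"youtube.com", "youtu.be", "v.redd.it"}


def is_video(domain):
    """True if the post's domain is one of the video hosts (ignoring a leading 'www.')."""
    if domain.startswith("www."):
        domain = domain[4:]
    return domain in VIDEO_HOSTS


def candidates(posts):
    """Step 1: keep only video-hosted posts, which are then classified by hand."""
    return [p for p in posts if is_video(p["domain"])]


def peak_ratio(posts, racing_bar_ids):
    """Step 2: per day, divide racing-bar submissions by valid (not removed) posts.

    Returns the day with the highest ratio and that ratio. Days without any
    valid post are skipped to avoid dividing by zero.
    """
    racing = Counter(p["day"] for p in posts if p["id"] in racing_bar_ids)
    valid = Counter(p["day"] for p in posts if not p["removed"])
    ratios = {day: racing[day] / valid[day] for day in racing if valid[day]}
    if not ratios:
        raise ValueError("no day has both racing bars and valid posts")
    best_day = max(ratios, key=ratios.get)
    return best_day, ratios[best_day]
```

    Racing bars are counted whether or not they were removed, since most of them were removed as spam; a peak ratio just under 0.5 corresponds to the "almost half as many" figure quoted above.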

  • TrailRunnerYYC
    16 points Nov 29,2019, 3:21pm

    Wait, what? There is a moratorium on racing bar graphs?!?

    That is great news. Those visualizations are misleading and not useful.

  • Vaglame
    5 points Nov 29,2019, 12:30pm

    It would probably have been better to wait for some time after the moratorium's start before presenting the data.

  • datagraph
    1 points Nov 29,2019, 6:15pm

    The ban seems to make sense, but I finally joined Reddit and, I have to admit, I was looking forward to posting a moving bar graph I made!

● ● ●

[OC] Hiking the Harz mountains

Data Is Beautiful

COMMENTS

  • Mugros
    16 points Nov 29,2019, 10:32am

    And the big picture showing my recent travels in Europe.

    Source: Google location history

    Tool: https://github.com/luka1199/geo-heatmap
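    A heatmap like this one boils down to counting location fixes per map cell. This is a minimal sketch, assuming the Google Takeout `Location History.json` layout of that era (a top-level `locations` list whose entries store coordinates as integers `latitudeE7`/`longitudeE7`, i.e. degrees × 10^7); it is not the geo-heatmap tool's own code, and the 0.01° cell size is an arbitrary choice.

```python
import json
import math
from collections import Counter


def load_points(json_text):
    """Convert Takeout-style E7 integer coordinates to (lat, lon) in degrees."""
    data = json.loads(json_text)
    return [
        (loc["latitudeE7"] / 1e7, loc["longitudeE7"] / 1e7)
        for loc in data.get("locations", [])
        if "latitudeE7" in loc and "longitudeE7" in loc
    ]


def grid_counts(points, cell_deg=0.01):
    """Count points per square grid cell of `cell_deg` degrees (the heatmap intensity)."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts
```

    To use it on an export, read the file (e.g. `open("Location History.json", encoding="utf-8").read()`, path assumed) and pass the result of `load_points` to `grid_counts`; the most frequent cells are the hot spots that a heatmap renders brightest.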

  • EmboldenedEagle
    1 points Nov 29,2019, 7:45pm

    I have more questions about your big picture map:

    • Did you hike all of this?
    • What was your gear?
    • What did you do in Jersey?
    • How long did your recent travels take?

    And the most important question:

    • Did you get all the stamps for the Harzer Wandernadel?

● ● ●

[OC] Per Capita Distribution of Wealth in the U.S. by Generation

Data Is Beautiful

COMMENTS

  • caiuscorvus
    4 points Nov 26,2019, 2:03pm

    While this graph shows that, if anything, Millennials are doing better than Boomers, it does not seem to account for inflation, nor for growth in the cost of, for example, education beyond inflation.

    (It may account for inflation, but the Fed data doesn't mention it, so I doubt it.)

    Here it is adjusted to 2018 US dollars.

    I used intercensal age data to determine the population for each generation by year using the birth years below. Then I offset the generations to reflect the middle of the generation. That is, someone born in the middle of the Baby Boomer generation (1955.5) would be lined up with someone born in the middle of the Millennial generation (1989).

    Distributions by generation are defined by birth year as follows: Silent and Earlier=born before 1946, Baby Boomer=born 1946-1964, Gen X=born 1965-1980, and Millennial=born 1981-1996.
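    The per-capita step and the midpoint alignment described above can be sketched as follows. This is a minimal sketch, assuming population estimates arrive as a mapping from single year of age to head count for a given year and approximating birth year as `year - age`; the function names are hypothetical, and if a Census table groups the oldest ages into an open-ended bracket, this sketch would misassign them. Treating each birth-year range as running from January 1 of its first year to December 31 of its last reproduces the midpoints quoted above (1955.5 for Baby Boomers, 1989 for Millennials).

```python
# Birth-year ranges as defined above; None marks an open-ended start.
GENERATIONS = {
    "Silent and Earlier": (None, 1945),
    "Baby Boomer": (1946, 1964),
    "Gen X": (1965, 1980),
    "Millennial": (1981, 1996),
}


def generation_of(birth_year):
    """Return the generation name for a birth year, or None if outside all ranges."""
    for name, (start, end) in GENERATIONS.items():
        if (start is None or birth_year >= start) and birth_year <= end:
            return name
    return None


def midpoint(start, end):
    """Middle of a birth-year range running from Jan 1 of `start` to Dec 31 of `end`."""
    return (start + end + 1) / 2


def population_by_generation(year, pop_by_age):
    """Sum single-year-of-age counts into generations (birth year ~ year - age)."""
    totals = {name: 0 for name in GENERATIONS}
    for age, count in pop_by_age.items():
        name = generation_of(year - age)
        if name is not None:
            totals[name] += count
    return totals


def per_capita(wealth_by_gen, population):
    """Divide each generation's aggregate wealth by its head count."""
    return {name: wealth_by_gen[name] / population[name]
            for name in wealth_by_gen if population.get(name)}


if __name__ == "__main__":
    # Years to shift the Millennial series so its midpoint lines up with the Boomers'.
    offset = midpoint(1981, 1996) - midpoint(1946, 1964)
    print(offset)  # prints 33.5
```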

    Issues with analysis:

    Population data is not smooth: there are annual adjustments to the data, seen most readily in the stepped graph for the Silent Generation.

    Inflation should be factored in.

    Other information:

    Chart created in LibreOffice Calc (because I am too lazy to install R and remind myself how to use it).

    Household Wealth by Generation from

    https://www.federalreserve.gov/releases/z1/dataviz/dfa/distribute/table/#quarter:119;series:Net%20worth;demographic:generation;population:all;units:levels;range:1989.3,2019.2

    Intercensal Data from US Census Bureau

    https://www.census.gov/data/tables/time-series/demo/popest/intercensal-national.html

    https://www.census.gov/data/tables/time-series/demo/popest/intercensal-2000-2010-national.html

    PEPSYASEXN - Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States: April 1, 2010 to July 1, 2018

    EDIT - Added inflation data and chart (in comment above)

    https://fred.stlouisfed.org/graph/?g=pBDH
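    Converting nominal values to 2018 dollars means scaling each year's figure by the ratio of the 2018 price index to that year's index: real = nominal × index(2018) / index(year). This is a minimal sketch; which index the linked FRED graph uses is not stated here, so the caller supplies the index values, and the function name is hypothetical.

```python
def to_constant_dollars(nominal_by_year, index_by_year, base_year=2018):
    """Rescale nominal values to base-year dollars using a price index.

    real(year) = nominal(year) * index(base_year) / index(year)
    """
    base = index_by_year[base_year]
    return {
        year: value * base / index_by_year[year]
        for year, value in nominal_by_year.items()
    }
```

    Because per-capita wealth is a ratio of dollars to people, it can be deflated either before or after dividing by population; the result is the same.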

  • Burner_Acount
    3 points Nov 26,2019, 3:04pm

    Correct me if my interpretation is wrong, but it looks to me as if each generation is about on par with the ones before, with the Boomers doing significantly better than the Silent Generation past 50. I have to think that is partly due to better retirement planning and the rise of the 401(k), instead of pensions, as a tool to facilitate it, since I don't think pensions would show up as household wealth.

    Gen X seems to be on track to do at least as well.

  • arachnidtree
    2 points Nov 26,2019, 3:22pm

    How old is this? Are Boomers only 65? Did people (Silent Gen) really get so much wealthier from age 73 to age 80? Today, elderly people spend a huge amount on health care; a typical assisted living facility costs about $5k per month.

    (Is the x-axis age? I'm just guessing, but it would be nice to have some labels.)

  • TrailRunnerYYC
    1 points Nov 26,2019, 3:42pm

    Where to begin:

    • What are the units of the x-axis?

    • Why not align data for the same life year for different cohorts?

    • Are the age ranges for each cohort the same size? Are they arbitrary?

    This is a horrible choice of visualization for this data.

● ● ●

[OC] D&D dice rolls for attributes (strength, etc) with various dice counts

Data Is Beautiful

COMMENTS

  • Essence1337
    2 points Nov 25,2019, 10:53am

    What are the theoretical results supposed to show? Theoretically you should have a bell curve (if you ran with a larger sample size). None of those lines are bell curves; some are close, but some are very far off.

    Did you pick the max combination of dice that didn't add up to more than 18, the sum of all the dice, the sum of the top n dice, or just arbitrarily cut it off at 18? (I hadn't paid attention to the subtitle in the visualization.)

  • t3hd0n
    1 points Nov 25,2019, 10:29am

    Tools: AnyDice, LibreOffice, Google Sheets

    I used AnyDice to calculate the probability of the rolls and to simulate the random roll datasets. I used the third generated dataset for the graph data. I used a combination of LibreOffice and Google Sheets to generate the graphs.

    For context, XdY notation in D&D means "roll X Y-sided dice". In this case, various numbers of six-sided dice (the standard cube) are rolled, and in pools larger than 3 dice (4d6, etc.) the highest 3 dice are used.

    I'm trying to create an alternative to standard ability score generation that fits the flavor of the diceless generation method of Pathfinder RPG 2E, so I wanted to see the results of different-size pools of dice.

    data collection page
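    The exact ("theoretical") distribution of keeping the highest three of N six-sided dice can be computed by enumerating every possible roll, which is feasible for small pools. This is a minimal sketch, not AnyDice's own method; the function names are hypothetical. Enumeration also bears on the bell-curve question above: with three dice the sum is symmetric around 10.5, but keeping the highest three of four or more dice skews the distribution toward high totals.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def keep_highest(n_dice, keep=3, sides=6):
    """Exact distribution of the sum of the `keep` highest of `n_dice` dice."""
    counts = Counter()
    # Every ordered roll is equally likely, so counting them gives exact probabilities.
    for roll in product(range(1, sides + 1), repeat=n_dice):
        counts[sum(sorted(roll, reverse=True)[:keep])] += 1
    total = sides ** n_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def mean(dist):
    """Expected value of a {total: probability} distribution."""
    return sum(total * p for total, p in dist.items())


if __name__ == "__main__":
    for n in range(3, 7):  # 3d6 ... 6d6, keeping the highest three
        dist = keep_highest(n)
        print(f"{n}d6 keep 3: mean {float(mean(dist)):.2f}, P(18) {float(dist[18]):.4f}")
```

    Enumeration grows as 6^N, so for much larger pools a dynamic-programming convolution would be the better design; for the 3d6 to 6d6 range discussed here, brute force stays well under 50,000 rolls.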
