Data Is Beautiful

[OC] Very basic groupings of letters of the alphabet

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 20, 2019, 3:59pm

    Thank you for your Original Content, /u/bdean42!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • bdean42
    1 point Jan 20, 2019, 2:39pm

    Created with SankeyMATIC using this data:

    Alphabet [5] Vowels
    Alphabet [21] Consonants
    Vowels [4] Starts with a vowel sound
    Vowels [1] Starts with a consonant sound
    Consonants [7] Starts with a vowel sound
    Consonants [14] Starts with a consonant sound
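
    The data above follows SankeyMATIC's `Source [amount] Target` flow syntax. As a minimal sketch (not part of the original post), the lines can be parsed and sanity-checked so that each intermediate node's inflow matches its outflow:

```python
import re
from collections import defaultdict

FLOW = re.compile(r"^(.*?) \[(\d+)\] (.*)$")

def parse_flows(text):
    """Parse SankeyMATIC-style 'Source [amount] Target' lines into triples."""
    flows = []
    for line in text.strip().splitlines():
        m = FLOW.match(line.strip())
        if m:
            src, amount, dst = m.groups()
            flows.append((src, int(amount), dst))
    return flows

def node_balance(flows):
    """Inflow minus outflow per node; intermediate nodes should net to zero."""
    balance = defaultdict(int)
    for src, amount, dst in flows:
        balance[src] -= amount
        balance[dst] += amount
    return dict(balance)

data = """\
Alphabet [5] Vowels
Alphabet [21] Consonants
Vowels [4] Starts with a vowel sound
Vowels [1] Starts with a consonant sound
Consonants [7] Starts with a vowel sound
Consonants [14] Starts with a consonant sound"""

balance = node_balance(parse_flows(data))
print(balance["Vowels"], balance["Consonants"])  # 0 0 -> the flows are conserved
```

    The zero balances are exactly why the chart "peels off" strands: the 5 vowels split 4/1 downstream, and the 21 consonants split 7/14.
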
  • GooseNZ
    1 point Jan 20, 2019, 4:18pm

    I don’t understand this type of chart. In this one, for example, why does one of the vowels peel off and end up down in “starts with consonant”? And the same with consonants and “starts with vowel”. How am I meant to read these?

● ● ●

Every single trip I've taken since birth [OC]

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 20, 2019, 3:13pm

    Thank you for your Original Content, /u/jamcowl!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • jamcowl
    5 points Jan 20, 2019, 10:16am

    High-resolution static image of the finished plot.

    Tool: I wrote a Python script that used Matplotlib Basemap for the plotting. Anyone can use my code to create their own visualisations: all you need is the code in my GitHub repo and a list of location data, and you can run this yourself.

    Source: I got my location history from Google by following these instructions and downloading a KML file, but any file with a list of (lat,lon) pairs will work. There are instructions in the GitHub repo for what inputs are required. Google's location history only goes back to when I got my first smartphone, so I manually added some coordinates from a few childhood trips, which brought the dataset's coverage all the way back to my birth.
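
    As a stdlib-only sketch of the first step (not the author's actual pipeline, and assuming the standard KML 2.2 namespace), the (lat, lon) pairs can be pulled out of such a KML file like this:

```python
import xml.etree.ElementTree as ET

KML_NS = "{http://www.opengis.net/kml/2.2}"

def extract_latlon(kml_text):
    """Return a list of (lat, lon) pairs from a KML document.

    KML stores coordinates as 'lon,lat[,alt]' tuples, so the order is
    swapped on the way out.
    """
    root = ET.fromstring(kml_text)
    pairs = []
    for coords in root.iter(KML_NS + "coordinates"):
        for token in coords.text.split():
            lon, lat = token.split(",")[:2]
            pairs.append((float(lat), float(lon)))
    return pairs

sample = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><Point>
    <coordinates>-0.1276,51.5072,0</coordinates>
  </Point></Placemark>
</kml>"""

print(extract_latlon(sample))  # [(51.5072, -0.1276)]
```
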

  • drdrero
    2 points Jan 20, 2019, 10:34am

    It's cool that you have been around the world and all.
    I just don't get people who travel to sick locations but not the close ones. There are so many beautiful places in Europe alone to discover.

● ● ●

[OC] Word cloud of all populated place names in Great Britain

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 18, 2019, 7:53am

    Thank you for your Original Content, /u/Dr_Heron!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • lampjambiscuit
    3 points Jan 18, 2019, 9:15am

    I'd love to see a similar thing done with town/village prefixes and suffixes such as ton, chester, porth, port, mouth, aber, stock, ford, etc. Even more interesting would be the geographic distribution. Obviously you'd see the huge differences between parts of Scotland, Wales and Cornwall, but I'd be interested in the distribution within England. I'd imagine there being a clear difference between north and south.
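
    A rough sketch of what such a suffix tally could look like (the suffix list and matching rule here are illustrative, not from the post; the longest suffix wins, so e.g. "Manchester" counts as "chester" rather than "ton"):

```python
from collections import Counter

SUFFIXES = ("ton", "chester", "porth", "port", "mouth", "ford", "by", "ham")

def suffix_counts(names):
    """Count which known suffix (longest match first) ends each place name."""
    ordered = sorted(SUFFIXES, key=len, reverse=True)
    counts = Counter()
    for name in names:
        key = name.lower().split()[-1]  # match on the last word of the name
        for s in ordered:
            if key.endswith(s):
                counts[s] += 1
                break
    return counts

print(suffix_counts(["Brighton", "Plymouth", "Oxford", "Manchester", "Southport"]))
```

    Mapping each counted name to its coordinates would then give the geographic distribution.
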

  • Dr_Heron
    1 point Jan 18, 2019, 5:20am

    I find it interesting that East and West are far more common than North or South, despite the country stretching far more along the North/South axis than the East/West one.

    Also interesting that Little, Lesser, Lower etc. are more common than Greater, Upper etc. I wonder if that says anything about the British psyche!

    Data is from the ONS, and the cloud was generated using WordArt.com

  • spacecraftily
    1 point Jan 18, 2019, 8:37am

    Is the "St" generally "street" or "saint" in these cases?

● ● ●

I modeled my heart rate recovery from running over various temperature ranges [OC]

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 18, 2019, 8:33am

    Thank you for your Original Content, /u/antirabbit!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • antirabbit
    1 point Jan 18, 2019, 8:28am

    Background

    Since fall of 2017, I've been recording data from my runs using a Garmin Forerunner 230, a chest heart rate strap, and a small temperature monitor that I place on my foot.

    With all of the data, I figured it would be interesting to try extracting some insights from it. One that I thought would be interesting is modeling heart rate recovery.

    Data

    Each data point consists of me stopping my stopwatch, and then starting it again during the same run. Most of the time I am either standing or walking slowly. I removed data points less than half a mile into runs because the heart rate monitor tends to give erratic values, especially in colder weather.

    Model

    I used a nonlinear regression model, as the model is not linear in its parameters: it cannot be written as a linear sum of the estimated parameters times other values.

    A few assumptions I made in the nonlinear model:

    1. My heart rate will decrease over time unless it is already very low to begin with.

    2. If I were walking for a very long time, then my heart rate would be at my "walking heart rate".

    3. My heart rate will naturally be higher at higher temperatures, as thermal regulation is one of the functions of blood circulation.

    The formula I modeled the data with is

    HR(t) = HR_stop + (HR_walk(T) - HR_stop) * (1 - 2^(-RATE(T)*t)) + error

    where t is time in seconds and T is temperature in degrees Celsius.

    The values I got for this formula were

    • HR_walk(T) = 75.9 + 0.882*T, where T is temperature in degrees Celsius

    • RATE(T) = 0.0263 + 0.000277*T, where T is temperature in degrees Celsius

    • error = normally distributed with variance 145 (standard deviation about 12 bpm)

    The interpretation of the model is that 1/RATE is the "half-life" of my heart rate recovery (roughly 38 seconds at 0 C), and my heart rate while walking is 75.9 bpm, plus an extra 0.882 bpm per degree Celsius outside.
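
    Plugging the fitted values back into the formula gives the expected recovery curve directly (this is just a restatement of the equations above, not the author's Stan code):

```python
def hr_walk(T):
    """Asymptotic walking heart rate (bpm) at temperature T (deg C)."""
    return 75.9 + 0.882 * T

def rate(T):
    """Recovery rate; 1/rate(T) is the half-life in seconds."""
    return 0.0263 + 0.000277 * T

def hr(t, T, hr_stop):
    """Expected heart rate t seconds after stopping, starting from hr_stop bpm."""
    return hr_stop + (hr_walk(T) - hr_stop) * (1 - 2 ** (-rate(T) * t))

# At 0 deg C the half-life is 1/0.0263 ~ 38 s: starting from 160 bpm, after
# ~38 s the gap down to the 75.9 bpm walking rate has roughly halved.
print(round(hr(38, 0.0, 160), 1))  # 118.0, about halfway from 160 to 75.9
```
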

    Reading the graph

    Since I am trying to describe 4 variables (initial heart rate, final heart rate, temperature, and time), I split up the temperature by the different rectangular facets, and used color to describe the initial heart rate.

    The lines on each graph indicate what the model looks like for that temperature and initial heart rate (you can tell the exact value by where it lies at t=0). The temperature for those lines is the middle point of each facet (so -7.5 C for the first, -2.5 C for the second, 2.5 C for the third, etc.). The dots are actual data points. I've cut off ones over 6 minutes, as they make it harder to read the graph.
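
    The facet midpoints follow from fixed 5-degree temperature bins; as a small sketch (the bin width is inferred from the -7.5/-2.5/2.5 sequence above):

```python
import math

def facet_midpoint(T, width=5.0):
    """Midpoint of the fixed-width temperature bin containing T."""
    return math.floor(T / width) * width + width / 2

print([facet_midpoint(T) for T in (-7, -1, 3)])  # [-7.5, -2.5, 2.5]
```
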

    Software

    I converted my FIT files to CSV using the fitparse package in Python, specifically with a command-line script I wrote to streamline the process: https://github.com/mcandocia/fit_processing

    I used R to filter and reshape the data, primarily using dplyr. Visualization was done with ggplot2, as well as the cetcolor package for getting a visually distinguishable color gradient.

    The modeling itself was done using rstan/Stan, which is written in C++ and uses Markov chain Monte Carlo. Attempts using other R functions/packages failed or underperformed.

    The code for this can be found here: https://github.com/mcandocia/heart_rate_modeling

    More Information

    The article I wrote for this data: https://maxcandocia.com/article/2019/Jan/09/modeling-heart-rate-nonlinear/

  • just_some_guy65
    1 point Jan 18, 2019, 12:46pm

    Ever since Garmin watches started displaying 2-minute heart rate recovery I have been recording mine; annoyingly, you have to remember it, as it doesn't appear in Connect or anywhere useful. I had noticed that it is lower at higher temperatures, but that is about it. I thought I was the only person in the world interested in this subject, so it's great to see what you have done, and I have followed your interesting links.

● ● ●

Eras of dominance in top flight English football (soccer) (Description in comments) [OC]

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 17, 2019, 8:11am

    Thank you for your Original Content, /u/shlam16!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • shlam16
    27 points Jan 17, 2019, 7:12am

    Source: Wiki

    Tools: Excel and Paint


    There's an awful lot of information to be gleaned from this plot, so bear with me while I describe some of the important stuff:

    • An "Era of Dominance" is defined as a team scoring 7 "points" or more in a 7-year period. Winning the league earns 2 points, runners-up earn 1 point, and 3rd-4th each earn 0.5 points. In theory it's possible for a team to earn this status with nothing but runner-up finishes (this never happened), but in reality it always required at least 2 wins in a rolling period.

    • The x-axis is in years. For the purpose of this plot, this represents the calendar year the season ended. The y-axis is in "score", with a maximum of 14 points which would mean winning every available title in the rolling period.

    • When two or more teams share an era of dominance then the team with the higher "score" is shown on top. For example: 2008 with Man U > Arsenal > Chelsea.

    • The bars under the team logos are provided to show the length of the era, as it can get a little confusing in the top plot.
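
    The scoring rule above can be sketched as a rolling sum (the point values and 7-year window come from the bullets; the season data here is made up for illustration):

```python
POINTS = {1: 2.0, 2: 1.0, 3: 0.5, 4: 0.5}  # league finish -> points

def rolling_scores(finishes, window=7):
    """Rolling 'era' score per season.

    finishes is one league position per season; anything outside the
    top 4 (e.g. None or 9) scores nothing. A team qualifies for an era
    of dominance when the windowed score reaches 7.
    """
    pts = [POINTS.get(f, 0.0) for f in finishes]
    return [sum(pts[max(0, i - window + 1): i + 1]) for i in range(len(pts))]

# Four titles, a runner-up spot and a 3rd place across seven seasons:
print(rolling_scores([1, 2, 1, 3, 1, 5, 1])[-1])  # 9.5 -> clears the 7-point bar
```
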


    Some fun stats:

    • Liverpool had the longest uncontested era of dominance: 14 years, between 1976 and 1990. Nearest was Arsenal, with 10 years in the 1930s.

    • Manchester United had the longest continuous era of dominance: 26 years, between 1990 and 2016. Nearest was Liverpool, with 23 years just prior to Man U.

    • A difference between the Liverpool and Manchester reigns was that Liverpool spent 22 of their 23 seasons as the premier team. Manchester spent only 20 of their 26, being usurped by Liverpool, Arsenal, and Chelsea throughout.

    • Only one 7 year period in the history of English football saw three teams establishing dominance. The period ending in 2008.

    • No football was played during either of the World Wars.


    Lastly, for fun, here is the all-time table for top flight English Football.


    Edit: I made a couple of errors. Here is a slightly updated version of the main post; I initially forgot to extend Chelsea's and MC's reigns back to their beginnings. I don't have time to add all the bells and whistles of the main post, so this can suffice.

    Also somehow forgot Chelsea from the table above, so that link has been changed to remedy the situation.

● ● ●

Staleness of the top 100 Reddit posts 16.01.2019 [OC]

Data Is Beautiful

COMMENTS

● ● ●

Timeline of colour popularity of cars sold in the Netherlands, from 1900 with predictions to 2020 (source in comments).

Data Is Beautiful

COMMENTS

  • Mrdontknowy
    2 points Jan 14, 2019, 4:08pm

● ● ●

Trump Addresses The Nation Word Cloud [OC]

Data Is Beautiful

COMMENTS

● ● ●

[OC] Stations with direct rail connections to London (left) Vs Manchester (right).

Data Is Beautiful

COMMENTS

  • OC-Bot
    1 point Jan 09, 2019, 8:07pm

    Thank you for your Original Content, /u/kwn2!
    Here is some important information about this post:

    Not satisfied with this visual? Think you can do better? Remix this visual with the data in the citation, or read the !Sidebar summon below.


    OC-Bot v2.1.0 | Fork with my code | How I Work

  • Pete_J
    5 points Jan 09, 2019, 9:45pm

    Fascinating visual. Looks a bit like fireworks. Personally, I would love to have the # of connections appearing on the visuals for comparison. It's hard to just compare lines. Can't wait to see the complete set. Maybe you could run this for other national hubs (France, Madrid, Copenhagen, NYC, etc.) and compare them alongside London to gather understanding of the usage of rail as public transport in those cities.

  • kwn2
    3 points Jan 09, 2019, 6:05pm

    In answer to this tweet, with early attempts and more data about what I was doing in the replies. Data scraped from timetabling websites using Python and Selenium, plotted in kepler.gl. Scripts and data used are here.

  • kwn2
    3 points Jan 09, 2019, 6:45pm

    Less polished images of Birmingham and Liverpool, as well. Screenshots rather than proper images, as kepler.gl's export function is broken on Chrome and I've only got my phone with me.

    My scripts are currently running to get the data for Leeds, Sheffield, Edinburgh, Glasgow and Southampton as well.

    Edit: replaced images as something went screwy.

● ● ●

Ages of members of congress at the beginning of each new congress (1789-2019) [OC]

Data Is Beautiful

COMMENTS
