Retirement Simulator #4: Make a US market rates distribution

Published Dec. 31, 2021, 9:26 p.m.

Welcome to the 4th tutorial!  The bones are built!  Now it's time to add muscle and organs (we already had an appendix?).  To do that, we want to use real-world market data from which to get a "distribution" of rates.

When I googled it I found these links:

https://www.macrotrends.net/2526/sp-500-historical-annual-returns
https://www.macrotrends.net/1319/dow-jones-100-year-historical-chart

You can use any data you like.  From these websites, I copied the data and pasted it into a text file.  (I formatted it to get rid of everything except the year and the rate, where the rate is the annual return in percent.)  The data for the first file looks like the following:

2021 ,20.15
2020 ,16.26
2019 ,28.88
2018 ,-6.24
2017 ,19.42
2016 ,9.54
2015 ,-0.73
2014 ,11.39
2013 ,29.60
2012 ,13.41
2011 ,0.00
2010 ,12.78
2009 ,23.45
2008 ,-38.49
2007 ,3.53
2006 ,13.62
2005 ,3.00
2004 ,8.99
2003 ,26.38
2002 ,-23.37
2001 ,-13.04
2000 ,-10.14
1999 ,19.53
1998 ,26.67
1997 ,31.01
1996 ,20.26
1995 ,34.11
1994 ,-1.54
1993 ,7.06
1992 ,4.46
1991 ,26.31
1990 ,-6.56
1989 ,27.25
1988 ,12.40
1987 ,2.03
1986 ,14.62
1985 ,26.33
1984 ,1.40
1983 ,17.27
1982 ,14.76
1981 ,-9.73
1980 ,25.77
1979 ,12.31
1978 ,1.06
1977 ,-11.50
1976 ,19.15
1975 ,31.55
1974 ,-29.72
1973 ,-17.37
1972 ,15.63
1971 ,10.79
1970 ,0.10
1969 ,-11.36
1968 ,7.66
1967 ,20.09
1966 ,-13.09
1965 ,9.06
1964 ,12.97
1963 ,18.89
1962 ,-11.81
1961 ,23.13
1960 ,-2.97
1959 ,8.48
1958 ,38.06
1957 ,-14.31
1956 ,2.62
1955 ,26.40
1954 ,45.02
1953 ,-6.62
1952 ,11.78
1951 ,16.46
1950 ,21.78
1949 ,10.26
1948 ,-0.65
1947 ,0.00
1946 ,-11.87
1945 ,30.72
1944 ,13.80
1943 ,19.45
1942 ,12.43
1941 ,-17.86
1940 ,-15.29
1939 ,-5.45
1938 ,25.21
1937 ,-38.59
1936 ,27.92
1935 ,41.37
1934 ,-5.94
1933 ,46.59
1932 ,-15.15
1931 ,-47.07
1930 ,-28.48
1929 ,-11.91
1928 ,37.88

The second file looks like this:

2021 ,  14.13 
2020 ,  7.25 
2019 ,  22.34 
2018 ,  -5.63 
2017 ,  25.08 
2016 ,  13.42 
2015 ,  -2.23 
2014 ,  7.52 
2013 ,  26.50 
2012 ,  7.26 
2011 ,  5.53 
2010 ,  11.02 
2009 ,  18.82 
2008 , -33.84 
2007 ,  6.43 
2006 ,  16.29 
2005 ,  -0.61 
2004 ,  3.15 
2003 ,  25.32 
2002 , -16.76 
2001 ,  -7.10 
2000 ,  -6.17 
1999 ,  25.22 
1998 , 16.10 
1997 , 22.64 
1996 , 26.01 
1995 , 33.45 
1994 , 2.14 
1993 , 13.72 
1992 , 4.17 
1991 , 20.32 
1990 , -4.34 
1989 , 26.96 
1988 , 11.85 
1987 , 2.26 
1986 , 22.58 
1985 , 27.66 
1984 , -3.74 
1983 , 20.27 
1982 ,  19.60 
1981 ,  -9.23 
1980 ,  14.93 
1979 ,  4.19 
1978 ,  -3.15 
1977 ,  -17.27 
1976 ,  17.86 
1975 ,  38.32 
1974 ,  -27.57 
1973 ,  -16.58 
1972 ,  14.58 
1971 ,  6.11 
1970 ,  4.82 
1969 ,  -15.19 
1968 ,  4.27 
1967 ,  15.20 
1966 ,  -18.94 
1965 ,  10.88 
1964 ,  14.57 
1963 ,  17.00 
1962 ,  -10.81 
1961 ,  18.71 
1960 ,  -9.34 
1959 ,  16.40 
1958 ,  33.96 
1957 ,  -12.77 
1956 ,  2.27 
1955 ,  20.77 
1954 ,  43.96 
1953 ,  -3.77 
1952 ,  8.42 
1951 ,  14.37 
1950 ,  17.63 
1949 ,  12.88 
1948 ,  -2.13 
1947 ,  2.23 
1946 ,  -8.14 
1945 ,  26.65 
1944 ,  12.09 
1943 ,  13.81 
1942 ,  7.61 
1941 ,  -15.38 
1940 ,  -12.72 
1939 ,  -2.92 
1938 ,  28.06 
1937 ,  -32.82 
1936 ,  24.82 
1935 ,  38.53 
1934 ,  4.14 
1933 , 66.69 
1932 , -23.07 
1931 , -52.67 
1930 ,  -33.77 
1929 ,  -17.17 
1928 ,  49.48 
1927 ,  27.67 
1926 ,  4.05 
1925 ,  25.37 
1924 , 26.16 
1923 , -2.70 
1922 , 21.50 
1921 , 12.30 
1920 , -32.90 
1919 ,  30.45 
1918 , 10.51 
1917 , -21.71 
1916 , -4.19 
1915 , 81.49 

Whoa.  1915!  What a year to invest in the market.  Anyway, you can copy one or both of these into their own files.  We will read this data into our code.  Truth be told, the year is not a relevant piece of data for our purposes.  If you want to challenge yourself, you can erase the year data and figure out how to read in the rate data yourself.  However, if you follow the video, I will show you how to read them both in.

Save the files as text files in your work folder.  I chose the second one (Dow market returns) and saved it as "dow_market_returns.txt".  Then I created a variable in my code with that file name:

filename = "dow_market_returns.txt"

Then I do my three-line trick for reading in a text file:

f = open(filename, 'r')
lines = f.readlines()
f.close()
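
The same read can also be written with a `with` block, which closes the file automatically even if an error occurs partway through.  This is only a minimal sketch, not the version used in the video: the helper name `read_lines` is made up for illustration, and the demo writes a tiny throwaway file (two rows copied from the Dow data) so the snippet runs on its own.

```python
import os
import tempfile

def read_lines(filename):
    """Return every line of a text file as a list of strings."""
    # The with-block closes the file when the block ends.
    with open(filename, 'r') as f:
        return f.readlines()

# Demo on a small temporary file (hypothetical sample, two rows of the Dow data)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_returns.txt")
with open(demo_path, 'w') as f:
    f.write("2021 ,  14.13 \n2020 ,  7.25 \n")

lines = read_lines(demo_path)
print(lines)
```

In the real script you would call it on your own file, for example read_lines("dow_market_returns.txt").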

Now we have the list "lines".  To get the data we need to loop through the lines and "cast some types".  That would look as follows:

years=[]
rates = []
for line in lines:
    year, rate = line.split(',')
    years.append(int(year))
    rates.append(float(rate)/100)

We established two empty lists (years and rates).  Then we loop through the lines, splitting each line at the comma into the year and the rate.  We then append those to our lists while casting each one into the 'type' that we want (int or float).  The rate in the file is a percentage, so we divide it by 100 to get the fraction that we actually want (14.13 becomes 0.1413).
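
One thing that can trip up this loop is a blank line in the file (for example, an extra newline at the end), because splitting an empty line does not give two pieces.  Here is a hedged sketch of the same parsing wrapped in a function that skips blank lines.  The function name `parse_returns` and the sample rows are assumptions for illustration only.

```python
def parse_returns(lines):
    """Split 'year , percent' lines into a list of years and a list of fractional rates."""
    years = []
    rates = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, e.g. a stray newline at the end of the file
            continue
        year, rate = line.split(',')
        years.append(int(year))          # int() ignores the space after the year
        rates.append(float(rate) / 100)  # percent -> fraction
    return years, rates

# Hypothetical sample: three rows from the Dow file plus one blank line
sample = ["2021 ,  14.13 \n", "2020 ,  7.25 \n", "\n", "2008 , -33.84 \n"]
years, rates = parse_returns(sample)
print(years)
print(rates)
```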

Now, as a first step toward using real data, we will use another function from the random module: choice.  The random.choice function selects a single value (at random) from a list.  We happen to have a list of realistic market rates, so we can simply replace the random.gauss function with this new one, like so:
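
Before dropping it into the simulation, here is a small sketch of what random.choice does on its own.  Every entry in the list is equally likely on each call, and a value can be picked more than once.  The short `rates` list and the seed are assumptions just for this demo.

```python
import random

# A handful of the Dow rates from the file, already divided by 100
rates = [0.1413, 0.0725, 0.2234, -0.0563, 0.2508]

random.seed(4)  # assumption: fixed seed only so the demo repeats; drop it for real runs

# One draw per simulated year; the same rate can come up more than once
draws = [random.choice(rates) for _ in range(10)]
print(draws)
```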

year_check = todays_date.year
while todays_date < retirement_date:
    retirement += add_to_retirement
    retirement *= (1+paycheck_interest_rate)
    todays_date += pay_frequency
    this_year = todays_date.year
    if this_year > year_check:
        # annual_interst_rate  = random.gauss(mean, sigma)
        annual_interst_rate  = random.choice(rates)
        paycheck_interest_rate = get_paycheck_interest_rate(annual_interst_rate, paychecks_per_year)
        print(this_year, annual_interst_rate)
    year_check = todays_date.year
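
If you want to try the loop without the rest of the program, here is a self-contained sketch.  Everything outside the while loop is an assumption: the dates, the paycheck amount, the short `rates` list, and in particular the body of get_paycheck_interest_rate, which I have guessed to be the per-paycheck rate that compounds to the annual rate.  Use your own versions from the earlier tutorials if they differ.

```python
import datetime
import random

def get_paycheck_interest_rate(annual_rate, paychecks_per_year):
    # Assumed definition: the per-paycheck rate that compounds to annual_rate
    return (1 + annual_rate) ** (1 / paychecks_per_year) - 1

rates = [0.1413, 0.0725, 0.2234, -0.0563, 0.2508]  # small sample of the Dow list
random.seed(0)  # only so the demo is repeatable

# Placeholder inputs (assumptions, not the tutorial's values)
todays_date = datetime.date(2022, 1, 1)
retirement_date = datetime.date(2027, 1, 1)
pay_frequency = datetime.timedelta(days=14)
paychecks_per_year = 26
add_to_retirement = 500.0
retirement = 0.0
paycheck_interest_rate = get_paycheck_interest_rate(random.choice(rates), paychecks_per_year)

rates_used = []  # (year, rate) pairs, kept so we can inspect the draws afterwards
year_check = todays_date.year
while todays_date < retirement_date:
    retirement += add_to_retirement
    retirement *= (1 + paycheck_interest_rate)
    todays_date += pay_frequency
    this_year = todays_date.year
    if this_year > year_check:
        # New calendar year: draw a fresh historical rate
        annual_interst_rate = random.choice(rates)
        paycheck_interest_rate = get_paycheck_interest_rate(annual_interst_rate, paychecks_per_year)
        rates_used.append((this_year, annual_interst_rate))
    year_check = todays_date.year

print(rates_used)
print(round(retirement, 2))
```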

Now we are using real data, so perhaps our simulation is a bit more realistic.  However, if this rubs you the wrong way, you are not alone.  We don't REALLY want to use a list of OLD rates.  We want the list of old rates to inform a model from which we can determine NEW rates.  And that is what we will begin to do in the next video.

But before we finish, we can take our test from the 3+ video and change it slightly so that it plots our list of rates.  The list of rates is much smaller, but it should be enough to show something like a Gaussian shape.  With so few data points, the shape is a bit underwhelming compared to a traditional Gaussian.  Also, it may be that a Gaussian only approximates the actual shape.  Whatever the case may be, a Gaussian is as good a guess as any, so next time we are going to fit this data with a Gaussian function and use the 'optimized parameters' of our fit to make our own sample distribution.  See you there!
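
One last sketch for the check on the list of rates mentioned above.  This is not the plot from the video; it is a plain-text histogram using only the standard library, so you can eyeball the shape without any plotting setup.  The 10-point bin width and the shortened `rates` list are assumptions, and in practice you would pass in the full list read from the file.

```python
import statistics
from collections import Counter

# A short sample of the Dow rates (fractions); use the full `rates` list in practice
rates = [0.1413, 0.0725, 0.2234, -0.0563, 0.2508, 0.1342, -0.0223,
         0.0752, 0.2650, 0.0726, 0.0553, 0.1102, 0.1882, -0.3384]

bin_width = 0.10  # assumption: bins that are 10 percentage points wide

# Map each rate to an integer bin index (index * bin_width is the bin's lower edge)
counts = Counter(int(r // bin_width) for r in rates)

for index in sorted(counts):
    low = index * bin_width
    print(f"{low:+.2f} to {low + bin_width:+.2f} | {'#' * counts[index]}")

# Rough starting values for a Gaussian description of the data
print("mean :", round(statistics.mean(rates), 4))
print("sigma:", round(statistics.stdev(rates), 4))
```

The mean and sigma printed at the end are the same two numbers random.gauss takes, which is a handy sanity check before doing a proper fit.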
