Retirement Simulator #6: Sampling from our custom market distribution

Published Dec. 31, 2021, 9:30 p.m.

Welcome back!  Last time we created an optimized fit to our data.  This time we will take our x and y values and use them to create what we will call a "Sample Distribution".  Think of it like this:  We had about 100 data points.  We started by taking those exact data and putting them into a hat, and drawing numbers randomly out of the hat.  Then we made a model out of the distribution of the 100 numbers.  Now we want to scale up our model so we can put 10,000 model-based numbers in the hat from which to draw.  As a shark-hunting captain who was using a hat instead of a boat might say: "You're gonna need a bigger hat."

We are going to create a definition (a Python function) that takes a list of x and y values and creates a distribution of values from which to draw.  Recall that we made for ourselves a list of x values and then proceeded to use our function and optimized parameters to make a companion list of y values.  For our "numbers in a hat" model, the y value (say 100) represents the number of pieces of paper that go into the hat with a certain x value written on it (say 0.07).  The peak of our Gaussian distribution corresponds to the most probable x value.  Way off the peak (say x = -0.45) the y value might only be 1, meaning that only 1 piece of paper with the number -0.45 goes into our big hat.  One problem:  We have a smooth continuous distribution that contains non-integer numbers.  (i.e. you can't put 50.532 scraps of paper into the hat.)  What do we do about that?

I propose doing 2 things:  First we will normalize our y value data so that the maximum value of the distribution is equal to 100.  This scales our y values up (in this case).  Second we will use Python to re-cast our y-values as integers.  The first part of that (within our definition) would look like the following:

def makesampledistribution(xvals, yvals):
    ymax = max(yvals)
    norm = 100/ymax
    yvals = [norm*y for y in yvals]

We define makesampledistribution, which takes a list of x and y values.  Then we determine the maximum y value in our y list.  Then we make a normalization based on that y-max.  Then we re-cast our y values by multiplying each value by this normalization.  So far so good?
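
To see the normalization step on its own, here is a minimal sketch using made-up y-values.  The list toy_yvals is hypothetical and only for illustration; it is not the fitted market data from the earlier tutorials.

```python
# Toy y-values, invented for illustration (not the fitted market data)
toy_yvals = [0.5, 2.0, 4.0, 1.0]

ymax = max(toy_yvals)                    # largest y-value in the list
norm = 100 / ymax                        # factor that maps the peak to 100
scaled = [norm * y for y in toy_yvals]   # every y-value scaled by the same factor

print(scaled)  # [12.5, 50.0, 100.0, 25.0]
```

The largest value becomes 100 and everything else keeps its proportion to it.  Notice that 12.5 is still not a whole number, which is why the integer cast is still needed afterwards.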

Ok.  Next we need to define an empty list to fill.  This is our new bigger "hat".  We'll just call it x_dist:

def makesampledistribution(xvals, yvals):
    ymax = max(yvals)
    norm = 100/ymax
    yvals = [norm*y for y in yvals]
    x_dist = []

Now, within our definition, we need to:

  • loop through our y-values
  • cast y values as integers
  • fill our list with each x-value a number of times equal to y

def makesampledistribution(xvals, yvals):
    # Scale the y-values so the peak of the distribution equals 100
    ymax = max(yvals)
    norm = 100/ymax
    yvals = [norm*y for y in yvals]
    x_dist = []  # our bigger "hat"
    for i,y in enumerate(yvals):
        inty = int(y)  # int() truncates, e.g. 50.532 becomes 50
        if inty > 0:
            # put this x-value into the hat inty times
            for j in range(inty):
                x_dist.append(xvals[i])
    return x_dist

By using enumerate to loop through our y-values, we already have the index (i) of the x-value that we want.  The if check skips any x-value whose count was truncated to zero.  Python would cope without it, since range(0) simply loops zero times, but the check makes the intent explicit.  At the end, we must return our new x_dist.
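
Here is a small sketch of what the definition returns for a toy input.  The toy_x and toy_y lists are invented for illustration, and the function body is the same as the one built up in this post.  It uses collections.Counter from the standard library to count the slips of paper per x-value.

```python
from collections import Counter

def makesampledistribution(xvals, yvals):
    ymax = max(yvals)
    norm = 100/ymax
    yvals = [norm*y for y in yvals]
    x_dist = []
    for i, y in enumerate(yvals):
        inty = int(y)
        if inty > 0:
            for j in range(inty):
                x_dist.append(xvals[i])
    return x_dist

# Hypothetical toy data: after scaling, the y-values are 12.5, 100.0, 25.0 and 0.5
toy_x = [-0.2, 0.0, 0.2, 0.4]
toy_y = [0.25, 2.0, 0.5, 0.01]

hat = makesampledistribution(toy_x, toy_y)
counts = Counter(hat)   # how many slips of paper carry each x-value

print(counts)
print(len(hat))         # total number of slips in the hat
```

The 12.5 is truncated to 12 slips, and the x-value 0.4 never makes it into the hat because int(0.5) is 0.  If rounding to the nearest integer is preferred, int(round(y)) is one alternative; that would be a design choice, not what this series uses.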

Then outside of our definition we can use it to make the x_distribution:

x_dist = makesampledistribution(x_vals, y_vals)

Recall that x_vals and y_vals were the actual names of the lists we made in a previous tutorial.

Now we can confirm our distribution is as we wanted it to be by plotting it.  (We should make our bins like our x_vals, or else Python will make bins for us.)

plt.hist(x_dist, bins=x_vals)
plt.show()
sys.exit()

If you plotted a nice big Gaussian histogram centered at about 0.09 and having an amplitude of 100, then you did everything correctly!  Next time we will show how we can start sampling from this new fancy distribution that we have labored over.  See you there.
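
If you would rather check the result without a plot, the sketch below counts the entries directly.  It builds a stand-in Gaussian so that it runs on its own: the center of 0.09 comes from the text above, while the width of 0.18, the x grid, and the names toy_x_vals and toy_y_vals are assumptions for illustration, not the fitted values from tutorial #5.

```python
import math
from collections import Counter

def makesampledistribution(xvals, yvals):
    ymax = max(yvals)
    norm = 100/ymax
    yvals = [norm*y for y in yvals]
    x_dist = []
    for i, y in enumerate(yvals):
        inty = int(y)
        if inty > 0:
            for j in range(inty):
                x_dist.append(xvals[i])
    return x_dist

# Assumed stand-in for the fitted curve (sigma and the grid are hypothetical)
mean, sigma = 0.09, 0.18
toy_x_vals = [i/100 for i in range(-60, 80)]
toy_y_vals = [math.exp(-0.5*((x - mean)/sigma)**2) for x in toy_x_vals]

x_dist = makesampledistribution(toy_x_vals, toy_y_vals)

# The most common x-value is the peak; its count is the amplitude
peak_x, peak_count = Counter(x_dist).most_common(1)[0]
print(peak_x, peak_count)   # peak at x = 0.09 with 100 entries
print(len(x_dist))          # total size of the hat
```

With real fitted data, the peak count can occasionally come out as 99 instead of 100, because norm*ymax may land a hair below 100 in floating-point arithmetic before int() truncates it.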

Next: Retirement Simulator #7: Organize multiple plots with subplots! and learn about plt.pause()