PythonResource.net

Python Pandas and Spreadsheets #5

Published Nov. 3, 2020, 9:44 p.m.

In the last tutorial we managed to read 66 spreadsheets, find the and extract the row with the NY Yankees data and then compile that data into a new dataframe. We then plotted the batting average over time. This time I want to write our new dataframe to a new xlsx spreadsheet and excercise a few new options that you may want to use when you create spreadsheets.

The first thing that we want to do is to copy / paste our former definition called "write_spreadsheet" and then modify it for our new purposes. We'll call the new definition "write_team_spreadsheet" and we will pass into it a dataframe, a sheetname and a new excel file name. We should adjust the variable name in the line defining 'writer' to use the 'newexcelfilename'. Next we will remove the 'for' loop and we will add a keyword value to our 'to_excel' function. By adding 'index = False', we are telling this function that we don't want pandas to print an 'index' column in our excel file. I believe that most people do not naturally have an 'index' column when they create their own spreadsheets, so I wanted to show you how to exercise that option. The writer.save() an writer.close() lines stay the same so that our new definition should look like the following:

def write_team_spreadsheet(df, sheetname, newexcelfilename):
    writer = pd.ExcelWriter(newexcelfilename+".xlsx", engine = "xlsxwriter")
    
    df.to_excel(writer, sheet_name = str(sheetname), index=False)
    
    writer.save()
    writer.close()

Now, all that is left is to call our new function in the correct place and with the right variables. After the last 'for' loop (the one that builds the 'new_df') we can add:

write_team_spreadsheet(new_df, col_val, f'{col_val}_1954_2019')

Here we are passing in the new dataframe (new_df), the team name (= col_val) as the sheetname and an 'f-string' for the newexcelfilename.

Now when we execute our script it creates 'NYY_1954_2019.xlsx' which we can open and see all of the data including the column of years at the end.

In the next tutorial, we will generalize this process to work with all the teams that have existed from then to now, and we will make sure that we can handle special cases like teams that don't have data for some numbers of years. Thanks for watching and see you in the next tutorial!

Python Pandas and Spreadsheets #6

Python Pandas and Spreadsheets #1
Python Pandas and Spreadsheets #2
Python Pandas and Spreadsheets #3
Python Pandas and Spreadsheets #4
Python Pandas and Spreadsheets #5
(currently viewing)
Python Pandas and Spreadsheets #6