Once again, we we be swig the Lehman Baseball Database ill toes weeks’ assignment.

data mining


Once again, we have swigged the Lehman Baseball Database ill toes weeks’ assignment. We want to create a graph to visualize tile relationship between how many RBIs a player has in a given year and the salary Thai same year, Unfortunately, Lehman provides the relevant data in two separate files. We will need to open the data To create the desired graph. We have provided you with two CSV tiles named Datting.csv and salanes.csv. 

The first, of course, contains the annual batting performance data from last week’s assignment. The Second contains salary curate ‘or all Major Legalize Baseball players dating back to the year 1985. You should download both files ND place them in the same directory as yogi Python Cackle for tries assnrne’i. Carbon! Your program should work wit9 any similarly formatted cave files. Reading in the data First, you will niece to read in tire coati files. In battWg.csv, we were interested ¡rig three columns this week: ‘p4ayerlD’, ‘yearly’ and R81”. Your program’ should ski the header Kithe File and completely ignore any lines where the RBI column does “at certain a digit. You should create an accumulator ddlonary called ‘playeryear2rtiis” that maps tops of the layered string and yearly string to an integer representing the number of RBIs for that Player and year. Just like last week, as you iterate through rout tire fIle. You should update the playeryear2rbes dictionary. 

In salanies.csv, we are also Interested Ki three columns: played’. ‘Yearly, and serially’. You should create a dictionary called playeryear2salary which maps a tape of the played String and yearly siring to an integer representing tile player’s salary for that year. Preparing the data After creating the two dictionaries, we need to join the data together and prepare it for plotting we need to create two lists to hold the x and y values in opt. We will call them Saline’s’ and retire. You should iterate over as of the keys or playeryear2salary aria use tries keys to find the salary and rib data for each player arid each year you should skip any Player, year combinations that are not represented in tire playeryear2rbes dictionary (In other words, the data for that player and year must be in both dictionaries.) When you rave tire salary and rib data for a given player and year. You should append them to the appropriate best. Once ¡lies loop IS completed, you should end up with two lists of Integers that have tile exact same size. Each element in ‘salaries will be en integer containing a player’s salary fora certain year. The corresponding element in moa (the one with the Sane ideal will have the RBIs of that same player and year. Plotting the data Plot tire data using this pilot plot’ function. Make sure to import matplotlebpyplot as pit. On the X axis, plot the ‘salaries. On the y axis. Loot the ‘roes.” Use the format string ‘k.’, to Plot the points as black dots. Title your plot “Salary vs. RBIs n MLB. Label the X axis Salary’ and the Y axis “RB’s’.

Related Questions in data mining category