By the end of this assignment, you should have:
As we’ve discussed in class, trees are a fundamental data structure used to model all sorts of hierarchical data. Often, a tree represents a hierarchical categorization of a base set of data, where the leaves represent the data values themselves, and internal nodes represent groupings of this data. Files on a computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source code) are grouped into folders, which are then grouped into larger folders, etc. Moreover, usually the data values are not all of equal “weight” or “importance.” For example, in our file system one of the most important properties of a regular file is its size; certainly not all files have the same size!
Ais a visualisation technique to show a tree’s structure according to the weights (or sizes) of its data values, using nested rectangles to show subtrees, scaled to different dimensions to reflect the proportional sizes of each piece of data. Treemaps are used in many contexts. Among the more popular uses are news headlines, various kinds of financial information, and computer disk usage. Here’s an example of a treemap visualisation of files on a computer, from Wikipedia:
Some free programs that use treemaps to visualise the size of files on your computer:
For this assignment, you will write an interactive treemap visualisation tool that you can use to visualise multiple types of hierarchical data. We will make the following simplifications for this assignment:
This assignment requires thelibrary, which you installed as part of . We recommend working through the Additional Exercise of that lab to familiarize yourself with pygame.
Please download the following starter code files and save these into yourfolder.
Sample testing files:
Your work for this assignment consists of writing two main components:
Please read through the following sections carefully to understand what is required for this program.
In order to use a treemap to visualise data, we first need the actual data to be stored in a tree. For this program, we have provided anclass to define the common interface that our treemap visualiser will expect for any sort of data that we want to visualise.
This class is similar to the basicclass that you have already worked with, with a few differences. The two most conceptually important ones are discussed here, but there are others you’ll explore when you dig into the code.
First, the constructor is considered private, and should be overridden by each subclass. This is because these trees will be initialized through custom procedures that depend on the data we want to represent. (So a tree representing files on your computer will be created in a different way than a tree representing country population data from the World Bank. You’ll do both on this assignment!)
Second, the class has ainstance attribute that is used by the treemap algorithm to visualise the tree. is defined recursively as follows:
This is a little abstract. Let’s now discuss one concrete type of hierarchical data that you will be responsible for modelling on this assignment.
Consider a folder on your computer (e.g., thefolder you have in your folder). It contains a mixture of other folders (“subfolders”) and some files; these subfolders themselves contain other folders, and even more files. This is a natural hierarchical structure, and can be represented by a tree as follows:
The familiarattribute will always store the name of the folder or file (not its path).
Theattribute of a leaf is simply how much space (in bytes) the file takes up on the computer. The of an internal node – remember, representing a folder – corresponds to the size (in bytes) of all files contained in the folder, including its subfolders. (Computer folders are actually a little bit bigger than the sum of their contents’ sizes, but we’ll use this simplified version for this assignment to make it easier to work with.)
In code, you will represent this particular data using theclass, which is a subclass of .
The next component of our program is the treemap algorithm itself. This is an operation that takes a tree and a 2-D area to fill, and returns a list of rectangles to render, based on the tree structure andattributes.
For both the 2-D area and the output rectangles, we’ll use the pygame representation of a rectangle, which is a tuple of four integers, where (x, y) represents the upper-left corner of the rectangle. Note that in pygame, the y-axis points down, and so the lower-right corner of the rectangle has coordinates (x + width, y + height).
There are many variations of the treemap algorithm, and we’ve chosen one for this assignment. For simplicity we’ll use “size” to refer to theattribute in the algorithm below.
Input: an instance of some kind of, and a pygame rectangle (the display area to fill).
Output: a list of pygame rectangles and each one’s colour, where each rectangle corresponds to a leaf, and has area proportional to the leaf’s.
If the tree has size 0, return an empty list.
If the tree is just a single leaf with positive size, return a list containing just the single rectangle that covers the whole display area, as well as theof that leaf.
Otherwise, if the display area’s width is greater than its height: divide the display area into vertical rectangles, one smaller rectangle per non-zero-sized subtree, in proportion to the sizes of the subtrees.
Example: suppose the input rectangle is (0, 0, 200, 100), and the input tree has three subtrees, with sizes 10, 25, and 15.
Note that the three rectangles’ edges overlap, which is expected.
You can assume that every non-zero-sized leaf will have a large enough size that its computed rectangle has a width and height >= 1.
If the height is greater than or equal to the width, then make horizontal rectangles instead of vertical ones, and do the analogous operations as in step 3.
Recurse on each of the subtrees, passing in the corresponding smaller rectangle from step 3 or 4.
Note about rounding: because you’re calculating proportions here, the exact values will often be floats instead of integers. For every intermediate boundary, always round down to the nearest integer. In other words, if you calculate that a smaller rectangle should be (0, 0, 10.9, 100), round the width down to (0, 0, 10, 100), and make sure the following rectangle starts at (10, 0, …).
However, the final (rightmost or bottommost) edge of the last smaller rectangle should always be equal to the outer edge of the input rectangle. This means that the last subtree might be a bit bigger than its true proportion of the total size, but it won’t be a noticeable difference in our application.
You will implement this algorithm in themethod in .
The code in treemap_visualiser.py runs the treemap algorithm, and then uses pygame directly to render a graphical display to the user.
The pygame window is divided into two parts:
Every rectangle corresponds to one leaf (that has a nonzero) in the tree. If the whole tree has zero (so no rectangles are returned by treemap), the entire screen should appear black.
In addition to displaying the treemap, the pygame graphical display responds to a few different events:
The following two events allow the user to actually mutate the tree, which shows how to turn a static visualisation into a tool for simulating changes on a dataset, and seeing what would happen. Once the user performs one of these events, the tree is no longer in sync with the original data set, and instead represents hypothetical sizes based on the user’s actions.
All changes described below should only change the tree object; so if a rectangle is deleted or its size changes, DO NOT try to change the actual file on your computer!
If the user right-clicks on a rectangle, the corresponding leaf is deleted from the tree. You must update theattributes of this leaf’s ancestors, and the visualisation must update to show the new sizes.
If the user presses the Up Arrow or Down Arrow key when a rectangle is selected, the selected leaf’sincreases or decreases by 1% of its current value, respectively. This affects the of its ancestors as well, and the visualisation must update to show the new sizes.
In the starter code, we have provided both an incomplete version of, and a skeleton of the class that could be used to subclass it and model how much space your files take up on your computer.
You should do two things here:
To complete the second part, you will need to read the Pythonto learn how to get data about your computer’s files in a Python program. We recommend using the following functions:
The fileis a small sample program that works with the module. Feel free to derive your own implementation from this.
You may not use thefunction - it does a lot of the recursive work for you, which defeats the purpose of this part!
Warning: don’t use string concatenation () to try to join paths together. Because different operating systems use different separators for file paths, you run the risk of using a separator that is invalid on the Teaching Lab machines on which we’ll test your code. So instead, make sure to use to build larger paths from smaller ones.
You should add the subtrees in yourin the order they are produced from .
After this step is done, you should be able to instantiate aby calling the constructor on a path, and then inspect the tree’s contents in the debugger. Write doctests and pytests to test the attributes of your objects; having these tests now will save you a lot of time and energy in later tasks.
Your next task should be to implement themethod in , which is the actual method responsible for computing the treemap representation of a tree. Your implementation must match the treemap algorithm described above.
Once again, your approach should be recursive; make sure you understand the role of theparameter before writing any code.
We strongly recommend implementing the base case before starting the recursive step, to give yourself a check that you understand what the method is supposed to do.
For the recursive step, a good stepping stone to think about is when the tree has height 2, and all leaves have the samevalues. In this case, the original rectangle should be partitioned into equal-sized rectangles.
Warning: make sure to follow the exact rounding procedure in the algorithm description above. If you deviate from this, you might get numerical errors that will lead to incorrect rectangle bounds.
The basic treemap algorithm does not require pygame at all: you can check your work simply by instantiating aobject, calling on it, and checking the list of rectangles returned. This is one way we’ll be testing your code. Write your own doctests and pytests for . This is the best way to check edge cases (e.g., making sure you’re rounding properly).
Now that you are able to both create concrete instances ofand compute a treemap, you are ready to enjoy the fruit of your labour by showing a pretty visualisation of the treemap.
Open, and complete the function. You may need to review Lab 7 to practice rendering rectangles using pygame.
Make sure you use the provided constants to show both the treemap and the text display at the bottom of the screen.
Note: you might notice thatis organized into a set of functions, rather than a class. Please do not change this design in any way. We have deliberately not used a class here to illustrate the idea that not all functions must be grouped together in a class. Part of this course is getting used to thinking about the different ways of designing programs, after all!
In the main block of beautiful rendering of your chosen folder’s files in the pygame window!, uncomment the call to , and put in a path. Try running the program: if all goes well, you should see a