Golf Charts — Another Take

Inspired by Dick’s interest in charts, I took a look at how I would have presented the data.

Some overall thoughts. I used the Office 2010 beta for the charts shown below. The results differ little, if at all, from what I would have got with Excel 2007. I also stayed with the default Office theme. And, for what it is worth, I almost never use a non-theme color any more. By staying within a theme, I can change the look of the entire workbook by simply changing the theme.

In creating charts, I tend to use colors within the text on the chart to document the visual elements. So, in the chart for week 9, the high and low scores and the corresponding text are the same color (red and green).

And, of course, it always helps if you understand, at least to some extent, what is being shown. While I don’t have Dick’s depth of knowledge of golf, I do know that lower scores are better than higher scores. So, I tend to use colors associated with good results (green, for example) for lower scores and colors associated with poor results (red, for example) for high scores.

Week 8: As already noted, a smooth curve is misleading in this case. There are many instances where a smooth line is appropriate, but this is not one of them. Further, even a straight line connecting the ranks between two successive weeks is not appropriate. After all, the change in ranking is an abrupt event that happens once a week. Just because someone was ranked #4 one week and #3 the next does not mean s/he was #3.5 halfway through the week. So, if one were to add lines connecting consecutive points, they should create a “step chart.” Also, the key metric here is the status once a week. So, showing the markers is important.
temp-ddoe-1
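For readers who build the step line by hand, here is a minimal Python sketch of the idea: each weekly rank is held flat until the next week and then jumps. The function name `step_points` and the sample ranks are hypothetical, not taken from the golf data.

```python
def step_points(weeks, ranks):
    """Expand (week, rank) pairs into the vertices of a step line.

    The rank is held flat until the next week, then changes vertically.
    """
    xs, ys = [], []
    for i, (week, rank) in enumerate(zip(weeks, ranks)):
        if i > 0:
            # Horizontal run: carry the previous rank forward to this week.
            xs.append(week)
            ys.append(ranks[i - 1])
        # The actual weekly observation (where the marker belongs).
        xs.append(week)
        ys.append(rank)
    return xs, ys


# Hypothetical ranks for one team over four weeks.
xs, ys = step_points([1, 2, 3, 4], [4, 3, 3, 1])
print(list(zip(xs, ys)))
```

Plotting `xs` against `ys` as an XY Scatter series with straight connectors gives the steps; the original weekly points can be plotted as a second, marker-only series.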

Week 9: To show the best and worst rounds for a golfer, I would use a vertical separation to show the scores and highlight the range. Of course, instead of a vertical separation one could also use a horizontal separation, but I picked the former. A good chart native to Excel for this kind of visualization is the Stock ‘High-Low-Close’ chart, and I made the line thicker. Of course, it is not all that difficult to create one from scratch.
temp-ddoe-6

Week 10: A stacked column chart worked for this particular set of data since all the actual performances were worse than the handicaps. But, what if someone did better than their handicap? How does one show a negative column starting from the top of the lower column? Suppose Jack Hynes shot not a 21 but a 9. Then, we would want the blue column to go to 15 and the red column to start at 15 and drop to 9. But, there is no good way to show that. Instead, I prefer showing the handicap with a single point, which is after all what it is, and then drawing a line up or down to show the actual result. And, I have used error bars for this kind of work for a long time. I modified Dick’s data to pretend that Jack did shoot 6 below his handicap. The chart below is an XY Scatter chart with two series, both of which represent the handicap. The first series has positive error bars formatted red. The 2nd series has negative error bars formatted green.
temp-ddoe-3
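The custom error bar amounts can be worked out with simple arithmetic. The sketch below is only an illustration in Python; the function name `error_bar_amounts` is hypothetical, and the numbers are Jack's handicap of 15 with the real (21) and pretend (9) scores mentioned above.

```python
def error_bar_amounts(handicap, actual):
    """Return (plus, minus) error bar lengths measured from the handicap.

    plus  -> strokes above the handicap (the red, positive bar)
    minus -> strokes below the handicap (the green, negative bar)
    """
    diff = actual - handicap
    plus = max(diff, 0)
    minus = max(-diff, 0)
    return plus, minus


print(error_bar_amounts(15, 21))  # shot worse than the handicap
print(error_bar_amounts(15, 9))   # shot better than the handicap
```

For any golfer, at most one of the two amounts is non-zero, so only one of the two bars shows.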

Week 11: As with Week 8, the important metric changes only at specific points along the x axis (Hole in this case). Using connecting lines without markers is somewhat misleading. After all, Miller did not have a half-bogey while walking to the first tee. In this case I decided to forgo even step connector lines and use just markers. I also thought it did not really help to show the performance relative to the par for the entire round. After all, a golfer doesn’t start the day with a hole zero score of par for the round (36 in this case). So, I chose to show the cumulative performance relative to par, represented by zero. The y-axis title documents the significance of above-zero and below-zero scores. If we wanted, we could add the cumulative player score as a data label. Finally, the default square marker looks much larger than the default diamond marker. So, I reduced the size of the former by two units.
temp-ddoe-51
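The cumulative score relative to par is a running total of (strokes minus par) per hole. Here is a small Python sketch of that calculation; the hole pars (which add up to 36) and the strokes are made-up values, not Miller's actual card.

```python
from itertools import accumulate


def cumulative_vs_par(strokes, pars):
    """Running total of (strokes - par), one value per hole played."""
    return list(accumulate(s - p for s, p in zip(strokes, pars)))


# Hypothetical nine holes; the pars sum to 36.
pars = [4, 4, 3, 5, 4, 4, 3, 5, 4]
strokes = [5, 4, 3, 6, 4, 3, 4, 5, 5]
print(cumulative_vs_par(strokes, pars))
```

Values above zero are over par and values below zero are under par, which matches the zero baseline on the chart.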

Week 12: I have no idea what the 2 charts represent and, lacking a golf context for them, I left this week’s chart alone.

Week 13: Dick expressed some frustration at the ‘Upset Saturday’ chart and I can sympathize. As Jon noted, using dark colors and losing the gridlines would have helped. But, here’s a more important point. In most cases showing a lot of data results in nothing but confusion. But, there are exceptions. One such instance is when I show one of the metrics for a management simulation exercise I conduct on a regular basis. One of the results of this simulation is the market price of a product in multiple markets (6 - 10) over several periods (8 - 12). The resulting chart looks very confusing, and when the audience first sees it, it invariably evokes an audible response. However, when I explain what the chart represents, it makes sense to the participants. What it does represent is this: The prices in the markets start off all over the place. They also fluctuate each period. However, the cumulative effect of 10 periods of decision making and learning is that they are slowly converging to the theoretical optimum of about 50, even though this is not known to the participants! Some markets learn very smoothly (see the bright blue line with the star marker), others take a stumble and then pick up smoothly (the pink squares), while others fluctuate a bit but are eventually closing in on the optimum (the orange circles).
temp-ddoe-7

Week 14: Here again, I would use a chart with vertical lines to show the separation between the high and the low. To include a measure of the average, I chose the median rather than the mean.
temp-ddoe-81
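The choice matters because the mean is pulled by a single unusual round while the median is not. A quick Python illustration with made-up scores (not the league data):

```python
from statistics import mean, median

# Hypothetical nine-hole scores with one unusually bad round.
scores = [41, 42, 43, 44, 60]

print(mean(scores))    # pulled upward by the 60
print(median(scores))  # the middle score, unaffected by the 60
```

In Excel the equivalent worksheet functions are AVERAGE and MEDIAN.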

Week 15: I would use steps to indicate the change in ranking rather than straight line connectors for the reason already mentioned above.
temp-ddoe-9

In the final chart, I have no idea what Dick wanted to show. But, here’s how I would create a bar chart on different scales such that it is easy to align. Start with a stacked bar chart. The first series is the actual value of the average scores. Then, we add a dummy series so that the total of the average plus the dummy series is a constant. I tried different numbers for aesthetic appeal before finally settling on 10. Then, I scaled the total scores down so that they don’t overwhelm the average scores. After some trial and error, I picked 9 as the largest possible score. So, all the scores are scaled by actual-score / max-score * 9. I plotted this as the 3rd stacked series. Finally, it is not possible to put data labels for a stacked bar chart on the ‘outside end.’ So, I added a dummy series all with a value of 1 as the fourth stacked series.

The two dummy series were formatted to be ‘invisible.’ I also added the average values as the data labels for the 2nd series (formatted to the ‘inside base’) and the total scores as the data labels for the 4th series. I also formatted the horizontal (value) axis so that the vertical (category) axis crosses at a value of 9.
temp-ddoe-a
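The helper columns for the four stacked series are simple arithmetic. The Python sketch below follows the description above (a constant of 10, a scaled maximum of 9, and a label series of 1); the function name `stacked_series` and the sample averages and totals are hypothetical.

```python
def stacked_series(averages, totals, pad_to=10, scaled_max=9):
    """Build the 2nd, 3rd and 4th stacked series from the raw values.

    The 1st series is the averages themselves.
    """
    max_total = max(totals)
    # 2nd series: invisible filler so average + filler is always pad_to.
    filler = [pad_to - a for a in averages]
    # 3rd series: totals scaled so the largest one becomes scaled_max.
    scaled = [t / max_total * scaled_max for t in totals]
    # 4th series: invisible stub of 1 that only carries the total label.
    label_stub = [1] * len(totals)
    return filler, scaled, label_stub


# Hypothetical averages and totals for two golfers.
filler, scaled, label_stub = stacked_series([4.5, 5.0], [100, 200])
print(filler, scaled, label_stub)
```

Because every bar's first two series add up to the same constant, the scaled totals all start from the same position and are easy to compare.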

Download the Excel 2010 XLSX workbook

5 Comments

  1. into says:

    The link appears to be broken…

    /T

  2. Jon Peltier says:

    Tushar -

    Good enhancements to the charts. For week 10, instead of hassling with error bars, which particularly sucks in 2007, plot the two series (handicap and actual) as line chart series, hide the connecting lines and markers, and use up-down bars to show the difference.

  3. I love the vertical range stuff, like in Week 9. I have some other comments but I want to preface them by saying that I think everyone makes better charts than me, so they are genuine inquiries although they may come off as smart ass.

    Week 8: I get the logic of steps, but I don’t get the feeling that I want to elicit when I look at your chart. I don’t want people to use the chart to determine what M’s ranking was on week 6 - a table would be better for that. I want the reader to feel something - specifically to feel that K’s downward momentum and M’s upward momentum make it inevitable that M will be above K shortly. So while a team may not be 3.5 at mid-week, it’s the trend from 4 to 3 that I’m interested in. Hmmm, trend. Maybe that’s a hint that I should have hidden the series and just shown the trendline.

    For week 12: Every week two teams play each other, for a total of four golfers. Generally, someone shoots well, someone shoots badly, and the other two are average. Occasionally, everyone shoots poorly or well. I was trying to show the most extreme weeks when everyone in the foursome shot well.

    In the last chart: Since every team has two golfers, I wanted to show how each individual contributed to the team effort. The stacked bar with the dummy series is great. I particularly like the labels in the middle - much more readable.

    I think all charts are sales jobs. They are abstractions from the raw data in order to make a point. If you give me a chart and ask me how to improve it, my first question has to be “what do you want it to say”, doesn’t it? That doesn’t seem to be the message I read on the data visualization blogs. They seem to be more interested in having the graphic match the data exactly. Why not just show the data then? I seem to be missing a key piece of theory.

  4. Tushar Mehta says:

    Dick:

    A graphical representation of data is indeed a way of abstracting from the data and presenting information in a different format. How much of a sales job it is depends on the intent of the presenter and it is no more or less a sales job than using numbers.

    Any time one summarizes (or otherwise condenses or abstracts) raw data there is the *potential* for biasing the analysis. But, it is almost always necessary. Raw data can be, and usually is, overwhelming and to make sense of it it becomes necessary to “distill” that data. Obviously, that means some information is lost but it is up to the analyst to retain all information pertinent to the current analysis. If not, the results will be biased.

    Look at something as simple as the mean and the median. Both are measures of average but tell very different stories. The mean (calculated in Excel through the AVERAGE function) is much more sensitive to outliers. So, a few star athletes on a team who are paid large salaries will make the mean much larger than the median, which is the value that divides the population in half. The same applies to real estate prices and I suspect the job offers in a graduating class. Or, just look at any political debate in the U.S. (and I suspect elsewhere). The left and the right crunch the same set of numbers in a way that supports their respective ideological viewpoint.

    But keeping politics out of this discussion…

    The same benefits and limitations of statistical aggregation of data also apply to visual representation of data. It can be a very powerful tool but it can also be misleading. Here are three examples where one can visually see stuff that is impossible to see in the raw data and almost impossible to see even in aggregated and summarized data.

    On ‘Dashboard example – conditional colors of shapes’ (http://www.tushar-mehta.com/excel/charts/0301-dashboard-conditional%20shape%20colors.htm) I show how to conditionally format map elements. The example shows population trends in the U.S. across time. I find it hard to imagine how a table would reveal the overall trend — without knowing the trend a priori.

    In your post on Ego Charts (http://www.dailydoseofexcel.com/archives/2009/02/12/ego-charts/) you pointed to the post at http://charts.jorgecamoes.com/focus-context-bar-chart-skyscraper/. The author of that post critiqued certain instances where a chart showed ‘all’ the data. But, that kind of chart can reveal important trends that would not be obvious from a chart with reduced data or a table. With the exception of the 1st and the last states, there is no abrupt change in poverty rates when the states are ranked by the poverty rate. But, imagine that Oregon and Florida had poverty rates comparable to Ohio (or, for that matter, South Carolina). Then, we would see an abrupt shift from the SC rate to the OH rate. That might (should?) be the motivation for further research as to why.

    Something that brought together several of the issues mentioned above happened on a consulting project I did for a power utility. Its customer service center had been getting hammered with criticism of long wait times when customers called in with problems. It did not even understand the cause for the criticism since the average wait time was under 30 seconds! After struggling with the megabytes of data it gave me — and discovering nothing substantial — I plotted the raw data based on the utility’s definition of response categories. And, lo and behold. Starting about 6 months earlier, the wait time for all but two of their categories improved dramatically. But the remaining categories jumped to several minutes, with spikes that were even worse.

    Initially, no one in the utility would accept the results of the graph. After some back-and-forth the VP sponsoring my project called a break. On resumption of the meeting, he announced that he had found a possible cause. It turned out that the center tried to improve average performance by reconfiguring the number of employees who answered different types of calls. Some of the employees working on two categories that had relatively few calls were reassigned to the other categories. This change was monitored only by the average wait time and that improved a lot.

    As it turned out, the categories that saw the waits get dramatically worse were those that dealt with complex customer issues. Since each call itself took a long time, the compounding effect on the wait times was dramatic.

    Could this performance shift be detected with something other than a chart? Possibly. But, without knowing what we were looking for, what numbers in what tables would we use? And, in any case, I doubt numbers in a table could possibly make the performance impact as stark or dramatic as a series of squiggles that suddenly shifted up (or down) on a chart.

  5. Ross says:

    Hi,
    Pingo must be down,
    Here is my take on the charts!
    http://www.blog.methodsinexcel.co.uk/2009/08/18/ddoe-golf-charts-agian-agian/

    I don’t think I’m a charting God… - yet!!! ;-))))
