• Yet More Chart Geekery

    Dr. Drang doesn’t like my charts. Hmmph. But let’s keep an open mind and find out just how wrong he is:¹

    • First, the choice of tick spacing and positioning makes no sense. If you’re making a political point, you should use four-year spacing aligned with presidential administrations. If you’re not going to use administrations, just label the decades and use tick marks between them. Five-year spacing on the twos and sevens is something clearly decided by software; it’s not how humans think.
    • Second, there should be vertical gridlines. If precision in reading the data is important enough to add a horizontal grid, it’s important enough to add a vertical grid as well. I’ve been seeing a lot of graphs with half-assed grids lately, so I assume this is the default in some graphing software.
    • Third—and I admit this is a personal peeve—there’s no point in tilting the x axis tick labels. It’s just an affectation. The years are tilted 30–35° from the horizontal, which means they take up over 80% of the space that untilted labels would. The savings isn’t worth it, and it draws attention away from the data.
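    For what it's worth, the 80% figure checks out as simple trigonometry. Here is a minimal sketch, assuming the horizontal footprint of a tilted label is just its length times the cosine of the tilt angle. That ignores the height of the text, and the function name is made up for illustration:

```python
import math

def tilted_width_fraction(angle_degrees):
    """Fraction of an untilted label's width that a tilted label occupies.

    Assumes the label is a thin line of text, so only cos(angle) matters.
    """
    return math.cos(math.radians(angle_degrees))

for angle in (30, 35):
    print(f"{angle} degrees: {tilted_width_fraction(angle):.0%} of the untilted width")
```

    Under that assumption a 30° tilt keeps about 87% of the width and a 35° tilt about 82%, so the savings really are under 20%.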

    Here’s the chart replotted using Dr. D’s preferences:

    And here are my responses to his three points:

    • I wasn’t making an explicitly political point in this chart, but yes, sometimes I use odd intervals. However, this is very much not a software default. I am constrained by the software to include the starting year, and what I normally try to do is pick intervals that allow me to show the final year as an endpoint. In this case, using 5-year intervals takes me from 1972 to 2017, which makes it absolutely clear that the chart ends at 2017.² Now, sometimes this means choosing odd intervals. Sometimes it means messing a bit with the start of the chart. And sometimes I just can’t do it because the starting point is important and the span from start to finish is a prime number. I hate when that happens.
    • Vertical gridlines, eh? Maybe. Visually, I’m not a fan of vertical gridlines, but they do help readers who actually want to look at the data carefully.
    • I use tilted x-axis labels when straight labels won’t fit. The replotted chart above, for example, is hard to read with the numbers so close together. I could use a small font, of course, but I prefer not to since lots of people read this stuff on smartphones. If you watch carefully, you’ll see that I regularly use straight labels when I can. Usually, though, I prefer to show lots of year tickmarks (for the same reason Dr. D likes vertical gridlines), and that means tilting them to make them readable.
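    The interval hunt in the first response is really a divisor search: an interval lands exactly on the final year only if it divides the span evenly. A minimal sketch, with a hypothetical helper name and a 1970 start year invented purely to show the prime-span case:

```python
def exact_intervals(start_year, end_year):
    """Tick intervals (in years) that start at start_year and land exactly on end_year."""
    span = end_year - start_year
    return [step for step in range(1, span + 1) if span % step == 0]

print(exact_intervals(1972, 2017))  # span of 45 years
print(exact_intervals(1970, 2017))  # span of 47 years, a prime
```

    A 45-year span offers 3, 5, 9, and 15 as workable choices. A 47-year span leaves only 1 and 47, which is the case where nothing works.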

    The rest of his post is about dual y-axes, and I’ll just say that I find it unconvincing. He basically suggests taking the chart above and making it into two charts, and then putting one right above the other. But that’s two charts, which is precisely what I’m trying to avoid.

    So what is Dr. D’s real problem with dual y-axes? It’s the same as Kieran Healy’s: “The problem with this kind of plot is the freedom it gives the chartmaker to fiddle with the scales of the axes to make the items being plotted look more or less correlated.” It’s true that an explicitly dishonest chartmaker can take liberties with this stuff, but a dishonest chartmaker can do that with anything. Let’s worry instead about honest chartmakers. My response is simple: yes, you can stretch and shift the axes to make a correlation look better, but there’s nothing wrong with that. It just means you’re plotting a regression of the type y = ax + b, and that’s fine. Every regression involves constants. If you can make it work visually, you can make the algebra work too.
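    To make the y = ax + b point concrete: lining up a right-hand axis against a left-hand one amounts to picking the constants a and b. A minimal sketch, with made-up axis ranges and a hypothetical function name:

```python
def axis_transform(left_range, right_range):
    """Return (a, b) so that y_left = a * y_right + b maps the right axis onto the left."""
    left_min, left_max = left_range
    right_min, right_max = right_range
    a = (left_max - left_min) / (right_max - right_min)
    b = left_min - a * right_min
    return a, b

# Illustrative ranges only: left axis runs 0 to 10, right axis runs 50 to 100.
a, b = axis_transform((0, 10), (50, 100))
print(a, b)
print(a * 75 + b)  # where a right-axis value of 75 sits on the left axis
```

    Stretching the right axis changes a and shifting it changes b, so every visual adjustment corresponds to a choice of constants in the linear relationship.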

    In any case, I think that charts like this are never meant to be anything more than suggestive. If you want to show a real correlation then you have to do the work to show a real correlation and then justify your belief that it means something. If you do, then a chart like this can be a nice visual illustration, but that’s all.

    ¹This is a joke.

    ²Is this worth it? I think so, because in a chart for a lay audience you really need to make everything explicit or else people will have questions. Did you include the most recent data? Is it adjusted for inflation? Etc. All of this needs to be as clear as possible in the chart itself, not just in the text.

  • Now Is the Time to Be a Deficit Hawk

    Ross Douthat decided to devote his entire year-end “mistakes” column to a single thing this year:

    In the spirit of the longer view, I want to use this confessional column to reach back to the early Obama years, and the arguments I made then that assumed the urgency of deficit reduction, the pressing need for honest liberals to champion major tax increases and for honest conservatives to go all-in for major entitlement reform.

    ….But now I think this reasonable view was wrong. Not completely, in the sense that many of the deficit-reducing policies I supported — means-testing entitlement programs, eliminating tax breaks for the wealthy and upper middle class — I still support, because I think the money involved is presently misspent. But I was wrong in the priority that I gave the deficit relative to other issues, wrong to discern a looming “fiscal precipice,” wrong in some of the criticism I leveled at both George W. Bush and Barack Obama for failing to care enough about balancing the nation’s books.

    Paul Krugman approves, but I’m going to be more grinchy about this. The lesson to learn from the past decade isn’t that deficits are OK. The lesson is that broad fiscal policy is a tool, like anything else, and ideally it should be used to keep the economy on an even keel by running deficits during recessions and surpluses during good times. I have created my own “Kevin Rule” for the ideal deficit, which I am not going to share with you because it’s not to be taken seriously. This is for illustrative purposes only. But I think it provides at least the right sense of how we’ve been doing:

    Roughly speaking, the gray bars show how far wrong we’ve been. During the Great Recession, we needed more deficits. We undershot. Starting around 2011 we did pretty well for a few years. But then we started overshooting on deficits. Right now we should probably be running surpluses of 2-3 percent of GDP.

    Obviously you can argue about how high federal deficits and surpluses should be. In fact, that’s the whole point of this post: to get people talking about just what kind of targets we should have and what they should be based on. Inflation rates? Potential GDP? Bond markets? Or do you think the whole countercyclical theory of deficits is as antiquated as the gold standard, and monetary policy is all that matters? Douthat nods toward this in his column, but it deserves much more explicit discussion.

    For conservatives, the hardest part of all this is understanding what this means in real life: the only practical way of controlling deficits is via taxes. Entitlement reform won’t do it because that’s permanent and long-term. Nor can permanent discretionary spending cuts do the job. You’ll have to argue for those things on their own terms. After all, the whole point of this exercise is that sometimes we need more spending and sometimes we need less. It would probably be a good idea to increase the size of spending stabilizers (i.e., things like unemployment compensation, which automatically go up during a recession), but there’s a limit to how much you can do on that score.

    So that leaves taxes. We can still argue over the best long-term tax structure, but theological arguments about tax rates would have to end. Everyone would have to agree on temporary tax surcharges during good times as well as temporary tax rebates during recessions. This is really the only practical way to keep deficits in the ballpark of where we want them to be.

    I know perfectly well that this change of heart will not happen anytime in the near future. But if you can’t dream on New Year’s Eve, when can you?

  • Binge Drinking and Chart Geekery

    I was slightly under the weather today and had nothing better to do than noodle around on the computer, so let’s do some more chart geekery. This morning a reader sent me a link to this chart in the New York Times about binge drinking:

    That’s one confusing chart! I stared at it for a while, trying to figure out how binge drinking could go down 2 percent among men and up 13 percent among women, resulting in a net change for all people of -4 percent. After about a minute I finally realized that the top bar was only for people age 18-29. Basically, this is such a mishmash of different ages and genders that it’s hard to compare anything. Why not just do this?

    You could pretty easily add other age categories if you want, or a bar for all ages. Or a chart showing both 2002 and 2015 rather than just the change. It’s not like it would take up any more room than the other chart. The data is easily available online, so why not just present it all?

    I dunno. But my reader had a different complaint. Take the bar for women over 50. The number who report binge drinking in the previous 30 days has gone up from 17 percent to 31 percent. What’s the best way to represent that? It’s a difference of 14 percentage points, but an increase of 82 percent. I think the latter is more informative, but it’s always confusing when you compare percentages. It’s easy to see that an increase from, say, $17 to $31 is 82 percent, but less intuitive that an increase from 17% to 31% is also 82 percent. But it is.
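    The two ways of describing that change are easy to compute side by side. A small sketch using the 17 percent and 31 percent figures above (the function names are just for illustration):

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct

def percent_change(old_pct, new_pct):
    """Relative change from old_pct to new_pct, expressed in percent."""
    return (new_pct - old_pct) / old_pct * 100

old, new = 17, 31
print(f"{percentage_point_change(old, new)} percentage points")
print(f"{percent_change(old, new):.0f} percent")
```

    The same formula gives 82 percent whether the inputs are dollars or percentages, which is the whole point.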

    POSTSCRIPT: Alternatively, you could lard up your chart with the kitchen sink:

    On the upside, this one shows the absolute percentages of binge drinking reported by every age/gender combination, as well as the growth rate in binge drinking since 2002. On the downside, it’s pretty hard to parse for a casual reader.

    It also has way different numbers than the Times chart. I don’t know why that is. Maybe I bollixed up the data. But it seems pretty straightforward. Here’s the crosstab I used for the 2015 data:

    If 46.84 percent of males reported zero days of binge drinking, then 53.16 percent reported binge drinking at least once. But the Times chart says 33 percent. I don’t know what’s going on.

  • A Drunken Trump Aide Sparked the FBI’s Trump-Russia Investigation

    Here is Donald Trump a few days ago:

    This is Exhibit A in the conservative agit-prop campaign to discredit the Trump-Russia investigation: It was all kicked off by the Steele dossier, which was just a Hillary-funded hit job that the Trump-haters in the FBI used as an excuse to go after him.

    But here’s the thing: Steele shared his dossier with a Rome-based FBI agent in August 2016. In October he briefed a larger group of FBI agents in Washington. But the FBI had quietly begun its investigation three months earlier, in July. Obviously the dossier is not what kicked off the FBI investigation.

    So what did? Today the New York Times tells us:

    During a night of heavy drinking at an upscale London bar in May 2016, George Papadopoulos, a young foreign policy adviser to the Trump campaign, made a startling revelation to Australia’s top diplomat in Britain: Russia had political dirt on Hillary Clinton. About three weeks earlier, Mr. Papadopoulos had been told that Moscow had thousands of emails that would embarrass Mrs. Clinton, apparently stolen in an effort to try to damage her campaign….Two months later, when leaked Democratic emails began appearing online, Australian officials passed the information about Mr. Papadopoulos to their American counterparts.

    ….It is unclear whether Mr. Downer was fishing for that information that night in May 2016….It is also not clear why, after getting the information in May, the Australian government waited two months to pass it to the F.B.I. In a statement, the Australian Embassy in Washington declined to provide details about the meeting or confirm that it occurred.

    Ah yes, George Papadopoulos, one of the charter members of Trump’s foreign policy team formed in March 2016. Back then he was an “excellent guy,” but after he pleaded guilty to a charge of lying to the FBI earlier this year he was immediately derided by Trumpies as a mere “coffee boy” that nobody took seriously.

    Well, who knows about that? But Trump did pick him, and apparently he did get crocked in London and tell the Australian ambassador that the Russians had thousands of pages of compromising emails about Hillary Clinton. In July, sure enough, WikiLeaks released thousands of emails hacked from the DNC server. Apparently this lit a fire under the Australians, who passed along Papadopoulos’s drunken intel to the FBI, and that’s when the FBI began investigating Trump. They were shocked—as anyone would be—that apparently the Trump campaign had advance knowledge of Russian dirty tricks aimed at the Clinton campaign.

    Ten months later Trump fired FBI director James Comey for keeping the investigation alive, and the rest is history.

  • Wait! I Have a Better Dual Y-Axis Chart.

    I blew it! In my post earlier this morning about charts with dual y-axes I picked some miscellaneous data as an illustration. Then I saw a tweet about an article written in October that lays out the relationship between trust in government and murder rates:

    The murder rate since World War II has tracked almost perfectly, as criminologist Gary LaFree has observed, with the proportion of Americans who say they “trust the government in Washington to do what is right” most of the time and who believe that most public officials are honest.

    How about that. But is it true? Here’s a lovely chart with dual y-axes to provide a quick visual check. Note that I’m plotting murder vs. mistrust in government, since that’s supposed to be the actual driving factor:

    It actually fits pretty well until the late ’90s, when mistrust started rising again but the murder rate kept going down. (The 9/11 attacks caused a spike downward and improved the fit, but that obviously shouldn’t be taken seriously. The divergence really began around 1998 or so.)

    I have no comment on this, since I haven’t read LaFree’s book and really couldn’t judge it anyway. If anything, I might guess that an increase in murder rates is a factor in driving up mistrust in government, but who knows? As with the lead-crime hypothesis, you need a lot more than a single national correlation to make a case for causality.

    However, it does show the usefulness of dual y-axes in a real world test. If you put these charts side-by-side, it would be hard to see if they match up. If you charted them on a single axis, the murder rate would look like a flat line down around zero. But with a dual axis you can pretty easily get a quick-and-dirty sense of whether there’s anything there. And if you then do the serious analysis to convince yourself that the correlation is real, the chart is a great visual illustration of your thesis.

  • Pick Your Favorite Shutter Speed!

    A couple of days ago I threatened to post a gallery of pictures of the fountain in Trafalgar Square taken at different shutter speeds. That way you can see how they differ and decide which one you like best. Doesn’t that sound like fun? Here you go.

    Just as a note, it’s a little tricky to compare the shortest and longest shutter speeds with the middle two because my camera doesn’t have the range to properly expose them. I adjusted them in Photoshop to look similar to the others, but a more expensive camera would have made them look modestly better.

    Shutter speed: 1/320th of a second

    Shutter speed: 1/4 of a second

    Shutter speed: 1 second

    Shutter speed: 10 seconds

  • Today’s Morning Waker-Upper: The Great Dual Y-Axis Dispute

    Today let’s discuss one of the great blogging controversies of our time. Having dispensed earlier with the Oxford comma (yes) and how to treat the word data (it’s singular), it’s time to take on the great Dual Y-Axis Dispute.

    I’ll illustrate this with a chart I made up. Suppose I want to show that economic growth leads to high employment. Does this do the job?

    This chart does indeed show both GDP growth and employment, but it’s almost impossible to tell if they’re related in any way. To show them both, the chart has to scale all the way to 100 percent, but when the scale is that large you can barely even see the peaks and valleys, let alone whether they’re related. So instead I can do this:

    By using one y-axis for GDP growth (on the left) and another for the employment rate (on the right), you get a good view of how and when each of them has gone up and down. It’s now clear there’s a relationship, as you’d expect, but it’s also not perfect. Why did the huge Reagan expansion produce only modest employment growth? Conversely, why has the modest Obama/Trump expansion produced huge employment growth? Seeing the data presented this way helps to make things clearer and can spur further questions.

    Now, there’s no question that a dual y-axis can be confusing. It’s just not something we’re used to seeing. I always try to make my dual-y charts easier to read by labeling them in different colors so it’s immediately obvious which line goes with which data series. I also do my best to adjust the axes so that the numbers on both sides line up properly with the gridlines.¹
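    One way to get the numbers on both sides to line up with the gridlines is to give both axes the same number of intervals and then round each axis's step up to a "nice" value. This is a rough sketch of that idea, not a description of what any charting program actually does. The helper names and the data ranges are made up for illustration:

```python
import math

def nice_step(raw_step):
    """Round a step up to 1, 2, 2.5, 5, or 10 times a power of ten."""
    base = 10 ** math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 2.5, 5, 10):
        if multiplier * base >= raw_step:
            return multiplier * base
    return 10 * base

def aligned_ticks(low, high, intervals):
    """Tick values covering [low, high] using exactly `intervals` equal, nice steps."""
    step = nice_step((high - low) / intervals)
    start = math.floor(low / step) * step
    # If snapping the start downward left the top uncovered, widen the step.
    while start + intervals * step < high:
        step = nice_step(step * 1.01)
        start = math.floor(low / step) * step
    return [start + i * step for i in range(intervals + 1)]

# Hypothetical ranges: GDP growth from -4 to 8, employment rate from 56 to 65.
left_ticks = aligned_ticks(-4, 8, 6)
right_ticks = aligned_ticks(56, 65, 6)
print(left_ticks)
print(right_ticks)
```

    Because both axes end up with the same number of ticks, one set of horizontal gridlines serves both. The cost is that one axis often gets padded with empty space, which is part of why this is fiddly in practice.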

    So the question is: Does the clearer presentation of the relationship make up for the added complexity of the chart? And is there a better way to show it? I’d answer definitely yes to the first question, and usually no to the second. Sometimes there is a better way, but not always. Sometimes it’s either a dual y-axis or nothing.

    And, really, what’s the objection? I’ve been a big fan of chart guru Edward Tufte for decades, and his mantra was to simplify as much as possible and to ruthlessly eliminate “chart junk.” This is good advice, but ever since Tufte became popular it’s become advice that many people take too far (as Tufte himself did later in life, I think). Eventually you get to the point where you’re making it harder to read a chart because it’s become so spare that it lacks the visual cues readers expect. You can eliminate gridlines entirely, for example, but that makes it harder on the reader who wants to look at a chart carefully and get a real sense of the data behind it. When you sacrifice that, you can easily end up with a wiggly curve that’s more just a directional symbol (something is going up, or down, or U-shaped) than a true chart.

    So that’s my take on dual y-axis charts. Yes, they add some clutter and complexity. Yes, they can be confusing to a casual reader. You should do your best to address that. But sometimes it really is the simplest, least cluttered way of making a point. When that’s the case, don’t let either personal dislike or the misplaced authority of Edward Tufte stand in the way of using them.

    ¹FWIW, this is harder than it sounds.

  • Donald Trump Is the First President to Lose a Third of His Staff in Year 1

    Unless Donald Trump suddenly decides to fire the entire Oval Office—and you never know, do you?—he’ll end his first year with a turnover rate among senior staff of 34 percent. That’s really high!

    The red bars are from Kathryn Dunn-Tenpas, a political science professor at the University of Pennsylvania who studies presidential transitions. The shaded bars are also from Kathryn Dunn-Tenpas—sort of. I took the numbers from this paper, and then reduced them by a third so they matched up for both Reagan and Clinton, the only two presidents in both datasets. Is this kosher? Of course not. Nobody tell her I did this. But it probably provides a rough historical lens to view this through.

    Anyway, there are no surprises here. Trump has been firing and otherwise losing his senior staff at an astounding clip: about 3-4 times the rate of his predecessors, and double the rate of the previous record holder, Ronald Reagan. I guess he didn’t know how to hire the best people after all.

    It’s also worth noting how Trump has been refilling the swamp. His initial staff was top-heavy with outsiders who were raring to turn Washington DC inside-out. But nearly all of them are gone, replaced by standard-issue Republican swamp dwellers. Gaze into the swamp too long, and it turns out the swamp stares back.

  • Friday Cat Blogging – 29 December 2017

    For our final catblogging post of the year, our local furballs have agreed to give the stage to Tillamook, one of my mother’s cats. I was visiting yesterday because the latest Windows update from Microsoft corrupted her PC so badly it wouldn’t boot. It was a lengthy visit, since I had decided it was time to buy a new computer rather than risk surgery on the old one, which would probably just crash again soon. They’re so cheap, why not? Anyway, the basics are all working now, though I’m sure there will be plenty of fiddly details to attend to over the next few weeks.

    Every time I do something like this I wonder how anyone survives having a PC. I’m pretty PC savvy, but even I had to screw around a fair amount to get all the backups and the email archives and the browser profiles etc. etc. working properly. An ordinary person wouldn’t have had a chance.

    On the bright side, the printer driver apparently installed itself without my even touching it. If only everything else worked so well.