There is More to Life than Manufacturing

One of my continuing questions about research in economic growth is why it insists on remaining so focused on manufacturing to the exclusion of the other 70-95% of economic activity in most economies.

I’ll pick on two particular papers here, mainly because they are widely known. The first is Chad Syverson’s “What Determines Productivity?”, a survey piece that reviews the literature on firm-level productivity measurement. The main theme of the survey is that productivity varies widely across firms. Which firms? Syverson cites his own work showing that within disaggregated manufacturing industries, productivity varies by a factor of roughly 2-to-1 between the 90th and 10th percentiles. The rest of the survey contains citation after citation of papers studying manufacturing sector productivity differences.

Hsieh and Klenow, in their paper looking at the aggregate impacts of these kinds of productivity gaps, look at manufacturing plants in India, China, and the U.S. They find that the productivity differences, if eliminated, would raise manufacturing productivity by 40-50% in China and India. What goes unsaid in Hsieh and Klenow is that a 40-50% increase in productivity in manufacturing would be something like a 10% increase in aggregate GDP in India, and a 15% increase in China. Both still impressive numbers, but much smaller than the headline result because the manufacturing sector is *not* the dominant source of value-added for any country.
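
The arithmetic is simple: a productivity gain confined to one sector scales down by that sector’s share of aggregate value-added. Here is a minimal back-of-envelope sketch; the manufacturing shares in it are illustrative assumptions, not numbers taken from Hsieh and Klenow.

```python
# Back-of-envelope: aggregate effect of a within-sector productivity gain,
# holding all other sectors fixed. The value-added shares below are
# illustrative assumptions, not Hsieh and Klenow's figures.

def aggregate_gain(sector_gain: float, sector_share: float) -> float:
    """Approximate aggregate GDP gain from a gain confined to one sector."""
    return sector_gain * sector_share

for share in (0.15, 0.30):       # hypothetical manufacturing shares of value-added
    for gain in (0.40, 0.50):    # the 40-50% manufacturing productivity gains
        print(f"share={share:.0%}, sector gain={gain:.0%} "
              f"-> aggregate gain ~{aggregate_gain(gain, share):.0%}")
```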

Why do we persist in focusing on this particular subset of industries, sectors, and firms? I think one of the main reasons is that our data collection is skewed towards manufacturing, and we end up with a “lamppost” problem. We look for our lost keys underneath the lamppost because that’s where the light is, even though the keys are out in the dark somewhere.

Our system of classifying economic activity is part of the problem. It was designed to track manufacturing originally, and then other sectors were sort of stapled on as an afterthought. To see what I mean, consider the main means of classifying value-added by sector (ISIC codes) and the main means of classifying occupations (ISCO codes).

ISIC stands for International Standard Industrial Classification. It was designed to distinguish one goods-producing industry from another, not to provide any nuance with respect to services. The original ISIC system had 10 industries, and 2 of them were manufacturing. Those 2 manufacturing industries were divided into 20 total sub-industries. *All* of the other economic activity in the economy was assigned a total of 25 sub-categories. So we’ve got “manufacture of wood and cork, except for furniture” and “manufacture of rubber products” under manufacturing in general. But we’ve got “wholesale and retail trade” as a sub-category under commerce.

From ISIC’s perspective, separately tracking the manufacture of wood and cork products (but not furniture, that’s different) was important, but it was sufficient to just lump all wholesale and retail activity in the economy together. Even in 1960, all manufacturing value-added in the U.S. was only slightly larger than all wholesale and retail trade value-added. But the former is subdivided into 20 sub-categories, while the latter is simply a sub-category of its own. Our methods of categorizing value-added are a relic of an economy now 60-70 years old, and even back then the level of detail was unrelated to the relative importance of different sectors.

And no, ISIC has not kept up with the times. Yes, the current ISIC revision 4 now breaks out wholesale and retail trade into its own sub-categories (2-digit) and sub-sub-categories (3-digit). Wholesale and retail trade now has 20 3-digit categories. Retail sale of automotive fuel, for example. Manufacturing has 71 3-digit categories. Manufacture of irradiation, electromedical, and electrotherapeutic equipment, for example.

In the current ISIC version, “Education” is a top-level sector, similar to “Manufacturing”. But while manufacturing still has 24 sub-sectors at the 2-digit level, and 71 at the 3-digit, education has 1 sub-sector at the 2-digit level, and 5 at the 3-digit level. “Human health and social work” is a top-level sector, and it has 3 2-digit sub-sectors, and 9 3-digit sub-sectors. We have “hospital activities” and “medical and dental practice activities” as 2 of the 9, so you can at least separate out your optometrist appointment from your emergency appendectomy.
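
To see the imbalance at a glance, here are the sub-category counts cited above collected in one place (a throwaway summary that uses only the numbers quoted in this post):

```python
# Sub-category counts from ISIC Rev. 4, as cited above.
# Format: top-level sector -> (number of 2-digit divisions, number of 3-digit groups)
isic_counts = {
    "Manufacturing":                (24, 71),
    "Wholesale and retail trade":   (None, 20),   # 2-digit count not cited above
    "Education":                    (1, 5),
    "Human health and social work": (3, 9),
}

for sector, (two_digit, three_digit) in isic_counts.items():
    print(f"{sector:30s} 2-digit: {str(two_digit):5s} 3-digit: {three_digit}")
```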

Think of how ridiculous this is. We are careful to distinguish that your dining room table was produced by a different sub-sector than the one that produced the wooden salad bowl you use on that table. But we do not bother to distinguish my last tooth cleaning from my grandma’s last orthopedic appointment.

The calcification of our view of the sources of economic activity continues if we look at occupation codes. These are from ISCO, and the last revision to the codes was in 2008. ISCO uses a multi-digit system similar to ISIC’s. The one-digit code of 2 means “Professionals”, and below that is the two-digit code of 25, for “Information and communications technology professionals”. That two-digit code has the following lower-level breakdown:

  • 251 Software and applications developers and analysts
    • 2511 Systems analysts
    • 2512 Software developers
    • 2513 Web and multimedia developers
    • 2514 Applications programmers
    • 2519 Software and applications developers and analysts not elsewhere classified
  • 252 Database and network professionals
    • 2521 Database designers and administrators
    • 2522 Systems administrators
    • 2523 Computer network professionals
    • 2529 Database and network professionals not elsewhere classified

These are incredibly high-level designations in the tech world. Imagine that you are building a new web site for your retail business, and you need someone to do the user interface. Do you ask for someone who does “web and multimedia development”, or someone who does “software development”? No. Those are far too general. You’d post an ad for someone who does UI/UX design, with a knowledge of HTML, CSS, and perhaps JavaScript. You might also require them to know Photoshop. And this person is completely different from the person you’d hire to build your iPhone app, who needs to know Xcode at a minimum, and is different from the guy who builds the Android app.

On the other hand, we have the one-digit code of 7 that means “Craft and related trade workers”. Below that is code 71, for “Building and related trades workers, excluding electricians”. That category is broken down further as follows:

  • 711 Building frame and related trades workers
    • 7111 House builders
    • 7112 Bricklayers and related workers
    • 7113 Stonemasons, stone cutters, splitters and carvers
    • 7114 Concrete placers, concrete finishers and related workers
    • 7115 Carpenters and joiners
    • 7119 Building frame and related trades workers not elsewhere classified
  • 712 Building finishers and related trades workers
    • 7121 Roofers
    • 7122 Floor layers and tile setters
    • 7123 Plasterers
    • 7124 Insulation workers
    • 7125 Glaziers
    • 7126 Plumbers and pipe fitters
    • 7127 Air conditioning and refrigeration mechanics
  • 713 Painters, building structure cleaners and related trades workers
    • 7131 Painters and related workers
    • 7132 Spray painters and varnishers
    • 7133 Building structure cleaners

The separate occupations involved in building a house are pretty clearly delineated here: framers, plumbers, painters, etc.. Heck, ISCO makes sure to distinguish “spray painters” from regular old “painters”, and those are all different from people who clean building structures (I’m guessing these people have power washers?).

While all the individual occupations involved in building a house are broken down, all the individual occupations involved in building a successful web site are lumped into one, maybe two occupations? “Software developers” is not the same level of disaggregation as “plumbers”, despite ISCO having them both coded to a 4-digit level.

If you go back to the ISIC codes, you can get an idea of how our conception of economic activity atrophied somewhere around 1960. What follows are some current descriptions of 3-digit sectors from ISIC.

This is for the “Manufacture of Furniture”:

This division includes the manufacture of furniture and related products of any material except stone, concrete and ceramic. The processes used in the manufacture of furniture are standard methods of forming materials and assembling components, including cutting, moulding and laminating. The design of the article, for both aesthetic and functional qualities, is an important aspect of the production process.

Some of the processes used in furniture manufacturing are similar to processes that are used in other segments of manufacturing. For example, cutting and assembly occurs in the production of wood trusses that are classified in division 16 (Manufacture of wood and wood products). However, the multiple processes distinguish wood furniture manufacturing from wood product manufacturing. Similarly, metal furniture manufacturing uses techniques that are also employed in the manufacturing of roll-formed products classified in division 25 (Manufacture of fabricated metal products). The molding process for plastics furniture is similar to the molding of other plastics products. However, the manufacture of plastics furniture tends to be a specialized activity.

Note the detailed differences accounted for in the definition of furniture manufacture. ISIC is careful to establish that making wood furniture is distinct from just processing wood, because of the aesthetic element involved. And yes, the techniques for metal and plastic furniture are similar to other 3-digit industries, but there is something particular about furniture that sets it apart from these.

Now here’s the description of the “Computer Programming, Consultancy, and Related Activities” code:

This division includes the following activities of providing expertise in the field of information technologies: writing, modifying, testing and supporting software; planning and designing computer systems that integrate computer hardware, software and communication technologies; on-site management and operation of clients’ computer systems and/or data processing facilities; and other professional and technical computer-related activities.

On the other hand, anyone who does anything even remotely connected with IT gets lumped into one gigantic category. Write code in Ruby on Rails for web sites? Convert legacy systems at a major corporation from COBOL over to C? Do tech support for a bank? Manage a server farm? Create mobile apps in Xcode? All that shit’s basically the same, right? Computer stuff.

This concentrated focus on manufacturing is problematic because it means we cannot undertake detailed studies similar to Syverson’s or Hsieh and Klenow’s about the sectors that are actually growing rapidly. Is there a lot of productivity dispersion in software? How about in retail, or home health care? These industries account for large and growing shares of economic activity, so productivity losses in them matter more for aggregate outcomes than similar losses in manufacturing.

The classification system also helps sustain the myth that this sector is somehow inherently more valuable than other types of economic activity. It plays into this idea that a country is failing if its manufacturing sector is declining as a share of GDP. But that decline in manufacturing is simply evidence that we have gotten very, very adept at it, and that there is an upper limit on the marginal utility of having more manufactured goods. All that effort that goes into tracking individual types of manufacturing activity would be far better spent tracking more service-sector sub-categories and occupations, because those are actually going to expand in size in the future.

And yes, I just wrote 2000 words about ISIC and ISCO codes. What has happened to me?

All Institutions, All the Time?

Wolfgang Keller and Carol Shiue just released a working paper on “Market Integration as a Mechanism for Growth“. They are looking at growth in Germany during the 19th century, and proxy for growth by using city population growth, on the presumption that people only flood into cities that are booming economically. They examine the explanatory power of both market integration and institutions for city population growth, and hence for economic growth.

To measure market integration, KS use the spread in wheat prices between pairs of cities. The smaller the spread, the more integrated the cities are. Larger price spreads indicate high transportation costs and/or some other barrier to transactions that keeps trade from closing the gap. Why wheat? Because it is widely traded, homogeneous, and they have good data on it.
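
To make the measure concrete, here is a minimal sketch of the statistic this implies: the absolute gap in log wheat prices for every pair of cities in a year, where a smaller gap means a more integrated pair. The cities and prices are invented for illustration; this is not KS’s data or code.

```python
import numpy as np
import pandas as pd
from itertools import combinations

# Hypothetical wheat prices by city and year (illustrative numbers only).
prices = pd.DataFrame({
    "year":  [1820, 1820, 1820, 1850, 1850, 1850],
    "city":  ["Berlin", "Cologne", "Munich"] * 2,
    "price": [100.0, 140.0, 125.0, 100.0, 110.0, 104.0],
})

rows = []
for year, grp in prices.groupby("year"):
    for (c1, p1), (c2, p2) in combinations(zip(grp["city"], grp["price"]), 2):
        # Absolute log price gap: smaller gap = more integrated city pair.
        rows.append({"year": year, "pair": f"{c1}-{c2}",
                     "log_price_gap": abs(np.log(p1) - np.log(p2))})

print(pd.DataFrame(rows))   # the gaps shrink between 1820 and 1850
```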

For institutions, KS use three different measures, all binary indicators: abolition of guilds, equality before the law, and the ability to redeem feudal lands. The very good part about their measures is that they are binary, which conforms to the historical situation. As Napoleon conquered German territories, he imposed some very specific institutional changes in these places. So one can reasonably code a 0/1 variable for whether a specific city had abolished guilds, or had imposed equality before the law (that is, adopted the Napoleonic code), or allowed redemption of feudal lands. There is natural variation across German cities in when (or if) these institutional changes took place, based on Napoleon’s activity. (This empirical set-up is drawn from Acemoglu, Cantoni, and Robinson).

The binary indicators are fine as they are. But KS then do a bad thing, and average these measures. Regular readers of this blog know how I feel about arbitrary indexes of institutions, and averaging creates an arbitrary index. Their main specification averages the first two (guilds and legal equality). This effectively presumes that abolishing guilds and adopting legal equality have precisely the same effect. A city that abolished guilds but did not adopt legal equality has an institutional level exactly equal to one that did not abolish guilds but did adopt legal equality. Why should these be identical in effect? They are clearly not institutional substitutes. They potentially have wildly different effects on economic activity. If you want to use different measures of institutions in this kind of study, then you should incorporate these measures separately in your regressions.
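
In practice this is a one-line change to the specification. A rough sketch with made-up city-level data (the variable names and effect sizes are purely illustrative, not KS’s data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical city-level data, for illustration only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "guilds_abolished": rng.integers(0, 2, n),
    "legal_equality":   rng.integers(0, 2, n),
})
# Suppose the two reforms have very different true effects (0.5 vs. 0.1):
df["growth"] = (0.5 * df["guilds_abolished"] + 0.1 * df["legal_equality"]
                + rng.normal(0, 0.2, n))

# Averaging the indicators imposes identical effects for both reforms:
m_index = smf.ols("growth ~ I((guilds_abolished + legal_equality) / 2)", data=df).fit()

# Entering them separately lets the data decide whether the effects differ:
m_separate = smf.ols("growth ~ guilds_abolished + legal_equality", data=df).fit()

print(m_index.params)      # one blended coefficient
print(m_separate.params)   # recovers roughly 0.5 and 0.1
```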

That gripe aside, what do KS do? First, they realize that if they just regress city population growth on their institutional measure and their measure of price gaps, then this is subject to all sorts of objections regarding endogeneity and omitted variables. So KS come up with instruments. They use a dummy for French rule to instrument for institutions, as only those places conquered by Napoleon necessarily adopted the institutional reforms (this is also the Acemoglu et al strategy). They then use a geographic measure of the slope of terrain surrounding a city as an instrument for market integration. This is because the cost of shipping by rail increases with the slope of the terrain (gravity is a bitch). They make an argument that both French rule and the slope characteristics are exogenous to city population growth, and serve as valid instruments.

They’re using IV, so you could also chuck rocks at the instruments and claim they don’t work. If you’re going to do that, you need to have some plausible story for why the IVs aren’t exogenous. I don’t have a good story like that, so I’m going to take their IV strategy as solid.

What do they find? They find that city population is significantly and negatively related to market integration (price gaps) and insignificantly (but positively) related to institutions. Cities that had smaller price gaps with other cities, and so were more integrated into the wider economy, experienced more rapid city population growth over the 19th century. Cities with better institutions may have had higher city population growth, but the evidence is too noisy to know for sure. For future reference, their 2nd-stage regression has an R-squared of 68%, which includes the impact of city and year fixed effects. The regression also predicts 73% of the actual city growth in the mean city. So they have what I would consider a lot of explanatory power (although a bunch could just be due to fixed effects).

Here is where I start to get confused by the paper. I look at this and think, “Looks like institutions – at least the abolition of guilds and the Napoleonic code – didn’t have a big impact on city growth. Holding those institutions constant, more integrated cities grew faster.” But KS seem determined to find an interpretation of these results that preserves the primacy of institutions as an explanation for growth. They take this result and say it does not tell us about the relative importance of institutions, meaning those two or three very specific institutions of guild abolition, legal equality, and feudal redemption.

They argue that what you should really be doing is not looking at the lack of significance on institutions in this regression, but running some different counterfactuals. So they do two different regressions. They regress city population growth on market integration only, with market integration instrumented by only the geography instruments. This is their “mechanisms” model, and it is intended to capture just the pure effect of market integration. That specification yields an R-squared of 49%, and predicts 44% of actual city growth in the mean city. Again, these numbers include any influence of the city and time fixed effects, so this isn’t all due to market integration.

They then do the mirror image of this. They regress city population growth on institutions, instrumented with only the French rule instrument. This is their “institutions” model, and is intended to capture the pure effect of institutions. That gives them an R-squared of 15%, and predicts 13% of actual city growth in the mean city. Again, these numbers reflect the explanatory power of institutions and the city and time fixed effects.

Unsurprisingly, both of these separate regressions have less explanatory power than the combined specification. But it sure seems as if market integration is far more important than institutions, doesn’t it? The R-squared is 49% versus 15%, and remember that those both include the explanatory power of the city and time fixed effects. So it could well be that the explanatory power of institutions was zero, and the explanatory power of market integration is like 34%. (This is knowable, by the way, and I’d suggest they report the partial R-squareds in the paper.)
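
Computing a partial R-squared is straightforward: soak up the fixed effects first, then ask how much of the leftover variation the regressor explains. A sketch of the mechanics, with the panel and its column names invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def partial_r2(df, y, x, fe=("city", "year")):
    """Share of the variation left over after the fixed effects that x explains."""
    fe_terms = " + ".join(f"C({f})" for f in fe)
    restricted = smf.ols(f"{y} ~ {fe_terms}", data=df).fit()        # FE only
    full = smf.ols(f"{y} ~ {x} + {fe_terms}", data=df).fit()        # FE plus regressor
    return 1 - full.ssr / restricted.ssr

# Made-up panel, purely to show the mechanics (not KS's data).
rng = np.random.default_rng(1)
cities, years = range(30), range(10)
df = pd.DataFrame([(c, t) for c in cities for t in years], columns=["city", "year"])
city_effects = {c: rng.normal() for c in cities}
df["integration"] = rng.normal(size=len(df))
df["growth"] = (0.4 * df["integration"] + df["city"].map(city_effects)
                + rng.normal(size=len(df)))

print(partial_r2(df, "growth", "integration"))
```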

KS press on, though, to keep institutions a central part of the story. They argue that we should view institutions as fundamental, and that institutions led to market integration, which led to further growth. In support of this, they use their first-stage results from the main specification. This shows that market integration is significantly related to both the French rule dummy and the geographic variables affecting rail costs. On the other hand, the institutions measure is only significantly related to the French rule dummy. From this, they conclude that “Institutional change led to gains in the integration of markets, but market integration did not, at least in the short run, affect institutions.” Institutions are more fundamental, so to speak.

I don’t think this follows from those first stages. Market integration is related to the French rule dummy, which is not a measure of institutions. It is a measure of whether the French ever ruled that particular city. It captures everything about French rule, not just those three particular institutional reforms. It captures, in part, whether Napoleon thought the city was worth taking over, and I would venture to guess this depended a lot on whether the city was well-connected with the rest of Germany. He needed to move troops around, so cities that were already well-integrated with other areas via roads would be particularly attractive. The French rule dummy does not tell me that institutions matter for market integration. It tells me that places conquered by Napoleon were better connected to other cities.

I’m not sure why it is so crucial to establish that these particular institutions in this time frame were important for growth. KS have a really cool paper here, with an impressive collection of data, an interesting time period to analyze, and a lot of results that stand up by themselves as interesting facts. Why shove it through the pin-hole of institutions?

I think KS could have easily written this paper as evidence that market integration matters more than the three institutions they study. And that would be okay. It doesn’t mean INSTITUTIONS don’t matter for growth, it means that guild abolition, legal equality, and feudal redemption were not important for growth. That leaves approximately an infinity of other institutions that could be important for growth. Given the ambiguous definition of institution, market integration is an institution itself, even if it depends on (gasp!) geography. Eliminating some institutions as relevant would be helpful at this stage, as the literature has to this point (miraculously?) found that every single institutional structure studied really matters for growth. Have we reached the point where publication requires finding each and every single institution relevant for growth?

Has the Long-run Growth Rate Changed?

My actual job bothered to intrude on my life over the last week, so I’ve got a bit of material stored up for the blog. Today, I’m going to hit on a definitional issue that creates lots of problems in talking about growth. I see it all the time in my undergraduate course, and it is my fault for not being clearer.

If I ask you “Has the long-run growth rate of the U.S. declined?”, the answer depends crucially on what I mean by “long-run growth rate”. I think of there as being two distinct definitions.

  • The measured growth rate of GDP over a long period of time: The measured long-run growth rate of GDP from 1985 to 2015 is {(\ln{Y}_{2015} - \ln{Y}_{1985})/30}. Note that here the measurement does not have to take place using only past data. We could calculate the expected measured growth rate of GDP from 2015 to 2035 as {(\ln{Y}_{2035} - \ln{Y}_{2015})/20}. Measured growth rate depends on the actual path (or expected actual path) of GDP.
  • The underlying trend growth of potential GDP: This is the sum of the trend growth rate of potential output per worker (we typically call this {g}) and the trend growth rate of the number of workers (which we’ll call {n}).

The two ways of thinking about long-run growth inform each other. If I want to calculate the measured growth rate of GDP from 2015 to 2035, then I need some way to guess what GDP in 2035 will be, and this probably depends on my estimate of the underlying trend growth rate.

On the other hand, while there are theoretical avenues to deciding on the underlying trend growth rate (through {g}, {n}, or both), we often look back at the measured growth rate over long periods of time to help us figure trend growth (particularly for {g}).

Despite that, telling me that one of the definitions of the long-run growth rate has fallen does not necessarily inform me about the other. Let’s take the work of Robert Gordon as an example. It is about the underlying trend growth rate. Gordon argues that {n} is going to fall in the next few decades as the US economy ages and hence the growth in number of workers will slow. He also argues that {g} will fall due to us running out of useful things to innovate on. (I find the argument regarding {n} strong and the argument regarding {g} completely unpersuasive. But read the paper, your mileage may vary.)

Now, is Gordon right? Data on the measured long-run growth rate of GDP does not tell me. It is entirely possible that relatively slow measured growth from around 2000 to 2015 reflects some kind of extended cyclical downturn but that {g} and {n} remain just where they were in the 1990s. I’ve talked about this before, but statistically speaking it will be decades before we can even hope to confirm or reject Gordon’s hypothesis using measured long-run growth rates.

This brings me back to some current research that I posted about recently. Juan Antolin-Diaz, Thomas Drechsel, and Ivan Petrella have a recent paper that finds “a significant decline in long-run output growth in the United States”. [My interpretation of their results was not quite right in that post. The authors e-mailed with me and cleared things up. Let’s see if I can get things straight here.] Their paper is about the measured long-run growth rate of GDP. They don’t do anything as crude as I suggested above, but after controlling for the common factors in other economic data series with GDP (etc.. etc..) they find that the long-run measured growth rate of GDP has declined over time from 2000 to 2014. Around 2011 they find that the long-run measured growth rate is so low that they can reject that this is just a statistical anomaly driven by business cycle effects.

What does this mean? It means that growth has been particularly low so far in the 21st century. So, yes, the “long-run measured growth rate of GDP has declined” in the U.S., according to the available evidence.

The fact that Antolin-Diaz, Drechsel, and Petrella find a lower measured growth rate similar to the CBO’s projected growth rate of GDP over the next decade does not tell us that {g} or {n} (or both) are lower. It tells us that it is possible to reverse engineer the CBO’s assumptions about {g} and {n} using existing data.

But this does not necessarily mean that the underlying trend growth rate of GDP has actually changed. If you want to establish that {g} or {n} changed, then there is no retrospective GDP data that can prove your point. Fundamentally, predictions about {g} and {n} are guesses. Perhaps educated guesses, but guesses.

Significant Changes in GDP Growth

A relatively quick post to highlight two other posts that recently came out regarding GDP growth. First, David Papell and Ruxandra Prodan have a guest post up at Econbrowser regarding the long-run effects of the Great Recession. They use the CBO projections of GDP into the future (similar to what I did here) and look at whether there was a statistically significant break in the level of GDP at the Great Recession. Short answer, yes. Their testing finds that the break was 2008:Q2, not a surprising date to end up with.

It is important to remember that David and Ruxandra are testing for a break in the level of GDP, and not GDP per capita. It is entirely possible to have a structural break in GDP while not having a structural break in GDP per capita. The next thing to remember is that they cannot reject that the growth rate of GDP is the same after 2008:Q2 as it was before. What I mean is easier to see in their figure than it is to explain:
[Figure: Papell and Prodan’s plot of actual and trend GDP, showing a level break at 2008:Q2 with an unchanged growth rate]
Before and after the break, the growth rate is identical. It is just the level that has changed.

The second post is from Juan Antolin-Diaz, Thomas Drechsel, and Ivan Petrella. They use only existing data (not CBO projections) and find that there is statistical evidence of a change in the growth rate of U.S. GDP. They see a slowdown in growth starting in the mid-2000s, consistent with John Fernald’s suggestions regarding productivity growth. It takes until 2015 to see this break statistically because you need several years of data to confirm that the growth slowdown was not a temporary phenomenon.

Note the subtle but very, very, very important difference between the two posts. Papell/Prodan find a significant shift in the level of GDP, while Antolin-Diaz, Drechsel, and Petrella (ADP) find a significant shift in the growth rate of GDP. The former sucks, but the latter is far more troubling. If the growth rate is truly lower, then we will get farther and farther away from the pre-GR trend, and the ratio of actual GDP to pre-GR trend GDP will go to zero. If it is just a level shift, then actual GDP stays a fixed proportional distance below the pre-GR trend rather than falling ever further behind it.
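
A toy example makes the difference stark (illustrative numbers, not CBO projections): after a pure level shift the log gap to the old trend stays fixed, while after a growth-rate shift it widens without bound.

```python
import numpy as np

# Illustrative only: compare a one-time level drop with a permanent growth slowdown.
years = np.arange(50)
g_trend, g_low = 0.03, 0.02     # pre-break trend growth and post-break slow growth
break_year, drop = 10, 0.06     # break date and size of the one-time level drop

log_trend = g_trend * years

# Case 1: level shift only -- growth continues at the old rate after the break.
log_level_shift = np.where(years < break_year, log_trend, log_trend - drop)

# Case 2: growth-rate shift -- growth is permanently lower after the break.
log_growth_shift = np.where(years < break_year, log_trend,
                            g_trend * break_year + g_low * (years - break_year))

print("log gap to old trend in year 49:")
print("  level shift: ", round(log_trend[-1] - log_level_shift[-1], 3))   # 0.06, constant
print("  growth shift:", round(log_trend[-1] - log_growth_shift[-1], 3))  # ~0.39 and growing
```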

I find the Papell/Prodan result more convincing. Keep in mind that David is my department chair and if I knocked on my office wall right now I could interrupt the phone call he is on. Ruxandra’s office is all of 20 feet from mine. I see these people every day. But regardless of the fact that I know them personally, I think they are right.

ADP are getting a false result showing slow growth because of the level shift that David and Ruxandra identify. If ADP do not allow for the level shift, then over any window of time that includes 2008:Q2 the growth rate will be calculated to be low. But that is just a statistical artifact of this one-time drop in GDP. It doesn’t mean that the long-run growth rate is in fact different. Put it this way: if they re-run their tests 25 years from now, they’ll find no statistical evidence of a growth change.

Of course, if the CBO is wrong about the path of GDP from 2015-2025, then Papell/Prodan could be wrong and ADP could be right. But given the current CBO projections, there is strong evidence of a negative level shift to GDP, but no change in the long-run growth rate.

Research on Persistent Roots of Development

A few papers of interest regarding the persistent effect of historical conditions (geographic or not) on subsequent development:

  1. Marcella Alsan’s paper on the TseTse fly and African development is now out in the American Economic Review. I believe I’ve mentioned this paper before, so go read it finally. She develops an index of suitability for TseTse flies by geography, then shows that within Africa higher TseTse suitability is historically associated with less intensive agriculture, fewer domesticated animals, lower population density, less plow usage, and more slavery (if you are queasy about using Murdock’s ethnographic atlas, then avoid this paper). Marcella shows that TseTse suitability is currently related to lower light intensity (everyone’s favorite small-scale measure of development), *but* this effect disappears if you control for historical state centralization. The idea is that the TseTse prevented the required density from forming to create proto-states, and that these places remain underdeveloped. Great placebo test in this paper – she can map the TseTse suitability index for the whole world, and show that outside Africa it has no relationship to outcomes. The TseTse is a uniquely African effect, and she is not picking up general geographic features.
  2. James Ang has a working paper out on the agricultural transition and adoption of technology. Simple idea is to test whether the length of time from when a country hit the agricultural transition is related to their level of technology adoption in 1000 BCE, 1 CE, or 1500 CE (think “did they use iron?” or “did they use plows?”). Short answer is that yes, it is related. Places that experienced ag. transition sooner had more technology at each year. Empirically, he uses instruments for agricultural transition that include distance to the “core” areas of transition (China, Mesopotamia, etc..) and indexes of biological endowments of domesticable species (a la Jared Diamond, and operationalized by Olsson and Hibbs). The real question for this kind of research is the measure of technology adoption. We (meaning Comin, Easterly, and Gong) retrospectively code places as having access to technologies in different years. A worry is that because some places are currently poor (for non-agricultural reasons) the world never bothered to adopt their particular technologies, but that doesn’t necessarily mean they were technologically unsophisticated for their time.
  3. Dincecco, Fenske, and Onorato have a paper out on historical conflict and state development. The really interesting aspect here is how Africa differs from other areas of the world. Across the world and over history (meaning from 1400 to 1799) wars are associated with greater state capacity today. That is, places that were involved in conflicts in the past are now stronger states (measured as their ability to tax) than those without conflict. The basic theory is that wars allow states to concentrate their power. However, historical conflict is unrelated to current civil conflicts…except in Africa. In Africa, historical wars are correlated with current civil conflicts, and this is associated with poor economic outcomes today, so things are bad on multiple fronts. Here’s my immediate, ill-informed, off-the-cuff analysis: In non-African places, wars generated strong states who were able to use their power to completely and utterly eliminate ethnic groups or cultural groups that were alternative power centers. They don’t have armed civil conflicts today because the cultural groups that might have agitated conflict were wiped out or so completely assimilated that they don’t exist any more. In Africa, central states were just not as successful in eliminating competing cultural groups, so they remain viable sources of conflict. Africa’s problem, perhaps, was a lack of conclusive wars in the past.

Mean-Reversion in Growth Rates and Convergence

Brad DeLong posted about the recent paper by Pritchett and Summers (PS) on “Asiaphoria” and mean-reversion in growth rates. PS found several things:

  • Growth rates are not persistent. The growth rate over the last 10 years has very little information about the growth rate over the next 10 years. Growth rates “regress to the mean” as PS say.
  • Growth in developing countries tends to take place in bursts of growth and bursts of stagnation. This is different from rich countries where growth variation tends to consist of mild variation around a trend rate.
  • There is no reason to believe that rapidly growing economies today (China and India) will necessarily continue to grow rapidly.

Brad’s response is to take their evidence as a fundamental challenge to the standard Solow model explanation for why growth rates differ.

Lant Pritchett and Larry Summers are now trying to blow this up: to say that just as the neoclassical aggregate production function is a very bad guide to understanding the business cycle, as the generation-old failure of RBC models tells us, so the neoclassical aggregate production function and the Solow growth model built on top of it is a bad guide to issues of growth and development as well.

This is an overreaction. The mean-reversion and “bursts” that PS find are perfectly consistent with a Solow model including shocks.

Let’s start with the finding that regressing decadal growth rates on prior-decadal growth rates gives you a coefficient of something like 0.2-0.3. PS call this mean-reversion. I think it’s an artifact of convergence. Let’s imagine an economy that is following the Solow model precisely. It is very poor in 1960, and growth from 1960-1970 is about 10% per year. By 1970 it is much better off, and so growth from 1970-1980 slows to 5% per year. By 1980 this has gotten the country to steady state, so from 1980-1990 it grows at 2% per year. From 1990-2000 it is still at steady state, so grows at 2% a year again.

Now regress decadal growth rates (5,2,2) on prior-decade growth rates (10,5,2). What do you get? A line with a slope of about 0.397. Why? Because growth rates slow down as you approach steady state. Play with the numbers a little and you can make the slope 0.3 if you want to. The point is that convergence will generate just such a pattern in growth rates.

What about the unpredictability of growth rates? PS find that the correlation of growth rates across periods is very low. This is more problematic for convergence, on the face of it. If convergence is true, then growth rates across decades should be tightly correlated. In other words, even if the slope of the toy regression I ran above is less than one, the R-squared should be large.

In my toy example, the country systematically converges to 2% growth, and the R-squared of my little regression is 0.86. PS find much smaller R-squareds in their work. The conclusion is that growth rates in the next decade are very unpredictable. So does this mean that convergence and the Solow model are wrong? No. The reason is that once you allow for any kind of meaningful shocks to GDP per capita, the short-run growth rates get very noisy, and you lose track of the convergence. It doesn’t mean it isn’t there, it just is hard to see.
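
Both toy numbers, the slope of roughly 0.4 and the R-squared of 0.86, are easy to check:

```python
import numpy as np
from scipy import stats

prior   = np.array([10.0, 5.0, 2.0])   # growth rate in the prior decade
current = np.array([5.0, 2.0, 2.0])    # growth rate in the current decade

res = stats.linregress(prior, current)
print(res.slope)         # roughly 0.40
print(res.rvalue ** 2)   # roughly 0.86
```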

Let me give you a clearer demonstration of what I mean. I’m going to build an economy that strictly obeys convergence, with the growth rate related to the difference between actual GDP per capita and trend GDP per capita.

More formally, let

\displaystyle  y_{t+1} = (1+g)\left[\lambda y^{\ast}_t + (1-\lambda)y_t \right] + \epsilon_{t+1}

where {g} is the long-run growth rate of potential GDP, {y^{\ast}_t} is potential GDP in year {t}, {y_t} is actual GDP in year {t}, and {\epsilon_{t+1}} are random shocks to GDP in year {t+1}. This formula mechanically captures convergence to trend GDP per capita, but with the additional wrinkle of shocks occurring in any given period that push you either further away or closer to trend. {\lambda} is the convergence parameter, which I said in some recent posts was about 0.02, meaning that 2% of the gap between actual and trend GDP per capita closes every period.

I simulated this over 100 periods, with {g=0.02}, {\lambda=0.02}, {y^{\ast}_0 = 20} and {y_0 = 5}. The country starts well below potential. I then let there be a shock to {y} every period, drawn from a normal with mean 0, variance 0.25. Here are the results of one run of that simulation.
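
For anyone who wants to replicate this, here is a minimal sketch of the simulation in Python. The parameters match those above, but since the shocks are random draws, the exact numbers in any given run will differ from the one shown in the figures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

g, lam, T = 0.02, 0.02, 100
y_star = 20.0 * (1 + g) ** np.arange(T + 1)   # potential GDP, growing at rate g
y = np.empty(T + 1)
y[0] = 5.0                                    # start well below potential

for t in range(T):
    # Partial convergence toward potential, plus a mean-zero shock (variance 0.25).
    y[t + 1] = (1 + g) * (lam * y_star[t] + (1 - lam) * y[t]) + rng.normal(0, 0.5)

# Average growth rate within each 10-period "decade".
log_y = np.log(y)
decade_growth = np.array([(log_y[d + 10] - log_y[d]) / 10 for d in range(0, T, 10)])

# Regress decadal growth on prior-decade growth, as Pritchett and Summers do.
res = stats.linregress(decade_growth[:-1], decade_growth[1:])
print("slope:", round(res.slope, 2), " R-squared:", round(res.rvalue ** 2, 2))
```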

First, look at the 10-year growth rates over time. There is a downward trend if you look at it, but this is masked by a lot of noise in the growth rate. You have what look distinctly like two growth booms, about period 25 and period 50.

[Figure: 10-year growth rates over time in the simulation]

Second, look at the correlation of the average growth rate in one “decade” and the average growth rate in the prior “decade”. This is essentially what Pritchett and Summers do. I’ve also included the fitted regression line, so you can see the relationship. There is none. The coefficient on the prior-decade growth rate is 0.05, so pretty severe mean-reversion. The R-squared is something like 0.16. A high growth rate one decade does not indicate high growth the following decade, and the current decadal growth rate provides very little information on growth over the next decade.

[Figure: decadal growth rates plotted against prior-decade growth rates, with fitted regression line]

But this model has mechanical convergence built into it, just with some extra noise dropped on top to make things interesting. And with sufficient noise, things are really interesting. If you looked at this plot, you’d start talking about growth accelerations and growth slowdowns. What happened in period 25 to boost growth? Did this economy democratize? Was there an opening to trade? And what about the bust around period 40? For a poor country, that is low growth. Was there a coup? We see plenty of “bursts” of growth and “bursts” of stagnation (or low growth) here. It’s a function of the noise I built in, not a failure of convergence.

By the way, take a look at the log of output per worker over time. This shows a bumpy but steady upward trend. The volatility of the growth rate doesn’t look as dramatic here.

[Figure: log output per worker over time in the simulation]

If I turned up the variance of the noise term, I’d be able to get even wilder swings in output, and wilder swings in growth rates. In a couple simulations I played with, you get a negative relationship of current growth rates to past growth rates – but in every case there was convergence going on.

Why are growth rates so un-forecastable, as PS find? Because of convergence, the noise doesn’t just cancel out over time. If a country gets a big negative shock today, then the growth rate is going to be low this year. But now the country is particularly far below trend GDP per capita, and so convergence kicks in and makes the growth rate larger than it normally would be. And because convergence works slowly, it will be larger than normal for several periods afterwards. There is a natural tendency for growth rates to be uncorrelated in the presence of shocks, but that is again partly because of convergence, not evidence of its absence. There are lots of reasons that the Solow model could be the wrong way to look at growth. But this isn’t one of them.

I think the issue here is that convergence gets “lost” behind all the noise in the data. Over long periods of time, convergence wins out. [“The arc of history is long, but it bends towards Robert Solow”? Too much?] Growth rates start relatively high and end up asymptoting towards the trend growth rate. But for any small window of time – say 10 years – noise in GDP per capita can swamp the convergence effects. In the growth literature we tend to look at differences of 5 or 10 years to “smooth out” fluctuations. That’s not sufficient if one wants to think about convergence, which operates over much longer time periods.

PS are absolutely right that we cannot simply extrapolate out China and India’s recent growth rates and assume they’ll continue indefinitely. We should, as growth economists, account for the gravitational pull that convergence puts on growth rates as time goes forward. But just like gravity, convergence is a relatively weak force on growth rates. It can be overcome in the short-run by any reasonably-sized shock to GDP per capita.

You don’t think “Oh my God, gravity is broken!” every time you see an airplane overhead. So don’t take abnormal growth rates or uncorrelated growth rates as evidence that convergence isn’t occurring.

The Skeptics Guide to Institutions – Part 4

The final installment of my series on the empirical institutions literature. Quick summary of the prior posts:

  1. Part 1: cross-country studies of institutions are inherently flawed by lack of identification and ordinal institutional indexes treated as cardinal
  2. Part 2: instrumental variable approaches – settler mortality included – are flawed due to bad data, questionable exclusion restrictions, and more identification problems.
  3. Part 3: historical studies show that there is path dependence or a poverty trap, but not that institutions themselves are central to underdevelopment

You have to be very careful with what you conclude from the institutions literature or from my three posts. We are dealing with empirics here, so we are not able to make any definitive statements. There is a null hypothesis, and we either reject or fail to reject that null.

So what is that null hypothesis? For the institutions theory, as with any theory, the correct null hypothesis is that it is wrong. Specifically, the null hypothesis is “institutions do not matter”. What does the empirical institutions literature tell me? I cannot reject that null. We do not have sufficient evidence to reject the idea that institutions do not matter.

But failure to reject the null is not the same as accepting the null. Having failed to reject the null, I cannot conclude that institutions do *not* matter. They may matter. All the other reading and thinking I’ve done on this subject suggests to me that they *do* matter. But the existing empirical evidence is not sufficient to strongly reject the null that they do *not*. As I said in the last post, there may be a working paper out there right now that offers a real definitive rejection of the null.

Given the empirical evidence, then, I’m uncomfortable making broad pronouncements that we have to get institutions “right” or “improve institutions” to generate economic development. We do not have evidence that this would work.

Further, I’m not sure that even if that mythical working paper did appear to solidly reject the null that the right advice would be to “improve institutions”. I say this because even the institutions literature tells you that it is impossible to make an exogenous change to institutions. Acemoglu and Robinson did not lay out a theory of what constitutes good institutions, they laid out a theory of why institutions are persistent. Their work shows that being stuck in the bad equilibrium is the result of a skewed distribution of economic power that grants some elite a skewed amount of political power. The elite can’t credibly commit to maintaining reforms, and the masses can’t credibly commit to preserving the elite’s position, so they can’t come to an agreement on creating better institutions (whatever those might be).

The implication of the institutions literature is that redistributing wealth towards the masses will lead to economic development (and vice versa, that redistributing it towards the elites will slow economic development). Only then will the elite and masses endogenously negotiate a better arrangement. You don’t even have to know precisely what “good institutions” means, as they will figure it out for themselves. The redistribution need not be explicit, but may arise through changes in technology, trade, or population.

Douglass North has the same underlying logic in his work. It was only with changes in the land/labor ratio favoring workers in Europe that old institutions disintegrated (serfdom) and new institutions arose (secure property rights).

A good example is South Korea. In 1950, Korea was one of the poorest places on earth, falling well below many African nations in terms of development. It had also been subject to colonization by Japan from 1910 to 1945. Korea had the same history of exploitive institutions as most African nations.

So why didn’t South Korea get stuck in the same trap of bad institutions and under-development as Africa? One answer is that it had a massive redistribution of wealth. In 1945, the richest 3 percent of rural households owned 2/3 of all land, and about 60 percent of rural households had no land. This should have led to bad institutions and persistent underdevelopment. (See Ban, Moon, and Perkins, 1980, if you can find a copy).

But starting in 1948 South Korea enacted wholesale land reform. By 1956, only 7 percent of farming households were tenants, and the rest owned their land. According to the FAO Agricultural Census of 1962, South Korea had *zero* farms larger than 5 hectares. Not a small number, not just a few, but *zero*. Agricultural land in South Korea, probably the primary source of wealth at that point, was distributed with incredible equity across households.

According to North or Acemoglu and Robinson, this redistribution changed the relative power of elites and masses. It would have allowed them to reach a deal on “good institutions”, or at least would have made the elite powerless to stop the masses from enacting reforms. South Korea got good institutions in part because it changed the distribution of wealth. [Good institutions for economic growth don’t appear to overlap with good institutions for personal freedom, though – South Korea was a dictatorship until 1988.]

The point is that even if we acknowledge that “institutions matter”, that does not imply that we can or should propose institutional reforms to generate economic development. It’s a mistake to think of ceteris paribus changes to institutions. They are not a thing that we can easily or independently alter. If they were, then they wouldn’t be *institutions* in the way that Douglass North uses the term.

If you want to generate economic development, the implication of the institutions literature is that you have to reform the underlying distribution of economic power first. Once you do that institutions will endogenously evolve towards the “good” equilibrium, whatever that may be.

[But the distribution of economic power *is* an institution, you might say. Okay, sure. Define institutions broadly enough and it will become trivially true that institutions matter. Defined broadly enough, institutions are the reason my Diet Coke spilled this morning, because gravity is an “institution governing the interaction of two masses in space”.]

The Skeptics Guide to Institutions – Part 3

This is the third in a series of posts regarding the institutions literature. The first two posts dealt with original cross-country work on institutions and the attempt to identify the effects using settler mortality.

The third generation of institutions work is, in large part, a response to the empirical problems of the first 2 generations. These new papers avoid vague measurement of “institutions” by drilling down to one very specific institution, and do their best to avoid identification problems by looking for natural experiments that give them good reason to believe they are looking at exogenous variation in the institution.

The following are some good examples of this third generation. There are others that I haven’t listed, but these are ones I talk specifically about in class:

  • Dell (2010). Household consumption and child health are lower in areas in Peru and Bolivia subject to the Spanish mita – forced labor in mines – than in areas just outside the mita.
  • Nunn (2008). The number of slaves taken from an African country is negatively related to income per capita today.
  • Banerjee and Iyer (2005). Agricultural output and investments in education and health are currently lower in areas of India where the British invested property rights in landlords as opposed to cultivators.
  • Iyer (2010). Areas of India subject to direct British colonial rule have lower investments in schooling and health today than areas ruled indirectly through Indian governors.
  • Michalopoulos and Papaioannou (2013). Pre-colonial ethnic political centralization in Africa is related to current levels of development within Africa.

So, problem solved, right? We’ve got solid empirical evidence that institutions matter. Not necessarily.

What these papers demonstrate is that economic development is persistent. If you like, they are evidence that there are poverty traps. If something happens to knock you below some threshold level of development – slaving activity, the mita, arbitrary borders, bad landlords – then you can’t get yourself out of that trap. You are too poor to invest in public goods like human capital or infrastructure because you are spending all your money just trying to survive. So you stagnate. Pushing you into the trap was the result of an “institution”, if we call these historical experiences institutions, but it isn’t institutions that keep you poor, it’s the poverty itself that prevents development.

Take Dell’s paper. She does not have evidence that the mita reduced living standards while it existed, she has evidence that contemporary development in the area covered by the mita is lower, roughly two hundred years after the mita was abolished. Dell shows that education is lower and road networks are less dense in mita areas than in their close neighbors. So what explains the historical persistence? One possibility is that there was some other institutional structure left behind by the mita that limited development. But we have no evidence of any institutional difference between the mita areas and others. We simply know that the mita areas are poorer, and that could be evidence of a poverty trap rather than any specific institution.

The papers on India have a similar flavor. The British are no longer in charge in India, but there are some differences today related to how they did govern. With regards to the effects of direct British rule, we don’t actually know what the channel is leading to the poor outcomes. We just know that there is an effect. With regards to the effect of landlord versus cultivator property rights, this isn’t about institutions, it’s about the distribution of wealth.

Think of the question this way. What specific policy change do any of these papers suggest would lead to economic development? “Don’t get colonized, exploited, or enslaved by Europeans” seems like it would be hard to implement retroactively.

Of the papers I listed, probably the strongest evidence that institutions actually matter is the Michalopoulos and Papaioannou work using African ethnicities. Geographic homelands of ethnicities cross national boundaries, and one can measure the economic development in one of these homelands by using satellite data on lights at night. What MP (I’m not spelling those again) find is that ethnicities that had stronger political centralization prior to being colonized – they had political systems beyond simple chief-led villages – are rich today relative to other ethnic groups within the same nation. But this still leaves unanswered what specifically about pre-colonial ethnic political centralization has been transmitted to current populations. The policy implication for development here is just “be descended from a more coherent political unit”.

Those same authors have another paper, by the way, that looks at the question from the other direction. They look within an ethnicity that spans a national border. Does the economic development level of the two parts depend on the national-level institutions? No. Measures of national-level institutions like those discussed in Part 1 have no explanatory power for development differences between the two parts of a partitioned ethnicity.

Understanding how a country/region/ethnicity got poor is not the same thing as understanding what will make them rich. “Institutions mattered” is different from “institutions matter”. I think the better conclusion from the 3rd generation of institutions research is that economies can fall into poverty traps from which escape is difficult if not impossible. Would better institutions allow these places to escape these traps? I don’t think we can say that with any confidence, partly because we have no idea what “better institutions” means.

I think the right null hypothesis regarding existing institutions is that they are likely solving a particular issue for a particular group. Let’s call this the Elinor Ostrom hypothesis. I don’t think that the existing empirical institutions literature has provided sufficient evidence to reject the null at this point. Certainly not to the point that we can pinpoint the “right” institutions with any confidence.

Could I be wrong to be this skeptical? Absolutely. We may come up with concrete definitions of institutions that we can measure and use empirically. There may be research in the works right now that gives some definitive evidence that “institutions matter” for development, in the present, and that appropriately tweaking them will generate growth. If so, hallelujah. But until then, I remain skeptical.

The Skeptics Guide to Institutions – Part 2

This is the second of a series of posts on the empirical institutions literature that I am covering in my graduate growth and development course. In Part 1, I looked at how the 1st generation of this literature misused cross-country measures of institutions in their poorly identified regressions.

The second generation of empirical institutions work attempted to deal with the endogeneity problem in the standard “regress income per capita on institutions” regression of the 1st generation.

The dividing line between 1st-generation and 2nd-generation studies isn’t that bright. I used Mauro (1995) as an example of 1st-generation institutions work, but that paper uses ethnolinguistic fractionalization as an instrument for corruption. Hall and Jones (1999) look at measures of institutional quality instrumented with latitude and the percent of the population that speaks Western European languages. These instrumental variable (IV) strategies are generally dismissed, for the reason that few people believe ethnolinguistic fractionalization, latitude, or European language speaking have effects on income per capita *only* through institutions. In other words, these papers seem to fail on the second requirement of an IV, which is that the instrument has no separate correlation with the dependent variable.

The big event in the 2nd generation of the literature was the arrival of Acemoglu, Johnson, and Robinson (2001), who use “settler mortality” as an instrument for institutional quality. They propose that the quality of institutions in a colony was a function of how deadly that colony was for European settlers. The idea is that in places where Europeans died quickly (Sub-Saharan Africa, Central America), they did not want to stay, and therefore installed extractive institutions to suck as many resources as possible out of the colony before they caught some deadly disease. In places like the US or New Zealand, where they did not die, Europeans stayed. They therefore installed good, inclusive institutions.

The heart of the argument here is that institutions in colonies were exogenously determined by Europeans, and thus we have a clean empirical “natural experiment” that will yield a good estimate of the effect of institutions on economic development. AJR is widely cited, and the settler mortality instrument has been used in any number of other papers (I’ve refereed at least 5 or 6 myself in the last 10 years) since their paper came out.

But there are significant issues with the whole empirical strategy. There are four problems with their estimates that I usually think about:

1. They are still using an arbitrary measure of institutions as a continuous variable. The measure of institutions in AJR (2001) is “expropriation risk”, and every country is coded from 0 (high risk) to 10 (no risk). See the prior post for why indices of institutions like this are useless. In short, the numbers have no meaning, but AJR treat them as if they do. A 10 does not mean that a US citizen is half as likely to be expropriated as a Bangladeshi (a 5.14). Going from Honduras (5.32) to Tunisia (6.45) is not necessarily the same thing as going from Mexico (7.50) to India (8.27). Their measure of institutions doesn’t measure “institutions”.

2. It is nearly impossible to believe that their instrument (settler mortality) has no separate correlation with the dependent variable (income per capita). Settler mortality arises from putting Europeans unadapted to different climates into those climates. Since the Europeans all came from a pretty similar climate zone, settler mortality is essentially picking up the intensity of the tropical disease environment. While the Africans, Asians, or Americans they colonized may have been adapted to those diseases in the sense that they were no longer deadly, it doesn’t mean those diseases had no effect on those populations. Places where Europeans died are also places that tend to have incredibly poor agricultural conditions – lack of frost, overly heavy rains, and poor soils. Europeans dying at alarming rates is simply a proxy for bad geographic conditions. And no, the fact that AJR control for latitude, temperature, and humidity is not the same thing as controlling for agricultural conditions. You can hold those three things constant and have wildly different outcomes depending on soil, altitude, wind patterns, rainfall patterns, etc., etc.

3. The estimated effect of institutions doesn’t make sense. Their IV results show a coefficient for institutions that is twice as large as the OLS coefficient. This is problematic. The whole reason we want IV estimates is that we think there is some kind of endogeneity between income per capita and institutions – specifically, that higher income leads to better institutions. That story implies that the simple OLS correlation of institutions and income per capita is biased *upwards*, meaning the OLS results are too big. But when they run IV, they get even bigger effects for institutions. For the IV estimate to be the less biased number, the OLS estimate would have to be biased *downwards*, which in the reverse-causality story means that income per capita has a *negative* effect on institutions – and that is hard to believe.

What about measurement error? We know that if institutions are measured with noise, then the OLS coefficient will be attenuated, or biased towards zero. But classical measurement error, as this would be, implies that there is some true “expropriation risk” out there in the world, and what we have is the true value plus some random error. You can’t have this kind of measurement error when the numbers for expropriation risk are absolutely arbitrary. There is no *real* number to measure. The “expropriation risk” is precisely measured in the sense that it precisely measures the arbitrary index established by Political Risk Services. So I don’t buy the measurement error argument.
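
For reference, here is the textbook attenuation result being invoked, written out. This is the standard classical errors-in-variables formula, not anything specific to AJR:

```latex
% Classical measurement error: we observe x = x* + e, with the noise e independent
% of the true value x* and of the regression error. The OLS slope from regressing
% y on the noisy x is shrunk toward zero:
\[
\operatorname{plim}\,\hat{\beta}_{OLS}
  \;=\; \beta \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{e}} .
\]
```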

In the end, the simplest explanation for why their IV results are larger than the OLS is that their instrument is correlated with the error term. We know settler mortality is negatively related to the expropriation risk score (higher mortality, worse institutions). If settler mortality is also independently and negatively related to income per capita, then the IV results are going to be inflated [for the math-inclined, plim beta(IV) = beta(true) + Cov(error, mort)/Cov(inst, mort), and that ratio of covariances is positive because both covariances are negative].
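
To see the mechanics, here is a minimal simulation sketch of that covariance formula. Everything here is invented for illustration – the coefficients, the variable names, the sample size – it is not AJR’s data or specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta_true = 0.5                            # true effect of institutions on log income (made up)

mort = rng.normal(size=n)                  # "settler mortality" (standardized, hypothetical)
inst = -0.6 * mort + rng.normal(size=n)    # institutions worsen with mortality (the first stage)
u    = -0.4 * mort + rng.normal(size=n)    # the error term ALSO falls with mortality: exclusion fails
y    = beta_true * inst + u                # log income per capita

# OLS slope from regressing y on inst
beta_ols = np.cov(y, inst)[0, 1] / np.var(inst)

# IV slope with a single instrument: the ratio of covariances
beta_iv = np.cov(y, mort)[0, 1] / np.cov(inst, mort)[0, 1]

print(f"true beta: {beta_true:.2f}")
print(f"OLS beta:  {beta_ols:.2f}")        # biased up somewhat, because Cov(inst, u) > 0 here
print(f"IV beta:   {beta_iv:.2f}")         # inflated further, by Cov(u, mort)/Cov(inst, mort)
```

With these made-up parameters the IV estimate comes out well above the OLS estimate, even though the true coefficient is smaller than both – exactly the pattern described above.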

4. The data are probably wrong. David Albouy’s paper is the central reference here. Let me review the main issues. First, of the 64 observations, they do not have settler mortality data for 36 of them. For those 36, they infer a value from some other country. This inference could be plausible, but in many cases is not. For example, they use mortality data from Mali to infer values of mortality for Cameroon, Uganda, Gabon, and Angola. Gabon is mostly rainforest, and about 2,300 miles away from Mali, which is desert or steppe.

Second, the sources vary in the type of individuals used to make mortality estimates. Most relevantly, in some countries the mortality rates of soldiers on campaign are used, and in others the mortality rates of laborers on work projects. In both cases, mortality rates are outliers relative to what settlers would have experienced. Most importantly, the use of the higher mortality rates from campaigning soldiers or laborers is correlated with poor institutions. That is, AJR use artificially high mortality rates for places with currently bad institutions. Hence their results are already baked in before they go to run regressions.

Albouy’s paper shows that making any of a number of equally plausible assumptions about how to code the data will eliminate the overall results. Both the first stage – the relationship of mortality to institutions – and the second stage – the relationship of institutions to income per capita – become insignificant under any number of reasonable alterations of the AJR data.

So in the end the settler mortality evidence that institutions matter just does not stack up. It certainly does not have the kind of robust, replicable features we would like in order to establish the importance of something like institutions for development. If you want to argue that institutions matter, then by all means do so, but the AJR evidence is not something you should cite to support your case.

Next up I’ll talk about why 3rd generation empirical studies of specific institutions aren’t actually about institutions, but about poverty traps.

The Skeptics Guide to Institutions – Part 1

NOTE: The Growth Economics Blog has moved sites. Click here to find this post at the new site.

I’m starting a run of several lectures on institutions in my growth and development course. By revealed preference, so to speak, I take the institutions literature seriously. But there are some issues with it, and so I’m going to teach this literature from a particularly skeptical viewpoint and see what survives. These posts are going to sound very antagonistic as I do this, which isn’t completely fair, but makes it more fun to write.

This first post has to do with the cross-country literature on institutions. The 1st-generation of this research (Mauro, 1995; Knack and Keefer, 1995; Hall and Jones, 1999; Easterly and Levine, 2003; Rodrik et al, 2004; Acemoglu and Johnson, 2005) regressed either growth rates or the level of income per capita on an index of institutional quality along with other controls. In general, this literature found that institutions “matter”. That is, the indices were statistically significant in the regressions, and the size of the coefficients indicated big effects of institutions on growth or income per capita.

These results are the prima facie evidence that institutions are a fundamental driver of differences in development levels. The significance, combined with the large absolute values of the estimated effects, indicated that even small changes in institutions had a big impact on GDP per capita. We’ll get to the question of whether these are in fact well-identified regressions in a future post. For now, let’s just take these regressions as they are.

The first big issue with this literature is that all the indices of institutions used are inherently arbitrary, and yet are used as if they have a strict numerical interpretation (see Hoyland et al, 2012; Donchev and Ujhelyi, 2014). This is easiest to talk about by using an example.

Let’s take the 7-point index for “constraint on the executive” used by Acemoglu and Johnson in their 2005 paper. 1 is “not so many constraints” and 7 is “lots and lots of constraints”. There are more official definitions of these categories. The index comes from the Polity IV database, and I will concede that it is coded up by smart, reasonable people. I have no argument with how each individual country is coded. Minor quibbles about how we rank constraints on executives are not going to overturn the results of the regressions using this to measure institutions.

But does Australia (7) have seven times as many constraints as Cuba (1)? Does the one-point gap between Luxembourg (7) and South Korea (6) have a similar meaning to the one-point gap between Liberia (2) and Cuba (1)? Using this as a continuous variable presumes that the index values have some actual meaning, when all they are is a means of categorizing countries.

So what happens if you use the constraint on executive scores simply as categorical (i.e. dummy) variables rather than as a continuous measure? You’ll find that all of the action comes from the category for the 7’s (Western developed countries) relative to the 1’s (Cuba, North Korea, Sudan, and others). That is, the dummy variable on the 7’s indicates that their income per capita is statistically significantly higher than income per capita for the 1’s. Countries with 2’s, 3’s, 4’s, and 5’s are not significantly richer than the 1’s (the 2’s, 3’s, and 4’s are actually estimated to be *poorer* than the 1’s). Countries with 6’s have marginally significantly higher income than the 1’s. The finding is that having Western-style social-democracy constraints on executives is what is good for income per capita, but gradations in constraints below that are essentially meaningless.
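
A rough sketch of what that continuous-versus-categorical exercise looks like in code. The data frame below is entirely invented for illustration – it is not the actual Polity IV scores or income data – and the statsmodels formulas are just one way to set it up.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented toy data: a 1-7 "constraint on the executive" score and log GDP per capita.
df = pd.DataFrame({
    "constraint": [1, 1, 2, 3, 4, 5, 6, 6, 7, 7, 7],
    "log_gdp":    [7.0, 7.2, 6.9, 7.1, 7.3, 7.8, 8.9, 9.1, 10.2, 10.4, 10.5],
})

# Continuous: imposes that a one-point step means the same thing everywhere on the scale.
continuous = smf.ols("log_gdp ~ constraint", data=df).fit()

# Categorical: each score gets its own dummy, measured relative to the 1's.
categorical = smf.ols("log_gdp ~ C(constraint, Treatment(reference=1))", data=df).fit()

print(continuous.params)
print(categorical.params)
```

Nothing in the categorical version forces the gaps between adjacent scores to be equal, which is exactly what the continuous version quietly assumes.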

But there is a more fundamental empirical problem once we use constraints on the executive to categorize countries. Regressions are dumb, and don’t care that we have a particular interpretation for our categories. They just load *any* differences in income per capita onto those categorical variables. The dummy variable for category 7 countries captures the average income per capita difference between those countries and the category 1 countries. There might be – and certainly are – a number of things that distinguish North Korea from the U.S. beyond constraints on the executive, and the dummy is picking all those up as well. Even if I control for additional factors (geographic variables, education levels, etc.), we cannot possibly control for everything, in part because the sample is so small that I can’t include a lot of variables without losing all degrees of freedom. Empirically, the best I can conclude is that Western-style social democracies are different from poor countries. Well, duh. One aspect of that may be constraints on executives, but we cannot know that for sure.

Other indices of institutions are just as bad. The commonly used World Bank Governance Indicators include sub-indices like “Government Effectiveness”, “Voice and Accountability”, and “Rule of Law”. Okay, and… what do I do with that? You want to tell me governance is good in Switzerland and bad in Uganda, I guess I’d have to agree with you, not having any specific experience to draw on. But if I ask you what exactly you mean by that, what kind of answer would I get? These governance indicators are based on surveys of perceptions of the quality of institutions. The institutions that get coded as “good” are the institutions people find in rich countries, because those must be good institutions, right? These measures are inherently endogenous.

This problem holds to some extent even for modern measures of institutional quality like the Doing Business indicators. These have the virtue of measuring something tangible – the number of days necessary to start a business, for example – but it isn’t clear how this should enter into a specification. Does going from 146 to 145 days to start a business have the same effect as going from 10 to 9? Why should it? Is there a threshold we should worry about, like getting the number of days under 30? And just because we can measure the number of days to register a business, does that mean it is important, or that it constitutes an “institution”?
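
To make the specification question concrete, here is a hypothetical sketch of three ways the same days-to-start-a-business number could enter a regression. The data and variable names are invented, not the actual Doing Business series.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented toy data: days to register a business and log GDP per capita.
df = pd.DataFrame({
    "days_to_start": [5, 9, 10, 25, 30, 60, 90, 145, 146, 200],
    "log_gdp":       [10.6, 10.4, 10.5, 9.8, 9.5, 8.9, 8.4, 7.9, 7.8, 7.5],
})

linear    = smf.ols("log_gdp ~ days_to_start", data=df).fit()          # 146 -> 145 counts the same as 10 -> 9
log_days  = smf.ols("log_gdp ~ np.log(days_to_start)", data=df).fit()  # only proportional changes matter
threshold = smf.ols("log_gdp ~ I(days_to_start < 30)", data=df).fit()  # only crossing a 30-day cutoff matters

for name, model in [("linear", linear), ("log", log_days), ("threshold", threshold)]:
    print(name, model.params.to_dict())
```

All three will fit *something*; nothing in the measure itself tells you which functional form, if any, is the right one.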

Reading the cross-country empirical institutions literature is the equivalent of watching studio analysis of NFL games. You have a bunch of people “in the game” of economics sitting around making unfalsifiable statements that sound plausible, but have essentially zero content. “He’s got a real nose for the ball”. Okay, meaning what? How does one improve one’s nose for the ball? Is there a machine in the weight room for that? Is this player’s nose better than that player’s nose? How could you compare? “Good institutions” is the equivalent of “having a nose for the ball”. It’s plausibly true, but impossible to quantify, measure, or define.

Another big problem with the empirical cross-country institutions work comes courtesy of Glaeser et al (2004). Their point is that our institutional measures are generally measuring outcomes, not actual institutional differences. One example is Singapore, which scores (and scored) very high on institutional measures like risk of expropriation and constraints on executives. Except that under Lee Kuan Yew, there were no constraints. He was essentially a total dictator, but happened to choose policies that were favorable to business, and did not arbitrarily confiscate property. But he *could* have, so there was no actual institutional limit there. The empirical measures of institutions we have are not capturing deep institutional features, but transitory policy choices.

That leaves us with the whole issue of incredibly small sample sizes, often in the 50-70 country range, which eliminates the possibility of controlling for a number of other covariates without losing all degrees of freedom. And don’t forget publication bias, which means the only things we see in the literature are the statistically significant results that got thrown up in the course of running thousands of regressions with different specifications and measures of institutions.

In short, it may be that institutions do matter fundamentally for development. But the cross-country empirical literature is not evidence of that. There is a fundamental “measurement-before-theory” issue in this field, I think. We don’t know what we should be measuring, because we don’t have any good definition of an “institution”, much less a good theory of how they work, arise, collapse, or mutate. So we end up flinging things that sound “institution-ish” into regressions, without knowing what we are actually measuring.

Next up will be 2nd-generation cross-country empirical work that uses instrumental variables. Spoiler alert: those don’t work either.