Why Information Industrial Classification Diversity Grows

NOTE: The Growth Economics Blog has moved sites.

I read Cesar Hidalgo’s Why Information Grows. Going into it, I really wanted to like it. I really wanted it to give me some insight into one of those fundamental growth questions: what drives the speed of knowledge acquisition?

This is not that book. The beginning is fun for describing basic information theory, and its relationship to entropy. It has some neat examples of how we end up “saving” information from entropy by encoding it in solids like cars, houses, or even the organized binary digits on my computer. But when it comes to translating this into an explanation for why economies grow, there is a breath-taking amount of hand-waving. I could feel the breeze whistling out of my Kindle as I read it.

In the end, Hidalgo says some places are rich because they have complex production structures, meaning they produce goods or services that require a large number of people or firms to interact in some kind of network. These networks embody the “knowledge and knowhow” of the economy. I haven’t quite decided whether this is tautological, but it’s close.

He attempts to offer evidence in favor of his claims by appealing to the data he built with Ricardo Hausmann. This uses detailed export data to build up a measure of how complex (read: diverse) a country’s set of exports is.

There are a few issues with trying to use this data on complexity with any explanation of economic growth, much less information theory.

1. The measures of complexity are built on export data. That’s because you can get data on exports that is very fine-grained in terms of products, “6-digit” for those in the business. 6-digit classification means you’ve got things like 312120 – Breweries, or 424810 – Beer and Ale Wholesalers. Export data is also great because you can get it bilaterally for a lot of countries. You have data on how much beer Belgium exports to the US, and how much beer the US exports to Belgium.

Export data is available at this level of detail because the transactions get funneled through customs procedures, usually in a limited number of geographic points (i.e. ports), that let you track them closely. You cannot get similar data for an entire economy because there is no equivalent to customs houses tracking the minutiae of all your day-to-day purchases. Yes, conceptually that data is out there in Target’s or Whole Foods’ computers, but we don’t track domestic transactions at that level centrally.

Which leads to the first issue. Just because you don’t export a diverse set of products doesn’t mean you don’t have a complex economy. The vast, vast, vast majority of economic transactions are domestic-to-domestic, even in countries with large export sectors. So while I buy that an index of complexity built on export data is highly correlated with actual complexity, it doesn’t necessarily measure total complexity.

2. What is more of a problem is that the measure of complexity is built on the given NAICS system of coding products. As I’ve mentioned before, these kinds of industrial classifications are skewed towards tracking manufactured goods, and have not caught up to the complexity of services and the like. The 6-digit code 541511 is “Custom Computer Programming Services”. That is essentially all types of software work: web design, sys admins, app designers, legacy COBOL programmers, etc.

In comparison, code 311111 is “Dog and Cat Food Manufacturing”. 311119 is “Other Animal Food Manufacturing”, like rabbit, bird, and fish food. So we are careful to track the difference in economic activity based on whether processed lumps of food goo are served to dogs as opposed to bunnies. But we do not distinguish between someone designing Flappy Bird and someone doing back-end server maintenance.

This means that your level of complexity depends simply on how detailed NAICS gets. Take two towns. In one, they have a single factory that produces both dog and rabbit food, and they export both. This town looks complex because it exports in two separate NAICS categories. In a second town, they have several firms that do outsourcing for major companies, with different firms doing web design, server maintenance, custom C++ programming, and say three or four other activities. Because all those programming activities fall under a single NAICS category, this second town appears to have a less complex economy. The “knowledge and knowhow” in the second town is likely larger, but NAICS cannot capture this.

This is like saying that bacteria are less genetically diverse than eukaryotes because bacteria are all in one kingdom, while we happen to classify eukaryotes into 5: protozoa, algae, plants, fungus, and animals. But bacteria are known to be more genetically diverse across species than eukaryotes. If you focus on the arbitrary divisions, things can look more or less diverse based solely on your choice of those divisions.
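The two-town comparison can be made concrete with a small sketch. Everything here is illustrative: the towns and activity lists are invented, the `diversity` function is a hypothetical stand-in for a crude count of distinct codes (not Hidalgo and Hausmann's actual index), and the finer codes such as "541511-web" do not exist in NAICS. Only the three 6-digit codes quoted above are real.

```python
# Illustrative only: towns and activities are invented, and the finer
# codes ("541511-web", ...) are hypothetical, not part of NAICS.

# Each activity: (description, actual 6-digit NAICS code, hypothetical finer code)
TOWN_A = [
    ("dog food", "311111", "311111"),
    ("rabbit food", "311119", "311119"),
]
TOWN_B = [
    ("web design", "541511", "541511-web"),
    ("server maintenance", "541511", "541511-ops"),
    ("custom C++ programming", "541511", "541511-cpp"),
    ("app design", "541511", "541511-app"),
    ("legacy COBOL", "541511", "541511-cobol"),
]


def diversity(town, use_finer_codes=False):
    """Count distinct codes in a town: a crude code-count notion of diversity."""
    idx = 2 if use_finer_codes else 1
    return len({activity[idx] for activity in town})


# With the actual 6-digit codes, the pet-food town looks more diverse.
print(diversity(TOWN_A), diversity(TOWN_B))              # 2 1
# With a (hypothetical) finer split of programming, the ranking flips.
print(diversity(TOWN_A, True), diversity(TOWN_B, True))  # 2 5
```

Nothing about either town changed between the two print statements. Only the classification did.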

3. Leave all the complaints about the measure of complexity aside. Hidalgo tries to show how important this is for explaining economic growth by…..running a growth regression. He doesn’t call it that. He plots GDP per capita against economic complexity in 1985, and there is a positive relationship. He then says that countries with GDP per capita below the level expected given their complexity in 1985 grew faster from 1985 to 2000, and that this justifies his theory. But that is just a growth regression, except without any explicit coefficient estimate or standard error.

Several issues here. First, he doesn’t bother to mention whether this is statistically significant or not. Second, we’ve spent twenty years in growth complaining about exactly these kinds of regressions because they are completely unidentified. He doesn’t even bother to try to control for any of the obvious omitted variables like savings rates or population growth rates. Most likely, complexity is just another of the long list of things that are correlated with high incomes – institutions, savings, a lack of corruption, etc. – without our having any idea whether they are causal or not.
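For what it is worth, here is a minimal sketch of what writing that exercise down as an explicit regression could look like. Everything in it is an assumption: the data are synthetic (100 made-up countries drawn from a random number generator, not Hidalgo's data), `ols` is a hypothetical helper for a one-regressor least-squares fit, and growth is generated independently of complexity by construction. The only point is that the plot-based argument corresponds to a coefficient and a standard error that could have been reported. It does nothing to address the identification problem.

```python
import math
import random


def ols(x, y):
    """One-regressor OLS with intercept; returns (intercept, slope, slope std. error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
    return intercept, slope, math.sqrt(sigma2 / sxx)


# Synthetic data, NOT real countries: all numbers below are made up.
rng = random.Random(0)
complexity = [rng.gauss(0, 1) for _ in range(100)]
log_gdp_1985 = [8 + 0.8 * c + rng.gauss(0, 0.5) for c in complexity]
growth = [0.02 + rng.gauss(0, 0.01) for _ in range(100)]  # unrelated to complexity

# Step 1: income "expected given complexity", and each country's gap from it.
a, b, _ = ols(complexity, log_gdp_1985)
gap = [y - (a + b * c) for c, y in zip(complexity, log_gdp_1985)]

# Step 2: regress subsequent growth on that gap and report the estimate.
_, beta, se = ols(gap, growth)
print(f"beta = {beta:.4f}, se = {se:.4f}, t = {beta / se:.2f}")
```

A real version would add the usual controls (savings rates, population growth) on the right-hand side, and even then the coefficient would only be a correlation.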

Somewhere in there, perhaps invisible behind the blur of waving hands, is some kind of insight into how information expands and builds upon itself. That would have been an interesting contribution to our thinking on growth. But the book, as it is, fails to provide it.

28 thoughts on “Why Information Industrial Classification Diversity Grows”

  1. When looking at economic complexity of exports and RGDP (PPP) per capita, I found that, with the sole exception of North Korea, economic complexity of exports was, indeed, a lower bound on RGDP per capita (just like distance from equator). So I’m guessing that in all capitalist countries, it’s causal.

  2. “Because all those programming activities fall under a single NAICs category, this second town appears to have a less complex economy.”
    -Ah, but all those programming activities are also less ubiquitous than animal food manufacturing, so they probably count as more complex. And I don’t think Hidalgo & friends count services at all (are services shipped through ports&customs?).

  3. FYI: on intra-national complexity there are a couple of recent papers:
    Balland and Rigby on the US: http://econpapers.repec.org/paper/eguwpaper/1502.htm
    Thor Berger and Carl Frey also on the US: http://www.oxfordmartin.ox.ac.uk/downloads/academic/Technology%20Shocks%20and%20Urban%20Evolutions.pdf
    And Koegler et al. on the EU: https://ideas.repec.org/p/egu/wpaper/1515.html

    I’ve been looking for research like this for a while: what I would like to see is more work on “global” cities as many of the metrics I’ve seen are arbitrary and fail to identify “globality” and if that makes any difference. I don’t think an economic analysis is possible even within-country for the reasons you state, but even an in-country analysis would be progress. In the UK we don’t have sub-national trade data which does not help.

    • Nice. As is usual, there is always some literature on a subject I’m unaware of. It’s got to be something that is worth measuring, and technology probably makes this a lot easier now.

  5. Prof. Vollrath,

    I was quite interested in this book; I thought it had scooped me on an idea I’ve been working on! It turns out it didn’t. However, if you are interested in a less hand-wavy application of information theory to growth models, I put together the following post a while ago. I wrote it after Paul Romer read Hidalgo’s book so it functioned as an open letter to him (no response, though).

    http://informationtransfereconomics.blogspot.com/2015/07/maybe-paul-romer-would-be-interested-in.html

    It turns out, for example, the Solow model can be seen as information equilibrium between output, labor and capital (the information in changes in each of those aggregates is communicated faithfully to the others). Actually, the framework reproduces many standard economic models … some turn out to be empirically valid, others less so.

    Cheers,

    Jason Smith

    • I recommend Jason’s link (and its associates on his blog) to anyone who has ever wondered what would happen if someone from another profession decided to extract and play with all the mathematical toys (“toys” was Solow’s term, at one point) to be found in a macro article. Just trying to track all the validity questions raised by such an exercise will either drive you crazy or make you a better person. Certainly the “mathiness” debate takes on a new light.

      And is any of this “informational economics” valid? I can’t claim to know, but my view of what was going on did change. At first I wondered if the “I” that Jason introduces really adds anything to the “A” in the Solovian production function. But trolling through his posts I came to see them as contributions in the tradition of attempts to endogenize “A”, with “A” being renamed (and perhaps liberated) in the process.

      What caught my eye about this work is that – with no apparent knowledge of either the endogenous-technology and human-capital twists in the Solow model debate or of cross-country empirics – Jason comes to the same conclusion reached by Mankiw, Romer, and Weil: that the “capital share” needs to be a lot bigger than traditionally assumed in order to explain the facts. And he has reasons (“reasons” should be in italics) why this would be true (i.e., why capital investments should drive more output per dollar than labor investments) – ones that stand our usual conclusions about rivalrous and non-rivalrous factors on their heads.

      Give these posts a half hour and I guarantee you’ll come away with something to think about.

      • Thank you for your critical review.

        I will say that I am aware of some of the human capital models and cross-national studies (I haven’t analyzed more than the UK and Mexico so far, but came to the same conclusions as for the US). I think a better way to characterize my approach with regard to TFP and/or human capital models was to try:

        Y = f(K)
        Y = f(K, L)
        Y = f(K, L, A)
        Y = f(K, L, A, B)

        in turn until I reached a good empirical description using the model. The process stopped at step 2, so there was no need to include A. But I agree: seen from the history of economic thought, that does seem like an attempt to endogenize A. It is dropping the requirement of constant returns to scale that does most of the work.

        In physics, the argument for atoms being (in-principle) indistinguishable comes from the same place as the argument for constant returns to scale (making observables ‘extensive’), but capital and labor are not in-principle indistinguishable in the same way as atoms (people have names and pieces of equipment have brands and serial numbers). This distinguishability allows the whole to be more than the sum of its parts; it’s called the Gibbs paradox in physics (and is wrong) but I am under the impression it may be a genuine effect in growth economics (and in other areas of macroeconomics).

        Of course, I could be wrong. I’ve just finished up a draft paper that I’d love some feedback on before I submit to the economics e-journal:

        http://informationtransfereconomics.blogspot.com/2015/08/information-equilibrium-as-economic.html

      • I’d have to think harder on that. There is clearly some issue we have in stepping from tangible, observable inputs (that machine with a serial number) to an aggregate production function. It may be that we just have to get comfortable with it being a nebulous notion, and not try so hard to link it to something precisely real.

      • I forgot to add that the reason for output above and beyond constant returns to scale is deeply connected to the measurement of output in terms of money. If you had widget machines, widget machine operators and widgets of output, there would likely be constant returns to scale.

      • Actually, you touch a question here that I have dabbled with: How is growth theory affected by the fact that the “Y” of a production function can’t be reconciled with national income data without a fudge factor (that turns out to be different for each country) to turn the single good into money. Is this important in any way?

        Thanks for the link – I’ll look at your paper and comment separately.

      • You have to get away from thinking of Y as money. It’s not, it’s “real output”. We (unfortunately) made a choice a long time ago to count real output in “real dollars”. But in reality it would be much more clear if we simply picked a single product (a can of Diet Coke, for example), and counted Y in those terms. Output in the U.S. would be reported in Diet Coke equivalents, making it clear that we are (trying) to count real output.

      • Maybe if we *only* produced hard goods. But services are real output, even though they produce nothing tangible, and they could easily have IRS. So I don’t think it’s a money thing that is the problem.

  6. off topic but “we’ve spent twenty years in growth complaining about exactly these kinds of regressions because they are completely unidentified.”

    Exactly! I have been grinding my teeth whilst reading Africa: Why Economists Get It Wrong, which thus far gives the impression this hadn’t occurred to economists.

    I’d like to see you review that book.

  7. Very interesting post! But I would think that the “complexity” measure that Hidalgo uses would tend to underestimate the differences between poor and rich countries, given that the latter are probably disproportionately specialized in those advanced sectors that the NAICS classification fails to split up appropriately. Am I missing something?

    • It could very well underestimate the difference. The point isn’t that complexity is *not* related to GDP per capita, but that the relationship Hidalgo uses tells us nothing about causality. Even if measured perfectly, that would still be a problem.

      • Sure no doubt about that. My point is simply about measurement, i.e. the classification problems that you mention are particularly severe for rich countries, so that they might appear less “diverse” than what they actually are.

        Of course this is far from telling us something about causality! The fact that rich countries export a wider set of products is quite interesting in itself, even though not really new, for example I believe that Hummels and Klenow (2005) dig deeper into that.

  8. Even more off topic:

    “This is like saying that bacteria are less genetically diverse than eukaryotes because bacteria are all in one kingdom, while we happen to classify eukaryotes into 5: protozoa, algae, plants, fungus, and animals.”

    Boy, are you behind the times. It’s all about cladistics now.
    Kingdoms have been superseded by domains, in which bacteria and eukaryotes (and archaea) are given equal billing. It’s almost as though Hidalgo had performed his analysis using SIC codes (instead of NAICS codes)!

    See
    https://en.wikipedia.org/wiki/Domain_(biology) or
    https://en.wikipedia.org/wiki/Kingdom_(biology)#Modern_view

    Get with the program, man!

  9. Dietz: Thanks for the helpful comment above on “real output”, which I initially overlooked. So to summarize, the validity of an aggregate production function is not in its ability to reproduce national accounts (an ability it doesn’t have) but to correlate to national accounts. That’s easy – and easy to lose track of for the reason you mention. (Of course, there is a whole literature on how this correlation is a “statistical artifact”, but that would need to be a whole new thread.) TB

  10. Your criticisms about how to measure complexity seem overdone. The shortcomings of the industrial classification are apparent, but my takeaway is surprise that the Complexity approach works so well despite the way it treats services.

    On whether or not complexity really does “work well”- perhaps this piece by the IMF provides some better empirics: http://www.imf.org/external/pubs/ft/reo/2015/whd/eng/pdf/chap5.pdf

    Also, since you are into Hip Hop, I’ve created this introduction to Hausmann/Hidalgo’s work just for folks like you: http://www.youtube.com/watch?v=ne6OUJPWUpk

    (BTW – I’m a big fan of your blog, thanks for all the great posts)

    • Nice! Love the video. But in reply, the IMF report suffers the same issue. You can arbitrarily measure complexity as high or low depending on how you set the categories. The various categories in the ISIC are not equally specific. I don’t even know how one would ensure they were.

      And what do we mean by works well? Complexity is associated with the level of GDP? That could easily be because the high GDP countries originally wrote the ISIC codes in the 1950s.

      But I can be convinced! I just want to see some better reasoning that the codes are not driving this.
