## DATA AND METHODOLOGY

We recognize that regional-level entrepreneurship is a multifaceted phenomenon in which individual capabilities and actions are contextualized by institutional incentives. This approach proposes that the building blocks (pillars) of entrepreneurial activity cannot be viewed in isolation. On the contrary, they constitute a system in which the final outcome is moderated by the weakest-performing pillar. Economic ecosystems will produce different outcomes in different parts of the world as their agents and institutions interact.^{1}

**1. The structure of the Regional Entrepreneurship and Development Index**

In the following, we define and describe the structure of the REDI. We propose a six-level index structure: (1) sub-indicators, (2) indicators, (3) variables, (4) pillars, (5) sub-indices, and finally (6) the REDI super-index.^{2}

Table 1. The structure of the Regional Entrepreneurship and Development Index

| Sub-indexes (3) | Pillars (14) | National and regional institutional variables | Regional-level individual variables |
|---|---|---|---|
| Entrepreneurial Aspiration | Financing | FINANCIAL INSTITUTIONS | INFORMAL INVESTMENT |
| Entrepreneurial Aspiration | Globalization | CONNECTIVITY | EXPORT |
| Entrepreneurial Aspiration | High growth | CLUSTERING | GAZELLE |
| Entrepreneurial Aspiration | Process innovation | TECHNOLOGY DEVELOPMENT | NEW TECHNOLOGY |
| Entrepreneurial Aspiration | Product innovation | TECHNOLOGY TRANSFER | NEW PRODUCT |
| Entrepreneurial Ability | Competition | BUSINESS STRATEGY | COMPETITORS |
| Entrepreneurial Ability | Human Capital | EDUCATION & TRAINING | EDUCATION LEVEL |
| Entrepreneurial Ability | Technology sector | ABSORPTIVE CAPACITY | TECHNOLOGY LEVEL |
| Entrepreneurial Ability | Opportunity start-up | BUSINESS ENVIRONMENT | OPPORTUNITY MOTIVATION |
| Entrepreneurial Attitudes | Cultural support | OPEN SOCIETY | CAREER STATUS |
| Entrepreneurial Attitudes | Networking | SOCIAL CAPITAL | KNOW ENTREPRENEURS |
| Entrepreneurial Attitudes | Risk acceptance | BUSINESS RISK | BUSINESS ACCEPTANCE |
| Entrepreneurial Attitudes | Start-up skills | QUALITY OF EDUCATION | SKILL PERCEPTION |
| Entrepreneurial Attitudes | Opportunity perception | MARKET AGGLOMERATION | OPPORTUNITY RECOGNITION |

*Source:* edited by the authors.

The three sub-indices of attitudes (ATT), abilities (ABT), and aspiration (ASP) constitute the entrepreneurship super-index, which we call REDI. All three sub-indices contain four or five pillars, which can be interpreted as quasi-independent building blocks of this entrepreneurship index. Each of our 14 pillars is the result of the multiplication of an individual variable and an associated institutional variable. In this case, institutional variables can be viewed as particular (country- or regional-level) weights of the individual variables.
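As a minimal sketch of the pillar construction described above, the following Python snippet multiplies an individual variable by its associated institutional variable. The class name, function name, and the numbers are hypothetical illustrations and are not taken from the REDI dataset.

```python
from dataclasses import dataclass


@dataclass
class PillarInputs:
    individual: float     # regional individual variable (e.g. a GEM-based share)
    institutional: float  # paired institutional variable, acting as a weight


def pillar_value(inputs: PillarInputs) -> float:
    """Raw pillar score: the individual variable weighted by its institutional pair."""
    return inputs.individual * inputs.institutional


# Hypothetical values for one region's Networking pillar
# (KNOW ENTREPRENEURS weighted by SOCIAL CAPITAL).
networking = PillarInputs(individual=0.40, institutional=0.75)
raw_networking = pillar_value(networking)
```

Under this reading, a region with a strong individual variable but a weak institutional environment still receives a low raw pillar score, because the institutional variable scales the individual one down.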

**2. REDI individual and institutional data description**

Our index incorporates both individual-level and institutional variables. Individual-level variables are based on indicators from the GEM Adult Population Survey dataset, except for two innovation indicators that come from the European Union data collection^{3}. For this report we used the pooled 2007-2011 GEM data^{4}. In most cases (eleven out of fourteen) the individual indicators were used directly as variables. In the remaining three cases we multiplied two indicators to calculate the variable. The New Product and New Technology variables each combine a GEM-based indicator with a regional-level innovation indicator derived from the Poli-KIT database (Capello and Lenzi, 2013). The Prod Innovation and Tech Innovation indicators serve to correct for the potential bias in GEM’s self-assessed questionnaire. The Informal Investment variable is the product of the mean amount of informal investment (Informal Investment Mean) and the prevalence of informal investment (Business Angel), both of which come from the GEM survey. Informal Investment therefore combines two aspects of informal finance, providing a more accurate measure of the availability of start-up capital in a region. The main concern for the individual variables is the availability of a representative sample size for each region.

The adaptation of institutional variables for regional analysis is more complicated. Since the GEM dataset lacks the necessary institutional variables, we complement it with other widely used, relevant data derived from different sources^{5}. There are two types of institutional variables: country-level and regional. Our original idea was to construct the institutional variables from fourteen country-wide and fourteen regional indicators. The latter would have reflected spillovers that operate mainly in geographic areas smaller than a country. However, in many cases the data were not available. Several options exist to overcome this limitation. One is to use closely correlated regional proxies to substitute for a missing variable. Another is simply to use the same country-level institutional variable for all regions of a country. Where this method is used, variation in the pillar value within a country corresponds entirely to variation in the individual-level variable. Though the within-country institutional variance is then missing, it is likely that the variance of institutional variables within a country is much lower than the variance between countries. Given the lack of regional institutional data for the REDI pillars, we applied a mixed method incorporating all three alternative approaches. The idea behind the construction of the regional entrepreneurship index is to find regional-level institutional data that are also available at the country level. If the regional institutional data were lacking, regional proxy and/or country-level institutional data were applied. The selection criteria for a particular institutional variable were:

- The potential to link logically to the particular entrepreneurship variable.
- The clear interpretation and explanatory power of the selected variable; for example, we have had interpretation problems with the taxation variables.
- Avoiding the appearance of the same factor more than once in the different institutional variables^{6}.
- The pillar created with the particular variable should correlate positively with the REDI.

Finally, we ended up with nine country-wide and thirteen regional indicators. Institutional variables are more complex; some of them contain many sub-indicators. In total, 76 sub-indicators are the basic building units of the institutional indicators and variables, and of the 40 variables we used to calculate the REDI scores for the mix of 125 NUTS1 and NUTS2 regions in 24 European Union countries.

A potential criticism of our method, as with any other index, might be the apparently arbitrary selection of institutional variables and the neglect of other important factors. In all cases, we aimed to collect and test alternative institutional factors before making our selection. Our choice was constrained by the limited availability of data in many regions. To eliminate potential duplication, instead of using existing complex institutional variables offered by different research agendas, we created our own complex indexes using relevant simple indicators or sub-indicators. In this version, we apply the most recent institutional indicators available on June 30, 2013.

As a general rule for calculating regional-level institutional variables, if data were not available at the NUTS1 level, we calculated the population-weighted mean of the available NUTS2 regions. Where neither NUTS1 nor NUTS2 data were available, NUTS0 (country-level) data were used as substitutes. NUTS0 data were used for Germany, France and Finland because of the lack of Technological Absorption data at the NUTS1/NUTS2 level. We also endeavored to substitute other missing NUTS1 or NUTS2 level data. To handle the extreme distributions of some institutional indicators, we apply the Box-Cox transformation to those indicators whose skewness falls outside the [-1, 1] range (Annoni and Kozovska, 2010, 52-53).
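The skewness screen and Box-Cox step can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say which skewness estimator or which Box-Cox parameter was used, so the moment-based skewness and the default `lam=0.0` (the logarithmic case) are assumptions, and the function names and data are hypothetical.

```python
import math


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (an assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def box_cox(values, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def transform_if_skewed(values, lam=0.0, limit=1.0):
    """Transform only when skewness lies outside [-limit, limit]."""
    if abs(skewness(values)) <= limit:
        return list(values)
    return box_cox(values, lam)


# Hypothetical right-skewed indicator: values double from region to region.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
adjusted = transform_if_skewed(raw)
```

In this constructed example the raw skewness exceeds 1, so the transform is applied, and the logarithm turns the doubling sequence into evenly spaced values with skewness of essentially zero. Real indicators would typically need a data-driven choice of lambda.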

**3. The creation of the Regional Entrepreneurship and Development Index**

We have defined entrepreneurship as the dynamic interaction of entrepreneurial attitudes, abilities, and aspirations and developed the Penalty for Bottleneck (PFB) methodology for measuring and quantifying these interactions (Acs et al., 2013; Rappai and Szerb, 2011). A bottleneck is defined as the worst-performing element (the weakest link), or binding constraint, in the system. With respect to entrepreneurship, by bottleneck we mean a shortage or the lowest level of a particular entrepreneurial indicator as compared to the other indicators of the index. This notion of a bottleneck is important for policy purposes. Our model suggests that the attitude, ability and aspiration pillars interact, and if they are out of balance, entrepreneurship is inhibited.

The sub-indices are composed of four or five components, defined as pillars, which should be adjusted in a way that takes this notion of balance into account. After normalizing the scores of all the pillars, the value of each pillar in a region is penalized by linking it to the score of the weakest-performing pillar in that region. This simulates the notion of a bottleneck: if the weakest pillar were improved, the particular sub-index and ultimately the whole REDI would show a significant improvement. By contrast, improving a relatively high pillar value will presumably enhance only the value of that pillar itself, so a much smaller increase in the overall REDI can be anticipated. Moreover, the penalty should be higher if the differences are larger. From either the configuration or the weakest-link perspective, this implies that stable and efficient configurations are those that are balanced (have about the same level) in all pillars. Mathematically, we model the penalty for bottlenecks by modifying Tarabusi and Palazzi's (2004) original function for our purposes. The penalty function is defined as:

$$h_{i,j} = \min_{j} y_{i,j} + \left(1 - e^{-\left(y_{i,j} - \min_{j} y_{i,j}\right)}\right) \tag{1}$$

where $h_{i,j}$ is the modified, post-penalty value of pillar $j$ in region $i$,

$y_{i,j}$ is the normalized value of index component $j$ in region $i$,

$y_{\min} = \min_{j} y_{i,j}$ is the lowest value of $y_{i,j}$ for region $i$,

$i = 1, 2, \ldots, n$, where $n$ is the number of regions,

$j = 1, 2, \ldots, m$, where $m$ is the number of pillars.
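The penalty can be illustrated with a short Python sketch: each pillar is replaced by the region's weakest pillar plus one minus the exponential of the negative gap to that weakest pillar. The function name and the sample pillar values are hypothetical.

```python
import math


def penalize(pillars):
    """Apply the Penalty for Bottleneck to one region's normalized pillar values."""
    weakest = min(pillars)
    # Each pillar is pulled toward the weakest one; the larger the gap,
    # the larger the loss relative to the original value.
    return [weakest + (1.0 - math.exp(-(y - weakest))) for y in pillars]


balanced = penalize([0.5, 0.5, 0.5])    # no gaps, so nothing is penalized
unbalanced = penalize([0.2, 0.5, 0.8])  # gaps of 0.0, 0.3 and 0.6 to the weakest pillar
```

With these inputs the balanced profile is unchanged, while in the unbalanced profile the weakest pillar stays at 0.2 and the other two fall to roughly 0.459 and 0.651. Both profiles have the same unpenalized mean of 0.5, but the unbalanced one ends up with a lower penalized mean.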

The advantage of this method is that it is analytical and therefore not sensitive to the size of the sample. The PFB method has two potential drawbacks. One is the arbitrary selection of the magnitude of the penalty. The other is that we cannot fully exclude the possibility that a particularly good feature has a positive effect on the weaker-performing features. While this could happen, most entrepreneurship policy experts hold that policy should focus on improving the weakest link in the system. Altogether, we claim that the PFB methodology is theoretically better than the arithmetic average calculation. However, the PFB-adjusted index is not necessarily an optimal solution, since the appropriate magnitude of the penalty is unknown. The most important message for economic development policy is that improvement can only be achieved by strengthening the weakest link of the system, which has a constraining effect on the other pillars.

All index building is based on a benchmarking principle. The selection of the benchmark considerably influences the index points and also the ranking of the regions. However, outliers can lead to inappropriate benchmarks, so extreme values need to be handled. Capping is a frequently used tool for handling outliers; the question is the value of the cap. In our case we selected a 95th percentile adjustment, meaning that any observed value higher than the 95th percentile is lowered to the 95th percentile. It also means that at least five percent of the regions reach the maximum value in each of the 14 pillars.
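The capping step can be sketched in Python as below. The text does not state which percentile definition was used, so the linear-interpolation rule here is an assumption, and the function names and sample scores are hypothetical.

```python
def percentile(values, q):
    """Percentile by linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def cap_at_percentile(values, q=95.0):
    """Lower every value above the q-th percentile to that percentile."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]


# Hypothetical raw pillar values for 21 regions, one of them an extreme outlier.
scores = [float(v) for v in range(1, 21)] + [100.0]
capped = cap_at_percentile(scores)
```

Here the outlier of 100 is pulled down to the 95th percentile value of 20, so the benchmark used in the later normalization is set by the capped value instead of by the single extreme region.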

Like other composite index components, our pillars are measured on different scales. To bring them into the same range, the pillars must be normalized. After handling the outliers, we normalize the pillar values using a distance normalization technique that preserves the relative differences among the regions:

$$x_{i,j} = \frac{z_{i,j}}{\max_{i} z_{i,j}} \tag{2}$$

for all $j = 1, \ldots, m$, where $m = 14$ is the number of pillars,

where $x_{i,j}$ is the normalized score value for region $i$ and pillar $j$,

$z_{i,j}$ is the original pillar value for region $i$ and pillar $j$,

$\max_{i} z_{i,j}$ is the maximum value for pillar $j$.

Applying the distance methodology, the pillar values are all in the range [0, 1]; however, the lowest pillar value is not necessarily equal to 0. In this case all regions’ efforts are evaluated in relation to the benchmarking region, but the worst region is not set to zero per se.
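A minimal Python sketch of the distance normalization, assuming (as the definitions above suggest) that each capped pillar value is divided by the pillar's maximum across regions. The function name and sample values are hypothetical.

```python
def normalize_pillar(values):
    """Divide each capped pillar value by the pillar's maximum across regions."""
    top = max(values)
    if top <= 0:
        raise ValueError("pillar maximum must be positive")
    return [v / top for v in values]


# Hypothetical capped values of one pillar for five regions.
capped_pillar = [2.0, 5.0, 8.0, 10.0, 10.0]
normalized = normalize_pillar(capped_pillar)
```

The benchmark regions receive a score of 1, ratios between regions are preserved, and the weakest region keeps a positive score (0.2 here) because the minimum is not shifted to zero.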

The different averages of the normalized values of the 14 pillars imply that reaching the same performance requires different effort and, consequently, different resources. A higher average value (e.g. Opportunity start-up) could mean that it is easier to reach better scores than for a pillar with a lower average value (e.g. Financing). Since we want to apply the REDI for public policy purposes, the additional resources needed for the same marginal improvement should, on average, be the same for all 14 pillars. So improving Opportunity start-up by 0.1 units should require the same additional resources as improving any of the other 13 pillars by 0.1 units. As a consequence, we need a transformation to equate the average values of the 14 pillars. In practice, we calculated the average values of the 14 pillars after the capping adjustment and the normalization and made the following average adjustment. Let $x_{i,j}$ be the normalized score for region $i$ for a particular pillar $j$. The arithmetic average of pillar $j$ over the $n$ regions is:

$$\bar{x}_{j} = \frac{\sum_{i=1}^{n} x_{i,j}}{n} \quad \text{for all } j \tag{4}$$

We want to transform the $x_{i,j}$ values such that the transformed values remain in the [0, 1] range:

$$y_{i,j} = x_{i,j}^{k} \tag{5}$$

where $k$ is the “strength of adjustment”, chosen so that the $k$-th moment of $X_{j}$ is exactly the needed average, $\bar{y}_{j}$. We have to find the root of the following equation for $k$:

$$\sum_{i=1}^{n} x_{i,j}^{k} - n\,\bar{y}_{j} = 0 \tag{6}$$

It is easy to see, based on the previous conditions and derivatives, that the function is decreasing and convex, which means it can be solved quickly using the well-known Newton-Raphson method with an initial guess of 0. After obtaining $k$, the computations are straightforward. Note that

$$\bar{x}_{j} < \bar{y}_{j} \Rightarrow k < 1, \qquad \bar{x}_{j} = \bar{y}_{j} \Rightarrow k = 1, \qquad \bar{x}_{j} > \bar{y}_{j} \Rightarrow k > 1,$$

that is, $k$ can be thought of as the strength (and direction) of adjustment.
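The root-finding step can be sketched in Python as follows, under the assumption that the adjustment raises each normalized score to a common power k chosen so that the pillar average hits a target value. The function names, the tolerance, the sample scores, and the target average of 0.5 are hypothetical, and the sketch assumes all scores are strictly positive so that the logarithm in the derivative is defined.

```python
import math


def solve_k(x, target_mean, tol=1e-12, max_iter=100):
    """Newton-Raphson search for k such that mean(x_i ** k) equals target_mean.

    Assumes every x_i lies in (0, 1] and at least one x_i is below 1.
    """
    n = len(x)
    k = 0.0  # initial guess of 0, as suggested in the text
    for _ in range(max_iter):
        f = sum(v ** k for v in x) - n * target_mean
        df = sum((v ** k) * math.log(v) for v in x)  # derivative of f with respect to k
        step = f / df
        k -= step
        if abs(step) < tol:
            break
    return k


def adjust_pillar(x, target_mean):
    """Return the exponent k and the average-adjusted pillar values x_i ** k."""
    k = solve_k(x, target_mean)
    return k, [v ** k for v in x]


# Hypothetical normalized pillar with mean 0.6, adjusted to a target mean of 0.5.
pillar = [0.2, 0.4, 0.6, 0.8, 1.0]
k, adjusted = adjust_pillar(pillar, target_mean=0.5)
```

Because the pillar's original average exceeds the target, the solved exponent is greater than 1 and every score below 1 is lowered, while the benchmark region keeps its score of 1. Starting from 0, where the function is positive, the iterates approach the root from the left, which is why the convexity noted in the text makes the method well behaved.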

After these transformations, the PFB methodology was used to create PFB-adjusted pillar values according to the exponential function given above (see equation 1). Due to the average pillar adjustment, the marginal rate of substitution becomes the same for all indicators. However, the real substitution rate between the pillar values of a particular region depends on the ratio of the weakest pillar to the other pillars. Most importantly, the penalty function should reflect the magnitude of the imbalance: a lower difference implies a lower penalty, while a higher imbalance implies a higher penalty. The penalty function also reflects how far a gain in one pillar can compensate for a loss in another. The value of a sub-index for any region was then calculated as the arithmetic average of its PFB-adjusted pillars for that sub-index, multiplied by 100 to obtain a 100-point scale.

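$$ATT_{i} = 100 \cdot \frac{1}{5} \sum_{j \in ATT} h_{i,j} \tag{7a}$$

$$ABT_{i} = 100 \cdot \frac{1}{4} \sum_{j \in ABT} h_{i,j} \tag{7b}$$

$$ASP_{i} = 100 \cdot \frac{1}{5} \sum_{j \in ASP} h_{i,j} \tag{7c}$$

where $h_{i,j}$ is the modified, post-penalty value of pillar $j$ in region $i$, and the sums run over the five attitude, four ability, and five aspiration pillars listed in Table 1,

$i = 1, 2, \ldots, n$, where $n$ is the number of regions,

$j = 1, 2, \ldots, 14$, where 14 is the number of pillars.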

The REDI super-index is simply the arithmetic average of the three sub-indices:

$$REDI_{i} = \tfrac{1}{3}\left(ATT_{i} + ABT_{i} + ASP_{i}\right) \tag{8}$$

where $i = 1, 2, \ldots, n$, and $n$ is the number of regions.
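The final aggregation can be sketched in Python: each sub-index is the arithmetic average of its post-penalty pillar values scaled to 100 points, and the REDI is the simple average of the three sub-indices. The function names and the pillar values for the single region are hypothetical.

```python
def sub_index(penalized_pillars):
    """Arithmetic mean of PFB-adjusted pillar values, scaled to a 100-point range."""
    return 100.0 * sum(penalized_pillars) / len(penalized_pillars)


def redi(att, abt, asp):
    """REDI super-index: simple average of the three sub-indices."""
    return (att + abt + asp) / 3.0


# Hypothetical post-penalty pillar values for one region.
att_pillars = [0.50, 0.60, 0.55, 0.45, 0.40]  # five attitude pillars
abt_pillars = [0.30, 0.35, 0.40, 0.35]        # four ability pillars
asp_pillars = [0.20, 0.25, 0.30, 0.25, 0.25]  # five aspiration pillars

att = sub_index(att_pillars)
abt = sub_index(abt_pillars)
asp = sub_index(asp_pillars)
score = redi(att, abt, asp)
```

For these illustrative inputs the sub-indices are 50, 35 and 25 points, giving a REDI of about 36.7. Note that the three sub-indices carry equal weight even though the ability sub-index has only four pillars.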

Since 100 represents the theoretical upper limit, the REDI points can also be interpreted as a measure of the efficiency of entrepreneurship resources (Lafuente, Szerb and Acs, 2015).