Introduction

Socio-economic changes in society have become a driving factor of rising demand for healthcare services. The objective deficit of public healthcare entities has led to dynamic development in the private healthcare sector, especially in the area of new private medical businesses. The environment for private medical businesses consists of factors having influence on new business formation, in both positive and negative ways. Hence, there are questions not only about overcoming negative factors perceived as threats, but also about the factors perceived as opportunities for formation of new private medical businesses.

The main purpose of this research study is to identify strategic factors which have direct influence on entrepreneurship in the private healthcare sector. The complexity of the phenomenon imposes implementation of an unconventional approach in this field of exploration.

Our approach is based on Intelligent Data Analysis (IDA), a methodology that includes a set of techniques for extracting useful knowledge from large amounts of data. In order to indicate the most important factors of new business formation in the private healthcare sector, we applied explanation techniques, namely decision rules, to express these relationships.

The research allowed us to identify and describe the variable that plays the crucial role in explaining why new firms are established in the healthcare sector. The study links the explanatory variables with the type of municipality, and answers the question of which factors are responsible for entrepreneurship in the private healthcare sector depending on the municipality type. Moreover, the results show that slightly different factors are responsible for successful entrepreneurship support in municipalities with different numbers of already existing private healthcare entities.

Summarizing, this study showed which variables are important for the given category of municipality and for the level of entrepreneurship in that municipality.

Literature review

It is widely recognized that the success and vitality of entrepreneurship are essential factors in measuring an economy’s progress, its quality and its future expectations. Entrepreneurship is closely related to SMEs (Small & Medium Enterprises) and large companies in local, regional, national or international markets, in private and public organizations and helps lead to competitiveness in the face of the effects of globalization. Entrepreneurial activities are important in creating new economic activities which in turn increases innovation, employment, economic wealth and growth, consolidates competitiveness in advanced economies and assures social welfare in less economically developed countries (Audretsch et al. 2005, p. 5).

Entrepreneurship is one of the most important forces shaping the changes in an economic area, regardless of whether it occurs within the framework of the formal structure of the economy or takes place informally outside state regulatory systems (Carree and Thurik 2010; Thurik et al. 2002; Williams and Nadin 2010 etc.).

Numerous definitions of entrepreneurship are used in the literature. P. Drucker’s definition of entrepreneurship is: an act of innovation that involves endowing existing resources with new wealth-producing capacity (Drucker 1985). S. Shane and S. Venkataraman define entrepreneurship as the issue which involves the nexus of two phenomena: the presence of lucrative opportunities and the presence of enterprising individuals (Shane and Venkataraman 2000, p. 218). In Williams and Thomson’s (1998) definition, entrepreneurship is related to productivity and it is assumed that the entrepreneurs are responsible for determining optimal production, investment, and financing decisions. For J.A. Schumpeter, the entrepreneur was identified with the function of carrying out new transformations and combinations which were usually embodied in new firms, which arise not out of the existing firms but grow up beside them (Schumpeter 1961, pp. 66–78).

Starting a business is not an event, but a process which may take many years to evolve and come to fruition. Entrepreneurial research has developed along two main lines:

  1. the personal characteristics or traits of the entrepreneur; and

  2. the influence of social, cultural, political and economic contextual factors (Mazzarol et al. 1999, p. 49).

Research following the first approach, which covers personal characteristics, was conducted by A.T. Robinson and L.D. Marino. First, their study gives empirical evidence that overconfidence is significantly related to venture creation decisions. According to the results, when overconfidence increases, venture creation decisions will increase as well. Second, the relationship between overconfidence and venture creation decisions is partially mediated by risk perceptions (Robinson and Marino 2015).

F. Miralles et al. conducted a research study on the following question: how can individuals' engagement in the actual behavior lead to differences in perceptions and other antecedents of intention? The results are as follows: actual behavior could be a source of differences across individuals, specifically if we also take into consideration different age brackets. The findings suggest that being exposed to the actual behavior of entrepreneurship would strengthen the influence of personal attitude and perceived behavioral control on entrepreneurial intention for younger individuals, while it would weaken the relationship between perceived behavioral control and entrepreneurial intention for older individuals (Miralles et al. 2017, p. 899). Similar research studies dedicated to individual characteristics of entrepreneurs (including background such as gender, age, civil status, educational level, entrepreneurial culture and other important success factors) were conducted by Stringa et al. (2009) and Orlandi (2017).

Theory development and research into the relationship between the environment and organisation formation is a more recent event. Advocates of this approach believe that the entrepreneurial trait perspective has reached a dead end (Aldrich 1990) and has partially contributed to the understanding of new firm formation. The study of the role of the environment, the so-called rates or demand perspective (Peterson 1980; Richardson 2001; Richardson and Peacock 2006), is seen as a more viable approach. While not denying the role played by the founders’ characteristics, the demand perspective proposes that the environment is more important in understanding organisation formation.

The environment plays a crucial role in the formation of entrepreneurship. Timmons’ (1989) paper suggests that external factors have impact on the success of entrepreneurship. Furthermore, he assumes that the key to successful entrepreneurship is determining and applying the opportunities and being able to match the situation and organization to the important players. In turn, Kuratko and Hodgetts (1998) assume that entrepreneurship is made up of multidimensional processes including the impact on environment (internal and external as well), organizations, and individuals. According to their concept, the external environment consists of two parts: the societal environment (including economic, political, legal and technological forces), and task environment (which is related to the specific industry environment).

With regard to the societal environment, E. Hormiga and A. Bolívar-Cruz have examined the question of whether the ‘migrant condition’ (that is, the experience of being an immigrant) has an impact on the perception of the risks involved in engaging in new business activity. The results are as follows: immigrants are less likely than natives to perceive risk in starting a new business. What is more, the results show a negative relationship between the perception of risk and the formation of a new business, thus confirming that tolerance to risk is a crucial characteristic of entrepreneurs (Hormiga and Bolívar-Cruz 2014, p. 313).

P. A. Nylund and B. Cohen indicate in their paper that collision density is indeed a crucial factor for the development of entrepreneurial ecosystems. In this work, collision density was defined as the potential frequency of interdisciplinary interactions and was used to explain the dynamic growth of entrepreneurial ecosystems (Nylund and Cohen 2017).

The multidimensionality of the definition of entrepreneurship can be found both in the way it is defined and in the way it is measured. A frequently implemented approach uses economic definitions of entrepreneurship based on two functions: the entrepreneur and the perception of economic opportunities and innovations. The second approach, in turn, uses definitions from the managerial world, where entrepreneurship is related to a way of managing. With regard to the second area of multidimensionality (measurement), two perspectives are suggested: static and dynamic. Business ownership and self-employment are frequently considered equivalent to entrepreneurship, and these types of measures can be the basis for static indicators (Carree et al. 2002; Uhlaner and Thurik 2007). From the point of view of the second (dynamic) perspective, the proposed measures of entrepreneurship are based on latent (preference), nascent and start-up activity (Grilo and Irigoyen 2006).

The healthcare sector is similar to others in the area of environment conditions, structure and strategies. Detailed commonalities of the healthcare sector and others in the area of environment include turbulence, inflexibility, and high competitiveness. In turn, structural similarities include new entrants, mergers and consolidation. The latter strategies have moved in the direction of cost accounting and strategic alliances. Accordingly, the healthcare sector is determined by unstable and ruthless environment circumstances. In light of these environment variables, healthcare has undergone structural and strategic changes and innovations to achieve organizational economies of scale, improve utilization of resources, enhance access to capital, increase political power and extend the scope of the market (Zuckerman et al. 2000).

Entrepreneurship research studies are highly recommended for the healthcare sector, as nowadays owners of medical entities perform entrepreneurial activities in order to generate innovative strategies and achieve a competitive advantage in the conditions of the turbulent environment. Chicken (2000) clarifies that businesses conduct entrepreneurial accomplishments for the exploitation of revenues or benefits. In private entities, entrepreneurial actions affect profit measured by monetary terms. As further healthcare entities convert to for-profit status, entrepreneurial activities would arise when they compete for market share or profit. In not-for-profit healthcare organizations, the benefit of medical treatment can be seen through the prism of organizational existence, reputation, development and chances. These circumstances also involve multidimensional strategies and necessitate the implementation of entrepreneurship in healthcare entities. Moreover, the complex healthcare environment needs more inventive solutions. Therefore, healthcare entities are beginning to exploit entrepreneurship in their management techniques.

An example of the applicability of entrepreneurship to the healthcare sector was described by Chicken (2000). He offers a number of entrepreneurial activities for a range of sectors. For instance, he finds these activities in financial services (banking and insurance sectors), manufacturing, agriculture, transportation, mining, fishing, hotels, media, civil services and government. He further summarizes that entrepreneurial activities occur under three circumstances. First, operations must be carried out in the open market. Second, some operations must be funded or subsidized by government. Third, operations could be completely funded by government. Using this formula, it is clear that entrepreneurial activities can occur in the healthcare sector, since healthcare organizational activities satisfy the first two criteria (Guo 2003, p. 50).

To sum up, multidimensional phases are required to assess the environment and organization prior to making changes and implementing innovative strategies. Indeed, entrepreneurship is applicable to the healthcare sector as it has been successfully utilized in other sectors. A gap can be identified between the theory of entrepreneurship in healthcare organizations and research studies in the area of formation. This research study, therefore, aims to fill the research gap (between theory and practice) by exploring the set of factors affecting new business formation in the private healthcare sector, answering the question of which factors are the most important for new business formation in healthcare, especially in the private sector. To do this, an explanation technique, rule induction, was applied to these factors.

Rule induction

Rule induction – one of the fundamental tools of Data Mining – allows for easy interpretation of dependences hidden in data. Usually rules are expressed in the following form:

$$ \mathrm{IF}\ \left({\mathrm{attribute}}_1,{\mathrm{value}}_1\right)\ \mathrm{AND}\dots \mathrm{AND}\ \left({\mathrm{attribute}}_{\mathrm{n}},{\mathrm{value}}_{\mathrm{n}}\right)\ \mathrm{THEN}\ \left(\mathrm{decision},\mathrm{values}\right). $$
(1)

Data from which rules are induced are usually presented in the form of a decision table (Pawlak 1982). Rows of the table represent a nonempty and finite set of cases (also called objects or examples), while columns represent a nonempty and finite set of variables. Each variable has a finite set of values. Independent variables are called attributes and the dependent variable is called a decision. The set of objects with the same value of the decision attribute is called a decision class (or concept).

The construction of the elementary conditions (attribute_i, value_i) in (1) may vary and depends on the rule induction algorithm. The most popular technique is rule induction using a sequential covering algorithm (Clark and Niblett 1989; Han and Kamber 1986), which creates as many rules as needed to ensure that every object in the training data is covered by at least one rule. Other techniques are connected with induction based on rough set theory (Pawlak 2002; Grzymała-Busse and Yao 2011) or induction based on other formalisms of knowledge representation (Quinlan 1986; Cohen 1995; Carvalho and Freitas 2004; Mroczek and Hippe 2015). In general, the aim of the majority of rule induction algorithms is to find the minimal set of classification rules which cover and correctly predict the decision classes of a given set of examples.
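The rule form (1) and the notion of a rule "covering" an object can be illustrated with a minimal sketch. The attribute names, values and the toy decision table below are hypothetical and serve only as an illustration; they are not taken from the data set used in this study.

```python
from typing import Dict, List, Tuple

Case = Dict[str, str]
Condition = Tuple[str, str]  # one elementary condition: (attribute, value)


def covers(conditions: List[Condition], case: Case) -> bool:
    """Return True when the case satisfies every elementary condition."""
    return all(case.get(attribute) == value for attribute, value in conditions)


# Hypothetical toy decision table (illustrative names and values only).
decision_table: List[Case] = [
    {"population_density": "high", "municipality_type": "urban", "decision": "high"},
    {"population_density": "low", "municipality_type": "rural", "decision": "low"},
    {"population_density": "high", "municipality_type": "rural", "decision": "low"},
]

# IF (population_density, high) AND (municipality_type, urban) THEN (decision, high)
rule_conditions: List[Condition] = [
    ("population_density", "high"),
    ("municipality_type", "urban"),
]
rule_decision: Condition = ("decision", "high")

# Cases covered by the rule, i.e. matching all of its conditions.
covered_cases = [case for case in decision_table if covers(rule_conditions, case)]
print(len(covered_cases))
```

A sequential covering algorithm would, under this representation, keep adding rules until every case in the training data is covered by at least one of them.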

Rule quality measures

Many measures of rule quality assessment concern the relation between a decision rule and a class. Examples satisfying all elementary conditions are assigned to the concept indicated in the rule conclusion. The positive objects are those belonging to the decision class pointed out in the rule conclusion. The negative objects are the remaining ones. The relations can be presented in the form of a contingency table.

Let p denote the number of positive examples covered by the rule and P denote the number of all positive examples in the training set. Let n denote the number of negative examples covered by the rule and N denote the number of all negative examples. The contingency matrix for a rule has the following form:

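$$ \begin{array}{l|cc|c} & \mathrm{positive} & \mathrm{negative} & \mathrm{total} \\ \hline \mathrm{covered\ by\ the\ rule} & p & n & p+n \\ \mathrm{not\ covered\ by\ the\ rule} & P-p & N-n & P+N-p-n \\ \hline \mathrm{total} & P & N & P+N \end{array} $$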

where p + n is the number of objects recognized (covered) by the rule; P + N - p - n is the number of objects not recognized by the rule; P is the number of objects which belong to the decision class described by the rule and N is the number of objects which do not belong to the decision class described by the rule.

There are two basic rule quality measures - accuracy and coverage:

$$ Acc=\frac{p}{p+n} $$
(2)
$$ Cov=\frac{p}{P} $$
(3)

Accuracy reflects the correctness of the rule, while coverage reflects its applicability. The two measures are not independent of each other, and when considered simultaneously they give a complete view of rule quality. It is desirable for a rule to be accurate as well as to have a high degree of coverage. However, as accuracy increases, rule coverage tends to decrease. Therefore, defining rule quality measures requires a large number of tests that take accuracy and coverage into account at the same time. Taking into consideration the origin of the data, an additional rule quality measure was used in the experiments.
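Formulas (2) and (3) can be checked with a short sketch. The function name and the example counts are assumptions made for illustration; they are not results of this study.

```python
from typing import Tuple


def accuracy_and_coverage(p: int, n: int, P: int) -> Tuple[float, float]:
    """Compute Acc = p / (p + n) and Cov = p / P for a single rule.

    p: positive examples covered by the rule
    n: negative examples covered by the rule
    P: all positive examples in the training set
    """
    if p + n == 0:
        raise ValueError("the rule covers no examples")
    if P == 0:
        raise ValueError("the decision class has no positive examples")
    return p / (p + n), p / P


# Hypothetical counts: a general rule and a more specific one.
acc_general, cov_general = accuracy_and_coverage(p=30, n=10, P=60)
acc_specific, cov_specific = accuracy_and_coverage(p=20, n=0, P=60)
print(acc_general, cov_general)
print(acc_specific, cov_specific)
```

In this made-up example the more specific rule is more accurate but covers fewer positive examples, which is the trade-off described above.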

The entropy of a variable v (attribute or decision) with values v_1, v_2, …, v_n is defined by the following formula:

$$ Info(U)=-{\sum}_{i=1}^np\left({v}_i\right)\bullet \log p\left({v}_i\right) $$
(4)

where U is the set of all cases in a data set and p(v_i) is the probability (relative frequency) of value v_i in the set U, i = 1, 2, …, n. The entropy of a set is understood as the number of information points necessary to communicate whether or not a certain training object belongs to the decision class described by the rule. The number of information points necessary to communicate whether or not a certain object is recognized by the rule is, in turn, the conditional entropy of the decision d given an attribute a:

$$ Info\left(d|a\right)=-{\sum}_{j=1}^mp\left({a}_j\right)\bullet {\sum}_{i=1}^np\left({d}_i|{a}_j\right)\bullet \log p\left({d}_i|{a}_j\right) $$
(5)

where a_1, a_2, …, a_m are all values of a and d_1, d_2, …, d_n are all values of d.
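Formulas (4) and (5) can be sketched as follows, with probabilities estimated as relative frequencies. The base of the logarithm is not stated in the text; base 2 is assumed here, and the example values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """Info(U) = -sum_i p(v_i) * log p(v_i), using relative frequencies."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def conditional_entropy(decisions: Sequence[str], attribute: Sequence[str]) -> float:
    """Info(d|a) = -sum_j p(a_j) * sum_i p(d_i|a_j) * log p(d_i|a_j)."""
    total = len(decisions)
    groups: Dict[str, List[str]] = {}
    for d, a in zip(decisions, attribute):
        groups.setdefault(a, []).append(d)
    # Each inner sum is the entropy of the decision within one attribute value.
    return sum(len(group) / total * entropy(group) for group in groups.values())


# Hypothetical example: the first attribute determines the decision, the second does not.
decisions = ["high", "high", "low", "low"]
informative = ["urban", "urban", "rural", "rural"]
uninformative = ["a", "b", "a", "b"]

print(entropy(decisions))
print(conditional_entropy(decisions, informative))
print(conditional_entropy(decisions, uninformative))
```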

Rule induction algorithm

A modified version of Quinlan’s classification model (Quinlan 1986), called C5.0, was used for rule induction (generation). The algorithm splits the objects maximizing the information gain. Information gain is based on the idea of entropy, a measure of uncertainty from information theory. Each subsample defined by the first split is then split again and the process repeats until the subsamples cannot be split any further. Finally, the lowest-level splits are reexamined, and those that do not contribute significantly to the value of the model are removed or pruned (Pang and Gong 2009; Pandya and Pandya 2015).
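The split criterion described above can be illustrated by a minimal sketch that selects the attribute with the largest information gain, that is, the difference between (4) and (5). This is not the C5.0 implementation, which is considerably more elaborate; the function names, the base-2 logarithm and the toy rows are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """Entropy of a list of values, estimated from relative frequencies."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def information_gain(rows: List[Dict[str, str]], attribute: str, decision: str) -> float:
    """Gain = Info(d) - Info(d | attribute)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[decision])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[decision] for row in rows]) - remainder


def best_split(rows: List[Dict[str, str]], attributes: List[str], decision: str) -> str:
    """Return the attribute whose split maximizes the information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, decision))


# Hypothetical toy data: "density" separates the classes, "type" does not.
rows = [
    {"density": "high", "type": "urban", "level": "high"},
    {"density": "high", "type": "rural", "level": "high"},
    {"density": "low", "type": "urban", "level": "low"},
    {"density": "low", "type": "rural", "level": "low"},
]
chosen = best_split(rows, ["density", "type"], "level")
print(chosen)
```

A tree builder would apply `best_split` recursively to each resulting subsample until no further split is possible, and then prune the lowest-level splits.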

Data and methodology

The research study covered a period of six years. The base year was 2011. The data source was the publicly available statistics of the Local Data Bank (LDB) of the Central Statistical Office (CSO). The base year was selected because in that year the Act of 15 April 2011 on medical activity (Dz.U.2016 poz. 1638), which modified the organization of the Polish healthcare sector, began to apply. The last year of the analysis was 2016, because it was the last year for which statistics were available.

The research was designed for the entire population of municipalities in Poland. Municipal-level government offers the broadest range of instruments supporting entrepreneurship. Accordingly, the scale of its impact on bottom-up opportunities to create entrepreneurship is the largest (Bania and Dahlke 2014; Kogut-Jaworska 2008; Wyszkowska 2012). As of 2011 there were 2479 municipalities: 306 urban municipalities, 1571 rural municipalities and 602 urban-rural municipalities (available at: http://eteryt.stat.gov.pl/eteryt/raporty/WebRaportZestawienie.aspx, date of access 1st October 2017). Ultimately, due to the lack of data for some units, 2408 municipalities were selected. The research sample was formed by all the municipalities in Poland for which data describing entrepreneurship in the private healthcare sector were available in 2011–2016. Taking into account the above criteria, the estimations were made on a sample of 302 urban municipalities, 602 urban-rural municipalities and 1504 rural municipalities.

The explained variable (decision) was entrepreneurship in the private healthcare sector, described by an indicator defined as the number of newly registered private economic entities in Section Q in relation to the working-age population in the municipality. The sample of 2408 cases was divided according to the value of the entrepreneurship indicator. Taking the accepted criteria into account in the classification of municipalities allowed us to divide the examined units in a way that differentiated individual municipalities (Table 1).

Table 1 Number of municipalities according to the divisions of entrepreneurship indicator in 2011–2016

The division was conducted using two criteria: a) the value of the entrepreneurship indicator, b) the type of municipality (urban, urban-rural, rural). The argument for such a division is as follows: equally low levels of entrepreneurship cannot be evaluated in the same way in urban and rural municipalities. A similar situation arises when comparing two municipalities with low and high levels of entrepreneurship, where the first is a rural municipality and the second is a big city. Among municipalities, a separate division was created for large cities (Warsaw, Krakow, etc.) (above 1.00). Separating them was dictated by the very high value of the analysed indicator, which would otherwise hinder correct interpretation.
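Assigning a municipality to a division of the entrepreneurship indicator can be sketched as below. The cut-points 0.02, 0.20, 0.50 and 1.00 are the division boundaries used in this study; treating each upper boundary as inclusive is an assumption (suggested by the label "above 1.00"), as are the function name and the label strings.

```python
from bisect import bisect_left
from typing import List

# Division boundaries of the entrepreneurship indicator.
CUT_POINTS: List[float] = [0.02, 0.20, 0.50, 1.00]
LABELS: List[str] = ["0.00-0.02", "0.02-0.20", "0.20-0.50", "0.50-1.00", "above 1.00"]


def division(indicator: float) -> str:
    """Map an indicator value to its division (upper bounds assumed inclusive)."""
    if indicator < 0:
        raise ValueError("the indicator cannot be negative")
    # bisect_left returns the index of the first cut-point >= indicator.
    return LABELS[bisect_left(CUT_POINTS, indicator)]


# Hypothetical indicator values.
for value in (0.01, 0.15, 0.35, 1.00, 1.75):
    print(value, division(value))
```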

The group of explanatory variables consisted of 16 factors divided into three groups (Table 2). The first one consisted of variables referring to the municipal budget policy (variables numbered from 1 to 6 inclusive). The second group included social variables. They covered variables numbered from 7 to 13 (inclusive). The third group represented economic variables expressed by factors numbered from 14 to 16 (inclusive). The last explanatory variable was the municipalities category.

Table 2 Explanatory variables used in estimation

The goal of our study was to identify strategic variables which have direct influence on entrepreneurship in the private healthcare sector. To this end we applied a methodology based on rule induction. The C5.0 algorithm, which is capable of generating rules, was used in this study. In the first step we used the Multiple Scanning method to discretize numerical variables. During every scan the entire attribute set is analyzed, and for each attribute the best cut-point is selected. This process continues until the stopping criterion is satisfied. Although the C5.0 algorithm has an internal discretization mechanism, the Multiple Scanning discretization technique is significantly better than the one used in C5.0 (Grzymała-Busse and Mroczek 2016). Then we induced rules and examined their effectiveness using a ten-fold cross-validation procedure. To this end all cases were randomly re-ordered, and then the set of all cases was divided into ten mutually disjoint subsets of approximately equal size. All but one of the subsets were used for rule induction, while the remaining one was used for testing. Finally, we conducted a qualitative analysis of the generated rules and identified the strategic variables.
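The validation procedure described above (random re-ordering, division into ten mutually disjoint subsets of approximately equal size, nine subsets for induction and one for testing) can be sketched as follows. The function name and the random seed are assumptions; the rule induction step itself is not shown.

```python
import random
from typing import Iterator, List, Tuple


def ten_fold_splits(
    n_cases: int, seed: int = 0, k: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, testing indices) for each of the k folds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # random re-ordering of all cases
    # Taking every k-th index yields disjoint subsets whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 2408 is the number of municipalities in the research sample.
splits = list(ten_fold_splits(2408))
print(sorted({len(test) for _, test in splits}))
```

Rules would be induced on each training part and evaluated on the corresponding testing part, and the results averaged over the ten folds.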

Results

The discretization method, Multiple Scanning, was applied to a data set with the level of consistency equal to 100%. The minimal and stable error rate (19.93%) was obtained for the 3rd scan. The discretized data were used to induce rules using the C5.0 algorithm. Table 3 shows the number of rules in each of the categories of municipalities.

Table 3 Number of rules for each of the divisions of entrepreneurship indicator

The efficiency of the rule sets, examined using a ten-fold cross-validation procedure, was 73%. Qualitative analysis of the rules allowed us to define a set of the most important variables (from the classification point of view) for each division of the explained variable (see Figs. 1–3). The number of attributes has been normalized. The division of the results into three figures follows from the distribution of the values of the entrepreneurship indicator in each year. Analysis of this distribution (see Table 1) allows us to combine in Fig. 1 the municipalities with indicator values from 0.00 to 0.02 and from 0.02 to 0.20. Figure 2 shows the municipalities with indicator values from 0.20 to 0.50. Finally, Fig. 3 shows the municipalities with indicator values from 0.50 to 1.00 and above 1.00.

Fig. 1 Attributes appearing in the rules explaining entrepreneurship for divisions from 0.00 to 0.02 and from 0.02 to 0.20. Source: Own work

Fig. 2 Attributes appearing in the rules explaining entrepreneurship for division from 0.20 to 0.50. Source: Own work

Fig. 3 Attributes appearing in the rules explaining entrepreneurship for divisions from 0.50 to 1.00 and above 1.00. Source: Own work

The results show that for values of the entrepreneurship indicator from 0.00 to 0.02 and from 0.02 to 0.20, common variables with the greatest impact on entrepreneurship can be indicated. These are: total market saturation with business entities in Section Q and the share of assets expenditures of the municipalities in total expenditure.

Moreover, at the lowest level of the indicator, its value is determined by: the number of employees, the production-age population, the number of newly registered business entities, the population, the migration balance and the total income of municipalities per capita. On the other hand, where the value of the indicator falls in the division from 0.02 to 0.20, the significant explanatory variables were: the share of assets expenditures of municipalities in total expenditures, the number of newly registered business entities, the total income of municipalities per inhabitant and the population.

When the value of the entrepreneurship indicator was in the division from 0.20 to 0.50, the most significant variables were: population density, total expenditure of municipalities per capita, number of medical advisories given within a year, value of EU funds per capita and population in pre-production age.

For higher (from 0.50 to 1.00) and the highest (above 1.00) entrepreneurship, the most important variable was the population. In addition, for the indicator from 0.50 to 1.00, the remaining significant explanatory variables were: total expenditure of municipalities per capita, population in the post-production and pre-production age, and total level of income of municipalities per capita. On the other hand, the highest rate of entrepreneurship (in addition to the above-mentioned population) was explained by: total income and expenditure of municipalities per capita, pre-production population, number of employed persons and share of municipalities’ expenditures on the healthcare sector in total expenditures.

Conclusion

Let us recall that our main objective was to identify strategic variables which have direct influence on entrepreneurship in the private healthcare sector. The results of our experiments show that it is possible to identify a set of the most important variables influencing entrepreneurship in the private healthcare sector. In municipalities with a lower level of entrepreneurship (divisions I and II), the indicator was explained to the greatest extent by the number of currently active business entities in Section Q and by the share of assets expenditure of municipalities in total expenditure. In municipalities with a higher level of entrepreneurship (divisions III and IV), it was explained to the greatest extent by the population and the total expenditure of the municipality per capita. In the municipalities with the highest entrepreneurship, these were the population and the total income of municipalities per capita. It is worth emphasizing that in these municipalities the variable expenditure of municipalities per inhabitant ranked third among all determinants.

At the same time, the results of the research have identified a set of variables that are not relevant to the explanation of entrepreneurship. For the entrepreneurship indicator from 0.00 to 0.02 this was the variable own income of municipalities per capita. For the indicator from 0.20 to 0.50, these were the number of newly-opened business entities, the type of municipality and the share of own income in the total income of municipalities. In municipalities with an indicator value from 0.50 to 1.00 they were: production-age population, number of newly-opened business entities, type of municipality and share of own income in total incomes of municipalities. Finally, for the highest value of the indicator, entrepreneurship was not explained by: population in production age, number of medical advisories, number of newly-opened business entities, type of municipalities and share of own income in total incomes of municipalities. The exception was the entrepreneurship level from 0.02 to 0.20, in which all variables determined the level of the indicator. This means that the process of explaining entrepreneurship in the private healthcare sector in this division is complex.