# Measuring our Universe from Galaxy Redshift Surveys

DOI: 10.12942/lrr-2004-8

- Cite this article as:
- Lahav, O. & Suto, Y. Living Rev. Relativ. (2004) 7: 8. doi:10.12942/lrr-2004-8


## Abstract

Galaxy redshift surveys have achieved significant progress over the last couple of decades. Those surveys tell us in the most straightforward way what our local Universe looks like. While the galaxy distribution traces the *bright* side of the Universe, detailed quantitative analyses of the data have even revealed the *dark* side of the Universe dominated by non-baryonic dark matter as well as more mysterious dark energy (or Einstein’s cosmological constant). We describe several methodologies of using galaxy redshift surveys as cosmological probes, and then summarize the recent results from the existing surveys. Finally we present our views on the future of redshift surveys in the era of precision cosmology.

## 1 Introduction

Nowadays the Universe can be explored with a variety of observational probes and methods over a wide range of wavelengths: the temperature anisotropy map of the cosmic microwave background (CMB), the Hubble diagrams of nearby galaxies and distant Type Ia supernovae, wide-field photometric and spectroscopic surveys of galaxies, the power spectrum and abundances of galaxy clusters in optical and X-ray bands combined with the radio observation through the Sunyaev-Zel’dovich effect, deep surveys of galaxies in sub-mm, infrared, and optical bands, quasar surveys in radio and optical, strong and weak lensing of distant galaxies and quasars, high-energy cosmic rays, and so on. Undoubtedly gamma-rays, neutrinos, and gravitational radiation will join this already crowded list.

**Redshift surveys have unprecedented quantity and quality**: The numbers of galaxies and quasars in the spectroscopic sample of the Two Degree Field (2dF) survey are ∼ 250,000 and ∼ 30,000, and will reach ∼ 800,000 and ∼ 100,000 upon completion of the ongoing Sloan Digital Sky Survey (SDSS). These unprecedented numbers of objects, together with the homogeneous selection criteria, enable a precise statistical analysis of their distribution.

**The Universe at *z* ≈ 1000 is well specified**: The first-year WMAP (Wilkinson Microwave Anisotropy Probe) data [6] among others have established a set of cosmological parameters. This may be taken as the *initial condition* of the Universe from the point of view of the structure evolution toward *z* = 0. In a sense, the origin of the Universe at *z* ≈ 1000 and the evolution of the Universe after that epoch are now equally important, but they constitute well separable questions that particle and observational cosmologists focus on, respectively.

**Gravitational growth of dark matter component is well understood**: In addition, extensive numerical simulations of structure formation in the Universe have significantly advanced our understanding of the gravitational evolution of the dark matter component in the standard gravitational instability picture. In fact, we even have very accurate and useful analytic formulae to describe the evolution deep in its nonlinear regime. Thus we can now directly address the evolution of *visible objects* from the analysis of their redshift surveys separately from the nonlinear growth of the underlying dark matter gravitational potentials.

**Formation and evolution of galaxies**: In the era of precision cosmology, the scientific goals of research using galaxy redshift surveys are gradually shifting from inferring a set of values of cosmological parameters using galaxies as probes to understanding the origin and evolution of the galaxy distribution, given a set of parameters accurately determined by other probes such as the CMB and supernovae.

## 2 Clustering in the Expanding Universe

### 2.1 The cosmological principle

Our current Universe exhibits a wealth of nonlinear structures, but the zero-th order description of our Universe is based on the assumption that the Universe is homogeneous and isotropic when smoothed over sufficiently large scales. This statement is usually referred to as the *cosmological principle*. In fact, the cosmological principle was first adopted when observational cosmology was in its infancy; it was then little more than a conjecture, embodying ‘Occam’s razor’ for the simplest possible model.

**The ancient Indian cosmological principle**: The Universe is infinite in space and time and is infinitely heterogeneous.

**The ancient Greek cosmological principle**: Our Earth is the natural center of the Universe.

**The Copernican cosmological principle**: The Universe as observed from any planet looks much the same.

**The (generalized) cosmological principle**: The Universe is (roughly) homogeneous and isotropic.

**The perfect cosmological principle**: The Universe is (roughly) homogeneous in space and time, and is isotropic in space.

**The anthropic principle**: A human being, as he/she is, can exist only in the Universe as it is.

As with any other idea about the physical world, we cannot prove a model, but only falsify it. Proving the homogeneity of the Universe is particularly difficult as we observe the Universe from one point in space, and we can only deduce isotropy indirectly. The practical methodology we adopt is to assume homogeneity and to assess the level of fluctuations relative to the mean, and hence to test for consistency with the underlying hypothesis. If the assumption of homogeneity turns out to be wrong, then there are numerous possibilities for inhomogeneous models, and each of them must be tested against the observations.

**CMB fluctuations** Ehlers, Geren, and Sachs [18] showed that by combining the CMB isotropy with the Copernican principle one can deduce homogeneity. More formally their theorem (based on the Liouville theorem) states that “If the fundamental observers in a dust space-time see an isotropic radiation field, then the space-time is locally given by the Friedmann-Robertson-Walker (FRW) metric”. The COBE (COsmic Background Explorer) measurements of temperature fluctuations (Δ*T*/*T* = 10^{−5} on scales of 10°) give, via the Sachs-Wolfe effect \((\Delta T/T = {1 \over 3}\Delta \phi /{c^2})\) and the Poisson equation, r.m.s. density fluctuations of *δρ*/*ρ* ∼ 10^{−4} on 1000 *h*^{−1} Mpc (see, e.g., [99]), which implies that the deviations from a smooth Universe are tiny.

**Galaxy redshift surveys** The distribution of galaxies in local redshift surveys is highly clumpy, with the Supergalactic Plane seen in full glory. However, deeper surveys like 2dF and SDSS (see Section 6) show that the fluctuations decline as the length-scales increase. Peebles [69] has shown that the angular correlation functions for the Lick and APM (Automatic Plate Measuring) surveys scale with magnitude as expected in a Universe which approaches homogeneity on large scales. While redshift surveys can provide interesting estimates of the fluctuations on intermediate scales (see, e.g., [72]), the problems of biasing, evolution, and *K*-correction would limit the ability of those redshift surveys to ‘prove’ the cosmological principle. Despite these worries, the measurement of the power spectrum of galaxies derived on the assumption of an underlying FRW metric shows good agreement with the Λ-CDM (cold dark matter) model.

**Radio sources** Radio sources in surveys have a typical median redshift of \(\bar z \sim 1\), and hence are useful probes of clustering at high redshift. Unfortunately, it is difficult to obtain distance information from these surveys: The radio luminosity function is very broad, and it is difficult to measure optical redshifts of distant radio sources. Earlier studies claimed that the distribution of radio sources supports the cosmological principle. However, the wide range in intrinsic luminosities of radio sources would dilute any clustering when projected on the sky. Recent analyses of new deep radio surveys suggest that radio sources are actually clustered at least as strongly as local optical galaxies. Nevertheless, on very large scales the distribution of radio sources seems nearly isotropic.

**X-ray background** The X-ray background (XRB) is likely to be due to sources at high redshift. The XRB sources are probably located at redshift *z* < 5, making them convenient tracers of the mass distribution on scales intermediate between those in the CMB as probed by COBE, and those probed by optical and IRAS redshift surveys. The interpretation of the results depends somewhat on the nature of the X-ray sources and their evolution. By comparing the predicted multipoles to those observed by HEAO1, Scharf et al. [75] estimate the amplitude of fluctuations for an assumed shape of the density fluctuations. The observed fluctuations in the XRB are roughly as expected from interpolating between the local galaxy surveys and the COBE and other CMB experiments. The r.m.s. fluctuations *δρ*/*ρ* on a scale of ∼ 600 *h*^{−1} Mpc are less than 0.2%.

*l* > 100 *h*^{−1} Mpc.

The rest of the current section is devoted to a brief review of the homogeneous and isotropic cosmological model. Further details may be easily found in standard cosmology textbooks [96, 62, 69, 64, 10, 63].

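In a homogeneous and isotropic Universe the metric takes the form

$$d{s^2} = - d{t^2} + a{(t)^2}\left\{{{{d{x^2}} \over {1 - K{x^2}}} + {x^2}d{\theta ^2} + {x^2}{{\sin}^2}\theta \,d{\phi ^2}} \right\},$$

where *x* is the comoving coordinate, and where we use units in which the light velocity *c* = 1. The above Robertson-Walker metric is specified by a constant *K*, the spatial curvature, and a function of time *a*(*t*), the scale factor.

Under the same symmetry, *T*_{μν}, the energy-momentum tensor of the matter field, should take the form of the ideal fluid:

$${T_{\mu \nu}} = (\rho + p){u_\mu}{u_\nu} + p{g_{\mu \nu}},$$

where *u*_{μ} is the 4-velocity of the matter, *ρ* is the mean energy density, and *p* is the mean pressure.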

### 2.2 From the Einstein equation to the Friedmann equation

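With the Robertson-Walker metric and the ideal-fluid energy-momentum tensor, the Einstein equation with the Λ-term reduces to the following two equations for *a*(*t*), *ρ*(*t*), and *p*(*t*):

$${\left({{{\dot a} \over a}} \right)^2} = {{8\pi G} \over 3}\rho - {K \over {{a^2}}} + {\Lambda \over 3},$$(4)

$${{\ddot a} \over a} = - {{4\pi G} \over 3}(\rho + 3p) + {\Lambda \over 3}.$$(5)

Differentiating Equation (4) with respect to *t* yields

$${{\ddot a} \over a} = {{4\pi G} \over 3}\left({\dot \rho {a \over {\dot a}} + 2\rho} \right) + {\Lambda \over 3}.$$(6)

Eliminating *ä* with Equation (5), one obtains

$$\dot \rho = - 3{{\dot a} \over a}(\rho + p).$$(7)

This equation can be interpreted as the first law of thermodynamics, *dQ* = *dU* + *p dV* = *d*(*ρa*^{3}) + *p d*(*a*^{3}) = 0, in the present context. Equations (4) and (7) are often used as the two independent basic equations for *a*(*t*), instead of Equations (4) and (5).

One more relation between *ρ* and *p* is needed in order to solve for *a*(*t*). This is usually given by an equation of state of the form *p* = *p*(*ρ*). In cosmology, the following simple relation is assumed:

$$p = w\rho .$$(8)

While *w* may in principle change with redshift, it is often assumed that *w* is independent of time just for simplicity. Then substituting this equation of state into Equation (7) immediately yields

$$\rho \propto {a^{- 3(1 + w)}}.$$(9)

Non-relativistic matter (dust), ultra-relativistic matter (radiation), and the cosmological constant correspond to *w* = 0, 1/3, and −1, respectively.

If the Universe consists of several fluid species with equation-of-state parameters *w*_{i} (*i* = 1,…,*N*), Equation (9) still holds independently as long as the species do not interact with each other. If one denotes the present energy density of the *i*-th component by *ρ*_{i,0}, then the total energy density of the Universe at the epoch corresponding to the scale factor of *a*(*t*) is given by

$$\rho = \sum\limits_{i = 1}^N {{{{\rho _{i,0}}} \over {{a^{3(1 + {w_i})}}}}},$$

where the present value of the scale factor, *a*_{0}, is set to be unity without loss of generality. Thus, Equation (4) becomes

$${\left({{{\dot a} \over a}} \right)^2} = {{8\pi G} \over 3}\sum\limits_{i = 1}^N {{{{\rho _{i,0}}} \over {{a^{3(1 + {w_i})}}}}} - {K \over {{a^2}}} + {\Lambda \over 3}.$$

Components with *w*_{i} = −1 may be equivalent to the conventional cosmological constant Λ at this level, although they may exhibit spatial variation unlike Λ.

Evaluating the above equation at the present epoch gives

$$H_0^2 = {{8\pi G} \over 3}\sum\limits_{i = 1}^N {{\rho _{i,0}}} - K + {\Lambda \over 3},$$

where *H*_{0} is the Hubble constant at the present epoch. The above equation is usually rewritten as follows:

$${\left({{{\dot a} \over a}} \right)^2} = H_0^2\left[ {\sum\limits_{i = 1}^N {{{{\Omega _i}} \over {{a^{3(1 + {w_i})}}}}} + {{1 - \sum\nolimits_{i = 1}^N {{\Omega _i}} - {\Omega _\Lambda}} \over {{a^2}}} + {\Omega _\Lambda}} \right],$$

where the density parameter for the *i*-th component is defined as

$${\Omega _i} \equiv {{8\pi G} \over {3H_0^2}}{\rho _{i,0}},$$

and the dimensionless cosmological constant as \({\Omega _\Lambda} \equiv \Lambda/(3H_0^2)\). Note that a flat Universe (*K* = 0) implies that the sum of the density parameters is unity:

$$\sum\limits_{i = 1}^N {{\Omega _i}} + {\Omega _\Lambda} = 1.$$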
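As a minimal sketch of how the density parameters fix the expansion rate, the following assumes the standard dimensionless form of the Friedmann equation, *H*^{2}/*H*_{0}^{2} = Σ_{i} Ω_{i} *a*^{−3(1+*w*_{i})} + (1 − Σ_{i} Ω_{i} − Ω_{Λ}) *a*^{−2} + Ω_{Λ}, with *a*_{0} = 1. The function name `hubble_ratio` and the representation of each species as an `(Omega_i, w_i)` pair are illustrative choices, not part of the original text.

```python
import math


def hubble_ratio(a, species, omega_lambda=0.0):
    """Return H(a)/H0 for non-interacting fluids plus a Lambda-term.

    species: list of (Omega_i, w_i) pairs (hypothetical representation),
             each scaling as a**(-3 * (1 + w_i)) following rho ~ a^{-3(1+w)}.
    The curvature term is fixed by requiring H(a=1) = H0.
    """
    omega_curv = 1.0 - sum(om for om, _ in species) - omega_lambda
    e_squared = sum(om / a ** (3.0 * (1.0 + w)) for om, w in species)
    e_squared += omega_curv / a**2 + omega_lambda
    return math.sqrt(e_squared)


# Matter (w = 0) plus Lambda in a spatially-flat model.
print(hubble_ratio(0.5, [(0.3, 0.0)], omega_lambda=0.7))
# A fluid with w = -1 enters the expansion law exactly like Lambda.
print(hubble_ratio(0.5, [(0.3, 0.0), (0.7, -1.0)]))
```

The two printed values coincide because a component with *w* = −1 has a constant energy density, which is the sense in which it is equivalent to Λ at the level of the background expansion.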

### 2.3 Expansion law and age of the Universe

- Einstein-de Sitter model (Ω_{m} = 1, Ω_{Λ} = 0):
  $$a(t) = {\left({{t \over {{t_0}}}} \right)^{2/3}},\quad\quad{t_0} = {2 \over {3{H_0}}}.$$(19)
- Open model with vanishing cosmological constant (Ω_{m} < 1, Ω_{Λ} = 0):
  $$a = {{{\Omega _{\rm{m}}}} \over {2(1 - {\Omega _{\rm{m}}})}}(\cosh \theta - 1),$$(20)
  $${H_0}t = {{{\Omega _{\rm{m}}}} \over {2{{(1 - {\Omega _{\rm{m}}})}^{3/2}}}}(\sinh \theta - \theta),$$(21)
  $${H_0}{t_0} = {1 \over {1 - {\Omega _{\rm{m}}}}} - {{{\Omega _{\rm{m}}}} \over {2{{(1 - {\Omega _{\rm{m}}})}^{3/2}}}}\ln {{2 - {\Omega _{\rm{m}}} + 2\sqrt {1 - {\Omega _{\rm{m}}}}} \over {{\Omega _{\rm{m}}}}}.$$(22)
- Spatially-flat model with cosmological constant (Ω_{m} < 1, Ω_{Λ} = 1 − Ω_{m}):
  $$a(t) = {\left({{{{\Omega _{\rm{m}}}} \over {1 - {\Omega _{\rm{m}}}}}} \right)^{1/3}}{\left[ {\sinh {{3\sqrt {1 - {\Omega _{\rm{m}}}}} \over 2}{H_0}t} \right]^{2/3}},$$(23)
  $${H_0}{t_0} = {1 \over {3\sqrt {1 - {\Omega _{\rm{m}}}}}}\ln {{2 - {\Omega _{\rm{m}}} + 2\sqrt {1 - {\Omega _{\rm{m}}}}} \over {{\Omega _{\rm{m}}}}}.$$(24)

Here *t*_{0} denotes the present age of the Universe, with *a* = 0 at *t* = 0. The expressions clearly indicate that *t*_{0} increases as Ω_{m} decreases and/or Ω_{Λ} increases. Figure 1 plots the scale factor as a function of *H*_{0}(*t* − *t*_{0}), and Table 1 summarizes the age of the Universe.

*Table 1: The present age of the Universe in units of* (*h*/0.7)^{−1} Gyr.

Ω_{m} | Open model (Ω_{Λ} = 0) | Spatially-flat model (Ω_{Λ} = 1 − Ω_{m}) |
---|---|---|
1.0 | 9.3 | 9.3 |
0.5 | 10.5 | 11.6 |
0.3 | 11.3 | 13.5 |
0.1 | 12.5 | 17.8 |
0.01 | 13.9 | 28.0 |
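The age formulae, Equations (19), (22), and (24), are straightforward to evaluate numerically. The sketch below does so for a few values of Ω_{m}; the function names are illustrative, and the conversion 1/*H*_{0} ≈ 9.778 *h*^{−1} Gyr is a standard value assumed here rather than taken from the text.

```python
import math

# 1/H0 in Gyr for h = 0.7, assuming 1/H0 = 9.778 h^-1 Gyr.
HUBBLE_TIME_GYR = 9.778 / 0.7


def age_open(omega_m):
    """H0 * t0 for an open model with Omega_Lambda = 0, Equation (22)."""
    if omega_m >= 1.0:
        return 2.0 / 3.0  # Einstein-de Sitter limit, Equation (19)
    root = math.sqrt(1.0 - omega_m)
    log_term = math.log((2.0 - omega_m + 2.0 * root) / omega_m)
    return 1.0 / (1.0 - omega_m) - omega_m / (2.0 * root**3) * log_term


def age_flat(omega_m):
    """H0 * t0 for a flat model with Omega_Lambda = 1 - Omega_m, Equation (24)."""
    if omega_m >= 1.0:
        return 2.0 / 3.0  # Einstein-de Sitter limit, Equation (19)
    root = math.sqrt(1.0 - omega_m)
    log_term = math.log((2.0 - omega_m + 2.0 * root) / omega_m)
    return log_term / (3.0 * root)


for omega_m in (1.0, 0.5, 0.3, 0.1):
    t_open = age_open(omega_m) * HUBBLE_TIME_GYR
    t_flat = age_flat(omega_m) * HUBBLE_TIME_GYR
    print(f"Omega_m = {omega_m:4.2f}: open {t_open:5.1f} Gyr, flat {t_flat:5.1f} Gyr")
```

For a fixed Ω_{m} < 1 the flat model is always older than the open one, which reflects the statement that *t*_{0} increases with Ω_{Λ}.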

### 2.4 Einstein’s static model and Lemaître’s model

So far we have shown that solutions of the Einstein equation are *dynamical* in general, i.e., the scale factor *a* is time-dependent. As a digression, let us examine why Einstein once introduced the Λ-term to obtain a static cosmological solution. This is mainly important for historical reasons, but it is also interesting to observe how the operationally identical parameter (the Λ-term, the cosmological constant, the vacuum energy, the dark energy) shows up in completely different contexts in the course of the development of cosmological physics.

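If Λ = 0, Equations (4) and (5) show that the necessary condition for a static solution *a* = const. is given by

$$\rho = - 3p = {{3K} \over {8\pi G{a^2}}}.$$

Thus *ρ* and *p* cannot both be positive, which is not expected for *normal* matter in the standard model of particle physics. If Λ ≠ 0 on the other hand, the condition for the static solution is

$${K \over {{a^2}}} = 4\pi G(\rho + p),\quad \quad \Lambda = 4\pi G(\rho + 3p),$$

so that both *ρ* and *p* can be positive if Λ > 0. In particular, if *p* = 0, then *ρ* = Λ/(4π*G*) and *K* = Λ*a*^{2} > 0. This is Einstein’s static model, a special case of Lemaître’s model with Λ > 0 and *K* > 0. For simplicity, let us assume that the Universe is dominated by non-relativistic matter with negligible pressure, and consider the behavior of Lemaître’s model. First we define the values of the density and the scale factor corresponding to Einstein’s static model:

$${\rho _{\rm{E}}} \equiv {\Lambda \over {4\pi G}},\quad \quad {a_{\rm{E}}} \equiv \sqrt {{K \over \Lambda}}.$$

Next consider a model in which the density at *a* = *a*_{E} is a factor of *α* (> 1) larger than *ρ*_{E}. Then

$$\rho = \alpha {\rho _{\rm{E}}}{\left({{{{a_{\rm{E}}}} \over a}} \right)^3},$$

$${\dot a^2} = \Lambda a_{\rm{E}}^2\left[ {{{2\alpha} \over 3}{{{a_{\rm{E}}}} \over a} - 1 + {1 \over 3}{{\left({{a \over {{a_{\rm{E}}}}}} \right)}^2}} \right],$$(33)

$$\ddot a = {{\Lambda a} \over 3}\left[ {1 - \alpha {{\left({{{{a_{\rm{E}}}} \over a}} \right)}^3}} \right].$$

For *a* ≪ *a*_{E}, Equation (33) indicates that *a* ∝ *t*^{2/3} and the Universe is decelerating (*ä* < 0). When *a* reaches *α*^{1/3}*a*_{E}, *ȧ*^{2} takes the minimum value \(\Lambda a_{\rm{E}}^2({\alpha ^{2/3}} - 1)\) and the Universe becomes accelerating (*ä* > 0). Finally the Universe approaches the exponential expansion or de Sitter model: \(a \propto \exp (t\sqrt {\Lambda/3})\). If *α* becomes closer to unity, the minimum value approaches zero and the expansion of the Universe is effectively frozen. This phase is called the coasting period, and the case with *α* = 1 corresponds to Einstein’s static model in which the coasting period continues forever. A similar consideration for *α* < 1 indicates that the Universe starts collapsing (*ȧ*^{2} = 0) before *ä* = 0. Thus the behavior of Lemaître’s model is crucially different if *α* is larger or smaller than unity. This suggests that Einstein’s static model (*α* = 1) is unstable.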
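The location and depth of the coasting minimum can be checked numerically. The sketch below assumes the dust-dominated form *ȧ*^{2}/(Λ*a*_{E}^{2}) = 2*α*/(3*x*) − 1 + *x*^{2}/3 with *x* ≡ *a*/*a*_{E}, which follows from Equation (4) with *ρ* = *αρ*_{E}(*a*_{E}/*a*)^{3} and *K* = Λ*a*_{E}^{2}; the function names and the grid search are illustrative only.

```python
def adot_squared(x, alpha):
    """a_dot^2 in units of Lambda * a_E^2, with x = a / a_E (pressureless matter)."""
    return 2.0 * alpha / (3.0 * x) - 1.0 + x * x / 3.0


def locate_minimum(alpha, x_lo=0.1, x_hi=5.0, n=49000):
    """Brute-force grid search for the minimum of a_dot^2 over x_lo <= x <= x_hi."""
    step = (x_hi - x_lo) / n
    grid = (x_lo + i * step for i in range(n + 1))
    x_best = min(grid, key=lambda x: adot_squared(x, alpha))
    return x_best, adot_squared(x_best, alpha)


for alpha in (2.0, 1.05, 0.5):
    x_min, value = locate_minimum(alpha)
    print(f"alpha = {alpha}: minimum at a/a_E = {x_min:.4f}, a_dot^2 = {value:.4f}")
```

For *α* > 1 the minimum stays positive and the expansion never halts, while for *α* < 1 it is negative, so *ȧ*^{2} reaches zero first and the model recollapses.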

### 2.5 Vacuum energy as an effective cosmological constant

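If the Λ-term is regarded as an *effective* matter field, however, it should satisfy an equation of state of *p* = −*ρ*. The following presents a specific example of such an *effective* cosmological constant. Consider a real scalar field whose Lagrangian density is given by

$${\mathcal L} = - {1 \over 2}{g^{\mu \nu}}{\partial _\mu}\phi \,{\partial _\nu}\phi - V(\phi)$$

in the metric signature (−, +, +, +). For a spatially homogeneous field, the energy density and the pressure are

$${\rho _\phi} = {1 \over 2}{\dot \phi ^2} + V(\phi),\quad \quad {p_\phi} = {1 \over 2}{\dot \phi ^2} - V(\phi).$$

If the field varies slowly in time, \({\dot \phi ^2} \ll V(\phi)\), then *p*_{ϕ} ≈ −*ρ*_{ϕ} and the field acts as a cosmological constant. Of course this model is one of the simplest examples, and one may play with much more complicated models if needed.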

If the Λ-term is introduced in the l.h.s., it should be constant to satisfy the energy-momentum conservation *T*_{μν}^{;ν} = 0. Once it is regarded as a sort of matter field in the r.h.s., however, it does not have to be constant. In fact, the above example shows that the equation of state for the field has *w* = −1 only in special cases. This is why recent literature refers to the field as *dark energy* instead of the cosmological constant.

### 2.6 Gravitational instability

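The evolution of a self-gravitating fluid is described, in the proper coordinate **r**, by the continuity, Euler, and Poisson equations:

$${{\partial \rho} \over {\partial t}} + \nabla \cdot (\rho {\bf{u}}) = 0,$$(39)

$${{\partial {\bf{u}}} \over {\partial t}} + ({\bf{u}} \cdot \nabla){\bf{u}} = - {1 \over \rho}\nabla p - \nabla \Phi,$$(40)

$${\nabla ^2}\Phi = 4\pi G\rho - \Lambda .$$(41)

We rewrite these equations in terms of the position **x** in the comoving coordinate, the peculiar velocity **v**, density fluctuations *δ*(*t*, **x**), and the gravitational potential \(\phi (t,{\bf{x}})\), which are defined as

$${\bf{x}} = {{\bf{r}} \over a},\quad {\bf{v}} = a{\bf{\dot x}},\quad \delta (t,{\bf{x}}) = {{\rho (t,{\bf{x}})} \over {\bar \rho (t)}} - 1,\quad \phi (t,{\bf{x}}) = \Phi + {1 \over 2}a\ddot a\vert {\bf{x}}{\vert ^2}.$$

Then

$$\dot \delta + {1 \over a}\nabla \cdot [(1 + \delta){\bf{v}}] = 0,$$

$${\bf{\dot v}} + {1 \over a}({\bf{v}} \cdot \nabla){\bf{v}} + {{\dot a} \over a}{\bf{v}} = - {1 \over {\rho a}}\nabla p - {1 \over a}\nabla \phi,$$

$${\nabla ^2}\phi = 4\pi G\bar \rho {a^2}\delta .$$

The dot and **∇** in the above equations are the time derivative for a given **x** and the spatial derivative with respect to **x**, i.e., defined in the comoving coordinate (while those in Equations (39, 40, 41) are defined in the proper coordinate).

Linearizing the above equations with respect to *δ* and **v**:

$$\dot \delta + {1 \over a}\nabla \cdot {\bf{v}} = 0,\quad \quad {\bf{\dot v}} + {{\dot a} \over a}{\bf{v}} + {{c_{\rm{s}}^2} \over a}\nabla \delta + {1 \over a}\nabla \phi = 0,$$

where \({c_{\rm{s}}} \equiv {(\partial p/\partial \rho)^{1/2}}\) is the sound velocity. Transforming these equations to **k** space, the equation for *δ*_{k} reduces to

$${\ddot \delta _{\bf{k}}} + 2{{\dot a} \over a}{\dot \delta _{\bf{k}}} + \left({{{c_{\rm{s}}^2{k^2}} \over {{a^2}}} - 4\pi G\bar \rho} \right){\delta _{\bf{k}}} = 0.$$(54)

If the coefficient of the third term is negative, *δ*_{k} has an unstable, or, monotonically increasing solution. This condition is equivalent to the *Jeans criterion*:

$$\lambda \equiv {{2\pi a} \over k} > {\lambda _{\rm{J}}} \equiv {c_{\rm{s}}}\sqrt {{\pi \over {G\bar \rho}}},$$

where *λ*_{J} is the Jeans length, which characterizes the scale over which a sound wave can propagate within the dynamical time of the fluctuation, \(\sqrt {\pi/G\bar \rho}\). Below this scale, the pressure wave can suppress the gravitational instability, and the fluctuation amplitude oscillates.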

### 2.7 Linear growth rate of the density fluctuation

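For pressureless matter such as cold dark matter, the Jeans length *λ*_{J} is negligibly small. Thus, at most scales of cosmological interest, Equation (54) is well approximated as

$${\ddot \delta _{\bf{k}}} + 2{{\dot a} \over a}{\dot \delta _{\bf{k}}} - 4\pi G\bar \rho \,{\delta _{\bf{k}}} = 0.$$(55)

This equation can be solved for a given *a*(*t*) as described in Section 2.3. Since Equation (55) is a second-order differential equation with respect to *t*, there are two independent solutions: a decaying mode and a growing mode, which monotonically decrease and increase with *t*, respectively. The former mode becomes negligibly small as the Universe expands, and thus one is usually interested in the growing mode alone.

For a matter-dominated Universe, Equation (5) can be written in terms of the Hubble parameter at epoch *t*, *H*(*t*) = *ȧ*/*a*:

$$\dot H + {H^2} = - {{4\pi G} \over 3}\bar \rho + {\Lambda \over 3}.$$

Differentiating this with respect to *t* yields

$$\ddot H + 2H\dot H = - {{4\pi G} \over 3}\dot {\bar \rho} = 4\pi G\bar \rho H,$$

where \(\dot {\bar \rho} = - 3H\bar \rho\) is used. Thus the differential equation for *H* reduces to

$$\ddot H + 2H\dot H - 4\pi G\bar \rho H = 0,$$(60)

which has exactly the same form as that for *δ*_{k}, Equation (55). Since *H*(*t*) is a decreasing function of *t*, this implies that *H*(*t*) is the decaying solution for Equation (55). Then the corresponding growing solution *D*(*t*) can be obtained according to the standard procedure: multiplying Equation (55) (for *D*) by *H* and Equation (60) by *D*, and subtracting one from the other, yields

$${a^2}{d \over {dt}}(\dot DH - D\dot H) + {{d{a^2}} \over {dt}}(\dot DH - D\dot H) = {d \over {dt}}\left[ {{a^2}{H^2}{d \over {dt}}\left({{D \over H}} \right)} \right] = 0,$$

and therefore

$$D(t) \propto H(t)\int\nolimits^t {{{dt{\prime}} \over {{a^2}(t{\prime}){H^2}(t{\prime})}}}.$$

It is often more useful to rewrite *D*(*t*) in terms of the redshift *z* as follows:

$$D(z) = {{5{\Omega _{\rm{m}}}H_0^2} \over 2}H(z)\int\nolimits_z^\infty {{{1 + z{\prime}} \over {{H^3}(z{\prime})}}dz{\prime}},$$

where the proportionality factor is chosen so as to reproduce *D*(*z*) → 1/(1 + *z*) for *z* → ∞. Linear growth rates for the models described in Section 2.3 are summarized below: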

- Einstein-de Sitter model (Ω_{m} = 1, Ω_{Λ} = 0):
  $$D(z) = {1 \over {1 + z}}.$$(64)
- Open model with vanishing cosmological constant (Ω_{m} < 1, Ω_{Λ} = 0):
  $$D(z) \propto 1 + {3 \over x} + 3\sqrt {{{1 + x} \over {{x^3}}}} \ln (\sqrt {1 + x} - \sqrt x),\quad \quad x \equiv {{1 - {\Omega _{\rm{m}}}} \over {{\Omega _{\rm{m}}}(1 + z)}}.$$(65)
- Spatially-flat model with cosmological constant (Ω_{m} < 1, Ω_{Λ} = 1 − Ω_{m}):
  $$D(z) \propto \sqrt {1 + {2 \over {{x^3}}}} \int\nolimits_0^x {{{\left({{u \over {2 + {u^3}}}} \right)}^{3/2}}du,} \quad \quad x \equiv {{{2^{1/3}}{{(\Omega _{\rm{m}}^{- 1} - 1)}^{1/3}}} \over {1 + z}}.$$(66)

In the above, Ω_{m} and Ω_{Λ} refer to the *present* values of the density parameter and the dimensionless cosmological constant, respectively, which will be frequently used in the rest of the review.
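The growing mode can also be evaluated numerically for arbitrary Ω_{m} and Ω_{Λ}. The sketch below assumes the standard integral form *D*(*z*) = (5Ω_{m}/2) *E*(*z*) ∫_{z}^{∞} (1 + *z*′)/*E*^{3}(*z*′) *dz*′ with *E* ≡ *H*/*H*_{0}, which is normalized so that *D* → 1/(1 + *z*) at high redshift, rewritten as an integral over the scale factor. The function names and the use of Simpson's rule are illustrative choices.

```python
import math


def a_times_e(a, omega_m, omega_lambda):
    """a * H(a)/H0 for matter + curvature + Lambda (radiation neglected)."""
    omega_curv = 1.0 - omega_m - omega_lambda
    return math.sqrt(omega_m / a + omega_curv + omega_lambda * a * a)


def growth_factor(z, omega_m, omega_lambda, n=4000):
    """Linear growth factor D(z), normalized to 1/(1+z) at high redshift.

    Uses D(a) = (5/2) Omega_m E(a) * int_0^a da' / (a' E(a'))^3,
    integrated with Simpson's rule on n (even) sub-intervals.
    """
    a_end = 1.0 / (1.0 + z)
    h = a_end / n

    def integrand(a):
        # The integrand behaves as a^{3/2} near a = 0, so it vanishes there.
        return 0.0 if a == 0.0 else a_times_e(a, omega_m, omega_lambda) ** -3

    total = integrand(0.0) + integrand(a_end)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    integral = total * h / 3.0
    e_end = a_times_e(a_end, omega_m, omega_lambda) / a_end
    return 2.5 * omega_m * e_end * integral


print(growth_factor(1.0, 1.0, 0.0))  # Einstein-de Sitter: D = 1/(1+z)
print(growth_factor(0.0, 0.3, 0.7))  # flat model with Lambda
print(growth_factor(0.0, 0.3, 0.0))  # open model
```

With this normalization *D*(*z* = 0) is below unity for Ω_{m} < 1, reflecting the suppression of growth once curvature or Λ dominates the expansion.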

## 3 Statistics of Cosmological Density Fluctuations

### 3.1 Gaussian random field

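Consider the density contrast *δ*_{i} ≡ *δ*(**x**_{i}) at the comoving position **x**_{i}. The density field is regarded as a stochastic variable, and thus forms a random field. The conventional assumption is that the primordial density field (in its linear regime) is Gaussian, i.e., its *m*-point joint probability distribution obeys the multi-variate Gaussian,

$$P({\delta _1},{\delta _2}, \ldots,{\delta _m})\,d{\delta _1}d{\delta _2} \cdots d{\delta _m} = {1 \over {\sqrt {{{(2\pi)}^m}\det (M)}}}\exp \left[ {- \sum\limits_{i,j = 1}^m {{1 \over 2}{\delta _i}{{({M^{- 1}})}_{ij}}{\delta _j}}} \right]d{\delta _1}d{\delta _2} \cdots d{\delta _m},$$(71)

for an arbitrary positive integer *m*. Here *M*_{ij} ≡ 〈*δ*_{i}*δ*_{j}〉 is the covariance matrix, and *M*^{−1} is its inverse. Since *M*_{ij} = *ξ*(**x**_{i}, **x**_{j}), Equation (71) implies that the statistical nature of the Gaussian density field is completely specified by the two-point correlation function *ξ* and its linear combination (including its derivative and integral). For an extensive discussion of the cosmological Gaussian density field, see [4].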

The Gaussian nature of the primordial density field is preserved in its linear evolution stage, but this is not the case in the nonlinear stage. This is clear even from the definition of the Gaussian distribution: Equation (71) formally assumes that the density contrast distributes symmetrically in the range of −∞ < *δ*_{i} < ∞, but in the real density field *δ*_{i} cannot be less than −1. This assumption does not make any practical difference as long as the fluctuations are (infinitesimally) small, but it is invalid in the nonlinear regime where the typical amplitude of the fluctuations exceeds unity.

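Since the Fourier component *δ*_{k} of the density field is a complex variable, it is decomposed by a set of two real variables, the amplitude *D*_{k} and the phase *ϕ*_{k}:

$${\delta _{\bf{k}}} \equiv {D_{\bf{k}}}\exp (i{\phi _{\bf{k}}}).$$

Substituting this into the linear perturbation equation (55) gives

$${\ddot D_{\bf{k}}} + 2{{\dot a} \over a}{\dot D_{\bf{k}}} - (4\pi G\bar \rho + \dot \phi _{\bf{k}}^2){D_{\bf{k}}} = 0,\quad \quad {\ddot \phi _{\bf{k}}} + 2\left({{{\dot a} \over a} + {{{{\dot D}_{\bf{k}}}} \over {{D_{\bf{k}}}}}} \right){\dot \phi _{\bf{k}}} = 0.$$

The second equation yields \(\dot \phi \propto {a^{- 2}}{D^{- 2}}\), and *ϕ*(*t*) rapidly converges to a constant value. Thus *D*_{k} evolves following the growing solution in linear theory.

The most fundamental statistic of clustering is the power spectrum, *P*(*k*) ≡ 〈*D*_{k}^{2}〉, which depends only on the magnitude of **k**. This is the Fourier transform of the two-point correlation function. Here *x* does not denote an amplitude of the position vector, but a comoving wavelength 2*π*/*k* corresponding to the wavenumber *k* = ∣**k**∣. It should be noted that neither the power spectrum nor the two-point correlation function contains information for the phase *ϕ*_{k}. Thus in principle two clustering patterns may be completely different even if they have the identical two-point correlation functions. This implies the practical importance of describing the statistics of phases *ϕ*_{k} in addition to the amplitude *D*_{k} of clustering.

In the Gaussian field, the joint probability distribution functions of *ϕ*_{k} and *D*_{k} are explicitly written as

$$P({D_{\bf{k}}},{\phi _{\bf{k}}})\,d{D_{\bf{k}}}\,d{\phi _{\bf{k}}} = {{2{D_{\bf{k}}}} \over {P(k)}}\exp \left({- {{D_{\bf{k}}^2} \over {P(k)}}} \right)d{D_{\bf{k}}}{{d{\phi _{\bf{k}}}} \over {2\pi}},$$(79)

mutually independently of **k**. The phase distribution is uniform, and thus does not carry information. The above probability distribution function is also derived when the real and imaginary parts of the Fourier components *δ*_{k} are uncorrelated and Gaussian distributed (with the dispersion *P*(*k*)/2) independently of **k**. As is expected, the distribution function (79) is completely fixed if *P*(*k*) is specified. This rephrases the previous statement that the Gaussian field is completely specified by the two-point correlation function in real space.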
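The statement that independent Gaussian real and imaginary parts with dispersion *P*(*k*)/2 lead to a uniform phase and an amplitude with 〈*D*_{k}^{2}〉 = *P*(*k*) can be illustrated by direct sampling. The sketch below draws many realizations of a single Fourier mode; the function name, the sample size, and the fixed seed are arbitrary choices made for illustration.

```python
import math
import random


def sample_mode(power, n_samples, seed=0):
    """Draw amplitudes D_k and phases phi_k of a Gaussian Fourier mode.

    The real and imaginary parts are independent Gaussians with variance
    power / 2, so that the mean of D_k^2 equals the power spectrum value.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(power / 2.0)
    amplitudes, phases = [], []
    for _ in range(n_samples):
        re, im = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        amplitudes.append(math.hypot(re, im))
        phases.append(math.atan2(im, re) % (2.0 * math.pi))  # map to [0, 2*pi)
    return amplitudes, phases


amps, phases = sample_mode(power=2.0, n_samples=20000)
mean_d2 = sum(d * d for d in amps) / len(amps)
mean_phase = sum(phases) / len(phases)
print(f"<D^2> = {mean_d2:.3f} (input P = 2.0)")
print(f"<phi> = {mean_phase:.3f} (uniform phases give pi = {math.pi:.3f})")
```

The amplitudes follow the Rayleigh-type distribution that results from integrating out the phase, while the phases carry no information about *P*(*k*).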

Incidentally the one-point phase distribution turns out to be essentially uniform even in a strongly non-Gaussian field [81, 21]. Thus it is unlikely to extract useful information directly out of it mainly due to the cyclic property of the phase. Very recently, however, Matsubara [51] and Hikage et al. [31] succeeded in detecting a signature of phase correlations in Fourier modes of mass density fields induced by nonlinear gravitational clustering using the distribution function of the phase sum of the Fourier modes for triangle wavevectors. Several different statistics which carry the phase information have been also proposed in cosmology, including the void probability function [97], the genus statistics [26], and the Minkowski functionals [57, 76].

### 3.2 Log-normal distribution

A probability distribution function (PDF) of the cosmological density fluctuations is the most fundamental statistic characterizing the large-scale structure of the Universe. As long as the density fluctuations are in the linear regime, their PDF remains Gaussian. Once they reach the nonlinear stage, however, their PDF significantly deviates from the initial Gaussian shape due to the strong nonlinear mode-coupling and the non-locality of the gravitational dynamics. The functional forms of the resulting PDFs in nonlinear regimes are not known exactly, and a variety of phenomenological models have been proposed [34, 74, 9, 25].

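Among them, the one-point log-normal PDF,

$$P_{{\rm{LN}}}^{(1)}(\delta) = {1 \over {\sqrt {2\pi \sigma _1^2}}}\exp \left\{{- {{{{[\ln (1 + \delta) + \sigma _1^2/2]}^2}} \over {2\sigma _1^2}}} \right\}{1 \over {1 + \delta}},$$(80)

describes the cosmological density distribution fairly accurately, at least up to the nonlinear regime with r.m.s. variance *σ*_{nl} ≲ 4 and the over-density *δ* ≲ 100. The above function is characterized by a single parameter *σ*_{1} which is related to the variance of *δ*. Since we use *δ* to represent the density fluctuation field smoothed over *R*, its variance is computed from its power spectrum *P*_{nl} explicitly as

$$\sigma _{{\rm{nl}}}^2(R) \equiv {1 \over {2{\pi ^2}}}\int\nolimits_0^\infty {{P_{{\rm{nl}}}}(k){{\tilde W}^2}(kR){k^2}dk},$$

where \(\tilde W\) is the Fourier transform of the smoothing window. Here *σ*_{1} depends on the smoothing scale *R* alone and is given by

$$\sigma _1^2(R) = \ln [1 + \sigma _{{\rm{nl}}}^2(R)].$$

Given a set of cosmological parameters, one can compute *σ*_{nl}(*R*) and thus *σ*_{1}(*R*) very accurately using a fitting formula for *P*_{nl}(*k*) (see, e.g., [67]). In this sense, the above log-normal PDF is completely specified without any free parameter.

The accompanying figure compares the log-normal model with the one-point PDFs computed from cosmological *N*-body simulations in SCDM, LCDM, and OCDM (for Standard, Lambda, and Open CDM) models [36, 40]. The simulations employ *N* = 256^{3} dark matter particles in a periodic comoving cube (100 *h*^{−1} Mpc)^{3}. The density fields are smoothed over Gaussian (left panels) and Top-hat (right panels) windows with different smoothing lengths: *R* = 2 *h*^{−1} Mpc, 6 *h*^{−1} Mpc, and 18 *h*^{−1} Mpc. Solid lines show the log-normal PDFs adopting the value of *σ*_{nl} directly evaluated from simulations (shown in each panel). The agreement between the log-normal model and the simulation results is quite impressive. A small deviation is noticeable only for *δ* ≲ −0.5.

The log-normal PDF can be obtained from a one-to-one mapping of a Gaussian field. Consider a field *g* smoothed over *R* obeying the Gaussian PDF,

$$P_{\rm{G}}^{(1)}(g) = {1 \over {\sqrt {2\pi \sigma _{{\rm{lin}}}^2}}}\exp \left({- {{{g^2}} \over {2\sigma _{{\rm{lin}}}^2}}} \right),$$

where *σ*_{lin} is the r.m.s. value of *g*, and define a new field *δ* from *g* as

$$1 + \delta = {1 \over {\sqrt {1 + \sigma _{{\rm{nl}}}^2}}}\exp \left\{{{g \over {{\sigma _{{\rm{lin}}}}}}\sqrt {\ln (1 + \sigma _{{\rm{nl}}}^2)}} \right\}.$$(85)

The PDF for *δ* is simply given by \((dg/d\delta)P_{\rm{G}}^{(1)}(g)\), which reduces to Equation (80).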

At this point, the transformation (85) is nothing but a mathematical procedure to relate the Gaussian and the log-normal functions. Thus there is no physical reason to believe that the new field *δ* should be regarded as a nonlinear density field evolved from *g* even in an approximate sense. In fact it is physically unacceptable since the relation, if taken at face value, implies that the nonlinear density field is completely determined by its linear counterpart locally. We know, on the other hand, that the nonlinear gravitational evolution of cosmological density fluctuations proceeds in a quite nonlocal manner, and is sensitive to the surrounding mass distribution. Nevertheless the fact that the log-normal PDF provides a good fit to the simulation data, empirically implies that the transformation (85) somehow captures an important aspect of the nonlinear evolution in the real Universe.
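The defining properties of the log-normal model (unit normalization, vanishing mean of *δ*, and variance equal to *σ*_{nl}^{2}) can be verified numerically. The sketch below assumes the standard log-normal form *P*(*δ*) = exp{−[ln(1 + *δ*) + *σ*_{1}^{2}/2]^{2}/(2*σ*_{1}^{2})} / [(2π*σ*_{1}^{2})^{1/2}(1 + *δ*)] with *σ*_{1}^{2} = ln(1 + *σ*_{nl}^{2}); the function names and the trapezoidal integration in *y* = ln(1 + *δ*) are illustrative choices.

```python
import math


def lognormal_pdf(delta, sigma_nl):
    """One-point log-normal PDF of the density contrast delta (> -1)."""
    s1_sq = math.log(1.0 + sigma_nl**2)
    x = math.log1p(delta) + 0.5 * s1_sq
    norm = math.sqrt(2.0 * math.pi * s1_sq) * (1.0 + delta)
    return math.exp(-x * x / (2.0 * s1_sq)) / norm


def lognormal_moments(sigma_nl, n=20000, width=10.0):
    """Return (normalization, mean, variance) of delta by trapezoidal integration.

    The integration variable is y = ln(1 + delta), so d(delta) = (1 + delta) dy.
    """
    s1 = math.sqrt(math.log(1.0 + sigma_nl**2))
    y_lo = -0.5 * s1 * s1 - width * s1
    h = 2.0 * width * s1 / n
    norm = mean = second = 0.0
    for i in range(n + 1):
        delta = math.expm1(y_lo + i * h)
        weight = (0.5 if i in (0, n) else 1.0) * h
        p = lognormal_pdf(delta, sigma_nl) * (1.0 + delta) * weight
        norm += p
        mean += delta * p
        second += delta * delta * p
    return norm, mean, second - mean * mean


for sigma_nl in (0.5, 1.0, 2.0):
    print(sigma_nl, lognormal_moments(sigma_nl))
```

Because the offset *σ*_{1}^{2}/2 in the exponent is what enforces 〈*δ*〉 = 0, the model has no freedom left once *σ*_{nl} is given.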

### 3.3 Higher-order correlation functions

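Suppose that **x**_{i} now labels the position of the *i*-th object (galaxy). Then the two-point correlation function *ξ*_{12} ≡ *ξ*(**x**_{1}, **x**_{2}) is defined also in terms of the joint probability of the pair of objects located in the volume elements of *δV*_{1} and *δV*_{2},

$$\delta {P_{12}} = {\bar n^2}\,\delta {V_1}\,\delta {V_2}\,[1 + {\xi _{12}}],$$

where \(\bar n\) is the mean number density of the objects. This definition is generalized to three- and four-point correlation functions, *ζ*_{123} ≡ *ζ*(**x**_{1}, **x**_{2}, **x**_{3}) and *η*_{1234} ≡ *η*(**x**_{1}, **x**_{2}, **x**_{3}, **x**_{4}), in a straightforward manner:

$$\delta {P_{123}} = {\bar n^3}\,\delta {V_1}\,\delta {V_2}\,\delta {V_3}\,[1 + {\xi _{12}} + {\xi _{23}} + {\xi _{31}} + {\zeta _{123}}],$$

$$\delta {P_{1234}} = {\bar n^4}\,\delta {V_1}\,\delta {V_2}\,\delta {V_3}\,\delta {V_4}\,[1 + {\xi _{12}} + {\xi _{13}} + {\xi _{14}} + {\xi _{23}} + {\xi _{24}} + {\xi _{34}} + {\zeta _{123}} + {\zeta _{124}} + {\zeta _{134}} + {\zeta _{234}} + {\xi _{12}}{\xi _{34}} + {\xi _{13}}{\xi _{24}} + {\xi _{14}}{\xi _{23}} + {\eta _{1234}}].$$

By definition, *ξ*_{12}, *ζ*_{123}, and *η*_{1234} are symmetric with respect to the change of the indices. Define the following quantities with the same symmetry properties:

$${Z_{123}} \equiv {\xi _{12}}{\xi _{23}} + {\xi _{21}}{\xi _{13}} + {\xi _{23}}{\xi _{31}},$$

$${A_{1234}} \equiv {\xi _{12}}{\xi _{23}}{\xi _{34}} + ({\rm{sym.}},\;12\;{\rm{terms}}),\quad {B_{1234}} \equiv {\xi _{12}}{\xi _{13}}{\xi _{14}} + ({\rm{sym.}},\;4\;{\rm{terms}}),\quad {C_{1234}} \equiv {\zeta _{123}}({\xi _{14}} + {\xi _{24}} + {\xi _{34}}) + ({\rm{sym.}},\;4\;{\rm{terms}}).$$

Then it is natural to expect relations of the form

$${\zeta _{123}} = Q{Z_{123}},\quad \quad {\eta _{1234}} = {R_a}{A_{1234}} + {R_b}{B_{1234}} + {R_c}{C_{1234}},$$

where *Q*, *R*_{a}, *R*_{b}, and *R*_{c} are constants. In fact, the analysis of the two-dimensional galaxy catalogues [68] revealed that relations of this form hold approximately. The corresponding generalization for *N*-point correlation functions is suspected to hold generally,

$${\xi _N}({{\bf{x}}_1}, \ldots,{{\bf{x}}_N}) = \sum\limits_j {{Q_{N,j}}} \sum\limits_{(ab)} {\prod\limits^{N - 1} {{\xi _{ab}}}},$$

which is called the *hierarchical clustering ansatz*. Cosmological *N*-body simulations approximately support the validity of the above ansatz, but also detect a finite deviation from it [82].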

### 3.4 Genus statistics

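Consider the density contrast *δ*(**x**) at the position **x** in the survey volume *V*_{all}. This may be evaluated, for instance, by taking the ratio of the number of galaxies *N*(**x**, *V*_{f}) in the volume *V*_{f} centered at **x** to its average value \(\bar N({V_f})\):

$$\delta ({\bf{x}},{V_f}) = {{N({\bf{x}},{V_f})} \over {\bar N({V_f})}} - 1,\quad \quad \sigma ({V_f}) = {\langle \vert \delta ({\bf{x}},{V_f}){\vert ^2}\rangle ^{1/2}},$$

where *σ*(*V*_{f}) is its r.m.s. value. Consider the isodensity surface parameterized by a value of *ν* ≡ *δ*(**x**, *V*_{f})/*σ*(*V*_{f}). Genus is one of the topological numbers characterizing the surface defined as

$$g \equiv - {1 \over {4\pi}}\int {\kappa \,dA},$$(99)

where *κ* is the Gaussian curvature of the isolated surface. The Gauss-Bonnet theorem implies that the value of *g* is indeed an integer and equal to the number of holes minus 1. This is qualitatively understood as follows: Expand an arbitrary two-dimensional surface around a point as

$$z = {1 \over 2}a{x^2} + bxy + {1 \over 2}c{y^2}.$$

The principal curvatures *κ*_{1} and *κ*_{2} at that point are the eigenvalues of the coefficient matrix, and the Gaussian curvature is *κ* = *κ*_{1}*κ*_{2} = *ac* − *b*^{2}. A surface topologically equivalent to a sphere (a torus) has *κ* = 1 (*κ* = 0), and thus Equation (99) yields *g* = −1 (*g* = 0) which coincides with the number of holes minus 1.

In reality, there are many disconnected isodensity surfaces for a given *ν*, and thus it is more convenient to define the genus density in the survey volume *V*_{all} using the additivity of the genus:

$$G(\nu) \equiv {1 \over {{V_{{\rm{all}}}}}}\sum\limits_{i = 1}^I {{g_i}} = - {1 \over {4\pi {V_{{\rm{all}}}}}}\sum\limits_{i = 1}^I {\int\nolimits_{{A_i}} {\kappa \,dA}},$$(101)

where *A*_{i} (*i* = 1 ∼ *I*) denote the disconnected isodensity surfaces with the same value of *ν* = *δ*(**x**, *V*_{f})/*σ*(*V*_{f}). Interestingly the Gaussian density field has an analytic expression for Equation (101):

$$G(\nu) = {1 \over {4{\pi ^2}}}{\left({{{\langle {k^2}\rangle} \over 3}} \right)^{3/2}}\exp \left({- {{{\nu ^2}} \over 2}} \right)(1 - {\nu ^2}),$$(102)

where

$$\langle {k^2}\rangle \equiv {{\int {{k^2}P(k){{\tilde W}^2}(kR){d^3}k}} \over {\int {P(k){{\tilde W}^2}(kR){d^3}k}}}$$

is the moment of *k*^{2} weighted over the power spectrum of fluctuations *P*(*k*) and the smoothing function \({{\tilde W}^2}(kR)\) (see, e.g., [4]). It should be noted that in the Gaussian density field the information of the power spectrum shows up only in the proportional constant of Equation (102), and its functional form is determined uniquely by the threshold value *ν*. This *ν*-dependence reflects the phase information which is ignored in the two-point correlation function and power spectrum. In this sense, genus statistics is a complementary measure of the clustering pattern of the Universe.

In the weakly nonlinear regime the genus density acquires perturbative corrections to Equation (102) that are expressed in terms of the Hermite polynomials *H*_{n}(*ν*): *H*_{1} = *ν*, *H*_{2} = *ν*^{2} − 1, *H*_{3} = *ν*^{3} − 3*ν*, *H*_{4} = *ν*^{4} − 6*ν*^{2} + 3, *H*_{5} = *ν*^{5} − 10*ν*^{3} + 15*ν*, … The three quantities entering these corrections are determined by the third-order moments of *δ*. This expression plays a key role in understanding if the non-Gaussianity in galaxy distribution is ascribed to the primordial departure from the Gaussian statistics.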
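The Hermite polynomials listed above satisfy the recurrence *H*_{n+1}(*ν*) = *ν H*_{n}(*ν*) − *n H*_{n−1}(*ν*) with *H*_{0} = 1, which gives a compact way to evaluate them. The sketch below implements this recurrence together with the *ν*-dependence (1 − *ν*^{2}) exp(−*ν*^{2}/2) of the Gaussian genus density, assumed here in its standard form with the amplitude omitted; the function names are illustrative.

```python
import math


def hermite(n, nu):
    """Hermite polynomial H_n(nu) from H_{n+1} = nu * H_n - n * H_{n-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, nu
    for k in range(1, n):
        h_prev, h = h, nu * h - k * h_prev
    return h


def gaussian_genus_shape(nu):
    """nu-dependence of the Gaussian genus density; the amplitude is omitted."""
    return (1.0 - nu * nu) * math.exp(-0.5 * nu * nu)


nu = 2.0
print(hermite(3, nu), nu**3 - 3 * nu)
print(hermite(5, nu), nu**5 - 10 * nu**3 + 15 * nu)
# The Gaussian genus curve is -H_2(nu) exp(-nu^2 / 2): it peaks at nu = 0
# and changes sign at nu = +/- 1.
print(gaussian_genus_shape(0.0), gaussian_genus_shape(1.0), gaussian_genus_shape(2.0))
```

The sign change at |*ν*| = 1 separates the sponge-like topology near the median density from isolated clusters or voids at high and low thresholds.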

### 3.5 Minkowski functionals

In fact, genus is one of a complete set of *N* + 1 quantities, known as the Minkowski functionals (MFs), which determine the morphological properties of a pattern in *N*-dimensional space. In the analysis of galaxy redshift survey data, one considers isodensity contours from the three-dimensional density contrast field *δ* by taking its excursion set *F*_{ν}, i.e., the set of all points where the density contrast *δ* exceeds the threshold level *ν*, as in the case of the genus described in the previous subsection.

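Normalized by the total volume *V*_{tot}, the MFs *V*_{k} (*k* = 1, 2, 3) are calculated by the surface integration of the local MFs \(\upsilon _k^{{\rm{loc}}}\). The general expression is

$${V_k}(\nu) = {1 \over {{V_{{\rm{tot}}}}}}\int\nolimits_{\partial {F_\nu}} {{d^2}S({\bf{x}})\,\upsilon _k^{{\rm{loc}}}(\nu,{\bf{x}})},$$

with the local MFs for *k* = 1, 2, 3 given by

$$\upsilon _1^{{\rm{loc}}}(\nu,{\bf{x}}) = {1 \over 6},\quad \upsilon _2^{{\rm{loc}}}(\nu,{\bf{x}}) = {1 \over {6\pi}}\left({{1 \over {{R_1}}} + {1 \over {{R_2}}}} \right),\quad \upsilon _3^{{\rm{loc}}}(\nu,{\bf{x}}) = {1 \over {4\pi}}{1 \over {{R_1}{R_2}}},$$

where *R*_{1} and *R*_{2} are the principal radii of curvature of the isodensity surface. Here *δ* is the density contrast.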

The above MFs can be indeed interpreted as well-known geometric quantities: the volume fraction *V*_{0}(*ν*), the total surface area *V*_{1}(*ν*), the integral mean curvature *V*_{2}(*ν*), and the integral Gaussian curvature, i.e., the Euler characteristic *V*_{3}(*ν*). In our current definitions (see Equations (101, 108), or Equations (102, 115)), one can easily show that *V*_{3}(*ν*) reduces simply to −*G*(*ν*). The MFs were first introduced to cosmological studies by Mecke et al. [57], and further details may be found in [57, 32]. Analytic expressions of MFs in weakly non-Gaussian fields are derived in [52].

## 4 Galaxy Biasing

### 4.1 Concepts and definitions of biasing

As discussed above, luminous objects, such as galaxies and quasars, are not direct tracers of the mass in the Universe. In fact, a difference of the spatial distribution between luminous objects and dark matter, or a *bias*, has been indicated from a variety of observations. Galaxy biasing clearly exists. The fact that galaxies of different types cluster differently (see, e.g., [16]) implies that not all of them are exact tracers of the underlying mass distribution (see also Section 6).

In order to confront theoretical predictions for the *mass* distribution against observational data, one needs a relation of density fields of mass and luminous objects. The biasing of density peaks in a Gaussian random field is well formulated [37, 4], and it provides the first theoretical framework for the origin of galaxy density biasing. In this scheme, the galaxy-galaxy and mass-mass correlation functions are related in the linear regime via

$${\xi _{{\rm{gg}}}}(r) = {b^2}{\xi _{{\rm{mm}}}}(r),$$

where the biasing parameter *b* is a constant independent of scale *r*. However, a much more specific linear biasing model is often assumed in common applications, in which the local density fluctuation fields of galaxies and mass are assumed to be deterministically related via the relation

$${\delta _{\rm{g}}}({\bf{x}}) = b\,{\delta _{\rm{m}}}({\bf{x}}).$$

The above deterministic linear biasing is not based on a reasonable physical motivation. If *b* > 1, it must break down in deep voids because values of *δ*_{g} below −1 are forbidden by definition. Even in the simple case of no evolution in comoving galaxy number density, the linear biasing relation is not preserved during the course of fluctuation growth. Non-linear biasing, where *b* varies with *δ*_{m}, is inevitable.
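A one-line calculation makes the breakdown in voids concrete. The sketch below assumes the deterministic linear relation *δ*_{g} = *b δ*_{m} and reports the mass density contrast below which the implied galaxy density would become negative; the function names and the sample values are hypothetical.

```python
def linear_bias(delta_m, b):
    """Galaxy density contrast under deterministic linear biasing."""
    return b * delta_m


def breakdown_threshold(b):
    """Mass density contrast below which b * delta_m falls under -1."""
    return -1.0 / b


b = 2.0
for delta_m in (-0.3, -0.5, -0.8):
    delta_g = linear_bias(delta_m, b)
    status = "allowed" if delta_g >= -1.0 else "unphysical (delta_g < -1)"
    print(f"delta_m = {delta_m:+.1f} -> delta_g = {delta_g:+.1f}: {status}")
print("breakdown below delta_m =", breakdown_threshold(b))
```

For *b* = 2 the relation already fails in regions at half the mean mass density, which is why any model with *b* > 1 must become nonlinear in sufficiently deep voids.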

Indeed, an analytical model for biasing of halos on the basis of the extended Press-Schechter approximation [59] predicts that the biasing is nonlinear and provides a useful approximation for its behavior as a function of scale, time, and mass threshold. *N*-body simulations provide a more accurate description of the nonlinearity of the halo biasing confirming the validity of the Mo and White model [35, 103].

### 4.2 Modeling biasing

Biasing is likely to be *stochastic*, not deterministic [15]. An obvious part of this stochasticity can be attributed to the discrete sampling of the density field by galaxies, i.e., the shot noise. In addition, a statistical, physical scatter in the efficiency of galaxy formation as a function of *δ*_{m} is inevitable in any realistic scenario. For example, the random variations in the density on smaller scales are likely to be reflected in the efficiency of galaxy formation. As another example, the local geometry of the background structure, via the deformation tensor, must play a role too. Such ‘hidden variables’ would show up as physical scatter in the density-density relation [87].

Consider the density contrasts of luminous objects and mass, *δ*_{obj}(**x**, *z*∣*R*) and *δ*_{m}(**x**, *z*∣*R*), at a position **x** and a redshift *z* smoothed over a scale *R* [86]. In general, the former should depend on various other auxiliary variables \(\vec {\mathcal A}\) defined at different locations **x**′ and redshifts *z*′ smoothed over different scales *R*′ in addition to the mass density contrast at the same position, *δ*_{m}(**x**, *z*∣*R*). While this relation can be schematically expressed as

$${\delta _{{\rm{obj}}}}(x,z\vert R) = {\mathcal F}[x,z,R,{\delta _{\rm{m}}}(x,z\vert R),\vec{\mathcal{A}}(x,z\vert R),\vec{\mathcal{A}}(x',z'\vert R'), \ldots ],$$

once one focuses only on the relation between *δ*_{obj}(**x**, *z*∣*R*) and *δ*_{m}(**x**, *z*∣*R*), the relation becomes inevitably *stochastic* and *nonlinear* due to the dependence on unspecified auxiliary variables \(\vec {\mathcal A}\).

We define the *biasing* factor as the ratio of the density contrasts of luminous objects and mass:

$${B_{{\rm{obj}}}}(x,z\vert R) \equiv \frac{{\delta _{{\rm{obj}}}}(x,z\vert R)}{{\delta _{\rm{m}}}(x,z\vert R)}.$$

This *nonlocal stochastic nonlinear* factor in terms of *δ*_{m} may be approximated by

- a *local stochastic nonlinear* bias,
  $${B_{{\rm{obj}}}}(x,z\vert R) = b_{{\rm{obj}}}^{({\rm{sn}})}[x,z,R,{\delta _{\rm{m}}}(x,z\vert R),\vec{\mathcal{A}}(x,z\vert R), \ldots ],$$(120)
- a *local deterministic nonlinear* bias,
  $${B_{{\rm{obj}}}}(x,z\vert R) = b_{{\rm{obj}}}^{({\rm{dn}})}[z,R,{\delta _{\rm{m}}}(x,z\vert R)],$$(121)
  and
- a *local deterministic linear* bias,
  $${B_{{\rm{obj}}}}(x,z\vert R) = {b_{{\rm{obj}}}}(z,R).$$(122)

The redshift and scale dependence of the linear bias factor *b*_{obj}(*z*, *R*) was neglected in many previous studies of biased galaxy formation until very recently. Currently, however, various models beyond the deterministic linear biasing have been seriously considered with particular emphasis on the nonlinear and stochastic aspects of the biasing [71, 15, 87, 86].

### 4.3 Density peaks and dark matter halos as toy models for galaxy biasing

Let us illustrate the biasing from numerical simulations by considering two specific and popular models: primordial density peaks and dark matter halos [86]. We use the *N*-body simulation data of *L* = 100 *h*^{−1} Mpc again for this purpose [36]. We select density peaks with the threshold of the peak height *ν*_{th} = 1.0, 2.0, and 3.0. As for the dark matter halos, these are identified using the standard friend-of-friend algorithm with a linking length of 0.2 in units of the mean particle separation. We select halos of mass larger than the threshold *M*_{th} = 2.0 × 10^{12}*h*^{−1}*M*_{⊙}, *M*_{th} = 5.0 × 10^{12}*h*^{−1}*M*_{⊙}, and *M*_{th} = 1.0 × 10^{13}*h*^{−1}*M*_{⊙}.
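The halo identification step can be illustrated with a small example. The following is a minimal sketch of a friends-of-friends grouping, assuming a brute-force pair search and no periodic boundaries; it is not the code used for the simulations, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def friends_of_friends(points, linking_length):
    """Group points: any two points closer than linking_length share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def linking_length_from_b(b, box_size, n_particles):
    """Convert b (in units of the mean particle separation) to a length."""
    return b * box_size / n_particles ** (1.0 / 3.0)


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (9.0, 0.0, 0.0)]
halos = friends_of_friends(points, linking_length=0.15)
print(halos)
```

Points 0 and 2 are farther apart than the linking length but end up in the same group because both are friends of point 1. A mass threshold such as *M*_{th} would then be applied to the member count of each group.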

Figures 5 and 6 show the resulting distributions at *z* = 0 and *z* = 2.2 within a circular slice (*comoving* radius of 150 *h*^{−1} Mpc and thickness of 15 *h*^{−1} Mpc). We locate a fiducial observer in the center of the circle. Then the *comoving* position vector **r** for a particle with a *comoving* peculiar velocity **v** at a redshift *z* is observed at the position **s** in redshift space:

$${\bf{s}} = {\bf{r}} + \frac{1}{H(z)}\,\frac{{\bf{r}} \cdot {\bf{v}}}{\vert {\bf{r}}\vert }\,\frac{{\bf{r}}}{\vert {\bf{r}}\vert },$$

where *H*(*z*) is the Hubble parameter at *z*. The right panels in Figures 5 and 6 plot the observed distribution in redshift space, where the redshift-space distortion is quite visible: The coherent velocity field enhances the structure perpendicular to the line-of-sight of the observer (*squashing*) while the virialized clump becomes elongated along the line-of-sight (*finger-of-God*).
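The mapping to redshift space can be sketched in a few lines. The example below assumes an observer at the origin and shifts each position along its own line of sight by the radial peculiar velocity divided by *H*(*z*); the function name and the sample numbers are illustrative only.

```python
import math


def to_redshift_space(r, v, hubble):
    """Shift position r along the line of sight by (v . r_hat) / H."""
    norm = math.sqrt(sum(x * x for x in r))
    r_hat = [x / norm for x in r]
    v_los = sum(vi * ri for vi, ri in zip(v, r_hat))  # radial velocity component
    return [x + v_los / hubble * rh for x, rh in zip(r, r_hat)]


# Position in h^-1 Mpc, velocity in km/s, H in km/s per h^-1 Mpc.
receding = to_redshift_space((50.0, 0.0, 0.0), (300.0, 200.0, 0.0), 100.0)
infalling = to_redshift_space((50.0, 0.0, 0.0), (-300.0, 200.0, 0.0), 100.0)
print(receding, infalling)
```

Only the line-of-sight velocity component matters; the transverse component (200 km/s here) leaves the apparent position unchanged.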

We use two-point correlation functions to quantify stochasticity and nonlinearity in biasing of peaks and halos, and explore the signature of the redshift-space distortion. Since we are interested in the relation of the biased objects and the dark matter, we introduce three different correlation functions: the auto-correlation functions of dark matter and the objects, *ξ*_{mm} and *ξ*_{oo}, and their cross-correlation function *ξ*_{om}. In the present case, the subscript o refers to either h (halos) or *ν* (peaks). We also use the superscripts R and S to distinguish quantities defined in real and redshift spaces, respectively. We estimate those correlation functions using the standard pair-count method. The correlation function *ξ*^{(S)} is evaluated under the distant-observer approximation.

In nonlinear regimes (*ξ* > 1) the finger-of-God effect suppresses the amplitude of *ξ*^{(S)} relative to *ξ*^{(R)}, while *ξ*^{(S)} is larger than *ξ*^{(R)} in linear regimes (*ξ* < 1) due to the coherent velocity field.

### 4.4 Biasing of galaxies in cosmological hydrodynamic simulations

Popular models of the biasing based on the peak or the dark halos are successful in capturing some essential features of biasing. None of the existing models of bias, however, seems to be sophisticated enough for the coming precision cosmology era. The development of a more detailed theoretical model of bias is needed. A straightforward next step is to resort to numerical simulations which take account of galaxy formation even if phenomenological at this point. We show an example of such approaches from Yoshikawa et al. [103] who apply cosmological smoothed particle hydrodynamic (SPH) simulations in the LCDM model with particular attention to the comparison of the biasing of dark halos and simulated galaxies (see also [78]).

Galaxies in their simulations are identified as clumps of cold and dense gas particles which satisfy the Jeans condition and have the SPH density more than 100 times the mean baryon density at each redshift. Dark halos are identified with a standard friend-of-friend algorithm; the linking length is 0.164 times the mean separation of dark matter particles, for instance, at *z* = 0. In addition, they identify the surviving high-density substructures in dark halos, DM cores (see [103] for further details).

In the simulation output at *z* = 0, galaxies are more strongly clustered than dark halos. Figure 10 depicts a close-up snapshot of the most massive cluster at *z* = 0 with a mass *M* ≃ 8 × 10^{14} *M*_{⊙}. The circles in the lower panels indicate the positions of galaxies identified in our simulation.

Figure 11 shows the joint distributions of *δ*_{h} and *δ*_{g} with the mass density field *δ*_{m} at redshifts *z* = 0, 1, and 2, smoothed over *R*_{s} = 12 *h*^{−1} Mpc. The conditional mean relation \({\bar \delta _i}({\delta _{\rm{m}}})\) computed directly from the simulation is plotted in solid lines, while dashed lines indicate theoretical predictions of halo biasing by Taruya and Suto [87]. For a given smoothing scale, the simulated halos exhibit positive biasing for relatively small *δ*_{m} in agreement with the predictions. On the other hand, they tend to be underpopulated for large *δ*_{m}, or *anti-biased*. This is mainly caused by the exclusion effect of dark halos, which arises from their finite volume and is not taken into account in the theoretical model. Since our simulated *galaxies* have smaller spatial extent than the halos, the exclusion effect is not so serious. This is clearly illustrated in the lower panels in Figure 11, and indeed they show much better agreement with the theoretical model.

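A biasing parameter can also be defined through the two-point statistics:

$${b_{\xi ,i}}(r) \equiv \sqrt {\frac{{\xi _{ii}}(r)}{{\xi _{{\rm{mm}}}}(r)}},$$

where *ξ*_{ii}(*r*) and *ξ*_{mm}(*r*) are two-point correlation functions of objects i and of dark matter, respectively. While the above biasing parameter is ill-defined where either *ξ*_{ii}(*r*) or *ξ*_{mm}(*r*) becomes negative, it is not the case at clustering scales of interest (< 10 *h*^{−1} Mpc).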

The two-point correlation functions of the simulated objects (upper panels) are compared with the profiles of the biasing parameters *b*_{ξ}(*r*) for those objects (lower panels) at *z* = 0, 1, and 2. In the lower panels, we also plot the parameter *b*_{var,i} ≡ *σ*_{i}/*σ*_{m}, which is defined in terms of the one-point statistics (variance), for comparison on smoothing scales *R*_{s} = 4 *h*^{−1} Mpc, *R*_{s} = 8 *h*^{−1} Mpc, and *R*_{s} = 12 *h*^{−1} Mpc at *r* = *R*_{s} for each kind of objects by different symbols. In the upper panels, we show the correlation functions of DM cores identified with two different maximum linking lengths, *l*_{max} = 0.05 and *l*_{max} = *b*_{h}/2. Correlation functions of DM cores identified with *l*_{max} = 0.05 are similar to those of galaxies. On the other hand, those identified with *l*_{max} = *b*_{h}/2 exhibit much weaker correlation, and are rather similar to those of dark halos. This is due to the fact that the present algorithm of group identification with larger *l*_{max} tends to pick up lower mass halos which are poorly resolved at our numerical resolution.

The correlation functions of galaxies are almost unchanged with redshift, and the correlation functions of dark halos only slightly evolve between *z* = 0 and 2. By contrast, the amplitude of the dark matter correlation functions evolves rapidly by a factor of ∼ 10 from *z* = 2 to *z* = 0. The biasing parameter *b*_{ξ,g} is larger at a higher redshift, for example, *b*_{ξ,g} ≃ 2–2.5 at *z* = 2. The biasing parameter *b*_{ξ,h} for dark halos is systematically lower than that of galaxies and DM cores again due to the volume exclusion effect. At *z* = 0, galaxies and DM cores are slightly anti-biased relative to dark matter at *r* ≃ 1 *h*^{−1} Mpc. In lower panels, we also plot the one-point biasing parameter *b*_{var,i} ≡ *σ*_{i}/*σ*_{m} at *r* = *R*_{s} for comparison. In general we find that *b*_{ξ,i} is very close to *b*_{var,i} at *z* ∼ 0, but systematically lower than *b*_{var,i} at higher redshifts.

For each galaxy identified at *z* = 0, we define its formation redshift *z*_{f} by the epoch when half of its *cooled gas* particles satisfy our criteria of galaxy formation. Roughly speaking, *z*_{f} corresponds to the median formation redshift of *stars* in the present-day galaxies. We divide all simulated galaxies at *z* = 0 into two populations (the young population with *z*_{f} < 1.7 and the old population with *z*_{f} > 1.7) so as to approximate the observed number ratio of 3/1 for late-type and early-type galaxies.

The two populations differ in their two-point correlation functions at *z* = 0 as plotted in Figure 13. The old population indeed clusters more strongly than the mass, and the young population is anti-biased. The relative bias between the two populations \(b_{\xi, {\rm{g}}}^{{\rm{rel}}} \equiv \sqrt {{\xi _{{\rm{old}}}}/{\xi _{{\rm{young}}}}}\) ranges between 1.5 and 2 for 1 *h*^{−1} Mpc < *r* < 20 *h*^{−1} Mpc, where *ξ*_{young} and *ξ*_{old} are the two-point correlation functions of the young and old populations.

### 4.5 Halo occupation function approach for galaxy biasing

Since the clustering of dark matter halos is well understood now, one can describe the galaxy biasing if the halo model is combined with the relation between the halos and luminous objects. This is another approach to galaxy biasing, *halo occupation function* (HOF), which has become very popular recently. Indeed the basic idea behind HOF has a long history, but the model predictions have been significantly improved with the recent accurate models for the mass function, the biasing and the density profile of dark matter halos. We refer the readers to an extensive review on the HOF by Cooray and Sheth [13]. Here we briefly outline this approach.

A simple parametric form is adopted for the average number of galaxies of a given population in a halo of mass *M*:

$${N_{\rm{g}}}(M) = \begin{cases} (M/{M_1})^{\alpha} & (M > {M_{\min}}), \\ 0 & (M < {M_{\min}}). \end{cases}$$

This relation is characterized by the minimum mass *M*_{min} of halos which host the population of galaxies, a normalization parameter which can be interpreted as the critical mass *M*_{1} above which halos typically host more than one galaxy (note that *M*_{1} may exceed *M*_{min} since the above relation represents the statistical expected value of number of galaxies), and the power-law index *α* of the mass dependence of the efficiency of galaxy formation. We will put constraints on the three parameters from the observed number density and clustering amplitude for each galaxy population. In short, the number density of galaxies is most sensitive to *M*_{1} which changes the average number of galaxies per halo. The clustering amplitude on large scales is determined by the hosting halos and thus very sensitive to the mass of those halos, *M*_{min}. The clustering on smaller scales, on the other hand, depends on those three parameters in a fairly complicated fashion; roughly speaking, *M*_{min} changes the amplitude, while *α*, and to a lesser extent *M*_{1} as well, change the slope.

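The number density of the corresponding galaxy population at redshift *z* is given by

$${n_{\rm{g}}}(z) = \int_{{M_{\min}}}^\infty dM\,{n_{{\rm{halo}}}}(M)\,{N_{\rm{g}}}(M),$$

where *n*_{halo}(*M*) denotes the halo mass function.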
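The dependence of the galaxy number density on the occupation parameters can be sketched numerically. The example below assumes the power-law occupation *N*_{g}(*M*) = (*M*/*M*_{1})^{*α*} above *M*_{min} and a purely hypothetical toy mass function proportional to *M*^{−2}; a realistic calculation would substitute a calibrated halo mass function.

```python
import math


def mean_galaxies(mass, m_min, m_1, alpha):
    """Mean number of galaxies in a halo: (M / M_1)^alpha at or above M_min, else 0."""
    return (mass / m_1) ** alpha if mass >= m_min else 0.0


def galaxy_number_density(n_halo, m_min, m_1, alpha, m_max=1.0e16, steps=2000):
    """Trapezoid rule in ln M for the integral of n_halo(M) N_g(M) dM."""
    h = math.log(m_max / m_min) / steps
    total = 0.0
    for i in range(steps + 1):
        mass = m_min * math.exp(i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        # dM = M dlnM, hence the extra factor of mass.
        total += weight * n_halo(mass) * mean_galaxies(mass, m_min, m_1, alpha) * mass * h
    return total


def toy_mass_function(mass):
    """Hypothetical toy mass function dn/dM, used only for illustration."""
    return 1.0e10 * mass ** -2.0


n_g = galaxy_number_density(toy_mass_function, m_min=1.0e12, m_1=1.0e13, alpha=0.5)
n_g_high_m1 = galaxy_number_density(toy_mass_function, m_min=1.0e12, m_1=4.0e13, alpha=0.5)
print(n_g, n_g_high_m1)
```

Raising *M*_{1} lowers the number density at fixed *M*_{min}, in line with the statement that the number density is most sensitive to *M*_{1}.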

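Clustering on small scales depends on the mean number of galaxy pairs 〈*N*_{g}(*N*_{g} − 1)〉(*M*) within a halo of mass *M*, for which a specific functional form has to be adopted. In the contribution from galaxy pairs within the same halo (the 1-halo term), the halo profile enters with a power *p*, with *p* = 2 for 〈*N*_{g}(*N*_{g} − 1)〉 > 1 and *p* = 1 for 〈*N*_{g}(*N*_{g} − 1)〉 < 1. The 2-halo term on the assumption of the linear halo bias model [59] reduces to

$${P_{{\rm{2h}}}}(k) = \frac{{P_{{\rm{lin}}}}(k)}{n_{\rm{g}}^2}{\left[ \int dM\,{n_{{\rm{halo}}}}(M)\,{N_{\rm{g}}}(M)\,b(M)\,y(k,M) \right]^2},$$

where *P*_{lin}(*k*) is the linear dark matter power spectrum, *b*(*M*) is the halo bias factor, and *y*(*k*, *M*) is the Fourier transform of the halo dark matter profile normalized by its mass, \(y(k,M) = \tilde \rho (k,M)/M\) [77].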

The halo occupation formalism, although simple, provides a useful framework in deriving constraints on galaxy formation models from large data sets of the upcoming galaxy redshift surveys. For example, Zehavi et al. [105] used the halo occupation formalism to model departures from a power law in the SDSS galaxy correlation function. They demonstrated that this is due to the transition from a large-scale regime dominated by galaxy pairs in different halos to a small-scale regime dominated by those in the same halo. Magliocchetti and Porciani [47] applied the halo occupation formalism to the 2dFGRS clustering results per spectral type of Madgwick et al. [45]. This provides constraints on the distribution of late-type and early-type galaxies within the dark matter halos of different mass.

## 5 Relativistic Effects Observable in Clustering at High Redshifts

Redshift surveys of galaxies definitely serve as the central database for observational cosmology. In addition to the existing *shallower* surveys (*z* < 0.2), clustering in the Universe in the range *z* = 1–3 has been partially revealed by, for instance, the Lyman-break galaxies and X-ray selected AGNs. In particular, the 2dF and SDSS QSO redshift surveys promise to extend the observable scale of the Universe by an order of magnitude, up to a few Gpc. A proper interpretation of such redshift surveys in terms of the clustering evolution, however, requires an understanding of many cosmological effects which can be neglected for *z* ≪ 1 and thus have not been considered seriously so far. These cosmological *contaminations* include linear redshift-space (velocity) distortion, nonlinear redshift-space (velocity) distortion, cosmological redshift-space (geometrical) distortion, and the cosmological light-cone effect.

We describe a theoretical formalism to incorporate those effects, in particular the cosmological redshift-distortion and light-cone effects, and present several specific predictions in CDM models. The details of the material presented in this section may be found in [83, 101, 100, 46, 28, 29].

### 5.1 Cosmological light-cone effect on the two-point correlation functions

Observing a distant patch of the Universe is equivalent to observing the past. Due to the finite light velocity, a line-of-sight direction of a redshift survey is along the time, as well as spatial, coordinate axis. Therefore the entire sample does not consist of objects on a constant-time hypersurface, but rather on a light-cone, i.e., a null hypersurface defined by observers at *z* = 0. This implies that many properties of the objects change across the depth of the survey volume, including the mean density, the amplitude of spatial clustering of dark matter, the bias of luminous objects with respect to mass, and the intrinsic evolution of the absolute magnitude and spectral energy distribution. These aspects should be properly taken into account in order to extract cosmological information from observed samples of redshift surveys.

A quantitative prediction of the two-point statistics on the light-cone has to account for:

1. nonlinear gravitational evolution,
2. linear redshift-space distortion,
3. nonlinear redshift-space distortion,
4. weighted averaging over the light-cone,
5. cosmological redshift-space distortion due to the geometry of the Universe, and
6. object-dependent clustering bias.

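These effects are combined in the following model of the power spectrum in redshift space at redshift *z*:

$${P^{({\rm{S}})}}({k_ \bot },{k_\parallel };z) = {b^2}(z)\,P_{{\rm{mass}}}^{({\rm{R}})}(k;z){\left[ 1 + \beta (z){\left( \frac{{k_\parallel }}{k} \right)^2} \right]^2}{D_{{\rm{vel}}}}[{k_\parallel }{\sigma _{\rm{P}}}(z)],$$(131)

where *b*(*z*) is the linear bias factor of the objects, *k*_{⊥} and *k*_{∥} are the comoving wavenumbers perpendicular and parallel to the line-of-sight of an observer, and \(P_{{\rm{mass}}}^{{\rm{(R)}}}(k;z)\) is the mass power spectrum in real space. The factor in square brackets comes from the linear redshift-space distortion [38], and the last factor is a phenomenological correction for the non-linear velocity effect [67]. In the above, *β*(*z*) is the linear redshift-distortion parameter and *σ*_{P} is the 1-dimensional pair-wise peculiar velocity dispersion. Then the finger-of-God effect is modeled by the damping function *D*_{vel}[*k*_{∥}*σ*_{P}(*z*)]:

$${D_{{\rm{vel}}}}[k\mu {\sigma _{\rm{P}}}] = \frac{1}{1 + {\kappa ^2}{\mu ^2}},$$

where *μ* is the direction cosine in *k*-space, and the dimensionless wavenumber *κ* is related to the peculiar velocity dispersion *σ*_{P} in the physical velocity units:

$$\kappa (z) = \frac{k(1 + z){\sigma _{\rm{P}}}(z)}{\sqrt 2 H(z)}.$$

On scales of order 1 *h*^{−1} Mpc, we adopt a fitting formula for *σ*_{P}(*z*) throughout the analysis below which better approximates the small-scale dispersions in physical units. Integrating over *μ*, one obtains the direction-averaged power spectrum in redshift space:

$$\frac{{P^{({\rm{S}})}}(k;z)}{{b^2}(z)\,P_{{\rm{mass}}}^{({\rm{R}})}(k;z)} = A(\kappa ) + \frac{2}{3}\beta (z)B(\kappa ) + \frac{1}{5}{\beta ^2}(z)C(\kappa ),$$

$$A(\kappa ) = \frac{\arctan \kappa }{\kappa },\qquad B(\kappa ) = \frac{3}{{\kappa ^2}}\left[ 1 - \frac{\arctan \kappa }{\kappa } \right],\qquad C(\kappa ) = \frac{5}{3{\kappa ^2}}\left[ 1 - \frac{3}{{\kappa ^2}} + \frac{3\arctan \kappa }{{\kappa ^3}} \right].$$

The average over the light-cone then runs over the redshift range of the survey, from *z*_{min} to *z*_{max}. For a Robertson-Walker metric, *S*_{K}(*χ*) is determined by the sign of the curvature *K* as

$${S_K}(\chi ) = \begin{cases} \sin (\sqrt K \chi )/\sqrt K & (K > 0), \\ \chi & (K = 0), \\ \sinh (\sqrt { - K} \chi )/\sqrt { - K} & (K < 0), \end{cases}$$

where the present scale factor *a*_{0} is normalized as unity, and the spatial curvature *K* is given as

$$K = H_0^2({\Omega _{\rm{m}}} + {\Omega _\Lambda } - 1)$$

(in units with *c* = 1). The radial comoving distance *χ*(*z*) is computed by

$$\chi (z) = \int_0^z \frac{dz'}{H(z')},\qquad H(z) = {H_0}\sqrt {{\Omega _{\rm{m}}}{{(1 + z)}^3} + (1 - {\Omega _{\rm{m}}} - {\Omega _\Lambda }){{(1 + z)}^2} + {\Omega _\Lambda }} .$$

The comoving angular diameter distance *D*_{c}(*z*) at redshift *z* is equivalent to *S*_{K}(*χ*(*z*)), and, in the case of Ω_{Λ} = 0, is explicitly given by Mattig’s formula:

$${D_{\rm{c}}}(z) = \frac{1}{{H_0}}\frac{2}{\Omega _{\rm{m}}^2(1 + z)}\left[ 2 - {\Omega _{\rm{m}}} + {\Omega _{\rm{m}}}z - (2 - {\Omega _{\rm{m}}})\sqrt {1 + {\Omega _{\rm{m}}}z} \right].$$

Then *dV*_{c}/*dz*, the comoving volume element per unit solid angle, is explicitly given as

$$\frac{d{V_{\rm{c}}}}{dz} = S_K^2(\chi )\frac{d\chi }{dz} = \frac{S_K^2(\chi )}{H(z)}.$$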
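The distance measures used here can be checked numerically. The following is a minimal sketch, assuming units with *c* = *H*_{0} = 1, a matter plus cosmological constant model, and Simpson's rule for the radial integral; the function names are illustrative. It compares the numerically integrated *S*_{K}(*χ*(*z*)) with Mattig's closed form for Ω_{Λ} = 0.

```python
import math


def hubble(z, omega_m, omega_l):
    """H(z) in units of H0."""
    omega_k = 1.0 - omega_m - omega_l
    return math.sqrt(omega_m * (1 + z) ** 3 + omega_k * (1 + z) ** 2 + omega_l)


def chi(z, omega_m, omega_l, steps=1000):
    """Radial comoving distance: Simpson's rule for the integral of dz'/H(z')."""
    h = z / steps
    total = 1.0 / hubble(0.0, omega_m, omega_l) + 1.0 / hubble(z, omega_m, omega_l)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / hubble(i * h, omega_m, omega_l)
    return total * h / 3.0


def s_k(x, curvature):
    """S_K(x) for positive, negative, and zero curvature."""
    if curvature > 0:
        root = math.sqrt(curvature)
        return math.sin(root * x) / root
    if curvature < 0:
        root = math.sqrt(-curvature)
        return math.sinh(root * x) / root
    return x


def comoving_angular_distance(z, omega_m, omega_l):
    curvature = omega_m + omega_l - 1.0  # K in units of H0^2
    return s_k(chi(z, omega_m, omega_l), curvature)


def mattig(z, omega_m):
    """Closed form for omega_l = 0."""
    bracket = 2 - omega_m + omega_m * z - (2 - omega_m) * math.sqrt(1 + omega_m * z)
    return 2.0 / (omega_m ** 2 * (1 + z)) * bracket


def volume_element(z, omega_m, omega_l):
    """dV_c/dz per unit solid angle."""
    return comoving_angular_distance(z, omega_m, omega_l) ** 2 / hubble(z, omega_m, omega_l)


print(comoving_angular_distance(2.0, 0.3, 0.0), mattig(2.0, 0.3))
```

Multiplying the distances by *c*/*H*_{0} ≈ 2998 *h*^{−1} Mpc converts them to physical units.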

### 5.2 Evaluating two-point correlation functions from *N*-body simulation data

The theoretical modeling described above was tested against simulation results by Hamana, Colombi, and Suto [28]. Using cosmological *N*-body simulations in SCDM and Λ-CDM models, they generated light-cone samples as follows: First, they adopt a distant-observer approximation and assume that the line-of-sight direction is parallel to the *Z*-axis regardless of its (*X*, *Y*) position. Second, they periodically duplicate the simulation box along the *Z*-direction so that at a redshift *z*, the position and velocity of those particles located within an interval *χ*(*z*) ± Δ*χ*(*z*) are dumped, where Δ*χ*(*z*) is determined by the output time-interval of the original *N*-body simulation. Finally they extract five independent (non-overlapping) cone-shaped samples with the angular radius of 1 degree (the field-of-view of *π* degree^{2}). In this manner, they have generated mock data samples on the light-cone continuously extending up to *z* = 0.4 (relevant for galaxy samples) and *z* = 2.0 (relevant for QSO samples) from the small and large boxes, respectively.

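The comoving separation *x*_{12} of two objects located at *z*_{1} and *z*_{2} with an angular separation *θ*_{12} is given by

$$x_{12}^2 = x_1^2 + x_2^2 - Kx_1^2x_2^2(1 + {\cos ^2}{\theta _{12}}) - 2{x_1}{x_2}\sqrt {1 - Kx_1^2} \sqrt {1 - Kx_2^2} \cos {\theta _{12}},$$

where *x*_{1} ≡ *D*_{c}(*z*_{1}) and *x*_{2} ≡ *D*_{c}(*z*_{2}).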
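The pair separation in a curved background can be sketched as follows. The example assumes that *x*_{1} and *x*_{2} are comoving angular diameter distances *S*_{K}(*χ*) and uses the curved-space generalization of the law of cosines; the function name is illustrative. In this form the result is *S*_{K} evaluated at the geodesic separation of the pair, which the check for *K* = 1 makes explicit.

```python
import math


def comoving_separation(x1, x2, theta, curvature=0.0):
    """Separation of two objects at distances x1, x2 and angle theta (radians)."""
    cos_t = math.cos(theta)
    squared = (x1 ** 2 + x2 ** 2
               - curvature * x1 ** 2 * x2 ** 2 * (1.0 + cos_t ** 2)
               - 2.0 * x1 * x2
               * math.sqrt(1.0 - curvature * x1 ** 2)
               * math.sqrt(1.0 - curvature * x2 ** 2) * cos_t)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


# Flat space: a 3-4-5 right triangle.
flat = comoving_separation(3.0, 4.0, math.pi / 2)

# Closed space with K = 1: compare with the spherical law of cosines.
chi1, chi2, theta = 0.3, 0.5, 0.7
closed = comoving_separation(math.sin(chi1), math.sin(chi2), theta, curvature=1.0)
cos_chi12 = (math.cos(chi1) * math.cos(chi2)
             + math.sin(chi1) * math.sin(chi2) * math.cos(theta))
print(flat, closed, math.sin(math.acos(cos_chi12)))
```

The same function applies in redshift space once the distances are evaluated at the observed redshifts.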

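In redshift space, the observed redshift *z*_{obs} for each object differs from the “real” one *z*_{real} due to the velocity distortion effect:

$${z_{{\rm{obs}}}} = {z_{{\rm{real}}}} + (1 + {z_{{\rm{real}}}})\frac{{v_{{\rm{pec}}}}}{c},$$

where *v*_{pec} is the line of sight relative peculiar velocity between the object and the observer in *physical* units. Then the comoving separation *s*_{12} of two objects in redshift space is computed as

$$s_{12}^2 = s_1^2 + s_2^2 - Ks_1^2s_2^2(1 + {\cos ^2}{\theta _{12}}) - 2{s_1}{s_2}\sqrt {1 - Ks_1^2} \sqrt {1 - Ks_2^2} \cos {\theta _{12}},$$

where *s*_{1} ≡ *D*_{c}(*z*_{obs,1}) and *s*_{2} ≡ *D*_{c}(*z*_{obs,2}).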

To convert an absolute magnitude *M*_{B} at *z* (with the luminosity distance *d*_{L}(*z*)) into an apparent *B*-band magnitude, we applied the K-correction,

$$B = {M_{\rm{B}}} + 5\log \left( \frac{{d_{\rm{L}}}(z)}{10\,{\rm{pc}}} \right) - 2.5(1 - p)\log (1 + z),$$

for a power-law energy spectrum *L*_{ν} ∝ *ν*^{−p} (we use *p* = 0.5). In practice, we adopt the galaxy selection function *ϕ*_{gal}(< *B*_{lim}, *z*) with *B*_{lim} = 19 and *z*_{min} = 0.01 for the small box realizations, and the QSO selection function *ϕ*_{qso}(< *B*_{lim}, *z*) with *B*_{lim} = 21 and *z*_{min} = 0.2 for the large box realizations. We do not introduce the spatial biasing between selected particles and the underlying dark matter.
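The magnitude conversion behind such a selection function can be sketched as follows. The example assumes a power-law spectrum *L*_{ν} ∝ *ν*^{−p}, for which the K-correction is −2.5(1 − *p*) log(1 + *z*), and takes the luminosity distance as an input in Mpc; the function names and sample values are illustrative.

```python
import math


def apparent_b_magnitude(abs_mag, d_lum_mpc, z, p=0.5):
    """Apparent magnitude from absolute magnitude, luminosity distance, and redshift."""
    distance_modulus = 5.0 * math.log10(d_lum_mpc * 1.0e6 / 10.0)  # d_L over 10 pc
    k_term = -2.5 * (1.0 - p) * math.log10(1.0 + z)  # power-law K-correction
    return abs_mag + distance_modulus + k_term


def is_selected(abs_mag, d_lum_mpc, z, b_lim, p=0.5):
    """Magnitude-limited selection: keep objects brighter than b_lim."""
    return apparent_b_magnitude(abs_mag, d_lum_mpc, z, p) < b_lim


flat_spectrum = apparent_b_magnitude(-25.0, 1000.0, 1.0, p=1.0)
power_law = apparent_b_magnitude(-25.0, 1000.0, 1.0, p=0.5)
print(flat_spectrum, power_law)
```

For *p* = 1 the K-correction vanishes, while for *p* = 0.5 the source appears slightly brighter at a given luminosity distance.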

### 5.3 Cosmological redshift-space distortion

Consider a spherical object at high redshift. If the wrong cosmology is assumed in interpreting the distance-redshift relation along the line of sight and in the transverse direction, the sphere will appear distorted. Alcock and Paczynski [2] pointed out that this curvature effect could be used to estimate the cosmological constant. Matsubara and Suto [54] and Ballinger, Peacock, and Heavens [3] developed a theoretical framework to describe the geometrical distortion effect (cosmological redshift distortion) in the two-point correlation function and the power spectrum of distant objects, respectively. Certain studies were less optimistic than others about the possibility of measuring this Alcock-Paczynski effect. For example, Ballinger, Peacock, and Heavens [3] argued that the geometrical distortion could be confused with the dynamical redshift distortions caused by peculiar velocities and characterized by the linear theory parameter \(\beta \equiv \Omega _{\rm{m}}^{0.6}/b\). Matsubara and Szalay [55, 56] showed that the typical SDSS and 2dF samples of normal galaxies at low redshift (*z* ∼ 0.1) have sufficiently high signal-to-noise, but they are too shallow to detect the Alcock-Paczynski effect. On the other hand, the SDSS and 2dF quasar surveys are at a useful redshift, but they are too sparse. A more promising sample is the SDSS Luminous Red Galaxies survey (out to redshift *z* ∼ 0.5) which turns out to be optimal in terms of both depth and density.

While this analysis is promising, it remains to be tested if non-linear clustering and complicated biasing (which is quite plausible for red galaxies) would not ‘contaminate’ the measurement of the equation of state. Even if the Alcock-Paczynski test turns out to be less accurate than other cosmological tests (e.g., CMB and SN Ia), the effect itself is an interesting and important ingredient in analyzing the clustering pattern of galaxies at high redshifts. We shall now present the formalism for this effect.

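The *observable* separations perpendicular and parallel to the line-of-sight direction, *x*_{s⊥} = (*c*/*H*_{0})*zδθ* and *x*_{s∥} = (*c*/*H*_{0})*δz*, are mapped differently to the corresponding comoving separations in real space *x*_{⊥} and *x*_{∥}:

$${x_{{\rm{s}} \bot }}(z) = {x_ \bot }\frac{cz}{{H_0}(1 + z){d_{\rm{A}}}(z)} \equiv \frac{{x_ \bot }}{{c_ \bot }(z)},\qquad {x_{{\rm{s}}\parallel }}(z) = {x_\parallel }\frac{H(z)}{{H_0}} \equiv \frac{{x_\parallel }}{{c_\parallel }(z)},$$

with *d*_{A}(*z*) being the angular diameter distance. The difference between *c*_{⊥}(*z*) and *c*_{∥}(*z*) generates an apparent anisotropy in the clustering statistics, which should be isotropic in the comoving space. Then the power spectrum in cosmological redshift space *P*^{(CRD)} is related to *P*^{(S)} defined in the *comoving* redshift space as

$${P^{({\rm{CRD}})}}({k_{{\rm{s}} \bot }},{k_{{\rm{s}}\parallel }};z) = \frac{1}{{c_ \bot }{(z)^2}{c_\parallel }(z)}{P^{({\rm{S}})}}\left( \frac{{k_{{\rm{s}} \bot }}}{{c_ \bot }(z)},\frac{{k_{{\rm{s}}\parallel }}}{{c_\parallel }(z)};z \right),$$

where the prefactor is the Jacobian of the volume element, and *k*_{s⊥} = *c*_{⊥}(*z*)*k*_{⊥} and *k*_{s∥} = *c*_{∥}(*z*)*k*_{∥} are the wavenumbers perpendicular and parallel to the line-of-sight direction.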
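The size of the geometrical distortion can be estimated with a short calculation. The sketch below assumes the scaling factors *c*_{⊥}(*z*) = *H*_{0}(1 + *z*)*d*_{A}(*z*)/(*cz*) and *c*_{∥}(*z*) = *H*_{0}/*H*(*z*), which follow from the observable separations defined in the text, and works in units with *c* = *H*_{0} = 1; the function names are illustrative.

```python
import math


def expansion_rate(z, omega_m, omega_l):
    """H(z)/H0 for matter, curvature, and a cosmological constant."""
    omega_k = 1.0 - omega_m - omega_l
    return math.sqrt(omega_m * (1 + z) ** 3 + omega_k * (1 + z) ** 2 + omega_l)


def comoving_transverse_distance(z, omega_m, omega_l, steps=1000):
    """(1 + z) d_A in units of c/H0, with Simpson's rule for the radial integral."""
    h = z / steps
    total = 1.0 + 1.0 / expansion_rate(z, omega_m, omega_l)  # H(0)/H0 = 1
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / expansion_rate(i * h, omega_m, omega_l)
    chi = total * h / 3.0
    omega_k = 1.0 - omega_m - omega_l
    if omega_k > 0:  # open model
        root = math.sqrt(omega_k)
        return math.sinh(root * chi) / root
    if omega_k < 0:  # closed model
        root = math.sqrt(-omega_k)
        return math.sin(root * chi) / root
    return chi


def ap_factors(z, omega_m, omega_l):
    """Return (c_perp, c_par) at redshift z > 0."""
    c_perp = comoving_transverse_distance(z, omega_m, omega_l) / z
    c_par = 1.0 / expansion_rate(z, omega_m, omega_l)
    return c_perp, c_par


for name, (om, ol) in {"SCDM": (1.0, 0.0), "LCDM": (0.3, 0.7), "OCDM": (0.3, 0.0)}.items():
    print(name, ap_factors(2.2, om, ol))
```

The ratio *c*_{⊥}/*c*_{∥} differs from unity and depends on (Ω_{m}, Ω_{Λ}), which is the origin of the apparent anisotropy.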

We evaluate the anisotropic power spectrum *P*^{(CRD)}(*k*_{s}, *μ*_{k}; *z* = 2.2). As specific examples, we consider SCDM, LCDM, and OCDM models, which have (Ω_{m}, Ω_{Λ}, *h*, *σ*_{8}) = (1.0, 0.0, 0.5, 0.6), (0.3, 0.7, 0.7, 1.0), and (0.3, 0.0, 0.7, 1.0), respectively. Clearly the linear theory predictions (*σ*_{P} = 0; top panels) are quite different from the results of *N*-body simulations (bottom panels), indicating the importance of the nonlinear velocity effects (*σ*_{P} computed according to [58]; middle panels).

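The anisotropy is conveniently quantified by the multipole expansion of the power spectrum,

$$P(k,{\mu _k};z) = \sum\limits_{l:\,{\rm{even}}} {P_l}(k;z){L_l}({\mu _k}),\qquad {P_l}(k;z) \equiv \frac{2l + 1}{2}\int_{ - 1}^1 d{\mu _k}\,P(k,{\mu _k};z){L_l}({\mu _k}),$$

where *μ*_{k} is the direction cosine between the wavevector and the line-of-sight, and *L*_{l}(*μ*_{k}) are the *l*-th order Legendre polynomials. Similarly, the two-point correlation function is decomposed as

$$\xi (x,{\mu _x};z) = \sum\limits_{l:\,{\rm{even}}} {\xi _l}(x;z){L_l}({\mu _x}),\qquad {\xi _l}(x;z) \equiv \frac{2l + 1}{2}\int_{ - 1}^1 d{\mu _x}\,\xi (x,{\mu _x};z){L_l}({\mu _x}),$$

using the direction cosine *μ*_{x} between the separation vector and the line-of-sight. The above multipole moments satisfy the following relations:

$${P_l}(k;z) = 4\pi {i^l}\int_0^\infty {\xi _l}(x;z){j_l}(kx){x^2}dx,\qquad {\xi _l}(x;z) = \frac{{i^l}}{2{\pi ^2}}\int_0^\infty {P_l}(k;z){j_l}(kx){k^2}dk,$$

with *j*_{l}(*kx*) being spherical Bessel functions. Substituting *P*^{(CRD)}(*k*_{s}, *μ*_{k}; *z*) in Equation (159) yields \(P_l^{({\rm{CRD}})}({k_{\rm{s}}};z)\), and then *ξ*^{(CRD)}(**x**_{s}; *z*) can be computed from Equation (161).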
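The multipole projection can be illustrated on the simplest anisotropic case. The following is a minimal sketch that projects the linear distortion factor (1 + *βμ*²)² onto Legendre polynomials with Simpson's rule and compares the result with the closed-form monopole, quadrupole, and hexadecapole; the function names and the value of *β* are illustrative.

```python
def legendre(l, mu):
    """Legendre polynomials of order 0, 2, and 4."""
    if l == 0:
        return 1.0
    if l == 2:
        return 0.5 * (3.0 * mu ** 2 - 1.0)
    if l == 4:
        return (35.0 * mu ** 4 - 30.0 * mu ** 2 + 3.0) / 8.0
    raise ValueError("only l = 0, 2, 4 are implemented in this sketch")


def multipole(power, l, steps=2000):
    """P_l = (2l + 1)/2 times the integral of P(mu) L_l(mu) over -1 <= mu <= 1."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps + 1):
        mu = -1.0 + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)  # Simpson weights
        total += weight * power(mu) * legendre(l, mu)
    return (2 * l + 1) / 2.0 * total * h / 3.0


beta = 0.5


def kaiser(mu):
    """Linear redshift-space distortion factor relative to real space."""
    return (1.0 + beta * mu ** 2) ** 2


p0, p2, p4 = (multipole(kaiser, l) for l in (0, 2, 4))
print(p0, p2, p4)
```

Only even multipoles appear because the distortion depends on *μ*², and for the linear factor alone all multipoles beyond *l* = 4 vanish.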

The resulting anisotropy can then be used to constrain the cosmological parameters (Ω_{m}, Ω_{Λ}). Figure 17 indicates the feasibility, which interestingly results in a constraint fairly orthogonal to that from the supernovae Ia Hubble diagram.

### 5.4 Two-point clustering statistics on a light-cone in cosmological redshift space

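At a given redshift *z*, the two-point correlation function in cosmological redshift space, *ξ*^{(CRD)}(*x*_{s⊥}, *x*_{s∥}; *z*), is computed as

$${\xi ^{({\rm{CRD}})}}({x_{{\rm{s}} \bot }},{x_{{\rm{s}}\parallel }};z) = {\xi ^{({\rm{S}})}}({c_ \bot }(z){x_{{\rm{s}} \bot }},{c_\parallel }(z){x_{{\rm{s}}\parallel }};z),$$

where *ξ*^{(S)}(*x*_{⊥}, *x*_{∥}; *z*) is the redshift-space correlation function defined through Equation (131).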

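The light-cone effect is incorporated by averaging over redshift with a weight set by *ϕ*(*z*), the selection function determined by the observational target selection and the luminosity function of the objects. Then, the final expressions [84] reduce to

$$P_l^{({\rm{LC,CRD}})}({k_{\rm{s}}}) = \frac{\int_{{z_{\min}}}^{{z_{\max}}} dz\,\frac{d{V_{\rm{c}}}}{dz}\,{[\phi (z)]^2}\,P_l^{({\rm{CRD}})}({k_{\rm{s}}};z)}{\int_{{z_{\min}}}^{{z_{\max}}} dz\,\frac{d{V_{\rm{c}}}}{dz}\,{[\phi (z)]^2}},$$

$$\xi _l^{({\rm{LC,CRD}})}({x_{\rm{s}}}) = \frac{\int_{{z_{\min}}}^{{z_{\max}}} dz\,\frac{d{V_{\rm{c}}}}{dz}\,{[\phi (z)]^2}\,\xi _l^{({\rm{CRD}})}({x_{\rm{s}}};z)}{\int_{{z_{\min}}}^{{z_{\max}}} dz\,\frac{d{V_{\rm{c}}}}{dz}\,{[\phi (z)]^2}},$$

where *z*_{min} and *z*_{max} denote the redshift range of the survey, and \(d{V_c}/dz = d_{\rm{C}}^2(z)/H(z)\) is the comoving volume element per unit solid angle.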

Note that *k*_{s} and *x*_{s}, defined in \(P_l^{({\rm{CRD}})}({k_{\rm{s}}};z)\) and \(\xi _l^{({\rm{CRD}})}({x_{\rm{s}}};z)\), are related to their comoving counterparts at *z* through Equations (158) and (154), while those in \(P_l^{({\rm{LC,CRD}})}({k_{\rm{s}}})\) and \(\xi _l^{({\rm{LC,CRD}})}({x_{\rm{s}}})\) are not specifically related to any comoving wavenumber and separation. Rather, they correspond to the quantities averaged over the range of *z* satisfying the observable conditions \({x_{\rm{s}}} = (c/{H_0})\sqrt {\delta {z^2} + {z^2}\delta {\theta ^2}}\) and *k*_{s} = 2*π*/*x*_{s}.

Let us show specific examples of the two-point clustering statistics on a light-cone in cosmological redshift space. We consider SCDM and LCDM models, and take into account the selection functions relevant to the upcoming SDSS spectroscopic samples of galaxies and quasars by adopting the *B*-band limiting magnitudes of 19 and 20, respectively.

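Figure 18 shows the power spectra for galaxies in 0 < *z* < *z*_{max} = 0.2 and QSOs in 0 < *z* < *z*_{max} = 5. The left and right panels present the results in SCDM and LCDM models. For simplicity we adopt a scale-independent linear bias model [23]:

$$b(z) = 1 + \frac{1}{D(z)}[b(k,z = 0) - 1],$$

where *D*(*z*) is the linear growth factor normalized to unity at *z* = 0, with *b*(*k*, *z* = 0) = 1 and 1.5 for galaxies and quasars, respectively.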
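The redshift evolution implied by such a bias model can be sketched as follows. The example assumes the form *b*(*z*) = 1 + [*b*(*z* = 0) − 1]/*D*(*z*) and, purely for illustration, the Einstein-de Sitter growth factor *D*(*z*) = 1/(1 + *z*); other cosmologies need their own growth factor, and the function names are hypothetical.

```python
def growth_factor_eds(z):
    """Linear growth factor in an Einstein-de Sitter model, normalized to 1 at z = 0."""
    return 1.0 / (1.0 + z)


def evolving_bias(b0, z, growth=growth_factor_eds):
    """Bias at redshift z for a present-day bias b0: b(z) = 1 + (b0 - 1) / D(z)."""
    return 1.0 + (b0 - 1.0) / growth(z)


for z in (0.0, 1.0, 5.0):
    print(z, evolving_bias(1.0, z), evolving_bias(1.5, z))
```

An unbiased population (*b* = 1 today) stays unbiased at all redshifts, while a population with *b* = 1.5 today is strongly biased at high redshift, which is why the bias evolution matters much more for the QSO sample than for the galaxy sample.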

The upper and lower panels correspond to magnitude-limited samples of galaxies (*B* < 19 in 0 < *z* < *z*_{max} = 0.2; no bias model) and QSOs (*B* < 20 in 0 < *z* < *z*_{max} = 5; Fry’s linear bias model), respectively. We present the results normalized by the real-space power spectrum in linear theory *P*^{(R,lin)}(*k*; *z*) [4], and \(P_0^{({\rm{S}})}(k;z = 0),P_0^{({\rm{S}})}(k;z = {z_{\max}}),P_0^{({\rm{CRD}})}({k_{\rm{s}}};z = {z_{\max}})\) and \(P_0^{({\rm{LC,CRD}})}({k_{\rm{s}}})\) are computed using the nonlinear power spectrum [67].

Consider first the results for the galaxy sample (upper panels). On linear scales (*k* < 0.1 *h* Mpc^{−1}), \(P_0^{({\rm{S}})}(k;z = 0)\) plotted in dashed lines is enhanced relative to that in real space, mainly due to a linear redshift-space distortion (the Kaiser factor in Equation (131)). For nonlinear scales, the nonlinear gravitational evolution increases the power spectrum in real space, while the finger-of-God effect suppresses that in redshift space. Thus, the net result is sensitive to the shape and the amplitude of the fluctuation spectrum itself; in the LCDM model that we adopted, the nonlinear gravitational growth in real space is stronger than the suppression due to the finger-of-God effect. Thus, \(P_0^{({\rm{S}})}(k;z = 0)\) becomes larger than its real-space counterpart in linear theory. In the SCDM model, however, this is opposite and \(P_0^{({\rm{S}})}(k;z = 0)\) becomes smaller.
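The competition between the Kaiser enhancement and the finger-of-God suppression in the monopole can be sketched numerically. The example assumes a Lorentzian damping factor 1/(1 + *κ*²*μ*²) multiplying the Kaiser factor (1 + *βμ*²)², with *κ* a dimensionless wavenumber proportional to *kσ*_{P}; under that assumption the angle average has a closed form, which the code checks against direct integration. The function names and sample values are illustrative.

```python
import math


def monopole_ratio(beta, kappa):
    """Angle average of (1 + beta mu^2)^2 / (1 + kappa^2 mu^2) over 0 <= mu <= 1."""
    if kappa == 0.0:
        return 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0  # pure Kaiser enhancement
    atan_over_kappa = math.atan(kappa) / kappa
    a = atan_over_kappa
    b = 3.0 / kappa ** 2 * (1.0 - atan_over_kappa)
    c = 5.0 / (3.0 * kappa ** 2) * (
        1.0 - 3.0 / kappa ** 2 + 3.0 * math.atan(kappa) / kappa ** 3)
    return a + 2.0 * beta * b / 3.0 + beta ** 2 * c / 5.0


def monopole_ratio_numeric(beta, kappa, steps=2000):
    """Simpson's rule evaluation of the same angle average, used as a check."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        mu = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (1.0 + beta * mu * mu) ** 2 / (1.0 + (kappa * mu) ** 2)
    return total * h / 3.0


for kappa in (0.0, 0.5, 2.0, 10.0):
    print(kappa, monopole_ratio(0.5, kappa))
```

For small *κ* the ratio exceeds unity (enhancement on linear scales), and it drops below unity once *κ* is of order one or larger, which is the suppression on nonlinear scales described above.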

The power spectra at *z* = 0.2 (dash-dotted lines) are smaller than those at *z* = 0 by the corresponding growth factor of the fluctuations, and one might expect that the amplitude of the power spectra on the light-cone (solid lines) would be in-between the two. While this is correct if we use the comoving wavenumber, the actual observation on the light-cone in the cosmological redshift space should be expressed in terms of *k*_{s} (see Equation (158)). If we plot the power spectra at *z* = 0.2 taking into account the geometrical distortion, \(P_0^{({\rm{CRD}})}({k_{\rm{s}}};z = 0.2)\) in the dotted lines becomes significantly larger than \(P_0^{({\rm{S}})}(k;z = 0.2)\). Therefore, \(P_0^{({\rm{LC,CRD}})}({k_{\rm{s}}})\) should take a value between those of \(P_0^{({\rm{CRD}})}({k_{\rm{s}}};z = 0) = P_0^{({\rm{S}})}(k;z = 0)\) and \(P_0^{({\rm{CRD}})}({k_{\rm{s}}};z = 0.2)\). This explains the qualitative features shown in the upper panels of Figure 18. As a result, both the cosmological redshift-space distortion and the light-cone effect substantially change the predicted shape and amplitude of the power spectra, even for the galaxy sample [60]. The results for the QSO sample can be basically understood in a similar manner, except that the evolution of the bias makes a significant difference, since the sample extends to much higher redshifts.

The two-point correlation functions (Figure 19) behave in essentially the same way as the power spectra at the corresponding scales *k* ∼ 2*π*/*x*. Unlike the power spectra, however, two-point correlation functions are not positive definite. The irregular features in Figure 19 on scales larger than 30 *h*^{−1} Mpc (100 *h*^{−1} Mpc) in SCDM (LCDM) originate from the fact that *ξ*^{(R,lin)}(*x*, *z* = 0) becomes negative there.

In fact, since the resulting predictions are sensitive to the bias, which is unlikely to quantitatively be specified by theory, the present methodology will find two completely different applications. For relatively shallower catalogues, like galaxy samples, the evolution of bias is not supposed to be so strong. Thus, one may estimate the cosmological parameters from the observed degree of the redshift distortion, as has been conducted conventionally. Most importantly, we can correct for the systematics due to the light-cone and geometrical distortion effects, which affect the estimate of the parameters by ∼ 10%. Alternatively, for deeper catalogues like high-redshift quasar samples, one can extract information on the object-dependent bias only by correcting the observed data on the basis of our formulae.

In a sense, the former approach uses the light-cone and geometrical distortion effects as real cosmological signals, while the latter regards them as inevitable, but physically removable, noise. In both cases, the present methodology is important in properly interpreting the observations of the Universe at high redshifts.

## 6 Recent Results from 2dF and SDSS

### 6.1 The latest galaxy redshift surveys

Redshift surveys in the 1980s and the 1990s (e.g., the CfA, IRAS, and Las Campanas surveys) measured thousands to tens of thousands of galaxy redshifts. Multifibre technology now allows us to measure redshifts of millions of galaxies. Below we summarize briefly the properties of the main new surveys 2dFGRS, SDSS, 6dF, VIRMOS, and DEEP2, and we discuss key results from 2dFGRS and SDSS. Further analysis of these surveys is currently underway.

#### 6.1.1 The 2dF galaxy redshift survey

The 2dFGRS selects galaxies to a magnitude limit of *b*_{J} < 19.45. The main survey regions are two declination strips, in the northern and southern Galactic hemispheres, and also 100 random fields, covering in total about 1800 deg^{2} (see Figures 20 and 21). The median redshift of the 2dFGRS is \(\bar z \sim 0.1\) (see [11, 65] for reviews).

#### 6.1.2 The SDSS galaxy redshift survey

The SDSS (Sloan Digital Sky Survey) is a U.S.-Japan-Germany joint project to image a quarter of the Celestial Sphere at high Galactic latitude as well as to obtain spectra of galaxies and quasars from the imaging data [93]. The dedicated 2.5 meter telescope at Apache Point Observatory is equipped with a multi-CCD camera with five broad bands centered at 3561, 4676, 6176, 7494, and 8873 Å. For further details of SDSS, see [102, 80].

Figure: redshift slices of SDSS galaxies centered around the equatorial plane: *z* < 0.05 with a thickness of 10 *h*^{−1} Mpc (upper-left panel); *z* < 0.1 with a thickness of 15 *h*^{−1} Mpc (upper-right panel); *z* < 0.2 with a thickness of 20 *h*^{−1} Mpc (lower panel).

#### 6.1.3 The 6dF galaxy redshift survey

The 6dF (6-degree Field) [91] is a survey of redshifts and peculiar velocities of galaxies selected primarily in the Near Infrared from the new 2MASS (Two Micron All Sky Survey) catalogue [90]. One goal is to measure redshifts of more than 170,000 galaxies over nearly the entire Southern sky. Another exciting aim of the survey is to measure peculiar velocities (using 2MASS photometry and 6dF velocity dispersions) of about 15,000 galaxies out to 150 *h*^{−1} Mpc. The high quality data of this survey could revive peculiar velocities as a cosmological probe (which was very popular about 10–15 years ago). Observations have so far obtained nearly 40,000 redshifts and completion is expected in 2005.

#### 6.1.4 The DEEP galaxy redshift survey

The DEEP survey is a two-phased project using the Keck telescopes to study the properties and distribution of high redshift galaxies [92]. Phase 1 used the LRIS spectrograph to study a sample of ∼ 1000 galaxies to a limit of *I* = 24.5. Phase 2 of the DEEP project will use the new DEIMOS spectrograph to obtain spectra of ∼ 65,000 faint galaxies with redshifts *z* ∼ 1. The scientific goals are to study the evolution of properties of galaxies and the evolution of the clustering of galaxies compared to samples at low redshift. The survey is designed to have the fidelity of local redshift surveys such as the LCRS, and to be complementary to ongoing large redshift surveys such as the SDSS project and the 2dF survey. The DEIMOS/DEEP or DEEP2 survey will be executed with resolution *R* ∼ 4000, and we therefore expect to measure linewidths and rotation curves for a substantial fraction of the target galaxies. DEEP2 will thus also be complementary to the VLT/VIRMOS project, which will survey more galaxies in a larger region of the sky, but with much lower spectral resolution and with fewer objects at high redshift.

#### 6.1.5 The VIRMOS galaxy redshift survey

The on-going Franco-Italian VIRMOS project [94] has delivered the VIMOS spectrograph for the European Southern Observatory Very Large Telescope (ESO-VLT). VIMOS is a VIsible imaging Multi-Object Spectrograph with outstanding multiplex capabilities: with 10 arcsec slits, spectra can be taken of 600 objects simultaneously. In integral field mode, a 6400-fibre Integral Field Unit (IFU) provides spectroscopy for all objects covering a 54 × 54 arcsec^{2} area. VIMOS therefore provides unsurpassed efficiency for large surveys. The VIRMOS project consists of two parts: the construction of VIMOS and of a Mask Manufacturing Unit for the ESO-VLT, and the VIRMOS-VLT Deep Survey (VVDS), a comprehensive imaging and redshift survey of the deep Universe based on more than 150,000 redshifts in four 4 square-degree fields.

### 6.2 Cosmological parameters from 2dFGRS

#### 6.2.1 The power spectrum of 2dF Galaxies on large scales

An initial estimate of the convolved, redshift-space power spectrum of the 2dFGRS was determined by Percival et al. [72] for a sample of 160,000 redshifts. On scales 0.02 *h* Mpc^{−1} < *k* < 0.15 *h* Mpc^{−1}, the data are fairly robust and the shape of the power spectrum is not significantly affected by redshift-space distortion or non-linear effects, while its overall amplitude is increased due to the linear redshift-space distortion effect (see Section 5).

*k*, one can constrain the cosmological parameters. For instance, assuming a Gaussian prior on the Hubble constant *h* = 0.7 ± 0.07 (from [22]), Percival et al. [72] obtained the 68 percent confidence limits on the shape parameter Ω_{m}*h* = 0.20 ± 0.03, and a baryon fraction Ω_{b}/Ω_{m} = 0.15 ± 0.07. For a fixed set of cosmological parameters, i.e., *n* = 1, Ω_{m} = 1 − Ω_{Λ} = 0.3, Ω_{b}*h*^{2} = 0.02, and *h* = 0.70, the r.m.s. mass fluctuation amplitude of 2dFGRS galaxies smoothed over a top-hat radius of 8 *h*^{−1} Mpc in redshift space turned out to be \(\sigma _{8g}^S({L_{\rm{s}}},{z_{\rm{s}}}) \approx 0.94\).
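As a rough consistency check on the numbers above, the quoted central values can be combined directly. The short Python sketch below is illustrative arithmetic only: the function name `densities_from_shape` is hypothetical, and the calculation ignores the quoted uncertainties and their correlations.

```python
def densities_from_shape(shape, baryon_fraction, h):
    """Return (Omega_m, Omega_b, Omega_b h^2) from central values.

    shape           : Omega_m * h (the shape parameter)
    baryon_fraction : Omega_b / Omega_m
    h               : Hubble constant in units of 100 km/s/Mpc
    """
    omega_m = shape / h
    omega_b = baryon_fraction * omega_m
    return omega_m, omega_b, omega_b * h ** 2


# Central values quoted in the text (no error propagation).
omega_m, omega_b, omega_b_h2 = densities_from_shape(0.20, 0.15, 0.70)
print(f"Omega_m = {omega_m:.3f}, Omega_b = {omega_b:.3f}, "
      f"Omega_b h^2 = {omega_b_h2:.3f}")
```

With these inputs the sketch gives Ω_{m} ≈ 0.29 and Ω_{b}*h*^{2} ≈ 0.021, close to the fixed values Ω_{m} = 0.3 and Ω_{b}*h*^{2} = 0.02 adopted above.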

#### 6.2.2 An upper limit on neutrino masses

The recent results of atmospheric and solar neutrino oscillations [24, 1] imply non-zero mass-squared differences of the three neutrino flavours. While these oscillation experiments do not directly determine the absolute neutrino masses, a simple assumption of the neutrino mass hierarchy suggests a lower limit on the neutrino mass density parameter, Ω_{ν} = *m*_{ν,tot}*h*^{−2}/(94 eV) ≈ 0.001. Large scale structure data can put an upper limit on the ratio Ω_{ν}/Ω_{m} due to the neutrino ‘free streaming’ effect [33]. By comparing the 2dF galaxy power spectrum with a four-component model (baryons, cold dark matter, a cosmological constant, and massive neutrinos) it was estimated that Ω_{ν}/Ω_{m} < 0.13 (95% CL), or, with a concordance prior of Ω_{m} = 0.3, Ω_{ν} < 0.04, which corresponds to an upper limit of ∼ 2 eV on the total neutrino mass, assuming a prior of *h* ≈ 0.7 [20, 19] (see Figure 24). In order to minimize systematic effects due to biasing and non-linear growth, the analysis was restricted to the range 0.02 < *k* < 0.15 *h* Mpc^{−1}. Additional cosmological data sets bring down this upper limit by a factor of two [79].
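The conversion between the density limit and the mass limit follows directly from the relation Ω_{ν} = *m*_{ν,tot}*h*^{−2}/(94 eV) quoted above. The following Python sketch (function names are hypothetical, chosen for illustration) reproduces that arithmetic using the central prior values only.

```python
EV_PER_OMEGA_H2 = 94.0  # eV; from Omega_nu = m_tot / (94 eV * h^2)


def omega_nu_from_mass(m_tot_ev, h):
    """Neutrino density parameter for a total mass m_tot_ev (in eV)."""
    return m_tot_ev / (EV_PER_OMEGA_H2 * h ** 2)


def mass_from_omega_nu(omega_nu, h):
    """Total neutrino mass (in eV) for a given density parameter."""
    return omega_nu * EV_PER_OMEGA_H2 * h ** 2


# Upper limit: Omega_nu / Omega_m < 0.13 with priors Omega_m = 0.3, h = 0.7.
omega_nu_max = 0.13 * 0.3
m_tot_max = mass_from_omega_nu(omega_nu_max, 0.7)
print(f"Omega_nu < {omega_nu_max:.3f}  ->  m_tot < {m_tot_max:.2f} eV")
```

The result, about 1.8 eV, is the origin of the rounded ∼ 2 eV limit; the same relation maps the lower limit Ω_{ν} ≈ 0.001 to a total mass of roughly 0.05 eV for *h* = 0.7.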

#### 6.2.3 Combining 2dFGRS and CMB

While the CMB probes the fluctuations in matter, the galaxy redshift surveys measure the perturbations in the light distribution of a particular tracer (e.g., galaxies of a certain type). Therefore, for a fixed set of cosmological parameters, a combination of the two can better constrain cosmological parameters, and it can also provide important information on the way galaxies are ‘biased’ relative to the mass fluctuations.

The CMB fluctuations are commonly represented by the spherical harmonics *C*_{ℓ}. The connection between the harmonic *ℓ* and *k* is roughly

\[\ell \simeq k\,\frac{2c}{H_0\,\Omega_{\rm m}^{0.4}}\]

for a spatially flat Universe. For Ω_{m} = 0.3, the 2dFGRS range 0.02 < *k* < 0.15 *h* Mpc^{−1} corresponds approximately to 200 < *ℓ* < 1500, which is well covered by the recent CMB experiments.
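The quoted correspondence between *k* and *ℓ* can be checked numerically. The sketch below assumes the rough flat-universe scaling ℓ ≃ 2*ck*/(*H*_{0} Ω_{m}^{0.4}); this scaling, and the helper name `ell_from_k`, are assumptions of the example rather than a precise projection calculation.

```python
C_KM_S = 299792.458  # speed of light in km/s


def ell_from_k(k_h_mpc, omega_m):
    """Approximate multipole for a wavenumber k given in h/Mpc.

    Uses ell ~ k * 2c / (H0 * Omega_m**0.4) with H0 = 100 h km/s/Mpc,
    so the factors of h cancel and 2c/H0 is in h^-1 Mpc.
    """
    two_c_over_h0 = 2.0 * C_KM_S / 100.0  # about 6000 h^-1 Mpc
    return k_h_mpc * two_c_over_h0 / omega_m ** 0.4


ell_low = ell_from_k(0.02, 0.3)
ell_high = ell_from_k(0.15, 0.3)
print(f"k = 0.02 h/Mpc -> ell ~ {ell_low:.0f}")
print(f"k = 0.15 h/Mpc -> ell ~ {ell_high:.0f}")
```

Under this assumption the two ends of the 2dFGRS range map to ℓ of roughly 190 and 1460, consistent with the rounded interval 200 < *ℓ* < 1500.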

Recent CMB measurements have been used in combination with the 2dF power spectrum. Efstathiou et al. [17] showed that 2dFGRS+CMB provide evidence for a positive cosmological constant Ω_{Λ} ∼ 0.7 (assuming *w* = −1), independently of the studies of supernovae Ia. As explained in [72], the shapes of the CMB and the 2dFGRS power spectra are insensitive to Dark Energy. The main effect of the dark energy is to alter the angular diameter distance to the last scattering surface, and thus the position of the first acoustic peak. Indeed, the latest result from a combination of WMAP with 2dFGRS and other probes gives \(h = 0.71_{- 0.03}^{+ 0.04},\,\,{\Omega _{\rm{b}}}{h^2} = 0.0224 \pm 0.0009,\,\>{\Omega _{\rm{m}}}{h^2} = 0.135_{- 0.009}^{+ 0.008}\), *σ*_{8} = 0.84 ± 0.04, Ω_{tot} = 1.02 ± 0.02, and *w* < −0.78 (95% CL, assuming *w* ≥ −1) [79].

#### 6.2.4 Redshift-space distortion

An independent measurement of cosmological parameters on the basis of 2dFGRS comes from redshift-space distortions on scales ≲ 10 *h*^{−1} Mpc, measured through the correlation function *ξ*(*π*, *σ*) in parallel and transverse pair separations *π* and *σ*. As described in Section 5, the distortion pattern is a combination of the coherent infall, parameterized by \(\beta = \Omega _{\rm{m}}^{0.6}/b\), and random motions modelled by an exponential velocity distribution function (see Equation (133)). This methodology has been applied by many authors. For instance, Peacock et al. [66] derived *β*(*L*_{s} = 1.9*L*_{*}, *z*_{s} = 0.17) = 0.43 ± 0.07, and Hawkins et al. [30] obtained *β*(*L*_{s} = 1.4*L*_{*}, *z*_{s} = 0.15) = 0.49 ± 0.09 and a velocity dispersion *σ*_{P} = 506 ± 52 km s^{−1}. Using the full 2dF+CMB likelihood function on the (*b*, Ω_{m}) plane, Lahav et al. [42] derived a slightly larger (but consistent within the quoted error-bars) value, *β*(*L*_{s} = 1.9*L*_{*}, *z*_{s} = 0.17) ≃ 0.48 ± 0.06.
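To give a feeling for what these values of *β* mean, the relation β = Ω_{m}^{0.6}/*b* can be inverted for the bias once a matter density is assumed. The Python sketch below does this for the central values quoted above; it assumes Ω_{m} = 0.3 and ignores the evolution of Ω_{m} and *b* between the survey redshift and the present, so the numbers are indicative only. The function names are hypothetical.

```python
def beta_from_bias(omega_m, bias):
    """Redshift-distortion parameter beta = Omega_m**0.6 / b."""
    return omega_m ** 0.6 / bias


def bias_from_beta(beta, omega_m):
    """Linear bias implied by a measured beta and an assumed Omega_m."""
    return omega_m ** 0.6 / beta


OMEGA_M = 0.3  # assumed concordance value, not a fit result

for label, beta in [("Peacock et al.", 0.43), ("Hawkins et al.", 0.49)]:
    print(f"{label}: beta = {beta:.2f} -> b ~ {bias_from_beta(beta, OMEGA_M):.2f}")
```

Under these assumptions both measurements correspond to a bias parameter of order unity.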

#### 6.2.5 The bi-spectrum and higher moments

*k* < 0.5 *h* Mpc^{−1} they found *b*_{1} = 1.04 ± 0.11 and *b*_{2} = −0.054 ± 0.08, in support of no biasing on large scales. This is a non-trivial result, as the analysis covers non-linear scales. Baugh et al. [5] and Croton et al. [14] measured the moments of the galaxy count probability distribution function in 2dFGRS up to order *p* = 6 (order *p* = 2 is the variance, *p* = 3 is the skewness, etc.). They demonstrated the hierarchical scaling of the averaged *p*-point galaxy correlation functions. However, they found that the higher moments are strongly affected by the presence of two massive superclusters in the 2dFGRS volume. This poses the question of whether 2dFGRS is a ‘fair sample’ for high order moments.

### 6.3 Luminosity and spectral-type dependence of galaxy clustering

Although biasing was commonly neglected until the early 1980s, it has become evident observationally that on scales ≲ 10 *h*^{−1} Mpc different galaxy populations exhibit different clustering amplitudes, the so-called morphology-density relation [16]. As discussed in Section 4, galaxy biasing is naturally predicted from a variety of theoretical considerations as well as direct numerical simulations [37, 59, 15, 87, 86, 103]. Thus, in this section we summarize the extent to which galaxy clustering depends on the luminosity, spectral type, and color of the galaxy sample from the 2dFGRS and SDSS.

#### 6.3.1 2dFGRS: Clustering per luminosity and spectral type

*η* ≈ 0.5 *pc*_{1} + *pc*_{2}. Qualitatively, *η* is an indicator of the ratio of the present to the past star formation activity of each galaxy. This allows one to divide the 2dFGRS into *η*-types, and to study, e.g., luminosity functions and clustering per type. Norberg et al. [61] showed that, at all luminosities, early-type galaxies have a higher bias than late-type galaxies, and that the biasing parameter, defined here as the square root of the ratio of the galaxy to matter correlation functions, \(b \equiv \sqrt {{\xi _{\rm{g}}}/{\xi _{\rm{m}}}}\), varies as *b*/*b*_{*} = 0.85 + 0.15 *L*/*L*_{*}. Figure 25 indicates that for *L*_{*} galaxies, the real space correlation function amplitude of *η* early-type galaxies is ∼ 50% higher than that of late-type galaxies.
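The luminosity dependence *b*/*b*_{*} = 0.85 + 0.15 *L*/*L*_{*} is simple to evaluate. The sketch below (with hypothetical function names) computes the relative bias and, since the correlation function scales as *b*^{2} under the definition above, the corresponding relative clustering amplitude. The luminosities in the loop are arbitrary illustrative inputs, not survey bins.

```python
def relative_bias(l_over_lstar):
    """b / b_* as a function of L / L_* (linear fit quoted in the text)."""
    return 0.85 + 0.15 * l_over_lstar


def relative_amplitude(l_over_lstar):
    """xi / xi_* at fixed separation, assuming xi scales as b**2."""
    return relative_bias(l_over_lstar) ** 2


for lum in (0.25, 1.0, 4.0):  # illustrative values of L / L_*
    print(f"L/L* = {lum:4.2f}: b/b* = {relative_bias(lum):.3f}, "
          f"xi/xi* = {relative_amplitude(lum):.3f}")
```

The fit is normalized so that *b*/*b*_{*} = 1 at *L* = *L*_{*}, and the dependence is weak for faint galaxies but grows steadily for galaxies brighter than *L*_{*}.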

*ξ*(*σ*, *π*). The correlation function is calculated from the most passively (‘red’, for which the present rate of star formation is less than 10% of its past averaged value) and actively (‘blue’) star-forming galaxies. The clustering properties of the two samples are clearly distinct on scales ≲ 10 *h*^{−1} Mpc. The ‘red’ galaxies display a prominent finger-of-God effect and also have a higher overall normalization than the ‘blue’ galaxies. This is a manifestation of the well-known morphology-density relation. By fitting *ξ*(*π*, *σ*) over the separation range 8–20 *h*^{−1} Mpc for each class, it was found that *β*_{active} = 0.49 ± 0.13, *β*_{passive} = 0.48 ± 0.14 and corresponding pairwise velocity dispersions *σ*_{P} of 416 ± 76 km s^{−1} and 612 ± 92 km s^{−1} [45]. At small separations, the real space clustering of passive galaxies is stronger than that of active galaxies: The slopes *γ* are respectively 1.93 and 1.50 (see Figure 27) and the relative bias between the two classes is a declining function of separation. On scales larger than 10 *h*^{−1} Mpc the biasing ratio is approaching unity.

Another statistic, the joint counts-in-cells of 2dFGRS galaxies classified by both color and spectral type, was applied recently by Wild et al. [98] and Conway et al. [12]. Exact linear bias is ruled out on all scales. The counts are better fitted by a bivariate log-normal distribution. On small scales there is evidence for stochasticity. Further investigation of galaxy formation models is required to understand the origin of the stochasticity.

#### 6.3.2 SDSS: Two-point correlation functions per luminosity and color

Zehavi et al. [104] analyzed the Early Data Release (EDR) sample of 30,000 SDSS galaxies to explore the dependence of clustering on luminosity and color. The inferred real-space correlation function is well described by a single power-law: *ξ*(*r*) = (*r*/6.1 ± 0.2 *h*^{−1} Mpc)^{−1.75±0.03} for 0.1 *h*^{−1} Mpc ≤ *r* ≤ 16 *h*^{−1} Mpc. The galaxy pairwise velocity dispersion is *σ*_{12} ≈ 600 ± 100 km s^{−1} for projected separations 0.15 *h*^{−1} Mpc ≤ *r*_{p} ≤ 5 *h*^{−1} Mpc. When divided by color, the red galaxies exhibit a stronger and steeper real-space correlation function and a higher pairwise velocity dispersion than do the blue galaxies. In agreement with 2dFGRS there is clear evidence for a scale-independent luminosity bias at *r* ∼ 10 *h*^{−1} Mpc. Subsamples with absolute magnitude ranges centered on *M*_{*} − 1.5, *M*_{*}, and *M*_{*} + 1.5 have real-space correlation functions that are parallel power laws of slope ≈ −1.8 with correlation lengths of approximately 7.4 *h*^{−1} Mpc, 6.3 *h*^{−1} Mpc, and 4.7 *h*^{−1} Mpc, respectively.
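For parallel power laws, the ratio of two correlation functions is independent of separation, so the correlation lengths translate directly into a relative bias. The Python sketch below illustrates this with the central values quoted above; the function names are hypothetical, and the relative bias is taken as the square root of the ratio of correlation functions.

```python
def xi_power_law(r, r0, gamma):
    """Power-law correlation function xi(r) = (r / r0)**(-gamma)."""
    return (r / r0) ** (-gamma)


def relative_bias(r0_a, r0_b, gamma):
    """sqrt(xi_a / xi_b) for two power laws sharing the same slope.

    The separation r cancels, leaving (r0_a / r0_b)**(gamma / 2).
    """
    return (r0_a / r0_b) ** (gamma / 2.0)


GAMMA = 1.8            # common slope of the luminosity subsamples
R0_MSTAR = 6.3         # correlation length of the M_* sample, in h^-1 Mpc

bright = relative_bias(7.4, R0_MSTAR, GAMMA)  # M_* - 1.5 sample
faint = relative_bias(4.7, R0_MSTAR, GAMMA)   # M_* + 1.5 sample
print(f"b/b* (bright) ~ {bright:.2f}, b/b* (faint) ~ {faint:.2f}")
```

With these inputs the brighter sample is more strongly clustered than the *M*_{*} sample by a factor of about 1.16 in bias, and the fainter sample less strongly by a factor of about 0.77.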

#### 6.3.3 SDSS: Three-point correlation functions and the nonlinear biasing of galaxies per luminosity and color

*ζ*(*r*_{12}, *r*_{23}, *r*_{31}) obeys the *hierarchical relation*:

\[\zeta(r_{12}, r_{23}, r_{31}) = Q_r \left[ \xi(r_{12})\xi(r_{23}) + \xi(r_{23})\xi(r_{31}) + \xi(r_{31})\xi(r_{12}) \right]\]

with *Q*_{r} being a constant. The value of *Q*_{r} in real space deprojected from these angular catalogues is 1.29 ± 0.21 for *r* < 3 *h*^{−1} Mpc. Subsequent analyses of redshift catalogs confirmed the hierarchical relation, at least approximately, but the value of *Q*_{z} (in redshift space) appears to be smaller, *Q*_{z} ∼ 0.5−1.
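The hierarchical relation can be made concrete with a small numerical sketch. The code below assumes the standard hierarchical form, ζ = *Q* [ξ(*r*_{12})ξ(*r*_{23}) + ξ(*r*_{23})ξ(*r*_{31}) + ξ(*r*_{31})ξ(*r*_{12})], together with a power-law 2PCF whose parameters (*r*_{0} = 6.1 *h*^{−1} Mpc, γ = 1.75) are borrowed from the SDSS fit purely for illustration. All function names are hypothetical.

```python
def xi_power_law(r, r0=6.1, gamma=1.75):
    """Illustrative power-law two-point correlation function."""
    return (r / r0) ** (-gamma)


def pair_products(r12, r23, r31, xi=xi_power_law):
    """Sum of the three cyclic products of two-point functions."""
    a, b, c = xi(r12), xi(r23), xi(r31)
    return a * b + b * c + c * a


def zeta_hierarchical(r12, r23, r31, q, xi=xi_power_law):
    """Three-point function predicted by the hierarchical relation."""
    return q * pair_products(r12, r23, r31, xi)


def reduced_amplitude(zeta, r12, r23, r31, xi=xi_power_law):
    """Recover Q from a three-point function and the two-point functions."""
    return zeta / pair_products(r12, r23, r31, xi)


# Triangle with sides in h^-1 Mpc and the real-space amplitude Q_r = 1.29.
zeta = zeta_hierarchical(1.0, 2.0, 2.5, q=1.29)
print(f"zeta = {zeta:.1f}, recovered Q = "
      f"{reduced_amplitude(zeta, 1.0, 2.0, 2.5):.2f}")
```

In practice the measured quantity is the ratio computed by `reduced_amplitude`; the hierarchical relation is the statement that this ratio is approximately independent of the triangle's size and shape.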

As we have seen in Section 6.3.2, galaxy clustering is sensitive to the intrinsic properties of the galaxy samples under consideration, including their morphological types, colors, and luminosities. Nevertheless the previous analyses were not able to examine those dependences of 3PCFs because of the limited number of galaxies. Indeed Kayo et al. [39] were the first to perform the detailed analysis of 3PCFs explicitly taking account of the morphology, color, and luminosity dependence. They constructed volume-limited samples from a subset of the SDSS galaxy redshift data, ‘Large-scale Structure Sample 12’. Specifically they divided each volume limited sample into color subsamples of red (blue) galaxies, which consist of 7949 (8329), 8930 (8155), and 3706 (3829) galaxies for −22 < *M*_{r} − 5 log *h* < −21, −21 < *M*_{r} − 5 log *h* < −20, and −20 < *M*_{r} − 5 log *h* < −19, respectively.

*Q*_{z} is almost scale-independent and ranges between 0.5 and 1.0, and that no systematic dependence is noticeable on luminosity and color. This implies that the 3PCF itself does depend on the galaxy properties since two-point correlation functions (2PCFs) exhibit clear dependence on luminosity and color. Previous simulations and theoretical models [82, 53, 50, 85] indicate that *Q* decreases with scale in both real and redshift spaces. This trend is not seen in the observational results.

*i* runs over each sample of galaxies with different colors and luminosities. The predictions of the mass 2PCFs in redshift space, *ξ*_{z,ΛCDM}(*s*), in the Λ cold dark matter model are computed following [28].

*δ*_{g,i} for the *i*-th population of galaxies is given by

*b*_{g,i(1)} and *b*_{g,i(2)} are constant and the mass density field *δ*_{mass} ≪ 1, Equation (174) implies that

(*b*_{g,i(2)} = 0) simply implies that *Q*_{g,i} is inversely proportional to *b*_{g,i(1)}, which is plotted in Figure 30. A comparison of Figures 29 and 30 indicates that the biasing in the 3PCFs seems to compensate the difference of *Q*_{g} purely due to that in the 2PCFs.
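The inverse proportionality under linear bias can be seen in a few lines. The following is a sketch for the simplest case of a deterministic, scale-independent linear bias; the symbols *ξ*_{mass}, *ζ*_{mass}, and *Q*_{mass} for the mass 2PCF, 3PCF, and reduced amplitude are notation introduced here for illustration.

\[\begin{aligned}
\delta_{{\rm g},i} &= b_{{\rm g},i(1)}\,\delta_{\rm mass}
\;\;\Longrightarrow\;\;
\xi_{{\rm g},i} = b_{{\rm g},i(1)}^{2}\,\xi_{\rm mass},
\qquad
\zeta_{{\rm g},i} = b_{{\rm g},i(1)}^{3}\,\zeta_{\rm mass}, \\
Q_{{\rm g},i} &\equiv
\frac{\zeta_{{\rm g},i}}
{\xi_{{\rm g},i}(r_{12})\xi_{{\rm g},i}(r_{23}) + \xi_{{\rm g},i}(r_{23})\xi_{{\rm g},i}(r_{31}) + \xi_{{\rm g},i}(r_{31})\xi_{{\rm g},i}(r_{12})}
= \frac{b_{{\rm g},i(1)}^{3}}{b_{{\rm g},i(1)}^{4}}\,Q_{\rm mass}
= \frac{Q_{\rm mass}}{b_{{\rm g},i(1)}} .
\end{aligned}\]

The 3PCF carries three powers of the bias and the product of two 2PCFs carries four, so a more strongly biased sample would be expected to show a proportionally smaller reduced amplitude if this simple model held.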

Such behavior is unlikely to be explained by any simple model inspired by a perturbative expansion like Equation (176). Rather, it points to a kind of regularity or universality of the clustering hierarchy behind galaxy formation and evolution processes. Thus galaxy biasing seems much more complex than the simple deterministic and linear model. More precise measurements of 3PCFs and even higher-order statistics with future SDSS datasets would indeed be valuable to gain more specific insights into the empirical biasing model.

### 6.4 Topology of the Universe: Analysis of SDSS galaxies in terms of Minkowski functionals

Most of the observational results presented in the preceding Sections 6.1, 6.2, and 6.3 were restricted to two-point statistics. As emphasized in Section 3, the clustering pattern of galaxies has much richer content than two-point statistics can probe. Historically the primary goal of the topological analysis of galaxy catalogues was to test Gaussianity of the primordial density fluctuations. Although the major role for that goal has been superseded by the CMB map analysis [41], the proper characterization of the morphology of large-scale structure beyond two-point statistics is of fundamental importance in cosmology. In order to illustrate the possibility of exploring the topology of the Universe with the new large surveys, we summarize the results of the Minkowski Functionals (MF) analysis of SDSS galaxy data [32].

In an apparent-magnitude limited catalogue of galaxies, the average number density of galaxies decreases with distance because only increasingly bright galaxies are included in the sample at larger distance. With the large redshift surveys it is possible to avoid this systematic change in both density and galaxy luminosity by constructing volume-limited samples of galaxies, with cuts on both absolute-magnitude and redshift. This is in particular useful for analyses such as MF and was carried out in the analysis shown here.
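The construction of a volume-limited sample can be sketched as follows. This is a minimal illustration under strong simplifying assumptions: distances follow the low-redshift Hubble law *d* = *cz*/*H*_{0}, and K-corrections, evolution, and extinction are ignored. The function names, the dictionary keys, and the example magnitude limits are hypothetical and are not the cuts used in the SDSS analysis.

```python
C_KM_S = 299792.458  # speed of light in km/s


def max_redshift(m_lim, abs_mag_faint, h0=100.0):
    """Redshift at which a galaxy of absolute magnitude abs_mag_faint
    reaches the apparent-magnitude limit m_lim.

    Uses the distance modulus m - M = 5 log10(d / 10 pc) and d = c z / H0.
    With h0 = 100, absolute magnitudes are effectively M - 5 log h.
    """
    d_pc = 10.0 ** ((m_lim - abs_mag_faint + 5.0) / 5.0)
    d_mpc = d_pc / 1.0e6
    return d_mpc * h0 / C_KM_S


def volume_limited(galaxies, m_lim, abs_mag_faint, h0=100.0):
    """Keep galaxies brighter than abs_mag_faint and closer than the
    redshift out to which such galaxies remain above the survey limit."""
    z_max = max_redshift(m_lim, abs_mag_faint, h0)
    return [g for g in galaxies
            if g["z"] <= z_max and g["abs_mag"] <= abs_mag_faint]


catalogue = [  # toy entries, for illustration only
    {"z": 0.05, "abs_mag": -20.5},
    {"z": 0.05, "abs_mag": -19.0},  # too faint for the luminosity cut
    {"z": 0.12, "abs_mag": -21.0},  # beyond the redshift cut
]
sample = volume_limited(catalogue, m_lim=17.5, abs_mag_faint=-20.0)
print(max_redshift(17.5, -20.0), len(sample))
```

Within the resulting redshift range every galaxy brighter than the luminosity cut would have been observed, so the mean density of the sample no longer declines with distance.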

*ν*_{f} defined from the volume fraction [26]:

\[f = \frac{1}{\sqrt{2\pi}} \int_{\nu_{\rm f}}^{\infty} e^{-x^2/2}\, dx,\]

where *f* is the fraction of volume on the high-density side of the isodensity surface; this threshold corresponds to *δ* = *ν*_{f}*σ* for a Gaussian random field with r.m.s. density fluctuations *σ*. If the evolved density field has an approximately one-to-one correspondence with the initial random-Gaussian field, then this transformation removes the effect of evolution of the PDF of the density field. Under this assumption, the MFs as a function of volume fraction would be sensitive only to the topology of the isodensity contours rather than evolution with time of the density threshold assigned to a contour. While the limitations of the approximation of monotonicity in the relation between initial and evolved density fields are well recognized [40], we plot the result in this way for simplicity.
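The mapping between volume fraction and threshold label is just the inverse of the Gaussian tail probability. The sketch below assumes the usual definition, in which *f* is the Gaussian probability of exceeding *ν*_{f} (so that *ν*_{f} = Φ^{−1}(1 − *f*), with Φ the standard normal cumulative distribution), and uses only the Python standard library. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def nu_f_from_fraction(f):
    """Threshold nu_f such that a Gaussian field exceeds nu_f * sigma
    in a fraction f of the volume (0 < f < 1)."""
    return _STD_NORMAL.inv_cdf(1.0 - f)


def fraction_from_nu_f(nu_f):
    """Volume fraction above the threshold nu_f for a Gaussian field."""
    return 1.0 - _STD_NORMAL.cdf(nu_f)


for f in (0.84, 0.5, 0.16):
    print(f"f = {f:.2f} -> nu_f = {nu_f_from_fraction(f):+.2f}")
```

For an observed density field one would first find the density threshold that encloses a fraction *f* of the volume and then label that contour with the value returned by `nu_f_from_fraction`, which is what makes the labelling insensitive to the shape of the evolved one-point PDF.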

The good match between the observed MFs and the mock predictions based on the LCDM model with the initial random-Gaussianity, as illustrated in Figure 31, might be interpreted to imply that the primordial Gaussianity is confirmed. A more conservative interpretation is that, given the size of the estimated uncertainties, these data do not provide evidence for initial non-Gaussianity, i.e., the data are *consistent* with primordial Gaussianity. Unfortunately, due to the statistical limitation of the current SDSS data, it is not easy to make a more quantitative statement concerning the initial Gaussianity. Moreover, in order to go further and place more quantitative constraints on primordial Gaussianity with upcoming data, one needs a more precise and reliable theoretical model for the MFs, which properly describes the nonlinear gravitational effect and possibly galaxy biasing as well, beyond the simple mapping on the basis of the volume fraction. In fact, galaxy biasing is a major source of uncertainty for relating the observed MFs to those obtained from the mock samples for dark matter distributions. If LCDM is the correct cosmological model, the good match of the MFs for mock samples from the LCDM simulations to the observed SDSS MFs may indicate that nonlinearity in the galaxy biasing is relatively small, at least small enough that it does not significantly affect the MFs (the MFs as a function of *ν*_{σ} remain unchanged for the linear biasing).

### 6.5 Other statistical measures

In this section, we have presented the results on the basis of particular statistical measures including the two-point correlation functions, power spectrum, redshift distortion, and Minkowski functionals. Of course there are other useful approaches in analysing redshift surveys: the void probability function, counts-in-cells, Voronoi cells, percolation, and minimal spanning trees. Another area not covered here is optimal reconstruction of the density field (e.g., using Wiener filtering). The reader is referred to a good summary of those and other methodologies in the book by Martinez and Saar [48].

Admittedly the results that we presented here are rather observational and phenomenological, and far from being well-understood theoretically. It is quite likely that when other on-going and future surveys are analysed in great detail, the nature of galaxy clustering will be revealed in a much more quantitative manner. They are supposed to act as a bridge between the cosmological framework and galaxy formation operating in the Universe. While a proper understanding of the physics of galaxy formation is still far away, the future redshift survey data will present interesting challenges for constructing models of galaxy formation.

## 7 Discussion

As a classical probe, galaxy redshift surveys still remain an important tool for studying cosmology and galaxy formation. On large scales (> 10 *h*^{−1} Mpc or so) they nicely complement the cosmic microwave background, supernovae Ia, and gravitational lensing in quantifying in detail the cosmological model. On small scales (< 10 *h*^{−1} Mpc) the clustering patterns of different galaxy types (defined by structural or spectral properties) provide important constraints on models of biased galaxy formation.

The redshift surveys mainly constrain Ω_{m} via both redshift distortion (which also depends on biasing) and the shape of the Λ-CDM power spectrum, which depends on the primordial spectrum, the product Ω_{m}*h*, and also the baryon density Ω_{b}. Redshift surveys at a given epoch are not sensitive to the Dark Energy (or the cosmological constant, in a specific case), but combined with the CMB they can constrain the cosmic equation of state.

*w* = −1). Within the Λ-CDM model this scenario can be characterised by the six parameters given in Table 2 based on [88]. The WMAP data used are both the temperature and polarization fluctuations. Adding the SDSS information more than halves the WMAP-only error bars on some of the parameters. These results are in good agreement with the joint analysis WMAP+2dF [79].

Symbol | WMAP alone | WMAP+SDSS | Description |
---|---|---|---|
Ω_{Λ} | \(0.75_{- 0.10}^{+ 0.10}\) | \(0.699_{- 0.045}^{+ 0.042}\) | Dimensionless cosmological constant |
Ω_{b}*h*^{2} | \(0.0245_{- 0.0019}^{+ 0.0050}\) | \(0.0232_{- 0.0010}^{+ 0.0013}\) | Baryon density parameter |
Ω_{m}*h*^{2} | \(0.140_{- 0.018}^{+ 0.020}\) | \(0.1454_{- 0.0082}^{+ 0.0091}\) | Total matter density parameter |
*σ*_{8} | \(0.99_{- 0.14}^{+ 0.19}\) | \(0.917_{- 0.072}^{+ 0.090}\) | Mass fluctuation amplitude at 8 *h*^{−1} Mpc |
*n*_{s} | \(1.02_{- 0.06}^{+ 0.16}\) | \(0.977_{- 0.025}^{+ 0.039}\) | Primordial scalar spectral index at *k* = 0.05 Mpc^{−1} |
*τ* | \(0.21_{- 0.11}^{+ 0.24}\) | \(0.124_{- 0.057}^{+ 0.083}\) | Re-ionization optical depth |
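The gain from adding the SDSS data can be quantified by comparing the total widths of the quoted error intervals. The Python sketch below does this for the values transcribed from the table; the dictionary keys are descriptive labels chosen for this example, and treating the sum of the upper and lower errors as a single "width" is a simplification that ignores the asymmetry and non-Gaussianity of the posteriors.

```python
# (minus, plus) errors transcribed from the table above.
errors = {
    "cosmological constant":    {"wmap": (0.10, 0.10),     "wmap_sdss": (0.045, 0.042)},
    "baryon density":           {"wmap": (0.0019, 0.0050), "wmap_sdss": (0.0010, 0.0013)},
    "matter density":           {"wmap": (0.018, 0.020),   "wmap_sdss": (0.0082, 0.0091)},
    "fluctuation amplitude":    {"wmap": (0.14, 0.19),     "wmap_sdss": (0.072, 0.090)},
    "scalar spectral index":    {"wmap": (0.06, 0.16),     "wmap_sdss": (0.025, 0.039)},
    "reionization depth":       {"wmap": (0.11, 0.24),     "wmap_sdss": (0.057, 0.083)},
}


def width(err):
    """Total width of an asymmetric error interval."""
    minus, plus = err
    return minus + plus


# Ratio of the WMAP+SDSS interval width to the WMAP-only width.
ratios = {name: width(e["wmap_sdss"]) / width(e["wmap"])
          for name, e in errors.items()}

for name, ratio in ratios.items():
    print(f"{name:24s} {ratio:.2f}")
```

A ratio below 0.5 means the interval has more than halved; with this crude measure the ratios for the tabulated values range from about 0.3 to just under 0.5.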

Both components of the model, Λ and CDM, have not been directly measured. Are they ‘real’ entities or just ‘epicycles’? Historically epicycles were actually quite useful in forcing observers to improve their measurements and theoreticians to think about better models!

- ‘The Old Cosmological Constant problem’: Why is Ω_{Λ} at present so small relative to what is expected from Early Universe physics?
- ‘The New Cosmological Constant problem’: Why is Ω_{m} ∼ Ω_{Λ} at the present-epoch? Why is *w* ∼ −1? Do we need to introduce new physics or to invoke the anthropic principle to explain it?
- There are still open problems in Λ-CDM on the small scales, e.g., galaxy profiles and satellites.
- Could other (yet unknown) models fit the data equally well?
- Where does the field go from here? Should the activity focus on refinement of the cosmological parameters within Λ-CDM, or on introducing entirely new paradigms?

## Acknowledgements

We thank Joachim Wambsganss and Bernard Schutz for inviting us to write the present review article. O.L. thanks members of the 2dFGRS team and the Leverhulme Quantitative Cosmology group for helpful discussions. Y.S. thanks all his students and collaborators over many years, in particular Thomas Buchert, Takashi Hamana, Chiaki Hikage, Yipeng Jing, Issha Kayo, Hiromitsu Magira, Takahiko Matsubara, Hiroaki Nishioka, Jens Schmalzing, Atsushi Taruya, Kazuhiro Yamamoto, and Kohji Yoshikawa among others, for enjoyable and fruitful collaborations whose results form important elements of this review. Y.S. is also grateful for the hospitality at the Institute of Astronomy, University of Cambridge, where most of the present review was put together and written up for completion. O.L. acknowledges a PPARC Senior Research Fellowship. We also thank Idit Zehavi for permitting us to use Figure 28.

Numerical simulations were carried out at the ADAC (Astronomical Data Analysis Center) of the National Astronomical Observatory, Japan (project ID: mys02, yys08a). This research was also supported in part by the Grants-in-Aid from Monbu-Kagakusho and the Japan Society for the Promotion of Science.