[
  {
    "path": "README.md",
    "content": "# 🟢 Gaussian Splatting Notes (WIP)\nThe text version of my explanatory stream (Chinese with English CC) on gaussian splatting https://youtube.com/live/1buFrKUaqwM\n\n# 📖 Table of contents\n\n- [Introduction](#-introduction)\n- [Forward pass](#%EF%B8%8F-forward-pass)\n  - placeholder\n- Backward pass\n  - placeholder\n\n# 📑 Introduction\nThis guide aims to decipher the formulae in the rasterization process (*forward* and *backward*). **It is only focused on these two parts**, and I want to provide as many details as possible since here lies the core of the algorithm. I will paste related code from the [original repo](https://github.com/graphdeco-inria/gaussian-splatting) to help you identify where to look.\n\nIf you see sections starting with 💡, it's something I think is important to understand.\n\nBefore continuing, please read the [original paper](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_high.pdf) to see how the gaussian splatting algorithm works in the big picture. Also note that the full algorithm has other important parts such as point densification and pruning which *won't* be covered in this article since I think those parts are relatively easier to understand.\n\n# ➡️ Forward pass\nThe forward pass consists of two parts:\n1.  Compute the attributes of each gaussian\n2.  Compute the color of each pixel\n\n## 1. 
Compute the attributes of each gaussian\n\nEach gaussian holds the following *raw* attributes:\n\n```python3\n# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52\nself._xyz = torch.empty(0)            # world coordinate\nself._features_dc = torch.empty(0)    # diffuse color\nself._features_rest = torch.empty(0)  # spherical harmonic coefficients\nself._scaling = torch.empty(0)        # 3d scale\nself._rotation = torch.empty(0)       # rotation expressed in quaternions\nself._opacity = torch.empty(0)        # opacity\n\n# they are initialized as empty tensors then assigned with values on\n# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215\n```\n\nTo project the gaussian onto a 2D image, we must go through some more computations to transform the attributes to 2D:\n\n### 1-1. Compute derived attributes (radius, uv, cov2D)\n\nFirst, from `scaling` and `rotation`, we can compute *3D covariance* from the formula\n\n$\\Sigma = RSS^TR^T \\quad \\text{Eq. 
6}$ where\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138\nglm::mat3 R = glm::mat3(\n  1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),\n  2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),\n  2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)\n);\n```\nand\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124\nglm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix\nS[0][0] = mod * scale.x;\nS[1][1] = mod * scale.y;\nS[2][2] = mod * scale.z;\n```\nNote that `S` is multiplied by a scale factor `mod` that is kept at `1.0` during training.\n\nAt inference, this value (`scaling_modifier`) can be modified in\n```python3\n# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18\ndef render(..., scaling_modifier = 1.0, ...):\n```\nto control the scale of the gaussians. In their demo they showed how it looks when this number is set to something <1 (shrinking the size). Theoretically this value can also be set >1 to increase the size.\n\n------------------------\n💡 quote from the paper 💡\n> An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.\n\nThe design of optimizing the 3D covariance by decomposing it into `R` and `S` separately is not a random choice. It is a trick we call \"reparametrization\". 
Expressed as $RSS^TR^T$, it is guaranteed to be **always** positive semi-definite (with $A = (RS)^T$ it has the form $A^TA$, and a matrix of that form is always positive semi-definite).\n\n------------------------\n\nNext, we need to get 3 things: `radius`, `uv` and `cov` (2D covariance, or equivalently its inverse `conic`) which are the 2D attributes of a gaussian projected onto an image.\n\nWe can get `cov` by $\\Sigma' = JW\\Sigma W^TJ^T \\quad \\text{Eq. 5}$\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106\nglm::mat3 T = W * J;\nglm::mat3 Vrk = glm::mat3(\n\t\tcov3D[0], cov3D[1], cov3D[2],\n\t\tcov3D[1], cov3D[3], cov3D[4],\n\t\tcov3D[2], cov3D[4], cov3D[5]);\nglm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;\n```\n\nLet's write ![1](https://github.com/graphdeco-inria/gaussian-splatting/assets/11364490/2819c95a-e216-4352-8739-90c692b13c91) (remember the 2D and 3D covariance matrices are symmetric) for the calculations that follow.\n\nIts inverse `conic` (honestly I don't know why they've chosen such a bad variable name, calling it `cov_inv` would've been 100x better) can be expressed as ![1](https://github.com/graphdeco-inria/gaussian-splatting/assets/11364490/6cefc42e-273b-4b30-8eab-1db944670f3e) (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the two diagonal entries, put negative signs on the off-diagonal entries and finally put a `1/det` in front of everything).\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219\nfloat det = (cov.x * cov.z - cov.y * cov.y);\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223\nfloat det_inv = 1.f / det;\nfloat3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv };  // since the covariance matrix is symmetric, we only need to save the upper 
triangle\n```\n\n--------------------------------\n💡 A small trick to ensure the numerical stability of the inverse of `cov` 💡\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111\ncov[0][0] += 0.3f;\ncov[1][1] += 0.3f;\n```\nBy construction, `cov` is only positive *semi-* definite (recall that it's in the form $A^TA$) which is not sufficient for this matrix to be *invertible* (which we need it to be because we need to calculate Eq. 4).\n\nHere we add `0.3` to the diagonal to make it invertible. Why does this work? Let's put $cov = A^TA$; adding some positive value to the diagonal means adding $\\lambda I$ to the matrix ($\\lambda$ is the value we add, and $I$ is the identity matrix), so $cov = A^TA + \\lambda I$. Now for any nonzero vector $x$, if we compute $x^T \\cdot cov \\cdot x$, it is equal to $x^TA^TAx + \\lambda x^Tx = ||Ax||^2 + \\lambda ||x||^2$ which is **strictly positive**. Why are we computing this quantity? This is actually the definition of a matrix being **positive definite** (note that we have gotten rid of the *semi-*) which means not only is it invertible, but all of its eigenvalues are also strictly positive.\n\n--------------------------------\n\nHaving `cov` in hand, we can now proceed to compute the `radius` of a gaussian.\n\nTheoretically, when projecting an ellipsoid onto an image, you get an *ellipse*, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the lengths of the major and minor axes and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the `radius` attribute represents.\n\n<img width=\"277\" alt=\"\" src=\"https://github.com/lumalabs/luma-pynerf/assets/11364490/63f25c15-18cd-4be9-8e61-cc5db715c308\">\n\nHow to get the `radius` from `cov`? 
Let's draw an analogy with the 1-dimensional case.\n\nImagine we have a 1D gaussian like the following:\n\n![image](https://github.com/lumalabs/luma-pynerf/assets/11364490/b50d4359-dc23-4ded-8107-4c2165e55e50)\n\nHow can we define the \"radius\" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the graph. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \\cdot \\sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).\n\nFortunately, the analogy applies to *any* dimension, just be aware that the \"radius\" is different along each axis (remember there are two axes in an ellipse).\n\nWe said $r = 3 \\cdot \\sqrt{var}$. How do we, then, get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two axes of the ellipse are the **two eigenvalues** of the covariance matrix. Therefore, the problem now comes down to the calculation of the two eigenvalues.\n\nI could've given you the answer directly, but out of personal preference (I ❤️ linear-algebra), I want to detail it more. First of all, for a square matrix $A$ we say it has eigenvalue $\\lambda$ with the associated eigenvector $x$ if $\\lambda$ and $x$ satisfy $Ax = \\lambda x, x \\neq 0$. There are as many eigenvalues (counted with multiplicity) as the dimension of $A$ if we operate in the domain of complex numbers.\n\nIn general, to calculate *all* eigenvalues of $A$, we solve the equation $det(A-λ\\cdot I) = 0$ (the variable being $λ$). 
If we substitute the `cov` matrix we have above, this equation can be expressed as $(a-λ)(c-λ)-b^2 = 0$ which is a quadratic equation that all of us are familiar with.\n\nThe solutions (eigenvalues) are `lambda1` and `lambda2` in the following code\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219\nfloat det = (cov.x * cov.z - cov.y * cov.y);  // this is a*c - b*b in our expression\n...\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231\nfloat mid = 0.5f * (cov.x + cov.z);\nfloat lambda1 = mid + sqrt(max(0.1f, mid * mid - det));  // I'm not too sure what purpose 0.1 serves here\nfloat lambda2 = mid - sqrt(max(0.1f, mid * mid - det));\n```\nThen we finally get `radius` as 3 times the square root of the bigger eigenvalue:\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232\nfloat my_radius = ceil(3.f * sqrt(max(lambda1, lambda2)));  // ceil() to make it at least 1 because we operate in pixel space\n```\n\nThe last thing, which is probably the most obvious, is the `uv` (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:\n```cuda\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200\nfloat3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };\nfloat4 p_hom = transformPoint4x4(p_orig, projmatrix);\nfloat p_w = 1.0f / (p_hom.w + 0.0000001f);\nfloat3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };\n...\n// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233\nfloat2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) };  // I like to call it uv\n```\n\nPhew, we finally got the three quantities we need to know: **radius, uv and conic**. Let's move on to the next part.\n\n### 1-2. 
Compute which tiles each gaussian covers\n\nBefore computing the color of an image, the authors introduce a special but *very effective* trick that significantly accelerates rendering. Specifically, we divide the whole image into `tiles` which are **16x16** pixel blocks like the following (the tiles might exceed image borders if height/width is not a multiple of 16):\n\n<img width=\"513\" alt=\"2\" src=\"https://github.com/kwea123/gaussian_splatting_notes/assets/11364490/15a5f829-5608-4d90-93ef-7d0b12d2af79\">\n\nWe also order the tiles in row-major order (top-left is tile 0, the one on its right is 1, etc). Below each tile number are its tile coordinates.\n\nThen, we compute which tiles each gaussian covers by using the `uv` and `radius` computed above. See the following figure:\n\n## 2. Compute the color of each pixel\n"
  }
]