Many years ago, back when I was in my early teens, I picked up an interest in math. Arguably much more superficial than it is now. My gateway drug to discovering abstract algebra was a YouTube video about the unsolvability of the quintic.

Of course, I didn’t understand shit after watching it. I think that I got lost somewhere midway the video, where the author decided that the best idea is to keep on with the “graphical” and “friendly” explanation of groups by rotating cubes or balls, and explained the rest of the concepts in this insurmountably confusing and pointless way. It must’ve sucked to be me, because every exposition of basic group theory was either that, or needless complexity galore that required much more mathematical maturity than a 13 year old could’ve spared.

When I was procrastinating working, now - give or take - 8 years later, I saw another pointless rehashing of the same topic in the same pointless logical framework. So I thought that maybe I can do a better job.

What is a group?

Suppose that you have a set. Of anything, really - potatoes, dodgeballs, whatever. Actually, who am I lying to? Just take non-negative integers smaller than 4. Call it GG.

Now suppose that you have an action (for potatoes it could be mashing, for dodgeballs - this, for integers - addition modulo 4) that you can perform on any two elements of that set, and that action (call it \cdot - like the product operator) has the following properties:

  • (ab)c=a(bc)(a \cdot b) \cdot c = a \cdot (b \cdot c) for any a,b,cGa, b, c \in G (associativity).
  • There exists an element εG\varepsilon \in G such that for any aGa \in G, εa=aε=a\varepsilon \cdot a = a \cdot \varepsilon = a (identity element).
  • For any aGa \in G, there exists an element bGb \in G such that ab=ba=εa \cdot b = b \cdot a = \varepsilon (inverse element).

From here it should also be clear that ():G×GG(\cdot) : G \times G \to G (closure; i.e. ()(\cdot) maps two elements of GG to another element of GG). Then the pair (G,)(G, \cdot) is called a group.

So in our example, G=0,1,2,3G = {0, 1, 2, 3} and ()(\cdot) is addition modulo 4. The identity element is 0, because adding 0 to any number doesn’t change it. The inverse of 1 is 3, because 1+30mod41 + 3 \equiv 0 \bmod 4; the inverse of 2 is 2, because 2+20mod42 + 2 \equiv 0 \bmod 4; and the inverse of 3 is 1.

So here we discover one of the most straightforward families of groups: integers modulo nn form a group. Formally speaking,

Zn=(0,1,,n1,+modn) \mathbb{Z}_n = ({0, 1, \ldots, n-1}, +_{\bmod n})

is a group for any integer n1n \geq 1.

What is this useful for? You could model a hand of a clock with it, for example. What happens when an analog clock shows 10:35 a.m. and you add 50 minutes to it? It shows 11:25 a.m. In other words, 35+5025mod6035 + 50 \equiv 25 \bmod 60.

Simplest example of a group is when we set G=εG = { \varepsilon }, a set with a single element, and define ()(\cdot) such that εε=ε\varepsilon \cdot \varepsilon = \varepsilon. This group is called the trivial group.

Exercise: It easily follows that (Z,+)(\mathbb{Z}, +) is a group. Is (Z,)(\mathbb{Z}, -) a group? Don’t skim over this. Try to informally prove or disprove it. Would this construction violate any of the group properties?

Bijections and the group of bijections.

A bijection is essentially a function that maps elements from one set to another in a one-to-one manner. In other words, no two elements from the first set map to the same element in the second set, and every element in the second set has a corresponding element in the first set.

For example, we can define a bijection between N\mathbb{N} and Z\mathbb{Z} as follows:

f(n)={n2,if n is even n+12,if n is oddf(n) = \begin{cases} \frac{n}{2}, & \text{if } n \text{ is even} \ -\frac{n+1}{2}, & \text{if } n \text{ is odd} \end{cases}

We see that f:NZf : \mathbb{N} \to \mathbb{Z} is a bijection, because every natural number maps to a unique integer, and every integer has a corresponding natural number. We can also invert this mapping to get f1:ZNf^{-1} : \mathbb{Z} \to \mathbb{N}.

As another example, we can define a bijection that maps natural numbers to even natural numbers as f(n)=2nf(n) = 2n. Detour: It may seem a bit illogical: after all, aren’t there more natural numbers than even natural numbers? Infinite sets are a bit tricky to reason about. In mathematics we would say that both sets have the same cardinality (because there is a bijection between them), but they have a different density (because even natural numbers are less frequent in the set of natural numbers).

Now, let’s take our newfound group powers. The group Bij(X)\text{Bij}(X) is the group of all bijections from a set to itself, with the group operation being function composition. So fBij(X):XXf \in \text{Bij}(X) : X \to X for some set XX, and the group operation \circ is defined as (fg)(x)=f(g(x))(f \circ g)(x) = f(g(x)) for any f,gBij(X)f, g \in \text{Bij}(X) and xXx \in X.

Is this actually a group? Well, go and check: the function composition is associative, the identity function id(x)=x\mathrm{id}(x) = x serves as the identity element, and every bijection has an inverse function that is also a bijection. Furthermore, if f,gf, g are bijections then so is fgf \circ g, so the closure property holds as well.

How to demonstrate this formally, axiom by axiom:

Axiom 1: Notice that for any functions f,g,h:XXf,g,h:X\to X and any xXx\in X, we have: ((fg)h)(x)=(fg)(h(x))=f(g(h(x)))=f((gh)(x))=(f(gh))(x). ((f\circ g)\circ h)(x)=(f\circ g)(h(x))=f(g(h(x)))=f((g\circ h)(x))=(f\circ (g\circ h))(x). . Since the two compositions agree on every xXx\in X, (fg)h=f(gh)(f\circ g)\circ h=f\circ (g\circ h). So (more generally) composition is associative on the set of all functions from XX to XX.

Axiom 2: Define idX:XX\mathrm{id}_X:X\to X by idX(x)=x\mathrm{id}_X(x)=x. This map is bijective. For any bijective ff and any xXx\in X, we have (idXf)(x)=idX(f(x))=f(x),(fidX)(x)=f(idX(x))=f(x). (\mathrm{id}_X\circ f)(x)=\mathrm{id}_X(f(x))=f(x),\quad (f\circ \mathrm{id}_X)(x)=f(\mathrm{id}_X(x))=f(x). Hence (idX)(\mathrm{id}_X) is an identity element.

Axiom 3: Let ff be a bijection. For each yXy \in X there exists a unique xXx \in X with f(x)=yf(x) = y. Define f1:XXf^{-1}: X \to X by “f1(y)=f^{-1}(y) = the unique xx such that f(x)=yf(x) = y”. This is a well-defined function. Then for any xXx \in X, f1(f(x))=xf^{-1}(f(x)) = x, so (f1f)(x)=x(f^{-1} \circ f)(x) = x, i.e. f1f=idXf^{-1} \circ f = \mathrm{id}_X. For any yXy \in X, f(f1(y))=yf(f^{-1}(y)) = y, so (ff1)(y)=y(f \circ f^{-1})(y) = y, i.e. ff1=idXf \circ f^{-1} = \mathrm{id}_X. So f1f^{-1} is bijective (its inverse is ff).

Closure: This is proven by first working forwards (prove that the composition of two bijections is injective) and then backwards (prove that the composition of two bijections is surjective). You can find a trillion proofs of this online.

Homomorphisms

A homomorphism is a “structure-preserving map” between two groups. What a useless definition. Take two groups, (G,)(G, \cdot) and (H,)(H, *). A homomorphism from GG to HH is a function f:GHf : G \to H such that for any a,bGa, b \in G, f(ab)=f(a)f(b)f(a \cdot b) = f(a) * f(b).

Example: ln(xy)=ln(x)+ln(y)\ln(xy) = \ln(x) + \ln(y) shows that the natural logarithm is a homomorphism from the multiplicative group of positive real numbers (R+,)(\mathbb{R}^+, \cdot) to the additive group of real numbers (R,+)(\mathbb{R}, +).

Important: an isomorphism is a bijective homomorphism. In other words, it’s a structure-preserving map between two groups that is also one-to-one and onto. If two structures are isomorphic, we write GHG \cong H.

Example: The groups (R+,)(\mathbb{R}^+, \cdot) and (R,+)(\mathbb{R}, +) are isomorphic via the natural logarithm function ln:R+R\ln : \mathbb{R}^+ \to \mathbb{R}, which is a bijective homomorphism (inverse function is the exponential function exp:RR+\exp : \mathbb{R} \to \mathbb{R}^+).

That’s it.

Subgroups

A subgroup is a subset of a group that is itself a group under the same operation. Formally, if (G,)(G, \cdot) is a group and HGH \subseteq G, then (H,)(H, \cdot) is a subgroup of (G,)(G, \cdot) if:

  • For any a,bHa, b \in H, abHa \cdot b \in H (closure).
  • There exists an element εH\varepsilon \in H such that for any aHa \in H, εa=aε=a\varepsilon \cdot a = a \cdot \varepsilon = a (identity element).
  • For any aHa \in H, there exists an element bHb \in H such that ab=ba=εa \cdot b = b \cdot a = \varepsilon (inverse element).
  • For any a,b,cHa, b, c \in H, (ab)c=a(bc)(a \cdot b) \cdot c = a \cdot (b \cdot c) (associativity).

This is tricky, because we have to maintain closure, identity, inverses, and associativity.

Example: Consider the group (Z,+)(\mathbb{Z}, +). The set of even integers 2Z=,4,2,0,2,4,2\mathbb{Z} = {\ldots, -4, -2, 0, 2, 4, \ldots} is a subgroup of (Z,+)(\mathbb{Z}, +) because:

  • The sum of any two even integers is an even integer (closure).
  • The identity element 0 is an even integer (identity element).
  • The inverse of any even integer is also an even integer (inverse element).
  • Addition is associative for all integers, including even integers (associativity).

Cayley’s theorem.

Every group (G,)(G,\cdot) is isomorphic to a subgroup of Bij(G)\mathrm{Bij}(G), the group of bijections of GG under composition.

We constructively demonstrate this below. For each aGa \in G, define left multiplication by aa as the map La:GG,La(x)=ax. L_a : G \to G,\quad L_a(x)=a\cdot x .

Now:

  1. Each LaL_a is a bijection: its inverse is La1L_{a^{-1}}.
  2. Composition matches multiplication: LaLb=Lab. L_a\circ L_b = L_{a\cdot b}.
  3. The map Φ:GBij(G)\Phi:G\to\mathrm{Bij}(G) given by Φ(a)=La\Phi(a)=L_a is a homomorphism.
  4. Φ\Phi is injective, since La(ε)=aL_a(\varepsilon)=a.

Hence: Φ(G)\Phi(G) is a subgroup of Bij(G)\mathrm{Bij}(G), and GΦ(G)G\cong \Phi(G). Thus every group can be realized as a group of bijections.

Just as a simple example: Cayley’s theorem says that ZnΦ(Zn)\mathbb{Z}_n \cong \Phi(\mathbb{Z}_n) for Φ(a)=La\Phi(a) = L_a where La(x)=a+modnxL_a(x) = a +_{\bmod n} x. Notice that Φ\Phi here is not our invention: The theorem demands we take each group element and turn it into a bijection of the set Zn\mathbb{Z}_n onto itself by “left multiplication” (here, addition modulo nn). So each element k of the group becomes the function “add k”. Every group element is doing something to a set, and group multiplication is just doing those things one after another.

The image is the set of functions Ψ(Zn)=,xx+k(modn)kZn,.\Psi(\mathbb Z_n) = {,x\mapsto x+k \pmod n \mid k\in\mathbb Z_n,}.

This is a subset of Bij(Zn)\mathrm{Bij}(\mathbb Z_n), and it is closed under composition: (xx+a)(xx+b)=(xx+a+b). (x\mapsto x+a)\circ(x\mapsto x+b) = (x\mapsto x+a+b).

This is about the right time to explain symmetric groups: the symmetric group on a set XX, denoted Sym(n)\textrm{Sym}(n) typically taking X=1,2,,nX = {1, 2, \ldots, n}, is the group of all bijections from XX to itself under composition. The order (number of contained elements) of the symmetric group Sym(n)\textrm{Sym}(n) is n!n!, which counts the number of ways to arrange nn distinct elements.

See the link already? An alternative concrete rephrasing of Cayley’s theorem is that every (finite) group of order nn is isomorphic to a subgroup of the symmetric group Sym(n)\textrm{Sym}(n).

Cosets

Suppose that (G,)(G, \cdot) is a group and (H,)(H, \cdot) is a subgroup of (G,)(G, \cdot) (stated tersely as HGH \le G). For any element gGg \in G, the left coset of HH in GG with respect to gg is the set gH=gh:hHgH = { g \cdot h : h \in H }. Similarly, the right coset of HH in GG with respect to gg is the set Hg=hg:hHHg = { h \cdot g : h \in H }.

Basic facts:

  • Every element of GG belongs to some left-coset.
  • Two left-cosets are either disjoint or identical.
  • All left-cosets of HH in GG have the same number of elements as HH.

Cosets, vaguely speaking, are useful for partitioning groups. Given a group GG and a subgroup HGH \le G, we let:

G/H=gH:gG G / H = { gH : g \in G }

be the set of all left-cosets of HH in GG. Now suppose that we wanted to turn this set into a group. For this purpose, we want to define:

(gH)(kH)=(gk)H (gH) \cdot (kH) = (g \cdot k)H

However, this definition is only valid if it doesn’t depend on the choice of representatives gg and kk. In other words, if gH=gHgH = g’H and kH=kHkH = k’H, we need to ensure that (gk)H=(gk)H(g \cdot k)H = (g’ \cdot k’)H. From g1H=g2Hg_1H = g_2H we get g21g1Hg_2^{-1}g_1 \in H, and from k1H=k2Hk_1H = k_2H we get k21k1Hk_2^{-1}k_1 \in H. Thus, for the operation to be well-defined, we need:

(g2g2)1(g1g1)=(g2)1(g21g1)g1H (g_2 g_2’)^{-1} (g_1 g_1’) = (g_2’)^{-1} (g_2^{-1} g_1) g_1’ \in H

And thus we see g21g1Hg_2^{-1} g_1 \in H, but it is conjugated by g1g_1’:

(g2)1Hg1 (g_2’)^{-1} H g_1'

For this product to be in HH, we need HH to be invariant under conjugation by any element of GG, i.e. gHg1=HgHg^{-1} = H for all gGg \in G. Such subgroups are called normal subgroups (denoted HGH \triangleleft G).

So now if we augment G/N=gN:gGG/N = { gN : g \in G } with the operation (gN)(kN)=(gk)N(gN)(kN) = (gk)N, we get a group called the quotient group.

This is a well-defined, pretty regular, group:

  • Identity: NN
  • Inverse: (gN)1=g1N(gN)^{-1} = g^{-1}N
  • Associativity inherited from GG

This construction has a very special purpose: it is the only way groups can be simplified without losing structure.

First Isomorphism Theorem

Suppose that f:GHf : G \to H is a homomorphism between two groups (G,)(G, \cdot) and (H,)(H, *). The kernel of ff is the set ker(f)=gG:f(g)=εH\ker(f) = { g \in G : f(g) = \varepsilon_H }, where εH\varepsilon_H is the identity element of HH. The image of ff is the set im(f)=hH:h=f(g) for some gG\mathrm{im}(f) = { h \in H : h = f(g) \text{ for some } g \in G }.

As a simple example of a kernel, consider the homomorphism f:(Z,+)(Zn,+modn)f : (\mathbb{Z}, +) \to (\mathbb{Z}_n, +_{\bmod n}) defined by f(k)=kmodnf(k) = k \bmod n. The kernel of this homomorphism is the set of all integers that are multiples of nn, i.e. ker(f)=nZ=,2n,n,0,n,2n,\ker(f) = n\mathbb{Z} = {\ldots, -2n, -n, 0, n, 2n, \ldots} - because (informally) these are the integers that map to 0 in Zn\mathbb{Z}_n.

We know for a fact that the kernel of a homomorphism is always a normal subgroup of the domain group. In our example, nZn\mathbb{Z} is indeed a normal subgroup of (Z,+)(\mathbb{Z}, +). Furthermore, ff is injective if and only if ker(f)=εG\ker(f) = {\varepsilon_G}, where εG\varepsilon_G is the identity element of GG.

As an example of an image, consider the same homomorphism f:(Z,+)(Zn,+modn)f : (\mathbb{Z}, +) \to (\mathbb{Z}_n, +_{\bmod n}). The image of this homomorphism is the entire group Zn\mathbb{Z}_n, i.e. im(f)=Zn\mathrm{im}(f) = \mathbb{Z}_n - because every element in Zn\mathbb{Z}_n can be obtained by taking some integer modulo nn.

The structural theorem (or the First Isomorphism Theorem) states that if f:GHf : G \to H is a homomorphism, then the quotient group G/ker(f)G / \ker(f) is isomorphic to the image of ff, i.e. G/ker(f)im(f)G / \ker(f) \cong \mathrm{im}(f).

Constuctively we can define a map φ:G/ker(f)im(f)\varphi : G / \ker(f) \to \mathrm{im}(f) by φ(gker(f))=f(g)\varphi(g \ker(f)) = f(g). This map is well-defined, because if gker(f)=gker(f)g \ker(f) = g’ \ker(f), then g1gker(f)g’^{-1} g \in \ker(f), which implies that f(g1g)=εHf(g’^{-1} g) = \varepsilon_H, and thus f(g)=f(g)f(g) = f(g’). The map φ\varphi is a homomorphism, because: φ((gker(f))(kker(f)))=φ((gk)ker(f))=f(gk)=f(g)f(k)=φ(gker(f))φ(kker(f)).\varphi((g \ker(f))(k \ker(f))) = \varphi((gk) \ker(f)) = f(gk) = f(g) * f(k) = \varphi(g \ker(f)) * \varphi(k \ker(f)). The map φ\varphi is surjective by definition of the image, and it is injective because if φ(gker(f))=εH\varphi(g \ker(f)) = \varepsilon_H, then f(g)=εHf(g) = \varepsilon_H, which implies that gker(f)g \in \ker(f), and thus gker(f)=ker(f)g \ker(f) = \ker(f).

Intuitively: A homomorphism collapses elements of G that differ by elements of the kernel. Once you factor out exactly this redundancy, what remains is structurally identical to the image.

Going back to our example with f:(Z,+)(Zn,+modn)f : (\mathbb{Z}, +) \to (\mathbb{Z}_n, +_{\bmod n}), we have ker(f)=nZ\ker(f) = n\mathbb{Z} and im(f)=Zn\mathrm{im}(f) = \mathbb{Z}_n. According to the First Isomorphism Theorem, we have: Z/nZZn.\mathbb{Z} / n\mathbb{Z} \cong \mathbb{Z}_n. This means that the quotient group Z/nZ\mathbb{Z} / n\mathbb{Z} is structurally identical to the group Zn\mathbb{Z}_n, and we can give an explicit isomorphism between them: φ:Z/nZZn,φ(k+nZ)=kmodn.\varphi : \mathbb{Z} / n\mathbb{Z} \to \mathbb{Z}_n, \quad \varphi(k + n\mathbb{Z}) = k \bmod n.

It’s useful to visualize the algebraic structures at hand a little bit:

  • nZn\mathbb{Z} is the subgroup of Z\mathbb{Z} consisting of all multiples of nn, i.e. nZ=,2n,n,0,n,2n,=nkkZn\mathbb{Z} = {\ldots, -2n, -n, 0, n, 2n, \ldots} = { nk\mid k \in \mathbb{Z} }.
  • The quotient group Z/nZ\mathbb{Z} / n\mathbb{Z} consists of the cosets of nZn\mathbb{Z} in Z\mathbb{Z}, i.e. Z/nZ=k+nZ:kZ\mathbb{Z} / n\mathbb{Z} = { k + n\mathbb{Z} : k \in \mathbb{Z} }. There are exactly nn distinct cosets, which can be represented by the integers 0,1,,n10, 1, \ldots, n-1.

In Z/nZ\mathbb{Z} / n\mathbb{Z}, an element is not a single integer, but rather a set of integers that differ by multiples of nn. For example, the coset 2+nZ2 + n\mathbb{Z} includes all integers of the form 2+kn2 + kn for any integer kk. Addition is defined as: (a+nZ)+(b+nZ)=(a+b)+nZ.(a + n\mathbb{Z}) + (b + n\mathbb{Z}) = (a + b) + n\mathbb{Z}. Making it a group (check the propeties!).

Tying the knot.

A group GG is simple if it has no normal subgroups other than the trivial group ε{\varepsilon} and itself GG. Simple groups are like the prime numbers of group theory - they cannot be broken down into smaller, non-trivial normal subgroups.

The factorisation lets us analyse complex groups by breaking them down into simpler components. This is the essence of the Jordan-Hölder theorem. A composition series of a finite group GG is a sequence of subgroups:

ε=G0G1G2Gn=G { \varepsilon } = G_0 \triangleleft G_1 \triangleleft G_2 \triangleleft \ldots \triangleleft G_n = G

where each GiG_{i} is a normal subgroup of Gi+1G_{i+1}, and the quotient groups Gi+1/GiG_{i+1} / G_i are simple groups. The Jordan-Hölder theorem states that any two composition series of a finite group have the same length and the same simple quotient groups, up to isomorphism and order.

Detour: Applications of the First Isomorphism Theorem

An integer qq is called a quadratic residue modulo nNn \in \mathbb{N} if it is congruent to a perfect square modulo nn; in other words, if there exists xZx \in \mathbb{Z} such that x2qmodnx^2 \equiv q \bmod n.

When the number p>2p > 2 is prime, it has (p1)/2(p-1)/2 quadratic residues. This is a pretty elementary result in number theory, but we can also prove it elegantly using the First Isomorphism Theorem:

Let Zp\mathbb{Z}_p be the group of integers modulo pp under addition, and let Zp\mathbb{Z}_{p}^* be the group of invertible elements of Zp\mathbb{Z}_p under integer multiplication (i.e. Zp=1,2,,p1\mathbb{Z}_p^* = {1, 2, \ldots, p-1}). Define the group homomorphism f:ZpZpf : \mathbb{Z}_p^* \to \mathbb{Z}_p^* by f(x)=x2modpf(x) = x^2 \bmod p. The kernel of ff is the set of all xZpx \in \mathbb{Z}_p^* such that x21modpx^2 \equiv 1 \bmod p, which is 1,1{1, -1} (or 1,p1{1, p-1}; both follow from the fact that pp is prime).

By the First Isomorphism Theorem, we have: Zp/ker(f)im(f).\mathbb{Z}_p^* / \ker(f) \cong \mathrm{im}(f). Hence: im(f)=Zp/ker(f)=(p1)/2.|\mathrm{im}(f)| = |\mathbb{Z}_p^*| / |\ker(f)| = (p-1) / 2.

Which concludes the proof.

It’s quite magical, so anticipate to sit with this for a while. This theorem lets us prove that if nn is prime and λ<1/2\lambda < 1/2, then quadratic probing will always find a vacant bucket, and furthermore, no buckets will be checked twice.

Classification of finite groups, algebraically.

A finite simple group SS is isomorphic to exactly one of the following:

  1. A cyclic group of prime order, i.e. SZpS \cong \mathbb{Z}_p for some prime pp.
  2. An alternating group AnA_n for some n5n \geq 5.
  3. A group of Lie type (finite groups of Fq\mathbb{F}_q-rational points of simple algebraic groups over finite fields Fq\mathbb{F}_q, modulo center plus twisted forms):
  • Classical groups: PSLn(q)\mathrm{PSL}_n(q), PSp2n(q)\mathrm{PSp}_{2n}(q), PSUn(q)\mathrm{PSU}_n(q), PΩn±(q)\mathrm{P\Omega}_n^{\pm}(q).
  • Exceptional and twisted groups: G2(q)G_2(q), F4(q)F_4(q), E6(q)E_6(q), E7(q)E_7(q), E8(q)E_8(q), 2E6(q){}^2E_6(q), 3D4(q){}^3D_4(q), 2B2(q){}^2B_2(q), 2G2(q){}^2G_2(q), 2F4(q){}^2F_4(q).
  1. One of the 26 sporadic groups.

Every group is some gluing and extension of these (this is actually a pretty impressive result!).

Alternating groups.

Not really super related to our topic, but alternating groups kind of interesting. Recall that Sym(n)\mathrm{Sym}(n) is the group of all bijections of the set 1,2,,n{1,2,\dots,n}, with group operation given by composition.

There exists a surjective group homomorphism Ψ:Sym(n)Z2\Psi : \mathrm{Sym}(n) \longrightarrow \mathbb{Z}_2 whose kernel has index 2 (the kernel is half the size of Sym(n)\mathrm{Sym}(n)). We do not need to construct Ψ\Psi explicitly; its existence is a theorem.

We define the alternating group AnA_n to be exactly this kernel:

An:=ker(Ψ) A_n := \ker(\Psi)

By general group theory we notice that:

  • AnSym(n)A_n \triangleleft \mathrm{Sym}(n) (normal subgroup).
  • [Sym(n):An]=2[\mathrm{Sym}(n) : A_n] = 2 (index 2).
  • An=n!/2|A_n| = n! / 2 (order).
  • Finally, per the First Isomorphism Theorem, we have Sym(n)/AnZ2 \mathrm{Sym}(n) / A_n \cong \mathbb{Z}_2 - important!

The unremarkable cases are as follows:

A1A_1 is the trivial group; we see that definitionally Sym(1)=ε\mathrm{Sym}(1) = {\varepsilon}, so A1=εA_1 = {\varepsilon}.

A2A_2 is also the trivial group; Sym(2)Z2\mathrm{Sym}(2) \cong \mathbb{Z}_2, so A2=ker(Ψ)=εA_2 = \ker(\Psi) = {\varepsilon}.

A3A_3 is isomorphic to Z3\mathbb{Z}_3; Sym(3)\mathrm{Sym}(3) has 6 elements, so A3A_3 has 3 elements. It can be verified that A3Z3A_3 \cong \mathbb{Z}_3.

To explain A4A_4, we must first expose Lagrange’s theorem and Cauchy’s theorem. Lagrange’s theorem states that for any finite group GG and any subgroup HGH \le G, the order of HH divides the order of GG. In other words, G=[G:H]H|G| = [G : H] \cdot |H|, where [G:H][G : H] is the index of HH in GG (the number of distinct left cosets of HH in GG).

The proof of Lagrange’s theorem is straightforward: the left cosets of HH in GG partition the group GG into disjoint subsets, each of size H|H|. Therefore, the total number of elements in GG is equal to the number of left cosets multiplied by the size of each coset, which gives us G=[G:H]H|G| = [G : H] \cdot |H|. From this equation, it follows that H|H| divides G|G|.

Cauchy’s theorem states that if GG is a finite group and pp is a prime number that divides the order of GG, then GG contains an element of order pp. Consequently, GG also contains a subgroup of order pp. We will adjourn the proof of this for later, because it’s a little involved.

Now, A4A_4 has 12 elements. By Lagrange’s theorem, the possible orders of subgroups of A4A_4 are 1, 2, 3, 4, 6, and 12.

To decide which of these orders actually occur, we will use Cauchy’s theorem: 2122 \mid 12, so there exists xA4x \in A_4 such that x2=εx^2 = \varepsilon. Pick xx as such element. Then, ε,xA4{\varepsilon, x} \le A_4 is such a subgroup of order 2.

If the group had one element of order 2, then the subgroup would be normal and the quotient would have order 6, which is impossible because A4Sym(4)A_4 \le \mathrm{Sym}(4). If there were more than three such elements, then their pairwise disjointness would imply that A4A_4 has more than 12 elements. Therefore, there are exactly three elements of order 2 in A4A_4, which we can denote as xx, yy, and zz.

Now we can form the subgroup: H=ε,x,y,zA4H = {\varepsilon, x, y, z} \le A_4 where x,y,zx, y, z are the three elements of order 2. Inverses are automatic, closure holds because the product of any two elements of order 2 is the third element of order 2 (e.g., xy=zxy = z), and the identity element is included; further the associativity is inherited from A4A_4.

As a direct result, we see that the quotient group A4/HA_4 / H has order 3. We can prove that Sym(A4/H)Sym(3)\mathrm Sym(A_4 / H) \cong \mathrm Sym(3) (exercise for the reader). Taking the kernel of the action K=ker(Φ:A4Sym(A4/H))K = \ker(\Phi : A_4 \to \mathrm{Sym}(A_4 / H)), we see that KA4K \triangleleft A_4 and [A4:K]6[A_4 : K] \mid 6 (Lagrange’s theorem). Since [A4:H]=3[A_4 : H] = 3, then A4=3H|A_4| = 3|H|. Since KHK \subseteq H, [A4:K]=[A4:H][H:K]=3[H:K][A_4 : K] = [A_4 : H][H : K] = 3[H : K], so [A4:K][A_4 : K] is either 3 or 6. If it were 6, then this would give a subgroup of order 2 in Sym(3)\mathrm{Sym}(3) arising as a quotient of HH, which is impossible because the image of HH under the action must fix a coset. Hence [A4:K]=3[A_4 : K] = 3, so [H:K]=1[H : K] = 1 and therefore K=HK = H. Since ker(Φ)\mathrm{ker}(\Phi) is normal in A4A_4, we have shown that HA4H \triangleleft A_4.

This is enough to conclude via the first isomorphism theorem that A4/HZ3A_4 / H \cong \mathbb{Z}_3, giving us a non-trivial normal subgroup HH of A4A_4 that proves that A4A_4 is not simple.

As a side note, HH here is the Klein four-group, often denoted V4V_4 or just VV. It is isomorphic to the direct product Z2×Z2\mathbb{Z}_2 \times \mathbb{Z}_2.

Starting with A5A_5, the alternating groups become simple. One stanard presentation is as follows:

A5x,yx2=y3=(xy)5=ε A_5 \cong \langle x, y \mid x^2 = y^3 = (xy)^5 = \varepsilon \rangle

This means that A5A_5 is generated by two elements xx and yy with the relations x2=εx^2 = \varepsilon, y3=εy^3 = \varepsilon, and (xy)5=ε(xy)^5 = \varepsilon.

Via Cayley’s theorem we let G=A5G = A_5 act on itself by left-multiplication. This gives a homomorphism Φ:A5Sym(A5)\Phi : A_5 \to \mathrm{Sym}(A_5) defined by Φ(g)(h)=gh\Phi(g)(h) = gh for all g,hA5g, h \in A_5. Since A5A_5 has 60 elements, Sym(A5)\mathrm{Sym}(A_5) is isomorphic to Sym(60)\mathrm{Sym}(60).

The characterisaton of these groups for n5n \geq 5 is hard as heck, and I don’t know how to do this myself. So we will just leave it at that: for n5n \geq 5, the alternating group AnA_n is simple.

Detour: Burnside’s lemma

The amount of binary sequences of length n1n \geq 1 distinct under cyclic shift is given by:

a(n)=1nk=1n12gcd(n,k) a(n) = \frac{1}{n} \sum_{k=1}^{n-1} 2^{\gcd(n,k)}

This is super useful when analysing cyclic redundancy codes, like CRC-32 (who knew?).

A cyclic shift is a permutation of the vector x=(x0,x1,,xn1)x = (x_0, x_1, \ldots, x_{n-1}) to the vector (xn1,x0,x1,,xn2)(x_{n-1}, x_0, x_1, \ldots, x_{n-2}). Cyclic codes (parent family of CRC-32 and others) are defined in such a way that the code is invariant under cyclic shifts, i.e. if xx is a codeword, then so is any cyclic shift of xx. We will avoid talking too much about that because it requires leaping to a different river of a topic (polynomial rings over finite fields).

Anyway, how come this formula works? A group GG is cyclic if there exists an element gGg \in G such that G=g=gk:kZ. G = \langle g \rangle = { g^k : k \in \mathbb{Z} }. The cyclic group of order nn is Cn=r=ε,r,r2,,rn1,rn=ε. C_n = \langle r \rangle = { \varepsilon, r, r^2, \dots, r^{n-1} }, \qquad r^n = \varepsilon. Further, let the group GG act on a set XX. For any element xx, the orbit of xx is the subset: Orb(x)=gx:gGX. \mathrm{Orb}(x) = { g \cdot x : g \in G } \subseteq X. In other words: the orbit of xx is the set of all elements of XX that can be reached by applying elements of GG to xx.

Let X=0,1nX = {0,1}^n be the set of binary strings of length nn. Define an action of CnC_n on XX by letting the generator rr act as a one-step cyclic rotation: r(x0x1xn1)=(xn1x0x1xn2). r \cdot (x_0 x_1 \dots x_{n-1}) = (x_{n-1} x_0 x_1 \dots x_{n-2}). Then rkr^k acts as rotation by kk positions, and rn=εr^n = \varepsilon acts trivially.

Two strings are equivalent under cyclic shift if and only if they lie in the same orbit of this action. Hence the number of distinct binary sequences under cyclic shift is a(n)=X/Cn a(n) = |X / C_n| the number of orbits (binary necklaces).

Burnside’s lemma says that for a finite group action GXG \to X, X/G=1GgGFix(g), |X/G| = \frac{1}{|G|} \sum_{g \in G} |\mathrm{Fix}(g)|, where Fix(g)=xX:gx=x. \mathrm{Fix}(g) = { x \in X : g \cdot x = x }.

Applying this to CnC_n over XX, we obtain a(n)=1nk=0n1Fix(rk). a(n) = \frac{1}{n} \sum_{k=0}^{n-1} |\mathrm{Fix}(r^k)|.

Thus it remains to compute Fix(rk)|\mathrm{Fix}(r^k)|.

Fix k0,1,,n1k \in {0,1,\dots,n-1}. A string x=(x0,x1,,xn1) x = (x_0, x_1, \dots, x_{n-1}) is fixed by rkr^k if and only if xi=xi+kmodnfor all i. x_i = x_{i+k \bmod n} \quad \text{for all } i.

Thus the indices 0,1,,n1{0,1,\dots,n-1} are partitioned into cycles under the permutation ii+k(modn), i \longmapsto i + k \pmod n, and the string must be constant on each cycle.

Let d=gcd(n,k)d = \gcd(n,k). Consider the cycle containing 00: 0,,k,,2k,,3k,,(modn). 0,, k,, 2k,, 3k,, \dots \pmod n. Its length is the smallest t>0t>0 such that tk0(modn). tk \equiv 0 \pmod n.

Write n=dnn = dn’, k=dkk = dk’ with gcd(n,k)=1\gcd(n’,k’) = 1. Then tk0(modn)    tdk0(moddn)    tk0(modn). tk \equiv 0 \pmod n \iff tdk’ \equiv 0 \pmod{dn’} \iff tk’ \equiv 0 \pmod{n’}. Since gcd(k,n)=1\gcd(k’,n’) = 1, this holds if and only if ntn’ \mid t. Hence the minimal such tt is t=n=nd. t = n’ = \frac{n}{d}.

Therefore:

  • each cycle has length n/dn/d,
  • the total number of cycles is nn/d=d=gcd(n,k). \frac{n}{n/d} = d = \gcd(n,k).

On each cycle, all bits must agree, but different cycles may be chosen independently. Hence Fix(rk)=2gcd(n,k). |\mathrm{Fix}(r^k)| = 2^{\gcd(n,k)}.

Finally we substitute into Burnside’s lemma: a(n)=1nk=0n12gcd(n,k). a(n) = \frac{1}{n} \sum_{k=0}^{n-1} 2^{\gcd(n,k)}.

This completes the proof.

Burnside’s lemma (also seldom called Cauchy-Frobenius lemma) follows by noticing the following (as givne by Ben Lynn in Polya Theory):

Let the orbits X1,X2,,XkX_1, X_2, \ldots, X_k be the partition of XX under the action of GG. Observe that the resulting sets FixXk(g)\mathrm{Fix}_{X_k}(g) for knk \le n also partition Fix(g)\mathrm{Fix}(g). As such, we have:

gGFix(g)=gGi=1kFixXi(g) =(g,x)gG,xFixXi(g) =i=1kgGGx \begin{aligned} \sum_{g \in G} |\mathrm{Fix}(g)| &= \sum_{g \in G} \sum_{i=1}^k |\mathrm{Fix}_{X_i}(g)| \ &= |{(g,x)\mid g \in G, x \in \mathrm{Fix}_{X_i}(g)}| \ &= \sum_{i=1}^k \sum_{g \in G} |G_x| \end{aligned}

By the orbit-stabiliser theorem and Lagrange’s theorem, we have G=GxOrb(x)|G| = |G_x| \cdot |\mathrm{Orb}(x)|, and hence the result follows.

Extra reading:

  • Cauchy’s theorem (partial converse to Lagrange’s theorem)
  • Sylow theorems (existence of subgroups of order pkp^k)
  • Orbit-stabilizer theorem; isotropy groups (here GxG_x).

Solvable groups

It’s a little late to introduce this, but an Abelian group is a special type of a group where the group operation is commutative, i.e. for any a,bGa, b \in G, ab=baa \cdot b = b \cdot a. That’s just an additional axiom on top of the group axioms.

We call a finite group GG solvable if there exists a finite sequence of subgroups: ε=G0G1G2Gn=G{ \varepsilon } = G_0 \triangleleft G_1 \triangleleft G_2 \triangleleft \ldots \triangleleft G_n = G such that each GiG_{i} is a normal subgroup of Gi+1G_{i+1}, and the quotient groups Gi+1/GiG_{i+1} / G_i are Abelian groups. Equivalently, all (simple) composition factors of GG are cyclic groups of prime order.

There’s two immediate facts that we will state without proof:

  • If GG is solvable, then every subgroup and every quotient group of GG is also solvable.
  • If NGN \triangleleft G is a normal subgroup such that both NN and G/NG/N are solvable, then GG is also solvable.

Part 2

Fields, automorphisms, Galois groups, extensions, fundamental theorem of Galois theory, solvability by radicals. One day.