Michael Shulman categories of variables

Categories of variables

Categories of variables

Categories of variables

Definition

A category of variables consists of:

  • A full subcategory 𝔽\mathbb{F} of FinSet,
  • 𝔽\emptyset\in\mathbb{F}.
  • For each V𝔽V\in \mathbb{F}, an inhabited set of fresh extensions, each of which is an injection i:VWi:V\hookrightarrow W in 𝔽\mathbb{F} whose complement has cardinality 11 (“an inclusion of cocardinality 1”). Given a fresh extension i:VWi:V\to W, we will refer to the unique element xWi(V)x\in W\setminus i(V) as the fresh name.
  • For each function σ:VU\sigma:V\to U in 𝔽\mathbb{F} and fresh extension i:VWi:V\to W, a specified pushout, called the fresh pushout, consisting of a function τ:WZ\tau:W\to Z and a fresh extension j:UZj:U\to Z.
  • For any injection σ:UV\sigma:U\hookrightarrow V in 𝔽\mathbb{F} of cocardinality 1 (not necessarily a fresh extension), and any fresh extension i:VWi:V\to W, a specified pushout complement, called the fresh pushout complement, consisting of a fresh extension j:UZj:U\to Z and a function τ:ZW\tau:Z\to W (necessarily also an injection of cocardinality 1).

Remarks

From the second and third axioms, it follows by mathematical induction that 𝔽\mathbb{F} contains at least one set of each finite cardinality; thus the inclusion functor 𝔽FinSet\mathbb{F}\hookrightarrow FinSet is an equivalence of categories.

The intuition is that each object of 𝔽\mathbb{F} is a possible “set of free variables” that may occur in a term. In a moment we will define the set of “terms with free variables in VV” for any V𝔽V\in \mathbb{F}. The fresh extensions are the allowable ways to “add a new free variable”, such as when extending a context or passing under a binder. In particular, this means that in practice the only objects of 𝔽\mathbb{F} that will be relevant are those obtainable from \emptyset by iterated fresh extension.

Fresh pushouts tell us how to transform “new free variables” under structural rules such as weakening, exchange, and contraction applied to the previous free variables. Similarly, fresh pushout complements tell us how to transform “new free variables” under substitution for a previously free variable. This will become more clear later. Note that just as FinSetFinSet has all pushouts, it also has all pushout complements of (complemented) injections; thus the content of the fresh pushouts and fresh pushout complements is not that they exist, but that they are specified and that the morphism across from the given fresh extension is also a fresh extension.

Examples

First we have the tautological example.

Example

Let 𝔽=FinSet\mathbb{F}=FinSet, let every injection with cocardinality 1 be a fresh extension, and choose pushouts and pushout complements arbitrarily.

Now a couple of examples using “named variables”. For both of them, let NameName be any infinite set (the “variable names”) equipped with an operation fr:P fin(Name)×NameNamefr : P_{fin}(Name) \times Name \to Name such that fr(V,x)xfr(V,x) \neq x and fr(V,x)Vfr(V,x)\notin V, i.e. a way to “freshen” a given variable name with respect to a given finite set of existing variables.

Example

Let 𝔽\mathbb{F} consist of all finite subsets of NameName. The fresh extensions are the literal subset inclusions (not mere injections) with cocardinality 1.

This category of variables roughly represents the use of named variables with a “local Barendregt convention” that the name of any bound variable differs from all other free variables in scope at the point of binding (though distinct bound variables with non-overlapping scopes might still have the same name). The issue in this approach is to ensure that the convention is maintained under weakening: adding a new free variable might force us to rename a bound variable to avoid conflict.

In our abstract setting, this renaming is accomplished by the definition of fresh pushout. Given a fresh extension i:VWi:V\to W with fresh name xx and a function σ:VU\sigma:V\to U, if xUx\notin U then the fresh pushout has vertex U{x}U\cup \{x\}, while if xUx\in U then it has vertex U{fr(U,x)}U \cup \{fr(U,x)\}. The definition of fresh pushout complement is similar.

Example

Let 𝔽\mathbb{F} consist again of all finite subsets of NameName, but now let the fresh extensions of VV consist of the subset inclusions with cocardinality 1 together with the maps of the form

V=(V{x}){x}(V{x}){fr(V{x},x)}(V{x}){fr(V{x},x)}{x}V = (V\setminus \{x\}) \cup \{x\} \cong (V\setminus \{x\}) \cup \{fr(V\setminus\{x\},x)\} \hookrightarrow (V\setminus \{x\}) \cup \{fr(V\setminus\{x\},x)\} \cup \{x\}

for any xVx\in V.

This category of variables roughly represents a different way of using of named variables: a bound variable is allowed to have the same name as a variable already in the context, but instead of literally “shadowing” the latter, the latter is given a modified name (such as x 1x_1 or outer.xouter.x) only in the scope of the new binder. The fresh extensions that are not subset inclusions “carry along the information” about this renaming, so that in particular when substituting inside the scope of that binder any references to the outer free variable can automatically be suitably renamed (to avoid variable capture).

In this example, given a fresh extension i:VWi:V\to W with fresh name xx and a function σ:VU\sigma:V\to U, if xUx\notin U then the new fresh extension is the subset inclusion UU{x}U\hookrightarrow U\cup \{x\}, while if xUx\in U then it is of the other form determined by UU and xx. The definition of fresh pushout complements is similar.

Inspecting these examples and comparing them to the general definition, we can conclude that in general a fresh extension can include information about how both a variable and a given finite set of variables must be renamed in order to make the former fresh for the latter. The same point is made by the de Bruijn examples:

Example

Let 𝔽\mathbb{F} consist of all sets [n]={1,,n}[n] = \{1,\dots,n\}, with the unique fresh extension of [n][n] being (λm.m+1):[n][n+1](\lambda m.m+1) : [n] \to [n+1] (so the fresh name is always {1}\{1\}). Since every object has a unique fresh extension, fresh pushouts and fresh pushout complements are uniquely determined. This corresponds to de Bruijn indices: newly bound variables are denoted 11, with all existing variable numbers incremented. The fresh extensions (λm.m+1)(\lambda m.m+1) again carry along the information of how free variables in terms being substituted must be modified when passing under a binder.

Example

Let 𝔽\mathbb{F} consist again of all sets [n]={1,,n}[n] = \{1,\dots,n\}, but now the unique fresh extension of [n][n] is the inclusion [n][n+1][n] \hookrightarrow [n+1] (for which the fresh name is {n+1}\{n+1\}). Again, since every object has a unique fresh extension, fresh pushouts and fresh pushout complements are uniquely determined. This corresponds to “de Bruijn levels”.

Iterated fresh extensions

We refer to a composable sequence of fresh extensions Vi 1V 1i nV nV \xrightarrow{i^1} V^1 \to \cdots \xrightarrow{i^n} V^n as an nn-ary fresh extension i:VV ni:V\to V^n. (Note that an nn-ary fresh extension ii is by definition the list (i 1,,i n)(i^1,\dots,i^n), not its composite (i ni 1):VV n(i^n\circ\cdots\circ i^1) :V\to V^n, although we will sometimes treat i(i ni 1)i \mapsto (i^n\circ\cdots\circ i^1) as an implicit coercion.) Since pushout squares compose, we can say that every nn-ary fresh extension has a fresh pushout along any function which is again an nn-ary fresh extension, and similarly any nn-ary fresh extension has a fresh pushout complement relative to any injection of cocardinality 1 which is again an nn-ary fresh extension.

Basic definitions

Sorts and operators

We will work at the generality of the abstract binding trees of Harper (Practical foundations of programming languages). Thus, we suppose given a set SortSort of sorts, and also a set OpOp of operators. Moreover, each operator is assigned a signature or generalized arity of the following form:

((s 1 1,,s 1 n 1;s 1),,(s m 1,,s m n m;s m);s) ((s_{1}^{1},\dots,s_{1}^{n_1}; s_1),\dots, (s_{m}^{1},\dots,s_{m}^{n_m}; s_m); s)

where each s,s k,s k ls,s_k,s_{k}^{l} is a sort. The meaning is that such a operator is of sort ss and takes mm arguments, where the k thk^{th} argument binds n kn_k arguments with sorts s k 1,,s k n is_{k}^{1},\dots, s_{k}^{n_i} and has sort s ks_k. Thus we would write an application of it as

C(x 1 1x 1 n 1.M 1,,x m 1x m n m.M m) \mathtt{C}(x_{1}^{1}\dots x_{1}^{n_1}. M_1, \dots , x_{m}^{1}\dots x_{m}^{n_m}. M_m)

Put differently, if we regard the sorts as the base types (zeroth-order types) in a simply typed lambda-calculus, then each signature is a second-order type: an iterated function type whose arguments are themselves iterated function types whose arguments are base types.

For instance, to represent the raw syntax of dependent type theory (with fully annotated terms), there would be two sorts tmtm and tyty, with operators such as Π:((;ty),(tm;ty);ty)\Pi : ((;ty),(tm;ty);ty) and λ:((;ty),(tm;ty);(tm;tm);tm)\lambda : ((;ty),(tm;ty);(tm;tm);tm).

Abstract binding trees

A sort context EE consists of an object V𝔽V\in \mathbb{F} and a function E:VSortE:V\to Sort. We define by mutual induction sets Tm E(s)Tm_E(s) of terms, for a sort context EE and a sort ss. The clauses are:

  • If xVx\in V and E(x)=sE(x)=s, then xTm E(s)x\in Tm_E(s).

  • Let C\mathtt{C} be a operator of signature ((s 1 1,,s 1 n 1;s 1),,(s m 1,,s m n m;s m);s)((s_{1}^{1},\dots,s_{1}^{n_1}; s_1),\dots, (s_{m}^{1},\dots,s_{m}^{n_m}; s_m); s), and let E:VSortE:V\to Sort be a sort context. For each 1km1\le k\le m, choose an n kn_k-ary fresh extension i k:VV k n k=V ki_k : V\to V_k^{n_k} = V_k, and extend EE to E k:V kSortE_k : V_{k}\to Sort by assigning to the fresh name of each i k li^l_k the sort s k ls_{k}^{l}. Suppose furthermore we have M kTm E k(s k)M_k \in Tm_{E_k}(s_k). Then we have a new term C(i 1.M 1,,i m.M m)Tm E(s)\mathtt{C}(i_1.M_1,\dots,i_m.M_m) \in Tm_E(s).

Structural action

Given any E:VSortE:V\to Sort and sSorts\in Sort, we define for any σ:VU\sigma:V\to U and F:USortF:U\to Sort with E=FσE = F\circ \sigma an action

Tm E(s) Tm F(s) M M[σ] \array{ Tm_{E}(s) & \to & Tm_{F}(s) \\ M & \mapsto & M[\sigma] }

by recursion on MM. (Note that the notation is technically ambiguous as the context FF is not specified. If σ\sigma is surjective, then FF is uniquely determined by EE, but in general this need not be.)

  • If xVx\in V with E(x)=sE(x)=s, then also F(σ(x))=E(x)=sF(\sigma(x)) = E(x)=s, so we can set x[σ]=σ(x)x[\sigma] = \sigma(x).

  • Given C(i 1.M 1,,i m.M m)Tm E(s)\mathtt{C}(i_1.M_1,\dots,i_m.M_m) \in Tm_E(s) as above, for each kk we choose an n kn_k-ary fresh pushout consisting of an n kn_k-ary fresh extension j k:UU kj_k : U \to U_k and a function τ k:V kU k\tau_k:V_k\to U_k, and define C(i 1.M 1,,i m.M m)[σ]=C(j 1.M 1[τ 1],,j m.M m[τ m])\mathtt{C}(i_1.M_1,\dots,i_m.M_m)[\sigma] = \mathtt{C}(j_1.M_1[\tau_1],\dots,j_m.M_m[\tau_{m}]). Here we extend the context F:USortF:U\to Sort to U kU_{k} using the same sorts s k ls_{k}^{l} in the signature of C\mathtt{C} that extend EE to V kV_{k}.

It is straightforward to prove by induction that M[id V]=MM[id_V] = M for any MM. However, it will only be symmetric and transitive up to α\alpha-equivalence.

α\alpha-equivalence

We usually think of α\alpha-equivalence as a “homogeneous” binary relation \sim on each set Tm E(s)Tm_E(s). However, the inductions are much cleaner if we first define it in “heterogeneous” form, as a family of relations ρ\sim_\rho between Tm E(s)Tm_E(s) and Tm E(s)Tm_{E'}(s) for any bijection ρ:VV\rho:V\cong V' such that Eρ=EE' \circ \rho = E. These relations are defined inductively as follows.

  • For any xVx\in V with E(x)=sE(x)=s, we have x ρρ(x)x \sim_\rho \rho(x). (Note that by assumptions on ρ\rho, it is equivalent to say that for any yVy\in V' with E(y)=sE'(y)=s we have ρ 1(y) ρy\rho^{-1}(y)\sim_\rho y.)

  • Suppose given C(i 1.M 1,,i m.M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m) in Tm E(s)Tm_E(s) and C(i 1,M 1,,i m,M m)\mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) in Tm E(s)Tm_{E'}(s). Note that if i:VWi:V\to W and i:VWi':V'\to W' are two fresh extensions and ρ:VV\rho:V\cong V' is a bijection, then there is a unique bijection ρ:WW\rho':W\to W' such that ρi=iρ\rho' \circ i = i'\circ \rho and ρ\rho' preserves the fresh names. Applying this repeatedly for each kk, we obtain a sequence of bijections ρ k l\rho^l_k starting with ρ k 0=ρ:VV\rho^0_k = \rho : V \cong V' and ending with ρ k=ρ k n k:V kV k\rho_k = \rho^{n_k}_k : V_{k} \cong V'_{k}. If M k ρ kM kM_k \sim_{\rho_k} M'_k, then we assert C(i 1.M 1,,i m.M m)C(i 1,M 1,,i m,M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m)\sim \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m).

I conjecture that if the inductive family TmTm is defined in homotopy type theory where 𝔽\mathbb{F} is a univalent category, then this definition of α\alpha-equivalence coincides with their ordinary (dependent) equality types.

It is straightforward to prove that each homogeneous relation id\sim_{id} is reflexive, and they are “congruences” for all the operators essentially by definition. Note that in general, the definition of id\sim_{id} at operators with binding involves ρ\sim_\rho for non-identity ρ\rho. However, this is not the case in the de Bruijn examples where every object has a unique fresh extension, and indeed in this case it is not hard to see that id\sim_{id} reduces to simple equality, as expected.

We now prove successively that they are congruences for the structural rules, are functorial, and are equivalence relations.

Basic properties

α\alpha-equivalence is symmetric

We prove by induction on \sim that if M ρMM\sim_\rho M' then M ρ 1MM'\sim_{\rho^{-1}} M.

  • In the case x ρρ(x)x\sim_\rho \rho(x), we have ρ(x) ρ 1ρ 1(ρ(x))=x\rho(x) \sim_{\rho^{-1}} \rho^{-1}(\rho(x)) = x.

  • Suppose C(i 1.M 1,,i m.M m) ρC(i 1,M 1,,i m,M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m)\sim_{\rho} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) because M k ρ kM kM_k \sim_{\rho_{k}} M'_k for all kk, as in the definition. Then the inverses (ρ 1) k l(\rho^{-1})_{k}^l are the comparison maps in the other direction from i ki'_{k} to i ki_{k}, and by the inductive hypothesis we have M k ρ k 1M kM'_k \sim_{\rho^{-1}_k} M_k for all kk, which is what we need to conclude C(i 1,M 1,,i m,M m) ρ 1C(i 1.M 1,,i m.M m)\mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) \sim_{\rho^{-1}} \mathtt{C}(i_1.M_1,\dots,i_m.M_m)

α\alpha-equivalence is transitive

We prove by induction on \sim that if M ρMM\sim_\rho M' and M ρMM' \sim_{\rho'} M'' then M ρρMM\sim_{\rho'\circ \rho} M''.

  • In the case x ρρ(x)x\sim_\rho \rho(x) and ρ(x) ρρ(ρ(x))\rho(x) \sim_{\rho'} \rho'(\rho(x)), we have x ρρ(ρρ)(x)=ρ(ρ(x))x \sim_{\rho'\circ \rho} (\rho'\circ \rho)(x) = \rho'(\rho(x)).

  • Suppose C(i 1.M 1,,i m.M m) ρC(i 1,M 1,,i m,M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m)\sim_{\rho} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) because M k ρ kM kM_k \sim_{\rho_{k}} M'_k for all kk, and similarly C(i 1.M 1,,i m.M m) ρC(i 1,M 1,,i m,M m)\mathtt{C}(i'_1.M'_1,\dots,i'_m.M'_m)\sim_{\rho'} \mathtt{C}(i''_1,M''_1,\dots,i''_m,M''_m) because M k ρ kM kM'_k \sim_{\rho'_{k}} M''_k for all kk. Then by the inductive hypothesis, we have M k ρ kρ kM kM_k \sim_{\rho'_k \circ \rho_k} M''_k, and the ρ k lρ k l\rho'_k^l \circ \rho_k^l are the comparison maps for ρρ\rho'\circ\rho. Thus this is what we need to conclude C(i 1,M 1,,i m,M m) ρρC(i 1.M 1,,i m.M m)\mathtt{C}(i_1,M_1,\dots,i_m,M_m) \sim_{\rho'\circ \rho} \mathtt{C}(i''_1.M''_1,\dots,i''_m.M''_m).

Structural actions respect α\alpha-equivalence

Suppose given a commutative square σρ=ρσ\sigma' \circ \rho = \rho'\circ \sigma, where ρ,ρ\rho,\rho' are bijections. We prove by induction on \sim that if M ρMM\sim_{\rho} M' then M[σ] ρM[σ]M[\sigma]\sim_{\rho'} M'[\sigma].

  • In the case x ρρ(x)x\sim_\rho \rho(x), we have ρ(σ(x))=σ(ρ(x))\rho'(\sigma(x)) = \sigma'(\rho(x)), hence x[σ]=σ(x) ρσ(ρ(x))=(ρ(x))[σ]x[\sigma] = \sigma(x) \sim_{\rho'} \sigma'(\rho(x)) = (\rho(x))[\sigma'].

  • Suppose C(i 1.M 1,,i m.M m) ρC(i 1,M 1,,i m,M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m)\sim_{\rho} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) because M k ρ kM kM_k \sim_{\rho_{k}} M'_k for all kk, as in the definition. The n kn_k-ary fresh pushouts of each i ki_k along σ\sigma and each i ki'_k along σ\sigma' give us n kn_k-ary fresh extensions j kj_k and j kj'_k and functions τ k\tau_k and τ k\tau'_k. Now by pasting of pushouts we can uniquely fill in the sequence of bijections ρ k l\rho'_k^l ending in ρ k=ρ k n k\rho'_k = \rho'_k^{n_k}. By the inductive hypothesis, we have M k[τ k] ρ kM k[τ k]M_k[\tau_k] \sim_{\rho'_k} M'_k[\tau'_k] for each kk, which by definition is what we need to ensure C(i 1.M 1,,i m.M m)[σ] ρC(i 1,M 1,,i m,M m)[σ]\mathtt{C}(i_1.M_1,\dots,i_m.M_m)[\sigma] \sim_{\rho} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m)[\sigma'].

Structural actions are functorial

Finally, we prove by induction on MM that whenever σσ=ρσ\sigma' \circ \sigma = \rho \circ \sigma'', with ρ\rho a bijection, we have M[σ] ρM[σ][σ]M[\sigma''] \sim_\rho M[\sigma][\sigma'] .

  • Given xx, we have x[σ]=σ(x) ρρ(σ(x))=σ(σ(x))=x[σ][σ]x[\sigma''] = \sigma''(x) \sim_\rho \rho(\sigma''(x)) = \sigma'(\sigma(x)) = x[\sigma][\sigma'].

  • Given C(i 1.M 1,,i m.M m)Tm E(s)\mathtt{C}(i_1.M_1,\dots,i_m.M_m) \in Tm_E(s) for E:VSortE:V\to Sort, for any kk we have the n kn_k-ary fresh pushout (j k,τ k)(j_k,\tau_k) of (i k,σ)(i_k,\sigma), the n kn_k-ary fresh pushout (j k,τ k)(j'_k,\tau'_k) of (j k,σ)(j_k,\sigma'), and also the n kn_k-ary fresh pushout (j k,τ k)(j''_k,\tau''_k) of (i k,σ)(i_k,\sigma''). By the universal properties of pushouts, the bijection ρ\rho extends to a unique sequence of bijections ρ k\rho_k between the sequences j kj'_k and j kj''_k that commute with everything, which are also the unique comparison map between the n kn_k-ary fresh extensions j kj'_k and j kj''_k induced by ρ\rho as in the definition of \sim. Thus, the inductive hypothesis ensures M k[τ k] ρ kM k[τ k][τ k]M_k[\tau''_k] \sim_{\rho_k} M_k'[\tau_k][\tau'_k] for all kk, which is what we need to conclude C(i 1.M 1,,i m.M m)[σ] ρC(i 1.M 1,,i m.M m)[σ][σ]\mathtt{C}(i_1.M_1,\dots,i_m.M_m)[\sigma''] \sim_\rho \mathtt{C}(i_1.M_1,\dots,i_m.M_m)[\sigma][\sigma'].

Conclusion: functoriality and congruence

It now follows that the relations id\sim_{id} are equivalence relations and congruences, and if we quotient the sets Tm E(s)Tm_E(s) by these equivalence relations, the actions [σ][\sigma] become functorial. However, we will not actually perform this quotienting at this stage; it seems to cause more trouble (see “general judgments” below) to do so prematurely here.

It does also follow that the heterogeneous α\alpha-equivalence is reducible to the homogeneous one: we have M ρMM \sim_\rho M' if and only if M[ρ] idMM[\rho] \sim_{id} M'.

Substitution

Let σ:UV\sigma:U\to V be an injection of cocardinality 1 with Vσ(U)={x}V\setminus \sigma(U) = \{x\}, let E:VSortE:V\to Sort and MTm E(s)M\in Tm_E(s), and let NTm Eσ(E(x))N\in Tm_{E\circ\sigma}(E(x)). We define the substitution M[N/σ]Tm Eσ(s)M[N/\sigma] \in Tm_{E\circ \sigma}(s) by induction on MM.

  • If M=xM = x, then M[N/σ]=NM[N/\sigma] = N, which is well-typed since then s=E(x)s = E(x).

  • If M=yV{x}=σ(U)M = y \in V\setminus \{x\} = \sigma(U), then y=σ(z)y = \sigma(z) for a unique zz, and we define M[N/σ]=zM[N/\sigma] = z.

  • If M=C(i 1.M 1,,i m.M m)M = \mathtt{C}(i_1.M_1,\dots,i_m.M_m), for each kk we have the n kn_k-ary fresh pushout complement of σ\sigma and i ki_k consisting of an n kn_k-ary fresh extension j k:UU kj_k: U\to U_k and a function τ k:U kV k\tau_k : U_k \to V_k of cocardinality 1. The signature of C\mathtt{C} extends EE to E k:V kSortE_k : V_k \to Sort, which thereby restricts to E kτ k:U kSortE_k \circ \tau_k : U_k \to Sort which extends EσE\circ \sigma along j kj_k. Thus, we can apply the functorial structural rules to get N[j k]Tm E kτ k(E k(i k(x)))N[j_k] \in Tm_{E_k\circ \tau_k}(E_k(i_k(x))), while by definition we have M kTm E k(s k)M_k \in Tm_{E_k}(s_k). So the inductive hypothesis gives us M k[N[j k]/τ k]Tm E kτ k(s k)M_k[N[j_k]/\tau_k] \in Tm_{E_k\circ \tau_k}(s_k), which we can then use to define C(i 1.M 1,,i m.M m)[N/σ]\mathtt{C}(i_1.M_1,\dots,i_m.M_m)[N/\sigma] to be C(j 1.M 1[N[j 1]/τ 1],,j m.M m[N[j m]/τ m])\mathtt{C}(j_1.M_1[N[j_1]/\tau_1],\dots,j_m.M_m[N[j_m]/\tau_m]).

Note how it is impossible for any variables to be “captured”, because the variables in NN get “renamed” (if necessary) by the weakenings [j k][j_k]. Recall that in Examples and the fresh extensions carry the information of how the variables in the context have to appear with different names inside a binder; while in Examples and where the fresh extensions are subset inclusions, the bound variables are always distinct from those in the context. The categorical and strongly-typed framework frees us from having to think about which of these situations we are in and prevents us from accidentally making a wrong definition of substitution that would capture variables: NN itself does not have the correct variable-set and sort-context to be substituted into M kM_k; we must weaken it by j kj_k first.

Substitution preserves α\alpha-equivalence

Suppose given a commutative square σρ=ρσ\sigma'\circ \rho' = \rho\circ \sigma, where σ,σ\sigma,\sigma' are injections of cocardinality 1 and ρ,ρ\rho,\rho' are bijections, with sort-contexts E,EE,E' with Eρ=EE'\circ \rho = E, terms M ρMM \sim_\rho M', and N ρNN\sim_{\rho'} N'. We prove by induction (on M ρMM \sim_\rho M') that M[N/σ] ρM[N/σ]M[N/\sigma] \sim_{\rho'} M'[N'/\sigma'].

  • Note that if Vσ(U)={x}V\setminus \sigma(U) = \{x\} and Vσ(U)={x}V'\setminus \sigma'(U') = \{x'\}, we must have ρ(x)=x\rho(x) = x'. Thus, in the case M=xM=x we must have M=xM'=x', hence M[N/σ]=NM[N/\sigma] = N and M[N/σ]=NM'[N'/\sigma] = N', so we are done by the assumption N ρNN\sim_{\rho'} N'.

  • In the case of other variables y ρρ(y)y\sim_\rho \rho(y), we have y=σ(z)y = \sigma(z) and ρ(y)=σ(ρ(z))\rho(y) = \sigma'(\rho'(z)), so we conclude by z ρρ(z)z\sim_{\rho'} \rho'(z).

  • Suppose C(i 1.M 1,,i m.M m) ρC(i 1,M 1,,i m,M m)\mathtt{C}(i_1.M_1,\dots,i_m.M_m)\sim_{\rho} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m) because M k ρ kM kM_k \sim_{\rho_{k}} M'_k for all kk. Because pushout complements (of monomorphisms in an adhesive category) are unique up to unique isomorphism, the isomorphisms ρ,ρ,ρ k\rho,\rho',\rho_k induce isomorphisms ρ k\rho'_k. The compatibility of structural actions with α\alpha-equivalence implies that N[j k] ρ kN[j k]N[j_k] \sim_{\rho'_k} N'[j'_k], so the inductive hypothesis yields M k[N[j k]/τ k] ρ kM k[N[j k]/τ k]M_k[N[j_k]/\tau_k] \sim_{\rho'_k} M'_k[N'[j'_k]/\tau'_k], whence C(i 1.M 1,,i m.M m)[N/σ] ρC(i 1,M 1,,i m,M m)[N/σ]\mathtt{C}(i_1.M_1,\dots,i_m.M_m)[N/\sigma] \sim_{\rho'} \mathtt{C}(i'_1,M'_1,\dots,i'_m,M'_m)[N'/\sigma'] by definition.

Dependent type theory

Let us specialize to the case of dependent type theory, with two sorts tmtm and tyty. Note that tmtm is a sort, while Tm E(s)Tm_E(s) is the set of abstract binding trees of sort ss; thus the “raw terms” of DTT are Tm E(tm)Tm_E(tm) and the “raw types” are Tm E(ty)Tm_E(ty). Since we do not generally hypothesize types, we consider only sort-contexts that assign the sort tmtm to every variable; we henceforth omit to mention these sort-contexts. Thus we speak of raw terms with variables VV and raw types with variables VV.

Contexts

We define the raw contexts with variables VV, for any V𝔽V\in \mathbb{F}, inductively as follows.

  • There is a unique raw context with variables \emptyset, the empty context \lozenge.

  • If Γ\Gamma is a raw context with variables VV and AA is a raw type with variables VV, and i:VWi:V\to W is a fresh extension, then Γ,i:A\Gamma, i:A is a raw context with variables WW.

Similarly, if ρ:VV\rho : V\cong V' is a bijection and Γ,Γ\Gamma,\Gamma' are raw contexts with variables V,VV,V' respectively, we define Γ ρΓ\Gamma \sim_\rho \Gamma' inductively:

  • id \lozenge \sim_{id_\emptyset} \lozenge

  • If Γ ρΓ\Gamma \sim_\rho \Gamma and A ρAA\sim_\rho A', and ρ:WW\rho':W\cong W' is the unique bijection induced by ρ:VV\rho:V\cong V' and the fresh extensions i:VWi:V\to W and i:VWi':V'\to W', then (Γ,i:A) ρ(Γ,i:A)(\Gamma,i:A) \sim_{\rho'} (\Gamma',i':A').

Note that this “variable renaming” of contexts includes heterogeneous α\alpha-equivalence (renaming bound variables) of the types occurring in the context. Thus, its homogeneous version Γ idΓ\Gamma \sim_{id} \Gamma' is exactly α\alpha-equivalence of all the types.

Judgments

The basic judgments of DTT are:

  • For any raw context Γ\Gamma and raw type AA, both with variables VV, we have a judgment ΓAtype\Gamma \vdash A\, type.
  • For any raw context Γ\Gamma and raw types AA and BB, all with variables VV, we have a judgment ΓABtype\Gamma \vdash A \equiv B \, type.
  • For any raw context Γ\Gamma, raw type AA, and raw term MM, all with variables VV, we have a judgment ΓM:A\Gamma \vdash M:A.
  • For any raw context Γ\Gamma, raw type AA, and raw terms MM and NN, all with variables VV, we have a judgment ΓMN:A\Gamma \vdash M \equiv N:A.

Recall that we did not actually quotient our raw syntax by α\alpha-equivalence. Thus there is no problem in stating a general judgment rule such as

Γ,i:AM:B[i]Γ(λi.M):AB. \frac{\Gamma, i:A \vdash M:B[i]}{\Gamma \vdash (\lambda i.M):A\to B.}

Here ii denotes the same thing in both premise and conclusion: a fresh extension of the set of variables of Γ\Gamma. We write B[i]B[i] in the premise meaning the explicit weakening (structural action) of BB along ii.

We deal with α\alpha-equivalence by proving afterwards by induction that all our valid judgments are invariant under context renaming / heterogeneous α\alpha-equivalence. For instance, the part of this result referring to the term rule is that whenever Γ ρΓ\Gamma\sim_\rho \Gamma', A ρAA\sim_\rho A' and M ρMM\sim_\rho M', if ΓM:A\Gamma \vdash M:A then also ΓM:A\Gamma' \vdash M':A'. And the corresponding inductive clause for the above λ\lambda-abstraction rule is:

  • Suppose Γ(λi.M):AB\Gamma \vdash (\lambda i.M):A\to B is derived from Γ,i:AM:B[i]\Gamma, i:A \vdash M:B[i] by the above rule, and that we have Γ ρΓ\Gamma\sim_\rho \Gamma', (AB) ρC(A\to B) \sim_\rho C, and (λi.M) ρN(\lambda i.M)\sim_\rho N. By inversion, we must have C=(AB)C = (A'\to B') with A ρAA\sim_\rho A' and B ρBB\sim_\rho B', and similarly N=(λi.M)N = (\lambda i'.M') with M ρMM \sim_{\rho'} M', where ρ:WW\rho':W\cong W' is the bijection induced by pushing ρ\rho forward along ii and ii'. Then by definition we have (Γ,i:A) ρ(Γ,i:A)(\Gamma,i:A) \sim_{\rho'} (\Gamma',i':A'), while B[i] ρB[i]B[i] \sim_{\rho'} B'[i'] by the above lemma that structural actions respect α\alpha-equivalence. Therefore, the inductive hypothesis applies to show that Γ,i:AM:B[i]\Gamma', i':A' \vdash M':B'[i'], hence Γ(λi.M):AB\Gamma' \vdash (\lambda i'.M'):A'\to B' by applying the rule again.

Perhaps one could formulate a general notion of “judgment” and “rule” that would allow proving this sort of lemma once and for all, rather than piece by piece about every rule in a particular theory, analogous to how abstract binding trees can deal simultaneously with all the binding constructions of all theories (Π\Pi, λ\lambda, casecase, recrec, etc.).

On uniqueness

I expect it should be possible to prove that the sets Tm E(s)Tm_E(s) and all their structure are independent, in a suitable sense, of the choice of 𝔽\mathbb{F}. Probably the easiest way to do this would be to translate from an arbitrary 𝔽\mathbb{F} to the tautological example 𝔽=FinSet\mathbb{F}=FinSet and show that this translation is an isomorphism (up to α\alpha-equivalence).

On practical notation

Our term syntax, for a general 𝔽\mathbb{F}, is very similar to the way terms with binding are usually written. Of course there is the usual syntactic desugaring of Π(x:A).B\Pi(x:A).B to Π(A,x.B)\Pi(A,x.B) and so on. Also, in some cases one traditionally binds a single variable (or, in our case, a fresh extension) in two or more subterms (such as a fully-annotated abstraction λ(x:A.B).M\lambda(x:A.B).M where xx is bound in both BB and MM), whereas we have to desugar this to copy the variable (e.g. λ(A,x.B,x.M)\lambda(A,x.B,x.M)). These trivialities are common to other approaches to abstract binding as well.

More significantly, consider the three places that the same notion of “variable name” is used in traditional named syntax:

  1. As terms (“uses” of variables).
  2. At binding sites (including variable declarations in a context).
  3. As things being substituted for.

In de Bruijn syntax, natural numbers are used in cases (1) and (3), while at (2) there is no “variable name” used at all. In our general framework, all three of these situations take different inputs: for (1) we give an element of a set VV, for (2) we give a fresh extension of the ambient set VV of free variables, and for (3) we give an injection of cocardinality 1 into the current set VV of free variables.

We could coalesce (3) to (1) by equipping our 𝔽\mathbb{F}s with a further operation: given V𝔽V\in \mathbb{F} and xVx\in V, a specified injection σ x:UV\sigma_x:U\to V whose complement is {x}\{x\}. In the named-variable Examples and σ x\sigma_x would be the subset inclusion of V{x}V\setminus \{x\}, while in the de Bruijn Examples and it would be the ordered injection [n][n+1][n] \to [n+1] that omits xx. This would probably work just as well; there seems no reason that substitution into one variable needs to be allowed to rename the other variables.

We could also make (1) and (2) appear to coalesce by declaring “the fresh name of a fresh extension” to be an implicit coercion. Then we could write a term like λx.xx\lambda x. x x where the first occurrence of xx denotes a fresh extension, while the latter two automatically coerce it to its fresh name.

However, there seems no way around the fact that our syntax requires all variables in the context to be explicitly (potentially) renamed every time they pass under a binder. Thus even with this coercion convention we cannot write λx.λy.x\lambda x. \lambda y. x, since the fresh extension y:V 2V 3y:V_2\to V_3 might rename the fresh name of the fresh extension x:V 1V 2x:V_1\to V_2, and inside the binder λy\lambda y we can only use elements of V 3V_3 as terms. We have to instead write something like λx.λy.x[y]\lambda x. \lambda y. x[y]. This is still somewhat easier for a human to read than de Bruijn syntax, since the binders are referred to by name rather than by number, and it’s probably not unreasonable when doing formal theory (e.g. the explicit weakening B[i]B[i] in the abstraction rule above) — for the Initiality Project I’m tempted to use this formalism explicitly, as the variable sets VV correspond fairly closely to the semantic interpretations I have in mind.

However, for actual practice in a type theory this syntax is more verbose than one would want, so one would probably choose a particular “implementation” — e.g. the Barendregt convention (Example ) for human readers, or de Bruijn indices (Example ) for a computer. The expected uniqueness theorem would enable translation between them as needed.

Last revised on November 7, 2018 at 20:03:59. See the history of this page for a list of all contributions to it.