General Supervised Learning as Change Propagation with Delta Lenses

Delta lenses are an established mathematical framework for modelling and designing bidirectional model transformations (Bx). Following the recent observations by Fong et al, the paper extends the delta lens framework with a a new ingredient: learning over a parameterized space of model transformations seen as functors. We will define a notion of an asymmetric learning delta lens with amendment (ala-lens), and show how ala-lenses can be organized into a symmetric monoidal (sm) category. We also show that sequential and parallel composition of well-behaved (wb) ala-lenses are also wb so that wb ala-lenses constitute a full sm-subcategory of ala-lenses.


Introduction
The goal of the paper is to develop a formal model of supervised learning in a very general context of bidirectional model transformation or Bx, i.e., synchronization of two arbitrary complex structures (called models) related by a transformation. 1 Rather than learning parameterized functions between Euclidean spaces as is typical for machine learning (ML), we will consider learning mappings between model spaces and formalize them as parameterized functors between categories, f : P ×A → B, with P being a parameter space. The basic ML-notion of a training pair (A, B ) ∈ A 0 × B 0 will be considered as an inconsistency between models caused by a change (delta) v: B → B of the target model B = f (p, A), p ∈ P , that was first consistent with A w.r.t. the transformation (functor) f (p, _). An inconsistency is repaired by an appropriate change of the source structure, u: A → A , changing the parameter p to p , and an amendment of the target structure v @ : B → B @ so that f (p , A ) = B @ is a consistent state of the parameterized two-model system.
The setting above without parameterization and learning (i.e., p = p always holds), and without amendment (v @ = id B always holds), is well known in the Bx literature under the name of delta lenses-mathematical structures, in which consistency restoration via change propagation is modelled by functoriallike algebraic operations over categories [12,6]. There are several types of delta lenses tailored for modelling different synchronization tasks and scenarios, particularly, symmetric and asymmetric. In the paper, we only consider asymmetric delta lenses and will often omit explicit mentioning these attributes. Despite their extra-generality, (delta) lenses have been proved useful in the design and implementation of practical model synchronization systems with triple graph grammars (TGG) [5,2]; enriching lenses with amendment is a recent extension of the framework motivated and formalized in [11]. A major advantage of the lens framework for synchronization is its compositionality: a lens satisfying several equational laws specifying basic synchronization requirements is called wellbehaved (wb), and basic lens theorems state that sequential and parallel composition of wb lenses is again wb. In practical applications, it allows the designer of a complex synchronizer to avoid integration testing: if elementary synchronizers are tested and proved to be wb, their composition is automatically wb too.
The present paper makes the following contributions to the delta lens framework for Bx. a) We motivate model synchronization enriched with learning and, moreover, with categorical learning, in which the parameter space is a category, and introduce the notion of a wb asymmetric learning (delta) lens with amendment (a wb ala-lens) (this is the content of Sect. 3). b) We prove compositionality of wb ala-lenses and show how their universe can be organized into a symmetric monoidal (sm) category (Theorems 1-3 in Sect. 4). All proofs (rather straightforward but notationally laborious) can be found in the long version of the paper [9]. One more compositional result is c) a definition of a compositional bidirectional transformation language (Def. 6) that formalizes an important requirement to model synchronization tools, which (surprisingly) is missing from the Bx literature. Background Sect. 2 provides a simple example demonstrating main concepts of Bx and delta lenses in the MDE context. Section 5 briefly surveys related work, and Sect. 6 concludes.
Notation. Given a category A, its objects are denoted by capital letters A, A , etc. to recall that in MDE applications, objects are complex structures, which themselves have elements a, a , ....; the collection of all objects of category A is denoted by A 0 . An arrow with domain A ∈ A 0 is written as u: A → _ or u ∈ A(A, _); we also write dom(u) = A (and sometimes u dom = A to shorten formulas). Similarly, formula u: _ → A denotes an arrow with codomain u.cod = A . Given a functor f : A → B, its object function is denoted by f 0 : A 0 → B 0 .
A subcategory B ⊂ A is called wide if it has the same objects. All categories we consider in the paper are small.

Background: Update propagation and delta lenses
Although Bx ideas work well only in domains conforming to the slogan any implementation satisfying the specification is good enough such as code generation (see [10] for discussion), and have limited applications in databases (only so called updatable views can be treated in the Bx-way), we will employ a simple database example: it allows demonstrating the core ideas without any special domain knowledge required by typical Bx-amenable areas. The presentation will be semi-formal as our goal is to motivate the delta lens formalism that abstracts the details away rather than formalize the example as such.

Why deltas.
Bx-lenses first appeared in the work on file synchronization, and if we have two sets of strings, say, B = {John, Mary} and B = {Jon, Mary}, we can readily see the difference: John = Jon but Mary = Mary. We thus have a structure in-between B and B (which maybe rather complex if B and B are big files), but this structure can be recovered by string matching and thus updates can be identified with pairs. The situation dramatically changes if B and B are object structures, e.g., B = {o 1 , o 2 } with Name(o 1 ) = John, Name(o 2 ) = Mary and similarly B = {o 1 , o 2 } with Name(o 1 ) = Jon, Name(o 2 ) = Mary. Now string matching does not say too much: it may happen that o 1 and o 1 are the same object (think of a typo in the dataset), while o 2 and o 2 are different (although equally named) objects. Of course, for better matching we could use full names or ID numbers or something similar (called, in the database parlance, primary keys), but absolutely reliable keys are rare, and typos and bugs can compromise them anyway. Thus, for object structures that Bx needs to keep in sync, deltas between models need to be independently specified, e.g., by specifying a sameness relation u ⊂ B×B between models. For example, u = {o 1 , o 1 } says that John@B and Jon@B are the same person while Mary@B and Mary@B are not. Hence, model spaces in Bx are categories (objects are models and arrows are update/delta specifications) rather than sets (codiscrete categories). Figure 1 presents a simple example of delta propagation for consistency restoration. Models consist of objects (in the sense of OO programming) with attributes (a.k.a. labelled records), e.g., the source model A consists of three objects identified by their oids (object identifiers) #A, #J, #M (think about employees of some company) with attribute values as shown in the table: attribute Expr. refers to Experience measured by a number of years, and Depart. is the column of department names. The schema of the table, i.e., the triple S A of attributes (Name, Expr., Depart.) with their domains of values String String String, Integer Integer Integer, String String String resp., determines a model space A. A model X ∈ A is given by its set of objects OID X together with three functions Name X , Expr. X , Depart. X from the same domain OID X to targets String String String, Integer Integer Integer, String String String resp., which are compactly specified by tables as shown for model A. The target model space B is given by a similar schema S B consisting of two attributes. The B-view get(X) of an A-model X is computed by selecting those oids #O ∈ OID X for which Depart. X (#O) is an IT-department, i.e., an element of the set IT We assume that all column names in schemas S A , and S B are qualified by schema names, e.g., OID@S A , OID@S B etc, so that schemas are disjoint except elementary domains like String String String etc. Also disjoint are OID-values, e.g., #J@A and #J@B are different elements, but constants like John and Mary are elements of set String String String shared by both schemas. To shorten long expressions in the diagrams, we will often omit qualifiers and write #J = #J meaning #J@A = #J@B or #J@B = #J@B depending on the context given by the diagram; often we will also write #J and #J for such OIDs. Also, when we write #J = #J inside block arrows denoting updates, we actually mean a pair, e.g., (#J@B, #J@B ).

Consistency restoration via update propagation: An Example
Given two models over the same schema, say, B and B over S B , an update v: B → B is a relation v ⊂ OID B ×OID B ; if a schema contains several nodes, an update should provide a relation v N for each node N in the schema.
Note an essential difference between the two parallel updates v 1 , v 2 : B → B specified in the figure. Update v 1 says that John's name was changed to Jon (think of fixing a typo), and the experience data for Mary were also corrected (either because of a typo or, e.g., because the department started to use a new ML method for which Mary has a longer experience). Update v 2 specifies the same story for John but a new story for Mary: it says that Mary #M left the IT-view and Mary #M is a new employee in one of IT-departments.  The updated view B is inconsistent with the source A and the latter is to be updated accordingly -we say that update v is to be propagated back to A. Propagation of v 1 is easy: we just update accordingly the values of the attributes as shown in the figure in the block arrow u 1 : A → A 1 (of black colour). Importantly, propagation needs two pieces of data: the view update v 1 and the original state A of the source as shown in the figure by two data-flow lines into the chevron 1:put; the latter denotes invocation of the backward propagation operation put (read "put view update back to the source"). The quadruple 1 = (v 1 , A, u 1 , A ) can be seen as an instance of operation put, hence the notation 1:put (borrowed from the UML). Propagation of update v 2 is more challenging: Mary can disappear from the IT-view because a) she quit the company, b) she transitioned to a non-IT department, and c) the view definition has changed, e.g., the new view must only show employees with experience more than 5 years. Choosing between these possibilities is often called choosing an (update) policy. We will consider the case of changing the view in Sect. 3, and in the current section discuss policies a) and b) (ignore for a while the propagation scenario shown in blue in the right lower corner of the figure that shows policy c)).

Source model
For policy a), further referred to as quiting and briefly denoted by qt, the result of update propagation is shown in the figure with green colour: notice the update (block) arrow u qt 2 and its result, model A qt 2 , produced by invoking operation put qt . Note that while we know the new employee Mary works in one of IT departments, we do not know in which one. This is specified with a special value ' ?' (a.k.a. labelled null in the database parlance).
For policy b), further referred to as transition and denoted tr, the result of update propagation is shown in the figure with orange colour: notice update arrow u tr 2 and its result, model A tr 2 produced by put tr . Mary #M is the old employee who transitioned to a new non-IT department, for which her expertize is unknown. Mary #M' is a new employee in one of IT-departments (we assume that the set of departments is not exhausted by those appearing in a particular state A ∈ A). There are also updates whose backward propagation is uniquely defined and does not need a policy, e.g., update v 1 is such.
An important property of update propagations we have considered is that they restore consistency: the view of the updated source equals to the updated view initiated the update: get(A ) = B ; moreover, this equality extends for update arrows: get(u i ) = v i , i = 1, 2. Such extensions can be derived from view definitions if the latter are determined by so called monotonic queries (which encompass a wide class of practically useful queries including the Select-Project-Join class). For views defined by non-monotonic queries, in order to obtain get's action on source updates u: A → A , a suitable policy is to be added to the view definition (see [1,14,12] for details and discussion). Moreover, normally get preserves identity updates, get(id A ) = id get(A) , and update composition: for any u: A → A and u : A → A , equality get(u; u ) = get(u); get(u ) holds.

Delta lenses
Our discussion of the example can be summarized in the following algebraic terms. We have two categories of models and updates, A and B, and a functor get: A → B incrementally computing B-views of A-models (we will often write A.get for get(A)). We also suppose that for a chosen update policy, we have worked out precise procedures for how to propagate any view update backwards. This gives us a family of operations put A : Definition 1 (Delta Lenses ( [12])) Let A, B be two categories. An (asymmetric delta) lens from A (the source of the lens) to B (the target) is a pair = (get, put), where get: A → B is a functor and put is a family of operations The last condition is called (co)discrete Putget law: where get 0 denotes the object function of functor get. We will write a lens as an arrow : A → B going in the direction of get.
Note that family put corresponds to a chosen update policy, e.g., in terms of the example above, for the same view functor get, we have two families of put-operations, put qt and put tr , corresponding to the two updated policies we discussed. These two policies determine two lenses qt = (get, put qt ) and tr = (get, put tr ) sharing the same get. Definition 2 (Well-behavedness) A (lens) equational law is an equation to hold for all values of two variables: A ∈ A 0 and v: A.get → T . A lens is called well-behaved (wb) if the following two laws hold: Remark 1. Stability law says that a wb lens does nothing if nothing happens on the target side (no actions without triggers). Putget requires consistency after the backward propagation is finished. Note the distinction between the Putget 0 condition included into the very definition of a lens, and the full Putget law required for the wb lenses. The former is needed to ensure smooth tiling of put-squares (i.e., arrow squares describing application of put to a view update and its result) both horizontally (for sequential composition) and vertically (not considered in the paper). The full Putget assures true consistency as considering a state B alone does not say much about the real update and elements of B cannot be properly interpreted. The real story is specified by delta v: B → B , and consistency restoration needs the full (PutGet) law as above. 2 A more detailed trailer of lenses can be found in the long version [9].

Asymmetric Learning Lenses with Amendments
We will begin with a brief motivating discussion, and then proceed with formal definitions

Does Bx need categorical learning?
Enriching delta lenses with learning capabilities has a clear practical sense for Bx. Having a lens (get, put): A → B and inconsistency A.get = B , the idea of learning extends the notion of the search space and allows us to update the transformation itself so that the final consistency is achieved for a new transformation get : A.get = B . For example, in the case shown in Fig. 1, disappearance of Mary #M in the updated view B can be caused by changing the view definition, which now requires to show only those employees whose experience is more than 5 years and hence Mary #M is to be removed from the view, while Mary #M' is a new IT-employee whose experience satisfies the new definition. Then the update v 2 can be propagated as shown in the bottom right corner of Fig. 1, where index par indicates a new update policy allowing for view definition (parameter) change.
To manage the extended search possibilities, we parameterize the space of transformations as a family of mappings get p : A → B indexed over some parameter space p ∈ P. For example, we may define the IT-view to be parameterized by the experience of employees shown in the view (including any experience as a special parameter value). Then we have two interrelated propagation operations that map an update B B to a parameter update p p and a source update A A . Thus, the extended search space allows for new update policies that look for updating the parameter as an update propagation possibility. The possibility to update the transformation appears to be very natural in at least two important Bx scenarios: a) model transformation design and b) model transformation evolution (cf. [21]), which necessitates the enrichment of the delta lens framework with parameterization and learning. Note that all transformations get p , p ∈ P are to be elements of the same lens, and operations put are not indexed by p, hence, formalization of learning by considering a family of ordinary lenses would not do the job.
Categorical vs. codiscrete learning Suppose that the parameter p is itself a set, e.g., the set of departments forming a view can vary depending on some context. Then an update from p to p has a relational structure as discussed above, i.e., e: p → p is a relation e ⊂ p×p specifying which departments disappeared from the view and which are freshly added. This is a general phenomenon: as soon as parameters are structures (sets of objects or graphs of objects and attributes), a parameter change becomes a structured delta and the space of parameters gives rise to a category P. The search/propagation procedure returns an arrow e: p → p in this category, which updates the parameter value from p to p . Hence, a general model of supervised learning should assume P to be a category (and we say that learning is categorical). The case of the parameter space being a set is captured by considering a codiscrete category P whose only arrows are pairs of its objects; we call such learning codiscrete.

Ala-lenses
The notion of a parameterized functor (p-functor) is fundamental for ala-lenses, but is not a lens notion per se and is thus placed into Appendix Sect. A.1. We will work with its exponential (rather than equivalent product-based) formulation but will do uncurrying and currying back if necessary, and often using the same symbol for an arrow f and its uncurried versionf .
Definition 3 (ala-lenses) Let A and B be categories. An ala-lens from A (the source of the lens) to B (the target) is a pair = (get, put) whose first component is a p-functor get: A P -B and the second component is a triple of (families of) operations put = (put upd p,A , put req p,A , put self p,A ) indexed by pairs p ∈ P 0 , A ∈ A 0 ; arities of the operations are specified below after we introduce some notation. Names req (for 'request') and upd (for 'update') are chosen to match the terminology in [17].
Categories A, B are called model spaces, their objects are models and their arrows are (model) updates or deltas. Objects of P are called parameters and are denoted by small letters p, p , .. rather than capital ones to avoid confusion with [17], in which capital P is used for the entire parameter set. Arrows of P are called parameter deltas. For a parameter p ∈ P 0 , we write get p for the functor get(p): A → B (read "get B-views of A"), and if A ∈ A 0 is a source model, its get p -view is denoted by get p (A) or A.get p or even A p (so that _ p becomes yet another notation for functor get p ). Given a parameter delta e: p → p and a source model A ∈ A 0 , the model delta get(e): get p (A) → get p (A) will be denoted by get e (A) or e S (rather than A e as we would like to keep capital letters for objects only). In the uncurried version, get e (A) is nothing butǧ et(e, id S ) Since get e is a natural transformation, for any delta u: A → A we have a commutative square e S ; u p = u p ; e A (whose diagonal isǧ et(e, u)). We will denote the diagonal of this square by u.get e or u e : A p → A p . Thus, we use notation Now we describe operations put. They all have the same indexing set P 0 × A 0 , and the same domain: for any index p, A and any model delta v: A p → B in B, the value put x p,A (p, A), x ∈ {req, upd, self} is defined and unique: (2) put upd p,A : p → p is a parameter delta from p, put req p,A : A → A is a model delta from A, put self p,A : B → A p is a model delta from B called the amendment and denoted by v @ .
Note that the definition of put self involves an equational dependency between all three operations: for all A ∈ A 0 , v ∈ B(A.get, _), we require We will write an ala-lens as an arrow = (get,

Fig. 2: Ala-lens operations
Diagram in Fig. 2 shows how a lens' operations are interrelated. The upper part shows an arrow e: p → p in category P and two corresponding functors from A to B. The lower part is to be seen as a 3D-prism with visible front face AA p A p A and visible upper face AA p A p , the bottom and two back faces are invisible, and the corresponding arrows are dashed. The prism denotes an algebraic term: given elements are shown with black fill and white font while derived elements are blue (recalls being mechanically computed) and blank (double-body arrows are considered as "blank"). The two pairs of arrows originating from A and A are not blank because they denote pairs of nodes (the UML says links) rather than mappings/deltas between nodes. Equational definitions of deltas e, u, v @ are written up in the three callouts near them. The right back face of the prism is formed by the two vertical derived deltas u p = u.get p and u p = u.get p , and the two matching them horizontal derived deltas e S = get e (A) and e A = get e (A ); together they form a commutative square due to the naturality of get(e) as explained earlier.
Definition 4 (Well-behavedness) An ala-lens is called well-behaved (wb) if the following two laws hold for all p ∈ P 0 , A ∈ A 0 and v: A p → B : (Stability) if v = id Ap then all three propagated updates e, u, v @ are identities: Note that Remark 1 about the Putget law is again applicable.
Example 1 (Identity lenses). Any category A gives rise to an ala-lens id A with the following components. The source and target spaces are equal to A, and the parameter space is 1. Functor get is the identity functor and all puts are identities. Obviously, this lens is wb.
Example 2 (Iso-lenses). Let ι: A → B be an isomorphism between model spaces. It gives rise to a wb ala-lens (ι): A → B with P (ι) = 1 = { * } as follows. Given any A in A and v: ι(A) → B in B, we define put (ι).req * ,A (v) = ι −1 (v) while the two other put operations map v to identities.
Example 3 (Bx lenses). Examples of wb aa-lenses modelling a Bx can be found in [11]: they all can be considered as ala-lenses with a trivial parameter space 1.
Example 4 (Learners). Learners defined in [17] are codiscretely learning codiscrete lenses with amendment, and as such satisfy (the amended) Putget (Remark 1). Looking at the opposite direction, ala-lenses are a categorification of learners as detailed in Fig. 8 on p. 194.

Compositionality of ala-lenses
This section explores the compositional structure of the universe of ala-lenses; especially interesting is their sequential composition. We will begin with a small example demonstrating sequential composition of ordinary lenses and showing that the notion of update policy transcends individual lenses. Then we define sequential and parallel composition of ala-lenses (the former is much more involved than for ordinary lenses) and show that wb ala-lenses can be organized into an sm-category. Finally, we formalize the notion of a compositional update policy via the notion of a compositional bidirectional language.  Fig. 1 with a new model space C whose schema consists of the only attribute Name, and a view of the IT-view, in which only employees of the ML department are to be shown. Thus, we now have two functors, get1: A → B and get2: B → C, and their composition Get: A → C (referred to as the long get). The top part of Fig. 3 shows how it works for model A considered above.

Compositionality of update policies: An example
Each of the two policies, policy qt (green) and policy tr (orange), in which person's disappearance from the view are interpreted, resp., as quiting the company and transitioning to a department not included into the view, is applicable to the new view mappings get2 and Get, thus giving us six lenses shown in Fig. 4 with solid arrows; amongst them, lenses, L qt and L tr are obtained by applying policy pol to the (long) functor Get;, and we will refer to them long lenses. In addition, we can compose lenses of the same colour as shown in Fig. 4 3 demonstrates how the mechanisms work with a simple example. We begin with an update w of the view C that says that Mary #M left the ML department, and a new Mary #M was hired for ML. Policy qt interprets Mary's disappearance as quiting the company, and hence this Mary doesn't appear in view B qt produced by put2 qt nor in view A qt 12 produced from B qt by put1 qt , and updates v qt and u qt 12 are written accordingly. Obviously, Mary also does not appear in view A qt produced by the long lens's Put qt . Thus, put1 qt A (put2 qt A (w)) = Put qt A (w), and it is easy to understand that such equality will hold for any source model A and any update w: C → C due to the nature of our two views get1 and get2. Hence, L qt = qt 1 ; qt 2 where L qt = (Get, Put qt ) and qt i = (geti, puti qt ). The situation with policy tr is more interesting. Model A tr 12 produced by the composed lens tr 1 ; tr 2 , and model A tr produced by the long lens L tr = (Get, Put tr ) are different as shown in the figure (notice the two different values for Mary's department framed with red ovals in the models). Indeed, the composed lens has more information about the old employee Mary-it knows that Mary was in the IT view, and hence can propagate the update more accurately. The comparison update δ tr A,w : A tr → A tr 12 adds this missing information so that equality u tr ; δ tr A,w = u tr 12 holds. This is a general phenomenon: functor composition looses information and, in general, functor Get = get1; get2 knows less than the pair (get1, get2). Hence, operation Put back-propagating updates over Get (we will also say inverting Get) will, in general, result in less certain models than composition put1 • put2 that inverts the composition get1; get2 (a discussion and examples of this phenomenon in the context of vertical composition of updates can be found in [8]). Hence, comparison updates such as δ tr A,w should exist for any A and any w: A.Get → C , and together they should give rise to something like a natural transformation between lenses, δ tr A,B,C : L tr ⇒ tr 1 ; tr 2 . To make this notion precise, we need a notion of natural transformation between "functors" put, which we leave for future work. In the present paper, we will consider policies like qt, for which strict equality holds.

Sequential composition of ala-lenses
Let k : A → B and : B → C be two ala-lenses with parameterized functors get k : P → [A, B] and get : Q → [B, C] resp. Their composition is the following ala-lens k ; . Its parameter space is the product P × Q, and the get-family is defined as follows. For any pair of parameters (p, q) (we will write pq), get k ; pq = get k p ; get q : A → C. Given a pair of parameter deltas, e: p → p in P and h: q → q in Q, their get k ; -image is the Godement product * of natural transformations, get k ; (eh) = get k (e) * get (h) ( we will also write get k e || get h )

Fig. 5: Sequential composition of apa-lenses
Now we define k ; 's propagation operations puts. Let (A, pq, A pq ) with A ∈ A 0 , pq ∈ (P×Q) 0 , A.get k p .get q = A pq ∈ C 0 be a state of lens k ; , and w: A pq → C is a target update as shown in Fig. 3. For the first propagation step, we run lens as shown in Fig. 3 with the blue colour for derived elements: this is just an instantiation of the pattern of Fig. 2 with the source object being A p = A.get p and parameter q. The results are deltas Next we run lens k at state (p, A) and the target update v produced by lens ; it is yet another instantiation of pattern in Fig. 2 (this time with the green colour for derived elements), which produces three deltas These data specify the green prism adjoint to the blue prism: the edge v of the latter is the "first half" of the right back face diagonal A p A p of the former. In order to make an instance of the pattern in Fig. 2 for lens k ; , we need to extend the blue-green diagram to a triangle prism by filling-in the corresponding "empty space". These filling-in arrows are provided by functors get and get k and shown in orange (where we have chosen one of the two equivalent ways of forming the Godement product -note two curve brown arrows). In this way we obtain yet another instantiation of the pattern in Fig. 2 denoted by k ; : where v @ q denotes v @ .get q . Thus, we built an ala-lens k ; , which satisfies equation Putget 0 by construction.
Theorem 1 (Sequential composition and lens laws). Given ala-lenses k : A → B and : B → C, let lens k ; : A → C be their sequential composition as defined above. Then the lens k ; is wb as soon as lenses k and are such.
Theorem 2 (Parallel composition and lens laws). Lens 1 || 2 is wb as soon as lenses 1 and 2 are such.

Symmetric monoidal structure over ala-lenses
Our goal is to organize ala-lenses into an sm-category. To make sequential composition of ala-lenses associative, we need to consider them up to some equivalence (indeed, Cartesian product is not strictly associative).
Definition 5 (Ala-lens Equivalence) Two parallel ala-lenses ,ˆ : A → B are called equivalent if their parameter spaces are isomorphic via a functor ι: P → P such that for any A ∈ A 0 , e: p → p ∈ P and v: (A.get p ) → T the following holds (for x∈{req, self}): Remark 3. It would be more categorical to require delta isomorphisms (i.e., commutative squares whose horizontal edges are isomorphisms) rather than equalities as above. However, model spaces appearing in Bx-practice are skeletal categories (and even stronger than skeletal in the sense that all isos, including iso loops, are identities), for which isos become equalities so that the generality would degenerate into equality anyway.
It is easy to see that operations of lens' sequential and parallel composition are compatible with lens' equivalence and hence are well-defined for equivalence classes. Below we identify lenses with their equivalence classes by default.
Theorem 3 (Ala-lenses form an sm-category). Operations of sequential and parallel composition of ala-lenses defined above give rise to an sm-category aLaLens aLaLens aLaLens, whose objects are model spaces (= categories) and arrows are (equivalence classes of ) ala-lenses. See [9, p.17 and Appendix A.4] for a proof.

Functoriality of learning in the delta lens setting
As example in Sect. 4.1 shows, the notion of update policy transcends individual lenses. Hence, its proper formalization needs considering the entire category of ala-lenses and functoriality of a suitable mapping.

Definition 6 (Bx-transformation language)
A compositional bidirectional model transformation language L bx is given by (i) an sm-category pGet pGet pGet(L bx ) whose objects are (L bx -)model spaces and arrows are (L bx -)transformations which is supplied with forgetful functor into pCat pCat pCat, and (ii) an sm-functor L L bx : pGet pGet pGet(L bx ) → aLaLens aLaLens aLaLens such that the lower triangle in the inset diagram commutes. (Forgetful functors in this diagram are named "−X" with X referring to the structure to be forgotten.) Example. A major compositionality result of Fong et al [17] states the existence of an sm-functor from the category of Euclidean spaces and parameterized differentiable functions (pd-functions) P ara P ara P ara into the category Learn Learn Learn of learning algorithms (learners) as shown by the inset commutative diagram. (The functor P ara P ara P ara L ε,err -Learn Learn Learn is itself parameterized by a step size 0 < ε ∈ R and an error function err: R×R → R needed to specify the gradient descent procedure.) However, learners are nothing but codiscrete ala-lenses (see Sect. A.2), and thus the inset diagram is a codiscrete specialization of the diagram in Def. 6 above. That is, the category of Euclidean spaces and pd-functions, and the gradient descent method for back propagation, give rise to a (codiscrete) compositional bx-transformation language (over pSet pSet pSet rather than pCat pCat pCat).
Finding a specifically Bx instance of Def. 6 (e.g., checking whether it holds for concrete languages and tools such as eMoflon [23] or groundTram [22]) is laborious and left for future work.   [17] by Fong, Spivak and Tuyéras is fundamental: they defined the notion of a codiscrete learning lens (called a learner), proved a fundamental results about sm-functoriality of the gradient descent approach to ML, and thus laid a foundation for the compositional approach to change propagation with learning. One follow-up of that work is paper [16] by Fong and Johnson, in which they build an sm-functor Learn Learn Learn → sLens sLens sLens which maps learners to so called symmetric lenses. That paper is probably the first one where the terms 'lens' and 'learner' are met, but the initial observation that a learner whose parameter set is a singleton is actually a lens is due to Jules Hedges, see [16]. There are conceptual and technical distinctions between [16] and the present paper. On the conceptual level, by encoding learners as symmetric lenses, they "hide" learning inside the lens framework and make it a technical rather than conceptual idea. In contrast, we consider parameterization and supervised learning as a fundamental idea and a first-class citizen for the lens framework, which grants creation of a new species of lenses. Moreover, while an ordinary lens is a way to invert a functor, a learning lens is a way to invert a parameterized functor so that learning lenses appear as an extension of the parameterization idea from functors to lenses. (This approach can probably be specified formally by treating parameterization as a suitably defined functorial construction.) Besides technical advantages (working with asymmetric lenses is simpler), our asymmetric model seems more adequate to the problem of learning functions rather than relations. On the technical level, the lens framework we develop in the paper is much more general than in [16]: we categorificated both the parameter space and model spaces, and we work with lenses with amendment (which allows us to relax the Putget law if needed).

Related work
As for the delta lens roots (the point (1,0) in the figure), delta lenses were motivated and formally defined in [12] (the asymmetric case) and [13] (the symmetric one). Categorical foundations for the delta lens theory were developed by Johnson and Rosebrugh in a series of papers (see [20] for references); this line is continued in Clarke's work [6]. The notion of a delta lens with amendments (in both asymmetric and symmetric variants) was defined in [11], and several composition results were proved. Another extensive body of work within the delta-based area is modelling and implementing model transformations with triple-graph grammars (TGG) [4,23]. TGG provide an implementation framework for delta lenses as is shown and discussed in [5,19,2], and thus inevitably consider change propagation on a much more concrete level than lenses. The author is not aware of any work considering functoriality of update policies developed within the TGG framework.
The present paper is probably the first one at the intersection (1,1) of the plane. The preliminary results have recently been reported at ACT'19 in Oxford to a representative lens community, and no references besides [17], [16] mentioned above were provided.

Conclusion
The perspective on Bx presented in the paper is an example of a fruitful interaction between two domains-ML and Bx. In order to be ported to Bx, the compositional approach to ML developed in [17] is to be categorificated as shown in Fig. 8 on p. 194. This opens a whole new program for Bx: checking that currently existing Bx languages and tools are compositional (and well-behaved) in the sense of Def. 6 p. 190. The wb compositionality is an important practical requirement as it allows for modular design and testing of bidirectional transformations. Surprisingly, but this important requirement has been missing from the agenda of the Bx community, e.g., the recent endeavour of developing an effective benchmark for Bx-tools [3] does not discuss it.
In a wider context, the main message of the paper is that the learning idea transcends its applications in ML: it is applicable and usable in many domains in which lenses are applicable such as model transformations, data migration, and open games [18]. Moreover, the categorificated learning may perhaps find useful applications in ML itself. In the current ML setting, the object to be learnt is a function f : R m → R n that, in the OO class modelling perspective, is a very simple structure: it can be seen as one object with a (huge) amount of attributes, or, perhaps, a predefined set of objects, which is not allowed to be changed during the search -only attribute values may be changed. In the delta lens view, such changes constitute a rather narrow class of updates and thus unjustifiably narrow the search space. Learning with the possibility to change dimensions m, n may be an appropriate option in several contexts. On the other hand, while categorification of model spaces extends the search space, categorification of the parameter space would narrow the search space as we are allowed to replace a parameter p by parameter p only if there is a suitable arrow e: p → p in category P. This narrowing may, perhaps, improve performance. All in all, the interaction between ML and Bx could be bidirectional! We prefer the former formulation as it corresponds to the notation f : A P -B visualizing P as a hidden state of the transformation, which seems adequate to the intuition of parameterized in our context. (If some technicalities may perhaps be easier to see with the product formulation, we will switch to the product view thus doing currying and uncurrying without special mentioning.) Sequential composition of of f : A P -B and g: B Q -C is f.g: A P×Q -C given by (f.g) pq def = f p .g q for objects, i.e., pairs p∈P, q∈Q, and by the Godement product of natural transformations for arrows in P×Q. That is, given a pair e: p → p in P and h: q → q in Q, we define the transformation (f.g) eh : f p .g q ⇒ f p .g q to be the Godement product f e * g h .

A Appendices
Any category A gives rise to a p-functor Id A : A 1 -A, whose parameter space is a singleton category 1 with the only object * , Id A ( * ) = id A and Id A (id * ) : id A ⇒ id A is the identity transformation. It's easy to see that p-functors Id _ are units of the sequential composition. To ensure associativity we need to consider p-functors up to an equivalence of their parameter spaces. Two parallel p-functors f : A P -B andf : AP -B, are equivalent if there is an isomorphism α: P →P such that two parallel functors f : P → [A, B] and α;f : P → [A, B] are naturally isomorphic; then we write f ≈ αf . It's easy to see that if f ≈ αf : A → B and g ≈ βĝ : B → C, then f ; g ≈ α×βf ;ĝ: A → C, i.e., sequential composition is stable under equivalence. Below we will identify p-functors and their equivalence classes. Using a natural isomorphism (P×Q)×R ∼ = P×(Q×R), strict associativity of the functor composition and strict associativity of the Godement product, we conclude that sequential composition of (equivalence classes of) p-functors is strictly associative. Hence, pCat pCat pCat is a category.
pCat pCat pCat pSet pSet pSet Our next goal is to supply it with a monoidal structure. We borrow the latter from the smcategory (Cat Cat Cat,×), whose tensor is given by the product. There is an identical on objects embedding (Cat Cat Cat,×) --pCat pCat pCat that maps a functor f : A → B to a p-functorf : A 1 -B whose parameter space is the singleton category 1. Moreover, as this embedding is a functor, the coherence equations for the associators and unitors that hold in (Cat Cat Cat,×) hold in pCat pCat pCat as well (this proof idea is borrowed from [17]). In this way, pCat pCat pCat becomes an sm-category. In a similar way, we define the sm-category pSet pSet pSet of small sets and parametrized functions between them -the codiscrete version of pCat pCat pCat. The diagram in Fig. 7 shows how these categories are related.
A.2 Ala-lenses as categorification of ML-learners Figure 8 shows a discrete two-dimensional plane with each axis having three points: a space is a singleton, a set, a category encoded by coordinates 0,1,2 resp. Each of the points x ij is then the location of a corresponding sm-category of  (asymmetric) learning (delta) lenses. Category {1 1 1 1 1 1 1 1 1} is a terminal category whose only arrow is the identity lens 1 1 1 1 1 1 1 1 1 = (id 1 , id 1 ): 1 → 1 propagating from a terminal category 1 to itself. Label * * * refers to the codiscrete specialization of the construct being labelled: L L L * * * means codiscrete learning (i.e., the parameter space P is a set considered as a codiscrete category) and aLens aLens aLens * * * refers to codiscrete model spaces. The category of learners defined in [17] is located at point (1,1), and the category of learning delta lenses with amendments defined in the present paper is located at (2,2). There are also two semi-categorificated species of learning lenses: categorical learners at point (1,2) and codiscretely learning delta lenses at (2,1), which are special cases of ala-lenses.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.