Appendix 1: Comparing the VIM and the MCM
This section compares the two methods in two settings: the symmetric algorithm and the asymmetric algorithm.
Symmetric Algorithm
To compare the symmetric versions of the methods, we plot the average values of the learner for various \(\nu \) values from 0 to 1 in Fig. 9. The average value is taken as an approximation of the learner's steady state. Power functions are used for the remainder of the discussion.
In Fig. 9, both methods produce similarly shaped plots. The absolute error between the methods for each \(\nu \) is also plotted in Fig. 9; note that the absolute error is no more than 0.012 and that the relative error decreases as \(\nu \) increases.
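For the MCM, the long-run average of the learner can also be evaluated directly from the stationary weights \(\lambda ^i C_L^i\) used in Appendix 2 (the left-hand side of (23)). The following Python sketch computes that quantity for a sample power function \(F(x)=x^{0.4}\) and a modest value of L; the choices of F and L here are illustrative assumptions and are not the exact settings behind Fig. 9.

```python
def mcm_average(nu, L=20, F=lambda x: x ** 0.4):
    """Steady-state learner value nu_learn from the stationary weights
    lambda^i * C_L^i of Appendix 2 (LHS of (23)), using the recursion
    C_L^i = C_L^{i-1} * F((L-i+1)/L) / F(i/L) with C_L^0 = 1."""
    lam = nu / (1.0 - nu)
    C = [1.0]
    for i in range(1, L + 1):
        C.append(C[-1] * F((L - i + 1) / L) / F(i / L))
    weights = [lam ** i * C[i] for i in range(L + 1)]
    return sum(i * w for i, w in enumerate(weights)) / (L * sum(weights))

# Frequency boosting: for nu > 1/2 the steady-state value exceeds nu itself.
for nu in (0.55, 0.6, 0.7, 0.8, 0.9):
    print(f"nu = {nu:.2f}  ->  nu_learn ~ {mcm_average(nu):.3f}")
```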
Asymmetric Algorithm
We now introduce an asymmetric version of VIM and MCM, in which the increments and decrements differ depending on whether the form uttered by the source matches the learner’s highest-propensity, or preferred, form.
Let F be defined as in the formulation of VIM above. Let us define the preferred form of the learner as form m, that is, \(x_m=\max _{i=1,2}\{x_i\}\). Suppose that \(x_m=x_1\); then the VIM version of the algorithm performs the following updates:
$$\begin{aligned} x_1\rightarrow \left\{ \begin{array}{ll} x_1+sF^+(x_1), &{}\quad \text{ if } \text{ form } \text{1 } \text{ is } \text{ received },\\ x_1-pF^-(x_1), &{}\quad \text{ if } \text{ form } \text{2 } \text{ is } \text{ received }, \end{array}\right. \end{aligned}$$
(21)
where \(F^+\) and \(F^-\) are functions of one variable on [0, 1]. The rules above can be rewritten in terms of the probability of uttering form 2, \(x_2=1-x_1\):
$$\begin{aligned} x_2\rightarrow \left\{ \begin{array}{ll} x_2-sF^+(x_2), &{}\quad \text{ if } \text{ form } \text{1 } \text{ is } \text{ received },\\ x_2+pF^-(x_2), &{}\quad \text{ if } \text{ form } \text{2 } \text{ is } \text{ received }, \end{array}\right. \end{aligned}$$
(22)
where s and p are values between 0 and 1. The asymmetric MCM algorithm is the translation of the asymmetric VIM algorithm into a Markov walk, as described in Sect. 2.2, with the transition matrix P and function F.
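To make the update rules concrete, the following Python sketch implements one AVIM step in the spirit of (21)–(22), together with one possible AMCM step obtained by translating it into a Markov walk. The choices \(F^+=F^-=x^{0.4}\), the \(1/L\) scaling of the VIM increment (taken from the discussion of the discretization parameter below), the clipping of \(x_1\) to [0, 1], and the re-evaluation of the preferred form at every step are assumptions made for illustration; the exact transition matrix P of Sect. 2.2 is not reproduced here.

```python
import random

def F(x):
    # Sample update function F(x) = x^0.4 (the power function used in the
    # comparison below); taking F^+ = F^- = F is an assumption of this sketch.
    return x ** 0.4

def avim_step(x1, form, s, p, L):
    """One AVIM update of x1 = Pr(utter form 1), in the spirit of (21)-(22).
    The rate s is used when the source's form matches the learner's current
    preferred form (the "reassured" case) and p otherwise."""
    preferred = 1 if x1 >= 0.5 else 2
    rate = s if form == preferred else p
    if form == 1:
        x1 += rate * F(x1) / L
    else:
        x1 -= rate * F(x1) / L
    return min(max(x1, 0.0), 1.0)  # clipping to [0, 1] is an assumption

def amcm_step(i, form, s, p, L):
    """One AMCM update of the grid state i (so x1 = i/L): the chain moves one
    grid point with probability s*F or p*F, so its expected step matches the
    corresponding AVIM increment. This is one plausible Markov-walk
    translation of (21), not necessarily the exact matrix P of Sect. 2.2."""
    x1 = i / L
    preferred = 1 if x1 >= 0.5 else 2
    rate = s if form == preferred else p
    if random.random() < rate * F(x1):
        i += 1 if form == 1 else -1
    return min(max(i, 0), L)
```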
For simplicity, we will refer to the asymmetric versions of VIM and MCM as AVIM and AMCM, respectively. To compare AVIM and AMCM, we executed each algorithm with fixed values of s and p, using the function \(x^{0.4}\), and calculated the average learner values as in the symmetric case. Figures 10 and 11 are contour plots comparing the two methods over various s and p values.
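As a rough illustration of this procedure, the sketch below (continuing the one above) runs each learner against a source that utters form 1 with a fixed probability and reports the time-averaged learner value for one choice of s and p; the source frequency, initial condition, run length, and burn-in period are all illustrative assumptions rather than the settings used for Figs. 10 and 11.

```python
def average_learner(step_fn, state, to_x1, s, p, L, nu=0.7,
                    steps=20000, burn_in=5000):
    """Time-averaged learner value after a burn-in period; the source utters
    form 1 with probability nu. Run lengths here are illustrative only."""
    total, count = 0.0, 0
    for t in range(steps):
        form = 1 if random.random() < nu else 2
        state = step_fn(state, form, s, p, L)
        if t >= burn_in:
            total += to_x1(state)
            count += 1
    return total / count

L = 10
avim_avg = average_learner(avim_step, 0.6, lambda x: x, 0.8, 0.2, L)
amcm_avg = average_learner(amcm_step, int(0.6 * L), lambda i: i / L, 0.8, 0.2, L)
print(f"s=0.8, p=0.2:  AVIM ~ {avim_avg:.3f}   AMCM ~ {amcm_avg:.3f}")
```

Sweeping s and p over a grid of values and plotting the two averages side by side yields contour plots of the kind shown in Figs. 10 and 11.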
Observation: Differences in Initial Learner Values
An important observation here is that, for certain values of s and p, updates occur more frequently when the learner is reassured and less frequently otherwise. Figure 10 is the plot of AVIM and AMCM where the learner and the source both prefer form 1. We can see that when \(s\gg p\), the learner experiences frequency boosting to nearly 1. This is because the algorithm applies the “preferred” update when the source utters a form that coincides with the learner’s preferred form; the learner is reassured and takes a much bigger increment in its frequency of usage than it would otherwise.
Figure 11 is the plot of AVIM and AMCM when the learner and the source prefer different forms. Again, we can see that when \(s \gg p\), large updates are performed when the learner is reassured that the source uses its preferred form; otherwise, the frequencies are updated by a significantly smaller amount.
Observation: Varying the Value of L
The discretization parameter, L, governs the stochastic behavior of the updates and ultimately determines when the two methods agree. Note that the VIM (and AVIM) algorithm updates with \(\Delta x = F/L\), which is L times slower than MCM (and AMCM). Thus, the updates of VIM and AVIM are relatively small increments compared with those of MCM and AMCM. In Fig. 12a, the AVIM trajectory of \(x_1\) takes small increments and oscillates around 0.38. The AMCM values of \(X_1\), however, may oscillate around 0.4 in the early stages but will, with some probability, eventually reach 0.5, at which point the learner’s preferred form coincides with the source’s and the behavior described in the first observation above takes over. This is also the cause of the discrepancy in the relative and absolute errors seen in Fig. 11.
Figure 12b shows AVIM and AMCM with a larger value of L. Since L is increased, \(X_1\) now updates in constant increments with a much smaller expected step than before, and thus the frequency stabilizes around the same value as that exhibited by AVIM.
Figure 13 plots the maximum relative and absolute errors between AVIM and AMCM over all \(s,p\in [0,1]\) as L varies. We see that as L increases, the relative error between AVIM and AMCM decreases. Thus, as \(L\rightarrow \infty \), the resulting learner values from the two algorithms coincide.
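A crude way to probe this convergence numerically, reusing the sketches above, is to sweep a coarse grid of (s, p) values for several L and record the largest gap between the AVIM and AMCM averages; the grid resolution, run lengths, and source frequency are again illustrative assumptions, and the stochastic estimates below will not reproduce Fig. 13 exactly.

```python
def max_gap(L, grid=5):
    """Largest |AVIM - AMCM| average over a coarse grid of (s, p) values,
    in the spirit of the maximum-error curves of Fig. 13 (rough estimate)."""
    worst = 0.0
    for si in range(1, grid):
        for pi in range(1, grid):
            s, p = si / grid, pi / grid
            a = average_learner(avim_step, 0.6, lambda x: x, s, p, L)
            m = average_learner(amcm_step, int(0.6 * L), lambda i: i / L, s, p, L)
            worst = max(worst, abs(a - m))
    return worst

for L in (5, 20, 80):
    print(f"L = {L:3d}:  max |AVIM - AMCM| ~ {max_gap(L):.3f}")
```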
Appendix 2: Proof of Frequency Boosting Theorem
Proof of the Theorem for a Particular Example
Here, we set \(L=2\) and prove the theorem in this simple case. The general proof is provided in “The General Case” below. We will use the convenient notation
$$\begin{aligned} \lambda =\frac{\nu }{1-\nu }. \end{aligned}$$
We want to show that \(\nu _\mathrm{learn} > \nu \) when \(\nu > \frac{1}{2}\). With the notation above, since \(\nu =\frac{\lambda }{1+\lambda }\) and \(\nu >\frac{1}{2}\) is equivalent to \(\lambda >1\), it suffices to show that \(\nu _\mathrm{learn}> \frac{\lambda }{1+\lambda }\) when \(\lambda >1\). That is,
$$\begin{aligned} \frac{\sum _{i=0}^L i \lambda ^i C_L^i}{L\sum _{i=0}^L \lambda ^i C_L^i} > \frac{\lambda }{1+\lambda }. \end{aligned}$$
(23)
For \(L=2\), using (9) we have \(C_2^1=\frac{F(1)}{F(1/2)}\) and \(C_2^2=1\), so the LHS of (23) becomes:
$$\begin{aligned} \frac{\lambda \frac{F(1)}{F(1/2)}+2\lambda ^2}{2\left( 1+\lambda \frac{F(1)}{F(1/2)}+\lambda ^2\right) }. \end{aligned}$$
So we want to show that
$$\begin{aligned} \frac{\lambda \frac{F(1)}{F(1/2)}+2\lambda ^2}{2\left( 1+ \lambda \frac{F(1)}{F(1/2)}+ \lambda ^2\right) }>\frac{\lambda }{1+\lambda }. \end{aligned}$$
This is true if and only if
$$\begin{aligned} \frac{ \frac{F(1)}{2F(1/2)}+\lambda }{\left( 1+ \lambda \frac{F(1)}{F(1/2)}+\lambda ^2\right) }>\frac{1}{1+\lambda }. \end{aligned}$$
This is equivalent to
$$\begin{aligned} (1+\lambda )\left( \frac{F(1)}{2F(1/2)}+\lambda \right) > \left( 1+ \lambda \frac{F(1)}{F(1/2)}+\lambda ^2\right) . \end{aligned}$$
After some algebra, we can simplify the expression to
$$\begin{aligned} \lambda \left( 1-\frac{F(1)}{2F(1/2)}\right) > 1-\frac{F(1)}{2F(1/2)}. \end{aligned}$$
(24)
Note that \(\lambda >1\), so it suffices to show that
$$\begin{aligned} 1-\frac{F(1)}{2F(1/2)} >0. \end{aligned}$$
(25)
That is, we want to show that
$$\begin{aligned} \frac{F(1/2)}{1/2} > \frac{F(1)}{1}. \end{aligned}$$
(26)
From the definition of a concave-down function, as described in Sect. 3.1, we can conclude that (26) holds, and thus that (23) is true when \(L=2\).
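As a quick numerical sanity check of the \(L=2\) argument (not part of the proof), the following snippet evaluates both sides of the \(L=2\) inequality above for several values of \(\lambda >1\); the choice of the concave-down function \(F(x)=x^{0.4}\) is an assumption for illustration.

```python
def F(x):
    return x ** 0.4  # a sample concave-down function, for illustration only

r = F(1.0) / F(0.5)  # the ratio F(1)/F(1/2) appearing above
for lam in (1.1, 2.0, 5.0, 20.0):
    lhs = (lam * r + 2 * lam ** 2) / (2 * (1 + lam * r + lam ** 2))
    rhs = lam / (1 + lam)
    print(f"lambda = {lam:5.1f}:  LHS = {lhs:.4f}  >  RHS = {rhs:.4f}  ({lhs > rhs})")
```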
The General Case
We now turn to the proof of the theorem for a general integer L.
Proof
Expanding (23), the frequency boosting property becomes
$$\begin{aligned} \frac{\lambda C_L^1+2\lambda ^2C_L^2+\dots +L \lambda ^L}{L\left( 1+\lambda C_L^1+\lambda ^2 C_L^2+\dots +\lambda ^L\right) } > \frac{\lambda }{1+\lambda }, \end{aligned}$$
(27)
which is equivalent to
$$\begin{aligned} (1+\lambda )\left( \frac{1}{L}C_L^1+\frac{2}{L}\lambda C_L^2+\dots +\lambda ^{L-1}\right) > 1+\lambda C_L^1+\lambda ^2 C_L^2+\dots +\lambda ^L. \end{aligned}$$
Simplifying, we get,
$$\begin{aligned}&\frac{1}{L}C_L^1+\frac{2}{L}\lambda C_L^2+\dots + \lambda ^{L-1}+\frac{1}{L}\lambda C_L^1+\frac{2}{L}\lambda ^2 C_L^2+\dots + \lambda ^{L} \\&\quad > 1+\lambda C_L^1+\lambda ^2 C_L^2+\dots +\lambda ^L. \end{aligned}$$
Combining like terms, we get,
$$\begin{aligned} \sum _{k=1}^{L-1}\lambda ^k \left( \frac{k+1}{L}C_L^{k+1} -C_L^{k}\left( 1-\frac{k}{L}\right) \right) > 1-\frac{C_L^1}{L}. \end{aligned}$$
(28)
For notation, let
$$\begin{aligned} \alpha _{k+1}=\frac{k+1}{L}C_L^{k+1}-C_L^{k}\left( 1-\frac{k}{L}\right) =\frac{k+1}{L}C_L^{k+1}-\frac{L-k}{L}C_L^k. \end{aligned}$$
(29)
Note that, since \(C_L^L=1\) and \(C_L^{L-1}=C_L^1\), we have \(1-\frac{C_L^1}{L}= \frac{(L-1)+1}{L}C_L^{(L-1)+1} -\frac{L-(L-1)}{L}C_L^{L-1}=\alpha _{(L-1)+1}=\alpha _L\). Hence, (28) becomes
$$\begin{aligned} \sum _{k=1}^{L-1}\lambda ^k \alpha _{k+1} > \alpha _{L}. \end{aligned}$$
Since \(\alpha _L=1-\frac{C_L^1}{L}\), by reasoning similar to the \(L=2\) case, with \(x=1\) and \(y=L\) in (26), we have that \(\frac{C_L^1}{L}<1\), that is, \(\alpha _L>0\). Since \(\lambda >1\) and \(\alpha _L>0\), we have \(\lambda ^{L-1} \alpha _L > \alpha _L\); thus, to show (28), it suffices to show
$$\begin{aligned} \lambda \alpha _2+\lambda ^2 \alpha _3+\dots + \lambda ^{L-2}\alpha _{L-1}>0. \end{aligned}$$
(30)
With the notation of (9), we can simplify (30) further. Note that \(C_L^{L-k}=C_L^k\); thus (29) becomes
$$\begin{aligned} \alpha _{k+1}=\frac{k+1}{L}C_L^{L-(k+1)} - \frac{L-k}{L}C_L^{L-k}, \end{aligned}$$
(31)
which can also be written as
$$\begin{aligned} -\alpha _{k+1}=\frac{L-k}{L}C_L^{L-k}-\frac{L-(L-k-1)}{L}C_L^{L-k-1}. \end{aligned}$$
(32)
From (29), replacing \(k+1\) with \(L-k\), we have that
$$\begin{aligned} \alpha _{L-k}=\frac{L-k}{L}C_L^{L-k}-\frac{L-(L-k-1)}{L}C_L^{L-k-1}. \end{aligned}$$
(33)
Thus
$$\begin{aligned} \alpha _{L-k}=-\alpha _{k+1}. \end{aligned}$$
(34)
Therefore, (30) splits into two cases, according to whether L is even or odd.
Suppose first that \(L=2j\) for some integer j. Then, using (34), (30) simplifies to showing that
$$\begin{aligned} \alpha _2(\lambda -\lambda ^{2j-2})+ \alpha _3(\lambda ^2-\lambda ^{2j-3})+\dots + \alpha _j(\lambda ^{j-1}-\lambda ^{2j-j})>0. \end{aligned}$$
This is equivalent to showing
$$\begin{aligned} \alpha _2\lambda (1-\lambda ^{2j-3})+ \alpha _3\lambda ^2(1-\lambda ^{2j-5})+\dots + \alpha _j\lambda ^{j-1}(1-\lambda )>0. \end{aligned}$$
(35)
Similarly, when \(L=2j+1\) for some integer j, (30) simplifies to showing that
$$\begin{aligned} \alpha _2(\lambda -\lambda ^{2j-1})+ \alpha _3(\lambda ^2-\lambda ^{2j-2})+\dots + \alpha _j\left( \lambda ^{j-1}-\lambda ^{2j-(j-1)}\right) +\alpha _{j+1}\lambda ^j>0. \end{aligned}$$
Note that since \(L=2j+1\), we have \(\alpha _{L-j}=\alpha _{j+1}\), while \(\alpha _{L-j}=-\alpha _{j+1}\) by (34); thus \(\alpha _{j+1}=0\). Hence, for the odd case, (30) simplifies to showing that
$$\begin{aligned} \alpha _2\lambda (1-\lambda ^{2j-2})+\alpha _3\lambda ^2(1-\lambda ^{2j-4})+ \dots + \alpha _j \lambda ^{j-1} (1-\lambda ^2) >0. \end{aligned}$$
(36)
Thus, whether L is even or odd, from (35) and (36) it suffices to show that
$$\begin{aligned} \alpha _i<0, \end{aligned}$$
for \(i=2,3,\dots , j\), since \(1-\lambda ^m <0\) for any integer \(m\ge 1\), because \(\lambda >1\).
From (9), we can also write
$$\begin{aligned} C_L^i=C_L^{i-1}\frac{F \left( \frac{L-i+1}{L}\right) }{F\left( \frac{i}{L}\right) }. \end{aligned}$$
Thus, after simplification, (29) becomes
$$\begin{aligned} \alpha _i=C_L^{i-1} \left( \frac{iF \left( \frac{L-i+1}{L}\right) }{LF\left( \frac{i}{L}\right) }-\frac{L-(i-1)}{L}\right) . \end{aligned}$$
Because \(F(x)>0\) for all \(x>0\), we have \(C_L^{i-1}>0\), so it suffices to show that
$$\begin{aligned} \frac{iF\left( \frac{L-i+1}{L}\right) }{LF \left( \frac{i}{L}\right) } -\frac{L-(i-1)}{L}<0. \end{aligned}$$
That is, it suffices to show that
$$\begin{aligned} \frac{F\left( \frac{L-i+1}{L}\right) }{\frac{L-i+1}{L}}< \frac{F\left( \frac{i}{L}\right) }{\frac{i}{L}}, \end{aligned}$$
(37)
for \(i=2,3,\dots ,j\), where \(L=2j\) or \(L=2j+1\).
Using the same argument as in the \(L=2\) case, (37) holds, and therefore \(\alpha _i <0\) for \(i=2,3,\dots , j\), where \(L=2j\) or \(L=2j+1\). This establishes (35) and (36), and thus (30). Hence, (23) is true for any integer L and for any function F that satisfies the properties listed in Sect. 3.1. \(\square \)
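The conclusion can also be checked numerically for particular cases (this is only an illustration of the statement, not an alternative proof): using the recursion for \(C_L^i\) above with a sample concave-down function, the snippet below verifies that \(\alpha _i<0\) for \(i=2,\dots ,j\) and that (23) holds for several values of L and \(\lambda >1\). The choice \(F(x)=x^{0.4}\) is an assumption for illustration.

```python
def F(x):
    return x ** 0.4  # a sample concave-down function, for illustration only

def check(L, lam):
    """Return (all alpha_i < 0 for i = 2..j, inequality (23) holds)."""
    # C_L^i from the recursion C_L^i = C_L^{i-1} * F((L-i+1)/L) / F(i/L), C_L^0 = 1
    C = [1.0]
    for i in range(1, L + 1):
        C.append(C[-1] * F((L - i + 1) / L) / F(i / L))
    j = L // 2  # L = 2j or L = 2j + 1
    alphas = [(i / L) * C[i] - ((L - i + 1) / L) * C[i - 1] for i in range(2, j + 1)]
    lhs = (sum(i * lam ** i * C[i] for i in range(L + 1))
           / (L * sum(lam ** i * C[i] for i in range(L + 1))))
    return all(a < 0 for a in alphas), lhs > lam / (1 + lam)

for L in (3, 4, 7, 10, 25):
    for lam in (1.5, 4.0):
        neg, boost = check(L, lam)
        print(f"L = {L:2d}, lambda = {lam}:  alpha_i < 0: {neg},  (23) holds: {boost}")
```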