A scoring rule and global inaccuracy measure for contingent varying importance

Levinstein recently presented a challenge to accuracy-first epistemology. He claims that there is no strictly proper, truth-directed, additive, and differentiable scoring rule that recognises the contingency of varying importance, i.e., the fact that an agent might value the inaccuracy of her credences differently at different possible worlds. In my response, I will argue that accuracy-first epistemology can capture the contingency of varying importance while maintaining its commitment to additivity, propriety, truth-directedness, and differentiability. I will construct a scoring rule (a weighted scoring rule) and a global inaccuracy measure that have all four required properties and recognise the contingency of varying importance. I will show that Levinstein's and my results coexist without contradicting each other.


Introduction
In a recent paper, "An objection of varying importance to epistemic utility theory", published in Philosophical Studies, Levinstein presented the following challenge to accuracy-first epistemology (see Theorem A.4 in Levinstein, 2019, p. 2929). He claims that no scoring rule with the four properties usually assumed by accuracy-firsters (additivity, propriety, truth-directedness, and differentiability; see Levinstein, 2019, p. 2928, and compare, for example, Joyce, 2009, or Pettigrew, 2016) recognises the contingency of varying importance, i.e., the fact that an agent might value the inaccuracy of her credences differently at different possible worlds. In my response, I will argue that accuracy-first epistemology can accommodate the contingency of varying importance while maintaining its commitment to additivity, propriety, truth-directedness, and differentiability. I will construct an inaccuracy measure of individual credences, $s^X_*$, and an inaccuracy measure of entire credence functions, $I_{s_*}$, that have all four required properties and recognise the contingency of varying importance. I will also show that Levinstein's and my results coexist without contradiction and explain why.
One of the main goals of accuracy-first epistemology is to quantify the epistemic value of one's credences (degrees of belief). The idea behind such quantification is alethic (Levinstein, 2019, p. 2921), i.e., the higher one's credences in truths and the lower one's credences in falsehoods, the more epistemic value those credences have. In this context, one often speaks of the accuracy of one's credences and assumes (as I will in this paper) that accuracy is what is ultimately of epistemic value (e.g., see Levinstein, 2019, p. 2920, or Pettigrew's veritism in Pettigrew, 2016, p. 8). However, from a formal point of view, it is often more convenient to quantify epistemic disvalue (i.e., inaccuracy) rather than epistemic value (i.e., accuracy). In this paper, I will quantify inaccuracy and work with measures of inaccuracy such as scoring rules; this brings no significant change, since the accuracy of a credence is the negative of its inaccuracy (see Pettigrew, 2019, p. 141) or is similarly related.¹ So, I will say that the lower one's credences in truths and the higher one's credences in falsehoods, the more inaccurate one's credences are. I will assume that one aims to minimise the (expected) inaccuracy of one's credences, since an inaccuracy score is a form of penalty.
One immediate question is what properties measures of inaccuracy should have, and a natural follow-up question is whether there exists a measure of inaccuracy that has all the desired properties. Levinstein's Theorem A.4 lists four properties (additivity, propriety, truth-directedness, and differentiability) that accuracy-firsters usually want measures of inaccuracy to have, but it also adds a fifth property to the list: recognising the contingency of varying importance, or contingent importance for short. This last property, roughly speaking, means that the degree to which the inaccuracy of one's credence in a proposition matters can differ from one proposition to another and from one possible world to another (see Levinstein, 2019, pp. 2925-2927). To me, this sounds like a desirable additional property. Theorem A.4 (Levinstein, 2019, p. 2929),² however, claims that no scoring rule with the four aforementioned properties recognises the contingency of varying importance.
¹ When Levinstein discusses accuracy-first epistemology, he often uses an epistemic utility measure U (see Levinstein, 2019, pp. 2922-2924). Since, by assumption, an agent wants to maximise (expected) utility, Levinstein's discussion is often presented in the context of maximising one's (expected) epistemic utility. Levinstein, however, expresses U in terms of a measure of inaccuracy I, i.e., $U = 1 - I$ (see Levinstein, 2019, p. 2924, or Levinstein, 2019, p. 2921, for a construction of a specific utility function, the Brier utility). Moreover, in the formulation and the proof of Theorem A.4, he uses I instead of U. Since the central part of my paper works with Theorem A.4 and its proof, this provides additional motivation to frame my discussion in terms of minimising expected inaccuracy rather than maximising expected epistemic utility.

² Names in brackets next to results, e.g., theorems, indicate those results' original author or authors. Sometimes, I will make minor changes, e.g., notational ones, to such results.


Inaccuracy measures
Assume, for now, that one wants to quantify the epistemic disvalue of an agent's credence in a single proposition X. Accuracy-firsters use functions called scoring rules to measure the inaccuracy of one's credence in a single proposition at a possible world ω. A scoring rule s takes the omniscient credence $v_\omega(X)$ in X at ω together with one's credence c(X) and returns an inaccuracy score $s(v_\omega(X), c(X))$ of having credence c(X) in X at that ω (see Pettigrew, 2016, p. 36).⁴ Formally, one can define a scoring rule as follows (see Definition I.B.3 in Pettigrew, 2016, p. 86).
Note that the score that a particular credence in a proposition gets at a world depends only on the value of that credence and the truth-value of that proposition (i.e., its vindicated credence) at that world. An example of a scoring rule is the squared Euclidean distance, $s(v_\omega(X), c(X)) = (v_\omega(X) - c(X))^2$; e.g., see Pettigrew (2016, pp. 4-5). Sometimes, the requirement that $s(0, 0) = s(1, 1) = 0$ is not part of the definition of a scoring rule, but I will include it since accuracy-firsters require it anyway. If the context is clear, I will drop the arguments of a scoring rule and write only s instead of $s(v_\omega(X), c(X))$.
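As a minimal illustration (the paper itself contains no code), the squared Euclidean distance can be written as a Python function of the omniscient credence and the agent's credence. The function name `brier` is a hypothetical label chosen here; the input checks simply encode that the first argument is an ideal credence in {0, 1} and the second a credence in [0, 1].

```python
def brier(v: int, c: float) -> float:
    """Squared Euclidean (Brier) score s(v, c) = (v - c) ** 2.

    v: omniscient (ideal) credence in X at a world, 1 if X is true there, 0 if false.
    c: the agent's credence c(X), a number in [0, 1].
    """
    if v not in (0, 1):
        raise ValueError("v must be an ideal credence, 0 or 1")
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return (v - c) ** 2


# The normalisation s(0, 0) = s(1, 1) = 0 holds for this score.
assert brier(0, 0.0) == 0.0 and brier(1, 1.0) == 0.0
print(brier(1, 0.7))  # roughly 0.09, up to floating-point rounding
```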
In Footnote 4 (Levinstein, 2019, p. 2924), Levinstein introduces a superscript X, which allows one to use different scoring rules to evaluate credences in different propositions. Following Levinstein, I will use the same superscript to indicate that a given scoring rule is used to score one's credences in a given proposition. For example, I will write $s^X(v_\omega(X), c(X))$ to indicate that s is used to score one's credences in X. But I will assume that one uses the same s to evaluate any of one's credences in a single proposition at any possible world. So, for one's credence c(X) and any two possible worlds $\omega_1, \omega_2 \in \Omega$, $s^X(v_{\omega_1}(X), c(X))$ and $s^X(v_{\omega_2}(X), c(X))$ use the same s.

One usually has credences in more than a single proposition. If one has credences in multiple propositions, one can use scoring rules to measure the inaccuracy score of each of one's individual credences at ω and combine those scores into an overall (or global) score of one's credences at that ω. Accuracy-firsters (e.g., see Pettigrew, 2016, p. 36) call such measures of overall inaccuracy "(global) inaccuracy measures". I will use I to denote such global measures. So, $I(c, \omega)$ gives an overall inaccuracy score of a credence function c (representing all the credences one has) at a world ω. It will be helpful to use a subscript with I to indicate what type of scoring rule one uses to construct that global inaccuracy measure. For example, I will write $I_s(c, \omega)$ to indicate that one uses a scoring rule (possibly different scoring rules for different propositions) satisfying Definition 1 without further qualification. I will sometimes omit the predicate "global" when talking about global inaccuracy measures, and, if the context is clear, I will drop the arguments and write only $I_s$ instead of $I_s(c, \omega)$.

Four properties
I will use $I_s$ and $s^X$ for now, but the following definitions also hold for the inaccuracy measures and scoring rules introduced later. One method of constructing a global inaccuracy measure is by adding the inaccuracy scores of one's individual credences at ω, which gives an additive inaccuracy measure (compare to Levinstein, 2019, p. 2924).⁵

Definition 2 (Additivity) Let (Ω, F), a credence function c, a world ω ∈ Ω, and Definition 1 be given. A global measure of inaccuracy $I_s(c, \omega)$ is additive just in case
$$I_s(c, \omega) = \sum_{X \in F} s^X(v_\omega(X), c(X)).$$

Truth-directedness captures the alethic idea of accuracy-first epistemology that, as one's credence c(X) changes and gets closer to the ideal credence in X at ω, the inaccuracy score of c(X) gets better, i.e., smaller (e.g., see Levinstein, 2017, p. 617). A measure of inaccuracy is truth-directed if it always assigns lower inaccuracy to one credence function than to another when each credence assigned by the first is closer to the ideal credence than the corresponding credence assigned by the second (see Pettigrew, 2016, p. 62); compare Definition 3 to Levinstein (2019, p. 2922), Joyce (2009, p. 269), or Pettigrew (2016, p. 40).
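Definition 3 (Truth-Directedness) Let (Ω, F), credence functions c and ĉ, a world ω ∈ Ω, and a global measure of inaccuracy $I_s$ be given. $I_s$ is truth-directed just in case $I_s(c, \omega) < I_s(\hat{c}, \omega)$ whenever
(i) $|v_\omega(X) - c(X)| \le |v_\omega(X) - \hat{c}(X)|$ for all X ∈ F, and
(ii) $|v_\omega(X) - c(X)| < |v_\omega(X) - \hat{c}(X)|$ for some X ∈ F.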
In Definition 3, condition (i) says that, at ω, c always assigns credences at least as close to the ideal credence $v_\omega(X)$ as ĉ does. That is, $\hat{c}(X) \le c(X) \le 1$ for all X true at ω and $0 \le c(X) \le \hat{c}(X)$ for all X false at ω. Condition (ii) says that, at ω, c assigns a strictly higher credence than ĉ to at least one truth or a strictly lower credence than ĉ to at least one falsehood. That is, $\hat{c}(X) < c(X) \le 1$ for some X true at ω or $0 \le c(X) < \hat{c}(X)$ for some X false at ω. If conditions (i) and (ii) hold, then c gets a strictly lower inaccuracy score than ĉ at that ω according to any legitimate measure of inaccuracy $I_s$.
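The following Python sketch builds an additive global Brier inaccuracy at a single world and checks conditions (i) and (ii) for one pair of credence functions. The two-proposition world, the credence values, and the helper names `global_inaccuracy` and `closer_to_truth` are hypothetical; because all credences lie in [0, 1], comparing absolute distances to the ideal credence is equivalent to the inequalities just stated.

```python
def brier(v, c):
    """Squared Euclidean score of credence c when the ideal credence is v."""
    return (v - c) ** 2


def global_inaccuracy(credences, world, score=brier):
    """Additive global inaccuracy: sum of s(v_w(X), c(X)) over all propositions X.

    credences: dict mapping proposition names to credences in [0, 1].
    world: dict mapping the same names to truth values (1 true, 0 false).
    """
    return sum(score(world[x], credences[x]) for x in credences)


def closer_to_truth(c, c_hat, world):
    """Conditions (i) and (ii) of truth-directedness at a single world.

    (i) every credence of c is at least as close to its ideal value as c_hat's;
    (ii) at least one credence of c is strictly closer.
    """
    gaps = [(abs(world[x] - c[x]), abs(world[x] - c_hat[x])) for x in c]
    return all(a <= b for a, b in gaps) and any(a < b for a, b in gaps)


world = {"X": 1, "Y": 0}          # hypothetical world: X true, Y false
c = {"X": 0.8, "Y": 0.3}
c_hat = {"X": 0.6, "Y": 0.3}      # less confident in the truth X

print(closer_to_truth(c, c_hat, world))
print(global_inaccuracy(c, world), global_inaccuracy(c_hat, world))
```

Any truth-directed measure must rank c strictly better than ĉ here; the additive Brier measure does so.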

Differentiability is a technical requirement to which Levinstein pays little attention (see Levinstein, 2019, p. 2924). Differentiability means that if one differentiates $s^X(v_\omega(X), c(X))$ or $I_s(c, \omega)$ with respect to c(X) (where X ∈ F is in the domain of c), that derivative will exist; notice that X in c(X) is fixed while differentiating. Differentiability implies continuity (e.g., see Pugh, 2015, p. 149), so if $s^X(v_\omega(X), c(X))$ and $I_s(c, \omega)$ are differentiable, then they are continuous functions of c(X) and c, respectively, for all worlds ω. Roughly speaking, continuity means that there are no "jumps" in inaccuracy as credences change, so small changes in one's credence will give rise to small changes in inaccuracy (see Pettigrew, 2016, p. 52). I mention this relation because accuracy-firsters often require continuity of scoring rules and inaccuracy measures (see Pettigrew, 2016, pp. 51-57, for a more detailed discussion).
Finally, strict propriety says that every probability function expects itself to be least inaccurate. For a scoring rule, it is defined as follows (see Pettigrew, 2016, p. 66).
Definition 4 (Strictly Proper Scoring Rule) Let (Ω, F), X ∈ F, a credence function c, and Definition 1 be given. A scoring rule $s^X$ is strictly proper just in case, for all $0 \le p \le 1$, the expectation
$$p\, s^X(1, c(X)) + (1 - p)\, s^X(0, c(X)) \qquad (1)$$
is uniquely minimised as a function of c(X) at c(X) = p.
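For the Brier score, strict propriety can be illustrated numerically: the expectation in Definition 4 equals $(c - p)^2 + p - p^2$, so it is smallest at c = p. The Python sketch below locates the minimiser by a grid search; the helper names and the grid resolution are hypothetical choices, and a grid search only illustrates the definition for one rule rather than proving anything in general.

```python
def brier(v, c):
    return (v - c) ** 2


def expected_score(p, c, score=brier):
    """p * s(1, c) + (1 - p) * s(0, c): the expectation in Definition 4."""
    return p * score(1, c) + (1 - p) * score(0, c)


def grid_minimiser(f, n=10_000):
    """Grid point in [0, 1] (step 1/n) at which f takes its smallest value."""
    return min((i / n for i in range(n + 1)), key=f)


for p in (0.0, 0.25, 0.5, 0.9):
    best = grid_minimiser(lambda c: expected_score(p, c))
    print(f"p = {p}: expected Brier score minimised at c(X) = {best}")
```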
Strict propriety can be generalised to global inaccuracy measures if one has a notion of expected inaccuracy for a whole credence function c, which amounts to weighting the inaccuracy of c at each world ω by probabilistic weights (possibly interpreted as one's credences that a given ω is the actual world). Let $E_P I_s(c)$ stand for the expected inaccuracy of a credence function c with respect to an inaccuracy measure $I_s$ and weights given by a probability function P. Strict propriety of global inaccuracy measures means that every probability function P assigns itself the lowest expected inaccuracy. Definition 5 corresponds to Levinstein's definition of propriety in Levinstein (2019, p. 2923); Levinstein considers $c \in \mathcal{P}$ because a stronger dominance condition holds for $c \notin \mathcal{P}$ (see Theorem 1 in Predd, 2009, p. 4788, or Pettigrew, 2016, p. 65), but the inequality in Definition 5 holds for any credence function c ≠ P.
Definition 5 (Strictly Proper Global Measure of Inaccuracy) Let (Ω, F), $\mathcal{P}$, and an inaccuracy measure $I_s$ be given. $I_s$ is strictly proper just in case, for any distinct credence functions c and $P \in \mathcal{P}$, it holds that $E_P I_s(P) < E_P I_s(c)$.
Notice that, in Definition 5, the inequality $E_P I_s(P) < E_P I_s(c)$ compares only the overall expected inaccuracy scores of the functions P and c. Besides propriety (what I call strict propriety), Levinstein defines strong propriety (see Definition A.1 in Levinstein, 2019, p. 2929). Strong propriety makes the overall comparison expressed by the inequality $E_P I_s(P) < E_P I_s(c)$ but, in addition, looks at the expected inaccuracy of single credences. An inaccuracy measure is strongly proper if it expects each of the credences (not only the whole credence function) to be least inaccurate (Levinstein, 2019, p. 2929), that is, if one's optimal credence in X is the probability of X, i.e., P(X). To notationally differentiate the global perspective (concerning the whole credence function) from the local perspective (concerning the credence in a single proposition), let $E_P s^X(c(X)) = \sum_{\omega \in \Omega} P(\omega)\, s^X(v_\omega(X), c(X))$ be the expected inaccuracy of one's credence c(X) in a single proposition X with respect to a probability function P and a scoring rule $s^X$.⁶

Definition 6 (Strong Propriety) Let (Ω, F), $\mathcal{P}$, a credence function c, a world ω ∈ Ω, and Definition 1 be given. An additive global inaccuracy measure $I_s(c, \omega) = \sum_{X \in F} s^X(v_\omega(X), c(X))$ is strongly proper just in case $E_P I_s(c)$ is uniquely minimised at c(X) = P(X) for all X ∈ F and all probability functions $P \in \mathcal{P}$. So, given $s^X$, it holds that $E_P s^X(P(X)) < E_P s^X(c(X))$ for all X ∈ F, all probability functions $P \in \mathcal{P}$, and all credence functions c such that P(X) ≠ c(X).
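The local expectation $E_P s^X(c(X))$ and its additive global counterpart can be computed directly on a finite space. The Python sketch below uses a hypothetical four-world space with two propositions, X and Y, and the Brier score; it checks on a grid that each local expectation is smallest at P(X), and that credences matching P have lower global expected inaccuracy than a credence function that differs from P on X. All names and numbers are illustrative.

```python
def brier(v, c):
    return (v - c) ** 2


# Hypothetical four-world space: each world records the truth values of X and Y.
WORLDS = {
    "w1": {"X": 1, "Y": 1},
    "w2": {"X": 1, "Y": 0},
    "w3": {"X": 0, "Y": 1},
    "w4": {"X": 0, "Y": 0},
}
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}  # a probability function on the worlds


def prob(x):
    """P(X): total probability of the worlds at which X is true."""
    return sum(P[w] for w in WORLDS if WORLDS[w][x] == 1)


def local_expected(x, cx, score=brier):
    """E_P s^X(c(X)) = sum over worlds w of P(w) * s^X(v_w(X), c(X))."""
    return sum(P[w] * score(WORLDS[w][x], cx) for w in WORLDS)


def global_expected(c, score=brier):
    """E_P I_s(c) for the additive measure: the sum of the local expectations."""
    return sum(local_expected(x, c[x], score) for x in c)


grid = [i / 1000 for i in range(1001)]
for x in ("X", "Y"):
    best = min(grid, key=lambda cx: local_expected(x, cx))
    print(x, "P =", prob(x), "local minimiser =", best)

own = {x: prob(x) for x in ("X", "Y")}    # credences that match P
other = {"X": 0.5, "Y": 0.4}              # differs from P on X only
print(global_expected(own) < global_expected(other))
```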

The contingency of varying importance
The contingency of varying importance is based on two claims. First, varying importance says that the degree to which the inaccuracy of one's credence in a proposition matters can differ from one proposition to another. In other words, one can value the inaccuracy of one's credence in an important proposition more than in an unimportant one. For example, having low inaccuracy in propositions concerning fundamental laws of nature is better than having low inaccuracy in the claim that one wore wool socks on January 8th, 2004 (Levinstein, 2019, p. 2925). Secondly, according to Levinstein, varying importance is contingent. That is, the degree to which the inaccuracy of one's credence in a proposition matters in one world might differ from the degree to which it matters in another world (Levinstein, 2019, p. 2926).
Levinstein differentiates two ways in which varying importance is contingent. First, the level of importance at worlds where a given proposition is true differs from that at worlds where it is false (Levinstein, 2019, p. 2926). Levinstein believes that accuracy-firsters can accommodate at least some of these cases; he refers to an approach from Merkle and Steyvers (2013) as a possible but complicated solution (see Levinstein, 2019, p. 2926). In this paper, I do not discuss these first-type cases since they are not the focus of Theorem A.4.
I will exclusively focus on the following second way of recognising the contingency of varying importance: the levels of importance differ at worlds where a given proposition has the same truth-value. Levinstein interprets these second-type cases as situations in which the importance of one proposition depends on the truth-value of another proposition (Levinstein, 2019, p. 2926). Consider one of his examples: "Bill is obsessed with popular music, but he's also terribly elitist. He wants to know everything about the lives of the singers who are actually the most talented musicians. As it turns out, in one world, Beyoncé meets the cut, but in a very distant one, she doesn't. In only some worlds, Bill places high importance on knowing where Beyoncé was born, the name of her high school, and the sales figures of her first album" (Levinstein, 2019, p. 2927). For example, the inaccuracy of Bill's credence about where Beyoncé was born is highly important in the worlds where Beyoncé makes the cut and has low importance in the worlds where she does not make it. So, how much the inaccuracy of that credence matters changes from one possible world to another depending on the truth-value of another proposition, i.e., whether Beyoncé makes the cut. To formally capture these levels of importance, let me introduce a weight function λ(X, ω) that expresses how epistemically valuable it is not to be inaccurate (i.e., to be accurate) about X at ω, and restrict its range to $\mathbb{R}_{>0}$.⁷

Definition 7 (Weight Function for Varying Importance) Given a space (Ω, F), let $\lambda: F \times \Omega \to (0, \infty)$ be a weight function, where λ(X, ω) is its value for X ∈ F and ω ∈ Ω.
The value of λ(X, ω) from Definition 7 will be high in worlds where it is important to be accurate (i.e., not to be inaccurate) about X and low in worlds where it is not that important. The range of a weight function may differ. For example, in Holzmann & Klar (2017, pp. 2409-2410), weights are restricted to [0, 1]. But one can be less restrictive (e.g., see Theorem 1 in Ranjan & Gneiting, 2011, p. 413). I will assume that λ(X, ω) takes values strictly between 0 and ∞. So, one always cares about the inaccuracy of one's credence in X at any ω ∈ Ω at least a little, and the inaccuracy of one's credence in no proposition is infinitely important at any ω ∈ Ω. One can define a weighted scoring rule by joining a weight function with a scoring rule from Definition 1; compare, e.g., Ranjan & Gneiting (2011, p. 413) or Pelenis (2014, p. 9) (but I assume a strictly proper scoring rule $s^X$).
Definition 8 (Weighted Scoring Rule) Let (Ω, F), X ∈ F, and $s^X$ satisfying Definition 1 be given. A weighted scoring rule is a function $\mathbf{s}^X$ of three arguments: an ideal credence, a credence, and a weight.

Definition 8 is a general definition, but, for the purpose of this paper, I will restrict Definition 8 to the following interpretation. Let {0, 1} be the ideal credences, the values from [0, 1] one's credences, and the weights the values of the function λ(X, ω) from Definition 7. Also, I will assume that if $\lambda(X, \omega) > \lambda'(X, \omega)$ and one's cre…

In this paper, weights will scale the score by multiplying it, i.e., $\mathbf{s}^X(v_\omega(X), c(X), \lambda(X, \omega)) = \lambda(X, \omega)\, s^X(v_\omega(X), c(X))$. One can then define what it means to recognise the contingency of varying importance (compare Definition 9 to Definition A.3 in Levinstein, 2019, p. 2929). I will write $I_{\mathbf{s}}$ to indicate that one uses weighted scoring rules $\mathbf{s}^X$ in a global inaccuracy measure.
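Multiplicative weighting is a one-line operation. In the Python sketch below, `weighted_score` is a hypothetical helper whose `weight` argument stands in for λ(X, ω), and the two Bill-style worlds and their weights are invented for illustration: X is true at both worlds, yet the same credence is penalised more where the weight is higher.

```python
def brier(v, c):
    """Squared Euclidean score of credence c when the ideal credence is v."""
    return (v - c) ** 2


def weighted_score(v, c, weight, score=brier):
    """Multiplicatively weighted score: weight * s(v, c).

    `weight` stands in for lambda(X, w) and must lie strictly between 0 and infinity.
    """
    if not 0 < weight < float("inf"):
        raise ValueError("weights must lie in (0, infinity)")
    return weight * score(v, c)


# Hypothetical weights for X = "Beyonce was born in Houston": X is true at both
# worlds, but Bill cares three times as much where Beyonce makes the cut.
lam = {"w_cut": 3.0, "w_no_cut": 1.0}
for w, weight in lam.items():
    print(w, weighted_score(1, 0.6, weight))
```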
Definition 9 (Recognising Contingent Importance) Assume that (Ω, F), a credence function c, and a weight function λ(X, ω) from Definition 7 are given. Let then $I_{\mathbf{s}}(c, \omega) = \sum_{X \in F} \lambda(X, \omega)\, s^X(v_\omega(X), c(X))$ be an additive inaccuracy measure. $I_{\mathbf{s}}(c, \omega)$ recognises contingent importance just in case there exist worlds $\omega_1$ and $\omega_2$ from Ω and a proposition X from F such that $v_{\omega_1}(X) = v_{\omega_2}(X)$ and $\lambda(X, \omega_1) \neq \lambda(X, \omega_2)$.

For example, let X stand for the proposition that Beyoncé was born in Houston, Texas. Let c(X) be Bill's credence in X and $\omega_1, \omega_2 \in \Omega$ be two possible worlds. If X is true at both $\omega_1$ and $\omega_2$, then $v_{\omega_1}(X) = v_{\omega_2}(X) = 1$. But suppose that Beyoncé makes the cut only at $\omega_1$ and not at $\omega_2$. It means that, to Bill, the inaccuracy of his credence in X matters more at $\omega_1$ than at $\omega_2$, i.e., $\lambda(X, \omega_1) > \lambda(X, \omega_2)$.

The proof and idea behind Theorem A.4

It will be useful to briefly discuss Levinstein's proof of Theorem A.4 to understand the idea behind it. Following Levinstein (see Levinstein, 2019, p. 2929), let me formulate a reductio assumption and then use it to prove that $I_{\mathbf{s}}$ is not a strictly proper inaccuracy measure.⁸

Assumption 1 (Reductio Assumption for $I_{\mathbf{s}}$) Suppose $I_{\mathbf{s}}$ is additive, strictly proper, truth-directed, differentiable, and recognises contingent importance.
First, let me note that Levinstein makes two assumptions about $s^X$ in $\mathbf{s}^X$ that are important for understanding his proof of Theorem A.4. First, $s^X$ is differentiable (see Levinstein, 2019, p. 2930). Secondly, $s^X$ is strictly proper (for example, he considers the strictly proper Brier score; see Levinstein, 2019, p. 2927).⁹ So, let me make the same assumptions about $s^X$ in $\mathbf{s}^X$.
Assumption 2 (Assumptions about $s^X$ in $\mathbf{s}^X$) Given Definition 8, a scoring rule $s^X$ is strictly proper and differentiable with respect to its second argument, c(X).

By Lemma A.1 in Schervish et al. (1989, p. 1874) (here stated as Lemma 1), strictly proper scoring rules are truth-directed. So, by Assumption 2, $s^X$ is truth-directed, i.e., $s^X(1, c(X))$ is strictly decreasing in c(X) and $s^X(0, c(X))$ is strictly increasing in c(X).
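For the Brier score, the monotonicity just described can be checked on a grid. The Python sketch below is a numerical illustration for this single strictly proper rule only, not a proof of Lemma 1; the grid spacing of 0.01 is an arbitrary choice.

```python
def brier(v, c):
    return (v - c) ** 2


grid = [i / 100 for i in range(101)]
s1 = [brier(1, c) for c in grid]   # score when X is true
s0 = [brier(0, c) for c in grid]   # score when X is false

# s(1, c) should strictly decrease and s(0, c) strictly increase along the grid.
decreasing = all(a > b for a, b in zip(s1, s1[1:]))
increasing = all(a < b for a, b in zip(s0, s0[1:]))
print(decreasing, increasing)
```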
Let me now discuss Levinstein's proof of Theorem A.4. Following Levinstein (see Levinstein, 2019, p. 2930, for the same step), let me fix a probability function Pr defined on some (Ω, F) such that $0 < \Pr(X) < 1$ for X ∈ F, and use Pr to define another probability function, Pr′.
Notice that if X is true (or false) at both $\omega_1$ and $\omega_2$, then $\Pr(X) = \Pr'(X)$ (and $\Pr'(\neg X) = \Pr(\neg X)$), since, for a finite number n of worlds ω ∈ Ω, one has that:

If, following Definition 9, weighting a strictly proper $s^X$ by non-constant weights λ(X, ω) preserves strict propriety, then the expected inaccuracy of c(X) with respect to $\mathbf{s}^X$ and Pr′, i.e., $E_{\Pr'} \mathbf{s}^X(c(X))$, is minimised at $c(X) = \Pr'(X)$. Similarly, $E_{\Pr} \mathbf{s}^X(c(X))$ is then minimised at $c(X) = \Pr(X)$. So, both expectations are minimised at the same point, since $\Pr(X) = \Pr'(X)$. But Levinstein's Theorem A.4 (see Theorem 1 below) shows that $E_{\Pr'} \mathbf{s}^X(c(X))$ and $E_{\Pr} \mathbf{s}^X(c(X))$ are not minimised at the same point if one assumes the contingency of varying importance. Thus, $I_{\mathbf{s}}$ is not strictly proper, which contradicts Assumption 1.
Theorem 1 (Levinstein) Let $I_{\mathbf{s}}$ from Assumption 1 and Assumption 2 be given. If the contingency of varying importance holds, then $E_{\Pr'} \mathbf{s}^X(c(X))$ and $E_{\Pr} \mathbf{s}^X(c(X))$ are not minimised at the same point. Thus, $\mathbf{s}^X$ and $I_{\mathbf{s}}$ are not strictly proper.
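The idea behind Theorem 1 can be seen in a small numerical case. The Python sketch below uses a hypothetical three-world space: X is true at w1 and w2 (Beyoncé makes the cut only at w1) and false at w3, and the weights differ between w1 and w2. Pr′ is obtained here by moving probability from w2 to w1, which keeps Pr′(X) = Pr(X); this particular construction is an illustrative choice and is not claimed to be Levinstein's. The weighted Brier expectations under Pr and Pr′ nonetheless have different minimisers.

```python
from fractions import Fraction


def brier(v, c):
    return (v - c) ** 2


# Hypothetical space: X is true at w1 and w2 and false at w3.
TRUTH = {"w1": 1, "w2": 1, "w3": 0}
LAM = {"w1": 3, "w2": 1, "w3": 1}   # lambda(X, w): Bill cares more at w1

Pr = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}
Pr_prime = {"w1": Fraction(2, 5), "w2": Fraction(1, 10), "w3": Fraction(1, 2)}  # mass moved w2 -> w1


def recognises_contingent_importance(truth, lam):
    """Definition 9 for one X: two worlds agree on X's truth value but not on the weight."""
    return any(truth[a] == truth[b] and lam[a] != lam[b] for a in truth for b in truth)


def prob_x(P):
    return sum(P[w] for w in TRUTH if TRUTH[w] == 1)


def weighted_expected(P, c):
    """Expected weighted Brier score: sum of P(w) * lambda(X, w) * s(v_w(X), c)."""
    return sum(P[w] * LAM[w] * brier(TRUTH[w], c) for w in TRUTH)


grid = [Fraction(i, 1000) for i in range(1001)]
m = min(grid, key=lambda c: weighted_expected(Pr, c))
m_prime = min(grid, key=lambda c: weighted_expected(Pr_prime, c))

print(recognises_contingent_importance(TRUTH, LAM))
print(prob_x(Pr), prob_x(Pr_prime))   # same probability of X under both
print(float(m), float(m_prime))       # but different minimisers
```

Exact fractions are used so that the equality Pr(X) = Pr′(X) holds without rounding error.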

Minimisers for weighted scoring rules
By Theorem 1, $E_{\Pr} \mathbf{s}^X(c(X))$ and $E_{\Pr'} \mathbf{s}^X(c(X))$ have different minimisers if the contingency of varying importance holds. It will be useful for the construction of $s^X_*$ to know what those minimisers are. But, instead of finding minimisers for $E_{\Pr} \mathbf{s}^X(c(X))$ and $E_{\Pr'} \mathbf{s}^X(c(X))$ separately, let me find one for a general expectation $E_P \mathbf{s}^X(c(X))$, so that I need not fix any specific weight function (Definition 7 still holds) or probability function. Theorem 2 follows from restricting Gneiting and Ranjan's Theorem 1 in Ranjan & Gneiting (2011, pp. 413-414) to finite cases. The proof strategy is the same for both the continuous and the discrete case: take the weighted scores and find the normalising constant. Since the normalising constant is a constant, it can be taken out of or placed inside the sums and integrals without issue and used to find the result.
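The normalising-constant idea can be sketched numerically. Assuming that Theorem 2 takes the finite form of the weighted minimiser, $c^*(X) = \sum_{\omega} P(\omega)\lambda(X,\omega)v_\omega(X) \,/\, \sum_{\omega} P(\omega)\lambda(X,\omega)$ (the probability of X after each world is reweighted by λ), the Python sketch below compares that closed form with brute-force minimisation for two strictly proper rules, the Brier and the logarithmic score. The three-world space, its weights, and all helper names are hypothetical.

```python
import math
from fractions import Fraction

# Hypothetical three-world space: X true at w1 and w2, false at w3.
TRUTH = {"w1": 1, "w2": 1, "w3": 0}
LAM = {"w1": 3, "w2": 1, "w3": 1}
P = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}


def weighted_minimiser(P, lam, truth):
    """c*(X): probability of X after reweighting each world by P(w) * lambda(X, w).

    The denominator is the normalising constant, the sum of P(w) * lambda(X, w).
    """
    kappa = sum(P[w] * lam[w] for w in truth)
    return sum(P[w] * lam[w] for w in truth if truth[w] == 1) / kappa


def brier(v, c):
    return (v - c) ** 2


def log_score(v, c):
    """Logarithmic score: -ln c if X is true, -ln(1 - c) if false (strictly proper)."""
    return -math.log(c) if v == 1 else -math.log(1 - c)


def brute_force(score, n=10_000):
    """Minimise the expected weighted score over an interior grid of (0, 1)."""
    grid = [i / n for i in range(1, n)]
    return min(grid, key=lambda c: sum(float(P[w]) * LAM[w] * score(TRUTH[w], c) for w in TRUTH))


c_star = weighted_minimiser(P, LAM, TRUTH)
print(c_star, brute_force(brier), brute_force(log_score))
```

The grid excludes 0 and 1 so that the logarithmic score stays finite.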
The strict propriety of $s^X$ in $\mathbf{s}^X$ (and of $s^{\neg X}$ in $\mathbf{s}^{\neg X}$) means that $c^*(X)$ (and $c^*(\neg X)$) are unique minimisers, where, by Theorem 2, $c^*(\neg X) = \frac{\sum_{\omega \in \Omega} P(\omega)\,\lambda(\neg X, \omega)\, v_\omega(\neg X)}{\sum_{\omega \in \Omega} P(\omega)\,\lambda(\neg X, \omega)}$. In what follows, the existence and the uniqueness of the minimisers $c^*(X)$ and $c^*(\neg X)$ from Theorem 2 is an important corollary of Assumption 2 and Definition 7. But, in general, one must be careful, because uniqueness might not hold, or a minimiser might not exist, for example, when λ(X, ω) is always 0; see Brehmer & Gneiting (2020) for further discussion of the uniqueness and existence of minimisers. Since $c^*(X)$ and $c^*(\neg X)$ represent one's optimal credences, one might want to know under what conditions they are probabilistic. The numeric bound and the probability of the entire space and the empty set follow directly.
Additivity of $c^*(X)$ and $c^*(\neg X)$ does not come so easily and requires, for example, the following additional assumption.
Assumption 3 says that, at any ω ∈ Ω, an agent values the inaccuracy of her credence in any X ∈ F to the same degree as the inaccuracy of her credence in ¬X. For example, assume that X says that it is raining. According to Assumption 3, if it is raining at ω, one values the closeness of one's credence in X to the ideal credence of 1 to the same degree as the closeness of one's credence in ¬X to the ideal credence of 0. Notice that Assumption 3 considers weights of importance for two propositions (X and ¬X) and requires that those weights are equal at the same $\omega_i$, which I indicated by the subscript i. It does not require the weights of importance for X and ¬X to be the same across different possible worlds. So, Assumption 3 does not conflict with the contingency of varying importance from Definition 9, which considers weights for a single proposition and requires that those weights are different at different possible worlds.
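Why the equal-weights assumption yields additivity can be seen in a short computation. Assuming the reweighted-probability form of the minimisers, equal weights for X and ¬X at each world give both minimisers the same normalising constant, so their numerators add up to it. The Python sketch below checks this on a hypothetical three-world space and shows that the sum can exceed 1 when the weights for ¬X are chosen differently.

```python
from fractions import Fraction

# Hypothetical three-world space: X true at w1 and w2, false at w3.
TRUTH_X = {"w1": 1, "w2": 1, "w3": 0}
TRUTH_NOT_X = {w: 1 - v for w, v in TRUTH_X.items()}
P = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}


def weighted_minimiser(lam, truth):
    """Reweighted probability: sum of P(w) * lam(w) over worlds where the proposition holds, normalised."""
    kappa = sum(P[w] * lam[w] for w in truth)
    return sum(P[w] * lam[w] for w in truth if truth[w] == 1) / kappa


lam_x = {"w1": 3, "w2": 1, "w3": 1}

# Assumption 3 holds: lambda(not-X, w) = lambda(X, w) at every world.
total_equal = weighted_minimiser(lam_x, TRUTH_X) + weighted_minimiser(lam_x, TRUTH_NOT_X)

# Assumption 3 fails: the weights for not-X are constant instead.
lam_not_x = {"w1": 1, "w2": 1, "w3": 1}
total_unequal = weighted_minimiser(lam_x, TRUTH_X) + weighted_minimiser(lam_not_x, TRUTH_NOT_X)

print(total_equal, total_unequal)  # 1 7/6
```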

Scoring rule $s^X_*$
I will now use the minimiser $c^*(X)$ from Theorem 2 to define a function $s^X_*$ and prove that $s^X_*$ is a strictly proper, truth-directed, and differentiable weighted scoring rule.¹⁰

Definition 11 Let (Ω, F), X ∈ F, $P \in \mathcal{P}$, $s^X$ satisfying Definition 1, λ(X, ω) satisfying Definition 7, and a credence function c be given. Let $c^*(X)$ be a unique minimiser with respect to X, P, and a weighted scoring rule $\mathbf{s}^X$ satisfying Definition 8. A function $s^X_*$ for one's credence c(X) is then defined as follows:
$$s^X_*(v_\omega(X), c(X)) = \lambda(X, \omega)\, s^X\big(v_\omega(X),\, c^*(X) + k(\delta)\delta\big),$$
where δ = c(X) − P(X) and
$$k(\delta) = \begin{cases} \dfrac{1 - c^*(X)}{1 - P(X)} & \text{if } \delta > 0, \\ 1 & \text{if } \delta = 0, \\ \dfrac{c^*(X)}{P(X)} & \text{if } \delta < 0. \end{cases}$$

By Definition 11, the score assigned by $s^X_*$ to c(X) is the score that $s^X$ assigns to the value $c^*(X) + k(\delta)\delta$, which is then weighted by λ(X, ω). Notice that one can rewrite $c^*(X) + k(\delta)\delta$ as $k(\delta)c(X) + [c^*(X) - P(X)k(\delta)]$, which is linear in c(X) on each side of P(X). So, formally, $(0, c^*(X) + k(\delta)\delta)$ and $(1, c^*(X) + k(\delta)\delta)$ are results of an affine transformation of $(v_\omega(X), c(X))$; see Proposition 1 for details. The product $k(\delta)\delta$ determines the degree of punishment one receives for not aligning one's credence with a probability function P. That is, the more c(X) diverges from P(X), the bigger the inaccuracy score that $s^X_*$ assigns to c(X). So, $s^X_*$ is always defined with respect to some $P \in \mathcal{P}$.
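A minimal Python sketch of $s^X_*$, assuming the Brier score for $s^X$ and assuming the piecewise form k(δ) = (1 − c*(X))/(1 − P(X)) for δ > 0, c*(X)/P(X) for δ < 0, and 1 for δ = 0. This form of k is a reconstruction chosen because it reproduces the values 4/3, 2/3, and 1 used in the Sleeping Beauty example (where c*(H) = 1/3 and P(H) = 1/2); the three-world space and its weights are hypothetical. For this one example, the grid search confirms that the expected score is minimised at c(X) = P(X) even though the weights are non-constant.

```python
from fractions import Fraction


def brier(v, c):
    return (v - c) ** 2


def k(delta, c_star, p):
    """Assumed piecewise factor: c -> c*(X) + k(delta) * delta then maps [0, 1]
    onto [0, 1] and sends P(X) to c*(X)."""
    if delta > 0:
        return (1 - c_star) / (1 - p)
    if delta < 0:
        return c_star / p
    return Fraction(1)


def s_star(v, c, weight, c_star, p, score=brier):
    """s*^X(v, c) = lambda(X, w) * s^X(v, c*(X) + k(delta) * delta), delta = c - P(X)."""
    delta = c - p
    return weight * score(v, c_star + k(delta, c_star, p) * delta)


# Hypothetical three-world space: X true at w1 and w2, false at w3.
TRUTH = {"w1": 1, "w2": 1, "w3": 0}
LAM = {"w1": 3, "w2": 1, "w3": 1}
P = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}

p_x = sum(P[w] for w in TRUTH if TRUTH[w] == 1)                         # P(X) = 1/2
kappa = sum(P[w] * LAM[w] for w in TRUTH)
c_star = sum(P[w] * LAM[w] for w in TRUTH if TRUTH[w] == 1) / kappa    # c*(X) = 2/3


def expected_s_star(c):
    return sum(P[w] * s_star(TRUTH[w], c, LAM[w], c_star, p_x) for w in TRUTH)


grid = [Fraction(i, 100) for i in range(101)]
best = min(grid, key=expected_s_star)
print(best)  # 1/2, i.e. P(X)
```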
¹⁰ Function $s^X_*$ in Definition 11 is not a random choice. It is a modification of a scoring rule defined by Gneiting and Brehmer in their Theorem 1 (see Brehmer & Gneiting, 2020, p. 660). Their general approach tells us how to properise scoring rules (including weighted scoring rules) so that truth-telling becomes an optimal strategy (Brehmer & Gneiting, 2020, p. 660). One needs to modify Gneiting and Brehmer's properisation strategy to make it applicable to accuracy-first epistemology, which is my goal with Definition 11. My modification is not the only option, but it works.
Lemma 7 Given Assumption 3, $P \in \mathcal{P}$, and $c(\neg X) - P(\neg X) = \delta'$, then

Finally, assume that a weighted scoring rule $\mathbf{s}^X$ (e.g., the one from Theorem 1) is strictly proper, i.e., let the value of λ(X, ω) be constant for the given X at any ω. Then, $s^X_*$ assigns the same inaccuracy score to one's credence c(X) as that strictly proper $\mathbf{s}^X$ (see Brehmer & Gneiting, 2020, p. 660, for a more general discussion).
Lemma 8 (Gneiting and Brehmer) If $\mathbf{s}^X$ is strictly proper, then $s^X_*$ and $\mathbf{s}^X$ assign the same score to one's credence c(X).

Let me start with additivity, which follows directly from the assumption that one can add weighted scores (e.g., see Levinstein, 2019, p. 2929) and the fact that $s^X_*$ is a weighted scoring rule. Differentiability also follows easily. By Proposition 1, $s^X_*$ is differentiable with respect to c(X), and summing differentiable functions preserves differentiability, so an additive $I_{s_*}$ is differentiable. Since differentiability implies continuity, $I_{s_*}(c, \omega)$ is a continuous function of c for all worlds ω. Let me now move to strict/strong propriety and truth-directedness.
Lemma 9 (Strict and Strong Propriety) $I_{s_*}$ is strongly and strictly proper, thus truth-directed.
Finally, consider the contingency of varying importance. I will show that $E_{\Pr} s^X_*(c(X))$ and $E_{\Pr'} s^X_*(c(X))$ have a common minimiser while recognising the contingency of varying importance from Definition 9.
Lemma 10 (Contingency of Varying Importance) Given $I_{s_*}$, $E_{\Pr} s^X_*(c(X))$ and $E_{\Pr'} s^X_*(c(X))$ are uniquely minimised at the same point.
So, the contingency of varying importance does not clash with strict propriety (or any other property listed in Proposition 2).Therefore, Proposition 2 holds without contradiction.
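The contrast with Theorem 1 can be replayed numerically. The Python sketch below reuses a hypothetical three-world space in which X is true at w1 and w2 with different weights, and a Pr′ obtained by moving probability from w2 to w1, so that Pr′(X) = Pr(X). Two further assumptions are labelled in the code: $s^X_*$ is built from whichever probability function takes the expectation (one reading of Lemma 10), and k(δ) has the piecewise form (1 − c*)/(1 − P(X)), 1, c*/P(X) for δ > 0, δ = 0, δ < 0. Unlike the plain weighted Brier score, both expectations are minimised at the same point.

```python
from fractions import Fraction


def brier(v, c):
    return (v - c) ** 2


TRUTH = {"w1": 1, "w2": 1, "w3": 0}
LAM = {"w1": 3, "w2": 1, "w3": 1}   # weights differ at w1 and w2, where X is true
Pr = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}
Pr_prime = {"w1": Fraction(2, 5), "w2": Fraction(1, 10), "w3": Fraction(1, 2)}


def expected_s_star(P, c):
    """E_P s*^X(c(X)), with s*^X built from P itself (assumed reading of Lemma 10)."""
    p = sum(P[w] for w in TRUTH if TRUTH[w] == 1)
    kappa = sum(P[w] * LAM[w] for w in TRUTH)
    c_star = sum(P[w] * LAM[w] for w in TRUTH if TRUTH[w] == 1) / kappa
    delta = c - p
    # Assumed piecewise form of k(delta).
    if delta > 0:
        k = (1 - c_star) / (1 - p)
    elif delta < 0:
        k = c_star / p
    else:
        k = Fraction(1)
    y = c_star + k * delta
    return sum(P[w] * LAM[w] * brier(TRUTH[w], y) for w in TRUTH)


grid = [Fraction(i, 100) for i in range(101)]
best = min(grid, key=lambda c: expected_s_star(Pr, c))
best_prime = min(grid, key=lambda c: expected_s_star(Pr_prime, c))
print(best, best_prime)  # 1/2 1/2
```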

Example and discussion
Let me show how some of the abstract results work on a concrete example. I will consider the expected total inaccuracy discussed by Kierland and Monton in Kierland and Monton (2005). Kierland and Monton introduced expected total inaccuracy as a possible expected-inaccuracy-minimising approach to solving the Sleeping Beauty problem. The Sleeping Beauty problem has many variations, but, following the major part of Kierland and Monton (2005), I will consider its basic version, which goes as follows:¹¹ "On Sunday Sleeping Beauty is put to sleep, and she knows that on Monday researchers will wake her up, and then put her to sleep with a memory-erasing drug that causes her to forget that waking-up. She also knows that the researchers will then flip a fair coin; if the result is Heads, they will allow her to continue to sleep, and if the result is Tails, they will wake her up again on Tuesday. Thus, when she is awakened, she will not know whether it is Monday or Tuesday. On Sunday, she assigns probability 1/2 to the proposition H that the coin lands Heads. What probability should she assign to H on Monday, when she wakes up?" (Kierland & Monton, 2005, p. 389). Note that there are three moments at which Beauty can be awake: (i) Monday and Heads, (ii) Monday and Tails, and (iii) Tuesday and Tails. In other words, there is one awakening when Heads (i.e., H is true) and two awakenings when Tails (i.e., H is false).
To find the expected total inaccuracy S ET (H) of Beauty's credence c(H) in H, first, find the inaccuracy score of c(H) for each awakening.Kierland and Monton use the Brier score (see Kierland & Monton, 2005, p. 385) to find the inaccuracy score of c(H) for each awakening, and I will do the same.Then, weight those scores by the probability of reaching the given awakening and sum those weighted scores, which gives the following formula to minimise (see Kierland & Monton, 2005, p. 389): where 1 2 is the probability of Heads/Tails since the coin is fair by assumption.Note that the number of awakenings where H is true/false serves as a weight.Since there is one awakening where H is true, (1 − c(H)) 2 is weighted by 1, and (0 − c(H)) 2 is weighted by 2 since H is false for two awakenings.Since the Brier score is strictly proper and S ET (H) uses unequal positive weights, by Levinstein's Theorem A.4, the weighted Brier score used in S ET (H) is not strictly proper.One can confirm this by showing that S ET (H) is minimised at c(H) = 1 3 instead of c(H) = 1 2 (see Kierland & Monton, 2005, p. 389).But if the weights were equal/constant, then S ET (H) would be minimised at c(H) = 1 2 (see footnote 9 or Ranjan & Gneiting, 2011, p. 413 for details). 
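The weight $\lambda(H, \omega_{\neg H}) = 2$ reflects that one counts centres within the given uncentred world. We know that $S_{ET}(H)$ is minimised at $c(H) = 1/3$, so $c^*(H) = 1/3$. For the sake of notational simplicity, let Beauty's credence in Heads be $c(H) = x$, so $\varepsilon = x - P(H)$. By Definition 11, if $P(H) = 1/2$ (i.e., the coin is fair), then the values of $k(\varepsilon)$ in our example are as follows:

$$k(\varepsilon) = \begin{cases} 4/3 & \text{if } \varepsilon > 0,\\ 2/3 & \text{if } \varepsilon < 0,\\ 1 & \text{if } \varepsilon = 0. \end{cases}$$

If $s_H^*$ is strictly proper, then $S^*_{ET}(H)$ will be minimised at $c(H) = 1/2$ for a fair coin. So my goal is to show that, for each of the three values of $k(\varepsilon)$, $S^*_{ET}(H)$ is minimised at $c(H) = 1/2$. I will go case by case, but to avoid repetition, let me make a general observation for $x = c(H)$ and $\varepsilon = x - P(H) = x - 1/2$:

$$S^*_{ET}(H) = \tfrac{1}{2}\Bigl(1 - \bigl[\tfrac{1}{3} + k(\varepsilon)\bigl(x - \tfrac{1}{2}\bigr)\bigr]\Bigr)^2 + \Bigl(0 - \bigl[\tfrac{1}{3} + k(\varepsilon)\bigl(x - \tfrac{1}{2}\bigr)\bigr]\Bigr)^2.$$

I will now crunch the numbers for the three cases ($\varepsilon > 0$, $\varepsilon < 0$, and $\varepsilon = 0$). For $\varepsilon > 0$, one has:

$$S^*_{ET}(H) = \tfrac{1}{2}\Bigl(1 - \tfrac{4x - 1}{3}\Bigr)^2 + \Bigl(0 - \tfrac{4x - 1}{3}\Bigr)^2, \qquad \frac{dS^*_{ET}(H)}{dx} = \frac{8(2x - 1)}{3}.$$

The derivative is positive for every $x > 1/2$, so no credence in this region does better than $c(H) = x = 1/2$. For $\varepsilon < 0$, one has:

$$S^*_{ET}(H) = \tfrac{1}{2}\Bigl(1 - \tfrac{2x}{3}\Bigr)^2 + \Bigl(0 - \tfrac{2x}{3}\Bigr)^2, \qquad \frac{dS^*_{ET}(H)}{dx} = \frac{2(2x - 1)}{3}.$$

The derivative is negative for every $x < 1/2$, so again no credence in this region does better than $c(H) = x = 1/2$. Finally, for $\varepsilon = 0$, one has:

$$S^*_{ET}(H) = \tfrac{1}{2}\Bigl(1 - \tfrac{1}{3}\Bigr)^2 + \Bigl(0 - \tfrac{1}{3}\Bigr)^2 = \tfrac{1}{3}.$$

For a variable $z \in \mathbb{R}$, we know that $\tfrac{1}{2}(1 - z)^2 + (0 - z)^2$ is minimised at $z = 1/3$ (one can also check Eq. (2)), so $\tfrac{1}{2}(1 - \tfrac{1}{3})^2 + (0 - \tfrac{1}{3})^2$ gives the minimum. This is achieved for $\varepsilon = 0$, i.e., $1/2 = P(H) = c(H) = x$, as required. I have verified that $S^*_{ET}(H)$ is always minimised at $c(H) = 1/2$, as it should be if one uses a strictly proper scoring rule and a fair coin.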
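The case analysis for $S^*_{ET}(H)$ can be checked by brute force. This Python sketch assumes, as in the worked example, a fair coin, $c^*(H) = 1/3$, weights 1 and 2, the Brier score evaluated at $c^*(H) + k(\varepsilon)\varepsilon$, and $k(\varepsilon) = 4/3$ for $\varepsilon > 0$, $2/3$ for $\varepsilon < 0$, and $1$ for $\varepsilon = 0$; the function names are hypothetical.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)  # fair coin
C_STAR = Fraction(1, 3)   # c*(H): minimiser of S_ET(H)
W_HEADS, W_TAILS = 1, 2   # weights: number of awakenings where H is true / false

def k(eps):
    """k(eps) for the fair-coin example: 4/3 above P(H), 2/3 below, 1 at P(H)."""
    if eps > 0:
        return Fraction(4, 3)
    if eps < 0:
        return Fraction(2, 3)
    return Fraction(1)

def s_star_et(x):
    """S*_ET(H): Brier penalties evaluated at c*(H) + k(eps) * eps."""
    eps = x - P_HEADS
    y = C_STAR + k(eps) * eps
    return P_HEADS * W_HEADS * (1 - y) ** 2 + (1 - P_HEADS) * W_TAILS * (0 - y) ** 2

grid = [Fraction(i, 120) for i in range(121)]
best = min(grid, key=s_star_et)
print(best, s_star_et(best))  # prints 1/2 1/3
```

With these $k$ values the transformed argument $c^*(H) + k(\varepsilon)\varepsilon$ maps $[0, 1]$ onto $[0, 1]$ (0 goes to 0, 1/2 to 1/3, 1 to 1), which is one plausible reason for the asymmetry between 4/3 and 2/3.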
Note that I used Levinstein's Theorem A.4 to show that the weighted Brier score in $S_{ET}(H)$ is not strictly proper, and Definition 11 to show that $s_H^*$ is strictly proper, without reaching any contradiction along the way. $S_{ET}(H)$ uses the Brier score, a strictly proper scoring rule, together with unequal positive weights. In this case, Levinstein's Theorem A.4 applies and says that the weighted Brier score is not strictly proper. My results about Definition 11 say nothing about a situation in which one weights a strictly proper scoring rule (indeed, Theorem 2 agrees with Theorem A.4). My result says that one can combine unequal positive weights with a scoring rule so that the combination is strictly proper. But this is not done by weighting a strictly proper scoring rule; it is done by weighting a scoring rule that is not strictly proper. Specifically, note that the strictly proper $s_H^*$ combines weights with a scoring rule $s_H$, and $s_H$ is not strictly proper. To see this, consider a situation with $\lambda(H, \omega_H) = \lambda(H, \omega_{\neg H}) = \lambda > 0$, so the weights are positive and equal:

$$S^*_{ET}(H) = \tfrac{1}{2}\lambda\bigl(1 - [c^*(H) + k(\varepsilon)\varepsilon]\bigr)^2 + \tfrac{1}{2}\lambda\bigl(0 - [c^*(H) + k(\varepsilon)\varepsilon]\bigr)^2. \tag{3}$$

If $s_H$ were strictly proper, then equal positive weights would not interfere with its strict propriety (see footnote 9 or Ranjan & Gneiting, 2011, p. 413 for details). Differentiating Eq. (3) with respect to $x$ and setting it equal to 0 gives:

$$\lambda k(\varepsilon)\bigl(2[c^*(H) + k(\varepsilon)(x - P(H))] - 1\bigr) = 0.$$

Separating $x$ gives (note that $k(\varepsilon) \neq 0$, so the following equation is well defined, and $1/2 = P(H)$):

$$x = c(H) = \frac{P(H) - c^*(H)}{k(\varepsilon)} + P(H), \tag{4}$$

which corresponds to the formula $c(X) = \frac{p - c^*(X)}{k(\varepsilon)} + p$ from Lemma 6, where X is H and $p = P(H)$. By plugging the values $k(\varepsilon) = 4/3$, $k(\varepsilon) = 2/3$, and $k(\varepsilon) = 1$ into Eq. (4) with $c^*(H) = 1/3$, one gets $c(H) = 5/8$, $c(H) = 3/4$, and $c(H) = 2/3$, respectively. But we know, by assumption, that $P(H) = 1/2$. So there is no value of $k(\varepsilon)$ for which $S^*_{ET}(H)$ with equal weights is minimised at $c(H) = P(H)$, which would be the case if $s_H$ were strictly proper. Hopefully, this clarifies the difference between Levinstein's result and my approach, and why they coexist without contradicting each other.
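The arithmetic behind Eq. (4) can be reproduced exactly. This Python sketch assumes, as the text does when plugging in values, that $k(\varepsilon)$ is held fixed at each of 4/3, 2/3, and 1, with $c^*(H) = 1/3$ and $P(H) = 1/2$; the function names are hypothetical. It also confirms that each stationary point scores strictly better than $c(H) = P(H)$ under equal weights, which is what shows that $s_H$ alone is not strictly proper.

```python
from fractions import Fraction

P = Fraction(1, 2)        # P(H): fair coin
C_STAR = Fraction(1, 3)   # c*(H) inherited from the unequal-weight construction

def lemma6_stationary_point(k_value, p=P, c_star=C_STAR):
    """Eq. (4): x = (p - c*) / k + p, where the equal-weight objective is flat."""
    return (p - c_star) / k_value + p

def equal_weight_objective(x, k_value, lam=1, p=P, c_star=C_STAR):
    """Eq. (3) with k held fixed and equal weights lam on both worlds."""
    y = c_star + k_value * (x - p)
    return p * lam * (1 - y) ** 2 + (1 - p) * lam * (0 - y) ** 2

for k_value in (Fraction(4, 3), Fraction(2, 3), Fraction(1)):
    x = lemma6_stationary_point(k_value)
    # Compare the stationary point with the credence c(H) = P(H).
    print(k_value, x, equal_weight_objective(x, k_value) < equal_weight_objective(P, k_value))
```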
Lastly, one of the reviewers expressed a worry about the case in which P is not the agent's credence function but is still an argument of $s_X^*$. This is potentially worrying, the reviewer says, because in many applications of scoring rules, we take the distribution P with respect to which expected inaccuracy is calculated to be the "true" distribution. One reason to require strict propriety is that the expected value of a scoring rule measuring the inaccuracy of an agent's credence function is then minimised when that credence function matches the true data-generating distribution. In such a context, when we define a scoring rule such that the distribution with respect to which its expected value is calculated is also an argument of that scoring rule, we effectively assume that we have access to the true data-generating distribution when evaluating accuracy. But this is not always the case, so our ability to actually estimate the expected value of such a scoring rule (i.e., $s_X^*$) may be significantly limited.
Given the discussed example, my understanding is that the reviewer is worried about what happens if one does not know the coin's bias (i.e., P(H), which one can see as a true data-generating distribution), since it is needed for constructing $s_H^*$. First, let me say that, in the context of accuracy-first epistemology, P is often interpreted as one's probabilistic credence function (e.g., see Pettigrew, 2016, p. 24 or pp. 189–190). But more is needed to answer the point fully. I would argue that what limits our ability to estimate the expected value is that one formulates the expectation with respect to an unknown probability function P, rather than that P is an argument of $s_X^*$. For example, assume that, in our example, P(H) is the chance of the coin landing Heads, and one does not know the value of P(H). Then $c^*(H) = \frac{P(H)}{2 - P(H)}$.¹² So one is limited in estimating the expected value independently of $s_H^*$ (since $S_{ET}(H)$ does not use it). But note that, for equal weights, $S_{ET}(H)$ is minimised at $c^*(H) = P(H)$ whether or not one knows the value of P(H). Similarly, $S^*_{ET}(H)$ is minimised at $c(H) = P(H)$ whether or not one knows the value of P(H). For $c^*(H) = \frac{P(H)}{2 - P(H)}$ and unknown P(H), one can still find the values of $k(\varepsilon)$:

$$k(\varepsilon) = \begin{cases} \frac{2}{2 - P(H)} & \text{if } \varepsilon > 0,\\[2pt] \frac{1}{2 - P(H)} & \text{if } \varepsilon < 0,\\[2pt] 1 & \text{if } \varepsilon = 0. \end{cases}$$

One can now plug these values into $S^*_{ET}(H)$, but for the sake of brevity, let me write the formula only for $k(\varepsilon) = \frac{2}{2 - P(H)}$ (and leave the rest to the reader):

$$S^*_{ET}(H) = P(H)\Bigl(1 - \frac{2x - P(H)}{2 - P(H)}\Bigr)^2 + 2\bigl(1 - P(H)\bigr)\Bigl(0 - \frac{2x - P(H)}{2 - P(H)}\Bigr)^2.$$

Taking the derivative with respect to $x$ gives $\frac{8(x - P(H))}{2 - P(H)}$, which is positive for $x > P(H)$ and vanishes at $x = c(H) = P(H)$, so $S^*_{ET}(H)$ is minimised there. For $k(\varepsilon) = \frac{1}{2 - P(H)}$, the derivative with respect to $x$ is $\frac{2(x - P(H))}{2 - P(H)}$, which is negative for $x < P(H)$ and again vanishes at $c(H) = P(H)$.

¹² Note that $c^*(H) = \frac{P(H)}{2 - P(H)}$ minimises $S_{ET}(H) = P(H)(1 - c(H))^2 + 2(1 - P(H))(0 - c(H))^2$: setting the derivative $-2P(H)(1 - c(H)) + 4(1 - P(H))c(H)$ equal to 0 and solving for $c(H)$ gives this value.
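The claim that $S^*_{ET}(H)$ is minimised at the chance $P(H)$ for any value of that chance, and the two derivative formulas, can be spot-checked. This Python sketch assumes weights 1 and 2, $c^*(H) = P(H)/(2 - P(H))$, and $k(\varepsilon) = 2/(2 - P(H))$ for $\varepsilon > 0$, $1/(2 - P(H))$ for $\varepsilon < 0$, and 1 at $\varepsilon = 0$; the sample chances and function names are hypothetical.

```python
from fractions import Fraction

def c_star(p):
    """Minimiser of S_ET(H) = p(1 - x)^2 + 2(1 - p)x^2 for a coin with chance p."""
    return p / (2 - p)

def k(eps, p):
    """k(eps) as a function of the chance p: 2/(2-p), 1/(2-p), or 1."""
    if eps > 0:
        return 2 / (2 - p)
    if eps < 0:
        return 1 / (2 - p)
    return Fraction(1)

def s_star_et(x, p):
    """S*_ET(H) with weights 1 (Heads) and 2 (Tails) and chance p of Heads."""
    eps = x - p
    y = c_star(p) + k(eps, p) * eps
    return p * 1 * (1 - y) ** 2 + (1 - p) * 2 * (0 - y) ** 2

grid = [Fraction(i, 240) for i in range(241)]
for p in (Fraction(1, 5), Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    best = min(grid, key=lambda x: s_star_et(x, p))
    print(p, best)  # the grid minimiser coincides with the chance p
```

Because $S^*_{ET}(H)$ is quadratic in $x$ on each side of $P(H)$, a central difference computed with exact fractions equals the derivative exactly, which makes the formulas $8(x - P(H))/(2 - P(H))$ and $2(x - P(H))/(2 - P(H))$ easy to test.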
Note that Gneiting and Brehmer in Brehmer and Gneiting (2020) do not use P as an argument in their properisation approach. But they use P to find a minimiser that I have called $c^*(X)$, which is an argument in their properisation approach (see Brehmer & Gneiting, 2020, p. 660). So knowledge of P plays a crucial role also in their approach, which works outside accuracy-first epistemology. That said, I admit that requiring $P(X)$ and $c^*(X)$ to construct and evaluate $s_X^*$ and its expected value is a demanding assumption. There may be a way around it, but as it stands, I do not know how to do it.
Let me briefly comment on the assumption that the minimiser $c^*(X)$ exists and is unique. In our example, $c^*(H)$ exists and is unique. But, in general, it is not guaranteed that $c^*(X)$ exists or is unique (see Brehmer & Gneiting, 2020, especially Sect. 3). An argument is needed to justify this assumption in the context of accuracy-first epistemology. Accuracy-first epistemology already uses assumptions that make such an argument possible. For example, scoring rules are generated by strictly convex functions (e.g., see Pettigrew, 2016, pp. 84–85) and are bounded from below (i.e., ideal credences get the zero inaccuracy score; e.g., see Definition 1 or Schervish et al., 2009, p. 206, for details). But formulating this argument properly requires a formal set-up that is too complicated to develop here, so I will leave this question open in this paper.

Conclusion
Levinstein argued that there does not exist a scoring rule that is additive, proper, truth-directed, and differentiable such that it recognises the contingency of varying importance (see Theorem A.4 in Levinstein, 2019, p. 2929). He concluded that accuracy-first epistemology could not capture the contingency of varying importance while maintaining its commitment to propriety and truth-directedness. In this paper, I argue that accuracy-first epistemology can capture the contingency of varying importance while maintaining its commitment to additivity, propriety, truth-directedness, and differentiability. I argue that there exists a strictly proper, truth-directed, and differentiable weighted scoring rule $s_X^*$ (an inaccuracy measure of individual credences) that recognises the contingency of varying importance, and a global inaccuracy measure $I_{s^*}$ (an inaccuracy measure of entire credence functions) that also has all the required properties. That is, $I_{s^*}$ is truth-directed, differentiable, proper (strictly and strongly), additive (which, to avoid redundancy, I predicate only of global inaccuracy measures), and recognises the contingency of varying importance. I also discuss how Levinstein's and my results coexist without contradicting each other, and why.

A Proofs for Sect. 2 (the proof and idea behind Theorem A.4)

Theorem 1 (Levinstein) Let $I_s$ from Assumption 1 and Assumption 2 be given. If the contingency of varying importance holds, then $\mathbb{E}_{\Pr'} s_X(c(X))$ and $\mathbb{E}_{\Pr} s_X(c(X))$ are not minimised at the same point. Thus, neither $s_X$ nor $I_s$ is strictly proper.
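The idea behind Theorem 1 can be seen on a toy model. In this Python sketch, which is a hypothetical illustration rather than Levinstein's own construction, a weighted Brier score stands in for $s_X$; two worlds inside X carry different weights, and $\Pr'$ moves probability between them, so $\Pr'(X) = \Pr(X)$. The worlds, weights, and probabilities are invented for illustration.

```python
from fractions import Fraction

# Hypothetical three-world model: w1 and w2 lie in X, w3 does not.
IN_X = {"w1": 1, "w2": 1, "w3": 0}    # truth value v_w(X)
WEIGHT = {"w1": 1, "w2": 3, "w3": 1}  # lambda(X, w): unequal inside X

PR = {"w1": Fraction(3, 10), "w2": Fraction(3, 10), "w3": Fraction(4, 10)}
# Pr' shifts mass from w2 to w1, so Pr'(X) = Pr(X) = 3/5.
PR_PRIME = {"w1": Fraction(5, 10), "w2": Fraction(1, 10), "w3": Fraction(4, 10)}

def expected_weighted_brier(x, prob):
    """Expectation under prob of lambda(X, w) * (v_w(X) - x)^2."""
    return sum(prob[w] * WEIGHT[w] * (IN_X[w] - x) ** 2 for w in prob)

def prob_of_x(prob):
    """Probability of the proposition X under prob."""
    return sum(prob[w] for w in prob if IN_X[w])

grid = [Fraction(i, 240) for i in range(241)]
argmin_pr = min(grid, key=lambda x: expected_weighted_brier(x, PR))
argmin_pr_prime = min(grid, key=lambda x: expected_weighted_brier(x, PR_PRIME))
print(prob_of_x(PR), prob_of_x(PR_PRIME), argmin_pr, argmin_pr_prime)
```

Both distributions give X the same probability, yet the expected weighted scores are minimised at different credences, and neither minimiser equals 3/5; a strictly proper rule would put both minimisers at 3/5.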
Proof Following the notational convention from Subsection 1.3, one has: By Assumption 2, $\mathbb{E}_{\Pr} s_X(c(X))$ and $\mathbb{E}_{\Pr'} s_X(c(X))$ are differentiable with respect to $c(X)$, so use the first-derivative test to find the optima (let me drop the superscript X on the right-hand side of the following equations): which are equations that any minimiser of $\mathbb{E}_{\Pr} s_X(c(X))$ and $\mathbb{E}_{\Pr'} s_X(c(X))$, respectively, must satisfy. Using basic arithmetic operations, one gets that: where, by assumption, $0 < \Pr(X), \Pr'(X) < 1$ and $0 < \lambda(X, \omega) < \infty$, so the fractions are well defined. Assuming that $c^*(X)$ is a common minimiser of $\mathbb{E}_{\Pr} s_X(c(X))$ and $\mathbb{E}_{\Pr'} s_X(c(X))$, $c^*(X)$ must satisfy both equalities in Eq. (5): It is, however, impossible for Eq. (6) to hold, by the construction of $\Pr'$ and Definition 9; compare Levinstein (2019, p. 2930). By Definition 9, $\lambda(X, \omega_1) \neq \lambda(X, \omega_2)$ and either $\omega_1, \omega_2 \in X$ or $\omega_1, \omega_2 \notin X$; thus, consider two cases.
So there exists no $c^*(X)$ that satisfies all the equalities in Eq. (6), i.e., there is no common minimiser for $\mathbb{E}_{\Pr} s_X(c(X))$ and $\mathbb{E}_{\Pr'} s_X(c(X))$. But if $s_X$ is strictly proper, then $\mathbb{E}_{\Pr'} s_X(c(X))$ and $\mathbb{E}_{\Pr} s_X(c(X))$ are minimised at the same point. So $s_X$ is not strictly proper, which implies that $I_s$ is not strictly proper. For reductio, assume that $s_X$ is not strictly proper but $I_s$ is strictly proper. Then, by Definition 5, $\mathbb{E}_P I_s(P) < \mathbb{E}_P I_s(c)$ for any distinct functions c and $P \in \mathcal{P}$. But since $s_X$ is not strictly proper, by Definition 4, there is a credence $\hat{c}(X) \neq P(X)$ such that $\mathbb{E}_P s_X(\hat{c}(X)) \leq \mathbb{E}_P s_X(P(X))$. Let one's credence function c be such that $c(X) = \hat{c}(X)$ and, for any other proposition Y to which P assigns a value, let $c(Y) = P(Y)$. Since $c(X) \neq P(X)$, P and c are different functions. For an additive $I_s$, it holds that $\mathbb{E}_P I_s(P) \geq \mathbb{E}_P I_s(c)$, which violates Definition 5. Therefore, $I_s$ is not strictly proper. ◻

B Proofs for Sect. 3 (Constructing $s_X^*$ and $I_{s^*}$)

Theorem 2 (Gneiting and Ranjan) Let $(\Omega, F)$, $X \in F$, a credence function c, $P \in \mathcal{P}$, and Assumption 2 be given. $\mathbb{E}_P s_X(c(X))$ is uniquely minimised at

$$c^*(X) = \frac{\sum_{\omega \in X} P(\omega)\,\lambda(X, \omega)}{\sum_{\omega \in \Omega} P(\omega)\,\lambda(X, \omega)}.$$

Proof For the sake of notational simplicity, let me call the following sum $\Lambda$:

$$\Lambda = \sum_{\omega \in \Omega} P(\omega)\,\lambda(X, \omega). \tag{7}$$

Note that $\Lambda$ is the normalising constant. It will be useful to note that if $0 < \Lambda < \infty$, one can multiply both sides of Eq. (7) by $1/\Lambda$ to get:

$$1 = \sum_{\omega \in \Omega} \frac{P(\omega)\,\lambda(X, \omega)}{\Lambda}.$$

Assume that $s_X$ satisfies Definition 8, so, by Definition 7, $0 < \Lambda < \infty$. I can then write that:
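The closed form in Theorem 2 can be compared with a purely numerical minimisation. This Python sketch assumes the Brier score as the strictly proper base score and draws hypothetical random instances (probabilities, positive weights, and a proposition X given by truth values); the ternary search is a generic convex minimiser chosen for transparency, not part of the paper.

```python
import random

def expected_weighted_brier(x, prob, weight, in_x):
    """E_P of lambda(X, w) * (v_w(X) - x)^2, summed over worlds w."""
    return sum(p * lam * (v - x) ** 2 for p, lam, v in zip(prob, weight, in_x))

def theorem2_minimiser(prob, weight, in_x):
    """c*(X): weighted mass of X divided by the normalising constant."""
    numerator = sum(p * lam for p, lam, v in zip(prob, weight, in_x) if v == 1)
    normaliser = sum(p * lam for p, lam in zip(prob, weight))
    return numerator / normaliser

def ternary_search(f, lo=0.0, hi=1.0, iters=200):
    """Numerically minimise a strictly convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def random_instance(rng, n_worlds=6):
    """Random P over n worlds, positive weights, and a proposition X (truth values)."""
    raw = [rng.random() + 1e-3 for _ in range(n_worlds)]
    prob = [r / sum(raw) for r in raw]
    weight = [rng.uniform(0.1, 5.0) for _ in range(n_worlds)]
    in_x = [1, 0] + [rng.randint(0, 1) for _ in range(n_worlds - 2)]
    return prob, weight, in_x

rng = random.Random(0)
for _ in range(5):
    prob, weight, in_x = random_instance(rng)
    closed = theorem2_minimiser(prob, weight, in_x)
    numeric = ternary_search(lambda x: expected_weighted_brier(x, prob, weight, in_x))
    print(f"{closed:.6f} {numeric:.6f}")
```

The first two worlds are forced in and out of X so that X is neither empty nor the whole of $\Omega$, which keeps the minimiser strictly inside $(0, 1)$.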
does not hold (cases iii–iv). Since each of $\varepsilon$ and $\varepsilon'$ is positive, negative, or 0, there are nine options, but I will cluster similar cases together.
Proof By Definition 6, $I_{s^*}(c, \omega) = \sum_{X \in F} s_X^*(v_\omega(X), c(X), \lambda(X, \omega))$ is strongly proper just in case $\mathbb{E}_P I_{s^*}(c)$ is uniquely minimised at $P(X) = c(X)$ for all $X \in F$ and all probability functions $P \in \mathcal{P}$. Pick any $X \in F$ and differentiate $\mathbb{E}_P I_{s^*}(c)$ with respect to $c(X)$ as follows; I omit the superscript X in the middle part of the following equation: By Proposition 1, $s_X^*$ is strictly proper, so the first-order condition holds at $P(X) = c(X)$. That is, $\mathbb{E}_P I_{s^*}(c)$ is uniquely minimised at $P(X) = c(X)$ for all $X \in F$ and all probability functions $P \in \mathcal{P}$, as required. In other words, for any $X \in F$, any probability function $P \in \mathcal{P}$, and any credence function c such that $P(X) \neq c(X)$, it holds that $\mathbb{E}_P s_X^*(P(X)) < \mathbb{E}_P s_X^*(c(X))$. Strict propriety of $I_{s^*}$ follows. By Definition 5, $I_{s^*}$ is strictly proper if, for any distinct credence functions $P \in \mathcal{P}$ and c, it holds that (I use the fact that $I_{s^*}$ is additive): By the strict propriety of $s_X^*$, the inequality $\mathbb{E}_P s_X^*(P(X)) < \mathbb{E}_P s_X^*(c(X))$ holds for any $c(X) \neq P(X)$. So, for any distinct P and c, there is $X \in F$ such that $c(X) \neq P(X)$ and,
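Additivity is what lets the global argument reduce to the single-proposition one: the expected global score is a sum of per-proposition terms, each minimised separately. The Python sketch below checks this on a hypothetical three-world model with two propositions. Definition 11 is not reproduced in this section, so `k` and `s_star` are hypothetical stand-ins: a weighted Brier score evaluated at $c^*(X) + k(\varepsilon)\varepsilon$, with $k(\varepsilon) = (1 - c^*(X))/(1 - P(X))$ above $P(X)$, $c^*(X)/P(X)$ below, and 1 at $P(X)$. These choices reproduce the values 4/3, 2/3, and 1 from the fair-coin example, and any positive $k$ would leave the location of the minimum unchanged.

```python
from fractions import Fraction
from itertools import product

WORLDS = ("w1", "w2", "w3")
P = {"w1": Fraction(1, 5), "w2": Fraction(3, 10), "w3": Fraction(1, 2)}
# Hypothetical propositions (as sets of worlds) and weights lambda(X, w).
PROPS = {"X1": {"w1"}, "X2": {"w1", "w2"}}
WEIGHTS = {"X1": {"w1": 2, "w2": 1, "w3": 1}, "X2": {"w1": 1, "w2": 3, "w3": 2}}

def prob(prop):
    return sum(P[w] for w in PROPS[prop])

def c_star(prop):
    """Theorem 2: minimiser of the expected weighted Brier score for prop."""
    lam = WEIGHTS[prop]
    return sum(P[w] * lam[w] for w in PROPS[prop]) / sum(P[w] * lam[w] for w in WORLDS)

def k(eps, prop):
    """Hypothetical k(eps): rescales so c*(X) + k(eps) * eps stays in [0, 1]."""
    p, cs = prob(prop), c_star(prop)
    if eps > 0:
        return (1 - cs) / (1 - p)
    if eps < 0:
        return cs / p
    return Fraction(1)

def s_star(prop, w, credence):
    """Hypothetical s*_X: weighted Brier score at c*(X) + k(eps) * eps."""
    eps = credence - prob(prop)
    y = c_star(prop) + k(eps, prop) * eps
    truth = 1 if w in PROPS[prop] else 0
    return WEIGHTS[prop][w] * (truth - y) ** 2

def expected_global(c):
    """E_P I_s*(c): additive sum over propositions, then expectation over worlds."""
    return sum(P[w] * s_star(prop, w, c[prop]) for prop in PROPS for w in WORLDS)

grid = [Fraction(i, 20) for i in range(21)]
best = min(({"X1": a, "X2": b} for a, b in product(grid, grid)), key=expected_global)
print(best, {prop: prob(prop) for prop in PROPS})
```

The minimiser matches $(P(X_1), P(X_2)) = (1/5, 1/2)$, whereas the plain weighted Brier score would be minimised at $(c^*(X_1), c^*(X_2)) = (1/3, 11/21)$.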

4.3 Global inaccuracy measure $I_{s^*}$

I will now use $s_X^*$ from Definition 11 to construct a global inaccuracy measure $I_{s^*}$ and show that $I_{s^*}$ has all the properties listed in Theorem A.4, without reaching a contradiction.

Proposition 2 $I_{s^*}$ is additive, proper (strictly and strongly), truth-directed, differentiable, and recognises the contingency of varying importance.
I will now consider $S_{ET}(H)$ but use it with $s_X^*$ from Definition 11. Let X be H, and let $S^*_{ET}(H)$ be $S_{ET}(H)$ computed with $s_H^*$. I assume that $s_H$ in $s_H^*$ works as the Brier score. So, for example, $s_H(1, c^*(H) + k(\varepsilon)\varepsilon) = (1 - [c^*(H) + k(\varepsilon)\varepsilon])^2$. If $\omega_H$ is a world where H is true and $\omega_{\neg H}$ is a world where H is false, then the weights are $\lambda(H, \omega_H) = 1$ and $\lambda(H, \omega_{\neg H}) = 2$.