Towards Automatic Mathematical Exercise Solving

Knowledge graphs are widely used in many applications. Automatically solving mathematical exercises is an interesting task that can be enhanced by knowledge reasoning. In this paper, we design MathGraph, a knowledge graph for solving high school mathematical exercises. Since this requires fine-grained mathematical derivation and calculation over different mathematical objects, we design a crowdsourcing-based method to help build MathGraph. MathGraph supports a wide variety of mathematical objects, operations and constraints that may be involved in exercises. Furthermore, we propose an algorithm to align a semantically parsed exercise to MathGraph and derive the answer automatically. Extensive experiments on real-world datasets verify the effectiveness of MathGraph.


Introduction
Currently, large-scale knowledge graphs are widely used in many real-world applications, such as semantic web search, question-answer systems, natural language processing and data analysis. For example, if we ask "What is the highest mountain?" on a web search engine, it may directly show the answer "Everest" with the help of a knowledge graph.
Recently, intelligent education has become increasingly popular, and automatically solving mathematical exercises can help students improve their overall ability. However, it is rather challenging to solve mathematical exercises automatically without knowledge graphs, because doing so requires complex semantics and extra calculations. In this paper, we propose MathGraph, a knowledge graph for solving high school mathematical exercises. MathGraph must be specially designed and differs from other knowledge graphs for the following reasons:

1. Knowledge in MathGraph belongs to a specific domain. Building MathGraph requires specific mathematical knowledge. Traditional knowledge graphs are built from extensive semantic data, e.g. Wikipedia. However, it is very hard to obtain such semantic data for mathematical problems.
2. Knowledge in MathGraph is stored at the class level rather than the instance level. Most traditional knowledge graphs focus on extracting instances, categories and relations among instances. For example, a 3-tuple (Beijing, isCapitalOf, China) shows a relation between two instances. However, in MathGraph, there are no instances in the original graph, only class-level mathematical objects (such as Complex Number and Ellipse). Instances are created only when an exercise is given.
3. MathGraph supports mathematical derivation and calculation. The reasoning process for mathematical problems differs from that for other problems because, besides logical relations, mathematical derivation must be included in the knowledge graph in order to solve mathematical exercises.
Moreover, there are numerous mathematical entities that need to be extracted, and it is very difficult to parse them automatically from the exercise texts. It is too expensive to ask a large number of experts to extract the entities for us. However, if we hire only a few experts, it is difficult to obtain a complete set of entities in the domain. To address this, we construct MathGraph via crowdsourcing. Existing works [4,24] on entity extraction for knowledge graphs mainly extract entities from general web pages. However, our entities, such as mathematical objects, operations and constraints, come from mathematical exercises, which are more complicated and domain-specific. Therefore, we have to design special tasks for MathGraph.
Thus, in this paper, we focus on designing and building a knowledge graph, MathGraph, for solving mathematical problems. We also propose an effective algorithm to align a mathematical problem to MathGraph and use the aligned sub-graph to solve a mathematical exercise. Our contributions are as follows.
• We specially design the structure of MathGraph to support mathematical derivation and calculation. We model different mathematical objects, operations and constraints in MathGraph. To the best of our knowledge, this is the first attempt to build a knowledge graph for solving mathematical problems.
• We propose an approach to construct MathGraph via crowdsourcing.
• We propose an algorithm to align a mathematical problem to MathGraph.
• We design a method to solve mathematical exercises with the help of a semantic parser.
• An experimental study shows the strong performance of MathGraph and our proposed method.
Figure 1 gives an overview of the exercise-solving process with MathGraph. We detail the structure of MathGraph and the exercise-solving algorithm later.
The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 introduces the concepts involved in MathGraph. Section 4 gives an overview of the structure of MathGraph. Section 5 describes how to build MathGraph using crowdsourcing. Section 6 proposes algorithms to solve mathematical exercises. Section 7 reports the experimental results, and we conclude the paper in Sect. 8.

Reasoning with Knowledge Graph
Since knowledge graphs provide well-structured information about entities and their relations, they are known to be useful for reasoning in many tasks, such as query answering and relation inference (i.e. inferring missing relations in the knowledge graph [10,21,22]). Gu et al. [15] proposed a technique to answer queries on knowledge graphs by "compositionalizing" a broad class of vector space models, which performs well on query answering and knowledge graph completion. Toutanova et al. [32] proposed a dynamic programming algorithm to incorporate all paths within a bounded length in a knowledge graph, and modelled entities and relations in the compositional path representations. Zhang et al. [35] presented a deep learning architecture and a variational learning algorithm that can handle noise in the question and perform multi-hop reasoning in a knowledge graph simultaneously. Zheng et al. [37] used a large number of binary templates rather than semantic parsers to query knowledge graphs with natural language. They also proposed a low-cost technique that can generate a large number of templates automatically.
Our work differs from the above works. Firstly, there are some differences between the structure of MathGraph and that of existing knowledge graphs (e.g. Freebase and NELL [3]). Secondly, solving a math exercise usually requires multi-step mathematical derivation, and the derivation procedures need to be output as the problem-solving process. Thirdly, derivation and calculation should be performed simultaneously when solving an exercise to retrieve the answer.
[Figure: Overview of using MathGraph to solve a mathematical exercise]

Automatically Solving Mathematical Problems
Automatically solving mathematical problems has been studied for years, but existing work has focused only on easy problems, e.g. primary school mathematics. Kojiri et al. [18] constructed a mechanism called a solution network to automatically generate the answers to mathematical exercises. The solution network is represented as a tree that describes inclusive relations between exercises. Tomas et al. [31] proposed a constraint logic programming framework to automatically generate and solve mathematical exercises. Their work concentrates on the solving procedures rather than on many simple exercise templates, so that the generation and explanation of these exercises are easy. Ganesalingam et al. [13] proposed a method that solves elementary mathematical problems using logical derivation and produces solutions that are difficult to distinguish from human writing.
However, these works all have some limits. For example, some can solve those problems only involving elementary math (e.g. set theory, basic algebraic operation) without deeper theorems; some only support very limited logical derivation. Thus, in this paper, we present a knowledge graph to represent as many mathematical entities and logical relationships as possible.

Entity Extraction and Knowledge Graph Construction with Crowdsourcing
Crowdsourcing is widely used to extract entities and knowledge from massive types of data [5,6,14,27,34]. Chai et al. [4] focused on collecting entities using crowdsourcing with low cost and high quality. Dumitrache et al. [11] proposed a method for collecting medical relation using crowdsourcing. Seifert et al. [30] presented a method to extract entities from scientific literature, which further can be used to create an open knowledge base. Crowdsourcing is also used to construct, update or integrate knowledge graph, such as Freebase [2]. Xin et al. [33] proposed a method for subjective knowledge base construction, which leverages crowd workers to annotate the subjective properties of the instances. McCoy et al. [23] used crowdsourcing to construct a clinical knowledge base by identifying relationships between medication pairs. Meng et al. [24] proposed a framework for large-scale knowledge base integration through crowdsourcing.
Compared with the conference version [36], we make the following contributions. Firstly, we design several user-friendly interfaces to leverage the crowd to build MathGraph. Secondly, we design more quality control methods customized to the MathGraph construction problem. Thirdly, we conduct extensive experiments to evaluate the crowd-based method. Experimental results show that our method can achieve higher quality than the expert-only approach at a moderate monetary cost. Fourthly, we discuss more related works in this manuscript.

Preliminaries
In this section, we describe the entities that may appear in MathGraph, including mathematical objects and instances, operations and constraints. Table 1 shows the notations used in this paper.

Mathematical Object and Instance
A mathematical object is an abstract object which has a definition and some properties, and can be taken as the target of some operations or derivation. Note that a mathematical object can be defined in terms of other objects. A concrete object that satisfies the definition of the mathematical object is called an instance.
For example, Complex Number can be considered as a mathematical object:
• Definition: A complex number is a number that can be written in the form $a + bi$, where $a$ and $b$ are both real numbers and $i$ is the imaginary unit, which satisfies $i^2 = -1$.
• Property example: The imaginary part is a property of a complex number. The imaginary part of a complex number $a + bi$ is $b$.
• Operation example: The product of two complex numbers is $(a_1 + b_1 i)(a_2 + b_2 i) = (a_1 a_2 - b_1 b_2) + (a_1 b_2 + a_2 b_1)i$.
• Derivation example: If $(a_1 + b_1 i)$ and $(a_2 + b_2 i)$ are conjugate to each other, then $a_1 = a_2$ and $b_1 + b_2 = 0$.
Different mathematical objects should be described by different structures in MathGraph. Thus, in MathGraph, a mathematical object is represented by a tuple of key properties $(p_1, p_2, \ldots, p_n)$. The key properties of a mathematical object are those properties that together describe all the information of an instance of the object. Table 2 shows examples of key properties of some mathematical objects. Two instances of a mathematical object are equivalent if and only if all their key properties are equivalent.
In a mathematical exercise, instances can be categorized into certain instances and uncertain instances, depending on whether their key properties contain uncertain values. An instance is a certain instance if all its key properties are certain, and an uncertain instance otherwise. For example, the real number 2.3 and the function $f(x) = x + \sin(x)$ are certain; the complex number $3 + ai$ (where $a \in \mathbb{R}$) and an arbitrary triangle $\triangle ABC$ are uncertain.
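The distinction between certain and uncertain instances can be illustrated with a minimal sketch. It assumes that the key properties of Complex Number are its real and imaginary parts, and it uses SymPy symbols to stand for uncertain values; the class name `ComplexInstance` and its methods are hypothetical and are not part of MathGraph.

```python
from dataclasses import dataclass
from typing import Tuple

import sympy as sp


@dataclass(frozen=True)
class ComplexInstance:
    """Hypothetical instance of the Complex Number object.

    Assumption: the key properties are (real part, imaginary part).
    """
    real: sp.Expr
    imag: sp.Expr

    def key_properties(self) -> Tuple[sp.Expr, sp.Expr]:
        return (self.real, self.imag)

    def is_certain(self) -> bool:
        # Certain if and only if no key property contains a free symbol.
        return all(not p.free_symbols for p in self.key_properties())

    def equivalent(self, other: "ComplexInstance") -> bool:
        # Equivalent if and only if all key properties are equivalent.
        pairs = zip(self.key_properties(), other.key_properties())
        return all(sp.simplify(p - q) == 0 for p, q in pairs)


a = sp.Symbol("a", real=True)
certain = ComplexInstance(sp.Integer(3), sp.Integer(2))   # 3 + 2i
uncertain = ComplexInstance(sp.Integer(3), a)             # 3 + ai
print(certain.is_certain(), uncertain.is_certain())
```

Representing unknown values as symbols keeps the same data structure usable for both kinds of instance, which is one possible way to let later operations run on uncertain operands.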

Operation
Generally, an operation is an action or procedure which, given one or more mathematical objects as inputs (known as operands), produces a new object. Simple examples include addition, subtraction, multiplication, division and exponentiation. In addition, other procedures such as calculating the real part of a complex number, the derivative of a function and the area of a triangle can also be considered as operations.

Constraint
A constraint is a description of or condition on one or more instances, at least one of which is an uncertain instance. There are four types of constraints: descriptive constraints (e.g. complex numbers $x$ and $y$ are conjugate), equality constraints (e.g. $a + 2 = b$), inequality constraints (e.g. $a^2 \le 5$) and set constraints (e.g. $a \in \mathbb{N}$).
Most descriptive constraints cannot be applied directly to solve the exercise, but they can be converted into the other three types of constraints using definitions or theorems. For example, if an exercise says "$a + 3i$ and $7 - bi$ are a conjugate pair", then by the definition of the complex conjugate we can derive that $a = 7$ and $3 + (-b) = 0$.

The Structure of MathGraph
MathGraph is a directed graph G = ⟨V, E⟩ , in which each node v ∈ V denotes a mathematical object, an operation or a constraint, and each edge e ∈ E is the relation of two nodes.

Nodes
In general, nodes are categorized into three different types: object nodes, operation nodes and constraint nodes.

Object Nodes
An object node $v_o = (t, P, C)$ represents a mathematical object, where $t$ denotes an instance template of this mathematical object; $P = (P_1, P_2, \ldots, P_n)$ is a tuple indicating the key properties of the mathematical object; and $C$ is a set of constraints that, according to the definition or some theorems, must be satisfied by this mathematical object. Table 3 shows an example of "triangle" as an object node. We can see that properties and theorems of triangles are included in the constraint set.

Operation Nodes
An operation node $v_p = (X_1, X_2, \ldots, X_k, Y, f)$ represents a $k$-ary operation, where $X_i$ $(i = 1, 2, \ldots, k)$ and $Y$ are object nodes representing the domain of the $i$th operand $x_i$ and of the result $y$ of the operation, respectively, and $f$ is a function that implements the operation. The function $f$ can be carried out as a series of symbolic execution [1,9,17] steps using a symbolic execution library (e.g. SymPy [26], Mathematica [16]), even if some operands are uncertain instances.
For example, getting the modulus of a complex number is a unary operation where $X_1 = \langle$Complex Number$\rangle$, $Y = \langle$Real Number$\rangle$ and $f$ can be implemented by the following symbolic execution process: (1) get the real part of $x_1$; (2) get the imaginary part of $x_1$; (3) return the square root of the sum of (1) squared and (2) squared.
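The three steps above can be sketched with SymPy as follows. This is only an illustration of how such an `f` might look; the function name `modulus` is hypothetical, and the operand is assumed to be a SymPy expression whose unknown parts are real-valued symbols.

```python
import sympy as sp


def modulus(x1: sp.Expr) -> sp.Expr:
    """Sketch of the function f of a 'Complex Modulus' operation node."""
    real_part = sp.re(x1)                 # step (1): real part of x1
    imag_part = sp.im(x1)                 # step (2): imaginary part of x1
    return sp.sqrt(real_part**2 + imag_part**2)  # step (3)


a = sp.Symbol("a", real=True)
print(modulus(3 + 4 * sp.I))   # certain operand
print(modulus(3 + a * sp.I))   # uncertain operand: result stays symbolic
```

Because the computation is symbolic, the same function returns a number for a certain operand and an expression in `a` for an uncertain one.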

Constraint Nodes
A constraint node $v_c = (d, X_1, X_2, \ldots, X_k, f)$ represents a descriptive constraint on $k$ instances, where $d$ is the description of the constraint, $X_i$ $(i = 1, 2, \ldots, k)$ are object nodes representing the domain of each involved instance, and $f$ is a function that maps this descriptive constraint into several equality constraints, inequality constraints and set constraints. For example, a constraint node represents that $x_1$ and $x_2$ are a conjugate pair, where $X_1 = X_2 = \langle$Complex Number$\rangle$ and $f$ can be implemented by the following process: (1) get the real part of $x_1$ as $a_1$; (2) get the real part of $x_2$ as $a_2$; (3) get the imaginary part of $x_1$ as $b_1$; (4) get the imaginary part of $x_2$ as $b_2$; (5) return two equality constraints: $a_1 = a_2$ and $b_1 + b_2 = 0$.
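As a rough illustration, the five steps of the conjugate-pair constraint can be written with SymPy as below. The function name `conjugate_pair` is hypothetical, and the example applies it to the pair $a + 3i$ and $7 - bi$, assuming $a$ and $b$ are real.

```python
from typing import List

import sympy as sp


def conjugate_pair(x1: sp.Expr, x2: sp.Expr) -> List[sp.Eq]:
    """Sketch of f for the constraint 'x1 and x2 are a conjugate pair'."""
    a1, a2 = sp.re(x1), sp.re(x2)   # steps (1) and (2): real parts
    b1, b2 = sp.im(x1), sp.im(x2)   # steps (3) and (4): imaginary parts
    # step (5): the descriptive constraint becomes two equality constraints
    return [sp.Eq(a1, a2), sp.Eq(b1 + b2, 0)]


a, b = sp.symbols("a b", real=True)
constraints = conjugate_pair(a + 3 * sp.I, 7 - b * sp.I)
solution = sp.solve(constraints, [a, b], dict=True)
print(constraints)
print(solution)
```

The returned equalities are ordinary SymPy objects, so they can be passed directly to a solver together with the other constraints of an exercise.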

Edges
There are two types of edges in MathGraph: the derive edges and the flow edges.

The derive Edge
For two object nodes $X$ and $Y$, there may be a derive edge $e_{DERIVE} = (X, Y, f)$ that indicates a general-special relationship between them, such as Triangle and Isosceles Triangle: an instance of $X$ can be reassigned as an instance of $Y$ if certain conditions are met. These conditions are encapsulated in a function $f: X \to \{False, True\}$: if the conditions are met, $f$ returns True and reassigns the instance from $X$ to $Y$; otherwise, it simply returns False.
For example, there is a derive edge from the object node Triangle to Isosceles Triangle, where the function $f$ can be implemented as: (1) if the values of the key properties or a constraint show that two angles or the lengths of two edges of the original instance are equal, return an instance of Isosceles Triangle with the same key properties; (2) otherwise, return False.
When solving an exercise, reassigning an instance to a more specific object node brings in more constraints of this object and helps find the answer. For example, for a rhombus $ABCD$, if we know that $\angle A = 90°$, we can infer, by the derive edge from the object node Rhombus to Square, that $ABCD$ is a square and has the constraints $\angle A = \angle B = \angle C = \angle D = 90°$.
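The rhombus-to-square example can be sketched as a small derive-edge function. The `Instance` class, its `known_angles` field and the function name are hypothetical simplifications: a real derive edge would inspect the key properties and the full constraint set rather than a dictionary of angles.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Instance:
    """Hypothetical, simplified instance: its object node and known angles."""
    node: str
    known_angles: Dict[str, int] = field(default_factory=dict)  # in degrees


def rhombus_to_square(inst: Instance) -> bool:
    """Sketch of f on the derive edge (Rhombus, Square, f)."""
    if inst.node != "Rhombus" or 90 not in inst.known_angles.values():
        return False                      # conditions not met
    inst.node = "Square"                  # reassign to the more specific node
    # The Square node contributes the constraint that all angles are 90.
    inst.known_angles.update({vertex: 90 for vertex in "ABCD"})
    return True


abcd = Instance("Rhombus", {"A": 90})
print(rhombus_to_square(abcd), abcd.node, abcd.known_angles)
```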

The flow Edge
A flow edge $e_{FLOW} = (X, Y)$ indicates the flow direction of instances during the exercise-solving process. A flow edge may only exist from an object node to an operation node, from an operation node to an object node, or from an object node to a constraint node.
The flow edges between object nodes and operation nodes represent the process of passing instances as parameters before the operation and the process of returning a new instance after it. For example, in Fig. 2, the two flow edges pointing to the operation node "addition" indicate that this operation takes two instances of complex numbers as its input values, and the edge leading from this operation node indicates that it returns a new instance of a complex number. The flow edges from object nodes to constraint nodes likewise represent the process of passing the parameters of the constraints. For example, in Fig. 2, the two flow edges pointing to the constraint node "x and y are a conjugate pair" indicate that this constraint takes two complex numbers as its input. Note that constraint nodes only convert descriptive constraints into other types of constraints and generate no instances, so there are no flow edges from a constraint node to an object node.
Table 3 An example of object node: triangle
In summary, MathGraph is a well-structured graph supporting different mathematical objects, operations and constraints. Next, we will discuss how to solve mathematical exercises using it.

MathGraph Construction using Crowdsourcing
As is mentioned above, MathGraph can be used to solve mathematical exercises. However, the objects, operations and constraints in MathGraph need to be extracted and refined by mathematical logic, so it is very difficult to construct MathGraph automatically. If one or a small group of people are chosen to create MathGraph manually, it is highly likely that some entities are missing, incomplete or incorrect. Therefore, we tackle this problem by leveraging the power of crowdsourcing to construct and validate the MathGraph.
Our task in this section can be described as follows. Given a set of mathematical exercises $R$, we build MathGraph with the help of crowd workers on a crowdsourcing platform (such as Amazon Mechanical Turk). First of all, we need to extract all the mathematical objects, operations and constraints from the exercises. We randomly partition the exercise set $R$ into several disjoint subsets $\{R_1, R_2, \ldots\}$, where every subset contains no more than $k$ $(1 \le k \le |R|)$ exercises. In practice, considering that one worker can only handle a limited number of exercises at once, we recommend values of $k$ between 5 and 20. Then we assign $m$ crowd workers to each subset and design a set of user interfaces and questions to extract the objects, operations and constraints from the text.
Quality control is also necessary in this task. Workers who are familiar with the mathematical concepts and exercises can do a good job, so the hired workers need some fundamental domain knowledge of mathematics. To address this, we provide detailed instructions on the mathematical exercises to guide the workers. In addition, to block unqualified workers, we give each incoming worker a quiz. Only workers who achieve a high score can participate in the subsequent tasks. Furthermore, since the answers may still contain incorrect or duplicated entities, we need to design corresponding algorithms to validate them.
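A minimal sketch of the random partitioning step is shown below. The function name `partition_exercises` and the fixed `seed` parameter are illustrative assumptions; the text only requires disjoint subsets of at most $k$ exercises.

```python
import random
from typing import List, Sequence


def partition_exercises(exercises: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Randomly split exercises into disjoint subsets of at most k items."""
    if not 1 <= k <= len(exercises):
        raise ValueError("k must satisfy 1 <= k <= |R|")
    shuffled = list(exercises)
    random.Random(seed).shuffle(shuffled)   # seeded only for reproducibility
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


exercise_set = [f"exercise-{i}" for i in range(23)]
subsets = partition_exercises(exercise_set, k=10)
print([len(s) for s in subsets])
```

Each resulting subset would then be assigned to $m$ workers.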

Extracting Objects
The user interface we designed for extracting objects is shown in Fig. 3a. After workers submit their answers on the platform, we obtain a collection of object names $O = \{O_1, O_2, \ldots, O_m\}$, where $O_i$ contains the names submitted by the $i$th worker. However, there are two types of errors in the collected answers:
1. Duplicate answers: Different names actually refer to the same mathematical object, e.g. "Complex" and "Complex Number", or "Point on the plane" and "Point in two-dimensional space".
2. Wrong answers: Names of other entities are incorrectly categorized as mathematical objects. For instance, some workers submit "Complex Conjugate" as a mathematical object, although it is actually an operation.
In order to solve the first type of error, we apply a classic entity resolution technique [7,8]. Given a pair of collected object names $(o_i, o_j)$, we can compute the similarity $s_{ij}$ using any similarity function, e.g. Jaccard similarity or edit distance. We take Jaccard similarity as an example here. We first tokenize each name $o_i$ into a set of tokens $T(o_i)$ and compute the Jaccard similarity on the token sets as follows:
$s_{ij} = \frac{|T(o_i) \cap T(o_j)|}{|T(o_i) \cup T(o_j)|}$
Then, we select all pairs with similarity no less than a given similarity threshold (e.g. 0.3) and design questions for each pair to ask multiple workers whether two names are actually one object. Figure 3d shows an example of the question. After that, we can easily determine whether the pair should be merged into one object by these workers' (uniform or weighted [19]) majority vote.
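The candidate-selection step can be sketched as follows. Lower-cased whitespace tokenization is an assumption made here for illustration, and the function names are hypothetical; any tokenizer or similarity function could be substituted.

```python
from itertools import combinations
from typing import List, Set, Tuple


def tokens(name: str) -> Set[str]:
    # Assumption: lower-cased whitespace tokenization.
    return set(name.lower().split())


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def candidate_pairs(names: List[str], threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Pairs whose similarity reaches the threshold become crowd questions."""
    return [(a, b) for a, b in combinations(names, 2) if jaccard(a, b) >= threshold]


names = ["Complex", "Complex Number", "Point on the plane",
         "Point in two-dimensional space", "Ellipse"]
pairs = candidate_pairs(names)
print(pairs)
```

Under this simple tokenization, "Point on the plane" and "Point in two-dimensional space" share only one of seven tokens, so a purely token-based similarity can miss such paraphrases; this is one reason the choice of similarity function and threshold matters.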
As for the second type of error, we first count the number of occurrences of every name in $O$, denoted as $c(o_i)$. The frequency of the name, $f(o_i)$, is then defined from $c(o_i)$. The higher the frequency, the more likely $o_i$ is a mathematical object. Given a frequency threshold $f$ (e.g. 0.8), for an entity name $o_i$: (1) if $f(o_i) \ge f$, it is inferred to be a valid mathematical object; (2) if $f(o_i) \le 1 - f$, it is inferred not to be a mathematical object; (3) otherwise, we transform it into a question (see Fig. 3e), send it to the crowdsourcing platform and obtain the answers from crowd workers.
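The three-way decision can be sketched as below. Note one explicit assumption: the frequency is taken to be the fraction of workers who submitted the name, i.e. $c(o_i)$ divided by the number of workers, which is not confirmed by the text. The function name `triage` is hypothetical, and exact fractions are used so that the boundary cases compare exactly.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Tuple


def triage(worker_answers: List[List[str]],
           threshold: Fraction = Fraction(4, 5)) -> Tuple[List[str], List[str], List[str]]:
    """Split names into (accepted, rejected, ask_the_crowd)."""
    m = len(worker_answers)
    # Count each name at most once per worker.
    counts = Counter(name for answers in worker_answers for name in set(answers))
    accepted, rejected, ask = [], [], []
    for name, c in sorted(counts.items()):
        freq = Fraction(c, m)   # ASSUMPTION: frequency = c(o_i) / number of workers
        if freq >= threshold:
            accepted.append(name)
        elif freq <= 1 - threshold:
            rejected.append(name)
        else:
            ask.append(name)    # ambiguous: becomes a crowd question
    return accepted, rejected, ask


answers = [
    ["Complex Number", "Real Number"],
    ["Complex Number", "Real Number", "Complex Conjugate"],
    ["Complex Number", "Real Number"],
    ["Complex Number"],
    ["Complex Number"],
]
print(triage(answers))
```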

Extracting Operations and Constraints
Operations and constraints also need to be recognized and extracted from every given exercise subset. We design the corresponding user interfaces on the crowdsourcing platform, shown in Fig. 3b, c. The difference from extracting objects is that the workers have to submit not only the name of the operation/constraint, but also the types of the operands and the result of the operation, or the types of the parameters of the constraint. For example, the operation "Find the modulus of a complex number" should be submitted as a key-value map that gives its name, the type of its operand (Complex Number) and the type of its result (Real Number). Thus, we collect the workers' answers as sets $\mathcal{P} = \{P_1, P_2, \ldots, P_m\}$ and $\mathcal{C} = \{C_1, C_2, \ldots, C_m\}$, where every set $P_i$ and $C_i$ contains several operations and constraints, respectively. Here, an operation $p$ is denoted as a tuple (p.name, p.operands, p.result) and a constraint $c$ as a tuple (c.name, c.parameters), where p.operands and c.parameters are both unordered lists containing the domains of the operands of $p$ and the parameters of $c$. Similar to the concept of a type signature in programming languages, we define (p.operands, p.result) and c.parameters as the signatures of $p$ and $c$, respectively.
Extracting operations and constraints faces the same two possible errors as extracting objects. For the second type of error (i.e. a submitted entry is actually not an operation/constraint), we can follow the same approach as above. However, for the first type of error (i.e. one operation/constraint is submitted by several workers under different names), simply using the same method as above would result in too many questions for the workers, because the number of operations and constraints in MathGraph is much larger than the number of mathematical objects, and different workers may describe operations and constraints in various ways. Therefore, a more efficient method is needed in this case.
We note that two different descriptions can refer to one operation/constraint only if their signatures are identical. Thus, we design Algorithm 1 to handle duplicated operations via crowdsourcing. First, we group the submitted entries based on their signatures (line 2). Then, for each entry in a group, we ask a crowdsourcing question to verify whether there is already an operation that has the same meaning (lines 11-19). If not, it is considered a new operation (lines 20-23).
Algorithm 1 (excerpt)
    Initialize Ω as an empty list of lists;
    …
6   Randomly choose an entry ê from σ_i;
7   Initialize ω as an empty list;
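The idea of grouping by signature before asking the crowd can be sketched as follows. This is not a transcription of Algorithm 1: the crowd question is replaced by a hypothetical callback `same_meaning`, and each new entry is compared against one representative of every existing cluster in its group.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, Tuple[str, ...], str]   # (name, operands, result)


def signature(entry: Entry) -> Tuple[Tuple[str, ...], str]:
    _, operands, result = entry
    return (tuple(sorted(operands)), result)   # operands are unordered


def deduplicate(entries: List[Entry],
                same_meaning: Callable[[Entry, Entry], bool]) -> List[List[Entry]]:
    """Cluster entries that denote the same operation."""
    groups: Dict[Tuple[Tuple[str, ...], str], List[Entry]] = defaultdict(list)
    for entry in entries:
        groups[signature(entry)].append(entry)   # group by signature

    operations: List[List[Entry]] = []
    for group in groups.values():
        clusters: List[List[Entry]] = []
        for entry in group:
            for cluster in clusters:
                # Stand-in for one crowdsourcing question.
                if same_meaning(cluster[0], entry):
                    cluster.append(entry)
                    break
            else:
                clusters.append([entry])          # a new operation
        operations.extend(clusters)
    return operations


entries = [
    ("Complex Modulus", ("Complex Number",), "Real Number"),
    ("Absolute value of a complex number", ("Complex Number",), "Real Number"),
    ("Real part", ("Complex Number",), "Real Number"),
    ("Addition", ("Complex Number", "Complex Number"), "Complex Number"),
]
synonyms = {"Complex Modulus", "Absolute value of a complex number"}
result = deduplicate(entries, lambda a, b: {a[0], b[0]} == synonyms)
print([[name for name, _, _ in cluster] for cluster in result])
```

Because entries with different signatures are never compared, questions are only generated within a group, which is what keeps the number of crowd questions manageable.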

Extracting Edges
To construct a complete MathGraph, we still need to extract the edges. The flow edges can be created automatically by the submitted signature of the operations and constraints. For instance, according to the signature of the operation "Complex Modulus", two flow edges (Complex Number, Complex Modulus) and (Complex Modulus, Real Number) will be added into MathGraph.
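Deriving flow edges from a signature is mechanical, as the following sketch suggests. The function name `flow_edges` is hypothetical; a constraint is modelled simply as an entry without a result, since constraint nodes have no outgoing flow edges.

```python
from typing import List, Optional, Sequence, Tuple


def flow_edges(name: str, operands: Sequence[str],
               result: Optional[str] = None) -> List[Tuple[str, str]]:
    """Flow edges implied by the signature of an operation or constraint."""
    edges = [(obj, name) for obj in operands]   # object node -> operation/constraint
    if result is not None:                      # only operations return an instance
        edges.append((name, result))
    return edges


modulus_edges = flow_edges("Complex Modulus", ["Complex Number"], "Real Number")
conjugate_edges = flow_edges("Conjugate Pair", ["Complex Number", "Complex Number"])
print(modulus_edges)
print(conjugate_edges)
```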
However, the derive edges contain extra information, so they have to be extracted by crowdsourcing. For every ordered pair of mathematical objects, we design a question as shown in Fig. 3f to ask the workers whether there is a derive edge between these two objects, and we determine the answer by a majority vote of multiple workers.
After extracting all the nodes and edges in MathGraph through crowdsourcing, several experts (i.e. people who can write code) are asked to program the logical information in these nodes and edges, such as the function $f$ in an operation node $v_p$ (see Sect. 4.1.2). Finally, we construct MathGraph from the entities extracted by the crowd and the logic coded by the experts.

Solving Mathematical Exercises with MathGraph
In this section, we propose a framework to solve a mathematical exercise using MathGraph. First, we use a semantic parser to map the exercise text to instances, operations and constraints. Then, we solve the constraints and update the uncertain instances. Finally, we return the answer to the exercise.

Mapping Text in MathGraph
Because the information and expressions in mathematical exercises are limited, we can use a rule-based semantic parser to parse the exercise text and then map the result to the corresponding nodes in MathGraph. The rule-based semantic parser uses a set of rules to parse every sentence of the exercise and recognize the logical relationships in the text. For example, "Let x and y be complex numbers" is parsed as a declaration of two uncertain instances; "Find the coordinates of the conjugate complex of $(i + 1)(i - 1)$" is parsed as a declaration of a certain instance and two operations.

Mapping Instances
With the semantic parser, every instance generated from the exercise has already been mapped to the corresponding object node. That is, a set of instances $I = \{(x_1, X_1), \ldots, (x_k, X_k)\}$ is generated by parsing the text of the exercise, where $x_i$ denotes the instance and $X_i$ denotes the corresponding object node.
Instances are classified as certain instances or uncertain instances depending on whether the exercise provides certain values or expressions for them. For uncertain instances generated from the text, key properties with unknown values should also be generated as instances, since they may be used in the operations and constraints of the exercise. For example, for the exercise shown in Fig. 4, $x$ and $y$ are both uncertain instances of the object node Complex Number. Therefore, we need to generate $a_x$, $b_x$, $a_y$ and $b_y$ as four uncertain instances of the object node Real Number, where $a_x$ and $b_x$ stand for the two key properties of $x$, and $a_y$ and $b_y$ stand for the key properties of $y$.

Mapping Operations
The semantic parser can also parse out a set of operations from the text. Every operation in this set is aligned to the corresponding operation node in MathGraph together with its operands, triggers the function in the operation node and finally generates a new instance as the output of the operation.
Algorithm 2 (excerpt)
    S_dependency ← S_dependency ∪ {(p, x)};
12  for each (o, (…
    …
    if y is a certain instance then
    …
    for each (c, (x_1, X_1), …, (x_k, X_k)) ∈ C do
23      if c is a descriptive constraint then
24          c ← c.f(x_1, …, x_k);
25  return I_certain, I_uncertain, C, S_dependency

Mapping Constraints
Similar to mapping operations, for every descriptive constraint $(c, (x_1, X_1), \ldots, (x_n, X_n))$ in the exercise, the semantic parser can map it to the corresponding constraint node $c$ with the involved instances, trigger the function in the node and convert it to several equality/inequality/set constraints. Also, note that when an uncertain instance is generated, some constraints may be generated according to the constraint set of the corresponding object node. After that, we gather all the constraints in the exercise into a set for later use.
Algorithm 2 shows the process of mapping text of the exercise, where instances are mapped in lines 7-11, operations are mapped in lines 12-21, and constraints are mapped in lines 22-25.

Solving Uncertain Instances and Constraints
After parsing all the instances and operations in the exercise, the answer of the exercise should already be generated as an instance (from the text or by an operation). If this instance is a certain instance, we can directly return the value of this instance as the answer; otherwise, we must deal with these uncertain instances and solve the constraints in the exercise to update their values and finally retrieve the answer of the exercise.

Reassign Uncertain Instances
First, we need to check whether each uncertain instance can be reassigned to a more specific object node in MathGraph by a derive edge. For an uncertain instance $i$ that is assigned to an object node $v_o$, we check every outgoing derive edge of $v_o$, and if the function of an edge $e$ returns true, we reassign $i$ to the object node that $e$ points to and add all the constraints in this node to the constraint set. Algorithm 3 shows the pseudocode of this process.
For example, if we have an uncertain instance $\triangle ABC$, and there is a constraint $\angle B = \angle C$ in the constraint set, then the derive edge from Triangle to Isosceles Triangle should return true. So the instance should be reassigned to Isosceles Triangle, and a new constraint $AB = AC$ should be added to the constraint set.

Organizing Uncertain Instances
Note that two uncertain instances $u$ and $v$ may have a dependency relationship, which arises because either $u$ is one of the inputs of an operation node and $v$ is its output, or $u$ is one of the key properties of $v$. Thus, we use a graph $G_I = \langle V_I, E_I \rangle$ to describe the dependencies among all the uncertain instances, where each node $v \in V_I$ represents an uncertain instance and each directed edge $e \in E_I$ represents a dependency relationship between two nodes. Note that $G_I$ is always a DAG, since dependencies never form a loop.
Let $S_I = \{v \mid v \in V_I \wedge \forall u \in V_I, (u, v) \notin E_I\}$ denote the set of all nodes without any incoming edges in $G_I$. If all nodes in $S_I$ can be turned into certain instances, then the instance corresponding to the answer can also be derived as a certain instance. Algorithm 4 demonstrates this process. For example, Fig. 5 shows $G_I$ of the exercise in Fig. 4, where $x$ and $y$ depend on their respective key properties, and $z = x + y$ depends on its two operands. In this context, $S_I = \{a_x, b_x, a_y, b_y\}$ and the instance corresponding to the answer is $z$.
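For the running example, the source set of the dependency graph can be computed directly from the edge list. The sketch below assumes that an edge $(u, v)$ means that $v$ depends on $u$, which is consistent with the source set being the key properties; the function name `source_nodes` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def source_nodes(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Nodes of the dependency graph that have no incoming edge."""
    has_incoming = {v for _, v in edges}
    return {v for v in nodes if v not in has_incoming}


# Dependency graph of the example: z = x + y, with x = a_x + b_x i and y = a_y + b_y i.
nodes = ["a_x", "b_x", "a_y", "b_y", "x", "y", "z"]
edges = [("a_x", "x"), ("b_x", "x"), ("a_y", "y"), ("b_y", "y"), ("x", "z"), ("y", "z")]
print(sorted(source_nodes(nodes, edges)))
```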

Organizing and Solving Constraints
After the last step, we have a set of constraints. First, we need to make sure that every variable in every constraint is in $S_I$. If not, the constraint needs to be rewritten using the key properties of the instance as variables. For example, for the exercise in Fig. 4, the set of constraints is $\{x + y = 6,\ xy = 10,\ a_x = a_y,\ b_x + b_y = 0\}$. Since $x, y \notin S_I$, the first two constraints are rewritten as $a_x + b_x i + a_y + b_y i = 6$ and $(a_x + b_x i)(a_y + b_y i) = 10$.
Now the constraint set includes and formalizes all the constraints in the exercise. So we can apply methods of a symbolic execution library [16,26] or some approximation algorithms [12,29] to solve these equations and/or inequalities. Finally, we will get the value (or range of value) of every instance in S I . Algorithm 5 shows this process.
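As an illustration, the constraint set of the running example can be solved with SymPy. The sketch assumes that the four key properties are real and, as a design choice made here rather than taken from the text, splits each complex equation into its real and imaginary parts before calling the solver.

```python
import sympy as sp

a_x, b_x, a_y, b_y = sp.symbols("a_x b_x a_y b_y", real=True)
x = a_x + b_x * sp.I
y = a_y + b_y * sp.I

# Constraints rewritten over the key properties in S_I.
constraints = [sp.Eq(x + y, 6), sp.Eq(x * y, 10), sp.Eq(a_x, a_y), sp.Eq(b_x + b_y, 0)]


def real_equations(eqs):
    """Split each equation into real and imaginary parts, dropping trivial ones."""
    parts = []
    for eq in eqs:
        real, imag = sp.expand(eq.lhs - eq.rhs).as_real_imag()
        for part in (real, imag):
            if part != 0 and part not in parts:
                parts.append(part)      # each part is understood as "part = 0"
    return parts


solutions = sp.solve(real_equations(constraints), [a_x, b_x, a_y, b_y], dict=True)
for s in solutions:
    print(s)
```

The system has two solutions, which correspond to swapping the roles of $x$ and $y$; both give $x$ and $y$ as $3 + i$ and $3 - i$ in some order.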

Updating Uncertain Instances and Retrieving the Answer
After solving all the constraints in the exercise, we need to update the values of the remaining instances in $G_I$. Since we now know the values of the instances in $S_I$, we can traverse every instance in $G_I$ in topological order and update their values in turn. Finally, we return the value of the instance corresponding to the answer. Algorithm 6 shows the complete process of using MathGraph to solve an exercise.
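The update step can be sketched as a topological traversal (Kahn's algorithm) over the dependency graph. This is an illustrative sketch and not the paper's Algorithm 6: the function `propagate`, the `rules` mapping from a node to a function that computes its value, and the concrete source values (one assumed solution of the running example) are all hypothetical.

```python
from collections import deque
from typing import Any, Callable, Dict, Iterable, Tuple


def propagate(
    values: Dict[str, Any],
    edges: Iterable[Tuple[str, str]],
    rules: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Dict[str, Any]:
    """Compute the value of every node, visiting nodes in topological order.

    `values` holds the solved source nodes, an edge (u, v) means "v depends
    on u", and `rules[v]` computes v from the values known so far.
    """
    edges = list(edges)
    nodes = set(values) | set(rules)
    for u, v in edges:
        nodes.update((u, v))

    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    result = dict(values)
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        node = queue.popleft()
        if node not in result:
            # All predecessors have been processed, so their values exist.
            result[node] = rules[node](result)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return result


# Running example with one assumed solution for the source nodes.
solved = {"a_x": 3, "b_x": 1, "a_y": 3, "b_y": -1}
dependency_edges = [
    ("a_x", "x"), ("b_x", "x"),
    ("a_y", "y"), ("b_y", "y"),
    ("x", "z"), ("y", "z"),
]
update_rules = {
    "x": lambda v: complex(v["a_x"], v["b_x"]),
    "y": lambda v: complex(v["a_y"], v["b_y"]),
    "z": lambda v: v["x"] + v["y"],
}
all_values = propagate(solved, dependency_edges, update_rules)
answer = all_values["z"]
```

Because every node is processed only after all of its predecessors, each rule can safely read the values it depends on. A source node that has neither a solved value nor a rule would raise a `KeyError`, which in this sketch signals an unsolved instance.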

Experiments
In this section, we conduct extensive experiments on real mathematical datasets to evaluate the performance of our method.

Datasets and Experiment Setting
We collect four real-world datasets of mathematical exercises of Chinese high schools, namely Complex, Triangle, Conic and Solid. The exercises are stored in plain text, and the mathematical expressions are stored in the LaTeX format.
Exercises in the four datasets are categorized into three levels (i.e. easy, medium and hard) based on the difficulty (which is classified according to the accuracy of many high school students). Table 4 shows the number of exercises with different difficulty levels in the datasets.
In the experiments, we use Neo4j [28] as the graph database platform to build and index MathGraph. For the datasets, we build the knowledge graph manually, involving only the instances, operations and constraints that may exist in these exercises. All algorithms are implemented in Python 3.7. SymPy [25] is used for symbolic computation. All the experiments are conducted on a machine with a 2.40 GHz Intel Xeon E5-2630 CPU and 48 GB RAM, running Ubuntu 14.04.

MathGraph Construction
We randomly choose 50% of the exercises from each dataset and use them to construct MathGraph. The elements are extracted by workers on ChinaCrowds [20], which is a user-friendly crowdsourcing platform. In the experiment, we set m = 5. For each task of the types shown in Fig. 3a-c, we pay 5 RMB. For each task of the types shown in Fig. 3d-f, we pay 1 RMB because these tasks are simpler than the former. The total cost is 15,480 RMB. We compare with a baseline in which 3 experts perform the extraction on a sampled dataset (due to the limited capacity of a single expert), in terms of precision, recall and F1-score. The ground truth is obtained by having multiple experts proofread the crowdsourcing result.
As shown in Fig. 6a, for all datasets, our crowd-based strategy has a much higher recall than the expert-based strategy because we use 5 workers to answer a task and combine their answers. In the expert-based strategy, each expert has to answer a lot of questions and thus cannot cover as many entities, which results in a low recall. For example, on the Triangle dataset, the crowd-based strategy has a recall of 89%, roughly 20 percentage points higher than that of the expert-based strategy (68%). On the Complex dataset, the crowd-based and expert-based strategies achieve similar recalls (95% and 94%, respectively) because the dataset is simple and has a small number of entities to be extracted. Therefore, experts can also achieve a high recall.
For precision, as shown in Fig. 6b, the expert-based strategy can achieve a high precision because it leverages human expertise. However, our crowd-based strategy is comparable with the expert-based one because we remove the duplicated answers and verify the wrong answers. For example, on the Complex dataset, both methods have a precision of 100%. Moreover, on the Conic dataset, the crowd-based and expert-based strategies achieve similar precisions of 93% and 92%, respectively. Overall, for the F1-score, Fig. 6c shows that the crowd-based method is better than the expert-based one because our method has a much higher recall and a comparable precision compared with the expert-based solution. For example, on the Solid dataset, the crowd-based strategy has a recall of 93%, which is 7 percentage points higher than that of the expert-based strategy.
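The three metrics used in this comparison can be computed from sets of extracted elements, as in the sketch below. It is a simplified illustration and not the evaluation code of the paper: the helper names are hypothetical, the worker answers and ground truth are invented toy data, and combining answers is reduced to a deduplicated union, without the verification of wrong answers that the crowd-based strategy also performs.

```python
from typing import Iterable, Set, Tuple


def merge_answers(worker_answers: Iterable[Iterable[str]]) -> Set[str]:
    """Combine the answers of several workers and remove duplicates."""
    merged: Set[str] = set()
    for answers in worker_answers:
        merged.update(a.strip().lower() for a in answers)
    return merged


def precision_recall_f1(extracted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of `extracted` against `truth`."""
    true_positives = len(extracted & truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy data: three workers label the same task.
workers = [
    ["Ellipse", "focus"],
    ["ellipse", "eccentricity"],
    ["focus", "parabola"],
]
ground_truth = {"ellipse", "focus", "eccentricity", "directrix"}

merged = merge_answers(workers)
precision, recall, f1 = precision_recall_f1(merged, ground_truth)
```

In this toy case the union contains four distinct elements, three of which are correct, so precision, recall and F1 are all 0.75. The example also shows why combining several workers tends to raise recall: no single worker covers more than two of the four ground-truth elements.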

Exercise Solving
We implement a rule-based baseline method as follows:
1. We still use a rule-based semantic parser to parse the text and extract the information.
2. A large number of rules are written in advance to match different situations of exercises. We randomly select 20% of the exercises and assign 8 programmers to write rules that can align and solve the exercises in this set. Every rule represents an exercise type and has a built-in solving process only for this exercise type.
3. Then, these rules are used to solve all exercises. If an exercise matches a rule, we apply the solving process of the rule and return the answer.
MathGraph is created by crowdsourcing and proofread by experts. It only includes nodes and edges of the four types of exercises in our datasets, containing 89 mathematical objects, 723 operations and 875 constraints. Figure 7 shows the exercise-solving accuracy on the four datasets. We can see that on every dataset, our method achieves higher accuracy than the baseline (e.g. by 20%). This result shows the effectiveness of solving problems using MathGraph. Figure 8 shows the exercise-solving accuracy on different difficulty levels. From the experimental result, we have the following observations. Firstly, as the difficulty of the exercises increases, the accuracy of both methods decreases. Secondly, for easy exercises, the baseline and our method have similar performance, but for medium and hard exercises, MathGraph significantly outperforms the baseline, because our method can use the knowledge graph to do mathematical derivation.
The rule-based baseline considers the exercise as a whole and solves it according to the logic specified by a rule. This means that this method relies on a large number of rules, and the more complex the exercise is, the more rules are needed and the harder they are to write. Therefore, this method performs poorly on hard exercises. In contrast, our method extracts the mathematical objects, calculations and constraints from these rules and models them into a graph, so it can be used for multi-step calculation and derivation.

Conclusion
In this paper, we proposed MathGraph, a knowledge graph for automatically solving mathematical exercises. MathGraph is specially designed to represent different mathematical objects, operations and constraints. Considering the complexity of the semantics of mathematical exercises, we use crowdsourcing to construct MathGraph. Given an exercise, we can use the proposed method to solve it with the help of MathGraph and a pre-built semantic parser. An experimental study on four real-world datasets demonstrates the accuracy of our method.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.