Markov networks are useful in a wide variety of applications, including natural language processing, computer vision, and biology. In recent years, Markov networks with submodular potentials have become increasingly useful for structured output prediction problems, owing to efficient and optimal MAP inference algorithms. Yet the parameters of the clique potentials are still constructed by hand, using domain knowledge and heuristics, because efficient algorithms for learning these parameters from data are lacking. The learning problem can be cast as a quadratic program (QP) following the Structural SVM approach. The popular way to solve this QP is the cutting plane method, which runs in time polynomial in the number of data points provided the MAP inference is optimal. To guarantee optimal inference, Fix et al. (ICCV 2013) enforced submodularity of the parameters by adding a further exponential number of linear constraints; as the clique size grows, however, the number of submodularity constraints becomes intractable. Submodularity is required for optimal inference, but it is not a requirement of the learning problem itself. In this talk, we will therefore focus on how to reduce the number of submodularity constraints without losing the optimality guarantee of the cutting plane algorithm.
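To make the exponential blow-up concrete, the sketch below checks submodularity of a clique potential over binary variables by naive enumeration, treating the potential as a set function and testing f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all pairs of assignments. This is an illustrative toy (the function name and dictionary encoding are assumptions), not the constraint-reduction method of the talk; it only shows why the number of constraints grows exponentially with clique size.

```python
from itertools import product

def is_submodular(theta, k):
    """Check submodularity of a clique potential theta over k binary
    variables, viewed as a set function on label tuples. Enumerates all
    pairs of assignments, so the number of checks is exponential in k."""
    labelings = list(product([0, 1], repeat=k))
    n_checked = 0
    for a in labelings:
        for b in labelings:
            join = tuple(max(x, y) for x, y in zip(a, b))   # A union B
            meet = tuple(min(x, y) for x, y in zip(a, b))   # A intersect B
            n_checked += 1
            # submodularity: f(A) + f(B) >= f(A u B) + f(A n B)
            if theta[a] + theta[b] < theta[join] + theta[meet] - 1e-12:
                return False, n_checked
    return True, n_checked

# Pairwise binary potential (k = 2): submodularity reduces to the single
# inequality theta(0,1) + theta(1,0) >= theta(0,0) + theta(1,1), yet the
# naive check already inspects 4^2 = 16 pairs of assignments.
theta = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
ok, n = is_submodular(theta, 2)
print(ok, n)  # True 16
```

For a clique of size k there are 2^k assignments, so the naive enumeration above inspects 4^k pairs; this is exactly why adding all submodularity constraints to the QP becomes intractable as the clique size increases.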