Visual Sudoku SolverNikolas PilavakisMInf Project (Part 1) ReportMaster of InformaticsSchool of InformaticsUniversity of Edinburgh2020

3AbstractIn this report, the design, implementation and testing of a visual Sudoku solver appfor android written in Kotlin is discussed. The produced app is capable of recognising a Sudoku puzzle using the phone’s camera and finding its solution(s) using abacktracking algorithm. To recognise the puzzle, multiple vision and machine learning techniques were employed, using the OpenCV library. Techniques used includegrayscaling, adaptive thresholding, Gaussian blur, contour edge detection and template matching. Digits are recognised using AutoML, giving promising results. Thechosen methods are explained and compared to possible alternatives. Each componentof the app is then evaluated separately, with a variety of methods. A very brief userevaluation was also conducted. Finally, the limitations of the implemented app arediscussed and future improvements are proposed.

4AcknowledgementsI would like to express my sincere gratitude towards my supervisor, professor NigelTopham for his continuous guidance and support throughout the academic year.I would also like to thank my family and friends for the constant motivation.

Table of Contents1Introduction2Background2.1 Sudoku Background .2.2 Solving the Sudoku .2.3 Sudoku recognition .2.4 Image processing . .2.4.1 Grayscale . .2.4.2 Thresholding2.4.3 Gaussian blur347.99101213131314Design3.1 Design decisions . . . . . . . . . . .3.1.1 Development environment . .3.1.2 Programming language . . . .3.1.3 Solving algorithm . . . . . .3.1.4 Vision Library . . . . . . . .3.1.5 Real-time vs static recognition3.1.6 Grid Detection . . . . . . . .3.1.7 Image processing . . . . . . .3.1.8 Clue extraction . . . . . . . .3.2 Requirements . . . . . . . . . . . . .3.2.1 Solver . . . . . . . . . . . . .3.2.2 User Interface . . . . . . . . .3.2.3 Grid detector . . . . . . . . .3.2.4 Clue extractor . . . . . . . . .3.3 Split between two years . . . . . . . .Implementation4.1 App Overview . . . . . . . . . . . . .4.2 User Interface . . . . . . . . . . . . .4.2.1 Board . . . . . . . . . . . . .4.2.2 Puzzle Representation . . . .4.2.3 Puzzle Editing . . . . . . . .4.2.4 Displaying multiple solutions4.3 Solving Algorithm . . . . . . . . . .5.

6TABLE OF CONTENTS4.45674.3.1 Kotlin Implementation4.3.2 Multithreading . . . .Recognition . . . . . . . . . .4.4.1 Library Setup . . . . .4.4.2 Camera Use . . . . . .4.4.3 Puzzle Detection . . .4.4.4 Orientation Correction4.4.5 Clue detection . . . .3233333333343535Evaluation5.1 User interface . . . . . .5.1.1 Response time .5.1.2 Error handling .5.2 Solving . . . . . . . . .5.2.1 Time taken . . .5.3 Recognition . . . . . . .5.3.1 Grid detection . .5.3.2 Extracting clues .5.4 Requirements evaluation5.4.1 Solver . . . . . .5.4.2 User Interface . .5.4.3 Grid detector . .5.4.4 Clue extractor . .5.5 User Evaluation . . . . .414141414242434344474748484849Conclusions6.1 Project Achievements6.2 Project limitations . .6.3 Future Work . . . . .6.4 General remarks . . .5151515253.Appendix55Bibliography57

Chapter 1IntroductionPeter Gordon and Frank Longo [26] refer to Sudoku as n2 n2 grid that is initiallyfilled in with a certain number of cells (referred to as clues hereafter). The objectiveis to fill every cell with numbers 1 to n2 , without using any number more than oncein the same row, column, or n n block. In this report, only the most popular type ofSudoku is considered, for which n 3. A famous Sudoku example is shown in figure1.1. Nicknamed Al Escargot, it is considered the hardest Sudoku for human solvers.Various advanced solving techniques are necessary to solve this puzzle.The initial goal of this project was ”to recognize a Sudoku puzzle in an image andthen solve it”. More specifically, the project required that ”the phone user should beable to point the phone camera at a printed puzzle and see the image overlaid with thesolution”. The description of the project suggested that the app was developed for IOSdevices. After careful consideration, the goal was changed to the implementation ofan android app. Instead of the puzzle being overlaid with the solution, a picture of thepuzzle must be taken and the solution is presented separately. The rationale behindthese decisions and the resulting advantages are discussed in Chapter 3.The capabilities of the developed app include detecting and isolating a Sudoku puzzleusing the phone’s rear camera. If necessary, the orientation of the detected puzzle canthen be corrected. Afterwards, the clues can be extracted from the puzzle. Incorrectclues can also be corrected. The Sudoku with the corresponding clues can then besolved instantaneously. If the Sudoku has multiple solutions, navigation between solutions is also available. The app can be used as a companion when solving printedSudoku, or to attempt to solve custom puzzles entered by the user. The app can alsobe used to verify whether the solution found is the only solution to the puzzle. Someprinted puzzles might have more than one solution even though the puzzle was thoughtto have a single solution As this is a two-year project, the accuracy of detection is expected to improve next year. Capability to manually solve puzzles in the app will alsobe added.In chapter 2, a review of existing literature is presented. The chapter begins with somegeneral information about Sudoku puzzles. A review of existing solving algorithms isfollowed by a summary of literature involving Sudoku recognition. The chapter con7

8Chapter 1. Introductioncludes with a description of some common image processing techniques. In the nextchapter, various decisions that were made before the beginning of the implementationof the app are discussed, including possible alternatives and the requirements that theapp must fulfil. The design of the solving algorithm used by the app is also presented.In chapter 4, the implementation of the app is explained, including various obstaclesthat were encountered. The process that was followed in order to create the app andconnect the various capabilities is presented. In chapter 5, methods used to evaluatethe various aspects of the app are discussed, followed by some user evaluation. In thefinal chapter, the goals achieved during the project are summarised and then followedby some possible improvements.Figure 1.1: Al Escargot.

Chapter 2Background2.1Sudoku BackgroundThere are 6,670,903,752,021,072,936,960 valid Sudoku grids. The number was firstcalculated by user QSCGZ on a google groups thread [3]. In 2006, Felgenhauer andJavris [22] confirmed this number and calculated the number of essentially differentgrids to be 5472730538 [23]. Most printed Sudoku have a single valid solution, however there exist examples of puzzles with more than one solution as show by figure 2.1Figure 2.1: A Sudoku puzzle with two solutionsIn 2010, Lin and Wu [40] used a checker created by McGuire and proposed an algorithm called Disjoint Minimal Unavoidable Set (DMUS) to conclude that the minimumnumber of clues required for a Sudoku to have a unique solution is 17. McGuire [48]confirmed this number in 2012.According to some publications, the difficulty of the Sudoku is determined by the9

10Chapter 2. Backgroundnumber of cells that are initially filled, the fewer the cells, the harder the puzzle. Lee[38] separates Sudoku on 5 difficulty levels ranging extremely easy to evil, dependingon the total number of clues, as shown by table 2.1 or the lower bound of number ofclues in each row or column, as show by table 2.2.Difficulty level1 (Extremely easy)2 (Easy)3 (Medium)4 (Difficult)5 (Evil)Number of clues 4636-4632-3528-3117-27Table 2.1: Difficulty of puzzles based on number of given clues.Difficulty level1 (Extremely easy)2 (Easy)3 (Medium)4 (Difficult)5 (Evil)Lower bound on number of clues in row or column.54320Table 2.2: Difficulty of puzzles based on lower bound of clues in any row or columnHowever, this approach ignores the position of the clues in the puzzle. A more accuratemethod of determining the difficulty of a puzzle, is based on the complexity of thetechniques that must be used for the puzzle to be solved by a human. Maji and Pal[31] made use of this system. This method has a significant drawback. The puzzle inquestion must be solved by a human for its difficulty to be determined. Ravasz andToloczkai [21] proposed a system to quantify the difficulty of a puzzle. The proposedscale ranges from 1 to 4. Under this scale, the hardest puzzle, nicknamed ”platinumblond” was found to have a difficulty of 3.5789.2.2Solving the SudokuA plethora of different approaches were developed to solve the puzzle. In 2003, Yatoand Seta [73] proved Sudoku to be ASP-complete and hence NP complete. Because ofthis, any know approach has a complexity which increases exponentially with the sizeof the grid. The most popular algorithm for solving Sudoku is backtracking, whichis widely used in computer science [29]. Backtracking is a brute force algorithm thatuses depth first search paradigm, completely exploring one branch before expandinganother branch. Due to the finite number of valid grids, backtracking can be practicaland is guaranteed to find all solutions to the puzzle, if they exist, given enough time. Inits simplest form, the algorithm visits empty cells in an orderly fashion, filling in onenumber at a time and checking if all constraints are satisfied, in which case it movesto the next empty cell. If any of the row, column or box constraints is violated, thenthe value is incremented. If none of the 9 numbers (1-9) are allowed in the specific

2.2. Solving the Sudoku11cell, the algorithm backtracks and changes the value of the previously set cell. Ifone solution is sufficient, the algorithm terminates when the first solution is found,otherwise continues to search the entire search space. There are many ways to improvethe algorithm. The simplest example would be only assigning numbers that will notviolate any rule, instead of ranging from 1-9. A variety of different techniques havebeen proposed [30] [31] . Lee [38] proposed an approached based on elimination,that incorporates forward checking while picking the most constrained value first.In2000, Knuth [37] implemented Dancing links algorithm which solves the puzzle invery short time. The methods mentioned above include frequent guessing which leadsto redundant computation. Schottlender [59] explained how guesses can influence theefficiency of a backtracking algorithm. In 2014, Maji and Pal [42] proposed a guessfree solver that considers individual 3x3 grids.According to a survey hosted on a 2011blog post by user attractivechaos [9], which was later referenced by Iyer et al. [30],The fastest known Sudoku solvers use a version of backtracking or dancing links.A variety of soft-computing algorithms were also proposed, all of which use eitherbee colony or genetic algorithm. The Artificial Bee Colony (ABC) algorithm was firstintroduced by Karaboga [35] in 2005. An improved version [53] was presented in2008 and was used by Pacurib et al. [52] in 2009. Yusiong and Pacurib [74] in 2010and Mantere [44] in 2013. Genetic Annealing (GA) was first used to solve Sudoku byMantere and Koljonen [45] [46] in 2006-7. Genetic algorithms are inspired by naturalevolution to generate solutions, such as crossover, selection, mutation and inheritance.Those algorithms are exhaustive and hence tend to be extremely time consuming. In2010, two algorithms were proposed to solve Sudoku using genetic annealing [56][58]. In 2015, Wang et al. [70] proposed using GA with filtered mutations, yieldinghigher success rates as the algorithm was able to solve all test instances. In 2016, Chelet al. [15] proposed a multistage GA algorithm that is solving Sudoku puzzles withmore than one solution.Malakonakis et al. [43] made use of simulated annealing to solve a large Sudoku ofsize 15x15x15x15 on FPGA.Kamal et al. [33] compared the results obtained when using backtracking, simulatedannealing and genetic algorithms to solve the puzzle. Results indicated backtrackingto be the best algorithm to solve Sudoku puzzles in a short time frame. Other solutionson FPGAs include Bok et al. [68], Gonzalez et al. [25], Dittrich et al. [19] and [34]that used brute-force. Brute-force was also used by Wicht and Hennebert [71] as wellas Job and Paul [32].Chowdhury and Akhter [16] proposed a solver using Boolean algebra.Boolean satisfiability and Constrained Programming (CP) [11] were first used by Simonis [62] in 2005 and then by Rossi et al. [54] in 2006 and Crawford et al. [17] in2008 but the efficiency of this approach was heavily influenced by the complexity ofthe problem. An advantage of CP, similar to backtracking, is that it will always find asolution if there is one.In 2006, Moraglio et al. [51] suggested a method using metaheuristics and came tosome encouraging results, followed by simillar succes of Lewis [39] a year later. Dur-

12Chapter 2. Backgrounding 2013-2015, Soto et al. [63] [64] [65] suggested Hybrids combining metaheuristicsand CP filtering.Herzberg [28] described Sudoku puzzles as a graph colouring problem. This is possiblebecause each cell interacts with other cells as described by Bartlett [12].Other techniques used include rewriting rules [57], Sinkhorn balancing [50], entropyminimization [27], Hall’s marriage theorem [66], integer programming [12], harmonysearch [24], membrane computing [18] and Hybrid Ant Colony Optimization (ACO)[55].2.3Sudoku recognitionLiterature for recognising a Sudoku puzzle from a picture, appropriately processing thepicture and extracting the puzzle in a solvable format is very limited and much morerecent compared to that of solving it.In 2012, Simha et al. [61] proposed a model based on MATLAB that uses templatematching [49] [14] to recognize the puzzle and the clues enclosed in it. While thismethod is very straightforward it tends to be laborious and struggles to adapt to newproblems. The supplied image is cropped for efficiency and then converted into abinary image using block processing [36]. By applying a median filter to the producedblocks, most of the noise including grid lines is removed. The filter is used with anadaptive threshold [69] to tackle uneven illumination of the image. Any componentsthat are connected to the border are then removed then removed for additional noiseremoval. The puzzle is then recognized to be the largest box in the processed image.The advantage of this approach for detecting the puzzle instead of finding extremedistance points from the origin of the image is that no extra steps are required fordifferent angles of the image. A visual grid is then created. Anything on the grid isthen removed using blob analysis [67]. Most of the errors occur in this stage, so it mustbe handled with extreme caution. The width/height ratio of each digit is then matchedto that of the template and then compare to identify the digit. Overall, the model wasrelatively successful with a success rate of 90%.In 2014-15 Wicht et al. [71] [72] used a model based on convolutional deep beliefnetworks. Image processing includes Hough transformations and contour detection.The extracted features are then extracted using a support vector machine. reportedresults show an accuracy of 92%In 2015, Ly and Vo [41] created a model based on Neural networks. While this requiresmassive quantities of training data and training can be computationally expensive, itcan be very fast and reliable once it’s trained. After the image is binarized, the angle isdetected and the image is rotated accordingly. The study reports an accuracy of 95% ,which can be improved even further with a larger training set.In the same year,Kamal et al.[34] [33] used adapting thresholding, Hough and geometric transformations to process the image. The digits were then extracted using Opticalcharacter recognition (OCR). This approach led to excellent results, close to 100%accuracy.

2.4. Image processing13In 2016, Dutta and Ghosh [20] suggested a character recognition in MATLAB using kNearest Neighbor algorithm. This algorithm does not require training, however it canbe computationally expensive for large sets of data. This method also uses appropriatepre-processing techniques and template matching for extracting the digits.2.4Image processingIn this section the most popular techniques used to process an image of the puzzle inorder to extract the clues are discussed.2.4.1GrayscaleAny colour present in the image to be processed does not contribute any useful information, it can be discarded by converting the image to grayscale. To store a grayscaleimage, only one channel of 8 bits is used for each pixel, compared to three channels(Red, Green, Blue) used in RGB, cutting the file size to a third of the original.As a result, processing of the image is simpler and faster. There are multiple algorithms available for converting an image to grayscale. The simplest algorithm takesthe average of the three values for each pixel:Gray R G B3(2.1)As show by figure 2.3 the resulting image looks unnatural. This happens because thehuman eye is much more sensitive to green compared to the other two colours. Analternative algorithm for, taking this into account is:Gray 0.299 R 0.587 G 0.114 B(2.2)This algorithm produces images that reflect luminosity as perceived by the human eye,as seen in figure 2.4 and is used by major image processing libraries such as openCV[6] and MATLAB. Ahmed at al. [10] compared different algorithms for convertingimages to grayscale and found that the algorithm chosen can have significant effect onthe efficiency of edge detection, which is used on recognising Sudoku.d2.4.2ThresholdingThresholding is a process used to transform the grayscale image, so that only twocolours exist in the image, black and white. The process is commonly used whenrecognising Sudoku as it is an effective tool to distinguish an object from the background. A variety of methods exist [60]. The most commonly used ones are discussedin this section. 2. BackgroundGlobal ThresholdingGlobal thresholding is the simplest form of thresholding. A threshold value T is chosen, ranging from 0 to 255. Every pixel value of the image is compared with thethreshold. If it smaller, it is set to 0 (black), otherwise it is set to 1(white). dst(x) src(x) T1 otherwiseAdaptive ThresholdingGlobal thresholding is not very effective in cases where different lighting conditionsare present in the image, such as glare. Adaptive thresholding is a similar processto global thresholding that tackles differences in spatial illumination across the image[13]. This is achieved by partitioning the image to various ”neighbourhoods” andcalculating a different threshold value for each. This method is the most effectivewhen recognising Sudoku and is hence used widely in existing literature. The twomost common ways of calculating the threshold for each region is either by takingthe mean of all the pixels in the region or by using a weighted sum of all the pixels.Usually, the latter method works better for Sudoku as show by figure blurGaussian blur is a technique used to smooth out any noise that could be introducedduring the camera’s capturing process. It is based on the Gaussian function:1 (x µ)2P(x) eσ 2π.2σ2(2.3)

2.4. Image processing15Figure 2.2: thresholding. Credit [5]

Chapter 3Design3.1Design decisionsIn this section, decisions made before the start of implementation are explained.3.1.1Development environmentThe project proposal required an app to be running on IOS however, I decided to develop an android app to take advantage of prior experience I had in the domain, whichallowed faster development. Furthermore, android is the most popular OS for smartphones, meaning that the resulting app would be suitable for more people, makingfinding test subjects easier.3.1.2Programming languageThe programming languages available for android development are Java, Kotlin andC . Since I only had experience with the first two, C was ruled out. BetweenJava and Kotlin, I chose to use the latter as it is the official programming language forandroid as of 2019 [2]. Furthermore, Kotlin and Java compile to very similar bytecode,resulting to similar performance. Translation from Java to Kotlin code can happen veryeasily as there are tools available. Files for both languages can co-exist within the sameproject without any extra cost, as a result, this decision is not binding and can be revisedat any point. Last but not least, due to the fact that Kotlin is relatively new, no majorsimilar projects exist on Kotlin. The comparison of the performance of the two is ofpersonal interest and could happen during the project.3.1.3Solving algorithmIt was decided to solve the puzzles using a backtracking algorithm due to the significant performance advantage discussed in section 2.2. To reiterate, the fastest availablealgorithms all use some version of backtracking, as shown by the review conductedby blog post user attractiveChaos [9]. Speed is one of the most important factors for17

18Chapter 3. Designcommercial apps. Furthermore, some of the algorithms found in literature are not guaranteed to find a solution, while most of these algorithms assume that a Sudoku onlyhas one valid solution. A well designed backtracking algorithm is guaranteed to findall solutions to any given puzzle, given enough time. designEven though countless solvers are available online, implementing an algorithm fromscratch was of personal interest. An outline of the simplest possible backtrackingalgorithm is shown below:Algorithm 1 SolveSudoku(grid)1: if FindEmptyCell(grid) NULL then2:return true {Solving successful}3: else4:for digit 1 to 9 do5:if digit is legal for cell then6:Assign digit to cell7:if SolveSudoku(grid) then8:return true9:else10:Undo Assignment11:return false {backtrack}12:end if13:end if14:end for15: end ifAlgorithm 2 FindEmptyCell(Grid)1: for Cell in grid do2:if cell is empty then3:return cell4:end if5: end for6: return NULLThis algorithm blindly assigns all possible numbers to a random cell, until a solutionis found. It can be further improved by using various heuristics when choosing a cell.The most widely used heuristics are:1. Minimum remaining values: Choose the cell with the least legal values. In thecase that there is a cell with only one remaining legal value, the heuristic implements the hidden singles technique.2. Most constraining: Choose the cell that maximises constraints on other cells.Using this heuristic minimises branching.

3.1. Design decisions19When selecting a value for a cell, selecting the least constraining value maximises theprobability that the branch to be explored will lead to a solution.The algorithm was improved by keeping track of the illegal values for each empty cellusing a set. The sets are initialised before the execution of the algorithm. Every timethe sets are updated, they are ordered depending on their size. The cell with the mostillegal values is selected and assigned a number. The sets corresponding to cells of thesame row, column or grid are updated and the process is repeated. When a cell is filled,the corresponding set is removed. The revised algorithm is shown below:Algorithm 3 SolveSudokuRevised(grid,setsArray)Require: a set of illegal values for each empty cell. Sets are ordered by descendingsize and stored in an array.1: largestSet setsArray s.t.size(largestSet) max{size(A) A setsArray}2: if largestSet NULL then3:return true {Solving successful}4: else5:for digit {1.9} largestSet do6:Assign digit to cell corresponding to largestSet.7:UpdatedSets UpdateSets(grid,sets,cell)8:if UpdatedSets NULL then9:return false {Branch cannot lead to solution, backtrack}10:end if11:if SolveSudokuRevised(grid,UpdatedSets) then12:return true {further explore branch}13:else14:Undo Assignment15:return false {backtrack}16:end if17:end for18: end ifAlgorithm 4 UpdateSets(grid,sets,cell)1: for each set in row,column, grid do2:add content of updated cell to set3:if set.size 9 then4:return NULL {Empty cell has no more legal values}5:end if6: end for7: sort sets8: return setsIt is worth noting that the revised algorithm uses much more space compared to theprevious one. This is a small price to pay for modern smartphones and should not benoticeable at run-time due to the small depth of each explored branch.

20Chapter 3. DesignThe algorithm stops when the first solution is found. To allow for multiple solutions,the algorithm was tweaked to store a solution when found and continue the search. Thefinal version of the algorithm is shown below:Algorithm 5 SolveSudokuMultiple(grid,setsArray)Require: a set of illegal values for each empty cell. Sets are ordered by descendingsize and stored in an array.1: largestSet setsArray s.t.size(largestSet) max{size(A) A setsArray}2: if largestSet NULL then3:return true {Solving successful}4: else5:for digit {1.9} largestSet do6:Assign digit to cell corresponding to largestSet.7:UpdatedSets UpdateSets(grid,sets,cell)8:if UpdatedSets NULL then9:return false {Branch cannot lead to solution, backtrack}10:end if11:if SolveSudokuMultiple(grid,UpdatedSets) then12:if grid full then13:print solution14:Undo assignment15:return false {backtrack to try and find more solutions}16:end if17:return true {further explore branch}18:else19:Undo Assignment20:return false {backtrack}21:end if22:end for23: end ifNaturally, this version of the algorithm takes more time to terminate, as every branchof the search tree is explored. This was taken into account during the evaluation of thealgorithm, discussed in section LibraryThe most common vision library for android is OpenCV [1]. OpenCV is a cross platform open source library with a large user community, exceeding 18 million downloadsat the time of writing. Multiple projects can be found online that make use of OpenCV.One popular alternative is Google vision API. This option is mostly suitable for standardised tasks such as face recognition or scanning barcodes and is not so flexible.Also, the community is much smaller compared to OpenCV. Another alternative isOpenIMAJ. This option was rejected because of the very small community. Furthermore, no specific advantages were found over OpenCV. As far as speed is concerned,no benchmarks specific to android were found. As a result, the deciding factors were

3.1. Design decisions21community size and existing projects. As there were no apparent disadvantages ofusing OpenCV, it was believed to be the most suitable for this project.3.1.5Real-time vs static recognitionTwo different approaches were considered regarding the way that the app completesits goal of recognising a puzzle and displaying the solution(s) to the user. The firstapproach was to solve the puzzle in real time using the phone’s camera and overlay thecamera feed with a solution. This approach requires efficient use of vision techniquesas any additional delay would be easily noticed by the user. This task can be difficultto implement without prior experience in vision.The second approach considered was to prompt the user to take a picture of a Sudokuand then use the static picture to display the solution(s) to the puzzle. Even though theresults of this approach would not be as visually attractive, there are numerous advantages. Firstly, this approach allows the user to ensure that the puzzle detected is correct,perhaps allowing editing of clues before solving. Secondly, the first approach can onlydisplay one solution for any puzzle, where multiple solutions can be displayed with astatic approach. Thirdly, better performance is likely when the camera is positioned atan angle towards the puzzle. Fourthly, static image processing is much easier to test,as the same image can be reused without having to worry about difference in lighting.Last but not least, this approach would allow the user to try and solve the puzzle onthe phone. The only throwback of this solution is the need to design a responsive userinterface to cover the extra functionality.Due to the above factors, the use of a static picture was preferred instead of real-timesolving.3.1.6Grid DetectionFor the detection of the grid, it was decided to use a closed feedback loop, highlightingthe grid before the picture is taken. This approach allows the user to adjust the positionof the camera to increase the efficiency of the recognition of the puzzle. The puzzlecan be highlighted and extracted using contour edge detection. The

3 Abstract In this report, the design, implementation and testing of a visual Sudoku solver app for android written in K